
Design of a Kolmogorov-Arnold Network Hardware Accelerator

Mammadzada, Fuad (2025) EITM02 20251
Department of Electrical and Information Technology
Abstract
The exponential growth of Big Data, the Internet of Things (IoT), and Large Language Models (LLMs) has significantly increased the computational and energy demands of modern computing systems. In response, research has increasingly focused on alternative machine learning paradigms and hardware architectures that reduce model complexity and computational load. One such paradigm is the Kolmogorov-Arnold Network (KAN), a novel neural architecture that replaces conventional trainable weights with B-spline-based activation functions.
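
To make the paradigm concrete, the following minimal sketch (illustrative NumPy code, not taken from the thesis) evaluates a single KAN edge: the scalar weight of an MLP connection is replaced by a learned univariate function, expressed as a coefficient-weighted sum of B-spline basis functions.

    import numpy as np

    def bspline_basis(x, t, i, p):
        # Cox-de Boor recursion: i-th B-spline basis of degree p over
        # knot vector t; degree 0 is an indicator of [t[i], t[i+1]).
        if p == 0:
            return 1.0 if t[i] <= x < t[i + 1] else 0.0
        left = right = 0.0
        if t[i + p] > t[i]:
            left = (x - t[i]) / (t[i + p] - t[i]) * bspline_basis(x, t, i, p - 1)
        if t[i + p + 1] > t[i + 1]:
            right = ((t[i + p + 1] - x) / (t[i + p + 1] - t[i + 1])
                     * bspline_basis(x, t, i + 1, p - 1))
        return left + right

    def kan_edge(x, t, coeffs, p):
        # One KAN "weight": a learned univariate spline, i.e. a
        # coefficient-weighted sum of B-spline basis functions.
        return sum(c * bspline_basis(x, t, i, p) for i, c in enumerate(coeffs))

    # Degree-2 spline on a uniform knot vector: 10 knots give 7 basis
    # functions, so 7 trainable coefficients replace one scalar weight.
    t = np.linspace(-0.4, 1.4, 10)
    coeffs = np.random.randn(7)
    y = kan_edge(0.5, t, coeffs, p=2)

A full KAN layer then sums such edge outputs over all of a neuron's inputs, just as an MLP neuron sums weighted inputs.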

This thesis presents one of the first comprehensive hardware implementations of KAN inference. To maximize performance, several algorithmic optimizations are introduced, including quantization techniques for static grid B-splines and simplified Look-Up Tables (LUTs) for basis function evaluation. An inference algorithm is developed to exploit the inherent sparsity of B-spline activations through dynamic coefficient bypassing, substantially reducing both memory bandwidth requirements and the number of operations.
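
As a rough illustration of the bypassing idea (hypothetical code, not the thesis's actual datapath): for a first-order, i.e. piecewise-linear, B-spline, at most two hat-shaped basis functions are nonzero at any input, so only two adjacent coefficients ever need to be fetched, and all others can be skipped.

    import numpy as np

    def linear_spline_edge(x, grid, coeffs):
        # First-order (piecewise-linear) B-spline: at any x only the
        # two hat bases covering x are nonzero, so only coeffs[j] and
        # coeffs[j + 1] are read; every other coefficient is bypassed.
        j = int(np.clip(np.searchsorted(grid, x, side='right') - 1,
                        0, len(grid) - 2))
        frac = (x - grid[j]) / (grid[j + 1] - grid[j])
        # On a static uniform grid, frac has a fixed-point form, so the
        # pair (1 - frac, frac) can be read from a small LUT instead of
        # computed; this mirrors the simplification the abstract mentions.
        return (1.0 - frac) * coeffs[j] + frac * coeffs[j + 1]

    grid = np.linspace(0.0, 1.0, 9)   # static grid with 8 intervals
    coeffs = np.random.randn(9)       # one coefficient per grid point
    y = linear_spline_edge(0.37, grid, coeffs)

Because the two active coefficients are always adjacent, storing coefficients sequentially also turns each evaluation into a single contiguous memory access, which suggests why the memory layout described below pays off.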

On the hardware side, the design features a dedicated Comparator Chain for grid interval detection, a memory layout optimized for sequential coefficient access, and a Processing Element (PE) architecture for first-order B-spline evaluation. The full system is implemented as a custom accelerator integrated with a MicroBlaze RISC soft processor on a Xilinx Artix-7 FPGA. Experimental results confirm functional correctness and demonstrate trade-offs in performance, area, and scalability.
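
The grid-interval detection step can be modelled in software as follows (a behavioural sketch assuming one comparator per grid breakpoint; the thesis's actual SystemVerilog RTL will differ in detail):

    def comparator_chain(x, grid):
        # Hardware analogy: one comparator per grid breakpoint, all
        # evaluated in parallel, producing a thermometer code such as
        # [1, 1, 1, 0, 0]; counting the ones (a priority encode in
        # hardware) gives the index of the interval containing x.
        thermometer = [x >= g for g in grid]
        return sum(thermometer) - 1

    grid = [0.0, 0.25, 0.5, 0.75, 1.0]
    interval = comparator_chain(0.6, grid)   # 2, i.e. x in [0.5, 0.75)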

This work lays the foundation for future exploration of KAN-based models in resource-constrained hardware, offering a scalable platform for spline-driven machine learning.
Popular Abstract
Modern Artificial Intelligence (AI) breakthroughs have been spectacular. For example, DeepMind’s AlphaFold can predict protein structures, and large language models like ChatGPT can write essays, but these advances come at a cost. Training GPT-3 reportedly consumed 1,287 megawatt-hours of electricity (about the annual usage of 120 homes). Even running such models on demand uses substantial power: a single ChatGPT query has about 5× the energy footprint of a simple Google search. In short, today’s AI is powerful but power-hungry.

Architectural innovations have played a crucial role in advancing the performance of state-of-the-art AI models while keeping costs manageable. Different architectures offer unique advantages and are optimized for specific types of tasks. For instance, the Multi-Layer Perceptron (MLP), a network of neural layers connected by millions of weights, serves as the backbone for many AI systems, particularly in function approximation and tabular data processing. Convolutional Neural Networks (CNNs) excel at handling spatial data, making them ideal for applications like image classification and video analysis.

Kolmogorov-Arnold Networks (KANs) offer a fresh perspective. The KAN architecture replaces each weight in a traditional MLP with a tunable single-input function along that connection. In other words, every “weight” becomes a smooth mathematical curve that the network learns, rather than a single number.

This work trains KANs on image-classification tasks and designs a custom hardware accelerator to run them, aiming to determine whether KANs can compete with traditional neural networks by reducing parameter count and energy cost. The accelerator exploits a distinctive KAN property: for any given input, only small segments of each curve are relevant, so unnecessary computations can be bypassed, saving processing time and energy. Further optimizations, such as using look-up tables, led to a compact design that can in principle scale in parameter count without increasing computation cost.

From identifying optimal architectures for KANs in image classification to applying algorithmic optimizations and designing custom hardware capable of running on actual silicon, this work presents one of the first comprehensive investigations into KANs and their efficient implementation. The insights and innovations shared here aspire to drive further advances at the intersection of algorithm and hardware design, ultimately supporting smarter, faster, and more energy-efficient systems.
author: Mammadzada, Fuad
organization: Department of Electrical and Information Technology
course: EITM02 20251
year: 2025
type: H2 - Master's Degree (Two Years)
keywords: machine learning, computer architecture, systemverilog, artificial intelligence, kolmogorov-arnold theorem, kolmogorov-arnold network, low-power, hardware accelerators
report number: LU/LTH-EIT 2025-1079
language: English
id: 9203149
date added to LUP: 2025-06-24 09:30:56
date last changed: 2025-06-24 09:30:56
@misc{9203149,
  author       = {{Mammadzada, Fuad}},
  language     = {{eng}},
  note         = {{Student Paper}},
  title        = {{Design of a Kolmogorov-Arnold Network Hardware Accelerator}},
  year         = {{2025}},
}