142 comments
GistNoesis · 20 days ago
I quickly skimmed the paper, got inspired to simplify it, and created a PyTorch layer:

https://github.com/GistNoesis/FourierKAN/

The core is really just a few lines.

In the paper they use spline interpolation to represent the 1D functions that they sum. Their code seemed aimed at smaller sizes. Instead I chose a different representation: Fourier coefficients, used to interpolate the functions of the individual coordinates.

It should give an idea of the representation power of Kolmogorov-Arnold networks. It should probably converge more easily than their spline version, although the spline version needs fewer operations.
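To give a feel for the idea, here is a rough sketch of such a layer (my own simplification, not the actual code in the repo above; the class name, sizes and initialisation are made up):

    import torch
    import torch.nn as nn

    class NaiveFourierKANLayer(nn.Module):
        def __init__(self, in_dim, out_dim, num_frequencies=8):
            super().__init__()
            self.num_frequencies = num_frequencies
            # One cos and one sin coefficient per (output, input, frequency) triple,
            # i.e. every edge carries its own learned 1d function.
            self.coeffs = nn.Parameter(
                torch.randn(2, out_dim, in_dim, num_frequencies)
                / (in_dim * num_frequencies) ** 0.5)
            self.bias = nn.Parameter(torch.zeros(out_dim))

        def forward(self, x):  # x: (batch, in_dim)
            k = torch.arange(1, self.num_frequencies + 1, device=x.device)
            angles = x.unsqueeze(-1) * k                     # (batch, in_dim, freq)
            cos, sin = torch.cos(angles), torch.sin(angles)
            # Sum the per-coordinate 1d functions for every output unit.
            y = (torch.einsum("bif,oif->bo", cos, self.coeffs[0])
                 + torch.einsum("bif,oif->bo", sin, self.coeffs[1]))
            return y + self.bias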

Of course, if my code doesn't work, it doesn't mean theirs doesn't.

Feel free to experiment and publish a paper if you want.


krasin · 20 days ago
I've spent some time playing with their Jupyter notebooks. The most useful (to me, anyway) is their Example_3_classfication.ipynb ([1]).

It works as advertised with the parameters selected by the authors, but if we modify the network shape in the second half of the tutorial (Classification formulation) from (2, 2) to (2, 2, 2), it fails to generalize. The training loss gets down to 1e-9, while test loss stays around 3e-1. Going to larger network sizes does not help either.

I would really like to see a bigger example with many more parameters and more data complexity, and whether it could be trained at all. MNIST would be a good start.

Update: I increased the training dataset size 100x, and that helps with the overfitting, but now I can't get training loss below 1e-2. Still iterating on it; GPU acceleration would really help - right now, my progress is limited by the speed of my CPU.

1. https://github.com/KindXiaoming/pykan/blob/master/tutorials/...
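Concretely, the shape change above is just the following in the tutorial's API (a sketch from memory; constructor and training arguments may differ between pykan versions):

    from kan import KAN

    model = KAN(width=[2, 2], grid=3, k=3)       # the tutorial's shape: generalizes
    # model = KAN(width=[2, 2, 2], grid=3, k=3)  # one more layer: train loss ~1e-9, test loss ~3e-1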


esafak · 19 days ago
There is a Kolmogorov-Arnold-inspired class of models in classical statistics called GAMs (https://en.wikipedia.org/wiki/Generalized_additive_model), developed by Hastie and Tibshirani as an extension of GLMs (https://en.wikipedia.org/wiki/Generalized_linear_model).

GLMs in turn generalize logistic, linear, and other popular regression models.

Neural GAMs with learned basis functions have already been proposed, so I'm a bit surprised that the prior art is not mentioned in this new paper. Previous applications focused more on interpretability.
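For reference, a GAM models the (link-transformed) expected response as a sum of learned univariate functions, which is the same additive structure a single KAN layer learns per output:

    g(E[y]) = b0 + f_1(x_1) + f_2(x_2) + ... + f_p(x_p)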


montebicyclelo · 20 days ago
The success we're seeing with neural networks is tightly coupled with the ability to scale: the algorithm itself works at scale (more layers), and it also scales well with hardware (neural nets mostly consist of matrix multiplications, and GPUs have specialised matrix multiplication acceleration). AlexNet, one of the most influential neural network papers, made its mark precisely because it showed that NNs could be put on the GPU, scaled, and accelerated to great effect.

It's not clear from the paper how well this algorithm will scale, both in terms of the algorithm itself (does it still train well with more layers?) and its ability to make use of hardware acceleration (e.g. it's not clear to me that the structure, with its per-weight activation functions, can exploit fast matmul acceleration).
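To make the matmul point concrete, here is a rough illustration of the two compute patterns (my own sketch, not from the paper; a fixed polynomial basis stands in for the splines):

    import torch

    batch, d_in, d_out, degree = 32, 256, 256, 4
    x = torch.randn(batch, d_in)

    # MLP layer: one dense matmul plus a cheap elementwise nonlinearity (maps straight to GEMM).
    W, b = torch.randn(d_out, d_in), torch.zeros(d_out)
    y_mlp = torch.relu(x @ W.T + b)

    # KAN-style layer: a separate learned 1d function per (output, input) edge.
    # With a fixed global basis this can still be folded into a (d_in*degree)-wide matmul,
    # but the paper's B-splines use locally supported, per-edge grids, which map to GEMM less cleanly.
    coeffs = torch.randn(d_out, d_in, degree)
    basis = torch.stack([x ** k for k in range(degree)], dim=-1)   # (batch, d_in, degree)
    y_kan = torch.einsum("bid,oid->bo", basis, coeffs)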

It's an interesting idea that seems to work well and have nice properties at a smaller scale, but whether it's a good architecture for ImageNet, LLMs, etc. is not clear at this stage.


cs702 · 20 days ago
It's so refreshing to come across new AI research different from the usual "we modified a transformer in this and that way and got slightly better results on this and that benchmark." All those new papers proposing incremental improvements are important, but... everyone is getting a bit tired of them. Also, anecdotal evidence and recent work suggest we're starting to run into fundamental limits inherent to transformers, so we may well need new alternatives.[a]

The best thing about this new work is that it's not an either/or proposition. The proposed "learnable spline interpolations as activation functions" can be used in conventional DNNs, to improve their expressivity. Now we just have to test the stuff to see if it really works better.
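As a toy illustration of that point, a learnable per-channel activation can be dropped into an ordinary MLP (my own sketch; shifted ReLUs stand in for the paper's splines):

    import torch
    import torch.nn as nn

    class LearnableActivation(nn.Module):
        def __init__(self, num_channels, num_knots=8, x_min=-2.0, x_max=2.0):
            super().__init__()
            self.register_buffer("knots", torch.linspace(x_min, x_max, num_knots))
            self.weights = nn.Parameter(torch.zeros(num_channels, num_knots))
            self.skip = nn.Parameter(torch.ones(num_channels))  # identity path

        def forward(self, x):  # x: (batch, num_channels)
            basis = torch.relu(x.unsqueeze(-1) - self.knots)    # (batch, channels, knots)
            return self.skip * x + (basis * self.weights).sum(-1)

    mlp = nn.Sequential(nn.Linear(64, 128), LearnableActivation(128), nn.Linear(128, 10))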

Very nice. Thank you for sharing this work here!

---

[a] https://news.ycombinator.com/item?id=40179232
