Ask HN: How to learn AI from first principles?

145 points · HardikVala · 22 days ago

A variant of this question seems to get asked every six months, but so far I haven't seen it tackled directly: If I want to learn the concepts and fundamentals of AI from first principles, what educational resources should I use?

I'm not interested in hands-on guides (e.g., how to train a DNN classifier in TensorFlow) or LLM-centric resources.

So far, I've put together the following curriculum:

1. Artificial Intelligence: A Modern Approach (https://aima.cs.berkeley.edu/) - Great for learning the breadth of foundational concepts, e.g. local search algorithms, building up to modern AI.

2. Probabilistic Machine Learning: An Introduction (https://probml.github.io/pml-book/book1.html) - Going more in-depth into ML.

3. Dive into Deep Learning (https://d2l.ai/) - Going deep into DL, including contemporary ideas like transformers and diffusion models.

4. Neural Networks and Deep Learning (http://neuralnetworksanddeeplearning.com/) could also be a great resource, but the content probably overlaps significantly with 3.

Would anybody add/update/remove anything? (Don't have to limit recommendations to textbooks. Also open to courses, papers, etc.)

Sorry for the semi-redundant post.


56 comments
noduerme · 21 days ago
The following is not a take that will get you a job or teach you precisely how LLMs work, because you can look that up yourself. However, it may inspire you and you may create something that has a better-than-lottery-ticket chance of being an improvement over the AI status quo:

Without reading about how it's done now, just think about how you think a neural network should function. It ostensibly has input, output, and something in the middle. Maybe its input is a 64x64-pixel handwritten character, and its output is a Unicode number. Between the input pixels (a 64x64 array) and the output is a bunch of neurons. Layers of neurons that talk to each other and learn or un-learn (are rewarded or punished).

Build that. Build a cube where one side is a pixel grid and the other side delivers a number. Decide how the neurons influence each other and how they train their weights to deliver the result at the other end, however you think it should go. Just raw-code it with arrays in whatever dimensions you want and make it work; you can do it in JavaScript or BASIC. Link the neurons however you want. Don't worry about performance, because you can assume that whatever marginally works can be tested at massive scale and show "impressive" results.
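
For concreteness, here is one way that "cube" might look in code. It's only a sketch, in Python rather than the JavaScript or BASIC mentioned above, with an 8x8 grid instead of 64x64; the layer sizes, learning rate, and single made-up training example are arbitrary choices, just to show the reward/punish loop end to end.

    # A toy "cube": pixels go in one side, a class score comes out the other.
    # Plain numpy, one hidden layer, trained by gradient descent on squared error.
    import numpy as np

    rng = np.random.default_rng(0)
    N_IN, N_HID, N_OUT = 8 * 8, 16, 10   # 8x8 "pixel grid" -> 10 possible characters

    W1 = rng.normal(0, 0.1, (N_IN, N_HID))   # input -> hidden weights
    W2 = rng.normal(0, 0.1, (N_HID, N_OUT))  # hidden -> output weights

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def forward(pixels):
        h = sigmoid(pixels @ W1)   # hidden-layer activations
        y = sigmoid(h @ W2)        # output scores, one per class
        return h, y

    def train_step(pixels, target, lr=0.5):
        """One reward/punish step: nudge every weight downhill on squared error."""
        global W1, W2
        h, y = forward(pixels)
        err = y - target                       # how wrong each output neuron is
        d_out = err * y * (1 - y)              # chain rule through the output sigmoid
        d_hid = (d_out @ W2.T) * h * (1 - h)   # ...and through the hidden sigmoid
        W2 -= lr * np.outer(h, d_out)
        W1 -= lr * np.outer(pixels, d_hid)
        return float((err ** 2).sum())

    # Fake "handwritten character": random pixels labeled as class 3.
    pixels = rng.random(N_IN)
    target = np.zeros(N_OUT)
    target[3] = 1.0
    for step in range(200):
        loss = train_step(pixels, target)
    print("final loss:", loss)
    print("predicted class:", int(np.argmax(forward(pixels)[1])))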

InkCanon · 22 days ago
The question depends on what you mean by first principles. Usage of the phrase "first principles" has sprawled into many different things since (I think) Musk first mentioned it as a way to learn. The original, philosophical meaning of a first principle is a fundamental truth that can be used to derive others. Much of the philosophizing of thinkers like Aristotle or Descartes was aimed at uncovering these truths (e.g., "I think, therefore I am"). In physics and other sciences, it means calculating from established laws rather than from approximations or assumptions. Then it got borrowed into certain circles of the tech crowd with the vague meaning of thinking about what's important or true and ignoring the rest, and from there it trickled down into the learning/self-help world as a hack of some sort.

If we take the original meaning of first principles, there aren't a great deal of absolute truths in machine learning. It is a very empirical, approximate, engineering-oriented endeavor. Most of the research involves thinking of a new approach, building it, and trying it on new datasets.

The other big question is why you want to learn it. If you want to learn ML in itself, then everything you mentioned, including the search algorithms (which were considered core to the field a long time ago), is part of that. But if you want to learn ML to contribute to modern developments like LLMs, then search algorithms are virtually useless. And if you aren't going to be engineering any ML or ML products, what you want is insight into its future and the business of it, in which case learning something like the transformer architecture is going to be far less helpful than, say, reading about the economics of compute clusters.

Given the empirical/engineering quality of current ML, I'd say building it from scratch is really good for getting at the handful of possible first principles: the fundamental functions involved, data cleaning, training, and so on.
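
To make "the fundamental functions involved" concrete, here is a sketch of the small set that nearly everything else is built from; the names and example values are just illustrative.

    # The handful of functions most modern nets reduce to, in plain numpy.
    import numpy as np

    def relu(x):
        # Nonlinearity: without it, stacked layers collapse into one linear map.
        return np.maximum(0.0, x)

    def softmax(logits):
        # Turns raw scores into a probability distribution (numerically stable form).
        e = np.exp(logits - logits.max())
        return e / e.sum()

    def cross_entropy(probs, true_class):
        # Training signal: small when the model puts its mass on the right class.
        return -np.log(probs[true_class])

    p = softmax(np.array([2.0, -1.0, 0.5]))
    print(p, cross_entropy(p, 0))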

CamperBob2 · 21 days ago
Watch Karpathy's 'Zero to Hero' videos on YouTube.

If you want a historical perspective, which is very worthwhile, start by reading about the mid-century work of McCulloch and Pitts, and Minsky, Papert, and their colleagues at the MIT AI Lab after that.

There will be a dry spell after Minsky and Papert because of their conclusion that the OG neural-network topology that everyone was familiar with, the so-called "perceptron", was a dead end. That conclusion was premature to say the least, but in any event the hardware and training techniques weren't available to support any serious progress.
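
For reference, that perceptron is small enough to write down in full. Here is a sketch of the classic mistake-driven update rule (sizes and epoch count are arbitrary): it learns OR easily, while Minsky and Papert's point was that no choice of weights lets it learn XOR, because XOR's classes aren't linearly separable.

    # Rosenblatt-style perceptron: a single threshold unit with a mistake-driven update.
    import numpy as np

    def train_perceptron(X, y, epochs=20):
        w, b = np.zeros(X.shape[1]), 0.0
        for _ in range(epochs):
            for xi, yi in zip(X, y):
                pred = 1 if xi @ w + b > 0 else 0
                w += (yi - pred) * xi   # nudge the boundary toward each mistake
                b += (yi - pred)
        return w, b

    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    print(train_perceptron(X, np.array([0, 1, 1, 1])))  # OR: separable, learnable
    # XOR (y = [0, 1, 1, 0]) never converges: no line separates its classes.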

Adding hidden layers and nonlinear activation functions to the perceptron network seemed promising, in that they worked around some of Minsky's technical objections. The multi-layer perceptron was now a "universal approximator," capable of modeling essentially any linear or nonlinear function. In retrospect that should have been considered a bigger deal than it was, but the MLP was still a pain to train, and it didn't seem very useful at the scales achievable in hardware at the time. Anything a neural net could do, specialized code could usually do better and cheaper.
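
To see why the hidden layer matters, you can wire up XOR by hand with one hidden layer and a threshold nonlinearity; this is exactly the function the single-layer perceptron can't represent. The weights below are chosen manually for illustration, not learned.

    # A hand-wired two-layer perceptron computing XOR = OR AND NOT(AND).
    import numpy as np

    def step(x):
        return (x > 0).astype(float)   # hard threshold nonlinearity

    W1 = np.array([[1.0, 1.0],
                   [1.0, 1.0]])
    b1 = np.array([-0.5, -1.5])        # hidden units: [x1 OR x2, x1 AND x2]
    W2 = np.array([1.0, -2.0])         # output: OR minus heavily weighted AND
    b2 = -0.5

    for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        h = step(np.array(x) @ W1 + b1)
        print(x, int(step(h @ W2 + b2)))   # prints the XOR truth table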

Then, in 2012, AlexNet dusted off some of the older ideas and used them to win the ImageNet image-recognition competition, not by a small margin but by blowing everybody else into the weeds. That brought the multi-layer perceptron back into vogue, and almost everything that has happened since can be traced back to that work.

The Karpathy videos are the best intro to the MLP concept I've run across. Understanding the MLP is the key prereq if you want to understand current-gen AI from first principles.

grepLeigh · 20 days ago
As a learning exercise, I enjoyed Neural Networks From Scratch: https://nnfs.io/

There's also a world of statistics and machine learning outside of deep learning. I think the best way to get started on that end is an undergrad survey course like CS189: https://people.eecs.berkeley.edu/~jrs/189/
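
As one small taste of that non-deep-learning world: ordinary least squares has a closed-form solution you can derive from first principles (set the gradient of the squared error to zero), which is roughly where a survey course like CS189 begins. The data and coefficients below are made up for illustration.

    # Ordinary least squares via the normal equations: (X^T X) w = X^T y.
    import numpy as np

    rng = np.random.default_rng(1)
    X = np.column_stack([np.ones(50), rng.uniform(0, 10, 50)])  # bias column + one feature
    y = X @ np.array([2.0, 0.7]) + rng.normal(0, 0.3, 50)       # noisy line: 2.0 + 0.7x

    w_hat = np.linalg.solve(X.T @ X, X.T @ y)
    print("recovered intercept and slope:", w_hat)  # close to [2.0, 0.7]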

jmholla · 20 days ago
It isn't first principles, but I would recommend 3blue1brown's ongoing series on neural networks [0]. There's a benefit to seeing a high-level overview: it helps you understand the purpose of the pieces as you're learning them, which can help with motivation. And watching overviews like this after the fact can help bridge connections that theory alone may not elucidate.

[0]: https://www.3blue1brown.com/topics/neural-networks
