Happy new year everyone!
In this post, we will establish a mental model for how a neural network works by building one from scratch! In a future issue we will do napkin math on performance; establishing a first-principles understanding is plenty of ground to cover for today.
Neural nets increasingly dominate the field of machine learning / artificial intelligence: the most sophisticated models for computer vision (e.g. CLIP), natural language processing (e.g. GPT-3), translation (e.g. Google Translate), and more are based on neural nets. When these networks stack enough layers (the exact cutoff is somewhat arbitrary), we call it deep learning.
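To make "neural net" concrete before we build one properly, here's a minimal sketch in NumPy. The shapes, weights, and names (`forward`, `relu`, the tiny 3 → 4 → 1 layout) are all made up for illustration: the point is just that each layer is a matrix multiply followed by a simple nonlinearity, and "deep" means stacking more of them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up weights for a toy network: 3 inputs -> 4 hidden units -> 1 output.
W1 = rng.normal(size=(3, 4))
W2 = rng.normal(size=(4, 1))

def relu(x):
    # Nonlinearity: without it, stacked layers collapse into one linear map.
    return np.maximum(0, x)

def forward(x):
    hidden = relu(x @ W1)  # layer 1: linear map + nonlinearity
    return hidden @ W2     # layer 2: linear map to a single output

x = np.array([1.0, 2.0, 3.0])
print(forward(x))  # one (meaningless, for now) scalar prediction
```

Everything we build later is an elaboration of this pattern: push inputs through the layers, and, the part this sketch omits, adjust the weights based on how wrong the outputs are.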
A visceral example of deep learning's unreasonable effectiveness comes from an interview with Jeff Dean, who leads AI at Google. He explains how ~500 lines of TensorFlow outperformed the previous ~500,000 lines of code behind Google Translate's extremely complicated model. Blew my mind.