Neural Network Foundations
Perceptrons, backpropagation, and deep learning basics.
Activation Functions
Nonlinear transforms between layers – ReLU, sigmoid, tanh, and why the choice matters for gradient flow and expressivity.
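To make the gradient-flow point concrete, here is a minimal sketch in plain Python of the three activations named above and their derivatives. The function names are illustrative choices rather than any library's API, and taking the ReLU derivative as 0 at exactly x = 0 is a common convention assumed here.

```python
import math


def relu(x: float) -> float:
    return max(0.0, x)


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def relu_grad(x: float) -> float:
    # Assumed convention: derivative is 0 at x == 0.
    return 1.0 if x > 0 else 0.0


def sigmoid_grad(x: float) -> float:
    s = sigmoid(x)
    return s * (1.0 - s)


def tanh_grad(x: float) -> float:
    return 1.0 - math.tanh(x) ** 2


if __name__ == "__main__":
    # Sigmoid and tanh gradients shrink toward 0 for large |x| (saturation);
    # the ReLU gradient stays at 1 for any positive input.
    for x in (-10.0, 0.0, 10.0):
        print(x, relu_grad(x), sigmoid_grad(x), tanh_grad(x))
```

The sigmoid derivative never exceeds 0.25, so a long chain of sigmoid layers multiplies many small factors together. This is one common explanation for why ReLU-style activations tend to train more easily in deep stacks.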
Backpropagation
Computing gradients layer by layer via the chain rule – the algorithm that makes deep learning computationally feasible.
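As a rough illustration of the chain rule applied layer by layer, the sketch below differentiates a hypothetical two-weight network, y = w2 * tanh(w1 * x), under a squared-error loss, and checks one gradient against a finite difference. The network shape, names, and loss are assumptions made for the example.

```python
import math


def forward(x: float, w1: float, w2: float):
    h = math.tanh(w1 * x)  # hidden activation
    y = w2 * h             # output
    return h, y


def loss_and_grads(x: float, target: float, w1: float, w2: float):
    h, y = forward(x, w1, w2)
    loss = 0.5 * (y - target) ** 2
    # Backward pass: walk from the loss back toward the inputs.
    dy = y - target            # dL/dy
    dw2 = dy * h               # dL/dw2
    dh = dy * w2               # dL/dh
    dz = dh * (1.0 - h * h)    # dL/dz where z = w1 * x, using tanh'(z) = 1 - tanh(z)^2
    dw1 = dz * x               # dL/dw1
    return loss, dw1, dw2


def numeric_grad_w1(x: float, target: float, w1: float, w2: float, eps: float = 1e-6) -> float:
    # Central finite difference, used only as a sanity check.
    lp, _, _ = loss_and_grads(x, target, w1 + eps, w2)
    lm, _, _ = loss_and_grads(x, target, w1 - eps, w2)
    return (lp - lm) / (2.0 * eps)
```

Each backward line reuses the quantity computed just before it. That reuse is what keeps the cost of all gradients close to the cost of one forward pass.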
Batch Normalization
Normalizing layer inputs within each mini-batch – stabilizing training, often enabling higher learning rates, and acting as a mild regularizer.
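A minimal sketch of the training-time computation for a single feature, assuming a batch given as a plain list of floats. The learnable scale and shift are called gamma and beta here by common convention; the running statistics used at inference time are deliberately left out.

```python
import math
from typing import List


def batch_norm(batch: List[float], gamma: float = 1.0, beta: float = 0.0,
               eps: float = 1e-5) -> List[float]:
    n = len(batch)
    mean = sum(batch) / n
    # Biased (divide-by-n) variance over the mini-batch.
    var = sum((x - mean) ** 2 for x in batch) / n
    inv_std = 1.0 / math.sqrt(var + eps)
    # Normalize, then apply the learnable scale and shift.
    return [gamma * (x - mean) * inv_std + beta for x in batch]
```

With gamma = 1 and beta = 0 the output has mean close to 0 and variance close to 1, whatever the scale of the input batch.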
Dropout and Regularization
Randomly zeroing activations during training – an implicit ensemble of subnetworks that discourages co-adaptation of neurons.
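One common variant is "inverted" dropout, sketched below under the assumption that activations arrive as a list of floats. Surviving values are rescaled by 1 / (1 - p_drop) during training so that no rescaling is needed at inference time. The function name and signature are illustrative only.

```python
import random
from typing import List


def dropout(activations: List[float], p_drop: float, rng: random.Random,
            training: bool = True) -> List[float]:
    if not 0.0 <= p_drop < 1.0:
        raise ValueError("p_drop must be in [0, 1)")
    if not training or p_drop == 0.0:
        # Inference: pass activations through unchanged.
        return list(activations)
    keep = 1.0 - p_drop
    # Keep each unit with probability `keep`; rescale survivors so the
    # expected value of each activation is unchanged.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]
```

Passing an explicit random.Random instance keeps the masking reproducible, which helps when debugging.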
Optimizers
SGD, momentum, RMSProp, Adam, and AdamW – from plain stochastic gradient descent to momentum and adaptive per-parameter learning rates, which often navigate loss landscapes faster than vanilla gradient descent.
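The update rules can be compared on a toy problem. The sketch below minimizes the assumed objective f(w) = (w - 3)^2 with plain gradient descent, momentum, and Adam. The hyperparameters are typical illustrative values, and the optional weight_decay term follows the decoupled (AdamW-style) form.

```python
import math


def grad(w: float) -> float:
    # Gradient of the toy objective f(w) = (w - 3)^2.
    return 2.0 * (w - 3.0)


def run_sgd(w: float, lr: float = 0.1, steps: int = 1000) -> float:
    for _ in range(steps):
        w -= lr * grad(w)
    return w


def run_momentum(w: float, lr: float = 0.1, mu: float = 0.9, steps: int = 1000) -> float:
    v = 0.0
    for _ in range(steps):
        v = mu * v + grad(w)  # accumulate a decaying sum of past gradients
        w -= lr * v
    return w


def run_adam(w: float, lr: float = 0.1, b1: float = 0.9, b2: float = 0.999,
             eps: float = 1e-8, weight_decay: float = 0.0, steps: int = 1000) -> float:
    m = 0.0
    v = 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = b1 * m + (1.0 - b1) * g        # first-moment estimate
        v = b2 * v + (1.0 - b2) * g * g    # second-moment estimate
        m_hat = m / (1.0 - b1 ** t)        # bias correction
        v_hat = v / (1.0 - b2 ** t)
        # Decoupled weight decay (AdamW-style) when weight_decay > 0.
        w -= lr * (m_hat / (math.sqrt(v_hat) + eps) + weight_decay * w)
    return w
```

On a one-dimensional quadratic all three reach the minimum at w = 3. The differences between them matter more on ill-conditioned or noisy objectives, which this toy problem does not capture.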
Perceptrons and Multilayer Networks
From single linear classifiers to universal function approximators – stacking layers with nonlinear activations creates representational power.
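For the single-classifier end of that range, here is a small sketch of the classic perceptron learning rule on a two-input AND problem. The datasets, epoch count, and function names are assumptions chosen for the example.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]


def predict(w: List[float], b: float, x: Sequence[float]) -> int:
    # Linear threshold unit: fire when the weighted sum is positive.
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


def train_perceptron(data: List[Sample], epochs: int = 20, lr: float = 1.0):
    w = [0.0] * len(data[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            err = y - predict(w, b, x)  # -1, 0, or +1
            # Nudge the weights toward the correct side of the boundary.
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b


AND_DATA: List[Sample] = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR_DATA: List[Sample] = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
```

Training on AND_DATA yields a separating line. Training on XOR_DATA always leaves at least one point misclassified, because XOR is not linearly separable; that limitation is the usual motivation for adding a hidden layer.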
Universal Approximation Theorem
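A single hidden layer with enough neurons and a suitable nonlinear activation can approximate any continuous function on a compact domain to arbitrary accuracy – but the theorem does not say how to find those weights, which is the hard part.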
Weight Initialization
Xavier, He, and orthogonal initialization – breaking symmetry and controlling signal magnitude at the start of training.
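The scale rules behind two of these schemes are short enough to sketch directly. The code below assumes the normal-distribution variants: Xavier (Glorot) uses a standard deviation of sqrt(2 / (fan_in + fan_out)) and He uses sqrt(2 / fan_in). Orthogonal initialization is omitted, and the helper names are hypothetical.

```python
import math
import random
from typing import List


def xavier_std(fan_in: int, fan_out: int) -> float:
    # Glorot/Xavier normal: balances forward and backward signal variance.
    return math.sqrt(2.0 / (fan_in + fan_out))


def he_std(fan_in: int) -> float:
    # He normal: compensates for ReLU zeroing roughly half the inputs.
    return math.sqrt(2.0 / fan_in)


def init_matrix(fan_in: int, fan_out: int, std: float,
                rng: random.Random) -> List[List[float]]:
    # Independent random draws give each unit different weights,
    # which is what breaks the symmetry between units.
    return [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]
```

For example, init_matrix(512, 256, he_std(512), random.Random(0)) returns 256 rows of 512 weights drawn with standard deviation 0.0625.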