Convolutional Neural Networks
CNN architectures, convolution operations, and design principles.
AlexNet
AlexNet won the 2012 ImageNet Large Scale Visual Recognition Challenge with 15.3% top-5 error, against 26.2% for the runner-up, demonstrating that deep convolutional neural networks trained on GPUs could dramatically outperform traditional computer vision methods.
Convolution in Neural Networks
A convolution layer slides small learned filters across an input, producing feature maps that detect local patterns through weight sharing and local connectivity.
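A minimal sketch of the sliding-filter idea, assuming no padding, stride 1, and a single channel. The function name conv2d_valid, the toy image, and the hand-picked (not learned) kernel are illustrative assumptions. As in most deep learning libraries, the kernel is not flipped, so this is strictly a cross-correlation.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # The same kernel weights are reused at every position (weight sharing),
            # and each output value only sees a kh x kw patch (local connectivity).
            acc = 0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out


# Hypothetical 4x4 input with a vertical edge between columns 1 and 2.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hand-picked 2x2 filter that responds to left-to-right intensity increases.
kernel = [
    [-1, 1],
    [-1, 1],
]
feature_map = conv2d_valid(image, kernel)
print(feature_map)
```

The feature map is nonzero only in the column where the filter straddles the edge, which is the "local pattern detection" described above.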
DenseNet
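DenseNet connects each layer to every subsequent layer within a dense block, so that each layer receives the concatenated feature maps of all preceding layers, maximizing feature reuse and achieving strong accuracy with substantially fewer parameters than a ResNet of comparable accuracy.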
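Because each layer sees the concatenation of everything before it, the input width of successive layers grows linearly. The sketch below only tracks channel counts; the helper name and the example values (64 input channels, growth rate 32, 6 layers) are illustrative assumptions rather than a prescribed configuration.

```python
def dense_block_input_channels(k0, growth_rate, num_layers):
    """Input channel count seen by each layer in a dense block.

    Layer i receives the block input (k0 channels) concatenated with the
    outputs of the i layers before it (growth_rate channels each).
    """
    return [k0 + growth_rate * i for i in range(num_layers)]


# Illustrative values, chosen for the example.
k0, growth_rate, num_layers = 64, 32, 6
inputs_per_layer = dense_block_input_channels(k0, growth_rate, num_layers)
block_output_channels = k0 + growth_rate * num_layers

print(inputs_per_layer)
print(block_output_channels)
```

Each layer adds only growth_rate new channels, which is one reason the parameter count can stay low even though connectivity is dense.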
Depthwise Separable Convolutions
Depthwise separable convolutions factorize a standard convolution into a per-channel spatial (depthwise) convolution followed by a 1×1 (pointwise) convolution that mixes channels, reducing computation by roughly 8–9x for 3×3 kernels with minimal accuracy loss.
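The 8–9x figure can be checked by counting multiply-accumulate operations (MACs). This is a sketch assuming stride 1, "same" padding, and no bias terms; the function names and layer sizes are illustrative assumptions.

```python
def standard_conv_macs(k, c_in, c_out, h, w):
    """MACs for a k x k standard convolution producing an h x w output."""
    return k * k * c_in * c_out * h * w


def depthwise_separable_macs(k, c_in, c_out, h, w):
    """MACs for a k x k depthwise convolution followed by a 1x1 pointwise convolution."""
    depthwise = k * k * c_in * h * w   # one k x k filter per input channel
    pointwise = c_in * c_out * h * w   # 1x1 convolution mixes channels
    return depthwise + pointwise


# Illustrative layer: 3x3 kernel, 128 -> 256 channels, 56x56 output.
std = standard_conv_macs(3, 128, 256, 56, 56)
sep = depthwise_separable_macs(3, 128, 256, 56, 56)
reduction = std / sep
print(f"{reduction:.2f}x fewer MACs")
```

The ratio of separable to standard cost simplifies to 1/c_out + 1/k², so for 3×3 kernels the saving approaches 9x as the number of output channels grows.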
EfficientNet
EfficientNet uses compound scaling to scale network depth, width, and input resolution together with a fixed set of coefficients, starting from a baseline found by neural architecture search (B0) and scaling up to B7, achieving state-of-the-art accuracy-efficiency tradeoffs at the time of publication.
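A sketch of compound scaling, assuming the coefficients reported for the B0 baseline (α = 1.2 for depth, β = 1.1 for width, γ = 1.15 for resolution) and a single compound coefficient φ. The function names are illustrative, and the released B1 to B7 variants may not match these idealized multipliers exactly.

```python
# Coefficients assumed from the EfficientNet-B0 grid search.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15


def compound_scale(phi):
    """Multipliers for depth, width, and resolution at compound coefficient phi."""
    return {
        "depth": ALPHA ** phi,
        "width": BETA ** phi,
        "resolution": GAMMA ** phi,
    }


def flops_growth(phi):
    """Approximate FLOPs multiplier: depth scales FLOPs linearly,
    width and resolution scale them quadratically."""
    return (ALPHA * BETA ** 2 * GAMMA ** 2) ** phi


for phi in range(4):
    scale = compound_scale(phi)
    print(phi, {k: round(v, 3) for k, v in scale.items()}, round(flops_growth(phi), 2))
```

The coefficients are chosen so that α·β²·γ² is close to 2, meaning each unit increase in φ roughly doubles the compute budget.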
Inception (GoogLeNet)
The Inception architecture uses parallel multi-scale convolution branches within each module and 1×1 convolutions for dimensionality reduction, achieving 6.7% top-5 error on ImageNet with only 6.8 million parameters.
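To see why 1×1 reduction matters, compare the weight count of a 5×5 branch applied directly with the same branch preceded by a 1×1 bottleneck. This is a sketch that ignores biases; the helper name is an assumption, and the channel sizes (192 in, 16 reduced, 32 out) are chosen to resemble an early Inception module.

```python
def conv_weights(kernel, c_in, c_out):
    """Weight count of a kernel x kernel convolution (biases ignored)."""
    return kernel * kernel * c_in * c_out


c_in, c_reduced, c_out = 192, 16, 32

# 5x5 convolution applied directly to all input channels.
direct = conv_weights(5, c_in, c_out)

# 1x1 convolution shrinks the channel count first, then the 5x5 runs on fewer channels.
reduced = conv_weights(1, c_in, c_reduced) + conv_weights(5, c_reduced, c_out)

print(direct, reduced, round(direct / reduced, 1))
```

With these sizes the bottleneck cuts the branch's weights by nearly a factor of ten, which is what makes the wide parallel branches affordable.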
MobileNet
MobileNet is a family of efficient CNN architectures built on depthwise separable convolutions, designed for mobile and embedded deployment with tunable width and resolution multipliers.
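The effect of the two multipliers can be sketched by counting multiply-accumulates (MACs) for one depthwise separable layer. This assumes the MobileNetV1-style cost model in which the width multiplier (called alpha here) thins both input and output channels and the resolution multiplier (rho) shrinks the feature map side; the function name and layer sizes are illustrative.

```python
def mobilenet_layer_macs(k, c_in, c_out, size, alpha=1.0, rho=1.0):
    """MACs for one depthwise separable layer under width (alpha) and resolution (rho) multipliers."""
    m = int(alpha * c_in)    # thinned input channels
    n = int(alpha * c_out)   # thinned output channels
    s = int(rho * size)      # reduced feature map side length
    depthwise = k * k * m * s * s
    pointwise = m * n * s * s
    return depthwise + pointwise


# Illustrative layer: 3x3 kernel, 512 -> 512 channels, 14x14 feature map.
base = mobilenet_layer_macs(3, 512, 512, 14)
half_width = mobilenet_layer_macs(3, 512, 512, 14, alpha=0.5)
half_res = mobilenet_layer_macs(3, 512, 512, 14, rho=0.5)

print(round(base / half_width, 2), base / half_res)
```

Cost falls roughly with alpha² (the pointwise term dominates) and exactly with rho² in this model, so both knobs trade accuracy for compute quadratically.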
Neural Architecture Search
Neural Architecture Search (NAS) automates the design of neural network architectures by searching over a defined space of possible configurations, optimizing for accuracy, latency, or other objectives.
Pooling Layers
Pooling layers reduce the spatial dimensions of feature maps by summarizing local regions (for example, with a maximum or an average), providing a degree of invariance to small translations and computational savings.
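A minimal sketch of max pooling on a single-channel feature map, assuming a square window with no padding. The function name and the toy values are illustrative assumptions.

```python
def max_pool2d(fmap, size=2, stride=2):
    """Summarize each size x size window of `fmap` by its maximum value."""
    out_h = (len(fmap) - size) // stride + 1
    out_w = (len(fmap[0]) - size) // stride + 1
    return [
        [
            max(
                fmap[i * stride + di][j * stride + dj]
                for di in range(size)
                for dj in range(size)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Hypothetical 4x4 feature map.
fmap = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 8],
]
pooled = max_pool2d(fmap)
print(pooled)
```

A 2×2 window with stride 2 halves each spatial dimension, so later layers process a quarter as many positions.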
Receptive Field
The receptive field of a neuron is the region of the input image that can influence its activation, growing with network depth through successive convolutions and pooling operations.
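Receptive field growth can be computed layer by layer with a standard recurrence: each layer widens the field by (kernel - 1) times the cumulative stride of the layers before it. This sketch assumes undilated square kernels and a simple sequential network; the function name and the example layer stack are illustrative assumptions.

```python
def receptive_field(layers):
    """Receptive field (in input pixels, along one axis) after a stack of layers.

    `layers` is a list of (kernel_size, stride) pairs for conv or pooling layers.
    """
    r = 1      # a single input pixel sees only itself
    jump = 1   # distance in input pixels between adjacent outputs so far
    for kernel, stride in layers:
        r += (kernel - 1) * jump
        jump *= stride
    return r


# Illustrative stack: two 3x3 convs, a 2x2 stride-2 pool, then another 3x3 conv.
stack = [(3, 1), (3, 1), (2, 2), (3, 1)]
print(receptive_field(stack))
```

Strided layers such as pooling increase the jump, so every layer after them enlarges the receptive field faster than it would otherwise.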
ResNet
ResNet introduced skip connections that enable identity mappings, allowing successful training of networks up to 152 layers deep and achieving 3.57% top-5 error on ImageNet.
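The core of a skip connection is the elementwise sum y = F(x) + x. The sketch below works on plain lists and omits the convolutions, normalization, and activation of a real residual block; residual_block and the two toy branches are hypothetical names introduced for illustration.

```python
def residual_block(x, residual_fn):
    """Return F(x) + x, where residual_fn computes the learned residual F."""
    fx = residual_fn(x)
    return [xi + fi for xi, fi in zip(x, fx)]


def zero_branch(x):
    """A residual branch whose weights have collapsed to zero."""
    return [0.0 for _ in x]


def small_branch(x):
    """A toy residual branch that applies a small correction."""
    return [0.1 * xi for xi in x]


x = [1.0, -2.0, 3.0]
identity_out = residual_block(x, zero_branch)
adjusted_out = residual_block(x, small_branch)
print(identity_out, adjusted_out)
```

When the residual branch outputs zero the block reduces to the identity, so adding layers need not make the network harder to fit than its shallower counterpart.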
VGGNet
VGGNet demonstrated that network depth with uniform 3×3 convolutions is a critical factor for representation quality, achieving 7.3% top-5 error on ImageNet with the VGG-16 and VGG-19 architectures.
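One reason small uniform kernels work well is that a stack of 3×3 convolutions covers the same receptive field as one larger kernel with fewer weights. This is a sketch assuming equal input and output channel counts and ignoring biases; the helper name and channel count are illustrative assumptions.

```python
def stacked_conv_weights(kernel, channels, num_layers=1):
    """Weights in `num_layers` kernel x kernel convs with `channels` in and out (biases ignored)."""
    return num_layers * kernel * kernel * channels * channels


C = 256  # illustrative channel count

two_3x3 = stacked_conv_weights(3, C, num_layers=2)    # covers a 5x5 receptive field
one_5x5 = stacked_conv_weights(5, C)
three_3x3 = stacked_conv_weights(3, C, num_layers=3)  # covers a 7x7 receptive field
one_7x7 = stacked_conv_weights(7, C)

print(two_3x3 / one_5x5, three_3x3 / one_7x7)
```

The stacked version also inserts a nonlinearity between each pair of layers, so it gains expressiveness while using fewer parameters.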