Sequence Models
RNNs, LSTMs, GRUs, and sequence-to-sequence models.
Attention Mechanism
Attention allows a decoder to dynamically focus on different parts of the encoder’s output at each generation step, replacing the fixed-size bottleneck vector with a weighted combination of all source representations.
Bidirectional RNNs
Bidirectional RNNs process a sequence in both forward and backward directions, producing representations that capture both past and future context at every time step.
Convolutional Models for Text
CNNs applied to NLP use 1D convolutions over word embeddings to detect local n-gram patterns, offering parallelizable computation and strong performance for text classification, though with a limited receptive field compared to recurrent models.
Gated Recurrent Units
GRUs simplify the LSTM gating mechanism by merging the cell state and hidden state into a single vector controlled by two gates, achieving comparable performance with fewer parameters.
Long Short-Term Memory
LSTMs introduce a gated cell state that acts as a controlled information highway, solving the vanishing gradient problem that cripples vanilla RNNs on long sequences.
Recurrent Neural Networks
RNNs process sequences one element at a time, maintaining a hidden state that accumulates information from previous time steps – the first neural architecture designed for sequential data like language.
Sequence-to-Sequence Models
The encoder-decoder architecture maps variable-length input sequences to variable-length output sequences by compressing the input into a fixed-size context vector, then generating the output one token at a time.