The Scaling Era
GPT-3, scaling laws, and the emergence of capabilities.
GPT-3
OpenAI’s 175-billion-parameter language model demonstrated that massive scale unlocks in-context learning, allowing a single model to perform diverse tasks from just a few examples in the prompt.
Kaplan Scaling Laws
Kaplan et al. discovered that language model loss follows smooth power-law relationships with model size, dataset size, and compute, providing a quantitative roadmap for building ever-larger models.
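As a rough illustration of what a smooth power law implies, the sketch below evaluates a loss curve of the form L(N) = (N_c / N)^alpha over several model sizes. The function names are hypothetical, and the constants ALPHA and N_C are assumptions: they are of the order usually quoted for the parameter-count law, but treat them as placeholders rather than fitted values.

```python
# Illustrative power-law loss curve: L(N) = (N_c / N) ** alpha.
# ALPHA and N_C are placeholder constants assumed for this sketch;
# they are not taken from the text above.
ALPHA = 0.076
N_C = 8.8e13


def power_law_loss(n_params: float, n_c: float = N_C, alpha: float = ALPHA) -> float:
    """Loss predicted by a pure power law in parameter count."""
    return (n_c / n_params) ** alpha


def loss_ratio_per_doubling(alpha: float = ALPHA) -> float:
    """Factor by which the predicted loss shrinks when model size doubles."""
    return 2.0 ** (-alpha)


if __name__ == "__main__":
    for n in (1e8, 1e9, 1e10, 1e11):
        print(f"N = {n:.0e}  predicted loss = {power_law_loss(n):.3f}")
    print(f"loss multiplier per doubling: {loss_ratio_per_doubling():.4f}")
```

The point of the sketch is that under a power law every doubling of model size multiplies the loss by the same constant factor, which is what makes extrapolation to larger models possible.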
Chinchilla and Compute-Optimal Training
DeepMind’s Chinchilla paper overturned the prevailing wisdom on model scaling, showing that a 70B model trained on 1.4 trillion tokens could outperform models roughly 2.5 to 7.5 times its size, because those larger models had been trained on too little data for their parameter count.
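The arithmetic behind those figures can be sketched in a few lines. The snippet below computes the tokens-per-parameter ratio for a 70B model trained on 1.4 trillion tokens and estimates training compute with the common approximation C ≈ 6 · N · D. That approximation and the function names are assumptions introduced here for illustration, not something stated above.

```python
# Back-of-the-envelope numbers for a 70B-parameter model trained on
# 1.4 trillion tokens (the two figures quoted in the text).
CHINCHILLA_PARAMS = 70e9
CHINCHILLA_TOKENS = 1.4e12


def tokens_per_parameter(n_params: float, n_tokens: float) -> float:
    """Training tokens seen per model parameter."""
    return n_tokens / n_params


def approx_training_flops(n_params: float, n_tokens: float) -> float:
    """Rough training compute using the assumed rule C ~= 6 * N * D."""
    return 6.0 * n_params * n_tokens


if __name__ == "__main__":
    ratio = tokens_per_parameter(CHINCHILLA_PARAMS, CHINCHILLA_TOKENS)
    flops = approx_training_flops(CHINCHILLA_PARAMS, CHINCHILLA_TOKENS)
    print(f"tokens per parameter: {ratio:.1f}")
    print(f"approximate training FLOPs: {flops:.2e}")
```

Under these assumptions the ratio works out to about 20 tokens per parameter, which is the origin of the widely repeated rule of thumb for compute-optimal training.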
PaLM
Google’s 540-billion-parameter Pathways Language Model demonstrated that a single dense Transformer, trained across 6,144 TPU v4 chips, could achieve breakthrough performance on reasoning, code, and multilingual tasks simultaneously.
Codex and Code Generation
OpenAI’s Codex, a GPT-3 descendant fine-tuned on code collected from 54 million public GitHub repositories, showed that language models could write functional code and, through GitHub Copilot, launched the AI-assisted programming revolution.
Emergent Abilities of Large Language Models
Certain capabilities — like few-shot arithmetic, chain-of-thought reasoning, and word unscrambling — appear to emerge unpredictably at specific model scales, sparking a fierce debate about whether these phase transitions are real or artifacts of how we measure.
LaMDA and Conversational AI
Google’s 137-billion-parameter dialogue model, pre-trained on 1.56 trillion words of public dialogue data and web text and fine-tuned for safety, factual grounding, and conversational quality, became unexpectedly famous when a Google engineer claimed it was sentient.
The Scaling Hypothesis Debate
The contested idea that intelligence is an emergent property of sufficient scale — that making models bigger and training them on more data will eventually produce general intelligence — became the defining intellectual debate of the LLM era.