The Scaling Era

GPT-3, scaling laws, and the emergence of capabilities.

GPT-3

OpenAI’s 175-billion-parameter language model demonstrated that massive scale unlocks in-context learning, allowing a single model to perform diverse tasks from just a few examples in the prompt.
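
To make "a few examples in the prompt" concrete, here is a minimal sketch of how a few-shot prompt can be assembled as plain text. The function name `build_few_shot_prompt`, the ` => ` separator, and the translation pairs are illustrative assumptions, not a prescribed format, and no model is called.

```python
from typing import Sequence, Tuple


def build_few_shot_prompt(
    task: str,
    examples: Sequence[Tuple[str, str]],
    query: str,
    separator: str = " => ",  # hypothetical separator, chosen for illustration
) -> str:
    """Return a prompt: task description, solved examples, then the open query."""
    lines = [task]
    for source, target in examples:
        lines.append(f"{source}{separator}{target}")
    # The final line is left unfinished so the model supplies the answer.
    lines.append(f"{query}{separator}".rstrip())
    return "\n".join(lines)


prompt = build_few_shot_prompt(
    task="Translate English to French:",
    examples=[("sea otter", "loutre de mer"), ("peppermint", "menthe poivrée")],
    query="cheese",
)
print(prompt)
```

The model's weights are never updated here: the "learning" happens entirely through the examples placed in the context.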

Kaplan Scaling Laws

Kaplan et al. discovered that language model loss follows smooth power-law relationships with model size, dataset size, and compute, providing a quantitative roadmap for building ever-larger models.
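
A power-law relationship of this kind can be sketched as L(N) = (N_c / N)^alpha. The snippet below evaluates that form for model size only; the constants are assumptions of roughly the magnitude reported for parameter scaling and are used here purely for illustration.

```python
def power_law_loss(x: float, x_c: float, alpha: float) -> float:
    """Loss predicted by a pure power law L(x) = (x_c / x) ** alpha."""
    if x <= 0:
        raise ValueError("x must be positive")
    return (x_c / x) ** alpha


# Illustrative constants (assumed, not fitted here).
ALPHA_N = 0.076  # exponent for model size
N_C = 8.8e13     # scale constant, in parameters

sizes = [1e8, 1e9, 1e10, 1e11]
losses = [power_law_loss(n, N_C, ALPHA_N) for n in sizes]
for n, loss in zip(sizes, losses):
    print(f"N = {n:.0e} parameters -> predicted loss {loss:.3f}")

# Under a power law, doubling N always scales the loss by the same factor.
doubling_factor = power_law_loss(2e9, N_C, ALPHA_N) / power_law_loss(1e9, N_C, ALPHA_N)
print(f"loss multiplier per doubling of N: {doubling_factor:.4f}")
```

With an exponent this small, each doubling of model size lowers the predicted loss by only about 5%, which is why the curve looks smooth on a log-log plot and why large gains require order-of-magnitude increases in scale.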

Chinchilla and Compute-Optimal Training

DeepMind’s Chinchilla paper overturned the prevailing wisdom on model scaling, showing that a 70B-parameter model trained on 1.4 trillion tokens could outperform models roughly 2 to 8 times its size, because those larger models had been trained on too little data for their parameter count.
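
The arithmetic behind these figures can be sketched with two common rules of thumb, both assumptions here rather than exact results: training compute is roughly 6 FLOPs per parameter per token, and the figures above work out to about 20 tokens per parameter (1.4 trillion / 70 billion).

```python
import math

FLOPS_PER_PARAM_TOKEN = 6.0  # assumed approximation: C ~ 6 * N * D
TOKENS_PER_PARAM = 20.0      # assumed ratio: 1.4e12 tokens / 70e9 parameters


def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute for a dense Transformer."""
    return FLOPS_PER_PARAM_TOKEN * n_params * n_tokens


def compute_optimal_split(flops: float, tokens_per_param: float = TOKENS_PER_PARAM):
    """Split a FLOP budget into (parameters, tokens) at a fixed tokens-per-parameter ratio."""
    # Solve flops = 6 * N * (ratio * N) for N.
    n_params = math.sqrt(flops / (FLOPS_PER_PARAM_TOKEN * tokens_per_param))
    return n_params, tokens_per_param * n_params


budget = training_flops(70e9, 1.4e12)
n_opt, d_opt = compute_optimal_split(budget)
print(f"budget {budget:.2e} FLOPs -> {n_opt:.2e} parameters, {d_opt:.2e} tokens")
```

Under the same approximation, spending that budget on a 280B-parameter model would leave room for only about 350 billion training tokens, which illustrates the trade-off between model size and data at fixed compute.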

PaLM

Google’s 540-billion-parameter Pathways Language Model demonstrated that a single dense Transformer, trained across 6,144 TPU v4 chips, could achieve breakthrough performance on reasoning, code, and multilingual tasks simultaneously.

Codex and Code Generation

OpenAI’s Codex, a GPT-3-family model fine-tuned on code collected from 54 million public GitHub repositories, showed that language models could write functional code and, through GitHub Copilot, brought AI-assisted programming into mainstream use.

Emergent Abilities of Large Language Models

Certain capabilities, such as few-shot arithmetic, chain-of-thought reasoning, and word unscrambling, appear to emerge unpredictably at specific model scales, sparking a fierce debate about whether these phase transitions are real or artifacts of how performance is measured.
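
One side of this debate can be illustrated with a toy calculation. Assume, purely for illustration, that an answer is 10 tokens long, that each token is produced correctly with independent probability p, and that the metric only awards credit when every token is right. A smoothly improving p then yields a sharp-looking jump in the all-or-nothing score. This is a sketch of the measurement-artifact argument only; it does not settle whether any given ability is genuinely emergent.

```python
def exact_match_rate(per_token_accuracy: float, answer_length: int) -> float:
    """Probability that all tokens are correct, assuming independent tokens."""
    return per_token_accuracy ** answer_length


ANSWER_LENGTH = 10  # hypothetical answer length in tokens

# Per-token accuracy improving smoothly, as it might with increasing scale.
per_token = [0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.99]
exact = [exact_match_rate(p, ANSWER_LENGTH) for p in per_token]

for p, e in zip(per_token, exact):
    print(f"per-token accuracy {p:.2f} -> exact-match rate {e:.3f}")
```

In this toy setup, raising per-token accuracy from 0.5 to 0.8 moves exact match only from about 0.1% to about 11%, while the step from 0.8 to 0.99 moves it to about 90%, so the same underlying progress looks flat and then abrupt under the stricter metric.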

LaMDA and Conversational AI

Google’s 137-billion-parameter dialogue model, pre-trained on 1.56 trillion words of public dialog data and web documents and fine-tuned for safety, factual grounding, and conversational quality, became unexpectedly famous when a Google engineer claimed it was sentient.

The Scaling Hypothesis Debate

Whether intelligence is an emergent property of sufficient scale, so that making models bigger and training them on more data will eventually produce general intelligence, became the defining intellectual debate of the LLM era.