Reasoning & Inference Scaling

Chain-of-thought, o1, and inference-time compute.

OpenAI o1: Trained Reasoning

OpenAI o1 was the first model explicitly trained, via reinforcement learning on chain-of-thought, to reason before answering, showing that thinking longer at inference time could dramatically improve performance on hard problems.

The o-Series Evolution: o1 to o4-mini (and Beyond)

OpenAI’s o-series progressed from o1’s proof of concept through o3 and o4-mini, delivering large gains in capability and cost efficiency across five models in eight months, before its reasoning advances were folded into the GPT-5 line.

DeepSeek-R1: Open Reasoning from Pure RL

DeepSeek-R1 demonstrated that sophisticated reasoning can emerge largely from reinforcement learning (its R1-Zero variant used no supervised fine-tuning at all), performing comparably to OpenAI o1 on reasoning benchmarks at a fraction of the cost, with its weights released under the MIT license.

Test-Time Compute Scaling: Thinking Longer Beats Training Bigger

Test-time compute scaling is the idea that spending more computation during inference (letting a model think longer, or sample and compare several attempts) can be more cost-effective than training a larger model, opening a second axis for improving AI capabilities.
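
One simple way to spend extra inference compute is to sample several independent answers and take a majority vote (often called self-consistency). The sketch below is a toy simulation, not a real model: `noisy_solver`, its 40% per-sample accuracy, and the set of wrong answers are all assumptions chosen for illustration. It only shows why more samples at inference time can raise accuracy without changing the underlying "model".

```python
import random
from collections import Counter

CORRECT = 42  # assumed ground-truth answer for the toy problem


def noisy_solver(rng: random.Random, p_correct: float = 0.4) -> int:
    """Hypothetical stand-in for one sampled reasoning chain.

    Returns the correct answer with probability p_correct, otherwise one of
    several distinct wrong answers, so that errors do not all agree.
    """
    if rng.random() < p_correct:
        return CORRECT
    return rng.choice([41, 43, 44, 45])


def majority_vote(rng: random.Random, n_samples: int) -> int:
    """Sample n_samples answers and return the most common one."""
    answers = [noisy_solver(rng) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]


def accuracy(n_samples: int, trials: int = 2000, seed: int = 0) -> float:
    """Estimate how often majority voting over n_samples is correct."""
    rng = random.Random(seed)
    hits = sum(majority_vote(rng, n_samples) == CORRECT for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    for n in (1, 5, 15):
        print(f"samples={n:2d}  accuracy={accuracy(n):.2f}")
```

The gain here depends on the wrong answers being scattered while the correct one repeats; if a model makes the same mistake consistently, voting over more samples does not help, and other strategies (longer reasoning chains, verifier-guided search) would be needed.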

The Reasoning Paradigm Shift

AI reasoning evolved in three phases, from chain-of-thought prompting tricks in 2022, through search-based improvements in 2023, to fully trained reasoning via reinforcement learning in 2024, transforming reasoning from a fragile prompt hack into a robust learned capability.

Hybrid Thinking Models: On-Demand Reasoning

Hybrid thinking models give users the ability to toggle reasoning on and off and set thinking budgets, combining the speed of traditional LLMs with the depth of reasoning models in a single system.
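
The toggle and budget described above can be sketched as a small configuration object. Everything here is hypothetical: `ThinkingConfig`, `plan_generation`, and the rule that the thinking budget is carved out of a single output-token limit are assumptions for illustration, not any vendor's actual API.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ThinkingConfig:
    """Hypothetical per-request reasoning settings."""
    enabled: bool = False                # reasoning toggled on or off
    budget_tokens: Optional[int] = None  # cap on reasoning tokens, if set


def plan_generation(config: ThinkingConfig, max_output_tokens: int) -> Dict[str, int]:
    """Split a token allowance between hidden reasoning and the visible answer.

    Assumes (for this sketch only) that reasoning and answer tokens share one
    limit, and that at least one token is always reserved for the answer.
    """
    if max_output_tokens < 1:
        raise ValueError("max_output_tokens must be at least 1")
    if not config.enabled:
        # Fast path: behave like a traditional LLM with no reasoning phase.
        return {"reasoning_tokens": 0, "answer_tokens": max_output_tokens}

    # Default to half the allowance when no explicit budget is given.
    budget = config.budget_tokens
    if budget is None:
        budget = max_output_tokens // 2
    if budget < 0:
        raise ValueError("budget_tokens must be non-negative")

    # Clamp so the answer always gets at least one token.
    budget = min(budget, max_output_tokens - 1)
    return {"reasoning_tokens": budget, "answer_tokens": max_output_tokens - budget}


if __name__ == "__main__":
    print(plan_generation(ThinkingConfig(), 1000))
    print(plan_generation(ThinkingConfig(enabled=True, budget_tokens=300), 1000))
```

Keeping the toggle and the budget in one object means a caller can route easy queries through the fast path and reserve larger budgets for hard ones without switching models.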