The 2023 Model Boom

GPT-4, open models, and the explosion of LLM development.

LLaMA 1

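Meta AI’s LLaMA showed that smaller models trained on more data could outperform much larger ones: in Meta’s paper, the 13-billion-parameter LLaMA beat the 175-billion-parameter GPT-3 on most benchmarks. The leak of its weights ignited a wave of open-source LLM development.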

The Alpaca Effect

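Stanford’s Alpaca fine-tuned LLaMA 7B on 52,000 instruction-following examples for under $600 in total, most of it spent generating the training data with OpenAI’s text-davinci-003. It triggered a Cambrian explosion of open-source instruction-tuned models and showed that capable AI assistants could be built on a graduate student budget.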

LLaMA 2

Meta’s LLaMA 2 shipped open weights under a license permitting commercial use (with restrictions for services above 700 million monthly active users). It combined 2 trillion tokens of pretraining with extensive RLHF alignment for its chat models, narrowing the gap between open and closed AI.

Mistral 7B

Mistral AI, a Paris-based startup, released a 7.3-billion-parameter model as a bare torrent magnet link, with no paper at launch, and it outperformed the 13-billion-parameter LLaMA 2 on every benchmark the company reported.

Mixtral 8x7B

Mistral AI’s sparse Mixture of Experts model has 46.7 billion total parameters but uses only about 12.9 billion per token, because a router sends each token to 2 of the 8 experts in every layer. It matched or beat LLaMA 2 70B on most benchmarks at a fraction of the inference cost, showing that MoE was practical for the open-source community.
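As a rough illustration of how a sparse MoE layer can hold many parameters while running only a few per token, here is a minimal pure-Python sketch. It assumes top-2 routing over 8 experts with a softmax taken over only the two selected router logits, and it assumes the parameter counts split as total = shared + 8 × expert and active = shared + 2 × expert, ignoring the router’s small weights. The toy scalar experts and the names `top_k_route` and `moe_layer` are hypothetical illustrations, not Mistral AI’s code.

```python
import math

# Back-of-envelope split of Mixtral's parameter counts (figures from the text).
# Assumption: total = shared + 8 * expert, active = shared + 2 * expert,
# ignoring the router's own (small) weights.
TOTAL_B = 46.7   # total parameters, billions
ACTIVE_B = 12.9  # parameters used per token, billions
NUM_EXPERTS = 8
TOP_K = 2

expert_b = (TOTAL_B - ACTIVE_B) / (NUM_EXPERTS - TOP_K)  # per-expert params
shared_b = ACTIVE_B - TOP_K * expert_b                   # attention, embeddings, etc.


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(logits, k=TOP_K):
    """Return (expert_index, weight) pairs for the k highest router logits.

    The weights are a softmax over only the selected logits, so they sum to 1.
    """
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    weights = softmax([logits[i] for i in top])
    return list(zip(top, weights))


def moe_layer(x, router_logits, experts):
    """Weighted sum of the selected experts' outputs; unselected experts never run."""
    return sum(w * experts[i](x) for i, w in top_k_route(router_logits))


if __name__ == "__main__":
    # Hypothetical toy experts: expert i just scales its input by (i + 1).
    experts = [lambda x, s=i + 1: s * x for i in range(NUM_EXPERTS)]
    logits = [0.1, 2.0, -1.0, 0.5, 1.5, -0.3, 0.0, 0.2]  # made-up router scores

    print(top_k_route(logits))  # picks experts 1 and 4
    print(moe_layer(1.0, logits, experts))
    print(f"~{expert_b:.2f}B per expert, ~{shared_b:.2f}B shared")
```

Solving the two equations gives roughly 5.6 billion parameters per expert and 1.6 billion shared ones, which is why per-token compute resembles a ~13B dense model while memory must still hold all ~47B weights.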

Falcon

The Technology Innovation Institute’s Falcon models, from Abu Dhabi, showed that careful curation of web data (the RefinedWeb dataset), rather than novel architectures or proprietary text, could produce top-tier open language models, briefly topping the Hugging Face Open LLM Leaderboard.

Claude 1 and 2

Anthropic’s Claude models brought Constitutional AI from research to product, establishing the “safety-first” brand in commercial AI and pioneering the long-context paradigm with 100K-token windows, extended to 200K with Claude 2.1.

Gemini 1

Google DeepMind’s Gemini, which Google described as natively multimodal (trained from the start on text, images, audio, and video), was Google’s consolidated answer to GPT-4 after a year of playing catch-up.