Scaling Laws and Compute

Capabilities of modern AI improve predictably with model size, dataset size, and training compute — a finding with deep implications for research, economics, and policy.

10 min read · Updated April 9, 2026
By Dr. Ira S. Pastor, Editor-in-Chief · Reviewed by BrainMatter Science Review Board

Key facts

  • Kaplan et al. (2020) established neural scaling laws across 7+ orders of magnitude.
  • Chinchilla (2022) revised optimal model/data scaling to ~20 tokens per parameter.
  • Frontier training compute has grown ~4–5x per year since 2010.
  • Frontier 2025 training runs are estimated at ~10^26 FLOPs.
  • Inference-time scaling emerged as a second productive axis in 2024.
  • High-quality public text data is projected to be exhausted between 2026 and 2032.

Kaplan and Chinchilla Laws

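Kaplan et al. (OpenAI, 2020) showed that language-model test loss falls as a smooth power law in each of training compute, parameter count, and dataset size, provided the other two are not the bottleneck. Some of these trends span more than seven orders of magnitude.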
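A power law is a straight line on log-log axes, which is how such trends are usually fitted. The sketch below is illustrative only: it assumes the simplified form loss ≈ a · x^(−alpha) with no irreducible-loss term, and the data points are synthetic, not values from Kaplan et al.

```python
import numpy as np

def fit_power_law(x, loss):
    """Fit loss ≈ a * x**(-alpha) by least squares in log-log space.

    Returns (a, alpha). Assumes a pure power law with no constant offset.
    """
    slope, intercept = np.polyfit(np.log(x), np.log(loss), deg=1)
    return float(np.exp(intercept)), float(-slope)

# Synthetic illustration: compute spanning seven orders of magnitude,
# generated from a made-up law with a = 5.0 and alpha = 0.05.
compute = np.logspace(0, 7, num=8)
loss = 5.0 * compute ** -0.05

a, alpha = fit_power_law(compute, loss)
print(f"a = {a:.3f}, alpha = {alpha:.3f}")
```

The fitted exponent alpha is the quantity a scaling-law study reports: it says how much loss falls for each tenfold increase in the resource.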

Chinchilla (Hoffmann et al., DeepMind, 2022) refined this: for a fixed compute budget, the best loss comes from scaling model size and training tokens in roughly equal proportion, which works out to about 20 tokens per parameter. By that standard, most pre-Chinchilla models were significantly under-trained: too many parameters for the amount of data they saw.
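The 20-tokens-per-parameter rule can be turned into a rough budget calculator. This sketch assumes the common approximation that training costs about 6 FLOPs per parameter per token (C ≈ 6·N·D), which is not stated in the article; the function name and defaults are illustrative.

```python
import math

def compute_optimal_split(compute_flops, tokens_per_param=20.0,
                          flops_per_param_token=6.0):
    """Split a training budget into (parameters, tokens).

    Assumes C ≈ 6 * N * D and D = tokens_per_param * N, so
    N = sqrt(C / (6 * tokens_per_param)).
    """
    n_params = math.sqrt(compute_flops / (flops_per_param_token * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# A budget of 5.88e23 FLOPs gives roughly 70 billion parameters and
# 1.4 trillion tokens under these assumptions.
params, tokens = compute_optimal_split(5.88e23)
print(f"{params:.3g} parameters, {tokens:.3g} tokens")
```

Because both quantities grow with the square root of compute, a 100x larger budget implies only a 10x larger model, trained on 10x more data.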

The Compute Trajectory

Compute used for frontier training has grown roughly 4–5x per year since 2010 — far faster than Moore's Law (~1.4x per year). Epoch AI tracks this trend across hundreds of notable training runs.
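To compare the two growth rates, it helps to convert them to doubling times and cumulative growth. The helper names below are illustrative; the inputs are the per-year factors quoted above.

```python
import math

def doubling_time_months(growth_per_year):
    """Months needed to double at a constant annual growth factor."""
    return 12 * math.log(2) / math.log(growth_per_year)

def cumulative_growth(growth_per_year, years):
    """Total growth factor after compounding for `years` years."""
    return growth_per_year ** years

for label, rate in [("frontier compute (4x/yr)", 4.0),
                    ("frontier compute (5x/yr)", 5.0),
                    ("Moore's Law (1.4x/yr)", 1.4)]:
    print(f"{label}: doubles every {doubling_time_months(rate):.1f} months, "
          f"{cumulative_growth(rate, 10):,.0f}x over 10 years")
```

At 4x per year, compute doubles every 6 months; at 1.4x per year, it takes about two years.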

Frontier 2025 training runs are estimated at around 10^26 FLOPs and cost hundreds of millions of dollars. Single-cluster scale has surpassed 100,000 H100-equivalent GPUs (xAI's Colossus is one example), with still larger builds planned (OpenAI/Microsoft Stargate).
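A back-of-the-envelope check shows how a cluster of this size relates to a 10^26 FLOP run. The per-GPU peak (about 989 TFLOP/s dense BF16 for an H100 SXM) and the 40% utilization figure are assumptions made for illustration, not numbers from the article.

```python
def training_days(total_flops, n_gpus, peak_flops_per_gpu, utilization):
    """Wall-clock days to deliver total_flops at a sustained utilization."""
    sustained_flops_per_s = n_gpus * peak_flops_per_gpu * utilization
    return total_flops / sustained_flops_per_s / 86_400  # seconds per day

# Assumed: 100,000 GPUs, ~989e12 FLOP/s peak each, 40% utilization.
days = training_days(1e26, 100_000, 989e12, 0.40)
print(f"about {days:.0f} days")
```

Under these assumptions the run takes about a month; lower utilization or a smaller cluster stretches it proportionally.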

Inference-Time Scaling

Since 2024, a second scaling axis has emerged: spending more compute at inference time produces better answers on hard problems. OpenAI's o1/o3 and DeepSeek-R1 show that the accuracy of reasoning-trained models improves predictably as they are given more thinking tokens.

This shifts the economic frontier: inference compute now rivals training compute in importance, and per-query cost can vary by orders of magnitude depending on reasoning depth.
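Both claims can be made concrete with a rough cost model. The sketch assumes a dense model whose forward pass costs about 2 FLOPs per parameter per token and whose training cost about 6 FLOPs per parameter per token; it ignores attention and memory costs. The model size and token counts are hypothetical.

```python
def query_flops(n_params, tokens_processed):
    """Approximate inference FLOPs for one query (2 * N per token)."""
    return 2.0 * n_params * tokens_processed

def queries_to_match_training(train_tokens, tokens_per_query):
    """Queries after which total inference FLOPs equal training FLOPs.

    Training ≈ 6 * N * D and one query ≈ 2 * N * t, so N cancels
    and the break-even count is 3 * D / t.
    """
    return 3.0 * train_tokens / tokens_per_query

n_params = 70e9                               # hypothetical dense model
short_answer = query_flops(n_params, 500)     # brief reply
long_reasoning = query_flops(n_params, 50_000)  # extended reasoning trace
print(f"cost ratio: {long_reasoning / short_answer:.0f}x")
print(f"break-even queries: {queries_to_match_training(1.4e12, 1_000):.2e}")
```

A reasoning trace 100 times longer costs about 100 times more per query, and a widely used model can accumulate inference compute comparable to its training run after a few billion queries.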

Limits to Scaling

Data exhaustion: high-quality public text is finite; Villalobos et al. (Epoch AI, 2024) project depletion of high-quality language data by 2026–2032. Synthetic data and multimodal sources extend the runway.

Energy and chips: a single 100K-GPU cluster draws ~150 MW. US grid expansion and TSMC advanced-node capacity are now binding constraints. Capital availability — frontier labs are raising tens of billions annually — completes the limiting trio.
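The ~150 MW figure can be roughly reproduced from per-GPU power. The inputs here are assumptions for illustration: about 1,275 W per GPU once the rest of the server is included (an 8-GPU H100 server drawing roughly 10.2 kW), and a facility overhead factor (PUE) of 1.2 for cooling and power delivery.

```python
def cluster_power_mw(n_gpus, watts_per_gpu_all_in, pue):
    """Facility power in megawatts for a GPU cluster.

    watts_per_gpu_all_in: server power divided by GPUs per server (assumed).
    pue: power usage effectiveness, total facility power / IT power (assumed).
    """
    return n_gpus * watts_per_gpu_all_in * pue / 1e6

power = cluster_power_mw(100_000, 1_275, 1.2)
print(f"{power:.0f} MW")  # 153 MW under these assumptions
```

The GPUs alone, at roughly 700 W each, would account for about 70 MW; the rest is host servers, networking, and cooling.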

Frequently asked

Will scaling alone produce AGI?

Contested. Some researchers expect that continued scaling alone will yield AGI; others believe new architectures and learning algorithms will be required. Both camps include serious researchers.

How much compute did training GPT-4 use?

Public estimates put GPT-4's training compute at ~2x10^25 FLOPs. That corresponds to tens of thousands of A100/H100-class GPUs running for months, at a total cost in the high tens of millions of dollars.
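These figures can be sanity-checked by converting FLOPs to GPU-hours and then to dollars. Everything except the 2x10^25 FLOPs estimate is an assumption for illustration: an A100 peak of about 312 TFLOP/s (dense BF16), 35% utilization, and a hypothetical price of $1.50 per GPU-hour.

```python
def gpu_hours(total_flops, peak_flops_per_gpu, utilization):
    """GPU-hours needed to deliver total_flops at a sustained utilization."""
    return total_flops / (peak_flops_per_gpu * utilization) / 3_600

hours = gpu_hours(2e25, 312e12, 0.35)   # assumed A100 peak and utilization
cost_usd = hours * 1.50                 # hypothetical price per GPU-hour
days_on_25k = hours / 25_000 / 24       # hypothetical 25,000-GPU cluster

print(f"{hours:.2e} GPU-hours, ${cost_usd / 1e6:.0f}M, "
      f"{days_on_25k:.0f} days on 25,000 GPUs")
```

Under these assumptions the run needs about 51 million GPU-hours, which is roughly 85 days on 25,000 GPUs and about $76 million, in line with the ranges quoted above.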

What is inference-time scaling?

Using more compute per query to improve answer quality on hard problems, whether by sampling many candidates, generating an explicit chain of thought, or running a tree search. It is now standard in reasoning models.
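The "sampling many candidates" approach can be sketched as majority voting over repeated samples. This is a toy illustration: the noisy solver below is a hypothetical stand-in for a model call, and its 40% per-sample accuracy is made up.

```python
import random
from collections import Counter

def majority_vote(sample_answer, n_samples):
    """Draw n_samples answers and return the most common one."""
    votes = Counter(sample_answer() for _ in range(n_samples))
    return votes.most_common(1)[0][0]

def make_noisy_solver(correct, wrong_answers, p_correct, rng):
    """Hypothetical model stand-in: right with probability p_correct,
    otherwise a uniformly chosen wrong answer."""
    def sample():
        if rng.random() < p_correct:
            return correct
        return rng.choice(wrong_answers)
    return sample

def accuracy(n_samples, trials=2000, seed=0):
    """Fraction of trials in which the voted answer is correct."""
    rng = random.Random(seed)
    solver = make_noisy_solver("42", ["41", "43", "44"], 0.4, rng)
    hits = sum(majority_vote(solver, n_samples) == "42" for _ in range(trials))
    return hits / trials

for n in (1, 5, 25):
    print(f"{n:>2} samples per query: accuracy {accuracy(n):.2f}")
```

Accuracy rises with the number of samples because the correct answer is the single most likely one, even though it is wrong more often than not on any one sample. Compute per query rises in direct proportion.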
