
Artificial General Intelligence

Scaling Laws for Neural Language Models

Kaplan et al. · 2020 · OpenAI Technical Report

Showed that language-model loss follows smooth, predictable power-law relationships with compute, dataset size, and parameter count.

Research objective

Characterize how language-model loss scales with model size, dataset size, and compute budget.

Methodology

Trained Transformer language models across a wide range of model sizes, dataset sizes, and compute budgets, with some trends spanning more than seven orders of magnitude, and measured cross-entropy loss on held-out data.

Key findings

  • Loss scales as a power law in parameters, data, and compute when not bottlenecked by the other two factors.
  • Larger models are more sample-efficient than smaller ones.
  • Optimal allocation of compute can be predicted in advance.
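
A power law of the form loss = a · N^(-alpha) becomes a straight line in log-log space, which is what makes extrapolation to larger models tractable. The following is a minimal sketch of that idea, not the authors' fitting procedure: the function name `fit_power_law`, the constants `TRUE_A` and `TRUE_ALPHA`, and the data points are all hypothetical and synthetic, and the fit ignores any irreducible-loss term.

```python
import math


def fit_power_law(xs, losses):
    """Fit loss = a * x**(-alpha) by ordinary least squares in log-log space.

    Returns (a, alpha). Assumes all inputs are positive and that any
    irreducible-loss term is negligible over the fitted range.
    """
    log_x = [math.log(x) for x in xs]
    log_l = [math.log(l) for l in losses]
    n = len(xs)
    mean_x = sum(log_x) / n
    mean_l = sum(log_l) / n
    # Slope of the least-squares line through the log-log points.
    cov = sum((u - mean_x) * (v - mean_l) for u, v in zip(log_x, log_l))
    var = sum((u - mean_x) ** 2 for u in log_x)
    slope = cov / var
    intercept = mean_l - slope * mean_x
    return math.exp(intercept), -slope


# Synthetic, noise-free points from made-up constants (NOT data from the paper).
TRUE_A, TRUE_ALPHA = 12.0, 0.08
param_counts = [1e6, 1e7, 1e8, 1e9]
synthetic_losses = [TRUE_A * n ** (-TRUE_ALPHA) for n in param_counts]

a_hat, alpha_hat = fit_power_law(param_counts, synthetic_losses)

# Extrapolate one order of magnitude beyond the largest fitted model.
predicted_loss = a_hat * 1e10 ** (-alpha_hat)
print(f"alpha = {alpha_hat:.4f}, a = {a_hat:.3f}")
print(f"extrapolated loss at 1e10 parameters = {predicted_loss:.4f}")
```

With real measurements the points would be noisy, and the fit would only be trustworthy in the regime where neither data nor compute is the bottleneck.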

Strengths

  • Empirical, reproducible, and actionable for capacity planning.
  • Catalyzed the strategic decision to invest in larger and larger models.

Limitations

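  • Later refined by Chinchilla (Hoffmann et al., 2022), which found that the compute-optimal recipe in Kaplan et al. undertrains models on data; parameters and training tokens should grow in roughly equal proportion.
  • Power laws describe smooth trends in loss; they do not predict when specific capabilities emerge.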
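
To make the Chinchilla refinement concrete, here is a rough sketch of how a fixed compute budget splits between parameters and tokens under different tokens-per-parameter ratios. It assumes the common approximation C ≈ 6·N·D for training FLOPs; the helper `split_compute`, the budget, and the ratios are illustrative assumptions. A ratio of about 20 is the frequently quoted rule of thumb associated with Chinchilla, while the lower ratio is an arbitrary stand-in for a more parameter-heavy allocation, not a figure from either paper.

```python
def split_compute(flops, tokens_per_param):
    """Split a FLOP budget between parameter count N and token count D.

    Assumes training cost C = 6 * N * D and a fixed ratio D = r * N,
    which gives N = sqrt(C / (6 * r)) and D = r * N.
    """
    n_params = (flops / (6.0 * tokens_per_param)) ** 0.5
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens


budget = 1e21  # hypothetical FLOP budget

# Illustrative ratios only: 2.0 is arbitrary, 20.0 is the commonly
# cited Chinchilla-style rule of thumb.
for ratio in (2.0, 20.0):
    n_params, n_tokens = split_compute(budget, ratio)
    print(f"tokens/param = {ratio:>4}: N = {n_params:.3e}, D = {n_tokens:.3e}")
```

For the same budget, the higher ratio yields a smaller model trained on more tokens, which is the direction of the correction described above.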

Practical implications

  • Motivated the era of frontier model scaling.
  • Set the framework that AGI labs use to forecast model performance ahead of large training runs.
