Intelligence Benchmarks Center

Reference explainers for how we measure intelligence in humans, in AI systems, and at the frontier of AGI evaluation.

Key takeaways

  • No single test measures intelligence; benchmarks always measure a slice.
  • Human and AI benchmarks measure different things even when they share names.
  • AGI evaluation is unsolved and remains the most consequential open problem in the field.

What this center covers

Human intelligence metrics, working-memory measures, the broader cognitive-testing landscape, the major AI benchmarks (MMLU, GPQA, ARC, SWE-Bench, Humanity's Last Exam), and why AGI evaluation is harder than any of them.

How to read benchmark scores

Treat any single number as a sample, not a verdict. The construct, the contamination risk, the test-set size, and the population matter as much as the headline accuracy.
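
To make the test-set-size point concrete, here is a minimal sketch that puts a 95% Wilson score interval around a headline accuracy. The function name wilson_interval and the example counts are hypothetical, chosen only for illustration; the interval captures sampling noise alone and says nothing about contamination, construct validity, or population.

```python
import math


def wilson_interval(correct: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Return the Wilson score interval for an accuracy of correct/total.

    z = 1.96 corresponds to roughly 95% confidence (an assumption of this sketch).
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= correct <= total:
        raise ValueError("correct must be between 0 and total")

    p = correct / total
    z2 = z * z
    denom = 1 + z2 / total
    center = (p + z2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z2 / (4 * total * total)) / denom
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Hypothetical example: the same 80% headline accuracy on two test-set sizes.
small_lo, small_hi = wilson_interval(correct=80, total=100)
large_lo, large_hi = wilson_interval(correct=8000, total=10000)

print(f"n=100:   80% accuracy, interval {small_lo:.3f} to {small_hi:.3f}")
print(f"n=10000: 80% accuracy, interval {large_lo:.3f} to {large_hi:.3f}")
```

The same headline number carries a much wider interval on the smaller test set, which is one reason a small score gap on a small benchmark may not reflect a real difference.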

Frequently asked questions

Why so many benchmarks?

Because intelligence is multi-dimensional. Each benchmark targets one slice; no single benchmark covers every dimension.

Are AI benchmarks comparable to IQ tests?

Only loosely. IQ tests were designed for humans and rely on assumptions about test-taker behaviour and prior exposure that don't apply to AI.
