
AI benchmark · Fluid reasoning

Abstraction and Reasoning Corpus for AGI

Short name: ARC-AGI · Introduced 2019 · François Chollet

Visual puzzle benchmark designed to resist memorization and test fluid abstraction.

What it measures

Generalization to novel tasks from a handful of examples, following Chollet's operational definition of intelligence as skill-acquisition efficiency.

Format

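Grid-based visual reasoning tasks (input → output transformations). Each task typically provides 2–5 training pairs and one or more test inputs whose outputs are withheld. The original release (ARC-AGI-1) contains 400 public training tasks and 400 public evaluation tasks, each with its own transformation rule; further tasks are held back for private evaluation.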
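As a rough illustration of this format, the sketch below builds one toy task in the "train"/"test" JSON-style layout used by the public ARC task files, where each grid is a list of rows of small integers (colors). The task itself, the rule `flip_horizontal`, and the helper `fits_training_pairs` are hypothetical and not taken from the corpus.

```python
# Hypothetical toy task (not from the corpus), laid out like a public ARC
# task file: "train" and "test" lists of {"input": grid, "output": grid}.
toy_task = {
    "train": [
        {"input": [[1, 0], [0, 0]], "output": [[0, 1], [0, 0]]},
        {"input": [[0, 0], [2, 0]], "output": [[0, 0], [0, 2]]},
    ],
    "test": [
        {"input": [[3, 0, 0], [0, 0, 4]], "output": [[0, 0, 3], [4, 0, 0]]},
    ],
}


def flip_horizontal(grid):
    """Mirror each row left-to-right (the hidden rule of this toy task)."""
    return [list(reversed(row)) for row in grid]


def fits_training_pairs(transform, task):
    """True if the transform reproduces every training output exactly."""
    return all(transform(p["input"]) == p["output"] for p in task["train"])


# A solver must infer the rule from the training pairs alone,
# then apply it to the test input.
rule_ok = fits_training_pairs(flip_horizontal, toy_task)
prediction = flip_horizontal(toy_task["test"][0]["input"])
```

Here the rule is hand-written; in the benchmark the solver has to discover it, and every task uses a different rule.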

Scoring

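Exact-match accuracy on test grids: a prediction counts only if the grid dimensions and every cell are correct. Human performance is roughly 80–85%; pure LLMs have historically scored below 10%.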
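A minimal sketch of exact-match scoring, assuming one attempt per test grid (official evaluations may allow more than one attempt). The function names and the sample grids are illustrative only.

```python
def exact_match(predicted, expected):
    """A grid is correct only if its shape and every cell match."""
    return predicted == expected


def accuracy(predictions, targets):
    """Fraction of test grids solved exactly; there is no partial credit."""
    if not targets:
        raise ValueError("targets must not be empty")
    solved = sum(exact_match(p, t) for p, t in zip(predictions, targets))
    return solved / len(targets)


# Illustrative data: two grids are right, one has a wrong cell,
# and one has the wrong shape.
targets = [[[0, 1], [1, 0]], [[2, 2]], [[3], [3]], [[4]]]
predictions = [[[0, 1], [1, 0]], [[2, 0]], [[3], [3]], [[4, 4]]]

score = accuracy(predictions, targets)  # 2 of 4 grids match exactly
```

A grid that is off by a single cell scores the same as one that is entirely wrong, which is part of why near-miss approaches fare poorly.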

Notable results

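  • December 2024 (verified by ARC Prize): OpenAI's o3 scored about 76% in its low-compute configuration and about 87% in its high-compute configuration on the semi-private evaluation set. It was the first system to approach human level, at extreme cost.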
  • Standard frontier LLMs without scaffolding remain in the 20–40% range.

Strengths

  • Specifically designed to resist contamination: each task has its own rule, and final scoring uses held-out private tasks.
  • Forces compositional, program-like reasoning.
  • Tracks abilities that humans recognize as intelligence: most tasks are easy for people without special training.

Limitations

  • Narrow modality (2D grids).
  • Solutions can be brute-forced with sufficient inference compute.
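To make the brute-force limitation concrete, here is a toy sketch of exhaustive program search: it enumerates every composition of a few grid primitives and returns the first one consistent with all training pairs. The four primitives, the depth limit, and the example pairs are assumptions chosen for illustration; this is not the method of any particular ARC system.

```python
from itertools import product


def identity(g):
    return [row[:] for row in g]


def flip_h(g):
    return [row[::-1] for row in g]


def flip_v(g):
    return [row[:] for row in g[::-1]]


def transpose(g):
    return [list(col) for col in zip(*g)]


# Hypothetical mini-DSL; real search spaces are far larger.
PRIMITIVES = {
    "identity": identity,
    "flip_h": flip_h,
    "flip_v": flip_v,
    "transpose": transpose,
}


def run(program, grid):
    """Apply the named primitives to the grid in order."""
    for name in program:
        grid = PRIMITIVES[name](grid)
    return grid


def brute_force(train_pairs, max_depth=3):
    """Return a shortest program matching all training pairs, or None."""
    for depth in range(1, max_depth + 1):
        for program in product(PRIMITIVES, repeat=depth):
            if all(run(program, inp) == out for inp, out in train_pairs):
                return program
    return None


# Illustrative pairs whose hidden rule is a 90-degree clockwise rotation.
train_pairs = [
    ([[1, 2], [3, 4]], [[3, 1], [4, 2]]),
    ([[5, 0, 0], [0, 6, 0]], [[0, 5], [6, 0], [0, 0]]),
]
program = brute_force(train_pairs)
test_output = run(program, [[7, 8, 9]])
```

The number of candidates grows as (number of primitives) ** depth, so coverage is bought with compute rather than with better abstraction, which is the concern this limitation raises.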
