In brief
- ARC-AGI-3 exposes a massive gap between AGI claims and reality, with top AI models scoring below 1% while humans achieve perfect performance.
- The benchmark tests true generalization—requiring agents to explore, plan, and learn from scratch in unknown environments rather than recall…