How Do We Measure What AI Can Do?
Evaluating AI systems is like trying to grade a moving target: it is hard, and it is expensive. These systems are acing tests faster than we can write harder ones, and the tools for measuring progress are underfunded. To keep up, organizations worldwide are designing tougher challenges, but AI keeps surprising us, even on the most advanced exams. It's a reminder of just how fast this technology is evolving.