GPT-5.5 beats Claude Fable 5 on new AI benchmark for complex work processes
Summary generated by AI from the original source
OpenAI's GPT-5.5 has outperformed Anthropic's Claude Fable 5 on Agents' Last Exam, a demanding new benchmark developed by UC Berkeley researchers to evaluate AI systems on complex, long-term professional tasks. The test, created with input from over 300 domain experts, assesses whether artificial intelligence can handle economically valuable workflows that require sustained reasoning and