OpenAI Blog
Why we no longer evaluate SWE-bench Verified
SWE-bench Verified, a key AI coding benchmark, faces serious credibility issues due to data contamination and training leakage. Flawed tests further undermine its ability to accurately measure progress in AI coding capabilities. Researchers now recommend moving to SWE-bench Pro as a more reliable evaluation standard.