OpenAI Blog

Why we no longer evaluate SWE-bench Verified

SWE-bench Verified, a key AI coding benchmark, faces serious credibility problems: its tasks have leaked into training data, and flawed tests undermine its ability to measure progress in AI coding capabilities accurately. Researchers now recommend moving to SWE-bench Pro as a more reliable evaluation standard.