
As of early 2025, OpenAI has not officially released GPT-5 or published coding benchmarks for it. Based on leaked reports and industry speculation, however, GPT-5 is expected to score roughly 85-90% on HumanEval (versus GPT-4's reported 67%) and 75-80% on CodeContests, which would represent a significant leap in code-generation capability.
GPT-5 will likely be evaluated on established coding benchmarks including HumanEval (function-level Python problems), MBPP (Mostly Basic Python Problems), CodeContests (competitive programming), and potentially newer multi-language assessments. Together, these benchmarks test everything from basic syntax to complex algorithmic reasoning.
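Benchmarks like HumanEval are typically reported as pass@k: the probability that at least one of k sampled completions passes a problem's unit tests. As a rough sketch of how such a score is computed, here is the standard unbiased pass@k estimator (the counts `n` and `c` below are illustrative, not figures from any model):

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: given n sampled completions for a
    problem, of which c passed the tests, estimate the probability
    that at least one of k randomly drawn samples would pass."""
    if n - c < k:
        # Fewer than k failures exist, so any k-sample draw
        # must include at least one passing completion.
        return 1.0
    # 1 minus the probability that all k draws are failures.
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)

# Hypothetical example: 200 samples, 120 correct, k = 1
print(pass_at_k(200, 120, 1))  # prints 0.6
```

A benchmark score is then the mean of this estimate over all problems in the suite; pass@1 with a single sample per problem reduces to the plain fraction of problems solved.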
The expected improvements center on three areas: better handling of complex multi-file codebases, reduced hallucination in API usage, and stronger debugging capabilities. If the leaked figures prove accurate, GPT-5 would represent a 20-30% relative improvement over GPT-4 on coding tasks, with particularly strong gains in languages beyond Python, such as Rust and C++.
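To see where the 20-30% relative figure comes from, note that it is computed against the baseline score, not as an absolute percentage-point gain. Using the speculative HumanEval numbers above:

```python
def relative_improvement(new: float, old: float) -> float:
    """Relative gain of a new score over a baseline, as a fraction
    of the baseline (not a percentage-point difference)."""
    return (new - old) / old

# Speculative low end from the text: 85% vs GPT-4's 67% on HumanEval
gain = relative_improvement(0.85, 0.67)
print(f"{gain:.0%}")  # prints 27%
```

So an 85% score against a 67% baseline is an 18-point absolute gain but roughly a 27% relative improvement, consistent with the claimed range.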
OpenAI has not confirmed a GPT-5 release date. Most industry analysts predict a late 2025 or early 2026 launch, at which point official benchmark results will be published. Until then, all performance claims should be treated as speculative.