GPT-5.5

The Specs
Model: GPT-5.5 (gpt-5.5 on the OpenAI API once it rolls out, plus gpt-5.5-pro). Ships in three consumer surfaces: default GPT-5.5, GPT-5.5 Thinking, and GPT-5.5 Pro. API reasoning effort levels: xhigh, high, medium, low, non-reasoning.
Model type: Text + vision multimodal (same text/image input stack as the GPT-5 family, with computer-use screen reading in Codex). No native image, audio, or video output.
Ship date: April 23, 2026 (ChatGPT and Codex rollout; API “very soon”)
Maker: OpenAI
Pricing: Included in ChatGPT Plus ($20/mo), Pro ($200/mo), Business, and Enterprise. API pricing announced but not live: $5 / $30 per million input / output tokens for gpt-5.5, a 2x jump from GPT-5.4’s $2.50 / $15. gpt-5.5-pro comes in at $30 / $180 per million, unchanged from GPT-5.4 Pro. Fast mode in Codex runs 1.5x faster for 2.5x the cost. Batch and Flex at half the standard API rate, Priority at 2.5x.
Available on: ChatGPT for Plus, Pro, Business, and Enterprise users (GPT-5.5 Thinking for all paid tiers, GPT-5.5 Pro limited to Pro / Business / Enterprise). Codex in the CLI, IDE extensions, and the web product, across Plus, Pro, Business, Enterprise, Edu, and Go plans with a 400K context window. API (Responses and Chat Completions) “coming very soon” at the pricing above with a 1M context window.
Headline benchmarks: Terminal-Bench 2.0 at 82.7% (Opus 4.7: 69.4%, Gemini 3.1 Pro: 68.5%). SWE-Bench Pro at 58.6% (Opus 4.7 still leads at 64.3%). OpenAI’s internal Expert-SWE eval, where tasks have a 20-hour median human completion time, at 73.1% (up from GPT-5.4’s 68.5%). GDPval wins-or-ties at 84.9% (Opus 4.7: 80.3%, Gemini 3.1 Pro: 67.3%). OSWorld-Verified at 78.7% (narrowly edges Opus 4.7’s 78.0%). FrontierMath Tier 4 at 35.4% (Opus 4.7: 22.9%, Gemini 3.1 Pro: 16.7%). CyberGym at 81.8% (Opus 4.7: 73.1%, Anthropic’s Claude Mythos: 83.1%). Tau2-Bench Telecom at 98.0% without prompt tuning. Artificial Analysis has GPT-5.5 (xhigh) taking the #1 spot on their Intelligence Index by 3 points, breaking a three-way tie with Opus and Gemini 3.1 Pro. AA-Omniscience accuracy 57% (highest ever recorded), hallucination rate 86% (vs Opus 4.7 at 36%, Gemini 3.1 Pro Preview at 50%).
More details: Introducing GPT-5.5 (OpenAI)
What shipped
OpenAI released GPT-5.5 on Thursday morning and pitched it as “a new class of intelligence for real work” plus the next step toward the long-teased Altman / Brockman “superapp.” The model rolls out to Plus, Pro, Business, and Enterprise users in ChatGPT and Codex, with GPT-5.5 Pro layered on top for the Pro / Business / Enterprise tiers. The API is not live at launch. OpenAI says it will follow “very soon” at $5 / $30 per million input / output tokens and a 1M context window, double GPT-5.4’s per-token price. Greg Brockman framed the release on the press call as “a real step forward towards the kind of computing that we expect in the future,” and chief scientist Jakub Pachocki said the last two years of model progress have been “surprisingly slow.” They are not subtle about the vibe.
The pitch the benchmark card is built to support: more intelligence at the same latency, ~40% fewer output tokens per Codex task, state-of-the-art performance on most agentic coding and knowledge-work evals. Terminal-Bench 2.0 at 82.7% is a decisive lead on Opus 4.7 and Gemini 3.1 Pro. Expert-SWE at 73.1% introduces a brand-new OpenAI internal benchmark built around 20-hour coding tasks, which matches the workload Codex users have been running for months. GDPval across 44 occupations at 84.9% maps to real-world knowledge work, and OSWorld-Verified at 78.7% is the first time OpenAI’s mainline model has nudged ahead of Anthropic on full desktop computer use. The visible catches: Opus 4.7 still leads SWE-Bench Pro by 5.7 points (64.3% vs 58.6%), Gemini 3.1 Pro still edges BrowseComp at 85.9% to 84.4%, and Artificial Analysis flagged a hallucination rate of 86% on their independent AA-Omniscience eval, against Opus 4.7 at 36% and Gemini 3.1 Pro Preview at 50%. The model knows more than anything else tested, and it’ll confidently answer a question it doesn’t know the answer to at nearly two and a half times the rate of the best competitor.
What’s new
GPT-5.5 reuses the GPT-5 family architecture (router across variants, xhigh to non-reasoning effort levels) and reads more like a post-training plus inference upgrade than a new base model. That said, it ships four capabilities that change how it feels to use, not just how it benchmarks.