Model Drop

GPT Image 2

April 21, 2026


Today: GPT Image 2, which OpenAI just shipped into ChatGPT and the API.

Model ID: gpt-image-2

Max Resolution: 2K standard, 4K beta via API

Aspect Ratios: 3:1 (ultra-wide) to 1:3 (ultra-tall)

Pricing: $8 / $30 per million image input / output tokens, or roughly $0.006–$0.211 per image

Modes: Instant and Thinking

Knowledge Cutoff: December 2025

Available on: ChatGPT (all tiers), Codex app, API in early May
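For budgeting, the token pricing above turns into a per-image estimate with simple arithmetic. The sketch below uses only the two published rates ($8 and $30 per million image input and output tokens). The function names and the token counts in the usage lines are illustrative assumptions, not figures from OpenAI.

```python
# Rates quoted in the spec list above, in USD per million image tokens.
INPUT_USD_PER_MILLION = 8.0
OUTPUT_USD_PER_MILLION = 30.0


def estimate_image_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request from its image token counts."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (input_tokens * INPUT_USD_PER_MILLION
             + output_tokens * OUTPUT_USD_PER_MILLION)
    return total / 1_000_000


def implied_output_tokens(cost_usd: float) -> float:
    """Output tokens implied by a price, assuming the cost is output-only."""
    return cost_usd / OUTPUT_USD_PER_MILLION * 1_000_000


# Hypothetical token counts, chosen only to illustrate the arithmetic.
cheap = estimate_image_cost(input_tokens=0, output_tokens=200)
with_reference = estimate_image_cost(input_tokens=1_000, output_tokens=5_000)
```

If the quoted per-image range were made up entirely of output tokens (an assumption), $0.006 would correspond to about 200 output tokens and $0.211 to roughly 7,000.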

What moved

Headline numbers worth noting:

~99% text rendering accuracy (up from ~90-95% on GPT Image 1.5)
Generation speed roughly 2x faster
Up to 8 consistent images per prompt in Thinking mode
First OpenAI image model with integrated reasoning and real-time web search

The framing from OpenAI is that images are “a language, not decoration.” The model reasons through layout before rendering. For marketers, designers, and anyone producing content at scale, this is an upgrade that moves AI image generation from novelty into production infrastructure with legible text and more intelligent prompt adherence.

Screenshots generated with the model are nearly indistinguishable from real ones

Partner and early-tester reactions point in the same direction. VentureBeat said the outputs exceeded Google’s Nano Banana 2 in UI and screenshot fidelity. The Decoder called it a “breakthrough” on par with Nano Banana Pro’s core thinking capability. Text rendering, the longest-running failure mode in AI image generation, is the thing everyone actually noticed first.

What changed under the hood

New architecture. GPT Image 2 is not built on GPT-4o’s image pipeline. Research Lead Boyuan Chen called it a “generalist model” or “GPT for images”: a standalone system designed from scratch. Community testers watching the April 4 LM Arena leak (codenames: maskingtape-alpha, gaffertape-alpha, packingtape-alpha) flagged a likely shift from two-stage to single-pass inference.
Reasoning integration. Thinking mode searches the web, transforms uploaded documents into visual explainers, verifies outputs, and plans layout before rendering the first pixel. The result is images that reflect intent rather than literal prompt parsing.
World knowledge. Training skewed heavily toward real-world references: actual UI screenshots, storefronts, interface layouts, public figures. Prompts like “average engineer’s screen” produce believable monitors instead of generic keyword collages.
Provenance baked in. C2PA metadata and next-gen watermarking are embedded by default. This creates a defensible paper trail for enterprise use (though OpenAI acknowledges metadata is not a silver bullet).

New settings

Instant Mode: Fast, standard quality. Default for everyone.
Thinking Mode: Reasoning, web search, up to 8 consistent images. Plus, Pro, Business, and Enterprise only.
Interactive Editing: Refine through conversation. Context retained across edits.
Flexible Aspect Ratios: 3:1 to 1:3, specified in prompt or preset.
Multi-Image Generation: Up to 8 per prompt, with character and object continuity.
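If you are wiring the aspect-ratio range into a pipeline, a pre-flight check can catch out-of-range dimensions before a request is sent. This is a minimal sketch that assumes the 3:1 and 1:3 bounds are inclusive. The function name is hypothetical, and whether the API accepts arbitrary ratios or only presets is not stated here.

```python
from fractions import Fraction

# Bounds taken from the settings list above: 3:1 (ultra-wide) to 1:3 (ultra-tall).
MAX_RATIO = Fraction(3, 1)
MIN_RATIO = Fraction(1, 3)


def is_supported_aspect_ratio(width: int, height: int) -> bool:
    """Return True if width:height falls within the 1:3 to 3:1 range (inclusive)."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    # Fraction keeps the comparison exact, avoiding float rounding at the bounds.
    ratio = Fraction(width, height)
    return MIN_RATIO <= ratio <= MAX_RATIO


print(is_supported_aspect_ratio(1920, 1080))  # 16:9, inside the range
print(is_supported_aspect_ratio(3200, 1000))  # 3.2:1, wider than 3:1
```

The first call prints True and the second prints False.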

First impressions

Launch-day reception skewed positive across the tech press, with specific praise for text rendering and compositional complexity.

The positive

Carl Franzen at VentureBeat got early access and ran it on hard cases:

“ChatGPT Images 2.0 is the first image model from OpenAI and one of only two (Nano Banana 2 being the other) that can seemingly accurately reproduce a map of the extent of the Aztec, Maya, and Inca empires at their respective heights along with a fully legible legend.”

He said it performed “seemingly flawlessly” on maps, slides, infographics, and manga.

Matthias Bastian at The Decoder called it

“…a breakthrough that could fundamentally reshape graphic generation…”

and flagged something concrete: Image 2 passes their long-standing benchmark prompt (a hyperrealistic DSLR photo of a horse riding an astronaut as a spacesuit saddle) on both Instant and Thinking modes, with Thinking nailing the DSLR look. Competitors have failed this for years.

The model excels at advertisement generation

Amanda Silberling at TechCrunch made the practical case. When she asked for a Mexican restaurant menu, she got something

“…that could immediately be used in a restaurant without customers noticing that something’s off.”

(Two years ago, DALL-E 3 couldn’t spell enchilada.)

The negative

David Gewirtz at ZDNET got early access and documented a persistent weakness: the model could not accurately reproduce the ZDNET logo across multiple attempts.

“On its first try, it rendered the Z in ZDNET with a slight droop.”

In a second session, it dug up a pre-2022 logo that does not appear on the current homepage. Brand fidelity is the easiest thing to fail publicly on, and Image 2 fails it.

Ece Yildirim at Gizmodo ran the launch-day framing back on OpenAI by borrowing the company’s own analogy:

“If we think of Dall-e as cave drawings, and Images 1.0 as ancient art, then Images 2.0 is the Renaissance.”

But she claims it’s a renaissance of smarter, more precise slop. She also pulled a sharper receipt from the Arena-leak images OpenAI confirmed during the livestream. The world map demo includes made-up countries (“Ciger,” “Mharee”) and relocates Nairobi into Saudi Arabia.

The OpenAI developer community raised a structural complaint: Thinking mode, web search, and the features that actually make Image 2 a 2 are locked behind Plus, Pro, and Business. Free users get a better default model, not a better experience.

Jake’s take

The same features that make GPT Image 2 a real production tool make it the best disinformation engine ever shipped. Fake UI screenshots. Fake news article layouts. Fake social posts with real timestamps and real-looking avatars. Fake Bloomberg terminal screens, fake leaked emails, fake court filings, fake receipts, fake Slack threads, fake campaign flyers in the voter’s native script. Every one of those is dense text laid over a known visual vocabulary, which is the exact workload OpenAI optimized for. It is not an accident that the model is good at screenshots and signage. It’s the whole point.

The Decoder already surfaced a leaked example during testing: a fake screenshot of Satya Nadella cheerfully showing off a chart that claims Google Chrome is downloaded most often through Microsoft Edge. Trivially believable at a glance on an X feed.

Multiply that one image by every political operative, every pump-and-dump, every harassment campaign, every state media operation, every bored teenager with a grudge. OpenAI’s Adele Li claims that ChatGPT has safeguards that other platforms don’t, which, sure (and btw the model is now in the API).

OpenAI’s answer is C2PA metadata and watermarking. C2PA strips the moment you screenshot a generation, crop it, or upload it to any platform that recompresses the file. Li said in the same briefing that metadata “is not a silver bullet.” She’s right. That was also the argument against shipping. They shipped anyway.

GPT Image 2 is SOTA, and if you’re shipping marketing or product work, you’re going to ship faster and cleaner with it. But prepare for your feed, your inbox, and your family group chat to be unrecognizable by the end of the year.

The model is an excellent problem.