Opus 4.7 releases as OpenAI prepares "Spud" model for a direct response



last week’s top stories

🤖 Anthropic ships Claude Opus 4.7. Anthropic released Claude Opus 4.7, targeting hard software engineering and long-running agentic work. SWE-bench Verified climbed to 87.6%, CursorBench hit 70%, and XBOW visual acuity jumped from 54.5% to 98.5%. The real downsides: a tokenizer change that burns up to 35% more tokens per task (that is a price increase, regardless of how Anthropic frames it), a significant MRCR regression for long-document retrieval, and deliberately reduced cybersecurity capabilities under Project Glasswing.

🤫 Spud pretraining is done, and it’s gunning for Mythos. A leaked internal OpenAI memo confirms the model codenamed Spud completed pretraining around March 24, with Sora’s compute budget redirected to it. OpenAI is positioning it as a direct counter to Anthropic’s Mythos. What it ships as, GPT-5.5 or GPT-6, is still unconfirmed. Polymarket puts 78% odds on a release before April 30.

🏛️ Dario met with the White House, and they want Mythos access. Anthropic CEO Dario Amodei met with White House Chief of Staff Susie Wiles and Treasury Secretary Scott Bessent on Friday in what Axios called “peace talks,” after the Trump administration severed ties with Anthropic last month. The Pentagon had designated Anthropic a supply chain risk after the company refused terms allowing the military to use Claude for autonomous weapons and mass surveillance. White House officials asked specifically about access to Mythos, Anthropic’s next flagship model that hasn’t shipped yet.

💻 OpenAI turns Codex Desktop into a superapp. OpenAI updated Codex Desktop with computer use (it can now see, click, and type in any app on your Mac with its own cursor), an in-app browser, image generation via gpt-image-1.5, and 90+ new plugins covering Atlassian, GitLab, Microsoft Suite, and more. Multiple agents can run in parallel in the background without disrupting your own work. OpenAI’s internal framing is a unified desktop superapp merging ChatGPT, Codex, and its Atlas browser into one surface.

🖥️ Claude Code Desktop gets a complete rebuild. Anthropic released a redesigned Claude Code desktop app, built around parallel sessions in a single window. A new sidebar manages multiple Claude Code tasks side by side, with an integrated terminal, in-app file editor, rebuilt diff viewer, and expanded preview pane for HTML, PDFs, and local app servers. Three view modes (Verbose, Normal, Summary) let you dial from full transparency on tool calls down to results only.

🎨 Anthropic launches Claude Design. Anthropic released Claude Design, an Anthropic Labs product that lets you create prototypes, slides, one-pagers, and other visual work in collaboration with Claude Opus 4.7. It can read a company’s codebase and design files to apply that team’s design system to everything it produces, with exports to PDF, URL, PPTX, or Canva. Available in research preview for Pro, Max, Team, and Enterprise subscribers. Figma’s stock fell 7% on the announcement.


🧪 AI Research of the Week

QCalEval: Benchmarking Vision-Language Models for Quantum Processor Calibration
From the NVIDIA Quantum Research team

Jake’s Take: Calibrating a quantum processor typically means compensating for hardware flaws and environmental noise. Today, physicists spend hours reading spectroscopy plots and pulse response charts to retune each qubit by hand.

The evaluation method proposed in this paper, QCalEval, puts vision-language models through the same exercise and scores them against human experts. NVIDIA built the benchmark to back up Ising Calibration (their 35B open-source vision model, released this month), which beat Gemini 3.1 Pro, Claude Opus 4.6, and GPT-5.4 on every task.

Self-serving? Sure (that’s NVIDIA’s secret sauce these days). But quantum hardware will never scale if every new chip release ties up a physicist for weeks, and vision-language models are the candidate to fix that. QCalEval separates the models that actually read the diagnostics from the ones that merely pattern-match on lab imagery.


and then, even more news…

🧬 OpenAI ships GPT-Rosalind for life sciences. GPT-Rosalind, OpenAI’s first domain-specialized frontier model, launched with access limited to vetted enterprise partners including Amgen, Moderna, and Thermo Fisher. Best-of-ten RNA sequence predictions ranked above the 95th percentile of human experts on unpublished sequences. The access gate is tighter than anything OpenAI has shipped before: trusted institutions, secure environments, approved-user lists. The model is named for Rosalind Franklin, whose X-ray diffraction data made the double helix possible and who never received the Nobel Prize for it.
