New Gemini from Google and new Composer from Cursor; plus, the Pope wants you to take AI seriously

🤔 “The Pope is doing what now?”
Share Handy AI with your coworkers and friends to help them understand the crazy world of modern artificial intelligence and make the right decisions.
what to know for now
🤖 Google I/O turned into a party for agentic Gemini. Sundar opened the keynote framing the conference as Google’s pivot from chatbots to agents, then spent the next two hours backing it up: an upgraded Antigravity orchestration layer for agent-first development, the seventh-gen TPU 8i powering it, SynthID and Content Credentials shipping into Search and Chrome, and the Build with Gemini XPRIZE Hackathon dangling $2M for whoever puts the new stack to work.
🎯 Andrej Karpathy joins Anthropic’s pre-training team. Karpathy announced the move on May 19, the same day Google I/O kicked off; he’ll launch a new team inside Anthropic focused on using Claude itself to accelerate pre-training research. This is the OpenAI co-founder who ran Tesla Autopilot, returned to OpenAI for a victory lap, left to build an education startup, and has now picked a side in the lab war.
⚡ Gemini 3.5 Flash beats Google’s own 3.1 Pro on coding, agentic, and multimodal benchmarks. The new Flash runs 4x faster on output tokens than rival frontier models, scored ahead of 3.1 Pro on nearly every benchmark Google reported, and is already live across the Gemini app, AI Mode in Search, Antigravity, Gemini API, and Gemini Enterprise. In internal tests it builds an operating system from scratch, end to end, with minimal scaffolding. 3.5 Pro is in internal use and lands next month; the Flash-beats-Pro reversal is the bigger story, because it tells you Google’s gain came from training and scaffolding, not parameter count.
🎥 Gemini Omni rolled out as the first frontier model that reasons across text, image, audio, and video in a single pass. Omni takes any combination of those modalities as input and produces video that holds physics, character consistency, and scene memory across cuts; Pichai’s pitch on stage was “create anything from any input.” Gemini Omni Flash is rolling out now to AI Plus, Pro, and Ultra subscribers through the Gemini app and Google Flow, and is free for YouTube Shorts and YouTube Create users this week; the API ships in a few weeks. If you’re still benchmarking Veo against Sora, you’re fighting the last war: Omni is a world model, not a video generator.
⛪ Pope Leo XIV released his first encyclical, “Magnifica Humanitas,” a 235-page treatise on AI, and presented it alongside Anthropic co-founder Chris Olah at the Vatican. The 42,000-word document calls for robust AI regulation, warns that control of the technology cannot remain in the hands “of a few,” and demands the most rigorous ethical constraints on military uses of AI. Olah welcomed the criticism at the release, saying external checks on AI labs are fundamental to the technology going well. This is the Catholic Church’s first religious doctrine of the AI era, and it chose Anthropic as its technical co-author.
⚖️ A federal jury threw out Elon Musk’s lawsuit against Sam Altman in under two hours. The Oakland jury found Musk waited too long to sue and unanimously rejected every claim, including the aiding-and-abetting count against Microsoft, ending the three-week trial without ever ruling on the merits of the breach-of-fiduciary-duty argument. Musk hit X within hours promising an appeal and calling the verdict “a calendar technicality.”
🧪 AI Research of the Week
Mathematical discovery at scale with AlphaProof Nexus
From Google DeepMind
Jake’s Take: DeepMind wrapped Gemini 3.1 Pro in a Lean-backed agent loop where the model proposes proofs, formally verifies them, and iterates on the failures, then turned it loose on the Erdős catalog and the On-Line Encyclopedia of Integer Sequences (OEIS). It closed 9 of 353 open Erdős problems and 44 of 492 open OEIS conjectures (two of which had been sitting unsolved for 56 years) at a few hundred dollars of inference each.
This result from Google lands the same week OpenAI announced that a separate, general-purpose reasoning model had disproved Erdős’s 1946 planar unit distance conjecture, with sign-off from Noga Alon and Thomas Bloom. That makes two of the three top labs producing publishable open-problem results within the same seven days, using opposite methods: formal verification at scale versus general reasoning at frontier capability.
People have long said that “AI can’t do real math,” but this appears to be changing with the next wave of frontier models (which, to be clear, are yet to be released). The interesting questions now are about throughput (how many open problems per dollar) and taste (which problems are worth pointing the system at). Mathematicians will have to decide if this is an acceptable new way to work.
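For the curious, here is roughly what a propose, verify, iterate loop looks like in code. This is a toy sketch, not DeepMind's system: `prove_with_retries`, `toy_propose`, and `toy_verify` are hypothetical names, the "model" is a hard-coded function, and the "verifier" checks simple arithmetic instead of calling Lean.

```python
from typing import Callable, Optional, Tuple

# A verdict is (accepted?, feedback message for the next attempt).
Verdict = Tuple[bool, str]


def prove_with_retries(
    propose: Callable[[Optional[str]], str],
    verify: Callable[[str], Verdict],
    max_attempts: int = 5,
) -> Tuple[Optional[str], int]:
    """Ask for a candidate, check it, and feed any failure back in."""
    feedback: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        candidate = propose(feedback)
        accepted, message = verify(candidate)
        if accepted:
            return candidate, attempt
        feedback = message  # the next proposal sees why this one failed
    return None, max_attempts


def toy_propose(feedback: Optional[str]) -> str:
    # Hypothetical stand-in for the model: wrong at first, right after feedback.
    return "2 + 2 = 5" if feedback is None else "2 + 2 = 4"


def toy_verify(candidate: str) -> Verdict:
    # Hypothetical stand-in for a formal checker: validates "a + b = c".
    left, right = candidate.split("=")
    accepted = sum(int(term) for term in left.split("+")) == int(right)
    return accepted, "" if accepted else f"rejected: {candidate}"


if __name__ == "__main__":
    print(prove_with_retries(toy_propose, toy_verify))
```

The design point is that the verifier, not the model, decides what counts as solved. A wrong proposal costs one more attempt instead of producing a false result.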
what to know for later
🧑‍💻 Cursor shipped Composer 2.5. Composer 2.5 is built on Moonshot’s open Kimi K2.5 checkpoint, scores 79.8% on SWE-Bench Multilingual and 63.2% on CursorBench v3.1, and matches Claude Opus 4.7 and GPT-5.5 on those benchmarks for roughly $0.50/M input and $2.50/M output. Cursor also announced a from-scratch model in training with SpaceXAI at 10x more compute.
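To put that pricing in per-request terms, here is a back-of-the-envelope calculator. The two rates are the approximate figures quoted above; the function name and the token counts in the example are made up for illustration.

```python
# Approximate rates quoted above, in USD per million tokens.
INPUT_USD_PER_MILLION = 0.50
OUTPUT_USD_PER_MILLION = 2.50


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its token counts."""
    input_cost = input_tokens / 1_000_000 * INPUT_USD_PER_MILLION
    output_cost = output_tokens / 1_000_000 * OUTPUT_USD_PER_MILLION
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical agentic coding task: 200k tokens read, 20k tokens written.
    print(f"${request_cost_usd(200_000, 20_000):.2f}")
```

At these rates, input tokens dominate the bill only when a task reads more than five times as many tokens as it writes, which is common for agents working over large codebases.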