ChatGPT adds an agent mode that can perform tasks for you

last week’s top stories

🤖 ChatGPT rolls out new agent mode for autonomous task completion. OpenAI introduced a ChatGPT “agent” feature that can take actions online and perform multi-step tasks, not just chat. The agent mode lets ChatGPT navigate websites, run code, log in with user permission, and generate outputs like spreadsheets or slides autonomously. Read more

💬 Musk’s xAI launches controversial Grok AI companions. Elon Musk’s AI company xAI added new personal “companions” to its Grok chatbot app, including an anime-inspired virtual girlfriend and a misanthropic panda character. The Grok companions are AI avatars with distinct personalities – one (“Ani”) engages in flirtatious, NSFW roleplay, while another (“Bad Rudy”) encourages violent fantasies. Read more

🍿 Netflix uses AI visual effects for first time to cut costs. Netflix revealed it employed generative AI to create a VFX scene in its new sci-fi show El Eternauta, marking the first use of AI in a Netflix production. Co-CEO Ted Sarandos said an AI tool helped render a building collapse sequence 10× faster than traditional methods, enabling blockbuster-quality effects on a TV budget. Netflix insists the tech “helped creators make the series better, not just cheaper,” but the move comes amid Hollywood’s debate over AI’s impact on jobs in VFX and filmmaking. Read more

🎬 Runway debuts Act-Two, a next-gen AI motion capture model. AI video startup Runway launched Act-Two, an upgraded motion-capture AI that can animate a character using just a single video of a human performance. Act-Two improves greatly on Runway’s earlier model; it now supports detailed head, face, body and hand tracking, meaning an actor’s subtle expressions and gestures can be transferred to any 3D character or style with high fidelity. The aim is to reduce reliance on expensive mocap suits and VFX work. Read more

🥇 OpenAI’s model wins gold at Int’l Math Olympiad. An experimental OpenAI reasoning model achieved “gold medal” performance on the 2025 International Math Olympiad, solving 5 of 6 extremely difficult problems under official contest conditions. OpenAI’s team ran the model in two 4.5‑hour sessions without external tools, and it scored 35/42 points (a result on par with the top human competitors). Read more

🦄 Vibe coding startup Lovable joins the unicorn club. Fast-growing Swedish AI startup Lovable (which lets people build apps via chatting) raised a $200 million Series A led by Accel at a $1.8 billion valuation. Only eight months old, Lovable has 2.3 million users (180k paying) and reached $75 million ARR by helping non-coders prototype web and mobile apps via AI. Read more

🧪 AI-powered “self-driving” lab discovers materials 10× faster. Scientists at NC State built an autonomous chemistry lab that runs experiments continuously via AI, accelerating data collection by an order of magnitude. Unlike traditional labs that run one trial at a time, this system varies chemical reactions in real time and streams sensor data every half-second. Read more

👑 Moonshot AI’s Kimi K2 model claims open-source crown. Beijing-based startup Moonshot AI released Kimi K2, a 1-trillion-parameter open-source language model that’s topping benchmarks and even rivaling closed models. K2 demonstrated exceptional coding abilities (65.8% on a GitHub bug-fix test, beating GPT-4.1) and strong reasoning, and Moonshot released its weights on Hugging Face for anyone to use. Read more

🧪 OpenAI tests “o3 alpha” model with next-gen coding skills. A mysterious new “o3‑alpha” model appeared in OpenAI’s ChatGPT service, indicating an upcoming upgrade focused on coding and reasoning. The alpha model (spotted as “o3-alpha-responses-2025-07-17”) outperformed the current “o3” model on tasks like web design and even building simple web games from scratch. Read more


🧪 AI Research of the Week

Scaling text-rich image understanding via code-guided synthetic multimodal data generation
From University of Pennsylvania & Allen Institute for AI

Jake's Take: CoSyn is an interesting take on a generative image model. It writes its own code that draws charts, tables, diagrams, even nutrition labels, then auto-generates quiz-style instructions about every picture.

This code → image → question loop produced 400k images and 2.7 million Q-A pairs. A compact 7-billion-parameter vision-language model trained only on that synthetic set then outperformed all open-source peers and even edged out closed giants like GPT-4V and Gemini-Flash across seven text-heavy visual benchmarks. Adding just 7k more synthetic nutrition-label examples let the same model leapfrog most rivals on a brand-new NutritionQA test set.

The takeaway: this study is strong evidence that synthetic instruction datasets can help scale a robust model quickly, which raises the question of whether hand-labeled instruction sets are on the way out.
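
For the curious, here is a toy sketch of what a code → image → question loop like the one described above might look like. It is not CoSyn's actual pipeline: the real system has a language model write the rendering code, while this sketch swaps in a fixed, hypothetical template (`CHART_TEMPLATE`) and draws a bare-bones SVG bar chart. Every function name here is made up for illustration. The point it shows is that because the underlying data is known before the picture is drawn, the questions and answers come for free.

```python
import random

# Hypothetical template standing in for code a language model would write.
# It is plain Python source that builds a tiny SVG bar chart from `data`.
CHART_TEMPLATE = '''
data = __DATA__
bars = []
for i, (label, value) in enumerate(data.items()):
    bars.append(f'<rect x="{i * 40}" y="{100 - value}" width="30" height="{value}"/>')
    bars.append(f'<text x="{i * 40}" y="115">{label}</text>')
svg = '<svg xmlns="http://www.w3.org/2000/svg">' + "".join(bars) + "</svg>"
'''


def write_chart_code(data):
    """Step 1 (code): produce rendering source for one chart (assumed stand-in for the LLM)."""
    return CHART_TEMPLATE.replace("__DATA__", repr(data))


def render(code):
    """Step 2 (image): run the generated source and return the SVG it builds."""
    namespace = {}
    exec(code, namespace)  # acceptable here only because we wrote the code ourselves
    return namespace["svg"]


def make_qa(data):
    """Step 3 (question): derive Q-A pairs from the data behind the image."""
    top = max(data, key=data.get)
    pairs = [("Which category has the highest value?", top)]
    for label, value in data.items():
        pairs.append((f"What is the value of {label}?", str(value)))
    return pairs


def synthesize(n, seed=0):
    """Run the loop n times and collect (code, image, qa) records."""
    rng = random.Random(seed)
    examples = []
    for _ in range(n):
        data = {label: rng.randint(10, 90) for label in "ABCD"}
        code = write_chart_code(data)
        examples.append({"code": code, "image": render(code), "qa": make_qa(data)})
    return examples
```

Calling `synthesize(3)` returns three records, each holding the generated source, the rendered SVG string, and five question-answer pairs. A real pipeline would rasterize the image and use a model to vary both the code and the questions.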


and then, even more news…

📈 TSMC hits $1 trillion valuation on surging AI chip demand. Taiwan’s chipmaking giant TSMC saw its market cap top $1 trillion last week (a first for an Asian firm) amid booming orders for AI processors. Its stock has jumped ~50% since April as TSMC raised its 2025 revenue growth forecast to ~30%, signaling optimism that exploding AI model demand will fuel its advanced 3nm and 2nm chip business. Read more

🔮 GPT-5 rumors swirl after Alpha test leaks. The AI community spotted hints that GPT-5 is in internal testing; a commit on a public biosecurity repo referenced “GPT-5-reasoning-alpha-2025-07-13”, suggesting OpenAI’s next model is being used for high-stakes research already. Additionally, a new model called “o3 Alpha” popped up on OpenAI’s web arena, showing unprecedented coding prowess (like one-shot generating complex web apps) and raising speculation it may be part of GPT-5’s development. Read more

🚫 Meta refuses to sign the EU’s voluntary AI code of practice. Meta (Facebook’s parent) said it won’t sign the EU’s new voluntary AI guidelines, breaking with peers like Microsoft. The EU’s code asks AI firms to document training data, respect copyright, and audit AI systems ahead of the upcoming AI Act. Meta’s policy chief blasted it as “overreach” creating legal uncertainties and warned it would “throttle development of frontier models in Europe”. Read more

🤝 Cognition snags Windsurf after Google’s $2.4B talent grab. The AI startup Cognition agreed to acquire what’s left of Windsurf, a coding assistant platform, days after Google swooped in and hired away Windsurf’s top team. OpenAI had tried to buy Windsurf for $3 billion, but that deal collapsed over partner conflicts; Google then paid $2.4 billion for a non-exclusive license to Windsurf’s tech and to bring its CEO and engineers into DeepMind. Cognition will now pick up Windsurf’s remaining product assets and customers, showing how coveted AI “agentic IDE” tools have become in Big Tech’s talent war. Read more

🚇 China’s delivery robots ride the subway to restock stores. In Shenzhen, 7-Eleven has deployed a fleet of cute autonomous robots that hop on the metro trains to deliver goods. About 41 of these four-wheeled robots, made by Vanke, navigate city streets, elevators, and subway lines to bring snacks and inventory to over 100 convenience stores, even during off-peak transit hours. Read more

🪱 Harvard builds a creepy “wormbot” swarm that tangles together. Inspired by slimy California blackworms, Harvard engineers unveiled foot-long soft robots that twist and knot themselves into blobs to move collectively. Each worm-like robot has an air bladder that, when inflated, makes it curl; multiple bots then intertwine, forming an entangled mass that can crawl as a unit on land or even in water. This wriggling robot swarm can squeeze through gaps and climb obstacles by acting as a physical team, offering a nightmare-fuel yet innovative approach to search-and-rescue robotics (the project even won a Best Paper award at ICRA 2025). Read more

🚀 Stealth AI startup Thinking Machines Lab nears first product launch. Thinking Machines Lab (founded by ex-OpenAI CTO Mira Murati) has raised a staggering $2 billion seed round at a $12 billion valuation without a product on the market. Murati announced that in the next couple of months TML will finally reveal its first AI product, which she says will be a multimodal “AI that works how you naturally interact” and will include a major open-source component for developers. Read more

🏗 Meta to build massive AI superclusters for “future AI”. Mark Zuckerberg announced Meta will invest hundreds of billions of dollars to build several AI supercomputing data centers as part of its push toward “superintelligent” AI. The first mega–data center, nicknamed “Prometheus”, comes online in 2026 and will be Meta’s first multi-gigawatt-scale AI cluster, with a second (“Hyperion”) planned to scale up to 5 GW. Zuckerberg said just one of these titan complexes will span a footprint comparable to Manhattan. Read more

🛡 Pentagon awards $200M AI contracts to OpenAI and xAI. The U.S. Defense Department struck deals (worth up to $200 million each) with OpenAI, xAI and others to prototype advanced AI systems for national security. Under the initiative, the DoD will collaborate with leading AI labs to develop “agentic AI” workflows. Notably, Elon Musk’s xAI also rolled out a “Grok for Government” offering, making its latest Grok 4 model available to federal agencies and defense users. Read more

🚀 SpaceX reportedly to invest $2 billion in Musk’s xAI. Elon Musk may be using SpaceX’s cash to fuel his AI ambitions; according to a WSJ report, SpaceX has agreed to put $2 billion into xAI as part of a $5B fundraising round for the startup. The deal (not officially confirmed yet) would value xAI, which recently merged with Musk’s social app X/Twitter, at around $113 billion. Read more