Sam Altman claims humans take more energy than AI as Google releases Gemini 3.1 Pro

🎆 Tired of having to explain AI stuff to your coworkers? Share Handy AI with them so that they can get the most important AI news delivered weekly to their inbox (in addition to our high quality editorials).

last week’s top stories

Sam Altman says training a human takes more energy than training GPT-4. OpenAI’s CEO bizarrely reframed the AI energy debate by arguing that raising a human to useful intelligence requires roughly 20 years and hundreds of thousands of kilowatt-hours, while GPT-4 training consumed about 50,000 kWh as a one-time cost serving millions of queries. He dismissed viral claims about AI water consumption as “totally disconnected from reality” and framed rising energy needs as a reason to accelerate clean power buildout. It’s a deflection that conveniently skips over the fact that the industry isn’t training one model; it’s training thousands. Read more

🧠 Google ships Gemini 3.1 Pro with a focus on stronger reasoning. This is Google’s first incremental (0.1) model release, and it brings meaningful upgrades to complex reasoning, multimodal understanding, and code generation while keeping the 1M-token context window and up to 64K tokens of output. It’s rolling out across the Gemini app, Vertex AI, Gemini CLI, and Android Studio in preview. The shift to smaller, faster releases instead of only 0.5 jumps says a lot about how competitive the frontier model race has become. Read more

📉 Anthropic’s own study finds AI coding tools erode the skills developers need most. In a randomized controlled trial with 52 junior engineers, developers using AI assistance scored 17% lower on post-task knowledge assessments compared to those who coded manually. The biggest gap showed up in debugging, which is exactly the skill required to supervise AI-generated code. It’s a sobering result (from the company building the most popular coding assistant). Read more

👓 Apple is reportedly building three AI wearables (and none of them have screens). Per multiple reports, Apple is fast-tracking smart glasses (codenamed N50), an AirTag-sized AI pendant with dual cameras, and camera-equipped AirPods, all designed as Siri-powered iPhone companions. The smart glasses use high-res cameras for computer vision and environmental context rather than a built-in display, with production targeting late 2027. Apple’s bet is that the best AI hardware disappears into what you’re already wearing, and (unlike the Humane AI Pin) these devices lean on iPhone processing rather than trying to go standalone. Read more

🔌 Cursor opens a plugin marketplace. One-click plugin installs now bundle MCP servers, skills, subagents, rules, and hooks, with launch partners including Figma, Stripe, AWS, Linear, Vercel, and Cloudflare covering the full dev lifecycle. The same release introduced async subagents that can spawn sub-subagents for complex multi-file refactors. Cursor is making it clear that it wants to be the OS for AI-assisted development, not just a code editor. Read more

⚠️ Demis Hassabis warns of real AI risks and calls for “smart regulation” at the India summit. Google DeepMind’s CEO laid out two risk categories he considers genuinely serious: misuse of AI by bad actors and the technical challenge of maintaining human control over increasingly autonomous systems. He acknowledged that regulators are already struggling to keep pace with the speed of AI development, and that DeepMind alone can’t slow down the broader ecosystem even if it wanted to. The U.S. delegation at the same summit explicitly pushed back against any centralized global AI governance, underscoring just how fragmented the international regulatory picture remains. Read more

💰 Nvidia is closing in on a $30 billion equity stake in OpenAI’s historic funding round. OpenAI is raising over $100 billion at a valuation of roughly $730 billion to $850 billion, which would make it the most valuable private company in history. Nvidia’s investment replaces the $100 billion infrastructure deal from September that CEO Jensen Huang later admitted was “never a commitment.” Amazon, SoftBank, and Microsoft are also participating, and much of the capital will go right back to purchasing Nvidia chips for training infrastructure. Read more

🌍 Microsoft pledges $50 billion to close the growing AI divide between rich and poor nations. AI adoption in the Global North sits at 24.7% compared to 14.1% in the Global South, and the gap is widening. Microsoft’s commitment, announced at the India AI Impact Summit, funds data center infrastructure, multilingual AI, and digital skills training with a goal of reaching 250 million people in underserved communities. Whether this actually narrows the divide or just expands Microsoft’s footprint depends entirely on execution and how deeply they partner with local institutions. Read more

🔧 Meta signs a multiyear, multibillion-dollar chip deal with Nvidia spanning two hardware generations. The partnership covers current Blackwell GPUs and Grace CPUs plus next-gen Rubin and Vera chips for 2027, and it marks the first large-scale deployment of Nvidia’s Grace CPUs as standalone processors rather than GPU companions. Meta has earmarked up to $135 billion in AI capital expenditure for 2026 across 30 data centers. Read more


🧪 AI Research of the Week

Towards a Science of AI Agent Reliability
From Princeton University

Jake’s Take: This paper asks the question everyone building with AI agents should be asking (but mostly isn’t): why do agents keep failing in the real world when their benchmark scores look great?

The Princeton team evaluated 14 agentic models and found that while accuracy has improved steadily over the past 18 months, actual reliability has barely moved. They break reliability down into four dimensions that feel borrowed from how we think about aviation and nuclear safety: consistency (does it behave the same way twice?), robustness (does it break when conditions shift slightly?), predictability (does it know when it’s about to fail?), and safety (when it does fail, how bad is the damage?).

Their proposed 12 metrics and open dashboard (available now at hal.cs.princeton.edu/reliability) give the industry something concrete to measure against. Honestly, if you’re shipping agents to production without thinking about this stuff, this paper is required reading.
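
To make the accuracy-versus-reliability gap concrete, here is a minimal Python sketch. It is not one of the paper’s 12 metrics. The function names, task names, and run data are all made up for illustration. It compares per-run accuracy with a stricter question: on how many tasks does the agent succeed every single time it is run?

```python
def per_run_accuracy(runs, expected):
    """Fraction of all individual runs whose output matches the expected answer."""
    total = sum(len(outputs) for outputs in runs.values())
    correct = sum(
        output == expected[task]
        for task, outputs in runs.items()
        for output in outputs
    )
    return correct / total


def all_runs_pass_rate(runs, expected):
    """Fraction of tasks where every repeated run matches the expected answer."""
    passed = sum(
        all(output == expected[task] for output in outputs)
        for task, outputs in runs.items()
    )
    return passed / len(runs)


# Hypothetical data: three tasks, four repeated runs each.
runs = {
    "book_flight": ["ok", "ok", "ok", "ok"],
    "refund_order": ["ok", "fail", "ok", "fail"],
    "update_record": ["ok", "ok", "fail", "ok"],
}
expected = {task: "ok" for task in runs}

print(f"per-run accuracy:   {per_run_accuracy(runs, expected):.2f}")
print(f"all-runs pass rate: {all_runs_pass_rate(runs, expected):.2f}")
```

With this toy data, 9 of 12 runs succeed (75% accuracy), but only one of the three tasks succeeds on every run. That is the kind of gap a single benchmark score can hide.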


and then, even more news…

🔍 EU regulators open a “large-scale” GDPR probe into X over Grok’s sexualized image generation. Ireland’s Data Protection Commission is investigating after Grok reportedly generated roughly 3 million sexualized images between late December and early January, with an estimated 23,000 depicting minors. X restricted image generation to paid subscribers and added filters, but regulators aren’t satisfied. French police have already raided X’s Paris offices, UK regulators launched their own investigation, and the EU can fine up to 6% of global annual turnover. Read more
