OpenAI's blitz week — o3-mini, Deep Research, and a National Lab partnership

February 3, 2025

what to know for now

🧠 OpenAI rolls out o3-mini. OpenAI has released o3-mini, a smaller but highly capable AI model, to all Plus, Team, and Pro ChatGPT users, with limited access for free users. The model excels at breaking down complex problems for better reasoning. OpenAI has also been recruiting PhD students to enhance model capabilities, focusing on scientific coding challenges. Read more

🎵 AI-assisted Beatles track wins Grammy. The Beatles’ 2023 song Now and Then won Best Rock Performance at the Grammys, making it the first AI-assisted track to earn the award. While some criticized its AI involvement, McCartney clarified that no synthetic elements were added; AI was used only to restore existing recordings. Read more

🕵️ OpenAI launches Deep Research mode. OpenAI's new "deep research" feature provides responses in 5–30 minutes. The agent, available to Pro users, aims to function at a research analyst level. The model powering deep research scored 26.6% on the "Humanity’s Last Exam" benchmark, significantly outperforming GPT-4o’s 3.3%. Read more

🎼 Riffusion launches free AI music tool. San Francisco-based Riffusion released a web platform powered by its AI model, Fuzz, which generates music from text, audio, or images. Unlike Suno, which focuses on structured song generation with vocals, Riffusion emphasizes real-time personalization and user-specific learning. Read more

🧪 AI Research of the Week

Humanity's Last Exam
From Center for AI Safety, Scale AI
Jake’s Take: The paper introduces Humanity’s Last Exam (HLE), a rigorous multi-modal benchmark designed to measure the capabilities of models at the limits of human knowledge. Existing benchmarks have lost their usefulness because modern models now achieve near-perfect accuracy on them. HLE consists of 3,000 questions spanning diverse disciplines, developed by experts to be unambiguous, non-trivial, and resistant to simple lookup. The paper's evaluations show that the state-of-the-art AI models it tested perform poorly on HLE, with accuracy below 10% and high calibration errors.
This benchmark should help us see past the illusion of AI progress in academic reasoning, and it forces a reckoning: models have still not reached expert-level knowledge, and evaluation metrics must evolve faster than model capabilities.
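
For readers wondering what "calibration error" means here: it measures the gap between how confident a model says it is and how often it is actually right. Below is a minimal Python sketch of one common variant, a binned RMS calibration error. The function name, the bin count, and the toy data are illustrative assumptions, not the paper's evaluation code, and the paper's exact metric may differ in its details.

```python
from math import sqrt


def rms_calibration_error(confidences, correct, n_bins=10):
    """Binned RMS calibration error (illustrative sketch, not the paper's code).

    confidences: model-stated confidence per answer, each in [0, 1]
    correct:     booleans, True where the answer was right
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    total = len(confidences)
    if total == 0:
        return 0.0

    # Group answers into equal-width confidence bins.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[idx].append((conf, ok))

    # Weighted mean of squared gaps between stated confidence and actual accuracy.
    squared_gap = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        squared_gap += (len(members) / total) * (avg_conf - accuracy) ** 2
    return sqrt(squared_gap)


# Toy data: a model that claims 90% confidence on ten answers but gets one right.
overconfident = rms_calibration_error([0.9] * 10, [True] + [False] * 9)
print(round(overconfident, 2))
```

A model that is confidently wrong, as in the toy data, ends up with a large error, which is the pattern the paper reports: low accuracy paired with high stated confidence.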

what to know for later

🔬 OpenAI partners with U.S. labs. OpenAI is collaborating with national laboratories to enhance scientific research and national security using its AI models. Around 15,000 scientists will access these models via the Venado supercomputer, improving research in physics, cybersecurity, and energy. Oversight measures ensure responsible AI use while supporting U.S. innovation efforts. Read more

📜 US Copyright Office limits AI copyright. The US Copyright Office (USCO) has ruled that fully AI-generated content is ineligible for copyright protection. Works incorporating substantial human creativity, however, may qualify. The USCO sees no need for new legislation, stating that existing copyright law is sufficient. Read more

🧩 OpenAI accuses DeepSeek of model distillation. OpenAI claims Chinese AI startup DeepSeek may have inappropriately trained its models on OpenAI outputs, violating its terms of service. DeepSeek's R1 model rivals OpenAI’s o1 at a fraction of the cost, raising concerns about unauthorized knowledge transfer. OpenAI is investigating and working with the U.S. government to protect its technology. Read more

🤖 OpenAI launches ChatGPT Gov for U.S. government. ChatGPT Gov, a security-enhanced version of ChatGPT Enterprise designed for U.S. government use, operates within Microsoft's Azure cloud environments, allowing agencies to process sensitive data securely. OpenAI reports over 90,000 government employees have already used ChatGPT for tasks like policy drafting and document translation. Read more