OpenAI's blitz week: o3-mini, Deep Research, and a National Lab partnership

what to know for now

🧠 OpenAI rolls out o3-mini. OpenAI has released o3-mini, a smaller but highly capable AI model, to all Plus, Team, and Pro ChatGPT users, with limited access for free users. The model excels at breaking down complex problems for better reasoning. OpenAI has also been recruiting PhD students to enhance model capabilities, focusing on scientific coding challenges.

🎵 AI-assisted Beatles track wins Grammy. The Beatles’ 2023 song Now and Then won Best Rock Performance at the Grammys, making it the first AI-assisted track to earn the award. While some criticized its AI involvement, McCartney clarified that no synthetic elements were added; only existing recordings were restored.

🕵️ OpenAI launches Deep Research mode. OpenAI's new "deep research" feature provides responses in 5–30 minutes. The agent, available to Pro users, aims to function at a research analyst level. The model powering deep research scored 26.6% on the "Humanity’s Last Exam" benchmark, significantly outperforming GPT-4o’s 3.3%.

🎼 Riffusion launches free AI music tool. San Francisco-based Riffusion released a web platform powered by its AI model, Fuzz, which generates music from text, audio, or images. Unlike Suno, which focuses on structured song generation with vocals, Riffusion emphasizes real-time personalization and user-specific learning.

🧪 AI Research of the Week

Humanity's Last Exam
From Center for AI Safety, Scale AI

Jake’s Take: The paper introduces Humanity’s Last Exam (HLE), a rigorous multi-modal benchmark designed to measure model capabilities at the limits of human knowledge. Existing benchmarks have lost their usefulness because modern models achieve near-perfect accuracy on them. HLE consists of 3,000 questions spanning diverse disciplines, developed by experts to be unambiguous, non-trivial, and resistant to simple lookup. Evaluations in the paper show that state-of-the-art AI models perform poorly on HLE, with accuracy below 10% and high calibration errors.

This benchmark should help readers see past the illusion of AI progress in academic reasoning, and it forces a reckoning: models have still not reached expert-level knowledge, and evaluation metrics must evolve faster than model capabilities.

what to know for later

🔬 OpenAI partners with U.S. labs. OpenAI is collaborating with national laboratories to enhance scientific research and national security using its AI models. Around 15,000 scientists will access these models via the Venado supercomputer, improving research in physics, cybersecurity, and energy. Oversight measures ensure responsible AI use while supporting U.S. innovation efforts.

📜 US Copyright Office limits AI copyright. The US Copyright Office (USCO) has ruled that fully AI-generated content is ineligible for copyright protection. Works incorporating substantial human creativity, however, may qualify. The USCO sees no need for new legislation, stating that existing copyright law is sufficient.

🧩 OpenAI accuses DeepSeek of model distillation. OpenAI claims Chinese AI startup DeepSeek may have inappropriately trained its models on OpenAI outputs, violating its terms of service. DeepSeek's R1 model rivals OpenAI’s o1 at a fraction of the cost, raising concerns about unauthorized knowledge transfer. OpenAI is investigating and working with the U.S. government to protect its technology.

🤖 OpenAI launches ChatGPT Gov for U.S. government. ChatGPT Gov, a security-enhanced version of ChatGPT Enterprise designed for U.S. government use, operates within Microsoft's Azure cloud environments, allowing agencies to process sensitive data securely. OpenAI reports over 90,000 government employees have already used ChatGPT for tasks like policy drafting and document translation.