OpenAI and Mistral focus on making AI smaller and cheaper

what to know for now
💡 OpenAI introduces affordable GPT-4o Mini for developers. GPT-4o Mini, cheaper and more capable than GPT-3.5 Turbo, aims to broaden AI accessibility. It supports text and vision inputs today, with audio and video support planned, and scores 82% on the MMLU benchmark. It is available to ChatGPT users and to developers through the API.
🚀 Mistral launches faster, efficient LLM for code generation. Codestral Mamba offers linear-time inference and can, in theory, model sequences of unbounded length. On benchmarks like HumanEval, it outperforms models like CodeLlama 7B and CodeGemma-1.1 7B. The larger Codestral 22B is a separate model, available under a commercial license.
🤝 Mistral and NVIDIA release customizable Mistral NeMo 12B model. Mistral NeMo, with 12 billion parameters, is built for chatbots, multilingual tasks, coding, and summarization. It features a 128K-token context window and is optimized for deployment on NVIDIA hardware.
🛠️ Groq's open-source models surpass AI giants in specialized tasks. Llama-3-Groq-70B-Tool-Use tops the Berkeley Function Calling Leaderboard, outperforming models from OpenAI, Google, and Anthropic. Trained on ethically generated synthetic data, it achieves 90.76% accuracy, a result that supports privacy-friendly AI development.
🧪 AI Research of the Week
Prover-Verifier Games improve legibility of LLM outputs
From OpenAI
Jake’s Take: This research introduces a method to improve the legibility of outputs from large language models (LLMs) by using a Prover-Verifier Game. The technique involves training helpful “provers” to produce correct and clear solutions, and sneaky provers to generate convincing but incorrect solutions, which helps verifiers become robust in discerning correctness. The study demonstrates that this training method enhances both the accuracy and legibility of LLM-generated solutions, particularly in solving grade-school math problems.
Implementing these kinds of “games” in LLM training could significantly enhance the transparency and trustworthiness of AI outputs, facilitating their adoption in high-stakes applications where clear and reliable reasoning is crucial.
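For readers who want a feel for the incentives in such a game, here is a minimal toy sketch in Python. It is not OpenAI's implementation: the `Sample` class, the role labels, and both scoring functions are hypothetical simplifications, assuming only that a prover is rewarded for convincing the verifier when it acts according to its role, and that the verifier is penalized for misjudging correctness.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    # All fields are illustrative assumptions, not names from the paper.
    role: str              # "helpful" or "sneaky"
    is_correct: bool       # does the solution's answer match ground truth?
    verifier_score: float  # verifier's estimate in [0, 1] that it is correct


def prover_reward(sample: Sample) -> float:
    """Toy reward: a prover earns the verifier's score only when role-aligned.

    A helpful prover is aligned when its solution is correct; a sneaky
    prover is aligned when its solution is incorrect.
    """
    if sample.role == "helpful":
        aligned = sample.is_correct
    else:
        aligned = not sample.is_correct
    return sample.verifier_score if aligned else 0.0


def verifier_loss(sample: Sample) -> float:
    """Toy loss: squared error between the verifier's score and the truth."""
    target = 1.0 if sample.is_correct else 0.0
    return (sample.verifier_score - target) ** 2


if __name__ == "__main__":
    fooled = Sample(role="sneaky", is_correct=False, verifier_score=0.8)
    print(prover_reward(fooled))   # sneaky prover is rewarded for fooling the verifier
    print(verifier_loss(fooled))   # verifier is penalized for being fooled
```

In this simplified picture, the sneaky prover's gains are exactly the cases that cost the verifier the most, which is the pressure that pushes the verifier toward robustness and the helpful prover toward solutions that are easy to check.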
what to know for later
🧠 OpenAI's Strawberry project aims to enhance AI reasoning capabilities. Strawberry seeks to enable autonomous internet navigation and deep research, a major step beyond current AI abilities. It relies on post-training methods similar to Stanford's STaR and on a specialized dataset, which could improve performance on long-horizon tasks.
🗽 Trump allies plan AI "Manhattan Projects" to advance military tech. Proposals include creating industry-led agencies to protect AI models and repealing Biden's AI regulations. The plans align with "Make America First in AI," echoing Trump's 2016 tech leadership commitment.
🎓 Andrej Karpathy launches AI-focused education platform Eureka Labs. Eureka Labs aims to integrate AI teaching assistants into course materials, starting with the AI course LLM101n. The AI assistants are meant to support human teachers in guiding students, not replace them.
🎤 YouTube Music debuts sound search, tests AI conversational radio. Sound search lets users find songs by singing, humming, or playing music. Rolling out on Android and iOS, it searches a catalog of 100 million songs. AI conversational radio creates custom stations based on user descriptions.