ChatGPT's eyes arrive soon, new models from Mistral and DeepSeek

November 22, 2024

what to know for now

🌐 Mistral launches Pixtral Large, rivals ChatGPT features. Pixtral Large, a 124B-parameter multimodal model, advances document, image, and text understanding and is now integrated into Mistral's free chatbot, Le Chat. Le Chat's new features include web search, image generation, and automation tools. Read more

🧠 DeepSeek unveils advanced reasoning model R1-Lite-Preview. DeepSeek reports that R1-Lite-Preview outperforms OpenAI's o1-preview on reasoning benchmarks such as AIME and MATH, showcasing advanced "chain-of-thought" capabilities. The model is accessible via DeepSeek Chat, and open-source releases are planned. Read more

📚 OpenAI's ChatGPT guide sparks skepticism. OpenAI launched a free course to help K-12 teachers integrate ChatGPT into classrooms, emphasizing lesson planning and AI literacy. Educators question privacy, ethics, and AI's educational value, citing contradictory guidance. Read more

🎨 FLUX.1 enhances AI image editing tools. Black Forest Labs unveiled FLUX.1 Tools, offering inpainting, outpainting, and structural guidance for text-to-image workflows. Open-source and professional versions cater to developers and enterprises. Read more

🧪 AI Research of the Week

Adding Error Bars to Evals: A Statistical Approach to Language Model Evaluations
From Anthropic
Jake’s Take: This paper introduces a statistical framework for evaluating language models, emphasizing rigorous experimental design and analysis methods. It critiques current practices for over-reliance on simplistic metrics like single-point state-of-the-art scores and advocates for the inclusion of statistical measures such as confidence intervals and error bars. Anthropic proposes techniques to minimize noise and enhance the reliability of model comparisons, including paired analysis, clustered standard errors, and variance reduction strategies like resampling and next-token probabilities.
The paper may encourage the AI industry to rethink evaluation metrics, replacing shallow benchmarks with statistically robust methodologies that could dismantle superficial claims of superiority (and maybe prevent the innumerable tweets claiming dominance).
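
To make the paired-analysis idea concrete, here is a minimal Python sketch, not the paper's own code. It assumes two models were scored on the same questions and uses made-up per-question results; the function names `mean_with_ci` and `paired_difference` are hypothetical, and the 1.96 multiplier assumes a normal approximation that is crude at this tiny sample size.

```python
import math
from statistics import mean, stdev


def mean_with_ci(scores, z=1.96):
    """Return (mean, standard error, (low, high)) for per-question scores.

    Uses the sample standard deviation and a normal approximation,
    which is only a rough guide when there are few questions.
    """
    n = len(scores)
    m = mean(scores)
    se = stdev(scores) / math.sqrt(n)
    return m, se, (m - z * se, m + z * se)


def paired_difference(scores_a, scores_b, z=1.96):
    """Compare two models question by question rather than by headline score."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both models must be scored on the same questions")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    return mean_with_ci(diffs, z)


# Hypothetical data: 1 = correct, 0 = wrong, same 10 questions for both models.
model_a = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0]
model_b = [1, 0, 0, 1, 1, 0, 1, 0, 1, 0]

diff_mean, diff_se, (low, high) = paired_difference(model_a, model_b)
print(f"A - B = {diff_mean:.2f}, SE = {diff_se:.3f}, 95% CI = ({low:.2f}, {high:.2f})")
```

In this toy data, model A leads by 20 points, yet the interval on the paired difference still includes zero, so the headline gap alone would not justify a claim of superiority.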

what to know for later

📸 Live Camera coming to ChatGPT soon. OpenAI's latest beta hints at "Live Camera" integration for ChatGPT, featuring real-time video analysis and visual recognition. This expands Advanced Voice Mode, enabling dynamic visual interactions like object identification and landmark details. Read more

📈 Amazon injects $4B into Anthropic growth. Amazon's total investment in AI startup Anthropic reaches $8 billion, reinforcing its position as a minority investor. AWS will now serve as Anthropic's primary cloud and AI training partner, leveraging AWS Trainium and Inferentia chips. Read more

🖥️ OpenAI explores browser market disruption. OpenAI is considering a web browser that integrates ChatGPT and has explored deals with companies like Condé Nast and Priceline to power search features. This could challenge Google's dominance, already under DOJ scrutiny over Chrome. Read more

🧬 Evo redefines DNA interpretation and design. Arc Institute’s Evo, a biological foundation model trained on DNA, predicts and designs DNA sequences more than one million bases long. Published in Science, it opens up new possibilities for genome design and engineering. Read more