AI Newsletter

AI Newsletter

Share this post

AI Newsletter
AI Newsletter
⚡AI Agents Weekly: o3, Gemini 2.0 Flash Thinking, Multi AI Agents in Production, Generative Physics Engine
Copy link
Facebook
Email
Notes
More

⚡AI Agents Weekly: o3, Gemini 2.0 Flash Thinking, Multi AI Agents in Production, Generative Physics Engine

o3, Gemini 2.0 Flash Thinking, Multi AI Agents in Production, Generative Physics Engine

elvis's avatar
elvis
Dec 21, 2024
∙ Paid
15

Share this post

AI Newsletter
AI Newsletter
⚡AI Agents Weekly: o3, Gemini 2.0 Flash Thinking, Multi AI Agents in Production, Generative Physics Engine
Copy link
Facebook
Email
Notes
More
1
Share

In today’s issue:

  • OpenAI announced o3

  • Google announces Gemini 2.0 Flash Thinking

  • A generative and universal physics engine for robotics and AI applications

  • CrewAI Report on Multi AI Agents in Production

  • A new benchmark for evaluating AI agents on real-world professional tasks

  • LLM alignment faking, AI agents for feedback and discovery, and more.



Top Stories

OpenAI Announces o3

OpenAI announces a new frontier model called o3. Here's a summary of the main takeaways from the announcement:

  • OpenAI announces two new models o3 and o3-mini.

  • These models are not publicly available yet. Sam Altman, CEO of OpenAI, mentioned that they plan to launch o3-mini around the end of January and the full o3 model shortly after that. A safety program application is available for those who want to test the models for safety.

  • o3 is significantly better at coding. At the most aggressive high test-time compute setting, o3 achieves an Elo score of 2727 on the Codeforces.

  • o3 achieves 96.7% accuracy on AIME 2024, and 87.7% on GPQA Diamond. Claims state-of-the-art performances on both these benchmarks.

  • o3 gets over 25% on what OpenAI researchers consider one of the toughest benchmarks out there (EpochAI Frontier Math). The previous SoTA was 2.0%.

  • o3 achieves SoTA on ARC-AGI. When o3 is asked to think longer with high-compute, it scores 87.5%.

  • o3-mini will support reasoning with adaptive thinking time with three different modes (low, medium, and high). o3-mini performance on Codeforces with significant cost-efficiency compared to o1.

  • o3-mini (low) achieves comparable latency to gpt-4o on math benchmarks.

  • o3-mini will come with all the API features as o1.

Official Announcement

Keep reading with a 7-day free trial

Subscribe to AI Newsletter to keep reading this post and get 7 days of free access to the full post archives.

Already a paid subscriber? Sign in
© 2025 elvis
Privacy ∙ Terms ∙ Collection notice
Start writingGet the app
Substack is the home for great culture

Share

Copy link
Facebook
Email
Notes
More