🤖 AI Agents Weekly: Gemini 3 Flash, GPT Image 1.5, Mistral OCR 3, GPT-5.2-Codex, NVIDIA Nemotron 3, Budget-aware Agent Scaling
In today’s issue:
Gemini 3 Flash matches GPT-5.2, 3x faster
Mistral OCR 3 extracts from any document
GPT Image 1.5 edits 4x faster
Physics law discovered in LLM agents
GPT-5.2-Codex achieves SOTA coding
Grok Voice Agent API launches
NVIDIA Nemotron 3 open models released
Claude Organization Skills launches
Meta releases SAM Audio
Budget-aware agent scaling study released
And all the top AI dev news, papers, and tools.
Top Stories
Gemini 3 Flash
Google released Gemini 3 Flash, a speed-optimized frontier model that matches Gemini 3 Pro and GPT-5.2 performance while being 3x faster. It’s now the default model in the Gemini app and powers AI Mode in Google Search.
Benchmark performance: Scores 90.4% on GPQA Diamond (PhD-level reasoning), 81.2% on MMMU-Pro (multimodal reasoning), and 33.7% on Humanity’s Last Exam without tools. Outperforms Gemini 3 Pro on SWE-bench Verified for coding tasks.
Efficiency gains: Uses 30% fewer tokens on average than Gemini 2.5 Pro for thinking tasks. Pricing is $0.50 per million input tokens and $3.00 per million output tokens.
Multimodal capabilities: Accepts text, images, video, audio, and PDFs with up to 1 million input tokens. Can watch videos, analyze images, and process audio to generate content.
Developer access: Available via Gemini API, Google AI Studio, Antigravity, Gemini CLI, Android Studio, and Vertex AI. Already adopted by JetBrains, Figma, Cursor, Harvey, and Latitude.
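To make the pricing above concrete, here is a minimal Python sketch that estimates the cost of a single request from the two per-token rates quoted in this issue. The function name and the example token counts are illustrative assumptions, not part of any Google SDK, and the sketch ignores any other pricing rules (caching, tiering, modality-specific rates) that the source does not mention.

```python
from decimal import Decimal

# Rates quoted in this issue, in USD per one million tokens.
INPUT_USD_PER_MILLION = Decimal("0.50")
OUTPUT_USD_PER_MILLION = Decimal("3.00")
TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD cost of one request (hypothetical helper).

    Assumes a flat per-token rate for input and for output; any other
    billing rule is outside what the source states.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = Decimal(input_tokens) * INPUT_USD_PER_MILLION
    output_cost = Decimal(output_tokens) * OUTPUT_USD_PER_MILLION
    return (input_cost + output_cost) / TOKENS_PER_MILLION


if __name__ == "__main__":
    # Illustrative request: a 200,000-token prompt and a 10,000-token reply.
    print(f"${estimate_cost_usd(200_000, 10_000)}")
```

Under these assumptions, output tokens cost six times as much as input tokens, so long generated responses dominate the bill even when the prompt is large.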
