🤖 AI Agents Weekly: Microsoft's Seven MAI Models, Gemma 4 12B, NVIDIA Nemotron 3 Ultra, Agents' Last Exam, Devin Desktop, and More

Jun 06, 2026

In today’s issue:

Microsoft ships seven new MAI models
MAI-Thinking-1 takes on Claude Sonnet
Gemma 4 12B runs agents on a laptop
NVIDIA opens 550B Nemotron 3 Ultra
Anthropic warns of recursive self-improvement
Agents’ Last Exam stumps frontier agents
Claude Platform gets an ant CLI
Cognition launches Devin Desktop
Nous ships Hermes Desktop
Codex builds iOS apps end-to-end
ChatGPT memory learns to dream
Multi-agent computer use beats solo CUAs
Economy of Minds prices agent actions
LEAP solves all 12 Putnam problems
A harness rewrites itself for +19 SWE points

And all the top AI dev news, papers, and tools.

Top Stories

Microsoft Launches Seven In-House MAI Models

Microsoft AI unveiled a family of seven models trained from scratch, led by MAI-Thinking-1, its first reasoning model, in a bid for long-term independence from OpenAI.

MAI-Thinking-1: A 35B reasoning model that scores 97% on AIME and 53% on SWE-Bench Pro, with early testers preferring it side-by-side over Claude Sonnet 4.6 on overall quality.
A full stack: The launch also ships MAI-Image-2.5 and Flash, MAI-Transcribe-1.5, MAI-Voice-2 and Flash, and MAI-Code-1-Flash for code generation.
Clean training: Every model was trained on commercially licensed data with no distillation from third-party labs, which Microsoft frames as a hedge against legal risk for enterprise customers.
Why it matters: Microsoft AI CEO Mustafa Suleyman frames the effort as a “hill-climbing machine,” a shared training infrastructure meant to keep Microsoft on the frontier as compute scales. The release is also a direct shot at its biggest enterprise rival.

MAI-Thinking-1 ships with a detailed 109-page technical report.

Blog | Tech Report

Gemma 4 12B Brings Agentic Reasoning to Your Laptop

Google released Gemma 4 12B, a unified, encoder-free multimodal open model that brings agentic reasoning, vision, and native audio to consumer hardware under an Apache 2.0 license.

Encoder-free design: Vision inputs pass through a single lightweight matrix multiplication and audio is projected directly into the same space as text tokens, dropping separate modality encoders.
Runs locally: Fits in 16GB of VRAM or unified memory, small enough for a laptop, with support across LM Studio, Ollama, and Google AI Edge Gallery.
Punches up: Approaches the performance of Google’s larger 26B MoE model at less than half the memory footprint, and is the first mid-sized Gemma with native audio input.
Community traction: The release topped Hacker News, with builders showing it running on a 10-year-old Xeon CPU.
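
For a rough sense of what the 16GB figure implies, here is a back-of-envelope sketch of weight memory at a few common precisions. It assumes exactly 12 billion parameters and counts weights only (no KV cache, activations, or runtime overhead). The announcement as summarized here does not say which precision the 16GB figure refers to, so the precisions below are illustrative assumptions, and the helper names are hypothetical.

```python
# Back-of-envelope weight memory for a dense model.
# Assumptions (not from the announcement): exactly 12e9 parameters,
# weights only, decimal gigabytes (1 GB = 1e9 bytes).

GB = 1e9  # bytes per decimal gigabyte


def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Return the memory needed for the weights alone, in GB."""
    return num_params * bits_per_param / 8 / GB


def fits(num_params: float, bits_per_param: int, budget_gb: float) -> bool:
    """True if the weights alone fit inside the given memory budget."""
    return weight_memory_gb(num_params, bits_per_param) <= budget_gb


PARAMS_12B = 12e9
PARAMS_26B = 26e9
BUDGET_GB = 16  # the figure quoted for Gemma 4 12B

for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    size = weight_memory_gb(PARAMS_12B, bits)
    verdict = "fits" if fits(PARAMS_12B, bits, BUDGET_GB) else "does not fit"
    print(f"{label}: {size:.1f} GB of weights, {verdict} in {BUDGET_GB} GB")

# At equal precision, weight memory scales with parameter count,
# so 12B vs 26B is a ratio of about 0.46.
ratio = PARAMS_12B / PARAMS_26B
print(f"12B / 26B weight ratio: {ratio:.2f}")
```

Under these assumptions, 16-bit weights alone would need 24 GB, so a 16GB budget suggests some form of quantization (8-bit weights come to 12 GB, 4-bit to 6 GB). The 12B-to-26B ratio of roughly 0.46 is also consistent with the "less than half the memory footprint" claim when both models are held at the same precision.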

Blog
