⚡AI Agents Weekly: o4-mini, Gemini 2.5 Flash, Embed 4, GUI-R1, FastAPI-MCP
In today’s issue:
OpenAI releases new models: o4-mini, o3, and GPT-4.1
Google announces Gemini 2.5 Flash
How to build an agentic search system from scratch
Firecrawl announces a new agentic web scraper
Cohere releases Embed 4
OpenAI releases a new practical guide for building agents
A generalist R1-style vision-language model for GUI agents
Tadata Inc. releases FastAPI-MCP
How to build an agent with Go and Claude
Top AI dev news and much more
Top Stories
GPT-4.1, o4-mini, and o3
OpenAI has released its most advanced models to date—o3 and o4-mini—marking a huge leap in multimodal and agentic reasoning capabilities. These models integrate tool use, allowing them to autonomously decide when and how to leverage resources such as web search, code execution, and image generation.
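To make that tool-use loop concrete, here is a minimal sketch of calling o4-mini with the built-in web search tool through the OpenAI Responses API; the model decides on its own whether to search before answering. Treat the tool name and o4-mini's tool access as assumptions that depend on your API tier.

```python
# Minimal sketch (assumptions: the `openai` Python SDK's Responses API,
# the "web_search_preview" built-in tool name, and that o4-mini can use
# this tool on your account).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.responses.create(
    model="o4-mini",
    tools=[{"type": "web_search_preview"}],  # model decides when to search
    input="What changed in the latest o-series release? Cite sources.",
)
print(response.output_text)
```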
o3, in particular, sets new state-of-the-art performance benchmarks across domains like coding, mathematics, science, and visual problem solving, outperforming earlier models (like o1 and o3-mini) in both accuracy and flexibility. Meanwhile, o4-mini delivers high-efficiency reasoning, excelling on tasks like AIME math problems while remaining cost-effective and suitable for high-throughput applications. Both models exhibit improved instruction following, natural dialogue, and personalized memory-aware responses, setting a new standard for general-purpose AI agents.
o3 demonstrates substantial gains from reinforcement learning, showing that additional compute enables deeper, more accurate reasoning. It excels particularly in agentic tool use, such as generating visual explanations, solving complex equations, or navigating long workflows. Multimodal capabilities now include "thinking with images," allowing the models to interpret and reason about visual content (e.g., blurry photos, diagrams) with best-in-class performance on benchmarks like MathVista and CharXiv.
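"Thinking with images" is exposed the same way: you pass an image alongside a text question and the model reasons over both. A hedged sketch below; the `input_image` content part follows the Responses API format, and the diagram URL is purely illustrative.

```python
# Minimal sketch (assumptions: the `openai` SDK's Responses API image
# input format; the diagram URL is hypothetical).
from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="o3",
    input=[{
        "role": "user",
        "content": [
            {"type": "input_text", "text": "What does this circuit diagram compute?"},
            {"type": "input_image", "image_url": "https://example.com/diagram.png"},
        ],
    }],
)
print(response.output_text)
```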
OpenAI has also introduced Codex CLI, a command-line interface for local AI-assisted coding, and a $1M grant program to support projects built on o3 and o4-mini. These models are now available to ChatGPT Plus, Pro, and Team users, with expanded API access; an o3-pro model is on the horizon.
OpenAI also launched the GPT-4.1 model family (GPT-4.1, mini, and nano), boasting major improvements in coding, instruction following, and long-context reasoning, supporting up to 1M tokens and outperforming GPT-4o across the board. GPT-4.1 achieves 54.6% on SWE-bench Verified, 38.3% on MultiChallenge, and sets new SOTA on Video-MME, while the nano model offers high performance with minimal latency and cost (as low as $0.10 per million input tokens). These models power more reliable agent workflows, offer strong vision capabilities (e.g., 74.8% on MMMU), and enable scalable applications across legal, financial, and software engineering domains.
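As a cost-oriented example, here is a minimal sketch of pointing gpt-4.1-nano at a long document through the standard Chat Completions endpoint; the file name and prompt are hypothetical, and the same call works with gpt-4.1 or gpt-4.1-mini for harder tasks.

```python
# Minimal sketch (assumption: gpt-4.1-nano is reachable via the standard
# Chat Completions endpoint of the `openai` Python SDK).
from openai import OpenAI

client = OpenAI()

with open("contract.txt") as f:  # hypothetical long document
    document = f.read()

completion = client.chat.completions.create(
    model="gpt-4.1-nano",  # cheapest family member; long-context capable
    messages=[
        {"role": "system", "content": "Answer strictly from the provided document."},
        {"role": "user", "content": f"{document}\n\nList every termination clause."},
    ],
)
print(completion.choices[0].message.content)
```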
With lower prices, faster response times, and strong real-world benchmark results, GPT-4.1 sets a new standard for developers building advanced, multimodal, agentic systems.