🧑‍🔬 AI Agents Weekly: Grok 3, AI Co-Scientist, The AI CUDA Engineer, Agents Leaderboard, AI Dev News
In today’s issue:
xAI announced their new Grok 3 reasoning model
Google introduces AI co-scientist, a multi-agent AI powered by Gemini 2.0
The AI CUDA Engineer is a new agentic system to optimize CUDA kernels
The Agent Leaderboard crowns Gemini 2.0 Flash on function calling
LangChain has released the LangMem SDK
New survey on LLM-powered agents for recommender systems
AI dev news, multimodal foundation model for AI agents, and much more.
Top Stories
Grok 3
xAI introduces Grok 3, its most advanced AI model yet, designed to enhance reasoning, mathematics, coding, and world knowledge through large-scale reinforcement learning.
Key highlights:
Massive Compute and Training Advances – Grok 3 was trained on the Colossus supercluster, using 10x the compute of previous state-of-the-art models. Its reinforcement learning (RL) techniques enable long-form reasoning, error correction, and exploration of multiple solutions.
Performance Across Benchmarks – Grok 3 (Think) leads across major academic and real-world benchmarks:
AIME 2025 Math Competition: 93.3% (highest among competitors)
GPQA (graduate-level reasoning): 84.6%
LiveCodeBench (code generation): 79.4%
MMMU (multimodal understanding): 78%
Grok 3 Mini – A cost-efficient reasoning model optimized for STEM, achieving 95.8% on AIME 2024 and 80.4% on LiveCodeBench.
Grok 3’s "Think" Mode – Users can activate "Think" mode to inspect the model’s reasoning process, improving transparency and trust.
DeepSearch AI Agent – A new “truth-seeking” AI that synthesizes real-time web knowledge, providing concise and fact-verified insights beyond standard browser searches.
API & Enterprise Rollout – Grok 3 and DeepSearch API will soon be available and offer enhanced tool use and code execution capabilities.
AI Co-Scientist
Google introduces AI co-scientist, a multi-agent AI system built with Gemini 2.0 to help accelerate scientific breakthroughs.
Key highlights: