🧑‍🔬AI Agents Weekly: Grok 3, AI Co-Scientist, The AI CUDA Engineer, Agents Leaderboard, AI Dev News

Feb 22, 2025

In today’s issue:

xAI announces its new Grok 3 reasoning model
Google introduces AI co-scientist, a multi-agent AI powered by Gemini 2.0
The AI CUDA Engineer is a new agentic system to optimize CUDA kernels
The Agent Leaderboard crowns Gemini 2.0 Flash on function calling
LangChain has released the LangMem SDK
New survey on LLM-powered agents for recommender systems
AI dev news, multimodal foundation model for AI agents, and much more.

Top Stories

Grok 3

xAI introduces Grok 3, its most advanced AI model yet, designed to enhance reasoning, mathematics, coding, and world knowledge through large-scale reinforcement learning.

Key highlights:

Massive Compute and Training Advances – Grok 3 was trained on the Colossus supercluster, using 10x the compute of previous state-of-the-art models. Its reinforcement learning (RL) techniques enable long-form reasoning, error correction, and exploration of multiple solutions.
Performance Across Benchmarks – Grok 3 (Think) leads across major academic and real-world benchmarks:
- AIME 2025 Math Competition: 93.3% (highest among competitors)
- GPQA (graduate-level reasoning): 84.6%
- LiveCodeBench (code generation): 79.4%
- MMMU (multimodal understanding): 78%
Grok 3 Mini – A cost-efficient reasoning model optimized for STEM, achieving 95.8% on AIME 2024 and 80.4% on LiveCodeBench.
Grok 3’s “Think” Mode – Users can activate “Think” mode to inspect the model’s reasoning process, improving transparency and trust.
DeepSearch AI Agent – A new “truth-seeking” AI that synthesizes real-time web knowledge, providing concise and fact-verified insights beyond standard browser searches.
API & Enterprise Rollout – The Grok 3 and DeepSearch APIs will soon be available, offering enhanced tool use and code execution capabilities.

Blog

AI Co-Scientist

Google introduces AI co-scientist, a multi-agent AI system built with Gemini 2.0 to help accelerate scientific breakthroughs.