NLP Newsletter


🧑‍🔬AI Agents Weekly: Grok 3, AI Co-Scientist, The AI CUDA Engineer, Agents Leaderboard, AI Dev News

Grok 3, AI Co-Scientist, The AI CUDA Engineer, Agents Leaderboard, AI Dev News

Feb 22, 2025

In today’s issue:

  • xAI announced their new Grok 3 reasoning model

  • Google introduces AI co-scientist, a multi-agent AI powered by Gemini 2.0

  • The AI CUDA Engineer is a new agentic system to optimize CUDA kernels

  • The Agent Leaderboard crowns Gemini 2.0 Flash on function calling

  • LangChain has released the LangMem SDK

  • New survey on LLM-powered agents for recommender systems

  • AI dev news, multimodal foundation model for AI agents, and much more.



Top Stories

Grok 3

xAI introduces Grok 3, its most advanced AI model yet, designed to enhance reasoning, mathematics, coding, and world knowledge through large-scale reinforcement learning.

Key highlights:

  • Massive Compute and Training Advances – Grok 3 was trained on the Colossus supercluster, using 10x the compute of previous state-of-the-art models. Its reinforcement learning (RL) techniques enable long-form reasoning, error correction, and exploration of multiple solutions.

  • Performance Across Benchmarks – Grok 3 (Think) leads across major academic and real-world benchmarks:

    • AIME 2025 Math Competition: 93.3% (highest among competitors)

    • GPQA (graduate-level reasoning): 84.6%

    • LiveCodeBench (code generation): 79.4%

    • MMMU (multimodal understanding): 78%

  • Grok 3 Mini – A cost-efficient reasoning model optimized for STEM, achieving 95.8% on AIME 2024 and 80.4% on LiveCodeBench.

  • Grok 3’s "Think" Mode – Users can activate "Think" mode to inspect the model’s reasoning process, improving transparency and trust.

  • DeepSearch AI Agent – A new “truth-seeking” AI that synthesizes real-time web knowledge, providing concise and fact-verified insights beyond standard browser searches.

  • API & Enterprise Rollout – The Grok 3 and DeepSearch APIs will soon be available, offering enhanced tool use and code execution capabilities.

Blog


AI Co-Scientist


Google introduces AI co-scientist, a multi-agent AI system built with Gemini 2.0 to help accelerate scientific breakthroughs.

Key highlights:
