🤖 AI Agents Weekly: Grok 4, Context Engineering Guide, Kimi K2, SmolLM3, MedGemma 27B, AI SDK 5

Grok 4, Context Engineering Guide, Kimi K2, SmolLM3, MedGemma 27B, AI SDK 5

Jul 12, 2025

∙ Paid

In today’s issue:

xAI releases Grok 4
DAIR.AI releases new Context Engineering Guide
Moonshot AI introduces Kimi K2, a 1T Mixture-of-Experts model
Hugging Face releases SmolLM3
Mistral AI and All Hands AI launch two code-centric LLMs
New RL-driven memory agent
Google introduces Batch Mode for the Gemini API
AI SDK 5 enters beta
Google has released MedGemma 27B
Top AI dev news, research papers, and much more.

Top Stories

Grok 4

Source: https://x.com/ArtificialAnlys/status/1943166841150644622/photo/1

xAI introduces Grok 4, a frontier model trained with large-scale reinforcement learning using the 200k-GPU Colossus cluster. Unlike its predecessor Grok 3, which pioneered RL-tuned reasoning, Grok 4 extends this to pretraining scale with a 6× boost in compute efficiency and diversified high-quality training data across math, code, science, and more. It natively learns to use tools like code interpreters and real-time search, enabling high-quality, self-directed information retrieval and deeper reasoning.

Grok 4 Heavy, the most powerful variant, scores 50.7% on the text-only "Humanity’s Last Exam" (HLE), the first model to cross the 50% threshold on this expert-level benchmark. It also leads on USAMO’25 (61.9%) and ARC-AGI v2 (15.9%, nearly double Claude Opus 4).
The model dominates competitive math and reasoning tasks: 100% on AIME’25, 96.7% on HMMT, and 88.4% on GPQA. On LiveCodeBench, it ties or beats Gemini 2.5 and o3 with 79.4%.
Grok 4’s native tool use includes real-time web and X (formerly Twitter) search, demonstrated via fully autonomous multi-step tracing of viral puzzle posts, revealing deep retrieval and analysis capabilities.
The Grok 4 API offers 256k context, multimodal capabilities, and enterprise-grade security (SOC 2 Type 2, GDPR, CCPA), with Grok 4 Voice mode now enabling camera-assisted scene understanding in conversation.

Blog | Docs

AI Newsletter

🤖 AI Agents Weekly: Grok 4, Context Engineering Guide, Kimi K2, SmolLM3, MedGemma 27B, AI SDK 5

Grok 4, Context Engineering Guide, Kimi K2, SmolLM3, MedGemma 27B, AI SDK 5

Top Stories

Grok 4

This post is for paid subscribers