AI Newsletter

🤖 AI Agents Weekly: Grok 4, Context Engineering Guide, Kimi K2, SmolLM3, MedGemma 27B, AI SDK 5

Jul 12, 2025

In today’s issue:

  • xAI releases Grok 4

  • DAIR.AI releases new Context Engineering Guide

  • Moonshot AI introduces Kimi K2, a 1T Mixture-of-Experts model

  • Hugging Face releases SmolLM3

  • Mistral AI and All Hands AI launch two code-centric LLMs

  • New RL-driven memory agent

  • Google introduces Batch Mode for the Gemini API

  • AI SDK 5 enters beta

  • Google releases MedGemma 27B

  • Top AI dev news, research papers, and much more.



Top Stories

Grok 4

Image source: https://x.com/ArtificialAnlys/status/1943166841150644622/photo/1

xAI introduces Grok 4, a frontier model trained with large-scale reinforcement learning on the 200k-GPU Colossus cluster. Where its predecessor Grok 3 pioneered RL-tuned reasoning, Grok 4 scales RL to pretraining-level compute, with a 6× gain in compute efficiency and a broader set of high-quality training data spanning math, code, science, and more. The model is trained to use tools such as a code interpreter and real-time search natively, which lets it retrieve information on its own and reason more deeply.

  • Grok 4 Heavy, the most powerful variant, scores 50.7% on the text-only "Humanity’s Last Exam" (HLE), making it the first model to cross the 50% threshold on this expert-level benchmark. It also leads on USAMO’25 (61.9%) and ARC-AGI v2 (15.9%, nearly double the score of Claude Opus 4).

  • The model posts top scores on competitive math and reasoning benchmarks: 100% on AIME’25, 96.7% on HMMT, and 88.4% on GPQA. On LiveCodeBench, it scores 79.4%, tying or beating Gemini 2.5 and o3.

  • Grok 4’s native tool use includes real-time search across the web and X (formerly Twitter). xAI demonstrates this with a fully autonomous, multi-step trace of viral puzzle posts, which shows how far its retrieval and analysis can go.

  • The Grok 4 API offers a 256k context window, multimodal capabilities, and enterprise-grade security and compliance (SOC 2 Type 2, GDPR, CCPA). Grok 4 Voice mode now supports camera-assisted scene understanding in conversation.

Blog | Docs

© 2025 elvis