🤖 AI Agents Weekly: GLM-4.5, AI SDK 5, Video Overviews, ChatGPT Study Mode, Context Engineering Tips, AlphaEarth Foundations
In today’s issue:
OpenAI announces Study Mode
Z.AI releases GLM-4.5 and GLM-4.5-Air
Towards an agentic GraphRAG framework via end-to-end RL
Claude Code now supports custom subagents
A survey on self-evolving agents for ASI
AI SDK v5 has been released
Context engineering tips from the Manus team
Gemini CLI now supports custom slash commands
Google DeepMind has launched AlphaEarth Foundations
Black Forest Labs and Krea AI have released FLUX.1 Krea [dev]
Top AI dev news, research papers, product/tool updates, and more.
Top Stories
GLM-4.5
Z.AI unveils GLM-4.5 and GLM-4.5-Air, flagship LLMs built for unified reasoning, coding, and agentic tasks. GLM-4.5 uses a 355B-parameter MoE architecture (32B active), while GLM-4.5-Air runs lighter at 106B (12B active). Both adopt a dual-mode inference design: a “thinking” mode for complex reasoning and tool use, and a “non-thinking” mode for faster replies. Across 12 benchmarks, GLM-4.5 ranks 3rd overall (behind only Claude 4 Opus and GPT-4.1) and excels in several key areas:
Agentic performance: GLM-4.5 matches Claude 4 Sonnet on τ-bench and BFCL-v3, and beats Claude 4 Opus on BrowseComp (26.4% vs. 18.8%). It supports 128k context and native function calling, and achieves a 90.6% tool-calling success rate, higher than Claude, Kimi K2, and Qwen3-Coder (see the API sketch after this list).
Reasoning: GLM-4.5 performs near-SOTA on tasks like AIME24 (91.0), MATH500 (98.2), and GPQA (79.1), approaching or surpassing Claude and Gemini. It benefits from a deeper model design (more layers and attention heads), grouped-query attention, and multi-token prediction (MTP) for faster inference (a GQA sketch follows this list).
Coding: It ranks high on SWE-bench Verified (64.2%) and Terminal Bench (37.5%), showing strong full-stack capabilities including frontend/backend generation, slide/poster design, and game prototyping. GLM-4.5 wins 80.8% of tasks against Qwen3-Coder and 53.9% against Kimi K2, but still trails Claude 4 Sonnet in head-to-head comparisons.
Training innovations: Trained on 22T tokens, GLM-4.5 uses domain-specific instruction tuning and a two-stage RL setup built on “slime”, a custom RL framework for large models. Slime enables efficient agentic RL via hybrid training, decoupled rollout engines, and mixed-precision rollouts with FP8 (a toy sketch of the decoupled pattern closes this section).
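To make the dual-mode and function-calling points concrete, here is a minimal sketch of calling GLM-4.5 through an OpenAI-compatible client. The base URL, the shape of the thinking toggle passed via extra_body, and the get_weather tool are all assumptions for illustration; check Z.AI’s official docs for the exact endpoint and parameter names.

```python
from openai import OpenAI

# Hypothetical setup: base URL and the "thinking" switch below are
# assumptions — consult Z.AI's docs before relying on them.
client = OpenAI(base_url="https://api.z.ai/api/paas/v4", api_key="YOUR_ZAI_KEY")

# Illustrative tool schema in the standard OpenAI function-calling format.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # made-up tool for the example
        "description": "Look up the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="glm-4.5",
    messages=[{"role": "user", "content": "What's the weather in Berlin?"}],
    tools=tools,
    # Assumed flag for "thinking" mode; disable it to trade reasoning depth
    # for the faster "non-thinking" replies described above.
    extra_body={"thinking": {"type": "enabled"}},
)
print(resp.choices[0].message.tool_calls)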
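Grouped-query attention is a general technique rather than anything GLM-4.5-specific: several query heads share one K/V head, which shrinks the KV cache at inference time. A minimal PyTorch sketch, with all shapes and names illustrative:

```python
import torch
import torch.nn.functional as F

def grouped_query_attention(x, wq, wk, wv, n_q_heads, n_kv_heads):
    """Toy GQA: n_q_heads query heads share n_kv_heads K/V heads."""
    B, T, _ = x.shape
    head_dim = wq.shape[1] // n_q_heads
    group = n_q_heads // n_kv_heads  # query heads per shared K/V head

    q = (x @ wq).view(B, T, n_q_heads, head_dim).transpose(1, 2)
    k = (x @ wk).view(B, T, n_kv_heads, head_dim).transpose(1, 2)
    v = (x @ wv).view(B, T, n_kv_heads, head_dim).transpose(1, 2)

    # Broadcast each K/V head across its group of query heads.
    k = k.repeat_interleave(group, dim=1)
    v = v.repeat_interleave(group, dim=1)

    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
    return out.transpose(1, 2).reshape(B, T, n_q_heads * head_dim)

# 8 query heads sharing 2 K/V heads: the KV cache is 4x smaller than
# full multi-head attention with the same number of query heads.
x = torch.randn(2, 16, 512)
wq = torch.randn(512, 8 * 64)
wk = torch.randn(512, 2 * 64)
wv = torch.randn(512, 2 * 64)
print(grouped_query_attention(x, wq, wk, wv, 8, 2).shape)  # (2, 16, 512)
```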
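Finally, the “decoupled rollout engines” idea is easiest to see as a producer/consumer loop: generation and optimization run independently and only share a trajectory buffer, so slow, long-horizon agentic rollouts don’t stall gradient steps. A toy sketch of that pattern (not slime’s actual API; every name here is made up):

```python
import queue
import random
import threading
import time

# Rollout workers and the trainer share only this buffer.
trajectories = queue.Queue(maxsize=64)

def rollout_worker(worker_id):
    """Stands in for an inference engine driving an agent through a task."""
    while True:
        time.sleep(random.uniform(0.1, 0.5))  # env/tool latency
        traj = {
            "worker": worker_id,
            "tokens": [random.random() for _ in range(8)],
            "reward": random.random(),
        }
        trajectories.put(traj)  # hand off to the trainer

def train_loop(steps=10, batch_size=4):
    """Stands in for the learner consuming whatever rollouts are ready."""
    for step in range(steps):
        batch = [trajectories.get() for _ in range(batch_size)]
        mean_r = sum(t["reward"] for t in batch) / batch_size
        print(f"step {step}: mean batch reward {mean_r:.3f}")  # gradient step here

for i in range(4):
    threading.Thread(target=rollout_worker, args=(i,), daemon=True).start()
train_loop()
```

In a real system the two sides live in separate processes or clusters and the buffer handles weight syncing and off-policy corrections; the sketch only shows why decoupling keeps both sides busy.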