AI Newsletter

🤖 AI Agents Weekly: GLM-4.5, AI SDK 5, Video Overviews, ChatGPT Study Mode, Context Engineering Tips, AlphaEarth Foundations

Aug 02, 2025

In today’s issue:

  • OpenAI announces Study Mode

  • Z.AI releases GLM-4.5 and GLM-4.5-Air

  • Towards agentic GraphRAG framework via end-to-end RL

  • Claude Code now supports custom subagents

  • A survey on self-evolving agents for ASI

  • Vercel releases AI SDK 5

  • Context engineering tips from the Manus team

  • Gemini CLI now supports custom slash commands

  • Google DeepMind has launched AlphaEarth Foundations

  • Black Forest Labs and Krea AI have released FLUX.1 Krea [dev]

  • Top AI dev news, research papers, product/tool updates, and more.



Top Stories

GLM-4.5

Z.AI unveils GLM-4.5 and GLM-4.5-Air, flagship LLMs built to unify reasoning, coding, and agentic tasks. GLM-4.5 uses a 355B-parameter mixture-of-experts (MoE) architecture with 32B active parameters. GLM-4.5-Air is a lighter 106B-parameter model with 12B active. Both use a dual-mode inference design: a "thinking" mode for complex reasoning and tool use, and a "non-thinking" mode for faster replies. Across 12 benchmarks, GLM-4.5 ranks 3rd overall (behind only Claude 4 Opus and GPT-4.1) and excels in several key areas:

  • Agentic performance: GLM-4.5 matches Claude 4 Sonnet on τ-bench and BFCL-v3, and beats Claude 4 Opus on BrowseComp (26.4% vs. 18.8%). It supports a 128K-token context window and native function calling. It achieves a 90.6% tool-calling success rate, higher than Claude 4 Sonnet, Kimi K2, and Qwen3-Coder.

  • Reasoning: GLM-4.5 performs near state-of-the-art on benchmarks such as AIME24 (91.0), MATH500 (98.2), and GPQA (79.1), approaching or surpassing Claude and Gemini models. Its architecture is deeper (more layers) and uses more attention heads. It also relies on grouped-query attention and a multi-token prediction (MTP) layer for faster inference.

  • Coding: GLM-4.5 scores 64.2% on SWE-bench Verified and 37.5% on Terminal-Bench. It shows strong full-stack capabilities, including frontend/backend generation, slide/poster design, and game prototyping. In head-to-head comparisons, it wins 80.8% of tasks against Qwen3-Coder and 53.9% against Kimi K2, but still trails Claude 4 Sonnet.

  • Training innovations: GLM-4.5 is trained on 22T tokens. It uses domain-specific instruction tuning plus a two-stage RL setup built on "slime", a custom RL framework for large models. Slime makes agentic RL efficient through hybrid training, decoupled rollout engines, and mixed-precision rollouts in FP8.
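GLM-4.5 has 355B total but only 32B active parameters because of its sparse mixture-of-experts design. A learned router sends each token to a few experts, so most expert weights sit idle on any given forward pass. The sketch below shows generic top-k softmax routing in plain Python, with these assumptions:

* The experts are toy functions.
* The router scores are made up.
* The names `top_k_route` and `moe_layer` are hypothetical.

It is not GLM-4.5's actual router, expert count, or choice of k.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts and renormalise their gate weights to sum to 1."""
    probs = softmax(router_logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]

def moe_layer(x, experts, router_logits, k=2):
    """Weighted sum of the selected experts' outputs; unselected experts are never called."""
    out = [0.0] * len(x)
    routes = top_k_route(router_logits, k)
    for idx, weight in routes:
        y = experts[idx](x)
        out = [o + weight * v for o, v in zip(out, y)]
    return out, routes

# Hypothetical toy experts: expert i simply scales its input by (i + 1).
experts = [lambda x, s=i + 1: [s * v for v in x] for i in range(8)]

# Hypothetical router scores for one token (a real router derives them
# from the token's hidden state with a learned projection).
router_logits = [0.1, 2.0, -1.0, 0.5, 3.0, 0.0, -0.5, 1.0]
out, routes = moe_layer([1.0, 2.0], experts, router_logits, k=2)
print("experts used:", [i for i, _ in routes])  # only 2 of the 8 experts ran
print("output:", [round(v, 4) for v in out])

# Rough active-parameter share from the figures quoted above (billions).
for name, total, active in [("GLM-4.5", 355, 32), ("GLM-4.5-Air", 106, 12)]:
    print(f"{name}: {active / total:.1%} of parameters active per token")
```

Only the chosen experts are called. With many experts and a small k, total parameter count and per-token compute come apart. The quoted active counts also include shared components such as attention layers, so the printed ratio is a rough share rather than an experts-only figure.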

Blog
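The Reasoning item mentions grouped-query attention (GQA). In GQA, several query heads share one key/value head. This shrinks the KV cache that must be stored and re-read at every decoding step. The sketch below computes two things:

* which K/V head each query head uses;
* the resulting KV-cache size for one 128K-token sequence.

All layer and head counts are hypothetical round numbers, not GLM-4.5's published configuration. The helper names are also illustrative.

```python
def kv_group(q_head, n_q_heads, n_kv_heads):
    """Index of the shared K/V head that a given query head attends with."""
    if n_q_heads % n_kv_heads != 0:
        raise ValueError("n_q_heads must be a multiple of n_kv_heads")
    group_size = n_q_heads // n_kv_heads
    return q_head // group_size

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """K and V cache size for one sequence; the leading 2 counts keys plus values."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value

# Hypothetical configuration, for illustration only (bf16 = 2 bytes per value).
n_layers, n_q_heads, n_kv_heads, head_dim = 32, 32, 8, 128
seq_len = 128 * 1024  # a 128K-token context

mapping = [kv_group(h, n_q_heads, n_kv_heads) for h in range(n_q_heads)]
print("query head -> KV head:", mapping[:8], "...")  # [0, 0, 0, 0, 1, 1, 1, 1] ...

mha = kv_cache_bytes(n_layers, n_q_heads, head_dim, seq_len)   # one K/V head per query head
gqa = kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len)  # 4 query heads share each K/V head
print(f"MHA cache: {mha / 2**30:.1f} GiB, GQA cache: {gqa / 2**30:.1f} GiB")  # 64.0 GiB vs 16.0 GiB
```

The cache shrinks by exactly the query-to-KV head ratio, 4x in this configuration. Decoding is largely bound by memory reads of that cache, so this is where the inference savings come from.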

© 2025 elvis