🤖 AI Agents Weekly: AgentKit, Gemini 2.5 Computer Use, State of AI Report 2025, Agentic Context Engineering, CodeMender
In today’s issue:
OpenAI launches AgentKit
Google announces the Gemini 2.5 Computer Use model
State of AI Report 2025
Claude Code now supports plugins
Agentic Context Engineering
ElevenLabs announces Agent Workflows
Equipping LLM agents with memory using RL
Google DeepMind announces CodeMender for code security
Top AI news, papers, tools, and much more.
Top Stories
OpenAI Launches AgentKit
OpenAI announces AgentKit to streamline agent development: a visual workflow builder, a connector registry for governance, a drop-in chat UI kit, stronger evals, and reinforcement fine-tuning options. It is built to replace fragmented orchestration, manual evals, and ad hoc frontends with versioned, production-ready components.
Agent Builder: Visual canvas to compose multi-agent workflows with drag-and-drop nodes, tool connections, and custom guardrails. Supports preview runs, inline eval config, and full versioning for fast iteration. Teams report building their first agent in hours instead of months, with product, legal, and engineering collaborating in the same interface.
Connector Registry: Centralized governance for data and tool access across ChatGPT and the API. Admins manage prebuilt connectors like Google Drive, SharePoint, Teams, Dropbox, and third-party MCPs from one panel. Requires the Global Admin Console. Rolling out in beta to select Enterprise and Edu tenants.
Guardrails: Open-source, modular safety layer that flags jailbreaks, masks or flags PII, and adds policy checks. Deploy standalone or via Python and JavaScript libraries. Designed to integrate directly in Agent Builder nodes.
ChatKit: Embeddable toolkit for chat-based agent UIs with streaming, threads, “show thinking,” and in-chat actions. Skips weeks of frontend work and can be themed to your brand. Used for support, onboarding, docs assistants, and research agents by companies like Canva and HubSpot.
Evals upgrades: New capabilities to measure and improve agents at scale:
Datasets to seed and grow eval suites with auto-graders and human annotations
Trace grading to assess end-to-end workflows and pinpoint failures
Automated prompt optimization driven by grader outputs and human feedback
Third-party model support to compare models in the same evals
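To make the trace grading idea concrete, here is a minimal sketch in plain Python. It is not the OpenAI Evals API: the `Step` type, the `grade_trace` helper, the step names, and the lambda graders are all hypothetical, chosen only to show how grading each step of a workflow can surface the first point of failure.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Step:
    name: str    # hypothetical step label, e.g. "retrieve" or "answer"
    output: str  # whatever the agent produced at this step


# A grader maps one step to pass (True) or fail (False).
Grader = Callable[[Step], bool]


def grade_trace(
    trace: list[Step], graders: dict[str, Grader]
) -> tuple[float, Optional[str]]:
    """Return (fraction of graded steps that passed, first failing step name)."""
    passed = 0
    graded = 0
    first_failure = None
    for step in trace:
        grader = graders.get(step.name)
        if grader is None:
            continue  # steps without a registered grader are skipped
        graded += 1
        if grader(step):
            passed += 1
        elif first_failure is None:
            first_failure = step.name
    score = passed / graded if graded else 0.0
    return score, first_failure


# Illustrative trace: the tool call returned nothing, so it should fail.
trace = [
    Step("retrieve", "refund policy: 30 days"),
    Step("tool_call", ""),
    Step("answer", "You can get a refund within 30 days."),
]
graders = {
    "retrieve": lambda s: "refund" in s.output,
    "tool_call": lambda s: s.output != "",
    "answer": lambda s: "30 days" in s.output,
}

score, first_failure = grade_trace(trace, graders)
print(f"score={score:.2f}, first failure={first_failure}")
# prints: score=0.67, first failure=tool_call
```

A production setup would likely use model-based or human graders rather than string checks, but the shape is the same: score every step, then report where the workflow first went wrong.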
Reinforcement fine-tuning (RFT): Generally available on o4-mini and in private beta on GPT-5. Adds custom tool-call training so models learn when and how to invoke tools, plus custom graders to align rewards with your KPIs.
Availability and pricing: ChatKit and the new Evals features are generally available. Agent Builder is in beta. Connector Registry begins beta for orgs with the Global Admin Console. All of these tools are included under standard API model pricing. A standalone Workflows API and agent deployments to ChatGPT are planned.
Where it fits: Built on the Responses API and Agents SDK for end-to-end agentic workflows such as deep research, customer support, sales, and internal assistants, with examples citing high ticket coverage and growth gains.
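For readers wondering what "masks or flags PII" in the Guardrails item might look like in practice, the following is a rough sketch using only Python's standard `re` module. It does not use the OpenAI Guardrails library, and the `mask_pii` function, the two patterns, and the placeholder labels are assumptions made for illustration. Real PII detection needs far broader coverage than two regular expressions.

```python
import re

# Deliberately simple, illustrative patterns (not exhaustive).
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def mask_pii(text: str) -> tuple[str, list[str]]:
    """Replace matches with placeholders and report which PII kinds were found."""
    found = []
    for label, pattern in (("EMAIL", EMAIL), ("PHONE", PHONE)):
        text, count = pattern.subn(f"[{label}]", text)
        if count:
            found.append(label)  # the "flag" half of mask-or-flag
    return text, found


masked, kinds = mask_pii("Reach me at jane.doe@example.com or +1 415-555-0100.")
print(masked)  # prints: Reach me at [EMAIL] or [PHONE].
print(kinds)   # prints: ['EMAIL', 'PHONE']
```

Returning both the masked text and the list of detected kinds lets a caller choose between the two behaviors the item describes: pass the masked text along, or block the request when the list is non-empty.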