🤖 AI Agents Weekly: Self-Improving Agents, Eleven v3, /Search, Deep Research Updates, Top AI Devs News, Agents SDK for TypeScript
In today’s issue:
Self-Challenging Language Model Agents
Gemini 2.5 Pro Updates
OpenAI Agents SDK for TypeScript
ElevenLabs has launched Conversational AI 2.0
DAIR.AI released a new talk on Reasoning LLMs
ElevenLabs Eleven v3 Alpha TTS
Firecrawl launches Advanced Search for AI Agents
OpenAI Deep Research Updates
Top AI papers, dev news, product updates, and more.
Top Stories
ElevenLabs Eleven v3 Alpha
ElevenLabs announced Eleven v3 Alpha, which the company describes as its most advanced and expressive text-to-speech (TTS) model. It favors longer, more detailed prompts and offers fine-grained vocal direction through audio tags. The accompanying guide outlines how to shape speech output with the new settings and tags.
Voice + stability tuning – Voice selection is crucial; emotional nuance must match the chosen voice. The stability slider offers three modes, Creative (most expressive), Natural (balanced), and Robust (least responsive but most consistent), to trade off between emotion and reliability.
Audio tags for emotional delivery – Users can embed tags like [laughs], [whispers], [sarcastic], or [strong x accent] (with x standing in for the desired accent) to elicit fine-grained prosody. These cues influence pacing, tone, and non-verbal sounds, and produce realistic delivery when matched with a suitable voice.
Rich prompt structure and punctuation – Capitalization, ellipses, and expressive punctuation affect rhythm and emphasis. For example, "It was a VERY long day [sigh] … nobody listens anymore." creates a more human cadence.
Multi-speaker and sound effects support – Assigning different voices to each line in a dialogue enables realistic back-and-forth. Tags like [applause] and [gunshot] add cinematic flair.
Best practices – Prompts should exceed 250 characters to avoid inconsistent output. Combining audio tags, proper voice selection, and structured text allows v3 to simulate dynamic, emotionally rich speech with near-human delivery.
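To make the tagging and length guidance above concrete, here is a minimal Python sketch of how a script might be assembled before it is sent to a TTS service. It makes no ElevenLabs API calls; the `Line` class and the helpers `render_line`, `render_dialogue`, and `is_long_enough` are hypothetical names invented for illustration, and the tag-before-text layout is an assumption rather than a documented requirement.

```python
from dataclasses import dataclass

# Guideline from the summary above: prompts should exceed 250 characters.
# This is only a local check; no API enforces it in this sketch.
MIN_PROMPT_CHARS = 250


@dataclass
class Line:
    speaker: str      # hypothetical speaker label, mapped to a voice elsewhere
    text: str         # the words to be spoken
    tags: tuple = ()  # audio tags such as "whispers" or "laughs"


def render_line(line: Line) -> str:
    """Prefix the spoken text with bracketed audio tags, e.g. "[sigh] Hello"."""
    prefix = " ".join(f"[{tag}]" for tag in line.tags)
    return f"{prefix} {line.text}".strip()


def render_dialogue(lines: list[Line]) -> list[tuple[str, str]]:
    """Return (speaker, tagged text) pairs so each line can get its own voice."""
    return [(line.speaker, render_line(line)) for line in lines]


def is_long_enough(prompt: str, min_chars: int = MIN_PROMPT_CHARS) -> bool:
    """Check that a prompt exceeds the suggested minimum length."""
    return len(prompt) > min_chars
```

A script would build a list of `Line` objects, call `render_dialogue` to get one tagged string per speaker, and use `is_long_enough` on the combined text to flag prompts that may be too short for consistent results.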