NLP Newsletter


🤖AI Agents Weekly: CodeScientist, Nova Act, Awesome MCP Servers, AWS MCP Servers

CodeScientist, Nova Act, Awesome MCP Servers, AWS MCP Servers

Apr 05, 2025

In today’s issue:

  • AI2 releases CodeScientist

  • Letta AI releases open format for stateful agents

  • OpenHands LM is a new 32B open-source, local coding agent

  • DAIR.AI shared a new guide on getting started with MCP

  • AWS releases AWS MCP Servers

  • Nova Act: Amazon’s browser-native agent model

  • Hugging Face releases YourBench

  • Evaluating AI Agents on replicating AI research

  • Awesome MCP Servers is a curated list of MCP servers

  • AI dev news and much more



Top Stories


CodeScientist

A flow diagram of how CodeScientist generates ideas, executes experiments, and reports results.

Researchers at AI2 release CodeScientist, a system that autonomously generates and tests scientific hypotheses via code-based experimentation. It’s among the first to produce validated discoveries with minimal human input. Key ideas:

  • Code-first scientific agent – CodeScientist reviews research papers and assembles experiments using vetted Python code blocks (e.g., for analysis, simulation). It follows a five-step pipeline: Ideation → Planning → Code Execution → Reporting → Meta-Analysis.

  • Validated AI discoveries – From 50 AI research papers on agents and virtual environments, CodeScientist proposed 19 findings. Of these, 6 were judged scientifically sound and novel. Examples:

    • Confidence ≠ Accuracy – LLM self-assessed confidence in simulations often mismatched actual accuracy.

    • Simpler state = better prediction – Using binary rather than free-text state representations improved model reliability.

    • Graph memory helps – Agents with graph-structured memory outperformed baselines in a scientific simulation game.

  • Human-guided autonomy – Full automation is possible, but brief human feedback (e.g., ranking ideas) significantly boosts output quality. Human-in-the-loop interaction improves idea selection and experiment debugging.

  • Challenges remain – Despite successes, over half the generated experiments fail due to code errors, not scientific flaws. Peer review is still needed to verify results, and current systems lack deep methodological rigor.
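The five-step pipeline above can be sketched as a simple sequence of stages. This is a minimal, hypothetical illustration of that flow; all function and class names are invented for clarity and are not AI2's actual API.

```python
# Hypothetical sketch of a CodeScientist-style pipeline. The stage names
# mirror the five steps described above (Ideation -> Planning -> Code
# Execution -> Reporting -> Meta-Analysis); everything else is illustrative.
from dataclasses import dataclass, field


@dataclass
class Experiment:
    idea: str
    plan: str = ""
    results: dict = field(default_factory=dict)
    report: str = ""


def ideate(papers):
    # Stage 1: propose candidate hypotheses from a corpus of papers.
    return [Experiment(idea=f"hypothesis derived from {p}") for p in papers]


def plan(exp):
    # Stage 2: assemble an experiment plan from vetted code blocks.
    exp.plan = f"plan for: {exp.idea}"
    return exp


def execute(exp):
    # Stage 3: run the generated experiment code. In practice this is the
    # fragile step: the article notes over half of runs fail on code errors.
    exp.results = {"status": "ok", "metric": 0.0}
    return exp


def report(exp):
    # Stage 4: summarize results for human or automated review.
    exp.report = f"{exp.idea}: {exp.results['status']}"
    return exp


def meta_analyze(experiments):
    # Stage 5: aggregate findings across all successfully executed runs.
    return [e.report for e in experiments if e.results.get("status") == "ok"]


papers = ["paper-1", "paper-2"]
finished = [report(execute(plan(e))) for e in ideate(papers)]
print(meta_analyze(finished))
```

A human-in-the-loop variant, as the article suggests, would insert a ranking step between `ideate` and `plan` so a reviewer can prioritize or discard hypotheses before any code runs.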

Paper | Blog | GitHub

© 2025 elvis