🤖 AI Agents Weekly: DeepSeek-OCR, Claude Code on the Web, ChatGPT Atlas Browser,...

DeepSeek-OCR, Claude Code on the Web, ChatGPT Atlas Browser

Oct 25, 2025


In today’s issue:

DeepSeek releases DeepSeek-OCR
Claude Code on the Web
OpenAI launches ChatGPT Atlas
Apple introduces a foundation model for computer-use agents
Anthropic Sandbox Runtime
Web Agents that Learn Tools
Stanford CME 295: Comprehensive LLM Course
Top AI papers, product updates, and more.

Top Stories

DeepSeek-OCR

DeepSeek released DeepSeek-OCR, an open-source optical character recognition model built around contextual optical compression. The project investigates the role of vision encoders from an LLM-centric perspective. Released under the MIT license, it handles text extraction from documents and images and can be used in both commercial and research applications.

Key features include:

Multiple native resolution modes (Tiny: 512×512, Small: 640×640, Base: 1024×1024, Large: 1280×1280), plus a dynamic-resolution “Gundam” mode that combines multiple resolutions
Comprehensive task support, including document-to-markdown conversion, free OCR and text extraction, figure parsing, image description, and element location with grounding capabilities
High-throughput inference, reaching ~2500 tokens/s on A100-40G GPUs when processing PDFs concurrently through vLLM
Native vLLM support (v0.11.1+) for batch processing and HuggingFace Transformers compatibility for flexible implementations
Built on CUDA 11.8 + PyTorch 2.6.0 with Flash Attention 2 support and bfloat16 precision
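
The native resolution modes in the list above can be captured as a small lookup table. The following is a minimal sketch and not part of DeepSeek-OCR itself: the `NATIVE_MODES` table and the `smallest_mode_for` helper are hypothetical names introduced here for illustration, and the selection rule (pick the smallest mode whose side covers the image's longer edge) is an assumption rather than documented model behavior.

```python
from typing import Optional

# Native resolution modes as listed above (square side length in pixels).
NATIVE_MODES = {
    "Tiny": 512,
    "Small": 640,
    "Base": 1024,
    "Large": 1280,
}


def smallest_mode_for(width: int, height: int) -> Optional[str]:
    """Return the smallest native mode covering the image's longer edge.

    Hypothetical helper: returns None when the image exceeds every native
    mode, which is where a dynamic mode such as "Gundam" might be considered.
    """
    longer_edge = max(width, height)
    # Walk the modes from smallest to largest side length.
    for name, side in sorted(NATIVE_MODES.items(), key=lambda kv: kv[1]):
        if longer_edge <= side:
            return name
    return None
```

A smaller mode generally means fewer vision tokens per page, so a helper like this would mainly be useful for batching pages of mixed sizes.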

Developers can integrate DeepSeek-OCR through vLLM for high-throughput batch processing or Transformers for custom implementations. The model uses simple <image> token prompts that can be customized with task-specific instructions, making it accessible for various document understanding and visual text extraction workflows.
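
To make the prompt format concrete, here is a minimal sketch of a prompt builder that prefixes a task instruction with the `<image>` token. The `build_prompt` function, the `TASK_INSTRUCTIONS` table, and the instruction wording are all assumptions made for illustration; the exact strings the model expects should be checked against the project's repository.

```python
# Hypothetical task instructions; the wording is an assumption and should be
# verified against the DeepSeek-OCR repository before use.
TASK_INSTRUCTIONS = {
    "markdown": "Convert the document to markdown.",
    "free_ocr": "Free OCR.",
    "figure": "Parse the figure.",
    "describe": "Describe this image in detail.",
}

# Image placeholder token mentioned in the text above.
IMAGE_TOKEN = "<image>"


def build_prompt(task: str) -> str:
    """Prefix a task-specific instruction with the image placeholder token."""
    try:
        instruction = TASK_INSTRUCTIONS[task]
    except KeyError:
        raise ValueError(f"unknown task: {task!r}") from None
    return f"{IMAGE_TOKEN}\n{instruction}"
```

The resulting string would then be passed as the text prompt, alongside the image, to whichever backend is in use (vLLM or Transformers).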

GitHub
