🤖 AI Agents Weekly: DeepSeek-OCR, Claude Code on the Web, ChatGPT Atlas Browser,...
In today’s issue:
DeepSeek releases DeepSeek-OCR
Claude Code on the Web
OpenAI launches ChatGPT Atlas
Apple introduces a foundation model for computer-use agents
Anthropic Sandbox Runtime
Web Agents that Learn Tools
Stanford CME 295: Comprehensive LLM Course
Top AI papers, product updates, and more.
Top Stories
DeepSeek-OCR
DeepSeek released DeepSeek-OCR, an open-source optical character recognition (OCR) model built around contextual optical compression, an approach that examines the role of the vision encoder from an LLM-centric perspective. It is available under the MIT license, so it can be used for document and image text extraction, along with broader visual understanding tasks, in both commercial and research applications.
Key features include:
Multiple native resolution modes (Tiny: 512×512, Small: 640×640, Base: 1024×1024, Large: 1280×1280), plus a dynamic-resolution “Gundam” mode that combines multiple resolutions
Comprehensive task support, including document-to-markdown conversion, free OCR and text extraction, figure parsing, image description, and element location with grounding capabilities
High-throughput inference: roughly 2,500 tokens/s on an A100-40G GPU when processing PDFs concurrently through vLLM
Native vLLM support (v0.11.1+) for batch processing, plus Hugging Face Transformers compatibility for custom implementations
Built on CUDA 11.8 + PyTorch 2.6.0 with Flash Attention 2 support and bfloat16 precision
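To make the resolution modes concrete, here is a minimal Python sketch that encodes the four native modes listed above and picks one for a given image. The selection rule (smallest mode whose side covers the image's longer edge) and the names `ResolutionMode`, `NATIVE_MODES`, and `pick_native_mode` are assumptions made for illustration; they are not part of DeepSeek-OCR, and the model's own preprocessing may choose differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResolutionMode:
    name: str
    side: int  # side length of the square input, in pixels


# The four native modes named in the feature list, smallest first.
NATIVE_MODES = (
    ResolutionMode("Tiny", 512),
    ResolutionMode("Small", 640),
    ResolutionMode("Base", 1024),
    ResolutionMode("Large", 1280),
)


def pick_native_mode(width: int, height: int) -> ResolutionMode:
    """Hypothetical heuristic: return the smallest native mode whose side
    covers the longer image edge, falling back to the largest mode."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    longest = max(width, height)
    for mode in NATIVE_MODES:
        if longest <= mode.side:
            return mode
    return NATIVE_MODES[-1]


if __name__ == "__main__":
    for size in [(500, 300), (600, 400), (1000, 900), (3000, 2000)]:
        print(size, "->", pick_native_mode(*size).name)
```

Images larger than the Large mode are where a dynamic option such as “Gundam” mode would likely be the better fit; this sketch simply falls back to Large.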
Developers can integrate DeepSeek-OCR through vLLM for high-throughput batch processing or Transformers for custom implementations. The model uses simple <image> token prompts that can be customized with task-specific instructions, making it accessible for various document understanding and visual text extraction workflows.
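As a rough illustration of the prompt format, the Python sketch below builds `<image>`-prefixed prompts for a few of the tasks mentioned above. The instruction strings, the newline separator, and the helper name `build_prompt` are assumptions chosen for illustration; check the model repository for the exact wording it expects before relying on them. The sketch only constructs strings and does not load or call the model.

```python
from typing import Optional

IMAGE_TOKEN = "<image>"

# Illustrative task instructions; the exact wording is an assumption.
TASK_INSTRUCTIONS = {
    "markdown": "Convert the document to markdown.",
    "free_ocr": "Free OCR.",
    "figure": "Parse the figure.",
    "describe": "Describe this image in detail.",
}


def build_prompt(task: str, instruction: Optional[str] = None) -> str:
    """Return an <image>-prefixed prompt for a task.

    A custom instruction, if given, replaces the default text for the task.
    """
    if instruction is None:
        if task not in TASK_INSTRUCTIONS:
            raise ValueError(f"unknown task: {task!r}")
        instruction = TASK_INSTRUCTIONS[task]
    return f"{IMAGE_TOKEN}\n{instruction.strip()}"


if __name__ == "__main__":
    print(build_prompt("free_ocr"))
    print(build_prompt("markdown", "Extract only the tables as markdown."))
```

The resulting string would then be passed, together with the image, to whichever backend is in use (vLLM or Transformers).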

