1). Agents Learn Soccer Skills - applies deep reinforcement learning to synthesize agile soccer skills for a miniature humanoid robot; the resulting policy exhibits dynamic movement skills such as rapid fall recovery, walking, and kicking. (paper | tweet)
2). Scaling Transformer to 1M tokens with RMT - leverages a recurrent memory transformer architecture to increase BERT’s effective context length to two million tokens while maintaining high memory retrieval accuracy; a rough sketch of the segment-level memory recurrence follows this list. (paper | tweet)
3). Track Anything - an interactive tool for video object tracking and segmentation; it’s built on top of Segment Anything and allows flexible tracking and segmentation via user clicks. (paper | tweet)
4). A Cookbook of Self-Supervised Learning - provides an overview of fundamental techniques and key concepts in SSL; it also introduces practical considerations for implementing SSL methods successfully. (paper | tweet)
5). Harnessing the Power of LLMs - a comprehensive and practical guide for practitioners working with LLMs; discusses the practical applications and limitations of LLMs across many real-world use cases. (paper | tweet)
6). AudioGPT - connects ChatGPT with audio foundation models to handle challenging audio tasks, and with a modality transformation interface to enable spoken dialogue. (paper | tweet)
7). DataComp - introduces a new benchmark for multimodal dataset design built around a candidate pool of 12.8B image-text pairs. (paper | tweet)
8). ChatGPT for Information Extraction - provides a systematic assessment of ChatGPT's performance across information extraction tasks. (paper | tweet)
9). Comparing Physician vs ChatGPT - investigates whether chatbot assistants like ChatGPT can provide quality, empathetic responses to patient questions; finds that chatbot responses were preferred over physician responses and rated significantly higher for both quality and empathy. (paper | tweet)
10). Stable and Low-Precision Training for Large-Scale Vision-Language Models - introduces methods for accelerating and stabilizing the training of large-scale vision-language models. (paper | tweet)
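To make the recurrent memory idea behind paper 2 concrete, here is a minimal, illustrative sketch (not the authors' code) of segment-level recurrence: a small set of learned memory tokens is prepended to each input segment, and their transformed outputs are carried forward as the memory for the next segment. The class name, model sizes, and the use of a plain PyTorch transformer encoder are assumptions made for illustration only.

```python
# Illustrative sketch of segment-level memory recurrence in the spirit of the
# Recurrent Memory Transformer (RMT). Names and hyperparameters are assumptions.
import torch
import torch.nn as nn


class RecurrentMemorySketch(nn.Module):
    def __init__(self, d_model=256, n_heads=4, n_layers=2, num_mem_tokens=8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        # Learned initial memory tokens, shared across all sequences.
        self.init_memory = nn.Parameter(torch.randn(num_mem_tokens, d_model))
        self.num_mem_tokens = num_mem_tokens

    def forward(self, segments):
        # segments: list of tensors, each of shape (batch, seg_len, d_model)
        batch = segments[0].size(0)
        memory = self.init_memory.unsqueeze(0).expand(batch, -1, -1)
        outputs = []
        for seg in segments:
            # Prepend the current memory tokens to the segment.
            x = torch.cat([memory, seg], dim=1)
            h = self.encoder(x)
            # The transformed memory positions become the memory for the next segment.
            memory = h[:, :self.num_mem_tokens]
            outputs.append(h[:, self.num_mem_tokens:])
        return torch.cat(outputs, dim=1), memory


# Example: a long sequence processed as 4 segments of 128 tokens each.
model = RecurrentMemorySketch()
segs = [torch.randn(2, 128, 256) for _ in range(4)]
out, final_memory = model(segs)
print(out.shape)  # torch.Size([2, 512, 256])
```

Because only one segment (plus a handful of memory tokens) is attended to at a time, the per-step cost stays constant while information propagates across segments through the memory, which is how this style of architecture can stretch a fixed-context encoder to very long inputs.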