1). Depth Anything - a robust monocular depth estimation solution that can deal with any images under any circumstance; automatically annotates large-scale unlabeled data (~62M) which helps to reduce generalization error; proposes effective strategies to leverage the power of the large-scale unlabeled data; besides generalization ability, it established new state-of-the-art through fine-tuning and even results in an enhanced depth-conditioned ControlNet. (paper | tweet)
2). Knowledge Fusion of LLMs - proposes FuseLLM with the core idea of externalizing knowledge from multiple LLMs and transferring their capabilities to a target LLM; leverages the generative distributions of source LLMs to externalize both their collective knowledge and individual strengths and transfer them to the target LLM through continual training; finds that the FuseLLM can improve the performance of the target model across a range of capabilities such as reasoning, common sense, and code generation. (paper | tweet)
3). MambaByte - adapts Mamba SSM to learn directly from raw bytes; bytes lead to longer sequences which autoregressive Transformers will scale poorly on; this work reports huge benefits related to faster inference and even outperforms subword Transformers. (paper | tweet)
4). Diffuse to Choose - a diffusion-based image-conditioned inpainting model to balance fast inference with high-fidelity while enabling accurate semantic manipulations in a given scene content; outperforms existing zero-shot diffusion inpainting methods and even few-shot diffusion personalization algorithms such as DreamPaint. (paper | tweet)
5). WARM - introduces weighted averaged rewards models (WARM) that involve fine-tuning multiple rewards models and then averaging them in the weight space; average weighting improves efficiency compared to traditional prediction ensembling; it improves the quality and alignment of LLM predictions. (paper | tweet)
6). Resource-efficient LLMs & Multimodal Models - a survey of resource-efficient LLMs and multimodal foundations models; provides a comprehensive analysis and insights into ML efficiency research, including architectures, algorithms, and practical system designs and implementations. (paper | tweet)
7). Red Teaming Visual Language Models - first presents a red teaming dataset of 10 subtasks (e.g., image misleading, multi-modal jailbreaking, face fairness, etc); finds that 10 prominent open-sourced VLMs struggle with the red teaming in different degrees and have up to 31% performance gap with GPT-4V; also applies red teaming alignment to LLaVA-v1.5 with SFT using the proposed red teaming dataset, which improves model performance by 10% in the test set. (paper | tweet)
8). Lumiere - a text-to-video space-time diffusion model for synthesizing videos with realistic and coherent motion; introduces a Space-Time U-Net architecture to generate the entire temporal duration of a video at once via a single pass; achieves state-of-the-art text-to-video generation results and supports a wide range of content creation tasks and video editing applications, including image-to-video, video inpainting, and stylized generation. (paper | tweet)
9). Medusa - a simple framework for LLM inference acceleration using multiple decoding heads that predict multiple subsequent tokens in parallel; parallelization substantially reduces the number of decoding steps; it can achieve over 2.2x speedup without compromising generation quality, while Medusa-2 further improves the speedup to 2.3-3.6x. (paper | tweet)
10). AgentBoard - a comprehensive benchmark with an open-source evaluation framework to perform analytical evaluation of LLM agents; helps to assess the capabilities and limitations of LLM agents and demystifies agent behaviors which leads to building stronger and robust LLM agents. (paper | tweet)
Hey Elvis I'm looking for guest posts to my AI Newsletter and was curious if you'd be interested in doing a compilation of "Top ML Papers of the Month", it might be a good way to get new signups to this Newsletter and your work generally. I think my audience would really love it!
I'd share it on my two biggest Newsletters, one of which is on LinkedIn. Let me know if you are interested. It would also be relatively easy for you to put together considering you already have weekly editions.
This is my guide for guest contributors in how I work: https://aisupremacy.substack.com/p/guest-posts-on-ai-supreamcy