1). Gemini - a series of multimodal models with multimodal reasoning capabilities across text, images, video, audio, and code; claims to outperform human experts on MMLU, a popular benchmark to test the knowledge and problem-solving abilities of AI models; capabilities reported include multimodality, multilinguality, factuality, summarization, math/science, long-context, reasoning, and more. (paper | tweet)
2). EfficientSAM - a lightweight Segment Anything Model (SAM) that exhibits decent performance with largely reduced complexity; leverages masked autoencoders with 20x fewer parameters and 20x faster runtime; EfficientSAM performs within 2 points (44.4 AP vs 46.5 AP) of the original SAM model. (paper | tweet)
3). Magicoder - a series of fully open-source LLMs for code that close the gap with top code models while having no more than 7B parameters; trained on 75K synthetic instruction data; uses open-source references for the production of more diverse, realistic, high-quality, and controllable data; outperforms state-of-the-art code models with similar or even larger sizes on several coding benchmarks, including Python text-to-code generation, multilingual coding, and data-science program completion; MagicoderS-CL-7B based on CodeLlama surpasses ChatGPT on HumanEval+ (66.5 vs. 65.9 in pass@1). (paper | tweet)
4). LLMs on Graphs - a comprehensive overview that summarizes different scenarios where LLMs are used on graphs such as pure graphs, text-rich graphs, and text-paired graphs. (paper | tweet)
5). Llama Guard - an LLM-based safeguard model that involves a small (Llama2-7B) customizable instruction-tuned model that can classify safety risks in prompts and responses for conversational AI agent use cases; the model can be leveraged in a zero-shot or few-shot way if you need to adapt it to a different safety risk taxonomy that meets the requirements for a target use case; it can also be fine-tune on a specific dataset to adapt to a new taxonomy. (paper | tweet)
6). Human-Centered Loss Functions - proposes an approach called Kahneman-Tversky Optimization (KTO) that matches or exceeds DPO performance methods at scales from 1B to 30B; KTO maximizes the utility of LLM generations instead of maximizing the log-likelihood of preferences as most current methods do. (paper | tweet)
7). Chain of Code - a simple extension of the chain-of-thought approach that improves LM code-driven reasoning; it encourages LMs to format semantic sub-tasks in a program as pseudocode that the interpreter can explicitly catch undefined behavior and hand off to simulate with an LLM; on BIG-Bench Hard, Chain of Code achieves 84%, a gain of 12% over Chain of Thought. (paper | tweet)
8). Data Management For LLMs - an overview of current research in data management within both the pretraining and supervised fine-tuning stages of LLMs; it covers different aspects of data management strategy design: data quantity, data quality, domain/task composition, and more. (paper | tweet)
9). RankZephyr - an open-source LLM for listwise zero-shot reranking that bridges the effectiveness gap with GPT-4 and in some cases surpasses the proprietary model; it outperforms GPT-4 on the NovelEval test set, comprising queries and passages past its training period, which addresses concerns about data contamination. (paper | tweet)
10). The Efficiency Spectrum of LLMs - a comprehensive review of algorithmic advancements aimed at improving LLM efficiency; covers various topics related to efficiency, including scaling laws, data utilization, architectural innovations, training and tuning strategies, and inference techniques. (paper | tweet)