1). Zephyr LLM - a 7B-parameter model with performance competitive with ChatGPT on AlpacaEval; applies distilled supervised fine-tuning (dSFT) to improve task accuracy and distilled direct preference optimization (dDPO) on AI feedback data to better align the model; shows performance comparable to 70B-parameter chat models aligned with human feedback. (paper | tweet)
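At the core of dDPO is the standard DPO objective applied to AI-generated preference pairs. Below is a minimal sketch of that loss, assuming sequence-level log-probabilities have already been computed under the policy and a frozen reference model; it is not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """DPO loss on a batch of preference pairs (illustrative sketch).

    Each argument is a tensor of summed token log-probabilities for the
    chosen/rejected responses under the policy or the frozen reference
    model. beta controls how far the policy may drift from the reference.
    """
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the margin between the chosen and rejected implicit rewards.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```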
2). Fact-checking with LLMs - investigates the fact-checking capabilities of LLMs like GPT-4; results show that LLMs perform notably better when equipped with contextual information; GPT-4 outperforms GPT-3, but accuracy varies with query language and claim veracity; overall, LLMs show promise in fact-checking but their accuracy remains inconsistent. (paper | tweet)
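The setup amounts to prompting the model with a claim, optionally augmented with retrieved context. A hedged sketch of such a prompt builder follows; the template and function name are illustrative, not the paper's exact wording, and the LLM call itself is left abstract.

```python
from typing import Optional

def build_fact_check_prompt(claim: str, context: Optional[str] = None) -> str:
    """Build a context-augmented fact-checking prompt (illustrative template)."""
    instructions = (
        "You are a fact-checker. Classify the claim as TRUE or FALSE and "
        "briefly justify your verdict."
    )
    if context:
        # The paper's key finding: accuracy improves when relevant
        # contextual information accompanies the claim.
        return f"{instructions}\n\nContext:\n{context}\n\nClaim: {claim}\nVerdict:"
    return f"{instructions}\n\nClaim: {claim}\nVerdict:"
```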
3). Matryoshka Diffusion Models - introduces an end-to-end framework for high-resolution image and video synthesis; involves a diffusion process that denoises inputs at multiple resolutions jointly and uses a NestedUNet architecture; enables a progressive training schedule from lower to higher resolutions, leading to improvements in optimization for high-resolution generation. (paper | tweet)
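The distinctive piece is the joint multi-resolution denoising objective. Here is a minimal sketch under stated assumptions: `model` is assumed to predict noise at every resolution at once, the noise schedule is omitted, and the NestedUNet internals are not shown; this is not the paper's code.

```python
import torch
import torch.nn.functional as F

def multires_denoising_loss(model, x0, t, resolutions=(64, 128, 256)):
    """Joint denoising loss over multiple resolutions (illustrative sketch).

    `model(noisy_pyramid, t)` is assumed to return a list of noise
    predictions aligned with `resolutions`; x0 is a (B, C, H, W) batch.
    """
    noisy, targets = [], []
    for r in resolutions:
        xr = F.interpolate(x0, size=(r, r), mode="bilinear", align_corners=False)
        eps = torch.randn_like(xr)
        # A real diffusion schedule would scale signal/noise by alpha_t and
        # sigma_t; omitted here for brevity.
        noisy.append(xr + eps)
        targets.append(eps)
    preds = model(noisy, t)
    return sum(F.mse_loss(p, e) for p, e in zip(preds, targets))
```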
4). Spectron - a new approach to spoken language modeling trained end-to-end to directly process spectrograms; it can be fine-tuned to generate high-quality, accurate spoken language; the method surpasses existing spoken language models in speaker preservation and semantic coherence. (paper | tweet)
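Since the model consumes spectrograms rather than discrete audio tokens, the front end is just a standard spectrogram transform. The snippet below shows one such log-mel front end; the parameter values are illustrative, not the paper's.

```python
import torch
import torchaudio

# Spectron operates on spectrograms directly; a log-mel transform like this
# turns raw waveforms into the model's input/output representation.
mel = torchaudio.transforms.MelSpectrogram(
    sample_rate=16_000, n_fft=400, hop_length=160, n_mels=80
)
waveform = torch.randn(1, 16_000)          # 1 second of dummy audio
spec = torch.log(mel(waveform) + 1e-6)     # (1, 80, frames) log-mel spectrogram
print(spec.shape)
```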
5). LLMs Meet New Knowledge - presents a benchmark to assess LLMs' abilities in knowledge understanding, differentiation, and association; benchmark results show (unsurprisingly) that LLMs perform poorly when confronted with new knowledge. (paper | tweet)
Sponsor message
DAIR.AI presents a new cohort-based course, Prompt Engineering for LLMs, that teaches how to effectively use the latest prompt engineering techniques and tools to improve the capabilities, performance, and reliability of LLMs. Enroll here. Feel free to use this 20% discount promo code: MAVENAI20
6). Detecting Pretraining Data from LLMs - explores the problem of pretraining data detection, which aims to determine whether a black-box model was trained on a given text; proposes a detection method named Min-K% Prob as an effective tool for detecting benchmark example contamination, auditing machine unlearning for privacy, and detecting copyrighted text in LMs' pretraining data. (paper | tweet)
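The method's core is simple: score a text by the average log-probability of its k% least likely tokens under the model, then threshold. A minimal sketch, assuming per-token log-probabilities are already available (the threshold value is dataset-dependent and made up here):

```python
import numpy as np

def min_k_prob(token_logprobs, k=0.2):
    """Min-K% Prob score: mean log-probability of the k% least likely tokens.

    A high score suggests the text appeared in pretraining (no token is very
    surprising to the model); a low score suggests it did not.
    """
    logps = np.sort(np.asarray(token_logprobs))   # ascending: rarest tokens first
    n = max(1, int(len(logps) * k))
    return logps[:n].mean()

# Detection is a threshold on the score (illustrative threshold and values).
seen = min_k_prob([-0.1, -0.3, -0.2, -0.5, -0.4]) > -1.0
```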
7). ConvNets Match Vision Transformers - evaluates a performant ConvNet architecture pretrained on JFT-4B at scale; observes a log-log scaling law between held-out loss and compute budget; after fine-tuning on ImageNet, NFNets match the reported performance of Vision Transformers with comparable compute budgets. (paper | tweet)
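A log-log scaling law means loss behaves like a power law in compute, i.e. a straight line in log space. The sketch below fits such a line; the compute and loss values are made up purely to illustrate the fit, not taken from the paper.

```python
import numpy as np

compute = np.array([1e19, 1e20, 1e21, 1e22])   # training FLOPs (illustrative)
loss = np.array([2.8, 2.5, 2.25, 2.05])        # held-out loss (illustrative)

# Fit log(loss) = b * log(compute) + log(a), i.e. loss ≈ a * compute^b.
b, log_a = np.polyfit(np.log(compute), np.log(loss), deg=1)
print(f"loss ≈ {np.exp(log_a):.2f} * compute^{b:.3f}")
```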
8). CommonCanvas - a dataset of Creative-Commons-licensed (CC) images used to train diffusion models competitive with Stable Diffusion 2 (SD2); relies on a transfer-learning technique to produce high-quality synthetic captions paired with the curated CC images; also develops an efficient training recipe requiring as little as 3% of the LAION-2B data used for SD2 while achieving comparable performance. (paper | tweet)
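The transfer-learning step amounts to running a pretrained captioning model over the uncaptioned CC images. A hedged sketch using the Hugging Face image-to-text pipeline; the BLIP-2 checkpoint and image path are one plausible choice for illustration, not necessarily the paper's exact setup.

```python
from transformers import pipeline

# Generate a synthetic caption for a curated CC image with a pretrained
# captioner (assumed setup; "cc_image.jpg" is a placeholder path).
captioner = pipeline("image-to-text", model="Salesforce/blip2-opt-2.7b")
caption = captioner("cc_image.jpg")[0]["generated_text"]
print(caption)
```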
9). Managing AI Risks - a short paper outlining risks from upcoming and advanced AI systems, including an examination of social harms, malicious uses, and other potential societal issues emerging from the rapid adoption of autonomous AI systems. (paper | tweet)
10). Branch-Solve-Merge Reasoning in LLMs - an LLM program consisting of branch, solve, and merge modules, each parameterized with specific prompts to the base LLM; this enables an LLM to plan a decomposition of a task into multiple parallel sub-tasks, solve them independently, and fuse the sub-task solutions; improves evaluation correctness and consistency for multiple LLMs. (paper | tweet)
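As an LLM program, the three modules are just prompted calls to the base model. A minimal sketch of the control flow, assuming `llm` is any callable mapping a prompt string to a completion string; the prompts are placeholders, not the paper's.

```python
def branch_solve_merge(llm, task: str) -> str:
    """Branch-Solve-Merge as an LLM program (illustrative sketch)."""
    # Branch: ask the model to decompose the task into parallel sub-tasks.
    plan = llm(
        f"Decompose the following task into independent sub-tasks, "
        f"one per line:\n{task}"
    )
    subtasks = [line.strip() for line in plan.splitlines() if line.strip()]
    # Solve: handle each sub-task independently.
    solutions = [llm(f"Task: {task}\nSub-task: {s}\nSolution:") for s in subtasks]
    # Merge: fuse the sub-task solutions into a final answer.
    joined = "\n".join(solutions)
    return llm(
        f"Task: {task}\nCombine these partial solutions into one "
        f"final answer:\n{joined}"
    )
```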