1). Llama 2 - a collection of pretrained foundation models and fine-tuned chat models ranging in scale from 7B to 70B parameters; Llama 2-Chat is competitive on a range of tasks and shows strong results on helpfulness and safety. (paper | tweet)
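Since the weights are openly released, trying Llama 2-Chat locally is straightforward. A minimal sketch using Hugging Face transformers, assuming you have been granted access to the gated meta-llama/Llama-2-7b-chat-hf checkpoint and have torch and accelerate installed:

```python
# Minimal sketch: querying Llama 2-Chat via Hugging Face transformers.
# Assumes access to the gated meta-llama/Llama-2-7b-chat-hf checkpoint
# (granted through Meta's license form) and a GPU for fp16 inference.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Llama 2-Chat was fine-tuned with [INST] ... [/INST] turn delimiters.
prompt = "[INST] Explain RLHF in one sentence. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```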
2). How is ChatGPT’s Behavior Changing Over Time? - evaluates the March and June 2023 versions of GPT-3.5 and GPT-4 on various tasks and finds that their behavior and performance vary greatly over time; this includes differences in performance on tasks such as math problem solving, safety-related generations, and code formatting. (paper | tweet)
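The methodology boils down to pinning dated model snapshots and re-running identical prompts against each. A rough sketch with the OpenAI Python client, using a prime-testing prompt in the spirit of the paper's math task; the dated snapshot names below match the versions the paper studied but may since have been retired:

```python
# Rough sketch of the drift-measurement setup: run the same prompt against
# two dated snapshots of the "same" model and compare the answers.
# Assumes the `openai` package and OPENAI_API_KEY in the environment;
# the snapshot names may no longer be served.
from openai import OpenAI

client = OpenAI()
prompt = "Is 17077 a prime number? Think step by step, then answer yes or no."

for snapshot in ["gpt-4-0314", "gpt-4-0613"]:
    resp = client.chat.completions.create(
        model=snapshot,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # near-deterministic decoding for comparability
    )
    print(snapshot, "->", resp.choices[0].message.content[:200])
```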
3). FlashAttention-2 - improves on FlashAttention with better work partitioning and parallelism; reduces non-matmul FLOPs, parallelizes the attention computation over the sequence length to increase GPU occupancy, and partitions work between warps within each thread block to reduce communication through shared memory. (paper | tweet)
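For context, the kernel is exposed through the authors' flash-attn package. A minimal sketch of a call, which requires a CUDA GPU and half-precision tensors:

```python
# Minimal sketch of invoking the FlashAttention-2 kernel via the authors'
# `flash-attn` package. Shapes are (batch, seqlen, nheads, headdim).
import torch
from flash_attn import flash_attn_func

batch, seqlen, nheads, headdim = 2, 4096, 16, 64
q = torch.randn(batch, seqlen, nheads, headdim, dtype=torch.float16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

# One fused kernel: the O(seqlen^2) attention matrix is never materialized in HBM.
out = flash_attn_func(q, k, v, causal=True)
print(out.shape)  # (batch, seqlen, nheads, headdim)
```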
4). Measuring Faithfulness in Chain-of-Thought Reasoning - tests how faithful CoT reasoning is using simple interventions such as adding mistakes to the chain of thought or paraphrasing it, and finds large variation across tasks; demonstrates that as models become larger and more capable, their reasoning tends to become less faithful; suggests that carefully choosing model size and task can help elicit faithful CoT. (paper | tweet)
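To make the intervention concrete, here is a small sketch of the "adding mistakes" test; ask_model is a hypothetical stand-in for any chat-LLM call, not an API from the paper:

```python
# Sketch of the "adding mistakes" faithfulness intervention: corrupt one step
# of a model-generated chain of thought and test whether the final answer
# changes. `ask_model` is a hypothetical placeholder, not the paper's code.
def ask_model(prompt: str) -> str:
    raise NotImplementedError("wire this to an LLM of your choice")

def final_answer(question: str, cot_steps: list[str]) -> str:
    prompt = question + "\n" + "\n".join(cot_steps) + "\nTherefore, the answer is"
    return ask_model(prompt).strip()

def relies_on_cot(question: str, cot_steps: list[str], corrupted_step: str) -> bool:
    """True if injecting an error into the first reasoning step flips the
    final answer -- evidence the model actually uses its stated reasoning."""
    original = final_answer(question, cot_steps)
    corrupted = [corrupted_step] + cot_steps[1:]
    return final_answer(question, corrupted) != original
```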
5). Generative TV & Showrunner Agents - an approach to generating episodic content using LLMs and multi-agent simulation; combines simulation, user input, and powerful AI models to support creative storytelling and improve the quality of AI-generated content. (paper | tweet)
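To illustrate the general shape of such a system, a toy sketch of LLM-driven characters taking turns in a scene; the complete helper, character names, and loop are illustrative assumptions, not the paper's actual system:

```python
# Toy sketch of multi-agent episodic generation in the spirit of the paper.
# `complete` is a hypothetical LLM call; the canned reply keeps this runnable.
def complete(prompt: str) -> str:
    return "(model-generated line)"  # replace with a real LLM call

characters = {
    "Ava": "an optimistic rookie detective",
    "Marsh": "a jaded veteran who distrusts technology",
}

def next_line(speaker: str, persona: str, transcript: list[str]) -> str:
    prompt = (
        f"You are {speaker}, {persona}. Continue this scene with one line.\n"
        + "\n".join(transcript) + f"\n{speaker}:"
    )
    return f"{speaker}: " + complete(prompt)

transcript = ["Scene: a rain-soaked precinct office, late at night."]
for _ in range(3):  # alternate speakers to simulate a short exchange
    for name, persona in characters.items():
        transcript.append(next_line(name, persona, transcript))
print("\n".join(transcript))
```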
6). Challenges & Applications of LLMs - summarizes a comprehensive list of challenges that arise when working with LLMs, ranging from brittle evaluations and prompt brittleness to a lack of robust experimental designs. (paper | tweet)
7). Retentive Network - presents a foundation architecture for LLMs that aims to achieve training parallelism, low-cost inference, and efficient long-sequence modeling; introduces a retention mechanism for sequence modeling that supports a parallel representation (for training), a recurrent representation (for O(1) inference), and a chunkwise recurrent representation (for long sequences). (paper | tweet)
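A minimal sketch of the retention recurrence and its equivalent parallel form (single head, real-valued, omitting the paper's multi-scale decay, group norm, and xPos rotations):

```python
# Retention in two equivalent forms, following the paper's recurrence:
#   S_n = gamma * S_{n-1} + k_n^T v_n,   o_n = q_n S_n
import torch

def retention_parallel(q, k, v, gamma):
    # q, k, v: (seqlen, dim); D[n, m] = gamma^(n-m) for n >= m, else 0.
    n = torch.arange(q.shape[0], dtype=torch.float32)
    decay = gamma ** (n[:, None] - n[None, :])
    mask = (n[:, None] >= n[None, :]).float()
    return ((q @ k.T) * decay * mask) @ v

def retention_recurrent(q, k, v, gamma):
    # O(1) state update per step -- this is what makes inference cheap.
    state = torch.zeros(q.shape[1], v.shape[1])
    outputs = []
    for q_n, k_n, v_n in zip(q, k, v):
        state = gamma * state + torch.outer(k_n, v_n)
        outputs.append(q_n @ state)
    return torch.stack(outputs)

q, k, v = (torch.randn(8, 4) for _ in range(3))
assert torch.allclose(retention_parallel(q, k, v, 0.9),
                      retention_recurrent(q, k, v, 0.9), atol=1e-4)
```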
8). Meta-Transformer - a framework that performs unified learning across 12 modalities; it can handle tasks spanning fundamental perception (text, image, point cloud, audio, video), practical applications (X-ray, infrared, hyperspectral, and IMU), and data mining (graph, tabular, and time-series). (paper | tweet)
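The recipe is roughly: a lightweight tokenizer per modality maps raw data into a common token space, a single frozen Transformer encoder is shared across modalities, and small task heads are trained on top. A schematic sketch in PyTorch, where all dimensions and tokenizers are illustrative:

```python
# Schematic of the Meta-Transformer recipe: per-modality tokenizers feed a
# shared frozen encoder; only tokenizers and task heads are trained.
import torch
import torch.nn as nn

dim = 256
shared_encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True),
    num_layers=4,
)
for p in shared_encoder.parameters():
    p.requires_grad = False  # the unified backbone stays frozen

tokenizers = nn.ModuleDict({
    "text": nn.Embedding(30000, dim),       # token ids -> embeddings
    "image": nn.Linear(16 * 16 * 3, dim),   # flattened patches -> embeddings
    "timeseries": nn.Linear(32, dim),       # windows of 32 steps -> embeddings
})
head = nn.Linear(dim, 10)  # per-task classifier, trained per task

patches = torch.randn(2, 196, 16 * 16 * 3)     # batch of 14x14 patch grids
tokens = tokenizers["image"](patches)
features = shared_encoder(tokens).mean(dim=1)  # pooled sequence representation
logits = head(features)
print(logits.shape)  # (2, 10)
```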
9). Retrieve In-Context Examples for LLMs - presents a framework to iteratively train dense retrievers that identify high-quality in-context examples for LLMs; the approach improves in-context learning performance across a suite of 30 tasks; retrieved examples with similar patterns prove helpful, and the gains are consistent across model sizes. (paper | tweet)
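The inference side of the idea is easy to sketch: embed a pool of labeled examples, retrieve the nearest neighbors of the test input, and prepend them as demonstrations. Below, an off-the-shelf sentence-transformers encoder stands in for the paper's iteratively trained retriever:

```python
# Sketch of retrieval-based demonstration selection for in-context learning.
# An off-the-shelf encoder is a stand-in for the paper's trained retriever.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")
pool = [
    ("The movie was dull.", "negative"),
    ("A joyous, heartfelt film.", "positive"),
    ("Mediocre at best.", "negative"),
]
pool_emb = encoder.encode([x for x, _ in pool], normalize_embeddings=True)

def build_prompt(test_input: str, k: int = 2) -> str:
    q = encoder.encode([test_input], normalize_embeddings=True)[0]
    top = np.argsort(pool_emb @ q)[::-1][:k]  # cosine similarity via dot product
    demos = "\n".join(f"Review: {pool[i][0]}\nLabel: {pool[i][1]}" for i in top)
    return f"{demos}\nReview: {test_input}\nLabel:"

print(build_prompt("An absolutely delightful watch."))
```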
10). FLASK - proposes a fine-grained evaluation protocol for LLMs based on a range of alignment skill sets; covers 12 skills and can provide a holistic view of a model’s performance depending on skill, domain, and level of difficulty; useful for analyzing the factors that make LLMs more proficient at specific skills. (paper | tweet)
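The aggregation itself is simple: an evaluator assigns each response a 1-5 score on the skills annotated for that instance, and results are reported per skill rather than collapsed into a single number. A toy sketch, where the skill names and scores are made up for illustration:

```python
# Toy sketch of FLASK-style fine-grained scoring: per-skill 1-5 evaluator
# scores are aggregated per skill instead of into one overall number.
from collections import defaultdict
from statistics import mean

# (instance_id, skill, evaluator score 1-5) -- illustrative records only
records = [
    (0, "logical_correctness", 4), (0, "factuality", 3),
    (1, "logical_correctness", 2), (1, "conciseness", 5),
    (2, "factuality", 4), (2, "conciseness", 4),
]

per_skill = defaultdict(list)
for _, skill, score in records:
    per_skill[skill].append(score)

for skill, scores in sorted(per_skill.items()):
    print(f"{skill:22s} mean={mean(scores):.2f} n={len(scores)}")
```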