1). LLMs for Discoveries in Mathematical Sciences - uses LLMs to search for new solutions in mathematics & computer science; proposes FunSearch, which pairs a pre-trained LLM with a systematic evaluator and iterates between them to evolve low-scoring programs into high-scoring ones that encode new knowledge; a key finding is that safeguarding against LLM hallucinations is essential for producing genuine mathematical discoveries and for solving other real-world problems. (paper | tweet)
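For intuition, here is a minimal sketch of a FunSearch-style evolve loop; `llm_propose` and `evaluate` are hypothetical stand-ins for the paper's pre-trained LLM and systematic evaluator, and the pool management is simplified:

```python
import heapq
import random

def funsearch_loop(seed_programs, llm_propose, evaluate, iterations=1000, pool_size=20):
    """Evolutionary loop: an LLM mutates promising programs, a deterministic
    evaluator scores them, and invalid (hallucinated) candidates are discarded."""
    # Pool of (score, program) pairs; higher score is better.
    pool = [(evaluate(p), p) for p in seed_programs]
    heapq.heapify(pool)
    for _ in range(iterations):
        # Sample a high-scoring parent to bias the LLM toward good regions.
        parent = max(random.sample(pool, k=min(3, len(pool))))[1]
        child = llm_propose(parent)      # LLM suggests a modified program
        score = evaluate(child)          # the evaluator is the hallucination guard
        if score is None:                # failed to run or produced invalid output
            continue
        heapq.heappush(pool, (score, child))
        if len(pool) > pool_size:
            heapq.heappop(pool)          # drop the weakest program
    return max(pool)
```

The evaluator is what guards against hallucinations here: a candidate that fails to execute or scores poorly is simply discarded rather than trusted.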
2). Weak-to-strong Generalization - studies whether weak model supervision can elicit the full capabilities of stronger models; finds that strong pretrained models naively fine-tuned on labels generated by a weak model consistently outperform their weak supervisors; reports that fine-tuning GPT-4 with a GPT-2-level supervisor recovers close to GPT-3.5-level performance on NLP tasks. (paper | tweet)
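As a toy analogue of the setup (not the paper's experiment), you can train a small "weak" model, label fresh data with it, and fit a larger "strong" model on those pseudo-labels; whether the student actually exceeds its supervisor depends on the student's pretrained representations, which this scikit-learn sketch does not capture:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=20, n_informative=5, random_state=0)
X_weak, X_rest, y_weak, y_rest = train_test_split(X, y, train_size=500, random_state=0)
X_strong, X_test, _, y_test = train_test_split(X_rest, y_rest, train_size=2000, random_state=0)

weak = LogisticRegression(max_iter=200).fit(X_weak, y_weak)  # weak supervisor
pseudo_labels = weak.predict(X_strong)                       # weak labels only

# The "strong" student never sees ground-truth labels, only the weak model's.
strong = GradientBoostingClassifier().fit(X_strong, pseudo_labels)

print("weak supervisor accuracy:", weak.score(X_test, y_test))
print("strong student accuracy: ", strong.score(X_test, y_test))
```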
3). Audiobox - a unified flow-matching model that generates multiple audio modalities; designs description-based and example-based prompting to improve controllability and unify the speech and sound generation paradigms; adapts a self-supervised infilling objective to pre-train on large quantities of unlabeled audio; performs well on speech and sound generation and enables new ways of generating audio with novel vocal and acoustic styles. (paper | tweet)
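For reference, a minimal generic flow-matching training step in PyTorch (not Audiobox's actual architecture or audio featurization); the model learns a velocity field that transports noise to data along straight paths:

```python
import torch
import torch.nn as nn

# Toy velocity-field network over 64-dim features plus a time input.
model = nn.Sequential(nn.Linear(65, 128), nn.ReLU(), nn.Linear(128, 64))

def flow_matching_loss(x1):
    x0 = torch.randn_like(x1)                   # noise sample
    t = torch.rand(x1.shape[0], 1)              # random time in [0, 1]
    xt = (1 - t) * x0 + t * x1                  # point on the straight path
    target_v = x1 - x0                          # ground-truth velocity
    pred_v = model(torch.cat([xt, t], dim=-1))  # condition on time
    return ((pred_v - target_v) ** 2).mean()

x1 = torch.randn(32, 64)  # stand-in for audio features
loss = flow_matching_loss(x1)
loss.backward()
```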
4). Mathematical LLMs - a survey on the progress of LLMs on mathematical tasks; covers papers and resources on LLM research around prompting techniques and tasks such as math word problem-solving and theorem proving. (paper | tweet)
5). Towards Fully Transparent Open-Source LLMs - proposes LLM360 to support open and collaborative AI research by making the end-to-end LLM training process transparent and reproducible; releases two 7B-parameter LLMs pre-trained from scratch, AMBER and CRYSTALCODER, including their training code, data, intermediate checkpoints, and analyses. (paper | tweet)
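The checkpoints are published on the Hugging Face Hub; here is a sketch of loading AMBER with transformers, where the repo id `LLM360/Amber` is my assumption of the released name:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "LLM360/Amber"  # assumed Hub repo id for the released 7B AMBER model
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo)

inputs = tokenizer("The LLM360 project releases", return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```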
6). LLMs in Medicine - a comprehensive survey (analyzing 300+ papers) on LLMs in medicine; includes an overview of the principles, applications, and challenges faced by LLMs in medicine. (paper | tweet)
7). Beyond Human Data for LLMs - proposes a self-training approach with feedback that can substantially reduce dependence on human-generated data; model-generated data filtered by a reward function improves the performance of LLMs on problem-solving tasks. (paper | tweet)
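A sketch of the generate-filter-fine-tune round such self-training methods use; `sample_solutions`, `reward`, and `finetune` are hypothetical stand-ins for the paper's components:

```python
def self_training_round(model, problems, sample_solutions, reward, finetune,
                        samples_per_problem=8, threshold=1.0):
    """One round of self-training: generate candidate solutions, keep only
    those the reward function accepts (e.g., a correct final answer), and
    fine-tune on the surviving model-generated data."""
    dataset = []
    for problem in problems:
        for solution in sample_solutions(model, problem, n=samples_per_problem):
            if reward(problem, solution) >= threshold:  # binary correctness check
                dataset.append((problem, solution))
    # Fine-tune on filtered self-generated data instead of human labels.
    return finetune(model, dataset)
```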
8). Gaussian-SLAM - a neural RGBD SLAM method that photorealistically reconstructs real-world scenes without compromising speed or efficiency; extends classical 3D Gaussians as the scene representation to overcome the limitations of prior methods. (paper | tweet)
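For intuition, a minimal numpy sketch of the 3D Gaussian primitive such methods optimize (the parameterization only, not the paper's SLAM pipeline):

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class Gaussian3D:
    mean: np.ndarray      # (3,) center in world coordinates
    scale: np.ndarray     # (3,) per-axis extent
    rotation: np.ndarray  # (3, 3) orientation matrix
    opacity: float        # alpha used during splatting
    color: np.ndarray     # (3,) RGB

    def covariance(self) -> np.ndarray:
        # Sigma = R S S^T R^T keeps the covariance positive semi-definite.
        S = np.diag(self.scale)
        return self.rotation @ S @ S.T @ self.rotation.T

g = Gaussian3D(mean=np.zeros(3), scale=np.array([0.1, 0.1, 0.02]),
               rotation=np.eye(3), opacity=0.8, color=np.array([0.5, 0.2, 0.1]))
print(g.covariance())
```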
9). Pearl - introduces a production-ready RL agent software package that enables researchers and practitioners to develop RL agents that adapt to environments with limited observability, sparse feedback, and high stochasticity. (paper | tweet)
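Pearl wraps the standard agent-environment interface; this generic Gymnasium loop (not Pearl's actual API) shows the skeleton that such a package layers policy learning, exploration, and replay buffers on top of:

```python
import gymnasium as gym

env = gym.make("CartPole-v1")
obs, info = env.reset(seed=0)
for step in range(500):
    action = env.action_space.sample()  # stand-in for a learned policy
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:         # episode ended; start a new one
        obs, info = env.reset()
env.close()
```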
10). QuIP# - compresses trained model weights into a lower-precision format to reduce memory requirements; the approach combines lattice codebooks with incoherence processing to create 2-bit quantized models; significantly closes the gap between 2-bit quantized LLMs and unquantized 16-bit models. (paper | tweet)
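A toy numpy sketch of codebook quantization, mapping each small group of weights to its nearest codeword so that only integer indices need to be stored; the paper's actual lattice codebooks and incoherence processing are considerably more involved:

```python
import numpy as np

def codebook_quantize(weights, codebook):
    """Map each weight group to its nearest codeword (toy version of
    lattice-codebook quantization; stores only small integer indices)."""
    # weights: (n, d) groups of weights; codebook: (k, d) codewords.
    dists = np.linalg.norm(weights[:, None, :] - codebook[None, :, :], axis=-1)
    idx = dists.argmin(axis=1)
    return idx, codebook[idx]  # compressed indices + dequantized values

rng = np.random.default_rng(0)
W = rng.standard_normal((1024, 4))  # pretend weight groups of size 4
C = rng.standard_normal((256, 4))   # 256 codewords -> 8 bits per group = 2 bits/weight
idx, W_hat = codebook_quantize(W, C)
print("reconstruction MSE:", np.mean((W - W_hat) ** 2))
```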
From what I can tell, the FunSearch paper is a genetic algorithm paired with a "fact checking" LLM. It's a great approach, but it doesn't seem like it's worthy of all the hype it's getting - am I wrong here?
Audiobox and Pearl caught my attention.
Thank you for sharing the list.