1). A Survey on Evaluation of LLMs - a comprehensive overview of evaluation methods for LLMs focusing on what to evaluate, where to evaluate, and how to evaluate. (paper | tweet)
2). How Language Models Use Long Contexts - finds that LM performance is often highest when relevant information occurs at the beginning or end of the input context; performance degrades when relevant information is provided in the middle of a long context. (paper | tweet)
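A minimal sketch of how such a positional probe can be run: slide the gold document through a stack of distractors and compare accuracy per position. The `llm` callable (prompt in, answer out) is a hypothetical stand-in for whatever model API you use.

```python
# Sketch: test "lost in the middle" behavior by moving the gold document
# to different positions among distractors. `llm` is a hypothetical
# callable (prompt -> answer string), not any specific API.

def build_context(gold_doc, distractors, position):
    docs = distractors[:position] + [gold_doc] + distractors[position:]
    return "\n\n".join(f"Document {i + 1}: {d}" for i, d in enumerate(docs))

def positional_accuracy(llm, question, gold_doc, gold_answer, distractors):
    hits = {}
    for pos in range(len(distractors) + 1):  # beginning, middle(s), end
        context = build_context(gold_doc, distractors, pos)
        prompt = f"{context}\n\nQuestion: {question}\nAnswer:"
        hits[pos] = gold_answer.lower() in llm(prompt).lower()
    return hits
```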
3). LLMs as Effective Text Rankers - proposes a prompting technique that enables open-source LLMs to perform state-of-the-art text ranking on standard benchmarks. (paper | tweet)
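One way this style of prompting can work is pairwise comparison: ask the LLM which of two passages better answers the query, then sort with that comparator. A minimal sketch, again assuming a hypothetical `llm` callable; the prompt wording is illustrative, not the paper's exact template.

```python
from functools import cmp_to_key

PROMPT = (
    'Query: "{query}"\n'
    "Passage A: {a}\n"
    "Passage B: {b}\n"
    "Which passage is more relevant to the query? Answer A or B:"
)

def prefers_a(llm, query, a, b):
    # -1 means the LLM prefers passage a, so it sorts earlier (ranks higher).
    answer = llm(PROMPT.format(query=query, a=a, b=b)).strip().upper()
    return -1 if answer.startswith("A") else 1

def rank_passages(llm, query, passages):
    # Comparison sort driven by the LLM: O(n log n) pairwise calls.
    return sorted(passages,
                  key=cmp_to_key(lambda a, b: prefers_a(llm, query, a, b)))
```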
4). Multimodal Generation with Frozen LLMs - introduces an approach that effectively maps images into the token space of LLMs; this lets frozen models like PaLM and GPT-4 tackle visual understanding and generation tasks without any parameter updates, relying on in-context learning instead. (paper | tweet)
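The core move, mapping continuous image features into a frozen LLM's discrete token space, can be sketched as nearest-neighbor quantization against the model's token embedding table. This is an illustrative simplification of the idea, not the paper's architecture.

```python
import torch

def image_to_tokens(image_features: torch.Tensor,
                    token_embeddings: torch.Tensor) -> torch.Tensor:
    """Quantize image patch features to their nearest LLM token embeddings.

    image_features:   (num_patches, d) projected image features
    token_embeddings: (vocab_size, d) frozen LLM input embedding table
    Returns token ids of shape (num_patches,) usable as LLM input.
    """
    feats = torch.nn.functional.normalize(image_features, dim=-1)
    vocab = torch.nn.functional.normalize(token_embeddings, dim=-1)
    return (feats @ vocab.T).argmax(dim=-1)  # nearest token by cosine similarity

# Dummy shapes: 16 patches, 768-dim features, 32k-token vocabulary.
tokens = image_to_tokens(torch.randn(16, 768), torch.randn(32000, 768))
```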
5). CodeGen2.5 - releases a new code LLM trained on 1.5T tokens; the 7B model is on par with code-generation models of more than 15B parameters and is optimized for fast sampling. (paper | tweet)
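A hedged usage sketch with Hugging Face transformers; the checkpoint id and the trust_remote_code requirement for the tokenizer are assumptions based on the release, so verify against the official model card.

```python
# Sketch: sampling from CodeGen2.5 via transformers. The checkpoint id
# "Salesforce/codegen25-7b-mono" and the trust_remote_code flag are
# assumptions; check the official model card before relying on them.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Salesforce/codegen25-7b-mono"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("def fibonacci(n):", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True)
print(tokenizer.decode(outputs[0]))
```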
6). Elastic Decision Transformer - introduces an advancement over the Decision Transformer and its variants by enabling trajectory stitching during action inference at test time; this is achieved by adaptively conditioning on a shorter history length, which allows transitions to diverse and better future states. (paper | tweet)
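The test-time idea reduces to a small search over history lengths: pick the truncation whose estimated achievable return is highest, then act on it. A minimal sketch, where `value_fn` and `policy` are hypothetical stand-ins for the trained model's heads.

```python
# Sketch of elastic history selection at inference time. `value_fn`
# (estimated best achievable return given a history) and `policy` are
# hypothetical stand-ins for the trained transformer.

def elastic_action(history, value_fn, policy, candidate_lengths):
    best_len = max(candidate_lengths, key=lambda n: value_fn(history[-n:]))
    return policy(history[-best_len:])  # shorter history -> easier stitching
```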
7). Robots That Ask for Help - presents a framework to measure and align the uncertainty of LLM-based planners that ask for help when needed. (paper | tweet)
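The paper calibrates the planner's confidence with conformal prediction: the LLM scores candidate next actions, and if more than one action survives the calibrated threshold, the robot asks a human. A minimal sketch; here the threshold `qhat` is simply passed in rather than conformally calibrated.

```python
import math

def prediction_set(option_logprobs: dict, qhat: float) -> set:
    """Keep options whose normalized probability clears 1 - qhat.
    In the paper the threshold comes from conformal calibration;
    here it is a plain parameter for illustration."""
    total = sum(math.exp(lp) for lp in option_logprobs.values())
    return {opt for opt, lp in option_logprobs.items()
            if math.exp(lp) / total >= 1 - qhat}

def plan_step(option_logprobs, qhat=0.8):
    options = prediction_set(option_logprobs, qhat)
    if len(options) == 1:
        return options.pop()  # confident: act autonomously
    return f"ask for help, ambiguous between {sorted(options)}"
```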
8). Physics-based Motion Retargeting in Real-Time - proposes a method that uses reinforcement learning to train a policy to control characters in a physics simulator; it retargets motions in real-time from sparse human sensor data to characters of various morphologies. (paper | tweet)
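The implied control loop each frame: sparse sensor readings plus the simulated character's state go into the learned policy, which outputs joint targets for the physics simulator. The `policy` and `sim` interfaces below are hypothetical, not the paper's API.

```python
import numpy as np

def retarget_step(policy, sim, sensor_reading: np.ndarray):
    # Hypothetical interfaces: sim exposes character state and PD control;
    # policy is an RL-trained mapping from observations to joint targets.
    state = sim.character_state()          # joint angles, velocities, ...
    obs = np.concatenate([sensor_reading, state])
    joint_targets = policy(obs)
    sim.apply_pd_targets(joint_targets)    # PD control toward the targets
    sim.step()                             # advance the physics simulation
```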
9). Scaling Transformer to 1 Billion Tokens - presents LongNet, a Transformer variant that can scale sequence length to more than 1 billion tokens without sacrificing performance on shorter sequences. (paper | tweet)
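LongNet's key ingredient is dilated attention: the sequence is split into segments and, within each segment, only every r-th position attends, with multiple (segment length, dilation) branches combined to cover all positions. A sketch of a single branch; the real model mixes several branches across heads.

```python
import torch

def dilated_attention_branch(q, k, v, segment_len: int, dilation: int):
    """One (segment_len, dilation) branch of dilated attention.
    q, k, v: (seq_len, d); seq_len must be divisible by segment_len.
    Positions skipped by this branch stay zero; LongNet covers them
    by mixing branches with different dilation rates."""
    seq_len, d = q.shape
    out = torch.zeros_like(q)
    for start in range(0, seq_len, segment_len):
        idx = torch.arange(start, start + segment_len, dilation)
        qs, ks, vs = q[idx], k[idx], v[idx]
        attn = torch.softmax(qs @ ks.T / d ** 0.5, dim=-1)
        out[idx] = attn @ vs
    return out

out = dilated_attention_branch(torch.randn(64, 32), torch.randn(64, 32),
                               torch.randn(64, 32), segment_len=16, dilation=4)
```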
10). InterCode - introduces a framework that casts interactive coding as a reinforcement learning environment; this differs from typical coding benchmarks, which treat code generation as a static sequence-to-sequence process. (paper | tweet)
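The framing is a standard RL loop where the action is code, the observation is the execution output, and the reward comes from checking the result. A toy, self-contained environment in that spirit (not InterCode's actual API):

```python
import contextlib
import io

class ToyCodeEnv:
    """Toy interactive-coding environment: action = code string,
    observation = captured stdout, reward = 1.0 if the task check passes.
    Illustrative only; not InterCode's API (and exec() is unsafe for
    untrusted code outside a sandbox)."""

    def __init__(self, task_check):
        self.task_check = task_check  # callable: stdout -> bool

    def step(self, code: str):
        buf = io.StringIO()
        try:
            with contextlib.redirect_stdout(buf):
                exec(code, {})  # execute the agent's action
            observation = buf.getvalue()
        except Exception as e:
            observation = f"Error: {e}"
        reward = 1.0 if self.task_check(observation) else 0.0
        return observation, reward, reward == 1.0  # obs, reward, done

env = ToyCodeEnv(task_check=lambda out: out.strip() == "8")
obs, reward, done = env.step("print(3 + 5)")
```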