1). Hallucination in LLMs - a comprehensive survey (50+ pages) on hallucination in LLMs; covers principles, taxonomy, challenges, and open questions related to hallucination in LLMs. (paper | tweet)
2). Simplifying Transformer Blocks - explores simplifying the transformer block and finds that many block components can be removed with no loss of training speed; on both autoregressive decoder-only and BERT encoder-only models, the simplified blocks match the per-update training speed and performance of standard transformers, while achieving 15% faster training throughput and using 15% fewer parameters. (paper | tweet)
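As a rough illustration of the kind of simplification studied, here is a minimal PyTorch sketch of an attention sub-block with the value/output projections and the attention skip connection removed; the layout, normalization placement, and hyperparameters are illustrative assumptions, not the authors' reference block.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimplifiedDecoderBlock(nn.Module):
    """Sketch of a block with the value/output projections and the attention
    skip connection removed; layout and hyperparameters are illustrative."""
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.n_heads = n_heads
        self.q_proj = nn.Linear(d_model, d_model, bias=False)
        self.k_proj = nn.Linear(d_model, d_model, bias=False)
        # no value or output projection: attention mixes the normalized inputs directly
        self.norm = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        h = self.norm(x)
        q = self.q_proj(h).view(b, t, self.n_heads, -1).transpose(1, 2)
        k = self.k_proj(h).view(b, t, self.n_heads, -1).transpose(1, 2)
        v = h.view(b, t, self.n_heads, -1).transpose(1, 2)  # values are the inputs themselves
        attn = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        attn = attn.transpose(1, 2).reshape(b, t, d)
        # no residual connection around the attention sub-block
        return attn + self.mlp(attn)
```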
3). Understanding In-Context Learning Abilities in Transformers - investigates how effectively transformers can bridge between the tasks in their pretraining data mixture to identify and learn new tasks in-context, both inside and outside the pretraining distribution; in the regimes studied, there is limited evidence that the models’ in-context learning behavior generalizes beyond their pretraining data. (paper | tweet)
4). MusicGen - a single-stage transformer-based LLM that operates over several streams of compressed discrete music representation; it can generate high-quality samples (mono and stereo) while conditioning on textual description or melodic features. (paper | tweet)
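For context, text-conditioned generation with the released checkpoints looks roughly like the following, using Meta's audiocraft library (checkpoint and method names may differ slightly across audiocraft versions):

```python
# Hedged sketch following the audiocraft README; names may vary by version.
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

model = MusicGen.get_pretrained("facebook/musicgen-small")   # single-stage LM over compressed token streams
model.set_generation_params(duration=8)                      # seconds of audio to generate
wavs = model.generate(["lo-fi hip hop beat with warm piano chords"])  # text conditioning

for i, wav in enumerate(wavs):
    # writes sample_0.wav with loudness normalization
    audio_write(f"sample_{i}", wav.cpu(), model.sample_rate, strategy="loudness")
```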
5). AltUp - a method that makes it possible to take advantage of increasing scale and capacity in Transformer models without increasing the computational cost; achieved by working on a subblock of the widened representation at each layer and using a predict-and-correct mechanism to update the inactivated blocks; it widens the learned representation while incurring only a negligible increase in latency. (paper | tweet)
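A loose sketch of the alternating-updates idea, with the prediction and correction maps modeled as simple learned mixing weights (my assumption, not the paper's implementation):

```python
import torch
import torch.nn as nn

class AltUpLayer(nn.Module):
    """Loose sketch of alternating updates: the widened representation is kept as
    K sub-blocks, only one sub-block goes through the expensive transformer layer,
    and a cheap predict-and-correct step updates the rest. Names and the exact
    form of the prediction/correction maps are assumptions, not the paper's code."""
    def __init__(self, layer: nn.Module, k: int = 2, activated: int = 0):
        super().__init__()
        self.layer = layer        # an ordinary transformer block over (batch, seq, d_model)
        self.k, self.activated = k, activated
        self.predict_mix = nn.Parameter(torch.eye(k) + 0.01 * torch.randn(k, k))
        self.correct_scale = nn.Parameter(torch.ones(k))

    def forward(self, blocks):    # blocks: list of K tensors, each (batch, seq, d_model)
        x = torch.stack(blocks, dim=0)                           # (K, B, T, D)
        # predict: estimate every sub-block's update as a learned mix of current sub-blocks
        pred = torch.einsum("kj,jbtd->kbtd", self.predict_mix, x)
        # compute: run the expensive layer on the single activated sub-block only
        computed = self.layer(x[self.activated])
        # correct: nudge all predictions toward the error observed on the activated sub-block
        delta = computed - pred[self.activated]
        corrected = pred + self.correct_scale.view(-1, 1, 1, 1) * delta
        return list(corrected.unbind(dim=0))

# e.g. layer = nn.TransformerEncoderLayer(d_model=256, nhead=4, batch_first=True)
#      AltUpLayer(layer, k=2)([torch.randn(2, 16, 256), torch.randn(2, 16, 256)])
```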
Sponsor Message
Giskard’s LLM Testing solution is a powerful open-source testing framework for ML models & LLMs that allows developers and researchers to detect hallucinations & biases automatically. Learn more about the tool in this quickstart tutorial.
6). Rephrase and Respond - an effective prompting method that uses LLMs to rephrase and expand questions posed by humans to improve overall performance; it can improve the performance of different models across a wide range of tasks; the approach can be combined with chain-of-thought to improve performance further. (paper | tweet)
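A minimal sketch of the two-step variant, assuming any text-in/text-out LLM call is available as `complete`; the prompt wording is paraphrased rather than the paper's exact template:

```python
def rephrase_and_respond(question: str, complete) -> str:
    """Two-step Rephrase-and-Respond sketch. `complete` is a placeholder for any
    text-in/text-out LLM call; the prompt wording is paraphrased, not the paper's
    verbatim template."""
    # Step 1: have the LLM rephrase and expand the human-written question.
    rephrased = complete(
        f"{question}\n\nGiven the above question, rephrase and expand it to help "
        "you better answer it. Keep all the information in the original question."
    )
    # Step 2: answer using both the original and the rephrased question.
    return complete(
        f"Original question: {question}\n"
        f"Rephrased question: {rephrased}\n"
        "Use your rephrased question to answer the original question."
    )

# usage: rephrase_and_respond("Was the person born in an even month?", my_llm_call)
```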
7). On the Road with GPT-4V(ision) - provides an exhaustive evaluation of the latest state-of-the-art visual language model, GPT-4V(ision), and its application in autonomous driving; the model demonstrates superior performance in scene understanding and causal reasoning compared to existing autonomous systems. (paper | tweet)
8). GPT4All - outlines technical details of the GPT4All model family along with the open-source repository that aims to democratize access to LLMs. (paper | tweet)
9). S-LoRA - an approach that enables the scalable serving of many LoRA adapters; it stores all adapters in main memory and fetches the adapters of currently running queries to GPU memory; employs a novel tensor parallelism strategy and highly optimized custom CUDA kernels for heterogeneous batching of LoRA computation; improves throughput by 4x compared to other solutions and increases the number of served adapters by several orders of magnitude. (paper | tweet)
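A minimal sketch of the main memory-to-GPU adapter paging idea, leaving out the unified paging, custom kernels, and tensor parallelism that do the heavy lifting in the real system:

```python
import torch

class AdapterCache:
    """Sketch of host-to-GPU adapter paging: all LoRA adapters live in CPU memory
    and only those referenced by the currently running queries are copied to the
    GPU. The real system adds unified paging, custom CUDA kernels for heterogeneous
    batching, and tensor parallelism, none of which is shown here."""
    def __init__(self, cpu_adapters, device="cuda"):
        self.cpu_adapters = cpu_adapters   # adapter_id -> (A, B) tensors in (pinned) CPU memory
        self.gpu_adapters = {}
        self.device = device

    def fetch(self, adapter_ids):
        # evict adapters no longer needed by the running batch
        for aid in list(self.gpu_adapters):
            if aid not in adapter_ids:
                del self.gpu_adapters[aid]
        # copy missing adapters host -> GPU
        for aid in adapter_ids:
            if aid not in self.gpu_adapters:
                a, b = self.cpu_adapters[aid]
                self.gpu_adapters[aid] = (a.to(self.device, non_blocking=True),
                                          b.to(self.device, non_blocking=True))
        return self.gpu_adapters

def lora_delta(x, a, b, scale=1.0):
    """Per-request LoRA contribution added on top of the shared base-model output."""
    return scale * (x @ a) @ b
```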
10). FreshLLMs - proposes a dynamic QA benchmark (FreshQA) to test the factuality of LLM-generated text; proposes FreshPrompt, a simple few-shot prompting method that substantially boosts the performance of an LLM on FreshQA by incorporating relevant and up-to-date information retrieved from a search engine into the prompt; finds that instructing the LLM to generate concise and direct answers helps reduce hallucination compared to encouraging more verbose answers. (paper | tweet)
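A minimal sketch of how a FreshPrompt-style prompt might be assembled from search results, with evidence ordered so the most recent items sit closest to the question (template paraphrased, not the paper's exact prompt):

```python
def fresh_prompt(question: str, evidences: list, today: str) -> str:
    """Sketch of a FreshPrompt-style prompt: retrieved search results are formatted
    with source and date and placed before the question, most recent evidence last
    (closest to the question), and the model is asked for a concise, direct answer."""
    # assumes ISO-formatted dates so string sort is chronological
    evidences = sorted(evidences, key=lambda e: e["date"])
    parts = [f"source: {e['source']}\ndate: {e['date']}\nsnippet: {e['snippet']}\n"
             for e in evidences]
    parts.append(f"query: {question}")
    parts.append(f"As of today ({today}), answer the query with a concise and direct answer.")
    return "\n".join(parts)

# evidences would come from a search-engine API, e.g.
# [{"source": "example.com", "date": "2023-11-05", "snippet": "..."}]
```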