1). OLMo - introduces Open Language Model (OLMo), a 7B parameter model; it includes open training code, open data, full model weights, evaluation code, and fine-tuning code; it shows strong performance on many generative tasks; there is also a smaller version of it, OLMo 1B. (paper | tweet)
2). Advances in Multimodal LLMs - a comprehensive survey outlining design formulations for model architecture and training pipeline around multimodal large language models. (paper | tweet)
3). Corrective RAG - proposes Corrective Retrieval Augmented Generation (CRAG) to improve the robustness of generation in a RAG system; the core idea is to implement a self-correct component for the retriever and improve the utilization of retrieved documents for augmenting generation; the retrieval evaluator helps to assess the overall quality of retrieved documents given a query; using web search and optimized knowledge utilization operations can improve automatic self-correction and efficient utilization of retrieved documents. (paper | tweet)
4). LLMs for Mathematical Reasoning - introduces an overview of research developments in LLMs for mathematical reasoning; discusses advancements, capabilities, limitations, and applications to inspire ongoing research on LLMs for Mathematics. (paper | tweet)
5). Compression Algorithms for LLMs - covers compression algorithms like pruning, quantization, knowledge distillation, low-rank approximation, parameter sharing, and efficient architecture design. (paper | tweet)
6). MoE-LLaVA - employs Mixture of Experts tuning for Large Vision-Language Models which constructs a sparse model with a substantial reduction in parameters with a constant computational cost; this approach also helps to address performance degradation associated with multi-modal learning and model sparsity. (paper | tweet)
7). Rephrasing the Web - uses an off-the-shelf instruction-tuned model prompted to paraphrase web documents in specific styles and formats such as “like Wikipedia” or “question-answer format” to jointly pre-train LLMs on real and synthetic rephrases; it speeds up pre-training by ~3x, improves perplexity, and improves zero-shot question answering accuracy on many tasks. (paper | tweet)
8). Redefining Retrieval in RAG - a study that focuses on the components needed to improve the retrieval component of a RAG system; confirms that the position of relevant information should be placed near the query, the model will struggle to attend to the information if this is not the case; surprisingly, it finds that related documents don't necessarily lead to improved performance for the RAG system; even more unexpectedly, irrelevant and noisy documents can help drive up accuracy if placed correctly. (paper | tweet)
9). Hallucination in LVLMs - discusses hallucination issues and techniques to mitigate hallucination in Large Vision-Language Models (LVLM); it introduces LVLM hallucination evaluation methods and benchmarks; provides tips and a good analysis of the causes of LVLM hallucinations and potential ways to mitigate them. (paper | tweet)
10). SliceGPT - a new LLM compression technique that proposes a post-training sparsification scheme that replaces each weight matrix with a smaller dense matrix; helps reduce the embedding dimension of the network and can remove up to 20% of model parameters for Llama2-70B and Phi-2 models while retaining most of the zero-shot performance of the dense models. (paper | tweet)