1). Emu Video and Emu Edit - present new models for controlled image editing and text-to-video generation based on diffusion models; Emu Video can generate high-quality video by using text-only, image-only, or combined text and image inputs; Emu Edit enables free-form editing through text instructions. (papers | tweet)
2). Chain-of-Note - an approach to improve the robustness and reliability of retrieval-augmented language models in facing noisy, irrelevant documents and in handling unknown scenarios; CoN generates sequential reading notes for the retrieved documents, enabling an evaluation of their relevance to the given question and integrating this information to formulate the final answer; CoN significantly outperforms standard retrieval-augmented language models and achieves an average improvement of +7.9 in EM score given entirely noisy retrieved documents and +10.5 in rejection rates for real-time questions that fall outside the pre-training knowledge scope. (paper | tweet)
3). LLMs for Scientific Discovery - explores the impact of large language models, particularly GPT-4, across various scientific fields including drug discovery, biology, and computational chemistry; assesses GPT-4's understanding of complex scientific concepts, its problem-solving capabilities, and its potential to advance scientific research through expert-driven case assessments and benchmark testing. (paper | tweet)
4). Fine-Tuning LLMs for Factuality - fine-tunes language model for factuality without requiring human labeling; it learns from automatically generated factuality preference rankings and targets open-ended generation settings; it significantly improves the factuality of Llama-2 on held-out topics compared with RLHF or decoding strategies targeted at factuality. (paper | tweet)
5). Contrastive CoT Prompting - proposes a contrastive chain of thought method to enhance language model reasoning; the approach provides both valid and invalid reasoning demonstrations, to guide the model to reason step-by-step while reducing reasoning mistakes; also proposes an automatic method to construct contrastive demonstrations and demonstrates improvements over CoT prompting. (paper | tweet)
6). A Survey on Language Models for Code - provides an overview of LLMs for code, including a review of 50+ models, 30+ evaluation tasks, and 500 related works. (paper | tweet)
7). JARVIS-1 - an open-world agent that can perceive multimodal input (visual observations and human instructions), generate sophisticated plans, and perform embodied control, within the open-world Minecraft universe; exhibits near-perfect performances across over 200 tasks in Minecraft Universe; achieves a completion rate of 12.5% in the long-horizon diamond pickaxe task, which is a 5x increase compared to previous records. (paper | tweet)
8). Learning to Filter Context for RAG - proposes a method that improves the quality of the context provided to the generator via two steps: 1) identifying useful context based on lexical and information-theoretic approaches, and 2) training context filtering models that can filter retrieved contexts at inference; outperforms existing approaches on extractive question answering (QA), complex multi-hop and long-form QA, fact verification, and dialog generation tasks. (paper | tweet)
9). MART - proposes an approach for improving LLM safety with multi-round automatic red-teaming; incorporates automatic adversarial prompt writing and safe response generation, which increases red-teaming scalability and the safety of LLMs; violation rate of an LLM with limited safety alignment reduces up to 84.7% after 4 rounds of MART, achieving comparable performance to LLMs with extensive adversarial prompt writing. (paper | tweet)
10). LLMs can Deceive Users - explores the use of an autonomous stock trading agent powered by LLMs; finds that the agent acts upon insider tips and hides the reason behind the trading decision; shows that helpful and safe LLMs can strategically deceive users in a realistic situation without direction instructions or training for deception. (paper | tweet)