1). Universal Adversarial LLM Attacks - finds universal and transferable adversarial attacks that cause aligned models like ChatGPT and Bard to generate objectionable behaviors; the approach automatically produces adversarial suffixes using a combination of greedy and gradient-based search (a simplified sketch of this style of suffix search follows the list). (paper | tweet)
2). RT-2 - a new end-to-end vision-language-action model that learns from both web and robotics data; this enables the model to translate the learned knowledge into generalized instructions for robotic control. (paper | tweet)
3). Med-PaLM Multimodal - introduces a new multimodal biomedical benchmark spanning 14 different tasks; presents Med-PaLM Multimodal, a proof of concept for a generalist biomedical AI system that handles different types of biomedical data such as clinical text, imaging, and genomics. (paper | tweet)
4). Tracking Anything in High Quality - proposes a framework for tracking anything in videos with high quality; it consists of a video multi-object segmenter and a pretrained mask refiner model that refines the tracking results; the model ranks 2nd in the VOTS2023 challenge. (paper | tweet)
5). Foundation Models in Vision - presents a survey and outlook discussing open challenges and research directions for foundational models in computer vision. (paper | tweet)
6). L-Eval - a standardized evaluation for long-context language models containing 411 long documents and over 2,000 query-response pairs encompassing areas such as law, finance, school lectures, long conversations, novels, and meetings. (paper | tweet)
7). LoraHub - introduces LoraHub for efficient cross-task generalization via dynamic LoRA composition; it combines LoRA modules trained on other tasks without human expertise or additional parameters/gradients and mimics the performance of in-context learning in few-shot scenarios (a toy composition sketch follows the list). (paper | tweet)
8). Survey of Aligned LLMs - presents a comprehensive overview of alignment approaches, including aspects like data collection, training methodologies, and model evaluation. (paper | tweet)
9). WavJourney - leverages LLMs to connect various audio models to compose audio content for engaging storytelling; this involves an explainable and interactive design that enhances creative control in audio production. (paper | tweet)
10). FacTool - a task- and domain-agnostic framework for factuality detection of text generated by LLMs; the effectiveness of the approach is tested on tasks such as code generation and mathematical reasoning; a benchmark dataset and a ChatGPT plugin are also released (a rough pipeline sketch follows the list). (paper | tweet)
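For readers curious what the suffix search in item 1 roughly looks like, here is a heavily simplified sketch of a greedy, gradient-guided suffix search against a generic HuggingFace causal LM. The prompt, target string, model choice, step counts, and the exhaustive single-swap scan are all illustrative stand-ins; the paper's actual GCG procedure batches candidate evaluation and differs in many details.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder model; the paper attacks aligned chat models
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()
for p in model.parameters():
    p.requires_grad_(False)
emb_w = model.get_input_embeddings().weight  # (vocab_size, hidden_dim)

prompt_ids = tok("<user request goes here>", return_tensors="pt").input_ids[0]
target_ids = tok(" Sure, here is", return_tensors="pt").input_ids[0]  # desired reply prefix
suffix_ids = tok(" ! ! ! ! ! ! ! !", return_tensors="pt").input_ids[0]  # initial suffix

@torch.no_grad()
def target_loss(suffix):
    """Cross-entropy of the target tokens given prompt + adversarial suffix."""
    ids = torch.cat([prompt_ids, suffix, target_ids]).unsqueeze(0)
    logits = model(ids).logits[0]
    start = ids.shape[1] - len(target_ids)
    return torch.nn.functional.cross_entropy(logits[start - 1:-1], target_ids).item()

def suffix_token_grads(suffix):
    """Gradient of the target loss w.r.t. a one-hot encoding of the suffix tokens."""
    one_hot = torch.nn.functional.one_hot(suffix, emb_w.shape[0]).to(emb_w.dtype)
    one_hot.requires_grad_(True)
    embeds = torch.cat([emb_w[prompt_ids], one_hot @ emb_w, emb_w[target_ids]]).unsqueeze(0)
    logits = model(inputs_embeds=embeds).logits[0]
    start = len(prompt_ids) + len(suffix)
    loss = torch.nn.functional.cross_entropy(logits[start - 1:-1], target_ids)
    loss.backward()
    return one_hot.grad

for step in range(20):  # a few greedy coordinate-descent steps
    grads = suffix_token_grads(suffix_ids)
    candidates = (-grads).topk(8, dim=1).indices  # top-k promising swaps per position
    best, best_loss = suffix_ids, target_loss(suffix_ids)
    for pos in range(len(suffix_ids)):            # greedily try single-token swaps
        for cand in candidates[pos]:
            trial = suffix_ids.clone()
            trial[pos] = cand
            loss = target_loss(trial)
            if loss < best_loss:
                best, best_loss = trial, loss
    suffix_ids = best
    print(step, best_loss, tok.decode(suffix_ids))
```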
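The LoRA composition idea behind item 7 can also be illustrated with a small toy example: element-wise weighted merging of several LoRA modules, with the mixing weights chosen by gradient-free search on a few-shot objective. Everything below is a stand-in; the matrices are random, the few-shot loss is a dummy, and plain random search replaces the gradient-free optimizer used in the paper.

```python
import torch

rank, d_in, d_out, n_modules = 8, 64, 64, 4

# pretend these LoRA modules (A: d_in x r, B: r x d_out) were trained on other tasks
modules = [(torch.randn(d_in, rank) * 0.02, torch.randn(rank, d_out) * 0.02)
           for _ in range(n_modules)]

def merge(weights):
    """Element-wise weighted combination of the LoRA A and B matrices."""
    A = sum(w * a for w, (a, _) in zip(weights, modules))
    B = sum(w * b for w, (_, b) in zip(weights, modules))
    return A, B

def few_shot_loss(A, B):
    # Stand-in objective: in LoraHub this would be the LLM's loss on a handful of
    # examples from the target task with the merged LoRA delta (A @ B) applied.
    x = torch.randn(16, d_in)
    target = torch.zeros(16, d_out)
    return torch.nn.functional.mse_loss(x @ A @ B, target).item()

# Gradient-free search over the composition weights (random search here only to
# keep the sketch short; the paper uses a proper gradient-free optimizer).
best_w, best_loss = torch.zeros(n_modules), float("inf")
for _ in range(200):
    w = best_w + 0.3 * torch.randn(n_modules)
    loss = few_shot_loss(*merge(w))
    if loss < best_loss:
        best_w, best_loss = w, loss
print(best_w, best_loss)
```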
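Finally, the overall shape of the factuality-checking pipeline in item 10 (extract claims, generate queries, gather evidence, verify) can be sketched in a few lines. The call_llm and search_evidence helpers below are hypothetical stand-ins, not FacTool's actual prompts or interfaces; they return canned strings so the sketch runs end to end, and would be swapped for a real LLM API and search tool in practice.

```python
import json

def call_llm(prompt: str) -> str:
    # Hypothetical stand-in; replace with a real LLM API call.
    return '["placeholder claim extracted from the text"]'

def search_evidence(query: str) -> list[str]:
    # Hypothetical stand-in; replace with a search or knowledge-base lookup.
    return ["placeholder evidence snippet"]

def check_factuality(generated_text: str) -> list[dict]:
    claims = json.loads(call_llm(
        f"List the factual claims in the following text as a JSON array:\n{generated_text}"))
    results = []
    for claim in claims:
        query = call_llm(f"Write a short search query to verify: {claim}")
        evidence = search_evidence(query)
        verdict = call_llm(
            "Given the evidence, answer SUPPORTED or REFUTED.\n"
            f"Claim: {claim}\nEvidence: {evidence}")
        results.append({"claim": claim, "evidence": evidence, "verdict": verdict})
    return results

print(check_factuality("The Eiffel Tower is in Berlin."))
```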
Reach out to elvis if you would like to sponsor the next issue of the newsletter. We can help promote your tool, research, company, etc.
I'm particularly excited about the Med-PaLM Multimodal and L-Eval papers, as they seem to have great potential in the biomedical and long-context language model domains. Looking forward to more insightful updates.