1). DINOv2 - a new method for training high-performance computer vision models based on self-supervised learning; enables learning rich and robust visual features without supervision which are useful for both image-level visual tasks and pixel-level tasks; tasks supported include image classification, instance retrieval, video understanding, depth estimation, and much more. (paper | tweet)
2). Compressing Prompts with Gist Tokens - an approach that trains language models to compress prompts into gist tokens reused for compute efficiency; this approach enables 26x compression of prompts, resulting in up to 40% FLOPs reductions. (paper | tweet)
3). Deep Learning for Large-Scale Biomolecular Dynamics - presents a framework for large-scale biomolecular simulation; this is achieved through the high accuracy of equivariant deep learning and the ability to scale to large and long simulations; the system is able to “perform nanoseconds-long stable simulations of protein dynamics and scale up to a 44-million atom structure of a complete, all-atom, explicitly solvated HIV capsid on the Perlmutter supercomputer.” (paper | tweet)
4). Evaluating Verifiability in Generative Search Engines - performs human evaluation to audit popular generative search engines such as Bing Chat, Perplexity AI, and NeevaAI; finds that, on average, only 52% of generated sentences are supported by citations and 75% of citations support their associated sentence. (paper | tweet)
5). Generative Disco - an AI system based on LLMs and text-to-image models that generates music visualizations. (paper | tweet)
6). A Survey of Topological Neural Networks. (paper | tweet)
7) Visual Instruction Tuning - presents an approach that uses language-only GPT-4 to generate multimodal language-image instruction-following data; applies instruction tuning with the data and introduces LLaVA, an end-to-end trained large multimodal model for general-purpose visual and language understanding. (paper | tweet)
8). Overview of ChatGPT applications, opportunities, and threats. (paper | tweet)
9). Chameleon - a plug-and-play compositional reasoning framework that augments LLMs and can infer the appropriate sequence of tools to compose and execute in order to generate final responses; achieves 87% accuracy on ScienceQA and 99% on TabMWP. (paper | tweet)
10). Video Latent Diffusion Models - applies latent diffusion models to high-resolution video generation; validates the model on creative content creation and real driving videos of 512 x 1024 and achieves state-of-the-art performance. (paper | tweet)