NLP Newsletter

🥇Top ML Papers of the Week

nlp.elvissaravia.com

The top ML Papers of the Week (Mar 13 - Mar 19)

elvis
Mar 19, 2023
1). GPT-4 - a large multimodal model with broader general knowledge and problem-solving abilities. (paper)

OpenAI (@OpenAI), Mar 14, 2023: "Announcing GPT-4, a large multimodal model, with our best-ever results on capabilities and alignment: openai.com/product/gpt-4"

2). LERF (Language Embedded Radiance Fields) - a method for grounding language embeddings from models like CLIP into NeRF; this enables open-ended language queries in 3D. (paper)

AK (@_akhaliq), Mar 17, 2023: "LERF: Language Embedded Radiance Fields. TL;DR: Grounding CLIP vectors volumetrically inside a NeRF allows flexible natural language queries in 3D. abs: arxiv.org/abs/2303.09553, project page: lerf.io"
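The query step at the heart of LERF can be illustrated with a toy sketch. This is not the paper's implementation; it only assumes that each sampled 3D point carries a language embedding distilled from CLIP, and that a text query is scored against those embeddings by cosine similarity (all names and dimensions below are made up for illustration):

```python
import numpy as np

# Toy sketch of an open-vocabulary 3D query (illustrative, not LERF's code):
# each 3D sample point carries a CLIP-aligned language embedding, and a text
# query embedding is scored against every point by cosine similarity.

rng = np.random.default_rng(0)
D = 8  # embedding dim (CLIP uses 512+; tiny here for illustration)

def normalize(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

# Hypothetical field output: language embeddings at 100 sampled 3D points.
point_embeddings = normalize(rng.normal(size=(100, D)))
query = normalize(rng.normal(size=(D,)))  # stand-in for a CLIP text embedding

# Relevancy: cosine similarity between the query and each point's embedding.
scores = point_embeddings @ query
best = int(np.argmax(scores))  # index of the most relevant 3D point
```

In the actual method the embeddings come from a learned field rendered alongside the NeRF, not random vectors, but the readout is this same similarity comparison.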

3). An Overview of Language Models - an overview of language models covering recent developments and future directions. It also covers topics like linguistic units, structures, training methods, evaluation, and applications. (paper)

elvis (@omarsar0), Mar 13, 2023: "An Overview of Language Models. Nice overview of language models covering recent developments and future directions. It also covers topics like linguistic units, structures, training methods, evaluation, and applications. arxiv.org/abs/2303.05759"

4). Tuned Lens - a method for transformer interpretability that can trace a language model's predictions as they develop layer by layer. (paper)

Nora Belrose (@norabelrose), Mar 15, 2023: "Ever wonder how a language model decides what to say next? Our method, the tuned lens (arxiv.org/abs/2303.08112), can trace an LM's prediction as it develops from one layer to the next. It's more reliable and applies to more models than prior state-of-the-art."
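The core idea of the tuned lens can be sketched in a few lines. This is a minimal illustration, not the paper's code: it assumes a learned affine "translator" per layer that maps an intermediate hidden state into the final layer's representation space, after which the model's unembedding matrix reads out logits (all matrices below are random stand-ins):

```python
import numpy as np

# Minimal sketch of the tuned-lens readout (illustrative, not the paper's code):
# an affine translator (learned in practice, random here) maps a hidden state
# at layer l toward the final representation space; the unembedding then
# produces a next-token distribution for that layer.

rng = np.random.default_rng(1)
d_model, vocab = 16, 50

h_l = rng.normal(size=(d_model,))                                   # hidden state at layer l
A_l = np.eye(d_model) + 0.01 * rng.normal(size=(d_model, d_model))  # translator weights
b_l = np.zeros(d_model)                                             # translator bias
W_U = rng.normal(size=(vocab, d_model))                             # unembedding matrix

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

logits = W_U @ (A_l @ h_l + b_l)  # this layer's "early" prediction
probs = softmax(logits)
```

Repeating this readout at every layer is what lets the method show how the prediction sharpens from layer to layer; the earlier "logit lens" is the special case where the translator is the identity.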

5). MIM (Meet in the Middle) - a new pre-training paradigm using techniques that jointly improve training data efficiency and the capabilities of LMs on the infilling task; performance improvements are shown on code generation tasks. (paper)

Weizhu Chen (@WeizhuChen), Mar 14, 2023: "Meet In the Middle (MIM): A New Pretraining Paradigm. MIM (2.7B) outperforms CodeGen 16B, Incoder 6.7B, PaLM 540B, LLaMA 65B, FIM 2.7B in code generation tasks. Read arxiv.org/abs/2303.07295 to know why MIM could be a new pre-training paradigm for left-to-right and infilling LMs."

6). Resurrecting RNNs - demonstrates that careful design of deep RNNs using standard signal propagation arguments can recover the performance of deep state-space models on long-range reasoning tasks. (paper)

Aran Komatsuzaki (@arankomatsuzaki), Mar 14, 2023: "Resurrecting Recurrent Neural Networks for Long Sequences. Shows that careful design of deep RNNs performs on par with SSMs on long-range reasoning tasks with comparable speed. arxiv.org/abs/2303.06349"

7). Universal Prompt Retrieval - a new approach to tune a lightweight and versatile retriever to automatically retrieve prompts to improve zero-shot performance and help mitigate hallucinations. (paper)

John Nay (@johnjnay), Mar 16, 2023: "Universal Prompt Retrieval for LLMs - Cross-task & cross-model - Tune small model to retrieve prompts for tasks on small frozen GPT-Neo - Test on unseen task types on much larger LLMs (OPT, GPT3) - Improves zero-shot perf & mitigates hallucinations. Paper: arxiv.org/abs/2303.08518"

8). Patches Are All You Need - proposes ConvMixer, a parameter-efficient, fully convolutional model that operates directly on image patches, using only depthwise and pointwise convolutions in place of the self-attention and MLP mixing layers of ViTs. (paper)

hardmaru (@hardmaru), Mar 15, 2023: "'Patches Are All You Need? 🤷' is published in TMLR! openreview.net/forum?id=rAnB7…"
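ConvMixer's two mixing operations are simple enough to sketch directly. The toy numpy code below is illustrative only (the real model adds residual connections, GELU activations, and BatchNorm, all omitted here): a depthwise convolution mixes spatial locations within each channel, and a pointwise (1x1) convolution mixes channels at each location.

```python
import numpy as np

# Toy sketch of ConvMixer's mixing steps (illustrative only; the real model
# also has residuals, GELU, and BatchNorm around these operations).

rng = np.random.default_rng(2)
C, H, W, K = 4, 6, 6, 3          # channels, height, width, kernel size
x = rng.normal(size=(C, H, W))   # stand-in for a grid of patch embeddings

dw = rng.normal(size=(C, K, K))  # one K x K kernel per channel (depthwise)
pw = rng.normal(size=(C, C))     # 1x1 conv = per-pixel channel mixing

def depthwise(x, dw):
    """Convolve each channel independently with its own kernel (same padding)."""
    pad = K // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros_like(x)
    for c in range(x.shape[0]):
        for i in range(x.shape[1]):
            for j in range(x.shape[2]):
                out[c, i, j] = np.sum(xp[c, i:i + K, j:j + K] * dw[c])
    return out

h = depthwise(x, dw)                  # spatial mixing within channels
y = np.einsum('oc,chw->ohw', pw, h)   # channel mixing at each location
```

Separating the two steps this way is what makes the model parameter-efficient: a depthwise layer costs C*K*K weights and a pointwise layer C*C, versus C*C*K*K for a full convolution.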

9). NeRFMeshing - a compact and flexible architecture that enables easy 3D surface reconstruction from any NeRF-driven approach; distills NeRFs into geometrically-accurate 3D meshes. (paper)

AK (@_akhaliq), Mar 17, 2023: "NeRFMeshing: Distilling Neural Radiance Fields into Geometrically-Accurate 3D Meshes. abs: arxiv.org/abs/2303.09431"

10). FlexGen - a high-throughput generation engine for running LLMs with limited GPU memory. (paper)

elvis (@omarsar0), Mar 14, 2023: "FlexGen - a high-throughput generation engine for running LLMs with limited GPU memory. This is a big deal! Here is a quick overview of the paper:"
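The offloading idea that lets a large model run in limited GPU memory can be illustrated with a toy sketch. This is not FlexGen's actual engine (which schedules weights, activations, and KV cache across GPU, CPU, and disk); it only shows the basic pattern of keeping all layer weights in "slow" memory and loading one layer at a time into "fast" memory for compute (all names and sizes below are made up):

```python
import numpy as np

# Toy sketch of weight offloading (illustrative, not FlexGen's engine):
# all layers' weights live in "slow" memory (CPU/disk in practice); only
# the layer currently executing is resident in "fast" memory (GPU).

rng = np.random.default_rng(3)
d, n_layers = 8, 4

# Stand-in for offloaded storage: one weight matrix per layer.
slow_memory = [rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(n_layers)]

def run_with_offloading(x):
    for layer_id in range(n_layers):
        w = slow_memory[layer_id]  # "load" this layer's weights on demand
        x = np.tanh(x @ w)         # compute, after which w can be evicted
    return x

out = run_with_offloading(rng.normal(size=(2, d)))  # batch of 2 token states
```

Peak fast-memory use is then one layer's weights rather than the whole model; the cost is the repeated transfer, which FlexGen amortizes by processing large batches per loaded layer.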
1 Comment
Asaf
Mar 19

Hi, thanks for this great newsletter!

It's worth mentioning that a concurrent work to #4 about interpretability of hidden layers was published just a few days later:

http://arxiv.org/abs/2303.09435

© 2023 elvis