NLP Newsletter

🥇Top ML Papers of the Week

nlp.elvissaravia.com

The top ML Papers of the Week (Mar 13 - Mar 19)

elvis
Mar 19, 2023
1). GPT-4 - a large multimodal model with broader general knowledge and problem-solving abilities. (paper)

OpenAI (@OpenAI), Mar 14, 2023: "Announcing GPT-4, a large multimodal model, with our best-ever results on capabilities and alignment: openai.com/product/gpt-4"

2). LERF (Language Embedded Radiance Fields) - a method for grounding language embeddings from models like CLIP into NeRF; this enables open-ended language queries in 3D. (paper)

AK (@_akhaliq), Mar 17, 2023: "LERF: Language Embedded Radiance Fields. TL;DR: Grounding CLIP vectors volumetrically inside a NeRF allows flexible natural language queries in 3D. abs: arxiv.org/abs/2303.09553, project page: lerf.io"
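The query step at the heart of LERF can be illustrated with a toy sketch. This is not the paper's implementation; it only assumes that each sampled 3D point carries a language embedding distilled from CLIP, and that a text query is scored against those embeddings by cosine similarity (all names and dimensions below are made up for illustration):

```python
import numpy as np

# Toy sketch of an open-vocabulary 3D query (illustrative, not LERF's code):
# each 3D sample point carries a CLIP-aligned language embedding, and a text
# query embedding is scored against every point by cosine similarity.

rng = np.random.default_rng(0)
D = 8  # embedding dim (CLIP uses 512+; tiny here for illustration)

def normalize(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

# Hypothetical field output: language embeddings at 100 sampled 3D points.
point_embeddings = normalize(rng.normal(size=(100, D)))
query = normalize(rng.normal(size=(D,)))  # stand-in for a CLIP text embedding

# Relevancy: cosine similarity between the query and each point's embedding.
scores = point_embeddings @ query
best = int(np.argmax(scores))  # index of the most relevant 3D point
```

In the actual method the embeddings come from a learned field rendered alongside the NeRF, not random vectors, but the readout is this same similarity comparison.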

3). An Overview of Language Models - an overview of language models covering recent developments and future directions. It also covers topics like linguistic units, structures, training methods, evaluation, and applications. (paper)

elvis (@omarsar0), Mar 13, 2023: "An Overview of Language Models. Nice overview of language models covering recent developments and future directions. It also covers topics like linguistic units, structures, training methods, evaluation, and applications. arxiv.org/abs/2303.05759"

4). Tuned Lens - a method for transformer interpretability that can trace a language model's predictions as they develop layer by layer. (paper)

Nora Belrose (@norabelrose), Mar 15, 2023: "Ever wonder how a language model decides what to say next? Our method, the tuned lens (arxiv.org/abs/2303.08112), can trace an LM's prediction as it develops from one layer to the next. It's more reliable and applies to more models than prior state-of-the-art."
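The core idea of the tuned lens can be sketched in a few lines. This is a minimal illustration, not the paper's code: it assumes a learned affine "translator" per layer that maps an intermediate hidden state into the final layer's representation space, after which the model's unembedding matrix reads out logits (all matrices below are random stand-ins):

```python
import numpy as np

# Minimal sketch of the tuned-lens readout (illustrative, not the paper's code):
# an affine translator (learned in practice, random here) maps a hidden state
# at layer l toward the final representation space; the unembedding then
# produces a next-token distribution for that layer.

rng = np.random.default_rng(1)
d_model, vocab = 16, 50

h_l = rng.normal(size=(d_model,))                                   # hidden state at layer l
A_l = np.eye(d_model) + 0.01 * rng.normal(size=(d_model, d_model))  # translator weights
b_l = np.zeros(d_model)                                             # translator bias
W_U = rng.normal(size=(vocab, d_model))                             # unembedding matrix

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

logits = W_U @ (A_l @ h_l + b_l)  # this layer's "early" prediction
probs = softmax(logits)
```

Repeating this readout at every layer is what lets the method show how the prediction sharpens from layer to layer; the earlier "logit lens" is the special case where the translator is the identity.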

5). MIM (Meet in the Middle) - a new pre-training paradigm using techniques that jointly improve training data efficiency and the capabilities of LMs on the infilling task; performance improvements are shown on code generation tasks. (paper)

Weizhu Chen (@WeizhuChen), Mar 14, 2023: "Meet In the Middle (MIM): A New Pretraining Paradigm. MIM (2.7B) outperforms CodeGen 16B, Incoder 6.7B, PaLM 540B, LLaMA 65B, FIM 2.7B in code generation tasks. Read arxiv.org/abs/2303.07295 to know why MIM could be a new pre-training paradigm for left-to-right and infilling LMs."

6). Resurrecting RNNs - demonstrates that careful design of deep RNNs using standard signal propagation arguments can recover the performance of deep state-space models on long-range reasoning tasks. (paper)

Aran Komatsuzaki (@arankomatsuzaki), Mar 14, 2023: "Resurrecting Recurrent Neural Networks for Long Sequences. Shows that careful design of deep RNNs performs on par with SSMs on long-range reasoning tasks with comparable speed. arxiv.org/abs/2303.06349"

7). Universal Prompt Retrieval - a new approach to tune a lightweight and versatile retriever to automatically retrieve prompts to improve zero-shot performance and help mitigate hallucinations. (paper)

John Nay (@johnjnay), Mar 16, 2023: "Universal Prompt Retrieval for LLMs - Cross-task & cross-model - Tune small model to retrieve prompts for tasks on small frozen GPT-Neo - Test on unseen task types on much larger LLMs (OPT, GPT3) - Improves zero-shot perf & mitigates hallucinations. Paper: arxiv.org/abs/2303.08518"

8). Patches Are All You Need - proposes ConvMixer, a parameter-efficient, fully convolutional model that operates directly on image patches, using only depthwise and pointwise convolutions in place of the self-attention and MLP mixing layers of ViTs. (paper)

hardmaru (@hardmaru), Mar 15, 2023: "'Patches Are All You Need? 🤷' is published in TMLR! openreview.net/forum?id=rAnB7…"
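ConvMixer's two mixing operations are simple enough to sketch directly. The toy numpy code below is illustrative only (the real model adds residual connections, GELU activations, and BatchNorm, all omitted here): a depthwise convolution mixes spatial locations within each channel, and a pointwise (1x1) convolution mixes channels at each location.

```python
import numpy as np

# Toy sketch of ConvMixer's mixing steps (illustrative only; the real model
# also has residuals, GELU, and BatchNorm around these operations).

rng = np.random.default_rng(2)
C, H, W, K = 4, 6, 6, 3          # channels, height, width, kernel size
x = rng.normal(size=(C, H, W))   # stand-in for a grid of patch embeddings

dw = rng.normal(size=(C, K, K))  # one K x K kernel per channel (depthwise)
pw = rng.normal(size=(C, C))     # 1x1 conv = per-pixel channel mixing

def depthwise(x, dw):
    """Convolve each channel independently with its own kernel (same padding)."""
    pad = K // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros_like(x)
    for c in range(x.shape[0]):
        for i in range(x.shape[1]):
            for j in range(x.shape[2]):
                out[c, i, j] = np.sum(xp[c, i:i + K, j:j + K] * dw[c])
    return out

h = depthwise(x, dw)                  # spatial mixing within channels
y = np.einsum('oc,chw->ohw', pw, h)   # channel mixing at each location
```

Separating the two steps this way is what makes the model parameter-efficient: a depthwise layer costs C*K*K weights and a pointwise layer C*C, versus C*C*K*K for a full convolution.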

9). NeRFMeshing - a compact and flexible architecture that enables easy 3D surface reconstruction from any NeRF-driven approach; distills NeRFs into geometrically-accurate 3D meshes. (paper)

AK (@_akhaliq), Mar 17, 2023: "NeRFMeshing: Distilling Neural Radiance Fields into Geometrically-Accurate 3D Meshes. abs: arxiv.org/abs/2303.09431"

10). FlexGen - a high-throughput generation engine for running LLMs with limited GPU memory. (paper)

elvis (@omarsar0), Mar 14, 2023: "FlexGen - a high-throughput generation engine for running LLMs with limited GPU memory. This is a big deal! Here is a quick overview of the paper:"
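The offloading idea that lets a large model run in limited GPU memory can be illustrated with a toy sketch. This is not FlexGen's actual engine (which schedules weights, activations, and KV cache across GPU, CPU, and disk); it only shows the basic pattern of keeping all layer weights in "slow" memory and loading one layer at a time into "fast" memory for compute (all names and sizes below are made up):

```python
import numpy as np

# Toy sketch of weight offloading (illustrative, not FlexGen's engine):
# all layers' weights live in "slow" memory (CPU/disk in practice); only
# the layer currently executing is resident in "fast" memory (GPU).

rng = np.random.default_rng(3)
d, n_layers = 8, 4

# Stand-in for offloaded storage: one weight matrix per layer.
slow_memory = [rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(n_layers)]

def run_with_offloading(x):
    for layer_id in range(n_layers):
        w = slow_memory[layer_id]  # "load" this layer's weights on demand
        x = np.tanh(x @ w)         # compute, after which w can be evicted
    return x

out = run_with_offloading(rng.normal(size=(2, d)))  # batch of 2 token states
```

Peak fast-memory use is then one layer's weights rather than the whole model; the cost is the repeated transfer, which FlexGen amortizes by processing large batches per loaded layer.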
1 Comment
Asaf
Mar 19

Hi, thanks for this great newsletter!

It's worth mentioning that a concurrent work to #4 about interpretability of hidden layers was published just a few days later:

http://arxiv.org/abs/2303.09435

© 2023 elvis