🥇Top ML Papers of the Week

The top ML Papers of the Week (Mar 6 - Mar 12)

Mar 12, 2023

1). PaLM-E - incorporates real-world continuous sensor modalities into a language model, resulting in an embodied LM that performs tasks such as robotic manipulation planning, visual QA, and other embodied reasoning tasks. (paper | demo)

Danny Driess @DannyDriess

What happens when we train the largest vision-language model and add in robot experiences? The result is PaLM-E 🌴🤖, a 562-billion parameter, general-purpose, embodied visual-language generalist - across robotics, vision, and language. Website: palm-e.github.io

2). Prismer - a parameter-efficient vision-language model powered by an ensemble of domain experts; it efficiently pools expert knowledge from different domains and adapts it to various vision-language reasoning tasks. (paper | code)

Jim Fan @DrJimFan

After ChatGPT, the future belongs to multimodal LLMs. What’s even better? Open-sourcing. Announcing Prismer, my team’s latest vision-language AI, empowered by domain-expert models in depth, surface normal, segmentation, etc. No paywall. No forms. shikun.io/projects/prism…… https://t.co/PYHEHzB5ZL

3). Visual ChatGPT - connects ChatGPT to different visual foundation models so that users can interact with ChatGPT through images as well as text. (paper | code)

AK @_akhaliq

Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models
build a system called Visual ChatGPT, incorporating different Visual Foundation Models, to enable the user to interact with ChatGPT by 1) sending and receiving not only languages arxiv.org/abs/2303.04671… https://t.co/9uSthMHwHn

4). A History of Generative AI - an overview of generative AI - from GAN to ChatGPT. (paper)

elvis @omarsar0

A History of Generative AI
Wow! This is a nice overview of Generative AI - from GAN to ChatGPT. arxiv.org/abs/2303.04226 https://t.co/jZmRzzmcME

5). LLMs do In-Context Learning Differently - shows that with scale, LLMs can override semantic priors when presented with enough flipped labels; these models can also perform well when the labels are replaced with semantically unrelated targets. (paper)

Jerry Wei @JerryWeiAI

New @GoogleAI paper: How do language models do in-context learning? arxiv.org/abs/2303.03846 Large language models (GPT-3.5, PaLM) can follow in-context exemplars, even if the labels are flipped or semantically unrelated. This ability wasn’t present in small language models. 1/
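The two setups mentioned above (flipped labels and semantically unrelated labels) can be illustrated with a small prompt builder. This is a minimal sketch, not the paper's code: the `Input:`/`Label:` prompt format, the function name `build_prompt`, the example sentences, and the `foo`/`bar` stand-in labels are all assumptions chosen for illustration.

```python
# Hypothetical label mappings for a binary sentiment task (assumed names).
FLIPPED = {"positive": "negative", "negative": "positive"}
UNRELATED = {"positive": "foo", "negative": "bar"}


def build_prompt(exemplars, query, label_map=None):
    """Build a few-shot prompt, optionally remapping the exemplar labels.

    exemplars: list of (text, label) pairs shown in context.
    query:     the input the model is asked to label.
    label_map: dict from true label to the label shown in the prompt;
               None leaves the labels unchanged.
    """
    blocks = []
    for text, label in exemplars:
        shown = label_map[label] if label_map else label
        blocks.append(f"Input: {text}\nLabel: {shown}")
    # The final block is left open for the model to complete.
    blocks.append(f"Input: {query}\nLabel:")
    return "\n\n".join(blocks)
```

Calling `build_prompt(exemplars, query, FLIPPED)` yields a prompt whose in-context labels contradict their usual meaning, while `UNRELATED` removes the semantic cue entirely. A model that follows the exemplars rather than its priors would answer with the remapped label.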

6). Foundation Models for Decision Making - provides an overview of foundation models for decision making, including tools, methods, and new research directions. (paper)

Sherry Yang @mengjiao_yang

Review paper on Foundation Models for Decision Making: arxiv.org/abs/2303.04129 Foundation models can characterize various components of decision making, such as states (S), behaviors (A), dynamics (T), task specifiers (R), through generative modeling or representation learning.

7). Hyena Hierarchy - a subquadratic drop-in replacement for attention built by interleaving implicit long convolutions and data-controlled gating; it can learn on sequences 10x longer, and up to 100x faster, than optimized attention. (paper | code)

Michael Poli @MichaelPoli6

Attention is great. Are there other operators that scale? Excited to share our work on Hyena, an alternative to attn that can learn on sequences *10x longer*, up to *100x faster* than optimized attn, by using implicit long convolutions & gating 📜arxiv.org/abs/2302.10866 1/
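To make "long convolutions & gating" concrete, here is a toy sketch of one gated long-convolution step. It is not the authors' implementation: the filter here is an explicit list rather than an implicitly parameterized one, the gate is a fixed list rather than a learned projection of the input, and all function names are assumptions. The convolution is evaluated with an FFT, a standard way to compute a length-L convolution in O(L log L) time instead of O(L^2).

```python
import cmath


def fft(values, invert=False):
    """Recursive radix-2 FFT; len(values) must be a power of two.

    With invert=True the result is the unnormalized inverse transform.
    """
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], invert)
    odd = fft(values[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def causal_conv_fft(signal, kernel):
    """Causal convolution y[t] = sum_{s<=t} kernel[t-s] * signal[s].

    Assumes len(kernel) <= len(signal). Both inputs are zero-padded to a
    power of two >= 2 * len(signal) so circular wrap-around cannot occur.
    """
    length = len(signal)
    size = 1
    while size < 2 * length:
        size *= 2
    fs = fft([complex(v) for v in signal] + [0j] * (size - length))
    fk = fft([complex(v) for v in kernel] + [0j] * (size - len(kernel)))
    full = fft([a * b for a, b in zip(fs, fk)], invert=True)
    # Normalize the inverse transform and keep the first `length` outputs.
    return [full[t].real / size for t in range(length)]


def gated_long_conv(gate, value, kernel):
    """Elementwise gate applied to a long causal convolution of `value`."""
    mixed = causal_conv_fft(value, kernel)
    return [g * m for g, m in zip(gate, mixed)]
```

The kernel spans the whole sequence, so every output position can depend on every earlier input, and the elementwise gate makes the result depend on the data at each position. Stacking several such steps is the general shape the description above suggests; the details of how Hyena parameterizes and composes them are in the paper.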

8). OpenICL - a new open-source toolkit for in-context learning and LLM evaluation; supports various state-of-the-art retrieval and inference methods, tasks, and zero-/few-shot evaluation of LLMs. (paper | code)

elvis @omarsar0

OpenICL - a new open-source toolkit for in-context learning and LLM evaluation. Supports various state-of-the-art retrieval and inference methods, tasks, and zero-/few-shot evaluation of LLMs. repo: github.com/Shark-NLP/Open… paper: arxiv.org/abs/2303.02913

9). MathPrompter - a technique that improves LLM performance on mathematical reasoning problems; it uses zero-shot chain-of-thought prompting to generate multiple solutions to the same problem and cross-checks them to raise confidence in the final answer. (paper)

John Nay @johnjnay

Math LLM Prompting
0-shot chain-of-thought prompting to generate multiple Algebraic expressions / Python functions to solve same math problem in diff ways
- Raises confidence in output
- Improves over SoTA on MultiArith dataset (78.7% → 92.5%)
Paper: arxiv.org/abs/2303.05398
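The cross-checking idea described above can be sketched in a few lines. This is an illustration under stated assumptions, not the paper's implementation: the candidate solvers are hand-written stand-ins for what an LLM would generate, and the word problem, the function names, the number of trials, and the value range are all hypothetical.

```python
import random

# Hypothetical problem: "A shop sells a items at b dollars each and gives a
# flat discount of c dollars. What is the total?"


def candidate_algebraic(a, b, c):
    # Stand-in for a generated algebraic expression.
    return a * b - c


def candidate_stepwise(a, b, c):
    # Stand-in for a generated Python function that solves it differently.
    total = 0
    for _ in range(a):
        total += b
    return total - c


def candidate_buggy(a, b, c):
    # A deliberately wrong candidate: applies the discount per item.
    return a * (b - c)


def candidates_agree(candidates, trials=5, seed=0):
    """Check that all candidates give the same output on random inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        a, b, c = (rng.randint(2, 20) for _ in range(3))
        if len({f(a, b, c) for f in candidates}) != 1:
            return False
    return True


def consensus_answer(candidates, values, trials=5, seed=0):
    """Return the answer for `values` only if the candidates agree."""
    if not candidates_agree(candidates, trials, seed):
        return None  # no consensus: caller would regenerate candidates
    return candidates[0](*values)
```

For example, `consensus_answer([candidate_algebraic, candidate_stepwise], (3, 4, 2))` returns `10`, while adding `candidate_buggy` to the list returns `None` because the candidates disagree on the random checks. Agreement raises confidence in the answer but does not prove it correct, since all candidates could share the same mistake.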

10). GigaGAN - a new architecture that enables scaling up GAN models to benefit from large datasets for text-to-image synthesis; it’s found to be orders of magnitude faster at inference time, can synthesize high-resolution images, and supports various latent space editing applications. (paper | demo)

Aran Komatsuzaki @arankomatsuzaki

Scaling up GANs for Text-to-Image Synthesis
- Orders of magnitude faster at inference time
- Can synthesize high-resolution images, for example, 16-megapixel images in 3.66 seconds.
proj: mingukkang.github.io/GigaGAN/
abs: arxiv.org/abs/2303.05511

