1). GPT-4 - a large multimodal model with broader general knowledge and problem-solving abilities than earlier GPT models. (paper)
2). LERF (Language Embedded Radiance Fields) - a method for grounding language embeddings from models like CLIP into NeRF; this enables open-ended language queries in 3D. (paper)
3). An Overview of Language Models - a survey of language models covering recent developments and future directions, including topics like linguistic units, structures, training methods, evaluation, and applications. (paper)
4). Tuned Lens - a method for transformer interpretability that can trace a language model's predictions as they develop layer by layer; a minimal decoding sketch appears after this list. (paper)
5). MIM (Meet in the Middle) - a new pre-training paradigm whose techniques jointly improve training data efficiency and the capabilities of LMs on the infilling task; performance gains are shown on code generation tasks. (paper)
6). Resurrecting RNNs - demonstrates that careful design of deep RNNs using standard signal propagation arguments can recover the performance of deep state-space models on long-range reasoning tasks; see the linear-recurrence sketch after this list. (paper)
7). Universal Prompt Retrieval - a new approach that tunes a lightweight, versatile retriever to automatically retrieve prompts that improve zero-shot performance and help mitigate hallucinations. (paper)
8). Patches Are All You Need - proposes ConvMixer, a parameter-efficient fully-convolutional model which replaces self-attention and MLP layers in ViTs with less-expressive depthwise and pointwise convolutional layers; a short PyTorch sketch follows this list. (paper)
9). NeRFMeshing - a compact and flexible architecture that enables easy 3D surface reconstruction from any NeRF-driven approach; distills NeRFs into geometrically accurate 3D meshes. (paper)
10). FlexGen - a high-throughput generation engine for running LLMs with limited GPU memory. (paper)
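To make #4 concrete, below is a minimal sketch of tracing a prediction layer by layer. It uses the simpler logit-lens variant, decoding each intermediate hidden state through GPT-2's final layer norm and unembedding matrix; the actual tuned lens instead trains a small affine "translator" per layer. The model, prompt, and decoding path here are illustrative assumptions, not the paper's code.

```python
# Logit-lens style tracing of intermediate next-token predictions (illustrative sketch).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2", output_hidden_states=True)
model.eval()

inputs = tokenizer("The capital of France is", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.hidden_states holds one tensor per layer (plus the embedding layer).
for layer_idx, hidden in enumerate(outputs.hidden_states):
    last_pos = hidden[0, -1]                                    # hidden state at the final position
    logits = model.lm_head(model.transformer.ln_f(last_pos))    # decode through the unembedding
    token = tokenizer.decode(logits.argmax().item())
    print(f"layer {layer_idx:2d} -> next-token guess: {token!r}")
```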
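For #6, here is a toy sketch of the core ingredient: a diagonal linear recurrence whose eigenvalues are initialized close to the unit circle so information survives over long ranges. This is a simplified real-valued variant written for illustration; the paper's model uses complex-valued diagonal recurrences and additional normalization, and every name and hyperparameter below is an assumption.

```python
# Toy diagonal linear RNN with near-unit-circle eigenvalues (illustrative sketch).
import torch
import torch.nn as nn

class DiagonalLinearRNN(nn.Module):
    def __init__(self, dim, r_min=0.9, r_max=0.999):
        super().__init__()
        # Sigmoid parameterization keeps each recurrence eigenvalue in (r_min, r_max),
        # i.e. close to (but strictly inside) the unit circle.
        self.lambda_logit = nn.Parameter(torch.randn(dim))
        self.r_min, self.r_max = r_min, r_max
        self.in_proj = nn.Linear(dim, dim)
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, x):                          # x: (batch, seq_len, dim)
        lam = self.r_min + (self.r_max - self.r_min) * torch.sigmoid(self.lambda_logit)
        u = self.in_proj(x)
        h = torch.zeros_like(u[:, 0])
        states = []
        for t in range(u.shape[1]):
            h = lam * h + u[:, t]                  # element-wise (diagonal) linear recurrence
            states.append(h)
        return self.out_proj(torch.stack(states, dim=1))
```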
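Finally, #8 is easy to show in code. The ConvMixer below is adapted from the compact PyTorch implementation given in the paper: a depthwise convolution stands in for token mixing (self-attention) and a pointwise convolution for channel mixing (the MLP); the hyperparameter values are placeholders.

```python
# ConvMixer sketch, adapted from the paper's compact PyTorch implementation.
import torch.nn as nn

class Residual(nn.Module):
    def __init__(self, fn):
        super().__init__()
        self.fn = fn
    def forward(self, x):
        return self.fn(x) + x

def ConvMixer(dim=256, depth=8, kernel_size=9, patch_size=7, n_classes=1000):
    return nn.Sequential(
        # Patch embedding: a strided convolution instead of ViT's linear patch projection.
        nn.Conv2d(3, dim, kernel_size=patch_size, stride=patch_size),
        nn.GELU(),
        nn.BatchNorm2d(dim),
        *[nn.Sequential(
            # Depthwise conv mixes spatial locations (replaces self-attention).
            Residual(nn.Sequential(
                nn.Conv2d(dim, dim, kernel_size, groups=dim, padding="same"),
                nn.GELU(),
                nn.BatchNorm2d(dim),
            )),
            # Pointwise conv mixes channels (replaces the MLP).
            nn.Conv2d(dim, dim, kernel_size=1),
            nn.GELU(),
            nn.BatchNorm2d(dim),
        ) for _ in range(depth)],
        nn.AdaptiveAvgPool2d((1, 1)),
        nn.Flatten(),
        nn.Linear(dim, n_classes),
    )
```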
Hi, thanks for this great newsletter!
It's worth mentioning that a concurrent work to #4 about interpretability of hidden layers was published just a few days later:
http://arxiv.org/abs/2303.09435