1). CogAgent - presents an 18 billion parameter visual language model specializing in GUI understanding and navigation; supports high-resolution inputs (1120x1120) and shows abilities in tasks such as visual Q&A, visual grounding, and GUI Agent; achieves state of the art on 5 text-rich and 4 general VQA benchmarks. (paper | tweet)
2). From Gemini to Q-Star - surveys 300+ papers and summarizes research developments to look at in the space of Generative AI; it covers computational challenges, scalability, real-world implications, and the potential for Gen AI to drive progress in fields like healthcare, finance, and education. (paper | tweet)
3). PromptBench - a unified library that supports comprehensive evaluation and analysis of LLMs; it consists of functionalities for prompt construction, prompt engineering, dataset and model loading, adversarial prompt attack, dynamic evaluation protocols, and analysis tools. (paper | tweet)
4). Exploiting Novel GPT-4 APIs - performs red-teaming on three functionalities exposed in the GPT-4 APIs: fine-tuning, function calling, and knowledge retrieval; Main findings: 1) fine-tuning on as few as 15 harmful examples or 100 benign examples can remove core safeguards from GPT-4, 2) GPT-4 Assistants divulge the function call schema and can be made to execute arbitrary function calls, and 3) knowledge retrieval can be hijacked by injecting instructions into retrieval documents. (paper | tweet)
5). Fact Recalling in LLMs - investigates how MLP layers implement a lookup table for factual recall; scopes the study on how early MLPs in Pythia 2.8B look up which of 3 different sports various athletes play; suggests that early MLP layers act as a lookup table and recommends thinking about the recall of factual knowledge in the model as multi-token embeddings. (paper | tweet)
6). Generative AI for Math - presents a diverse and high-quality math-centric corpus comprising of ~9.5 billion tokens to train foundation models. (paper | tweet)
7). Pricipled Instructions Are All You Need - introduces 26 guiding principles designed to streamline the process of querying and prompting large language models; applies these principles to conduct extensive experiments on LLaMA-1/2 (7B, 13B and 70B), GPT-3.5/4 to verify their effectiveness on instructions and prompts design. (paper | tweet)
8). A Survey of Reasoning with Foundation Models - provides a comprehensive survey of seminal foundational models for reasoning, highlighting the latest advancements in various reasoning tasks, methods, benchmarks, and potential future directions; also discusses how other developments like multimodal learning, autonomous agents, and super alignment accelerate and extend reasoning research. (paper | tweet)
9). Making LLMs Better at Dense Retrieval - proposes LLaRA which adapts an LLM for dense retrieval; it consists of two pretext tasks: EBAE (Embedding-Based Auto-Encoding) and EBAR (Embedding-Based Auto-Regression), where the text embeddings from LLM are used to reconstruct the tokens for the input sentence and predict the tokens for the next sentence, respectively; a LLaMa-2-7B was improved on benchmarks like MSMARCO and BEIR. (paper)
10). Gemini vs GPT-4V - provides a comprehensive preliminary comparison and combination of vision-language models like Gemini and GPT-4V through several qualitative cases; finds that GPT-4V is precise and succinct in responses, while Gemini excels in providing detailed, expansive answers accompanied by relevant imagery and links. (paper | tweet)
Interesting stuff,
One that catch my attention is PromptBench, still new development phase. will watch regularly