1). Segment Anything Model - presents a set of resources (task, model, and dataset) to establish foundation models for image segmentation; releases SA-1B, the largest segmentation dataset to date, with over 1 billion masks on 11M licensed images; the model’s zero-shot performance is competitive with or even superior to prior fully supervised results. (paper)
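A minimal sketch of point-prompted prediction with the released `segment_anything` package is shown below; the class names follow the project repo, while the checkpoint file name and example click are assumptions for illustration.

```python
# Point-prompted mask prediction with the segment_anything package
# (checkpoint file name and the example click are illustrative).
import cv2
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

image = cv2.cvtColor(cv2.imread("photo.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)  # compute the image embedding once, reuse for many prompts

# A single foreground click (x, y); label 1 marks it as a positive point.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[500, 375]]),
    point_labels=np.array([1]),
    multimask_output=True,  # return several candidate masks for an ambiguous prompt
)
```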
2). Instruction Tuning with GPT-4 - presents GPT-4-LLM, a "first attempt" to use GPT-4 to generate instruction-following data for LLM fine-tuning; the released dataset includes 52K English and 52K Chinese instruction-following examples; the data is used to instruction-tune LLaMA models, which leads to superior zero-shot performance on new tasks. (paper)
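The data-generation step amounts to prompting GPT-4 with existing instructions and storing its answers as (instruction, input, output) records; a hedged sketch with the OpenAI Python client follows, where the file names and prompt template are illustrative rather than the paper's exact setup.

```python
# Sketch: answer Alpaca-style instructions with GPT-4 and save the results
# as instruction-following records for fine-tuning (file names are illustrative).
import json
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def gpt4_answer(instruction: str, context: str = "") -> str:
    prompt = f"Instruction: {instruction}\nInput: {context}" if context else f"Instruction: {instruction}"
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

seed = json.load(open("alpaca_instructions.json"))  # existing instruction prompts
records = [
    {"instruction": ex["instruction"],
     "input": ex.get("input", ""),
     "output": gpt4_answer(ex["instruction"], ex.get("input", ""))}
    for ex in seed[:5]  # small slice for illustration
]
json.dump(records, open("gpt4_instruction_data.json", "w"), indent=2)
```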
3). 8 Things to Know about LLMs - discusses important considerations regarding the capabilities and limitations of LLMs. (paper)
4). A Survey of LLMs - a new 50-page survey of large language models. (paper)
5). Baize - an open-source chat model fine-tuned with LoRA; leverages 100K dialogs generated by letting ChatGPT chat with itself; the dialogs are released along with 7B, 13B, and 30B parameter models. (paper)
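For the fine-tuning side, a minimal LoRA sketch with the Hugging Face peft library is shown below; the base model id and hyperparameters are illustrative, not Baize's exact recipe.

```python
# LoRA setup for fine-tuning a LLaMA-style base model on chat dialogs
# (model id and hyperparameters are illustrative, not Baize's exact recipe).
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "huggyllama/llama-7b"  # placeholder model id
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

lora_cfg = LoraConfig(
    r=8,                                   # rank of the low-rank update matrices
    lora_alpha=16,                         # scaling factor applied to the update
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # only the small LoRA matrices are trained
```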
6). MACHIAVELLI - a new benchmark of 134 text-based Choose-Your-Own-Adventure games to evaluate the capabilities and unethical behaviors of LLMs. (paper)
7). Better Language Models of Code through Self-Improvement - proposes a framework in which a model generates pseudo data from the knowledge gained through its own pre-training and fine-tuning; the pseudo data is added to the training set for the next fine-tuning step; results show performance improvements for several pre-trained code models on code-related generation tasks. (paper)
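The loop can be read as repeated self-labeling; the sketch below is one possible rendering under that reading, where `fine_tune` and `generate` are caller-supplied, hypothetical helpers rather than functions from the paper's code.

```python
# One possible reading of the self-improvement loop: the fine-tuned model
# labels extra inputs, and the pseudo pairs augment the next round's training set.
# fine_tune(model, pairs) -> model and generate(model, x) -> pseudo target
# are hypothetical, caller-supplied helpers.

def self_improve(model, labeled_pairs, unlabeled_inputs, fine_tune, generate, rounds=2):
    train_set = list(labeled_pairs)
    for _ in range(rounds):
        model = fine_tune(model, train_set)             # standard supervised fine-tuning
        pseudo_pairs = [(x, generate(model, x))         # model produces pseudo targets
                        for x in unlabeled_inputs]      # e.g. code -> summary
        train_set = list(labeled_pairs) + pseudo_pairs  # augmented data for the next round
    return model
```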
8). Summary of ChatGPT/GPT-4 Research - an overview of applications of ChatGPT and GPT-4; the analysis is done on 194 relevant papers and discusses capabilities, limitations, concerns, and more. (paper)
9). Pythia - a suite for analyzing LLMs across training and scaling; includes 16 LLMs trained on public data and ranging in size from 70M to 12B parameters. (paper)
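Because the suite exposes intermediate checkpoints, training-dynamics analyses reduce to loading a model at a given step; the sketch below uses the Hugging Face hub, and the `step3000` revision naming follows the project's releases but is treated as an assumption here.

```python
# Load an intermediate Pythia checkpoint for training-dynamics analysis
# (the "stepXXXX" revision naming follows the project's Hugging Face releases).
from transformers import GPTNeoXForCausalLM, AutoTokenizer

model_id = "EleutherAI/pythia-410m-deduped"
model = GPTNeoXForCausalLM.from_pretrained(model_id, revision="step3000")
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(outputs[0]))
```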
10). SegGPT - unifies segmentation tasks into a generalist model through an in-context learning framework that accommodates different kinds of segmentation data. (paper)