NLP Newsletter: Detecting AI-Generated Text, Text-to-4D, ML Papers Explained, MusicLM,...
Hi all. Welcome back to our regular NLP Newsletter issue. I am excited to relaunch the newsletter to keep you informed on the latest in NLP and ML.
Detecting AI-Generated Text
Given how popular text-based generative applications have gotten over the past few months, several tools and frameworks have emerged that help detect content generated by language models. A few recent efforts include:
Watermarking of LLMs
Kirchenbauer et al. recently proposed a new watermarking framework for proprietary language models. The watermark should be algorithmically detectable, but it is equally important that it does not degrade the quality of the generated text.
Ideally, the watermark can be detected algorithmically without access to the LLM API, leading to the possibility of open-sourcing the detection algorithm and reducing costs that could emerge from loading or running the models.
The basic idea of the paper is to use the concept of whitelisting and blacklisting to restrict the model's next-token output. The watermark can then be detected by counting the whitelist tokens (the tokens the LLM is allowed to use) in a passage. Combined with other design tricks, this approach is effective even for short text.
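To make the counting idea concrete, here is a minimal toy sketch in Python. It is not the authors' implementation: the vocabulary size, the whitelist fraction `GAMMA`, the hash-based seeding, and every function name are assumptions chosen for illustration, and the "model" simply samples uniformly from the whitelist. Detection counts whitelist hits and turns the count into a z-score, which is one natural way to decide whether there are more hits than chance would explain.

```python
import hashlib
import math
import random

VOCAB_SIZE = 1000  # toy vocabulary size (assumption)
GAMMA = 0.5        # fraction of the vocabulary on the whitelist (assumption)


def whitelist_for(prev_token, vocab_size=VOCAB_SIZE, gamma=GAMMA):
    """Pseudo-randomly pick the whitelist for the next position.

    The previous token seeds the partition, so a detector can rebuild
    the same whitelist without access to the language model.
    """
    digest = hashlib.sha256(str(prev_token).encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    ids = list(range(vocab_size))
    rng.shuffle(ids)
    return set(ids[: int(gamma * vocab_size)])


def generate_watermarked(length, start_token=0, seed=0):
    """Toy 'model' that samples uniformly, but only from the whitelist."""
    rng = random.Random(seed)
    tokens = [start_token]
    for _ in range(length):
        allowed = sorted(whitelist_for(tokens[-1]))
        tokens.append(rng.choice(allowed))
    return tokens


def generate_plain(length, start_token=0, seed=0):
    """Toy unwatermarked text: tokens sampled from the whole vocabulary."""
    rng = random.Random(seed)
    return [start_token] + [rng.randrange(VOCAB_SIZE) for _ in range(length)]


def detect(tokens, gamma=GAMMA):
    """Count whitelist hits and return (hits, z_score)."""
    scored = len(tokens) - 1  # the first token has no predecessor
    hits = sum(
        1 for prev, cur in zip(tokens, tokens[1:]) if cur in whitelist_for(prev)
    )
    # Under the "no watermark" hypothesis each token is a hit with probability gamma.
    z = (hits - gamma * scored) / math.sqrt(scored * gamma * (1 - gamma))
    return hits, z


hits_marked, z_marked = detect(generate_watermarked(50))
hits_plain, z_plain = detect(generate_plain(50))
print("watermarked:", hits_marked, "hits, z =", round(z_marked, 2))
print("plain:      ", hits_plain, "hits, z =", round(z_plain, 2))
```

Because the whitelist is rebuilt from the previous token alone, the detector in this sketch never touches the model, which is the property that makes an open-source detector possible.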
DetectGPT
DetectGPT is an approach (by Mitchell et al.) for zero-shot machine-generated text detection. Unlike other methods that require training a classifier or watermarking the generated text, this work uses raw log probabilities from the LLM to determine whether a passage was sampled from it.
As the diagram above illustrates, DetectGPT compares the log probability under p of the original sample x with the log probabilities of perturbed versions of x, which are produced by a pre-trained model like T5.
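The comparison can be sketched as a "perturbation discrepancy": the log probability of the original passage minus the average log probability of its perturbed versions. The snippet below is a toy illustration only. The unigram table `LOG_P` stands in for a real LLM, random word replacement stands in for T5 mask filling, and all names and numbers are hypothetical.

```python
import random
import statistics

# Hypothetical unigram "language model": log probability per word.
LOG_P = {
    "the": -1.0, "model": -1.5, "is": -1.2, "good": -2.0,
    "quirky": -6.0, "zebra": -7.0, "hums": -6.5, "softly": -5.5,
}
VOCAB = list(LOG_P)


def log_prob(words):
    """Log probability of a passage under the toy model p."""
    return sum(LOG_P[w] for w in words)


def perturb(words, rng, rate=0.3):
    """Stand-in for T5 mask filling: replace a fraction of words at random."""
    return [rng.choice(VOCAB) if rng.random() < rate else w for w in words]


def perturbation_discrepancy(words, n_perturbations=200, seed=0):
    """log p(x) minus the mean log p of perturbed copies of x."""
    rng = random.Random(seed)
    perturbed = [log_prob(perturb(words, rng)) for _ in range(n_perturbations)]
    return log_prob(words) - statistics.mean(perturbed)


# A passage built from the model's favorite words vs. one built from rare words.
machine_like = ["the", "model", "is", "good"] * 3
human_like = ["quirky", "zebra", "hums", "softly"] * 3

d_machine = perturbation_discrepancy(machine_like)
d_human = perturbation_discrepancy(human_like)
print("machine-like discrepancy:", round(d_machine, 2))
print("human-like discrepancy:  ", round(d_human, 2))
```

In this toy setup the machine-like passage sits at a peak of the model's probability, so perturbing it lowers the log probability and the discrepancy is large and positive. A detector would flag a passage when the discrepancy exceeds some threshold, which would have to be tuned on real data.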
DetectGPT improved the detection of fake news articles generated by a 20B parameter GPT-NeoX from 0.81 AUROC (strongest zero-shot baseline) to 0.95 AUROC.
GPTZero
GPTZero is a platform that helps detect AI plagiarism. The system is based on properties like perplexity and burstiness of text.
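GPTZero's exact scoring is not described here, so the following is only an illustration of the two properties under common definitions. Perplexity is computed as the exponential of the negative mean token log probability. Burstiness is taken, as an assumption, to be the spread of perplexity across sentences. The function names and the sample log probabilities are hypothetical.

```python
import math
import statistics


def perplexity(token_log_probs):
    """exp of the negative mean log probability of the tokens."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def burstiness(sentences_log_probs):
    """Assumed definition: standard deviation of per-sentence perplexity."""
    return statistics.pstdev([perplexity(s) for s in sentences_log_probs])


# Each inner list holds hypothetical token log probabilities for one sentence.
uniform_text = [[-1.0, -1.0], [-1.0, -1.0]]  # every sentence equally predictable
varied_text = [[-0.5, -0.5], [-3.0, -3.0]]   # predictability swings between sentences

print("uniform burstiness:", burstiness(uniform_text))
print("varied burstiness: ", round(burstiness(varied_text), 2))
```

The intuition usually given is that model-generated text tends to be uniformly predictable (low perplexity, low burstiness), while human writing varies more from sentence to sentence.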
AI Text Classifier by OpenAI
More recently, OpenAI also released a new tool to distinguish between AI-written and human-written text. Try the classifier here.
All of these techniques come with disadvantages. For instance, DetectGPT relies on the outputs of an LLM that may not be representative. Watermarking requires a strong algorithm that is robust to attacks that aim to avoid detection (e.g., text insertion or generative attacks).
We will keep a close eye on related developments and report progress as this becomes an important consideration when developing on top of LLMs.
This issue is brought to you by Monster API. I recently tried Monster API, a new platform based on decentralized computing offering generative AI models as a service. A word from them:
If you are building in the Generative AI space, you can relate to the pain of accessing cutting-edge ML models.
They are super expensive and often only available through centralized clouds like AWS or Google Cloud. But not anymore: Monster API gives you access to top-notch models, like DreamBooth, Stable Diffusion, and ChatGPT alternatives, through easy-to-use and scalable APIs powered by the disruptive force of decentralized computing. And for a limited time, we're giving you the chance to experience this game-changer for yourself with 5000 free API calls for the first 100 users.
This is the future of cloud computing, and it's available today, at a fraction of the cost. Unleash the full potential of your projects and change the world with Monster API.
Text-to-4D
A new model called Make-A-Video3D (by Meta AI) is trained to generate 3D dynamic scenes from input text descriptions. This follows previous efforts such as Make-A-Scene and Make-A-Video.
The approach incorporates a 4D dynamic Neural Radiance Field (NeRF), optimized for scene appearance, density, and motion consistency by querying a Text-to-Video diffusion model. No 4D or 3D data is required. The Text-to-Video model is trained only on text-image pairs and unlabeled videos. Find interactive examples here.
Super-resolution fine-tuning is also used to improve the resolution of the model. The authors claim that this is the first AI system to generate 3D dynamic scenes given a text description.
Here is a nice Twitter thread by Jim Fan on some of the recent milestones in generative AI:
MusicLM
Google Research introduces a new model, MusicLM, for generating high-fidelity music (24 kHz) from text descriptions. The system can be conditioned on both text and melody. The model can generate coherent music up to 5 minutes long.
They also release a new evaluation dataset, MusicCaps, consisting of 5.5k high-quality music captions written by musicians.
The field is moving so fast that there is already another method for text-to-music generation, this one based on long-context latent diffusion. This approach, called Moûsai, can generate high-quality stereo music at 48 kHz from textual descriptions. Here is the open-source PyTorch-based library and samples to explore.
Audio generation continues to get better, but the approaches are not as developed as those in other areas like image and text generation. This repository contains a nice list of some of the latest AI models for audio generation.
Notable Mentions
This section includes notable mentions of other trending ML resources and papers.
Top ML Papers of the Week: every week we will publish a recap of the top trending ML papers. You can also keep track via Twitter or LinkedIn.
A new foundation model, ClimaX, for weather and climate
InstructPix2Pix is a method for editing images by following human instructions
LeCun argues that ChatGPT is “not particularly innovative”
If you are interested in sponsoring a future newsletter issue, reach out at ellfae@gmail.com or Twitter.