AI News: DetectGPT from Stanford; Electrostatic Halftoning; LLMs continue to improve; ChatGPT for Biology
Hi there, today we will share some research updates: DetectGPT from Stanford, Electrostatic Halftoning, comparisons suggesting that LLMs continue to improve, Google's MusicLM, ChatGPT for Biology, SlotFormer, A Watermark for Large Language Models, and some bonus cool AI tools. So, let's start...
DetectGPT from Stanford: LLMs like ChatGPT are becoming more fluent, so how can we tell whether a language model or a human wrote something? Stanford researchers propose DetectGPT, a method for detecting whether a particular language model wrote a passage. DetectGPT compares the probability the model assigns to the original text with the probability it assigns to perturbed versions of that text. If the probability of the original is much higher than that of the perturbed versions, the passage was likely generated by the model.
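To make the idea concrete, here is a minimal sketch of the DetectGPT criterion, assuming GPT-2 (via Hugging Face transformers) as the scoring model and simple random word dropout as the perturbation; the paper itself produces perturbations with a mask-filling model, so treat this only as an illustration of the likelihood comparison.

```python
import random

import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def log_likelihood(text: str) -> float:
    """Average per-token log-probability the model assigns to the text."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss   # mean negative log-likelihood
    return -loss.item()

def perturb(text: str, drop: float = 0.15) -> str:
    """Crude stand-in perturbation: randomly drop a fraction of the words."""
    words = text.split()
    kept = [w for w in words if random.random() > drop]
    return " ".join(kept) if kept else text

def detectgpt_score(text: str, n_perturbations: int = 10) -> float:
    """Large positive values mean the original text sits at a likelihood
    peak, which DetectGPT treats as evidence of machine generation."""
    original = log_likelihood(text)
    perturbed = [log_likelihood(perturb(text)) for _ in range(n_perturbations)]
    return original - sum(perturbed) / len(perturbed)

print(detectgpt_score("The quick brown fox jumps over the lazy dog."))
```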
Comparisons suggest LLMs continue to improve: OpenAI's latest LLM reaches 78% accuracy, while the previous release scored 73%, and a model from their 2020 GPT-3 paper managed only 27% (worse than random).
Electrostatic Halftoning: An AI Approach Based on Physical Principles of Electrostatics for Image Dithering, Stippling, Screening, and Sampling. In a nutshell, the algorithm consists of an initialization stage that precomputes the electrostatic force exerted by the input image by imagining a test charge and moving it to every grid point. Once initialization is finished, bilinear-interpolation computations are performed repeatedly for all particles until the system converges. The team's methodology achieves a smaller approximation error under Gaussian convolution than existing state-of-the-art methodologies and exhibits favorable blue-noise behavior in the frequency domain.
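The following toy sketch captures those two stages on a tiny image; the brute-force field precomputation, particle count, and step size are my own simplifications for illustration, not the paper's implementation.

```python
import numpy as np

def precompute_field(image):
    """Initialization: place a test charge at every grid point and sum the
    pull of the image's charges (darkness). Brute force, so keep images tiny."""
    h, w = image.shape
    density = 1.0 - image                          # dark pixels attract dots
    ys, xs = np.mgrid[0:h, 0:w]
    sites = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)
    q = density.ravel()
    field = np.zeros((h, w, 2))
    for k, p in enumerate(sites):
        d = sites - p
        r2 = (d ** 2).sum(axis=1)
        r2[k] = np.inf                             # skip self-interaction
        field[int(p[0]), int(p[1])] = (q[:, None] * d / r2[:, None]).sum(axis=0)
    return field

def bilinear(field, pos):
    """Evaluate the precomputed force field at a sub-pixel particle position."""
    h, w, _ = field.shape
    y = float(np.clip(pos[0], 0, h - 1.001))
    x = float(np.clip(pos[1], 0, w - 1.001))
    y0, x0 = int(y), int(x)
    fy, fx = y - y0, x - x0
    return ((1 - fy) * (1 - fx) * field[y0, x0] + (1 - fy) * fx * field[y0, x0 + 1]
            + fy * (1 - fx) * field[y0 + 1, x0] + fy * fx * field[y0 + 1, x0 + 1])

def halftone(image, n_particles=200, n_iters=100, step=0.05, seed=0):
    """Iteration: particles are pulled toward dark regions and repel each
    other until the dot distribution settles."""
    rng = np.random.default_rng(seed)
    h, w = image.shape
    field = precompute_field(image)
    pts = rng.uniform([0.0, 0.0], [h - 1.0, w - 1.0], size=(n_particles, 2))
    for _ in range(n_iters):
        force = np.array([bilinear(field, p) for p in pts])
        diff = pts[:, None, :] - pts[None, :, :]   # pairwise repulsion
        r2 = (diff ** 2).sum(axis=-1)
        np.fill_diagonal(r2, np.inf)
        force += (diff / r2[..., None]).sum(axis=1)
        pts = np.clip(pts + step * force, [0.0, 0.0], [h - 1.0, w - 1.0])
    return pts

# toy gradient: darker on the left, so more dots should gather there
img = np.tile(np.linspace(0.1, 0.9, 32), (32, 1))
print(halftone(img)[:5])
```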
Google: Researchers from Google introduce MusicLM, a model for generating high-fidelity music from text descriptions such as “a calming violin melody backed by a distorted guitar riff”. MusicLM casts the process of conditional music generation as a hierarchical sequence-to-sequence modeling task, and it generates music at 24 kHz that remains consistent over several minutes.
ChatGPT for Biology: Deep-learning language models have shown promise in various biotechnological applications, including protein design and engineering. Here the research group describes ProGen, a language model that can generate protein sequences with a predictable function across large protein families, akin to generating grammatically and semantically correct natural language sentences on diverse topics. The model was trained on 280 million protein sequences from >19,000 families and is augmented with control tags specifying protein properties.
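As a toy illustration of the control-tag idea (made-up tags and sequence, not the ProGen codebase): a family or property tag is simply prepended to the residue tokens, so the autoregressive model learns the sequence distribution conditioned on the tag and generation can be steered by choosing the tag.

```python
# Hypothetical tags and sequence purely for illustration; ProGen's actual
# tag vocabulary and tokenization differ.
CONTROL_TAGS = ["<family:lysozyme>", "<property:thermostable>"]

def encode(sequence: str, tags: list[str]) -> list[str]:
    """Model input: control tags first, then one token per amino-acid residue."""
    return tags + list(sequence)

print(encode("MKTAYIAKQR", CONTROL_TAGS[:1]))
# ['<family:lysozyme>', 'M', 'K', 'T', 'A', 'Y', 'I', 'A', 'K', 'Q', 'R']
```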
SlotFormer: Understanding dynamics from visual observations is a challenging problem that requires disentangling individual objects from the scene and learning their interactions. While recent object-centric models can successfully decompose a scene into objects, modeling their dynamics effectively remains a challenge. Researchers from the University of Toronto, Vector Institute, Samsung AI, and Google Research address this problem by introducing SlotFormer -- a Transformer-based autoregressive model operating on learned object-centric representations.
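For intuition, here is a minimal PyTorch sketch of the general recipe: per-frame slot vectors from a frozen, pre-trained object-centric encoder are flattened into a token sequence, a Transformer with a frame-level causal mask processes them, and the slots of the next frame are predicted autoregressively. Module sizes and names are assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

class SlotDynamics(nn.Module):
    def __init__(self, slot_dim=64, num_slots=5, num_frames=6,
                 d_model=128, nhead=4, num_layers=2):
        super().__init__()
        self.num_slots = num_slots
        self.in_proj = nn.Linear(slot_dim, d_model)
        self.pos_emb = nn.Parameter(torch.zeros(num_frames * num_slots, d_model))
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.out_proj = nn.Linear(d_model, slot_dim)

    def forward(self, slots):
        # slots: (batch, frames, num_slots, slot_dim) from a frozen slot encoder
        b, t, k, d = slots.shape
        tokens = self.in_proj(slots.reshape(b, t * k, d)) + self.pos_emb[: t * k]
        # causal mask so a frame's slots only attend to current and past frames
        frame_idx = torch.arange(t * k) // k
        mask = frame_idx[None, :] > frame_idx[:, None]
        h = self.encoder(tokens, mask=mask)
        # predict the next frame's slots from the last frame's token states
        next_slots = self.out_proj(h[:, -k:])
        return next_slots.reshape(b, k, d)

model = SlotDynamics()
history = torch.randn(2, 6, 5, 64)              # (batch, frames, slots, dim)
pred = model(history)                           # slots predicted for the next frame
print(pred.shape)                               # torch.Size([2, 5, 64])
```

Feeding the predicted slots back in as the newest frame lets the model unroll dynamics several frames ahead.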
A Watermark for Large Language Models: Researchers propose a watermarking framework for proprietary language models. The watermark can be embedded with negligible impact on text quality, and can be detected using an efficient open-source algorithm without access to the language model API or parameters.
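For a sense of how detection can work without model access, here is a toy sketch of a "green list" style detector: a watermarked generator nudges sampling toward a pseudorandom "green" subset of the vocabulary at each step, so watermarked text contains far more green tokens than chance, which a simple z-test can pick up. The vocabulary hashing scheme, list fraction, and thresholds below are stand-ins; the paper's actual construction may differ.

```python
import hashlib
import math

GAMMA = 0.5  # assumed fraction of the vocabulary marked "green" at each step

def is_green(prev_token: str, token: str) -> bool:
    """Pseudorandomly assign a token to the green list, seeded by its predecessor."""
    h = hashlib.sha256(f"{prev_token}|{token}".encode()).hexdigest()
    return int(h, 16) % 1000 < GAMMA * 1000

def watermark_z_score(tokens: list[str]) -> float:
    """Count how many tokens land in their green list and compare with the
    GAMMA fraction expected by chance; a large z-score suggests a watermark."""
    hits = sum(is_green(p, t) for p, t in zip(tokens, tokens[1:]))
    n = len(tokens) - 1
    return (hits - GAMMA * n) / math.sqrt(GAMMA * (1 - GAMMA) * n)

print(watermark_z_score("the model writes fluent text about many topics".split()))
```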