AI Dev and Research News
AI News: You Sing, I Play! Meet SingSong; Auditing LLMs: A 3-Layer Approach; Meet pix2pix-zero; AI Systems Can Optimize Their Own Code....


This newsletter brings you AI research news that is more technical than most resources but still digestible and applicable.

ASIF RAZZAQ
February 20, 2023

Hi there! Today we share research updates on the following: Auditing LLMs: A 3-Layer Approach; using Neural Radiance Fields (NeRFs) to take an optimal selfie; You Sing, I Play! Meet SingSong; AI Systems Can Optimize Their Own Code; Meet pix2pix-zero; The Problem of Gender Presentation Differences in a Fine-Grained Pattern; and many other cool updates. So, let's start...

Did you know that you can use Neural Radiance Fields (NeRFs) to take an optimal selfie? NeRFs have trouble forming 3D representations from noisy images (e.g., those captured on a mobile phone), so they need to be made more selfie-compatible for this to work. To solve this problem, researchers from the University of Washington augment a NeRF with a learnable deformation field, modeled by a feed-forward network. This lets the NeRF handle the small variations and noise that exist between different images of the same scene. With this approach, the research group trains an accurate NeRF from a 20-second mobile phone video of a person's face. The resulting NeRF, called a "Nerfie", lets the subject render selfie images from any number of different viewpoints.

Source: https://nerfies.github.io
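To make the deformation-field idea concrete, here is a minimal sketch, not the authors' implementation. It assumes a tiny one-hidden-layer feed-forward network with fixed toy weights (in practice they would be learned) and a hypothetical per-image latent code. The network outputs a 3D offset that moves a point from one noisy image's frame into a shared frame where the NeRF would be queried. The names `deformation_offset`, `to_canonical`, and `TOY_WEIGHTS` are illustrative only.

```python
import math

def deformation_offset(point, latent_code, weights):
    """Tiny feed-forward network: (3D point + per-image code) -> 3D offset."""
    inputs = list(point) + list(latent_code)
    hidden = [
        math.tanh(sum(w * x for w, x in zip(row, inputs)))
        for row in weights["hidden"]
    ]
    return [sum(w * h for w, h in zip(row, hidden)) for row in weights["out"]]

def to_canonical(point, latent_code, weights):
    """Shift an observed point into the shared (canonical) frame."""
    offset = deformation_offset(point, latent_code, weights)
    return [p + o for p, o in zip(point, offset)]

# Toy weights (assumption): 5 inputs (3 coordinates + 2 code values),
# 2 hidden units, 3 outputs. A trained model would learn these values.
TOY_WEIGHTS = {
    "hidden": [[0, 0, 0, 1, 0], [0, 0, 0, 0, 1]],
    "out": [[0.1, 0.0], [0.0, 0.1], [0.0, 0.0]],
}

point = [1.0, 2.0, 3.0]
unchanged = to_canonical(point, [0.0, 0.0], TOY_WEIGHTS)  # zero code: no shift
shifted = to_canonical(point, [1.0, 0.0], TOY_WEIGHTS)    # shifted along x
print(unchanged, shifted)
```

With a zero latent code the point is left where it is; a non-zero code nudges it, which is how per-image differences could be absorbed before the NeRF is evaluated.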

Auditing LLMs: A 3-Layer Approach: Researchers outline a blueprint for auditing LLMs. They introduce a three-layered approach in which governance, model, and application audits inform and complement each other. During governance audits, technology providers' accountability structures and quality management systems are evaluated for robustness, completeness, and adequacy. During model audits, LLMs' capabilities and limitations are assessed along several dimensions, including performance, robustness, information security, and truthfulness. Finally, during application audits, products and services built on top of LLMs are first assessed for legal compliance and then evaluated for their impact on users, groups, and the natural environment.

You Sing, I Play! Meet SingSong: An AI Model Capable of Generating Accompanying Music for Singing. SingSong can accompany your singing with fitting instrumental music. Say you record yourself singing a vocal line and want to add instrumental music to that recording. You can give your vocal to SingSong, which will generate the music for you, and you will have a song of your own. SingSong became possible thanks to advances in two key fields of music technology. The first is source separation, which separates the vocals from the instrumental sources in music. The separated data is aligned in pairs of vocals and instrumental sources and used for training. The second is generative modeling of audio. The researchers use such a model to go from vocals to instrumental music, i.e., conditional audio-to-audio generation.

AI Systems Can Optimize Their Own Code: A group of researchers from Google and CMU has presented a dataset containing pairs of (before, after) code optimizations, as well as methods for creating code-optimizing LLMs. Their research focuses on improving code performance while preserving functionality, and on going beyond traditional compiler optimizations. The researchers demonstrate that modern LLMs can genuinely reduce program runtime without altering behavior. In fact, OpenAI's Codex alone can improve the performance of around 45% of Python programs. The researchers' primary contribution is the PIE dataset, which comprises sequences of (original, optimized) code and includes unit tests to verify correctness. While the findings are promising, the dataset only covers small programs of approximately 150 lines.
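As a rough illustration of what an (original, optimized) pair with unit tests looks like, here is a minimal sketch. The two functions, the test cases, and the `evaluate_pair` helper are all hypothetical and are not taken from the PIE dataset. The sketch only shows the idea of checking that both versions pass the same tests before comparing their runtime.

```python
import timeit

def original_sum_of_squares(n):
    # Straightforward loop: O(n) time.
    total = 0
    for i in range(1, n + 1):
        total += i * i
    return total

def optimized_sum_of_squares(n):
    # Closed-form formula: O(1) time, same result.
    return n * (n + 1) * (2 * n + 1) // 6

# Hypothetical unit tests shared by both versions: (input, expected output).
UNIT_TESTS = [(0, 0), (1, 1), (3, 14), (10, 385)]

def passes_tests(func, tests):
    return all(func(arg) == expected for arg, expected in tests)

def evaluate_pair(original, optimized, tests, bench_arg=10_000, repeats=20):
    """Return (is_correct, speedup) for an (original, optimized) pair."""
    is_correct = passes_tests(original, tests) and passes_tests(optimized, tests)
    t_original = timeit.timeit(lambda: original(bench_arg), number=repeats)
    t_optimized = timeit.timeit(lambda: optimized(bench_arg), number=repeats)
    # Guard against a zero reading from a very coarse timer.
    return is_correct, t_original / max(t_optimized, 1e-12)

is_correct, speedup = evaluate_pair(
    original_sum_of_squares, optimized_sum_of_squares, UNIT_TESTS
)
print(f"correct={is_correct}, speedup={speedup:.1f}x")
```

Checking correctness first matters: an "optimization" that changes behavior fails the tests and should not count as an improvement, however fast it runs.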

Meet pix2pix-zero: A Diffusion-Based Image-to-Image Translation Method that Allows Users to Specify the Edit Direction on-the-fly (e.g., Cat → Dog). This method can directly use pre-trained text-to-image diffusion models, such as Stable Diffusion, to edit real and synthetic images while preserving the input image's structure. The pix2pix-zero method is training-free and prompt-free: it requires neither manual text prompting for each input image nor costly fine-tuning for each task.

Source: https://pix2pixzero.github.io/

The Problem of Gender Presentation Differences in a Fine-Grained Pattern: Researchers from Georgia Tech, Stanford, Google, and CMU introduce a new metric called GEP that uses fine-grained self-presentation attributes to study how gender is presented differently in text-to-image models. To study gender biases in these models, prior works classify generated images into gender categories and measure biases using the relative gender frequencies. However, a person's gender should be neither determined nor predicted solely by appearance. The research question is: when the gender in the text input is varied, how do text-to-image models alter the person's presence in the generated image? "The person's presence" refers to the presence of presentation-related attributes, such as whether the person is wearing "a shirt". Given a set of such attributes, the GEP metric uses frequency differences on various presentation-centric attributes to summarize the difference between genders.
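The frequency-difference idea can be sketched as follows. This is an illustration under stated assumptions, not the paper's exact GEP definition. The attribute list and the per-image detections are made up, and summarizing with the mean absolute difference is one assumed choice among several possible ones.

```python
def attribute_frequencies(images, attributes):
    """Fraction of images in which each attribute was detected."""
    n = len(images)
    return {a: sum(a in img for img in images) / n for a in attributes}

def presentation_differences(images_a, images_b, attributes):
    """Per-attribute frequency difference between two sets of images."""
    freq_a = attribute_frequencies(images_a, attributes)
    freq_b = attribute_frequencies(images_b, attributes)
    return {a: freq_a[a] - freq_b[a] for a in attributes}

# Hypothetical attribute set and detections (one set per generated image).
ATTRIBUTES = ["shirt", "suit", "dress"]
images_woman = [{"dress"}, {"shirt"}, {"dress"}, {"suit"}]
images_man = [{"shirt"}, {"suit"}, {"suit"}, {"shirt"}]

diffs = presentation_differences(images_woman, images_man, ATTRIBUTES)
# Assumed summary: mean absolute frequency difference across attributes.
summary = sum(abs(d) for d in diffs.values()) / len(diffs)
print(diffs, round(summary, 3))
```

Note that no image is ever classified into a gender category here; only the prompt differs between the two sets, which is the point of measuring presentation attributes instead.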

DRAGON: Meta researchers introduce DRAGON, a Dense Retriever trained with diverse AuGmentatiON. It is the first BERT-base-sized dense retriever (DR) to achieve state-of-the-art effectiveness on both supervised and zero-shot evaluations. Despite significant recent progress in DR training, there is a somewhat overlooked trade-off between supervised and zero-shot evaluations: in most cases, existing methods improve accuracy in one setting at the expense of the other. The research team shows that this trade-off can be broken without increasing the model size or resorting to more complex and expensive late interaction.
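For readers new to dense retrieval, here is a minimal sketch of the single-vector scoring that such retrievers typically rely on: the query and each passage are each encoded as one vector, and passages are ranked by dot product. The toy vectors below stand in for encoder outputs and are assumptions, as are the helper names. This is not DRAGON's training method, only the scoring step it keeps simple by avoiding late interaction.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def rank_passages(query_vec, passage_vecs):
    """Rank passages by dot product with the query, highest score first."""
    scores = {pid: dot(query_vec, vec) for pid, vec in passage_vecs.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Toy embeddings (assumption): a real system would get these from an encoder.
query_vec = [0.9, 0.1, 0.0]
passage_vecs = {
    "p1": [0.8, 0.2, 0.1],
    "p2": [0.0, 0.1, 0.9],
    "p3": [0.5, 0.5, 0.0],
}

ranking = rank_passages(query_vec, passage_vecs)
print([pid for pid, _ in ranking])
```

Because each passage is a single vector, all passage embeddings can be precomputed and indexed, which is what makes this cheaper than scoring that compares queries and passages token by token.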

Yay! We were mentioned 🎉🎉 🤩

From @Marktechpost: a description of our latest work on Image Understanding Through Contextual Phrase Detection, by a team from NYU consisting of @ashkamath20, Sara Price, Jonas Pfeiffer, me, and @alcinos26.
— Yann LeCun (@ylecun)
2:53 PM • Feb 19, 2023

Did you know that Marktechpost has 1.5 million+ page views per month and 500,000 AI community members?