
AI News Thread 🚨🧵: Text-to-4D (3D+time), EchoNet-Peds Database, ConceptFusion, Noise2Music, Are the largest NLP models always the most truthful?.....

Hi there! Today we are sharing some cool research updates, including Text-to-4D (3D+time), the EchoNet-Peds Database, ConceptFusion, Noise2Music, the question "Are the largest NLP models always the most truthful?", and more. So, let's start...

EchoNet-Peds Database: Researchers at Stanford have released a large pediatric echocardiography video dataset for computer vision research. The EchoNet-Peds database includes 7,643 labeled echocardiogram videos along with human expert annotations (measurements, tracings, and calculations), providing a baseline for studying cardiac motion and chamber sizes. The database covers patients aged 0-18 years (43% female) with a wide range of body sizes.
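To give a feel for working with a labeled dataset like this, here is a minimal sketch that parses per-video annotations and buckets patients by age group. The CSV schema (`video_id`, `age_years`, `sex`, `ef_percent`) is hypothetical; the actual EchoNet-Peds field names and file layout may differ.

```python
import csv
import io

# Hypothetical annotation rows; real EchoNet-Peds metadata will differ.
SAMPLE = """video_id,age_years,sex,ef_percent
ped_0001,3,F,62.5
ped_0002,15,M,58.0
ped_0003,0,F,66.1
"""

def load_annotations(text):
    """Parse echo annotations and bucket patient videos by age group."""
    rows = csv.DictReader(io.StringIO(text))
    buckets = {"infant (<1y)": [], "child (1-12y)": [], "teen (13-18y)": []}
    for r in rows:
        age = int(r["age_years"])
        key = ("infant (<1y)" if age < 1
               else "child (1-12y)" if age <= 12
               else "teen (13-18y)")
        buckets[key].append(r["video_id"])
    return buckets

buckets = load_annotations(SAMPLE)
```

Bucketing like this is a common first step when checking that train/validation splits cover the full 0-18 age range.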

🚨 Text-to-4D (3D+time): Meta AI proposes a novel system for text-to-4D (3D+time) generation that combines the benefits of video and 3D generative models. They name it MAV3D (Make-A-Video3D), a method for generating three-dimensional dynamic scenes from text descriptions. The dynamic video output generated from the provided text can be viewed from any camera location and angle and can be composited into any 3D environment.

ConceptFusion: This research, a collaboration between researchers from institutions including MIT, Université de Montréal, and Amazon, focuses on a new approach for building 3D maps of environments for use in robot navigation, interaction, and planning. The approach, called ConceptFusion, addresses the limitations of existing methods by providing an open-set representation that enables reasoning beyond a pre-defined set of concepts and supports multi-modal queries spanning text, images, audio, and 3D geometry. The resulting 3D maps can be queried using these inputs in concert.
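The open-set idea can be illustrated with a toy similarity query: if every map point carries a feature vector in a shared embedding space, any modality that can be embedded into that space can query the map. This is only a sketch of the querying step with random stand-in features, not the paper's actual fusion pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a fused 3D map: N points, each carrying a D-dim
# open-set feature (in ConceptFusion these come from foundation models).
N, D = 1000, 32
point_features = rng.normal(size=(N, D))
point_features /= np.linalg.norm(point_features, axis=1, keepdims=True)

def query_map(query_embedding, features, top_k=5):
    """Rank map points by cosine similarity to a query embedding.

    The query can originate from text, an image, or audio, as long as
    it is embedded into the same feature space as the map points."""
    q = query_embedding / np.linalg.norm(query_embedding)
    scores = features @ q
    order = np.argsort(-scores)[:top_k]
    return order, scores[order]

idx, scores = query_map(rng.normal(size=D), point_features)
```

Because the ranking only needs a dot product, new query modalities can be supported without rebuilding the map.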

Noise2Music: Google researchers introduce Noise2Music, in which a series of diffusion models is trained to generate high-quality 30-second music clips from text prompts. Two types of diffusion models are trained and applied in succession: a generator model, which produces an intermediate representation conditioned on the text, and a cascader model, which generates high-fidelity audio conditioned on the intermediate representation and, optionally, the text. This work has the potential to grow into a useful tool for artists and content creators, further enriching their creative pursuits.
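The two-stage data flow can be sketched as follows. The `generator` and `cascader` below are mock functions that only illustrate how the stages chain together (text conditioning in, intermediate representation through, audio out); the real models are large diffusion networks.

```python
import numpy as np

rng = np.random.default_rng(0)

def generator(text_embedding, steps=4):
    """Stage 1 (mock): denoise toward a low-rate intermediate
    representation conditioned on the text embedding."""
    x = rng.normal(size=256)
    for _ in range(steps):
        x = 0.5 * x + 0.5 * text_embedding  # stand-in for a denoising step
    return x

def cascader(intermediate, text_embedding, upsample=8):
    """Stage 2 (mock): map the intermediate representation to
    high-fidelity audio, still optionally conditioned on the text."""
    audio = np.repeat(intermediate, upsample)          # mock upsampling
    return audio + 0.01 * rng.normal(size=audio.size)  # mock refinement

text_emb = rng.normal(size=256)
audio = cascader(generator(text_emb), text_emb)
```

The key design choice is that the cascader never sees raw noise directly; it always refines the generator's output, which keeps each stage's task tractable.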

Are the largest NLP models always the most truthful?: The answer is no! This study from the University of Oxford and OpenAI shows that large NLP models like GPT-3 answer truthfully only 58% of the time, compared with 94% for humans. Models generate many false answers that mimic popular misconceptions and have the potential to deceive humans. According to this research, the largest models were generally the least truthful. This contrasts with other NLP tasks, where performance improves with model size.

Offsite-Tuning: In this paper, MIT researchers propose Offsite-Tuning, a privacy-preserving and efficient transfer learning framework that can adapt billion-parameter foundation models to downstream data without access to the full model. In offsite tuning, the model owner sends a lightweight adapter and a lossy compressed emulator to the data owner, who then fine-tunes the adapter on the downstream data with the emulator's assistance. Offsite-Tuning achieves accuracy comparable to full-model fine-tuning while remaining privacy-preserving and efficient, delivering a 6.5x speedup and a 5.6x memory reduction.
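The essential split can be shown in miniature: small trainable adapters wrap a frozen, compressed "emulator", and only the adapters receive gradient updates. Everything below (layer sizes, the `Linear` class, the numeric-gradient step) is a toy stand-in, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

class Linear:
    """Toy tanh layer; a stand-in for real transformer blocks."""
    def __init__(self, d):
        self.w = rng.normal(scale=0.1, size=(d, d))
    def __call__(self, x):
        return np.tanh(x @ self.w)

D = 6
adapter_in, adapter_out = Linear(D), Linear(D)   # trainable at the data owner
emulator = [Linear(D) for _ in range(2)]         # frozen, lossy stand-in for
                                                 # the model's middle layers

def forward(x):
    h = adapter_in(x)
    for layer in emulator:       # no updates ever touch these weights
        h = layer(h)
    return adapter_out(h)

x, y = rng.normal(size=D), rng.normal(size=D)
loss = lambda: float(np.mean((forward(x) - y) ** 2))

# One numeric-gradient descent step on the adapters only.
emu_snapshot = [layer.w.copy() for layer in emulator]
lr, eps = 0.05, 1e-5
for lin in (adapter_in, adapter_out):
    base = loss()
    g = np.zeros_like(lin.w)
    for i in range(D):
        for j in range(D):
            lin.w[i, j] += eps
            g[i, j] = (loss() - base) / eps
            lin.w[i, j] -= eps
    lin.w -= lr * g

emulator_unchanged = all(np.array_equal(s, layer.w)
                         for s, layer in zip(emu_snapshot, emulator))
```

Because the emulator is both compressed and frozen, the data owner never holds (or trains) the full proprietary model, which is where the privacy and efficiency gains come from.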

UC Berkeley Research: The researchers propose Hindsight Instruction Relabeling (HIR), a novel algorithm for aligning language models with instructions. The resulting two-stage algorithm sheds light on a family of reward-free approaches that utilize instructions relabeled in hindsight based on feedback. The researchers evaluated HIR extensively on 12 challenging BigBench reasoning tasks and showed that it outperforms the baseline algorithms and is comparable to, or even surpasses, supervised fine-tuning.
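The hindsight idea is that a "failed" sample is not discarded; instead the instruction is rewritten so the sampled output becomes a correct answer, and the pair is reused as supervised data. Here is a deliberately tiny sketch on a made-up even/odd task, with a random-number stand-in for the language model; none of these function names come from the paper.

```python
import random

random.seed(0)

def model_sample(instruction):
    """Stand-in for sampling an output from a language model."""
    return random.randint(0, 9)

def relabel(instruction, output):
    """Hindsight step: choose the instruction the output satisfies."""
    return ("output an even number" if output % 2 == 0
            else "output an odd number")

dataset = []
for _ in range(10):
    goal = random.choice(["output an even number", "output an odd number"])
    out = model_sample(goal)
    dataset.append((relabel(goal, out), out))  # every sample becomes usable

# By construction, every relabeled (instruction, output) pair is correct:
consistent = all(
    (inst == "output an even number") == (out % 2 == 0)
    for inst, out in dataset
)
```

The relabeled dataset can then be used for ordinary supervised fine-tuning, which is why the approach needs no explicit reward model.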

SWARM Parallelism For Training Deep Learning Models: This paper explores alternative methods for training large deep-learning models with billions of parameters, which are typically expensive to train due to the need for specialized HPC clusters. The authors analyze the performance of existing model-parallel algorithms using cheap "preemptible" instances or pooled resources from multiple regions. Based on their findings, the authors propose a new model-parallel training algorithm called SWARM parallelism, which is designed for poorly connected, heterogeneous, and unreliable devices. The key advantages of SWARM parallelism are that it is fault-tolerant, self-balances across slow GPUs and networks, and works in low-bandwidth setups.
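One ingredient of fault tolerance in this setting can be sketched simply: each pipeline stage is served by a pool of interchangeable workers, and a micro-batch is routed to any live worker, skipping preempted ones. The worker names and failure set below are invented for illustration; the real algorithm also rebalances stages and handles bandwidth constraints.

```python
import random

random.seed(1)

# Toy pipeline: three stages, each with a pool of (preemptible) workers.
stages = {0: ["a0", "a1"], 1: ["b0", "b1", "b2"], 2: ["c0"]}
dead = {"a1", "b0"}  # pretend these instances were preempted mid-training

def route(stage):
    """Pick a live worker for a stage, skipping failed peers."""
    alive = [w for w in stages[stage] if w not in dead]
    if not alive:
        raise RuntimeError(f"stage {stage} has no live workers")
    return random.choice(alive)

# Route one micro-batch through the pipeline, stage by stage.
path = [route(s) for s in sorted(stages)]
```

Because any worker in a stage's pool can serve a micro-batch, losing a single preemptible instance degrades throughput instead of halting training.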

Can CNNs learn the spatial relation between features for object recognition?: The research team developed a feature-scrambling approach, combined with models of different effective receptive fields, to test the capacity of CNNs to learn the spatial relations between features for object recognition. They found that CNNs exploit the spatial arrangement of features to build more coarse-grained features that are more reliable for object classification, though this capacity varied across datasets and even across classes within the same dataset. Using minimal recognizable configurations (MIRCs) analysis, the researchers further observed that CNNs employ the spatial configuration of features only up to an intermediate degree of granularity and do not exploit the global shape of objects.
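The core manipulation can be illustrated as patch scrambling: shuffling an image's blocks at different patch sizes destroys spatial arrangement at different scales (tiny patches remove all layout; large patches preserve local feature structure). This sketch shows the manipulation itself, not the paper's exact stimuli or models.

```python
import numpy as np

rng = np.random.default_rng(0)

def scramble_patches(img, patch):
    """Split a square image into patch x patch blocks and shuffle them.

    Small patches destroy coarse spatial structure; larger patches keep
    local feature arrangements intact while breaking global layout."""
    h, w = img.shape
    assert h % patch == 0 and w % patch == 0
    blocks = [img[i:i + patch, j:j + patch]
              for i in range(0, h, patch)
              for j in range(0, w, patch)]
    order = rng.permutation(len(blocks))
    out = np.empty_like(img)
    k = 0
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            out[i:i + patch, j:j + patch] = blocks[order[k]]
            k += 1
    return out

img = np.arange(64, dtype=float).reshape(8, 8)
coarse = scramble_patches(img, 4)  # keeps 4x4 feature groups intact
fine = scramble_patches(img, 1)    # destroys all spatial arrangement
```

Comparing classifier accuracy across scramble granularities is what reveals up to which scale a network actually relies on spatial configuration.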
