
AI News: 🚀 The first Open-Source Text2video 1.7 billion parameter diffusion model has been released; Runway announces Gen-2; How Does A Language Model Decide What To Say Next?....

This newsletter brings AI research news that is much more technical than most resources but still digestible and applicable.

The first Open-Source Text2video 1.7 billion parameter diffusion model has been released, and you can play with it now on Hugging Face. ModelScope is built upon the notion of "Model-as-a-Service" (MaaS). It seeks to bring together the most advanced machine learning models from the AI community and streamline the process of leveraging AI models in real-world applications. The core ModelScope library, open-sourced in this repository, provides the interfaces and implementations that allow developers to perform model inference, training, and evaluation.

Web-scale data has driven incredible progress in AI, but do we really need all that data? Meet SemDeDup: an exceedingly simple method to remove semantic duplicates in web data that can halve the LAION dataset (and training time) with minimal performance loss. The research group consists of people from Meta and Stanford, and they show that while exact deduplication is applied to many datasets such as LAION, these approaches miss tons of "semantic duplicates": items that convey the same information but are not exactly identical. To perform SemDeDup, they embed the dataset with a readily available pre-trained model, run k-means clustering, and then, within each cluster, mark pairs whose embeddings are closer than a chosen threshold as semantic duplicates, keeping one item from each group.
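The pipeline above can be sketched in a few lines of numpy. This is a toy illustration of the idea (embed, cluster, deduplicate within clusters), not the authors' implementation; the function names, the plain k-means, and the greedy within-cluster pass are all assumptions made for clarity.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain k-means: returns a cluster id per row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # assign each point to its nearest center
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        for j in range(k):
            pts = X[labels == j]
            if len(pts):
                centers[j] = pts.mean(0)
    return labels

def semdedup(embeddings, k=4, threshold=0.95):
    """Keep one representative per group of semantic duplicates.

    Duplicates are pairs within the same k-means cluster whose cosine
    similarity exceeds `threshold`. Returns the kept row indices.
    """
    X = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    labels = kmeans(X, k)
    kept = []
    for j in range(max(labels) + 1):
        idx = np.flatnonzero(labels == j)
        sims = X[idx] @ X[idx].T           # pairwise cosine similarities
        removed = set()
        for a in range(len(idx)):
            if a in removed:
                continue
            kept.append(int(idx[a]))
            # drop later items too similar to the one we just kept
            removed |= {b for b in range(a + 1, len(idx)) if sims[a, b] > threshold}
    return sorted(kept)
```

Clustering first means similarities are only computed within clusters, which is what keeps the method tractable at web scale.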

Runway announces Gen-2: A multimodal AI system that can generate realistic videos from text. It's like filming something new without filming anything at all. Gen-2 offers several modes: (1) Text to Video, (2) Text + Image to Video, (3) Image to Video, (4) Stylization, (5) Storyboard, (6) Mask, (7) Render, (8) Customization.

A fascinating paper from OpenAI about the potential impact of LLMs on the job market. OpenAI researchers investigate the potential implications of Generative Pre-trained Transformer (GPT) models and related technologies on the U.S. labor market. Their findings indicate that approximately 80% of the U.S. workforce could have at least 10% of their work tasks affected by the introduction of GPTs, while around 19% of workers may see at least 50% of their tasks impacted.

How Does A Language Model Decide What To Say Next? This New AI Method Called Tuned Lens Can Trace A Language Model’s Prediction As It Develops From One Layer To The Next. EleutherAI, FAR AI, Boston University, the University of Toronto, and UC Berkeley collaborated on a study that views transformer representations through the lens of iterative inference: each layer of a transformer language model is treated as refining a latent prediction of the next token by a small amount. Using early exiting, the researchers decode these hidden predictions by mapping the hidden state at each intermediate layer onto a distribution over the vocabulary. The resulting sequence of distributions, called the prediction trajectory, converges smoothly to the model's final output distribution as depth increases, with its perplexity decreasing along the way.
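The decoding step can be sketched as follows. The per-layer translators are the learned part of the tuned lens (the simpler logit lens would use identity maps); all names, shapes, and the single-position setup here are illustrative assumptions, not the paper's code.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(-1, keepdims=True)

def prediction_trajectory(hidden_states, translators, W_U):
    """Decode each intermediate hidden state into a next-token distribution.

    hidden_states: list of (d,) vectors, one per layer, for a single position.
    translators:   list of (d, d) affine maps, one per layer -- the "tuned"
                   part; the plain logit lens uses the identity instead.
    W_U:           (d, vocab) unembedding matrix.
    Returns one probability distribution over the vocabulary per layer.
    """
    return [softmax((A @ h) @ W_U) for h, A in zip(hidden_states, translators)]
```

Plotting how each layer's distribution shifts toward the final one is exactly the "trace the prediction as it develops" view the method provides.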

CoLT5: Google Researchers propose CoLT5, a long-input Transformer model that builds on the intuition that not all tokens are equally important, employing conditional computation to devote more resources to important tokens in both feedforward and attention layers. They show that CoLT5 achieves stronger performance than LongT5 with much faster training and inference, achieving SOTA on the long-input SCROLLS benchmark. Moreover, CoLT5 can effectively and tractably make use of extremely long inputs, showing strong gains up to 64k input length.
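The routing idea can be sketched as below. This is a deliberately simplified toy, assuming single linear maps for the light and heavy branches; the real model applies this pattern separately (and with full blocks) in its attention and feedforward layers.

```python
import numpy as np

def conditional_ffn(x, w_light, w_heavy, w_route, top_k):
    """CoLT5-style conditional computation, heavily simplified.

    Every token passes through the cheap light branch; only the top_k
    tokens by router score additionally get the expensive heavy branch.
    x: (n, d) token states; w_light, w_heavy: (d, d); w_route: (d,).
    Returns the updated states and the routed token indices.
    """
    out = x @ w_light                      # cheap path for all tokens
    scores = x @ w_route                   # (n,) routing scores
    routed = np.argsort(scores)[-top_k:]   # indices of "important" tokens
    out[routed] += x[routed] @ w_heavy     # extra compute only where it pays
    return out, routed
```

Because the heavy branch runs on only top_k tokens, total compute grows with top_k rather than with full sequence length, which is what makes 64k-token inputs tractable.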
