🚀 AI News: Trending AI Research + Trending AI Tools.. (Aug 15, 2023 Edition)
This newsletter brings you AI research news that is more technical than most resources yet still digestible and applicable.
🔥 Trending AI Research: Let’s learn something new from the trending papers.
🛎️ Trending Tools: Check out some cool AI tools picked up by our editorial team.
Read Time: 3 Minutes
🔥Trending AI Research
1️⃣ How Can LLMs Fix Their Own Mistakes? A Survey of Self-Correction Strategies Using Automated Feedback
Large language models (LLMs) excel at numerous NLP tasks but sometimes exhibit unwanted behaviors such as hallucination, unfaithful reasoning, and toxicity. To address these issues, self-correction methods, where the LLM fixes its own mistakes, are gaining traction. In particular, techniques using automated feedback, from either the LLM itself or an external system, look promising for improving the reliability of LLMs with less human intervention. This paper offers an in-depth review of such techniques, categorizing them into training-time, generation-time, and post-hoc correction strategies. The study further highlights key applications of these techniques and concludes by addressing potential future directions and challenges in this domain.
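To make the post-hoc correction idea concrete, here is a minimal, hypothetical sketch of a generate-critique-refine loop. The `llm` callable, the prompts, and the stopping rule are illustrative placeholders, not the implementation of any paper in the survey.

```python
# Minimal sketch of post-hoc self-correction with automated feedback.
# `llm` is a placeholder for any text-completion callable (e.g., an API client);
# the prompts and stopping rule are illustrative, not from the surveyed papers.

def self_correct(llm, task_prompt, max_rounds=3):
    draft = llm(task_prompt)
    for _ in range(max_rounds):
        # The model (or an external critic) reviews the current answer.
        critique = llm(
            f"Task: {task_prompt}\nAnswer: {draft}\n"
            "List any factual errors or unsupported claims. "
            "Reply 'NO ISSUES' if the answer is correct."
        )
        if "NO ISSUES" in critique.upper():
            break
        # The feedback is fed back so the model can revise its answer.
        draft = llm(
            f"Task: {task_prompt}\nPrevious answer: {draft}\n"
            f"Feedback: {critique}\nRewrite the answer, fixing the issues."
        )
    return draft
```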
2️⃣ ChatGPT with Eyes and Ears: BuboGPT is an AI Approach That Enables Visual Grounding in Multi-Modal LLMs [Blog] [Paper]
Large language models (LLMs) have made significant strides in human interaction using language, with newer models like MiniGPT-4, LLaVA, and X-LLM integrating multi-modal inputs such as images, videos, and speech. However, these models cannot pinpoint specific parts of their inputs, which limits them to a broad, ungrounded understanding. Addressing this limitation, this paper introduces BuboGPT, a multi-modal LLM with visual grounding. The model interacts cross-modally with vision, audio, and language, builds a detailed understanding of visual elements, and can point to the exact location of objects within images in its responses. The research offers two main innovations: 1) a ready-made visual grounding module built on SAM that links entities in text to regions in images, and 2) a novel training method with an instruction dataset for synchronized text, image, and audio comprehension. Testing indicates that BuboGPT excels at multi-modal comprehension and visual grounding during human interactions.
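A rough, hypothetical sketch of what an entity-to-region grounding step could look like: segment the image with SAM, label each candidate region, and match the labels against entities mentioned in the model's response. The `extract_entities` and `label_region` helpers are placeholders; BuboGPT's actual pipeline is described in the paper.

```python
# Hypothetical sketch of visual grounding: link entities in generated text
# to image regions. SAM usage follows the segment-anything package;
# `extract_entities` and `label_region` are illustrative placeholders.
from segment_anything import SamAutomaticMaskGenerator, sam_model_registry

def ground_entities(image, response_text, sam_checkpoint="sam_vit_h.pth"):
    sam = sam_model_registry["vit_h"](checkpoint=sam_checkpoint)
    masks = SamAutomaticMaskGenerator(sam).generate(image)  # candidate regions

    groundings = {}
    for entity in extract_entities(response_text):           # e.g., "dog", "frisbee"
        for mask in masks:
            # Compare the entity with a label predicted for this region
            # (e.g., by an image-tagging model); keep the regions that match.
            if entity.lower() == label_region(image, mask["segmentation"]).lower():
                groundings.setdefault(entity, []).append(mask["bbox"])
    return groundings  # entity -> list of bounding boxes in the image
```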
3️⃣ Master Key to Audio Source Separation: Introducing AudioSep to Separate Anything You Describe [Paper] [Blog]
Language-queried audio source separation (LASS) separates target sounds from audio mixtures based on natural-language queries, offering a flexible interface for digital audio tasks. However, current LASS methods struggle with broad, open-ended audio concepts. This paper presents AudioSep, a model designed for open-domain audio source separation driven by natural language. Trained on large-scale multimodal datasets, AudioSep is extensively evaluated on tasks such as audio event separation, musical instrument separation, and speech enhancement. The model shows strong separation performance and excels at zero-shot generalization when audio captions or text labels are used as queries, significantly outperforming earlier models.
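To illustrate the idea of language-queried separation, here is a minimal, hypothetical sketch: encode the text query into an embedding and use it to condition a separation network applied to the mixture. `TextEncoder` and `SeparationNet` are illustrative stand-ins, not AudioSep's actual API.

```python
# Hypothetical usage sketch for language-queried audio source separation.
# `text_encoder` and `separation_net` are illustrative stand-ins, not AudioSep's API.
import torch

class LASSSeparator(torch.nn.Module):
    def __init__(self, text_encoder, separation_net):
        super().__init__()
        self.text_encoder = text_encoder      # maps a caption to a query embedding
        self.separation_net = separation_net  # query-conditioned waveform-to-waveform model

    @torch.no_grad()
    def separate(self, mixture, query):
        # mixture: (batch, samples) waveform; query: natural-language description
        cond = self.text_encoder(query)            # e.g., a CLAP-style text embedding
        return self.separation_net(mixture, cond)  # estimated target source waveform
```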
4️⃣ Researchers at Boston University Release the Platypus Family of Fine-Tuned LLMs: To Achieve Cheap, Fast and Powerful Refinement of Base LLMs [Paper] [Blog]
The paper introduces "Platypus," a set of highly-performing Large Language Models (LLMs) which lead HuggingFace's Open LLM Leaderboard at the time of its release. The study details: (1) "Open-Platypus," a curated dataset drawn from other open sources and made publicly available; (2) a methodology for fine-tuning and integrating LoRA modules, maintaining the efficiency of pre-trained LLMs and emphasizing specific domain knowledge; and (3) a protocol to identify and mitigate test data leaks and training data contamination. Remarkably, Platypus models outperform competitors on the LLM leaderboard while using significantly less fine-tuning data and computational power. For instance, a 13B Platypus model can be trained using just 25k questions in 5 hours on a single A100 GPU. This underscores the effectiveness of the Open-Platypus dataset and suggests potential for further advancements in the domain.