🚀 AI News: Trending AI Research + Trending AI Tools.. (Aug 15, 2023 Edition)
This newsletter brings you AI research news that is more technical than most resources yet still digestible and applicable.
🔥 Trending AI Research: Let’s learn something new from the trending papers.
🛎️ Trending Tools: Check out some cool AI tools picked up by our editorial team.
Read Time: 3 Minutes
🔥Trending AI Research
1️⃣ How Can LLMs Fix Their Own Mistakes? A Survey of Self-Correction Strategies Using Automated Feedback
Large language models (LLMs) excel at numerous NLP tasks but sometimes exhibit unwanted behaviors such as hallucination, unfaithful reasoning, and toxicity. To address these issues, self-correction methods, where the LLM fixes its own mistakes, are gaining traction. In particular, techniques using automated feedback, from either the LLM itself or an external system, look promising for improving the reliability of LLMs with less human intervention. This paper offers an in-depth review of such techniques, categorizing them into training-time, generation-time, and post-hoc correction strategies. The study further highlights key applications of these techniques and concludes by addressing potential future directions and challenges in this domain.
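To make the post-hoc correction idea concrete, here is a minimal, hypothetical sketch of a generate-critique-refine loop. The `llm` callable, the prompts, and the stopping rule are illustrative placeholders, not the implementation of any paper in the survey.

```python
# Minimal sketch of post-hoc self-correction with automated feedback.
# `llm` is a placeholder for any text-completion callable (e.g., an API client);
# the prompts and stopping rule are illustrative, not from the surveyed papers.

def self_correct(llm, task_prompt, max_rounds=3):
    draft = llm(task_prompt)
    for _ in range(max_rounds):
        # The model (or an external critic) reviews the current answer.
        critique = llm(
            f"Task: {task_prompt}\nAnswer: {draft}\n"
            "List any factual errors or unsupported claims. "
            "Reply 'NO ISSUES' if the answer is correct."
        )
        if "NO ISSUES" in critique.upper():
            break
        # The feedback is fed back so the model can revise its answer.
        draft = llm(
            f"Task: {task_prompt}\nPrevious answer: {draft}\n"
            f"Feedback: {critique}\nRewrite the answer, fixing the issues."
        )
    return draft
```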
2️⃣ ChatGPT with Eyes and Ears: BuboGPT is an AI Approach That Enables Visual Grounding in Multi-Modal LLMs [Blog] [Paper]
Large language models (LLMs) have made significant strides in human interaction using language, with newer models like MiniGPT-4, LLaVA, and X-LLM integrating multi-modal inputs such as images, videos, and speech. However, these models cannot pinpoint specific parts of their inputs, which limits them to a broad, ungrounded understanding. Addressing this limitation, this paper introduces BuboGPT, a multi-modal LLM with visual grounding. The model interacts cross-modally with vision, audio, and language, builds a detailed understanding of visual elements, and can point to the exact location of objects within images in its responses. The research offers two main innovations: 1) a ready-made visual grounding module built on SAM that links entities in text to regions in images, and 2) a novel training method with an instruction dataset for synchronized text, image, and audio comprehension. Testing indicates that BuboGPT excels at multi-modal comprehension and visual grounding during human interactions.
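A rough, hypothetical sketch of what an entity-to-region grounding step could look like: segment the image with SAM, label each candidate region, and match the labels against entities mentioned in the model's response. The `extract_entities` and `label_region` helpers are placeholders; BuboGPT's actual pipeline is described in the paper.

```python
# Hypothetical sketch of visual grounding: link entities in generated text
# to image regions. SAM usage follows the segment-anything package;
# `extract_entities` and `label_region` are illustrative placeholders.
from segment_anything import SamAutomaticMaskGenerator, sam_model_registry

def ground_entities(image, response_text, sam_checkpoint="sam_vit_h.pth"):
    sam = sam_model_registry["vit_h"](checkpoint=sam_checkpoint)
    masks = SamAutomaticMaskGenerator(sam).generate(image)  # candidate regions

    groundings = {}
    for entity in extract_entities(response_text):           # e.g., "dog", "frisbee"
        for mask in masks:
            # Compare the entity with a label predicted for this region
            # (e.g., by an image-tagging model); keep the regions that match.
            if entity.lower() == label_region(image, mask["segmentation"]).lower():
                groundings.setdefault(entity, []).append(mask["bbox"])
    return groundings  # entity -> list of bounding boxes in the image
```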
3️⃣ Master Key to Audio Source Separation: Introducing AudioSep to Separate Anything You Describe [Paper] [Blog]
Language-queried audio source separation (LASS) separates target sounds from audio mixtures based on natural-language queries, offering a flexible interface for digital audio tasks. However, current LASS methods struggle with broad, open-ended audio concepts. This paper presents AudioSep, a model designed for open-domain audio source separation driven by natural language. Trained on large-scale multimodal datasets, AudioSep is extensively evaluated on tasks such as audio event separation, musical instrument separation, and speech enhancement. The model shows strong separation performance and excels at zero-shot generalization when audio captions or text labels are used as queries, significantly outperforming earlier models.
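To illustrate the idea of language-queried separation, here is a minimal, hypothetical sketch: encode the text query into an embedding and use it to condition a separation network applied to the mixture. `TextEncoder` and `SeparationNet` are illustrative stand-ins, not AudioSep's actual API.

```python
# Hypothetical usage sketch for language-queried audio source separation.
# `text_encoder` and `separation_net` are illustrative stand-ins, not AudioSep's API.
import torch

class LASSSeparator(torch.nn.Module):
    def __init__(self, text_encoder, separation_net):
        super().__init__()
        self.text_encoder = text_encoder      # maps a caption to a query embedding
        self.separation_net = separation_net  # query-conditioned waveform-to-waveform model

    @torch.no_grad()
    def separate(self, mixture, query):
        # mixture: (batch, samples) waveform; query: natural-language description
        cond = self.text_encoder(query)            # e.g., a CLAP-style text embedding
        return self.separation_net(mixture, cond)  # estimated target source waveform
```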
4️⃣ Researchers at Boston University Release the Platypus Family of Fine-Tuned LLMs: To Achieve Cheap, Fast and Powerful Refinement of Base LLMs [Paper] [Blog]
The paper introduces "Platypus," a set of highly-performing Large Language Models (LLMs) which lead HuggingFace's Open LLM Leaderboard at the time of its release. The study details: (1) "Open-Platypus," a curated dataset drawn from other open sources and made publicly available; (2) a methodology for fine-tuning and integrating LoRA modules, maintaining the efficiency of pre-trained LLMs and emphasizing specific domain knowledge; and (3) a protocol to identify and mitigate test data leaks and training data contamination. Remarkably, Platypus models outperform competitors on the LLM leaderboard while using significantly less fine-tuning data and computational power. For instance, a 13B Platypus model can be trained using just 25k questions in 5 hours on a single A100 GPU. This underscores the effectiveness of the Open-Platypus dataset and suggests potential for further advancements in the domain.