AI News: 🚀 Stability AI launches StableLM | NVIDIA just released a very impressive text-to-video paper | Meta AI Open-Sources DINOv2 | Is ChatGPT Good at Search?.....
This newsletter brings you AI research news that is more technical than most resources yet still digestible and applicable.
Stability AI launches StableLM: The creators of Stable Diffusion, Stability AI, just released a suite of open-source large language models (LLMs) called StableLM. This comes just 5 days after the public release of their text-to-image generative AI model, SDXL. The Alpha version is available in 3-billion- and 7-billion-parameter sizes, with 15-billion- to 65-billion-parameter models to follow. Developers can freely inspect, use, and adapt the StableLM base models for commercial or research purposes, subject to the terms of the CC BY-SA 4.0 license.
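Since the weights are openly licensed, the checkpoints can in principle be loaded with standard tooling. Below is a minimal, hedged sketch using Hugging Face transformers; the model id is an assumption based on Stability AI's public repositories.

```python
# Minimal sketch of loading a StableLM alpha checkpoint with Hugging Face
# transformers. The repo id below is an assumption; check Stability AI's
# Hugging Face organization for the actual checkpoint names.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "stabilityai/stablelm-base-alpha-7b"  # assumed Hugging Face repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)  # add torch_dtype/device_map on GPU
model.eval()

prompt = "Open-source language models are"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=48, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```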
Meta AI Open-Sources DINOv2: A New AI Method for Training High-Performance Computer Vision Models Based on Self-Supervised Learning. DINOv2 is a novel approach to building high-performance computer vision models using self-supervised learning. It learns high-quality visual features without supervision that can be used for visual tasks at both the image level and the pixel level, covering image classification, instance retrieval, video understanding, depth estimation, and many more.
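As a rough illustration of how these features can be consumed downstream, here is a hedged sketch that pulls a DINOv2 backbone via torch.hub and extracts one global embedding per image; the hub repo and entry-point names follow Meta AI's published release and should be treated as assumptions.

```python
# Hedged sketch: extracting DINOv2 image features via torch.hub.
import torch
import torchvision.transforms as T
from PIL import Image

model = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")  # assumed entry point
model.eval()

preprocess = T.Compose([
    T.Resize(256),
    T.CenterCrop(224),  # 224 is divisible by the 14-pixel patch size
    T.ToTensor(),
    T.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])

img = preprocess(Image.open("example.jpg").convert("RGB")).unsqueeze(0)
with torch.no_grad():
    features = model(img)  # one global embedding per image
print(features.shape)  # e.g. torch.Size([1, 384]) for the ViT-S/14 variant
```

The resulting embeddings can then feed simple linear probes or nearest-neighbor retrieval without any fine-tuning of the backbone.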
Meet MiniGPT-4: An Open-Source AI Model That Performs Complex Vision-Language Tasks Like GPT-4. Developed by a team of Ph.D. students from King Abdullah University of Science and Technology, Saudi Arabia, MiniGPT-4 exhibits capabilities similar to those demonstrated by GPT-4, such as detailed image description generation and website creation from hand-written drafts. MiniGPT-4 uses an advanced LLM called Vicuna as the language decoder, which is built upon LLaMA and is reported to achieve 90% of ChatGPT's quality as evaluated by GPT-4. MiniGPT-4 uses the pretrained vision component of BLIP-2 (Bootstrapping Language-Image Pre-training) and adds a single projection layer to align the encoded visual features with the Vicuna language model, keeping all other vision and language components frozen.
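To make the architecture concrete, here is an illustrative PyTorch sketch (not the authors' code) of that single trainable projection layer; the feature dimensions are placeholders.

```python
# Illustrative sketch of MiniGPT-4's core idea: one trainable linear layer
# maps frozen BLIP-2 visual features into the embedding space of a frozen
# Vicuna language model. Dimensions below are placeholders, not the paper's.
import torch
import torch.nn as nn

class VisionToLLMProjector(nn.Module):
    def __init__(self, vision_dim: int = 768, llm_dim: int = 4096):
        super().__init__()
        # The only trainable component; all other modules stay frozen.
        self.proj = nn.Linear(vision_dim, llm_dim)

    def forward(self, visual_tokens: torch.Tensor) -> torch.Tensor:
        # visual_tokens: (batch, num_tokens, vision_dim) from the frozen
        # vision side; output lives in the LLM's token-embedding space.
        return self.proj(visual_tokens)

projector = VisionToLLMProjector()
fake_visual_tokens = torch.randn(1, 32, 768)  # stand-in for BLIP-2 output
llm_ready = projector(fake_visual_tokens)     # shape (1, 32, 4096)
print(llm_ready.shape)
```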
NVIDIA just released a very impressive text-to-video paper: A recent research paper by NVIDIA reveals their approach to producing high-quality short videos from textual cues. The method employs Video Latent Diffusion Models (Video LDMs), which deliver strong output quality while requiring relatively modest computing resources. The technology can generate 4.7-second video clips consisting of 113 frames at a resolution of 1280×2048 and a frame rate of 24 FPS (113 frames ÷ 24 FPS ≈ 4.7 seconds). Given the rapid progress of this technology, we may soon see the capability to create full-length films from just a few text prompts.
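For intuition only, the sketch below shows the general video-LDM idea of inserting a temporal layer that attends across the frame axis while the per-frame layers reuse a pretrained image backbone; the shapes, sizes, and exact layer design are our assumptions, not NVIDIA's implementation.

```python
# Hedged sketch of a temporal attention block in the spirit of video LDMs:
# frames are processed independently by pretrained image layers, and an
# inserted layer attends across the frame axis to enforce temporal coherence.
import torch
import torch.nn as nn

class TemporalAttention(nn.Module):
    def __init__(self, channels: int = 320, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, x: torch.Tensor, num_frames: int) -> torch.Tensor:
        # x: (batch * frames, channels, H, W), as produced by per-frame layers.
        bt, c, h, w = x.shape
        b = bt // num_frames
        # Fold space into the batch and expose the frame axis as the sequence.
        seq = x.view(b, num_frames, c, h * w).permute(0, 3, 1, 2)
        seq = seq.reshape(b * h * w, num_frames, c)
        out, _ = self.attn(self.norm(seq), self.norm(seq), self.norm(seq))
        seq = seq + out  # residual keeps the pretrained image pathway intact
        return seq.reshape(b, h * w, num_frames, c).permute(0, 2, 3, 1).reshape(bt, c, h, w)

layer = TemporalAttention()
frames = torch.randn(2 * 16, 320, 8, 8)   # two clips of 16 latent frames
print(layer(frames, num_frames=16).shape)  # torch.Size([32, 320, 8, 8])
```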
LLM as A Robotic Brain: Embodied AI aims to develop intelligent systems with a physical or virtual form, such as robots, that can interact with their surroundings dynamically. Memory and control are essential for such systems and usually require separate frameworks. This paper presents LLM-Brain, a novel and generalizable framework that uses large language models as a robotic brain to unify memory and control. The LLM-Brain framework employs multiple multimodal language models for robotic tasks with a zero-shot learning approach. All components within LLM-Brain communicate in natural language through closed-loop multi-round dialogues that encompass perception, planning, control, and memory. The system's core is an embodied LLM that maintains egocentric memory and controls the robot. The researchers demonstrate LLM-Brain on two downstream tasks: active exploration, in which the robot explores an unknown environment within a limited number of actions, and embodied question answering, in which it answers questions based on what it observed while exploring.
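The sketch below illustrates this closed-loop pattern with hypothetical stand-ins (perceive, execute, ask_llm) for the real perception, control, and LLM components; it is a schematic, not the paper's code.

```python
# Schematic sketch of a closed-loop "LLM as robot brain" pattern: perception
# is verbalized, an LLM picks the next action in natural language, and the
# accumulated dialogue serves as egocentric memory. All functions are
# hypothetical placeholders.
def perceive() -> str:
    # Hypothetical: a captioning/detection model would describe the scene.
    return "You see a hallway ahead and a closed door on the left."

def execute(action: str) -> None:
    # Hypothetical: map the chosen action onto actual robot controls.
    print(f"executing: {action}")

def ask_llm(dialogue: list[str]) -> str:
    # Hypothetical zero-shot LLM call; any chat-style API could sit here.
    return "move_forward"

memory: list[str] = []  # the multi-round dialogue doubles as memory
for step in range(5):   # e.g. active exploration under an action budget
    observation = perceive()
    memory.append(f"Observation {step}: {observation}")
    action = ask_llm(memory)
    memory.append(f"Action {step}: {action}")
    execute(action)
```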
Meet Sabiá: Portuguese Large Language Models: This paper contributes to the expanding body of scientific evidence that specializing models for individual languages leads to improvements, even when the baseline model is large and extensively trained. The researchers achieved this for Portuguese using a near-state-of-the-art model with 65 billion parameters. Given the relatively low pretraining cost and the significant performance gains observed, they foresee a future landscape with a diverse array of models, each tailored to a specific domain, rather than a single all-encompassing model.
Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agents: Large language models (LLMs) have demonstrated a remarkable ability to generalize zero-shot to various language-related tasks. This paper explores generative LLMs such as ChatGPT and GPT-4 for relevance ranking in information retrieval (IR). Surprisingly, the experiments reveal that properly instructed ChatGPT and GPT-4 can deliver competitive, and even superior, results compared with supervised methods on popular IR benchmarks. Notably, GPT-4 outperforms monoT5-3B (fully fine-tuned on MS MARCO) by an average of 2.7 nDCG on the TREC datasets, 2.3 nDCG on eight BEIR datasets, and 2.7 nDCG on the ten low-resource languages of Mr. TyDi.
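The sketch below shows one plausible way to phrase such a re-ranking instruction with the OpenAI chat API (pre-1.0 Python client); the prompt wording and the permutation parsing are our assumptions, not the exact method from the paper.

```python
# Hedged sketch of instruction-based re-ranking: the LLM is asked to order
# candidate passages by relevance to a query and return a permutation.
# Requires OPENAI_API_KEY in the environment; uses the legacy (<1.0) client.
import openai

def rerank(query: str, passages: list[str]) -> list[int]:
    listing = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages))
    prompt = (
        "Rank the passages below by relevance to the query.\n"
        f"Query: {query}\n{listing}\n"
        "Answer with the passage numbers, most relevant first, e.g. 2 > 0 > 1."
    )
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    text = resp["choices"][0]["message"]["content"]
    # Naive parse of the permutation; fall back to original order on failure.
    order = [int(tok) for tok in text.replace(">", " ").split() if tok.isdigit()]
    return order or list(range(len(passages)))

print(rerank("effects of caffeine", ["Tea history.", "Caffeine and sleep.", "Coffee prices."]))
```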
AI Tools Club: Find hundreds of cool artificial intelligence (AI) tools. Our expert team reviews and provides insights into some of the most cutting-edge AI tools available.
Did you know Marktechpost has a community of 1.5 million+ AI professionals and engineers? For partnership and advertisement inquiries, please feel free to contact us through this form.