Hi There,
Dive into the hottest AI breakthroughs of the week—handpicked just for you!
Meet HOVER, A Unified Neural Controller Aimed at Enhancing Humanoid Robot Capabilities
Researchers from NVIDIA, Carnegie Mellon University, UC Berkeley, UT Austin, and UC San Diego introduced HOVER, a unified neural controller aimed at enhancing humanoid robot capabilities. The work proposes a multi-mode policy distillation framework that integrates different control strategies into one cohesive policy, a notable advance in humanoid robotics. HOVER marks a shift toward a “generalist policy”: a single neural network that harmonizes diverse control modes and enables seamless transitions between them. On a 19-DOF humanoid robot, it supports more than 15 useful configurations for real-world applications, and this versatile command space encompasses most of the modes used in previous research.
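Below is a minimal, illustrative sketch of the core idea: one policy network consuming a unified command vector plus a mode mask, so the same weights serve every control mode. The dimensions, layer sizes, and mask layout are assumptions for illustration, not HOVER's actual architecture.

```python
# Toy "generalist" policy in the spirit of HOVER's unified command space.
# The mode mask zeroes out command targets the active mode does not track,
# so one network serves many control modes. All sizes are illustrative.
import torch
import torch.nn as nn

PROPRIO_DIM = 69   # assumed proprioceptive observation size (illustrative)
COMMAND_DIM = 57   # assumed unified command vector across all modes (illustrative)

class UnifiedPolicy(nn.Module):
    def __init__(self, num_joints: int = 19):
        super().__init__()
        # A single MLP consumes state + masked command + the mask itself.
        self.net = nn.Sequential(
            nn.Linear(PROPRIO_DIM + 2 * COMMAND_DIM, 512), nn.ELU(),
            nn.Linear(512, 256), nn.ELU(),
            nn.Linear(256, num_joints),  # per-joint targets for a 19-DOF robot
        )

    def forward(self, proprio, command, mode_mask):
        masked_cmd = command * mode_mask  # drop targets outside the active mode
        return self.net(torch.cat([proprio, masked_cmd, mode_mask], dim=-1))

policy = UnifiedPolicy()
proprio, command = torch.randn(1, PROPRIO_DIM), torch.randn(1, COMMAND_DIM)
mask = torch.zeros(1, COMMAND_DIM)
mask[:, :19] = 1.0                       # e.g. a joint-tracking mode
action = policy(proprio, command, mask)  # shape: (1, 19)
```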
Salesforce AI Released APIGen-MT and xLAM-2-fc-r Model Series: Advancing Multi-Turn Agent Training with Verified Data Pipelines and Scalable LLM Architectures
A research team from Salesforce AI Research introduced APIGen-MT, a novel two-phase data generation pipeline designed to create high-quality, multi-turn interaction data between agents and simulated human users. The approach focuses on realism, structure, and verification by first constructing validated task blueprints and then simulating detailed agent-human conversations in executable environments. Unlike earlier approaches, this method employs a layered validation mechanism, using both automated checkers and committees of large language models to assess task coherence, accuracy, and feasibility. Using this synthetic data, the researchers train the xLAM-2-fc-r family of models, ranging from 1 billion to 70 billion parameters, which achieve significantly stronger results on major multi-turn agent evaluation benchmarks.
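As a rough sketch of the two-phase idea, the snippet below filters candidate task blueprints with an automated checker and a committee score, then keeps only simulated conversations whose actions match the blueprint. The callables, data fields, and threshold are hypothetical placeholders, not Salesforce's actual pipeline.

```python
# Hypothetical two-phase pipeline sketch: validated blueprints first, then
# simulated, verified agent-human conversations. All callables are placeholders
# you would back with real executors and LLM calls.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Blueprint:
    instruction: str
    groundtruth_actions: list[str]
    expected_outputs: list[str]

def build_blueprints(candidates: list[Blueprint],
                     format_checker: Callable[[Blueprint], bool],
                     committee_score: Callable[[Blueprint], float],
                     threshold: float = 0.7) -> list[Blueprint]:
    """Phase 1: keep blueprints that pass automated checks and an
    LLM-committee review (reduced here to a single score)."""
    return [b for b in candidates
            if format_checker(b) and committee_score(b) >= threshold]

def simulate_trajectories(blueprints, simulate_dialogue, actions_match):
    """Phase 2: roll out agent-human conversations in an executable
    environment; keep trajectories whose actions match the blueprint."""
    data = []
    for bp in blueprints:
        turns = simulate_dialogue(bp)  # e.g. [{"role": "user", "content": ...}, ...]
        if actions_match(turns, bp):
            data.append({"blueprint": bp, "turns": turns})
    return data
```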
Scalable and Principled Reward Modeling for LLMs: Enhancing Generalist Reward Models (RMs) with SPCT and Inference-Time Optimization
DeepSeek-AI and Tsinghua University researchers explore enhancing reward models (RMs) for general queries by improving inference-time scalability through increased compute and better learning techniques. They employ pointwise generative reward modeling (GRM) for flexible input handling and propose a learning method, Self-Principled Critique Tuning (SPCT), which helps GRMs generate adaptive principles and accurate critiques during online reinforcement learning. They apply parallel sampling and introduce a meta RM to scale effectively and refine the voting process. Their DeepSeek-GRM models outperform existing methods on benchmarks, offering higher reward quality and scalability, with plans for open-sourcing despite remaining challenges on some complex tasks.
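The sketch below illustrates the inference-time scaling recipe in broad strokes: sample several critique generations from a pointwise GRM, optionally filter them with a meta RM, and vote over the resulting scores. `grm_sample`, `meta_rm_score`, and the voting scheme are stand-ins for illustration, not DeepSeek's implementation.

```python
# Illustrative inference-time scaling for a pointwise GRM: sample critiques,
# optionally filter with a meta RM, then vote over per-response scores.
# `grm_sample` and `meta_rm_score` are placeholder model calls.
import random
from collections import defaultdict

def scale_rewards(query, responses, grm_sample, meta_rm_score=None,
                  k_samples=8, keep_top=4):
    samples = [grm_sample(query, responses) for _ in range(k_samples)]
    # Each sample: {"critique": str, "scores": {response_index: pointwise score}}
    if meta_rm_score is not None:
        # The meta RM judges critique quality; keep only the best generations.
        samples.sort(key=lambda s: meta_rm_score(query, s["critique"]), reverse=True)
        samples = samples[:keep_top]
    votes = defaultdict(float)
    for s in samples:
        for resp_id, score in s["scores"].items():
            votes[resp_id] += score  # simple sum-voting over sampled scores
    return dict(votes)

# Toy usage with a random stand-in for the GRM:
fake_grm = lambda q, rs: {"critique": "toy critique",
                          "scores": {i: random.randint(1, 10) for i in range(len(rs))}}
print(scale_rewards("Which answer is better?", ["A", "B"], fake_grm))
```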
Huawei Noah’s Ark Lab Released Dream 7B: A Powerful Open Diffusion Reasoning Model with Advanced Planning and Flexible Inference Capabilities
Researchers from the University of Hong Kong and Huawei Noah’s Ark Lab released Dream 7B (Diffusion reasoning model), the most powerful open diffusion large language model to date. The model matches or exceeds similarly-sized AR models on general tasks, mathematics, and coding benchmarks. Dream 7B shows exceptional zero-shot planning capabilities and inference flexibility, outperforming larger models like DeepSeek V3 (671B) on structured tasks. Trained on 580B tokens from diverse datasets, including Dolma and OpenCoder, the model employs mask-based diffusion with autoregressive weight initialization from Qwen2.5 7B. Its architecture enables powerful bidirectional context processing, arbitrary-order generation, infilling capabilities, and adjustable quality-speed tradeoffs during inference.
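For intuition, here is a toy mask-based diffusion decoding loop: start from a fully masked sequence and, over a fixed number of steps, commit the model's most confident predictions while keeping the rest masked. The stand-in model, vocabulary size, sequence length, and unmasking schedule are illustrative assumptions, not Dream 7B's actual decoder.

```python
# Toy mask-based diffusion decoding: start fully masked, then over several
# steps commit the model's most confident predictions. The stand-in model,
# vocabulary, sequence length, and unmasking schedule are illustrative only.
import torch

MASK_ID, VOCAB, SEQ_LEN, STEPS = 0, 32000, 16, 8

def diffusion_decode(model, steps=STEPS, seq_len=SEQ_LEN):
    tokens = torch.full((1, seq_len), MASK_ID)       # everything starts masked
    for step in range(steps):
        still_masked = tokens.eq(MASK_ID)
        if not still_masked.any():
            break
        logits = model(tokens)                       # (1, seq_len, VOCAB)
        conf, pred = logits.softmax(-1).max(-1)      # per-position confidence
        conf = conf.masked_fill(~still_masked, -1.0) # ignore already-filled slots
        # Unmask a growing fraction of positions, most confident first.
        n_unmask = max(1, int(still_masked.sum() * (step + 1) / steps))
        top = conf.topk(min(n_unmask, int(still_masked.sum())), dim=-1).indices
        tokens[0, top[0]] = pred[0, top[0]]
    return tokens

# Toy usage with a random "model" in place of the trained network:
toy_model = lambda t: torch.randn(t.shape[0], t.shape[1], VOCAB)
print(diffusion_decode(toy_model))
```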
MMSearch-R1: End-to-End Reinforcement Learning for Active Image Search in LMMs
This research introduces MMSearch-R1, a pioneering approach that equips LMMs with active image search capabilities through an end-to-end reinforcement learning framework. The method focuses specifically on enhancing visual question answering (VQA) performance by enabling models to engage autonomously with image search tools. MMSearch-R1 trains models to decide when to initiate image searches and how to effectively process the retrieved visual information. The system excels at extracting, synthesizing, and utilizing relevant visual data to support sophisticated reasoning.
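A minimal sketch of that decision loop, assuming placeholder `policy`, `search_tool`, and `judge` callables, is shown below; the search budget and penalty are illustrative choices, not the paper's exact reward design.

```python
# Hypothetical rollout loop for search-on-demand VQA: the policy either answers
# or requests an image search, retrieved results are appended to the context,
# and a small penalty per search discourages unnecessary tool calls.
def rollout(question, image, policy, search_tool, judge,
            max_searches=2, search_penalty=0.1):
    context, searches = [("question", question), ("image", image)], 0
    while True:
        step = policy(context)  # e.g. {"action": "search" | "answer", "content": ...}
        if step["action"] == "search" and searches < max_searches:
            retrieved = search_tool(step["content"])  # e.g. top-k web images
            context.append(("retrieved", retrieved))
            searches += 1
            continue
        answer = step["content"]
        break
    # Outcome reward minus a tool-use cost, nudging the policy to search only
    # when its own knowledge is insufficient.
    reward = judge(question, answer) - search_penalty * searches
    return answer, reward
```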
This AI Paper Introduces Inference-Time Scaling Techniques: Microsoft’s Deep Evaluation of Reasoning Models on Complex Tasks
Researchers at Microsoft introduced a rigorous evaluation framework for inference-time scaling that covers nine models and eight complex task benchmarks, comparing conventional models against reasoning-optimized ones such as DeepSeek R1, O1, and O3-mini. Their method involved parallel scaling, where multiple outputs are generated and aggregated, and sequential scaling, where the model is prompted to iteratively revise its output based on structured feedback. Benchmarks were sourced from domains like calendar planning, math Olympiads, and spatial reasoning, and the team introduced two new datasets for NP-hard problems: 3SAT and TSP.
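The two scaling modes can be sketched in a few lines; `generate`, `aggregate`, and `critic` below are placeholders for model and verifier calls, and the loop structure is an assumption for illustration rather than Microsoft's evaluation harness.

```python
# Illustrative versions of the two scaling modes: parallel (sample-and-aggregate)
# and sequential (revise-with-feedback). `generate`, `aggregate`, and `critic`
# are placeholders for model and verifier calls.
def parallel_scale(prompt, generate, aggregate, n=8):
    candidates = [generate(prompt) for _ in range(n)]
    return aggregate(candidates)  # e.g. majority vote or best-of-n selection

def sequential_scale(prompt, generate, critic, rounds=3):
    answer = generate(prompt)
    for _ in range(rounds):
        feedback = critic(prompt, answer)  # structured feedback, e.g. {"correct": bool, "hint": str}
        if feedback.get("correct"):
            break
        answer = generate(f"{prompt}\nFeedback: {feedback['hint']}\nRevise your answer.")
    return answer
```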
University of Michigan Researchers Introduce OceanSim: A High-Performance GPU-Accelerated Underwater Simulator for Advanced Marine Robotics
Researchers from the University of Michigan have proposed OceanSim, a high-performance underwater simulator accelerated by NVIDIA parallel computing technology. Built upon NVIDIA Isaac Sim, OceanSim leverages high-fidelity physics-based rendering and GPU-accelerated real-time ray tracing to create realistic underwater environments. It bridges underwater simulation with the rapidly expanding NVIDIA Omniverse ecosystem, enabling the use of many existing sim-ready assets and robot learning approaches within underwater robotics research. Moreover, OceanSim allows the user to operate the robot, visualize sensor data, and record data simultaneously during GPU-accelerated simulated data generation.
This AI Paper from ByteDance Introduces MegaScale-Infer: A Disaggregated Expert Parallelism System for Efficient and Scalable MoE-Based LLM Serving
ByteDance and Peking University researchers have introduced MegaScale-Infer, a system that rethinks the architecture of MoE serving. Instead of serving the model as a monolithic block, the researchers disaggregate the attention and FFN modules, deploying them on separate GPUs. This separation enables customized scaling and parallelism strategies tailored to the specific needs of each module. Attention modules, which are memory-intensive, are replicated to aggregate requests, while FFN modules are scaled using expert parallelism. The system also supports heterogeneous GPU deployment, assigning cost-effective memory-heavy GPUs to attention tasks and compute-optimized GPUs to FFNs. This disaggregation dramatically improves resource usage and flexibility in deployment.
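To make the disaggregation concrete, the toy sketch below puts attention replicas and FFN experts in separate worker pools and routes micro-batched hidden states between them. The shapes, random router, and single-process "pools" are simplifications for illustration, not MegaScale-Infer's production system.

```python
# Toy disaggregation: attention replicas and FFN "experts" live in separate
# worker pools, and hidden states are routed between them. The random router,
# in-process pools, and sizes are simplifications for illustration.
import torch

HIDDEN, EXPERTS = 1024, 4

class AttentionWorker:
    def __init__(self):  # memory-bound: would hold the KV cache
        self.attn = torch.nn.MultiheadAttention(HIDDEN, num_heads=8, batch_first=True)
    def forward(self, x):
        out, _ = self.attn(x, x, x)
        return out

class ExpertWorker:
    def __init__(self):  # compute-bound: dense GEMMs
        self.ffn = torch.nn.Sequential(
            torch.nn.Linear(HIDDEN, 4 * HIDDEN), torch.nn.GELU(),
            torch.nn.Linear(4 * HIDDEN, HIDDEN))
    def forward(self, x):
        return self.ffn(x)

attention_pool = [AttentionWorker() for _ in range(2)]  # replicated attention
expert_pool = [ExpertWorker() for _ in range(EXPERTS)]  # expert parallelism

def moe_layer(tokens):
    # Split the batch across attention replicas, then dispatch each token's
    # hidden state to one expert (toy top-1 routing) and gather results back.
    chunks = tokens.chunk(len(attention_pool), dim=0)
    hidden = torch.cat([w.forward(c) for w, c in zip(attention_pool, chunks)], dim=0)
    flat = hidden.reshape(-1, HIDDEN)
    route = torch.randint(0, EXPERTS, (flat.shape[0],))  # stand-in router
    out = torch.empty_like(flat)
    for e, worker in enumerate(expert_pool):
        idx = (route == e).nonzero(as_tuple=True)[0]
        if idx.numel():
            out[idx] = worker.forward(flat[idx])
    return out.reshape(hidden.shape)

with torch.no_grad():
    print(moe_layer(torch.randn(4, 16, HIDDEN)).shape)  # torch.Size([4, 16, 1024])
```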