AI Insights: PyTorch 2.5 Released and Katanemo Open Sources Arch-Function

Newsletter Series by Marktechpost.com

Hi There,

Dive into the hottest AI breakthroughs of the week—handpicked just for you!

Super Important AI News 🔥 🔥 🔥

🎃 Nvidia AI Quietly Launches Nemotron 70B: Crushing OpenAI’s GPT-4 on Various Benchmarks

🤯 🎥✨ Live Webinar: Increase Throughput by 4x and Cut Costs by 50% with the Predibase Inference Engine [Oct 29, 2024] (Sponsored)

SeedLM: A Post-Training Compression Method that Uses Pseudo-Random Generators to Efficiently Encode and Compress LLM Weights

📢 Neural Magic Unveils Machete: A New Mixed-Input GEMM Kernel for NVIDIA Hopper GPUs

🚨 Katanemo Open Sources Arch-Function: A Set of Large Language Models (LLMs) Promising Ultra-Fast Speeds at Function-Calling Tasks for Agentic Workflows

Mistral AI Introduces Les Ministraux: Ministral 3B and Ministral 8B, Revolutionizing On-Device AI

Featured AI Research 🛡️🛡️🛡️

SeedLM: A Post-Training Compression Method that Uses Pseudo-Random Generators to Efficiently Encode and Compress LLM Weights

Summary

SeedLM is a novel post-training compression method for Large Language Models (LLMs) that uses the seeds of pseudo-random generators to encode and compress model weights. The method trades increased compute for reduced memory accesses, speeding up memory-bound tasks such as autoregressive generation. SeedLM outperforms state-of-the-art compression techniques, achieving accuracy nearly identical to the uncompressed model at 4-bit compression, a significant feat for LLMs, and it does so without calibration data. The authors demonstrate SeedLM on Llama 2 and Llama 3 models, which are particularly challenging to compress, showing significantly better zero-shot accuracy retention at 4-bit and 3-bit than existing methods. Furthermore, FPGA-based tests show that SeedLM approaches a 4x speed-up over an FP16 baseline as the model size scales to 70B…
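To make the mechanism above concrete, here is a minimal, hypothetical sketch of the seed-plus-coefficients idea: each block of weights is approximated by a linear combination of pseudo-random basis vectors that can be regenerated from a stored seed, so only that seed and a few quantized coefficients need to be kept in memory. The block size, basis rank, seed-search budget, coefficient bit-width, and all names below are illustrative assumptions, not the paper's configuration, and NumPy's generator stands in for whatever hardware-friendly pseudo-random generator SeedLM actually uses.

```python
import numpy as np

# Toy sketch of seed-based weight compression (illustrative assumptions only).
BLOCK = 8          # weights per block
K = 3              # pseudo-random basis vectors per block
NUM_SEEDS = 256    # candidate seeds searched per block
COEF_BITS = 4      # bit-width used to quantize the coefficients


def quantize(x, bits):
    """Simulate uniform symmetric quantization/dequantization to `bits` bits."""
    scale = np.max(np.abs(x)) + 1e-12
    levels = 2 ** (bits - 1) - 1
    q = np.round(x / scale * levels)
    return q * scale / levels          # a real codec would store integer codes + scale


def compress_block(w):
    """Return the (seed, quantized coefficients) pair minimizing reconstruction error."""
    best = None
    for seed in range(NUM_SEEDS):
        # Regenerable pseudo-random basis: only the seed needs to be stored.
        U = np.random.default_rng(seed).standard_normal((BLOCK, K))
        coef, *_ = np.linalg.lstsq(U, w, rcond=None)   # project the block onto the basis
        coef_q = quantize(coef, COEF_BITS)
        err = np.linalg.norm(U @ coef_q - w)
        if best is None or err < best[0]:
            best = (err, seed, coef_q)
    return best[1], best[2]


def decompress_block(seed, coef_q):
    """Rebuild the block by regenerating the basis from the stored seed."""
    U = np.random.default_rng(seed).standard_normal((BLOCK, K))
    return U @ coef_q


if __name__ == "__main__":
    w = np.random.default_rng(0).standard_normal(BLOCK)
    seed, coef_q = compress_block(w)
    w_hat = decompress_block(seed, coef_q)
    print("seed:", seed, "relative error:", np.linalg.norm(w - w_hat) / np.linalg.norm(w))
```

At inference time the basis is regenerated on the fly from the stored seed, which is exactly the compute-for-memory-bandwidth trade the summary describes: extra arithmetic replaces fetching full-precision weights from memory, which is why memory-bound autoregressive generation benefits.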

Other AI News 🎖️🎖️🎖️

🎙️ Google DeepMind Introduces DeepMind Control Vision Benchmark (DMC-VB): A Dataset and Benchmark to Evaluate the Robustness of Offline Reinforcement Learning Agents to Visual Distractors

♦️ From ONNX to Static Embeddings: What Makes Sentence Transformers v3.2.0 a Game-Changer?

🧩 Revolutionizing Fine-Tuned Small Language Model Deployments: Introducing Predibase’s Next-Gen Inference Engine

🥁 📚 Today, ChatGPT Plus, Enterprise, Team, and Edu users can start testing an early version of the ChatGPT desktop app for Windows