AI Research/Dev Super Interesting News ➡️ AMD-135M, SFR-Judge and many more....

Coming This November: AI Magazine on ‘Small Language Models’ [PARTNER WITH US]

Newsletter Series by Marktechpost.com

Hi There…

It was another busy week with plenty of news and updates about artificial intelligence (AI) research and development. We have curated the top industry research updates especially for you. I hope you enjoy them, and be sure to share your opinions with us on social media.

AMD has recently introduced its new language model, AMD-135M or AMD-Llama-135M, which is a significant addition to the landscape of AI models. Based on the LLaMA2 model architecture, this language model boasts a robust structure with 135 million parameters and is optimized for performance on AMD’s latest GPUs, specifically the MI250. This release marks a crucial milestone for AMD in its endeavor to establish a strong foothold in the competitive AI industry.

AMD-135M integrates features to support a range of applications, particularly text generation and language comprehension, and works seamlessly with the Hugging Face Transformers library, making it accessible to developers and researchers. The model handles complex tasks efficiently with a hidden size of 768, 12 layers (blocks), and 12 attention heads. The activation function is SwiGLU, layer normalization is based on RMSNorm, and positional information is encoded with the RoPE method, enhancing its ability to understand and generate contextual information accurately…
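The two building blocks named above, SwiGLU activations and RMSNorm, can be sketched in plain Python. This is an illustrative toy implementation (operating on Python lists, with caller-supplied weight columns), not AMD's actual code:

```python
import math

def swish(x):
    # Swish / SiLU: x * sigmoid(x)
    return x * (1.0 / (1.0 + math.exp(-x)))

def swiglu(x, w_gate, w_up):
    # SwiGLU gating: Swish(x @ W_gate) elementwise-times (x @ W_up).
    # w_gate and w_up are lists of weight columns.
    gate = [swish(sum(xi * wij for xi, wij in zip(x, col))) for col in w_gate]
    up = [sum(xi * wij for xi, wij in zip(x, col)) for col in w_up]
    return [g * u for g, u in zip(gate, up)]

def rmsnorm(x, eps=1e-6):
    # RMSNorm: divide by the root-mean-square of the vector
    # (no mean subtraction, unlike standard LayerNorm).
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms for v in x]
```

In a real transformer block these run over tensors with learned weights; the point here is only the shape of the computation.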

PARTNER WITH US 👇👇

Partner with us for our upcoming magazine on ‘Small Language Models’

Salesforce AI Research introduces SFR-Judge, a family of three LLM-based judge models, to revolutionize how LLM outputs are evaluated. Built using Meta Llama 3 and Mistral NeMo, SFR-Judge comes in three sizes: 8 billion (8B), 12 billion (12B), and 70 billion (70B) parameters. Each model is designed to perform multiple evaluation tasks, such as pairwise comparisons, single ratings, and binary classification. These models were developed to support research teams in rapidly and effectively evaluating new LLMs…
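To make the pairwise-comparison task concrete, here is a hypothetical prompt builder for an LLM judge. The template below is an illustrative assumption, not the actual prompt format shipped with the SFR-Judge release:

```python
def pairwise_judge_prompt(instruction, response_a, response_b):
    # Hypothetical pairwise-comparison prompt for an LLM judge.
    # The released SFR-Judge models define their own templates,
    # which may differ from this sketch.
    return (
        "You are an impartial judge. Given an instruction and two "
        "candidate responses, decide which response is better.\n\n"
        f"Instruction:\n{instruction}\n\n"
        f"Response A:\n{response_a}\n\n"
        f"Response B:\n{response_b}\n\n"
        "Answer with 'A' or 'B', followed by a brief justification."
    )
```

The judge model's completion for such a prompt would then be parsed for the 'A'/'B' verdict; single ratings and binary classification follow the same pattern with different templates.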

Researchers from Microsoft have introduced an innovative solution to these challenges with GRIN (GRadient-INformed Mixture of Experts). This approach aims to address the limitations of existing sparse models by introducing a new method of gradient estimation for expert routing. GRIN enhances model parallelism, allowing for more efficient training without the need for token dropping, a common issue in sparse computation. By applying GRIN to autoregressive language models, the researchers have developed a top-2 mixture-of-experts model with 16 experts per layer, referred to as the GRIN MoE model. This model selectively activates experts based on input, significantly reducing the number of active parameters while maintaining high performance…
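The top-2 routing described above can be sketched as follows. This toy illustration shows only the selective activation (two of 16 experts run per token); it does not implement GRIN's actual contribution, which is the gradient estimation through this discrete routing step:

```python
import math

def top2_route(logits):
    # Select the two highest-scoring experts and renormalize
    # their gate weights with a softmax over just those two.
    top2 = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:2]
    exps = [math.exp(logits[i]) for i in top2]
    total = sum(exps)
    return {i: e / total for i, e in zip(top2, exps)}

def moe_layer(x, experts, logits):
    # Only the two routed experts are evaluated; the other
    # 14 stay inactive, so most parameters are never touched.
    gates = top2_route(logits)
    out = [0.0] * len(x)
    for idx, gate in gates.items():
        y = experts[idx](x)
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out
```

In a trained MoE the routing logits come from a learned gating network per token; here they are passed in directly to keep the sketch self-contained.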

Nvidia unveiled its latest large language model (LLM) offering, the Llama-3.1-Nemotron-51B. Based on Meta’s Llama-3.1-70B, this model has been fine-tuned using advanced Neural Architecture Search (NAS) techniques, resulting in a breakthrough in both performance and efficiency. Designed to fit on a single Nvidia H100 GPU, the model significantly reduces memory consumption, computational complexity, and costs associated with running such large models. It marks an important milestone in Nvidia’s ongoing efforts to optimize large-scale AI models for real-world applications…

Trending Feeds…

➡️ Can we 'segment anything' without supervision?🤔 Yes! Check out UnSAM @NeurIPS24—an unsupervised segmenter that achieves SAM-level results! 🎉[Tweet]

➡️ Ovis-1.6: An Open-Source Multimodal Large Language Model (MLLM) Architecture Designed to Structurally Align Visual and Textual Embeddings [Tweet]

➡️ Training RL/robot policies requires extensive experience in the target environment, which is often difficult to obtain. How can we “distill” embodied policies from foundational models? [Tweet]

➡️ "LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance." [Tweet]

➡️ I’m looking for a PhD student intern to work with me in Meta FAIR’s New York City office. The internship can start in spring or summer 2025 and last for 12-24 weeks (longer preferred)… [Tweet]

Want to get in front of 1 million+ data scientists, developers, AI engineers, and CTOs?

Sponsor our newsletter or social post