AI Research/Dev Super Interesting News: LLM Compressor, Marqo-FashionCLIP and Marqo-FashionSigLIP, Hermes 3 and many more...

Hello, You!

It was another busy week with plenty of news and updates about artificial intelligence (AI) research and development. We have curated the top industry research updates especially for you. I hope you enjoy these updates, and make sure to share your opinions with us on social media.

Neural Magic has released LLM Compressor, a state-of-the-art tool for large language model optimization that enables faster inference through advanced model compression. The tool is an important building block in Neural Magic's pursuit of making high-performance open-source solutions available to the deep learning community, especially within the vLLM framework.

LLM Compressor addresses the difficulties that arose from a previously fragmented landscape of model compression tools, in which users had to juggle multiple bespoke libraries such as AutoGPTQ, AutoAWQ, and AutoFP8 to apply particular quantization and compression algorithms. LLM Compressor folds these fragmented tools into a single library for easily applying state-of-the-art compression algorithms like GPTQ, SmoothQuant, and SparseGPT. These algorithms produce compressed models that offer reduced inference latency while maintaining high accuracy, which is critical for running models in production environments...
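
To make the workflow concrete, here is a minimal sketch of one-shot quantization following the style of Neural Magic's published examples; exact argument names and defaults may differ across llmcompressor versions, so treat it as illustrative rather than canonical:

```python
# A minimal sketch of one-shot quantization with llmcompressor.
# Recipe style follows Neural Magic's release examples; details may
# vary between library versions.
from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.transformers import oneshot

# Quantize all Linear layers to 8-bit weights and activations with GPTQ,
# leaving the output head in full precision.
recipe = GPTQModifier(targets="Linear", scheme="W8A8", ignore=["lm_head"])

oneshot(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",
    dataset="open_platypus",            # calibration data
    recipe=recipe,
    output_dir="Meta-Llama-3.1-8B-Instruct-W8A8",
    max_seq_length=2048,
    num_calibration_samples=512,
)
# The saved compressed model can then be served directly with vLLM.
```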

A research team from the University of California, Irvine and Cisco Research has proposed CrystalBall, a new approach to automated attack graph generation using retrieval-augmented LLMs built on GPT-4. The approach automates the chaining of CVEs according to their preconditions and postconditions, making attack graph generation dynamic and scalable. It is designed to process large volumes of structured and unstructured data, which suits modern cybersecurity environments. The team focused in particular on integrating LLMs with a retriever model that improves the accuracy and relevance of the generated attack graphs.

The underlying technology behind CrystalBall is sophisticated and effective. It applies retrieval-augmented generation (RAG) to retrieve, from a large dataset, the CVEs most relevant to the system information supplied by the user. The CVE data is stored in a relational database that supports semantic search, enabling the system to chain vulnerabilities with a high degree of accuracy. The retriever is applied as a black box in front of the LLM-based system, which then generates the attack graphs. This design ensures that the generated graphs are comprehensive and relevant to the security context in which they are applied.
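
CrystalBall itself is not public, so as a rough illustration of the retrieval step only, here is a generic RAG sketch with hypothetical CVE records: embed CVE descriptions, retrieve those most relevant to a user-supplied system profile, and hand them to an LLM for chaining.

```python
# Illustrative only: this is the generic RAG pattern the paper describes,
# not CrystalBall's code. CVE records below are hypothetical stand-ins
# for a real vulnerability database.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")

cves = [
    ("CVE-2021-44228", "Remote code execution via JNDI lookups in Log4j 2."),
    ("CVE-2017-0144", "SMBv1 remote code execution (EternalBlue)."),
    ("CVE-2014-6271", "Bash environment-variable command injection (Shellshock)."),
]
cve_embeddings = encoder.encode([desc for _, desc in cves], convert_to_tensor=True)

# User-supplied system information drives the semantic search.
system_profile = "Ubuntu host running an Apache web service with Log4j 2.14"
query_embedding = encoder.encode(system_profile, convert_to_tensor=True)

# Retrieve the top-k most relevant CVEs by cosine similarity.
hits = util.semantic_search(query_embedding, cve_embeddings, top_k=2)[0]
retrieved = [cves[hit["corpus_id"]] for hit in hits]

# The retrieved CVEs (with their pre/postconditions) would then be passed
# to the LLM, which chains them into an attack graph.
prompt = "Chain these CVEs into an attack graph:\n" + "\n".join(
    f"{cid}: {desc}" for cid, desc in retrieved
)
print(prompt)
```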

Marqo has released two new state-of-the-art multimodal models for fashion-domain search and recommendation: Marqo-FashionCLIP and Marqo-FashionSigLIP. Both models generate embeddings for text and images that can be used in downstream search and recommendation systems. They were trained on more than one million fashion items with rich metadata, including materials, colors, styles, keywords, and descriptions.
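
The released checkpoints load through open_clip; the hub id Marqo/marqo-fashionCLIP below follows the Hugging Face model card, though minor API details may vary by version. A short sketch of generating and comparing text and image embeddings:

```python
# A sketch of joint text/image embeddings with the released checkpoint,
# following the open_clip loading pattern from the model card.
import torch
import open_clip
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms(
    "hf-hub:Marqo/marqo-fashionCLIP"
)
tokenizer = open_clip.get_tokenizer("hf-hub:Marqo/marqo-fashionCLIP")

image = preprocess(Image.open("dress.jpg")).unsqueeze(0)  # placeholder image path
texts = tokenizer(["a red floral summer dress", "black leather boots"])

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(texts)
    # Normalize so dot products are cosine similarities for retrieval.
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)

similarity = image_features @ text_features.T
print(similarity)  # higher score = better text match for the image
```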

The team fine-tuned two pre-existing base models (ViT-B-16-laion and ViT-B-16-SigLIP-webli) using Generalized Contrastive Learning (GCL). The seven-part loss is optimized over keywords, categories, details, colors, materials, and extended descriptions. This multi-part loss proved far superior to the conventional text-image InfoNCE loss for contrastive fine-tuning, producing models that yield better search results on short descriptive text and keyword-like queries...
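
Marqo's GCL implementation is not reproduced here, but the core idea of a multi-part contrastive loss is easy to sketch: a weighted sum of per-field InfoNCE terms, one for each metadata field. The field names and weights below are purely illustrative.

```python
# Toy illustration (not Marqo's GCL code): a multi-part contrastive loss
# as a weighted sum of per-field InfoNCE terms.
import torch
import torch.nn.functional as F

def info_nce(img_emb, txt_emb, temperature=0.07):
    """Standard symmetric InfoNCE over a batch of paired embeddings."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.T / temperature
    targets = torch.arange(len(logits))
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.T, targets)) / 2

def multi_part_loss(img_emb, field_embs, weights):
    """field_embs: dict mapping field name -> text embeddings for that field."""
    return sum(weights[name] * info_nce(img_emb, emb)
               for name, emb in field_embs.items())

# Example with random embeddings and hypothetical per-field weights.
batch, dim = 8, 512
img = torch.randn(batch, dim)
fields = {f: torch.randn(batch, dim) for f in ["keywords", "colors", "materials"]}
loss = multi_part_loss(img, fields,
                       {"keywords": 0.5, "colors": 0.25, "materials": 0.25})
print(loss)
```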

Nous Research addresses the challenge of making LLMs more user-friendly, controllable, and effective at generating high-quality responses. While “base” or “foundation” models are trained on a wide range of text data, they often struggle to maintain coherence and context over multiple turns. This lack of steerability and consistency limits their practical utility, particularly for users who need models to respond reliably to specific prompts.

Current methods for improving LLMs include instruct-tuning and chat-tuning, where models are fine-tuned to respond to specific commands or to engage in conversations. However, these methods often have limitations, such as an inability to follow nuanced instructions or to remain neutral in their responses. To address these limitations, Nous Research introduced Hermes 3, an advanced open-source language model built on Llama 3.1. Hermes 3 models are designed to be highly steerable, allowing them to follow system and instruction prompts precisely while incorporating advanced reasoning and creative capabilities. The largest model, Hermes 3 405B, is particularly noted for achieving state-of-the-art performance on several public benchmarks...
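
Steerability in practice means the system prompt is honored faithfully. Here is a minimal sketch of prompting the model through transformers, assuming the NousResearch/Hermes-3-Llama-3.1-8B checkpoint name from Nous Research's Hugging Face organization:

```python
# A sketch of steering Hermes 3 with a system prompt via transformers.
# The checkpoint name is assumed from Nous Research's Hugging Face org.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "NousResearch/Hermes-3-Llama-3.1-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [
    # The system prompt is what Hermes 3 is trained to follow precisely.
    {"role": "system", "content": "You are a terse assistant. Answer in one sentence."},
    {"role": "user", "content": "Why does quantization speed up inference?"},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```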

The Llama-3.1-Minitron 4B model is a distilled and pruned version of its larger Llama-3.1 8B sibling. To create the smaller model from the original 8B model, Nvidia used structured pruning in both the depth and width directions. Pruning is a technique that deletes less important layers or neurons from a network, reducing model size and complexity while retaining performance. Here, Nvidia performed depth pruning by removing 16 layers from the model, downsizing it from 8B to 4B, and applied width pruning by trimming the embedding (hidden) dimensions and the MLP intermediate dimensions.
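
Nvidia's actual recipe uses importance scoring to decide what to remove; the sketch below only illustrates the mechanics of depth pruning on a Llama-style model in transformers, with a tiny randomly initialized model and an arbitrary contiguous block standing in for the 16 pruned layers:

```python
# Illustrative depth pruning only, not Nvidia's exact recipe.
import torch.nn as nn
from transformers import LlamaConfig, LlamaForCausalLM

# A tiny randomly initialized Llama-style model stands in for Llama-3.1 8B
# so the example runs without downloading weights.
config = LlamaConfig(
    num_hidden_layers=32, hidden_size=256,
    intermediate_size=512, num_attention_heads=8,
    num_key_value_heads=8, vocab_size=1000,
)
model = LlamaForCausalLM(config)

# Depth pruning: drop a block of 16 decoder layers. In practice importance
# scoring chooses which layers; a middle block is arbitrary here.
keep = [layer for i, layer in enumerate(model.model.layers) if not (8 <= i < 24)]
model.model.layers = nn.ModuleList(keep)
model.config.num_hidden_layers = len(keep)

print(f"decoder layers after pruning: {model.config.num_hidden_layers}")  # 16
# The pruned model is then retrained (distilled) to recover accuracy.
```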

Besides pruning, Nvidia applied classical knowledge distillation to enhance the efficiency of Llama-3.1-Minitron 4B. Knowledge distillation is a process in which a smaller model, the student, is trained to mimic the behavior of a larger and more complex one, the teacher. In this way, much of the predictive power of the original model is preserved in the smaller model, which is faster and more frugal with resources. By combining distillation with pruning, Nvidia ensured that the retrained 4B model performs strongly while recovering much of the capability of its larger parent...
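
A minimal sketch of the distillation objective itself: the student is trained to match the teacher's temperature-softened output distribution with a KL-divergence loss. The temperature and tensor shapes below are illustrative.

```python
# A minimal sketch of logit distillation: KL divergence between the
# teacher's and student's temperature-softened output distributions.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions."""
    t = temperature
    soft_teacher = F.softmax(teacher_logits / t, dim=-1)
    log_soft_student = F.log_softmax(student_logits / t, dim=-1)
    # Scale by t**2 to keep gradient magnitudes comparable across temperatures.
    return F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * t**2

# Example with random logits standing in for real forward passes.
student = torch.randn(4, 32000)   # (batch, vocab)
teacher = torch.randn(4, 32000)
print(distillation_loss(student, teacher))
```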