AI Dev and Research News
AI Research/Dev Super Interesting News: StructuredRAG Released by Weaviate, iAsk Ai Outperforms ChatGPT, and many more...

August 28, 2024

Hi There…

It was another busy week with plenty of news and updates about artificial intelligence (AI) research and dev. We have curated the top industry research updates specially for you. I hope you enjoy these updates, and make sure to share your opinions with us on social media.

iAsk Ai Outperforms ChatGPT and All Other AI Models on MMLU Pro Test

iAsk Ai has quickly become a leader in AI search. iAsk.ai's answer engine is powered by iAsk Pro, their latest model, which has outperformed top competitors like OpenAI’s GPT-4o, Anthropic’s Claude 3.5 Sonnet, and Google’s Gemini Pro on the MMLU Pro benchmark test. iAsk Pro scored a record-breaking 85.85%, a full 12 percentage points higher than GPT-4o. Additionally, iAsk Pro achieved a superhuman performance of 93.89% on the traditional MMLU benchmark, surpassing the accuracy of the top 10% of human experts.

In less than two years, iAsk Ai has processed 325 million searches and now handles 1.5 million searches daily. This remarkable growth highlights iAsk Pro's ability to consistently deliver accurate and reliable information, solidifying its position as the most reliable AI model available today…

➡️ Continue reading here!

StructuredRAG Released by Weaviate: A Comprehensive Benchmark to Evaluate Large Language Models’ Ability to Generate Reliable JSON Outputs for Complex AI Systems

The research team from Weaviate introduced a novel benchmark called StructuredRAG, which consists of six different tasks designed to assess the ability of LLMs to generate structured outputs like JSON. The benchmark evaluated two models: Gemini 1.5 Pro and Llama 3 8B-instruct. The researchers employed two distinct prompting strategies, f-String and Follow the Format (FF), to measure the models’ proficiency in following response format instructions and to identify which approach yields better structured output.

The researchers ran 24 experiments, each designed to test the models’ ability to follow the specified JSON format instructions. The experiments covered a range of output complexities, from simple string values to more intricate composite objects that include multiple data types. Success was measured by whether a model’s output could be parsed into the requested JSON format. The study also applied OPRO prompt optimization, a technique for improving JSON response formatting by refining the prompt rather than relying on structured decoding methods…
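The parse-based success measure described above can be sketched roughly as follows. This is a minimal illustration and not the benchmark's actual code: the function names, the example schema, and the sample outputs are all hypothetical.

```python
import json


def follows_format(raw_output: str, expected_types: dict) -> bool:
    """Return True if raw_output is a JSON object with exactly the expected
    keys and each value has the expected Python type."""
    try:
        parsed = json.loads(raw_output)
    except json.JSONDecodeError:
        # Any extra prose around the JSON makes the whole response unparseable.
        return False
    if not isinstance(parsed, dict) or set(parsed) != set(expected_types):
        return False
    return all(isinstance(parsed[key], typ) for key, typ in expected_types.items())


def success_rate(outputs: list, expected_types: dict) -> float:
    """Fraction of responses that follow the requested format."""
    if not outputs:
        return 0.0
    return sum(follows_format(o, expected_types) for o in outputs) / len(outputs)


# Hypothetical model responses for a task that asks for a string and an integer.
schema = {"answer": str, "confidence": int}
outputs = [
    '{"answer": "Paris", "confidence": 5}',                          # valid
    'Sure! Here is the JSON: {"answer": "Paris", "confidence": 5}',  # extra prose
    '{"answer": "Paris"}',                                           # missing key
]
rate = success_rate(outputs, schema)
print(f"format-following rate: {rate:.2f}")
```

Only the first response passes here, which is the kind of strictness a downstream system needs if it feeds the model's output straight into a parser.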

➡️ Continue reading here!

Salesforce AI Research Introduced LlamaRank: A State-of-the-Art Reranker for Enhanced Document Retrieval and Code Search, Outperforming Cohere Rerank v3 and Mistral-7B QLM in Accuracy

The Salesforce AI Research team built LlamaRank as a specialized tool for document relevancy ranking. Trained with iterative on-policy feedback from the team's dedicated RLHF data annotators, LlamaRank outperforms many leading APIs in general document ranking and sets a new state of the art on code search. The training data includes high-quality synthesized data from Llama3-70B and Llama3-405B, along with human-labeled annotations, covering domains from topic-based search and document QA to code QA.

A reranker such as LlamaRank sits at the core of a RAG system. First, the query is processed in a cheap but less precise way (for example, semantic search with embeddings) to return a list of candidate documents that could be useful. The reranker then scores this set more carefully to determine which documents are most relevant to the query. This final selection ensures that the language model is given only the most relevant information as context, which contributes to more accurate and coherent responses…
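The two-stage flow described above can be sketched in a few lines. This is only an illustration under stated assumptions: `cheap_retrieve` uses plain word overlap as a stand-in for embedding search, and `toy_relevance` (Jaccard similarity) is a stand-in for a real reranker model. Neither reflects how LlamaRank itself scores documents, and the sample documents are made up.

```python
def words(text: str) -> set:
    """Lowercased set of whitespace-separated tokens."""
    return set(text.lower().split())


def cheap_retrieve(query: str, documents: list, k: int) -> list:
    """Stage 1: fast, imprecise candidate selection by raw word overlap
    (a stand-in for embedding-based semantic search)."""
    q = words(query)
    ranked = sorted(documents, key=lambda d: len(q & words(d)), reverse=True)
    return ranked[:k]


def toy_relevance(query: str, doc: str) -> float:
    """Hypothetical reranker score: Jaccard similarity, which penalizes long
    documents that only mention the query words in passing."""
    q, d = words(query), words(doc)
    return len(q & d) / len(q | d)


def rerank(query: str, candidates: list, score_fn, top_n: int) -> list:
    """Stage 2: re-score only the candidates with the more careful scorer."""
    return sorted(candidates, key=lambda d: score_fn(query, d), reverse=True)[:top_n]


docs = [
    "python list sort method explained",
    "in a zoo the python is on a list of animals visitors sort by size and color every day",
    "cooking recipes for pasta",
    "sort algorithms in java",
]
query = "sort a list in python"

candidates = cheap_retrieve(query, docs, k=3)
context = rerank(query, candidates, toy_relevance, top_n=2)

# The selected documents are passed to the language model as prompt context.
prompt = "Context:\n" + "\n".join(context) + f"\n\nQuestion: {query}"
print(prompt)
```

In this toy run the off-topic zoo document wins the first stage on raw overlap but is dropped by the second stage, which is the behavior a reranker is meant to provide.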

➡️ Continue reading here!

CogVideoX Released in Two Variants – CogVideoX-2B and CogVideoX-5B: A Revolutionary Advancement in Text-to-Video Generation with Enhanced Temporal Consistency and Superior Dynamic Scene Handling

Zhipu AI and Tsinghua University researchers have introduced CogVideoX, a novel approach that leverages cutting-edge techniques to enhance text-to-video generation. CogVideoX employs a 3D causal VAE, compressing video data along spatial and temporal dimensions, significantly reducing the computational load while maintaining video quality. The model also integrates an expert transformer with adaptive LayerNorm, which improves the alignment between text and video, facilitating a more seamless integration of these two modalities. This advanced architecture enables the generation of high-quality, semantically accurate videos that can extend over longer durations than previously possible.

CogVideoX incorporates several innovative techniques that set it apart from earlier models. The 3D causal VAE allows for a 4×8×8 compression from pixels to latents, a substantial reduction that preserves the continuity and quality of the video. The expert transformer uses a 3D full attention mechanism, comprehensively modeling video data to ensure that large-scale motions are accurately represented. The model includes a sophisticated video captioning pipeline, which generates new textual descriptions for video data, enhancing the semantic alignment of the videos with the input text. This pipeline includes video filtering to remove low-quality clips and a dense video captioning method that improves the model’s understanding of video content…
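To give a feel for what a 4×8×8 compression means, here is a small back-of-the-envelope sketch. The clip size is a hypothetical example, the helper name is made up, and the calculation assumes plain division along each axis. It ignores channel counts and any special handling of the first frame that a causal VAE may apply.

```python
def latent_grid(frames: int, height: int, width: int,
                t_factor: int = 4, s_factor: int = 8) -> tuple:
    """Latent grid size for a clip under t_factor x s_factor x s_factor
    compression, assuming each dimension divides evenly (a simplification)."""
    if frames % t_factor or height % s_factor or width % s_factor:
        raise ValueError("dimensions must be divisible by the compression factors")
    return frames // t_factor, height // s_factor, width // s_factor


# Hypothetical input clip: 48 frames at 480x720 pixels.
frames, height, width = 48, 480, 720
lt, lh, lw = latent_grid(frames, height, width)

pixel_positions = frames * height * width
latent_positions = lt * lh * lw
reduction = pixel_positions // latent_positions

print(f"latent grid: {lt} x {lh} x {lw}")
print(f"positions reduced by a factor of {reduction}")  # 4 * 8 * 8 = 256
```

The transformer then attends over the latent grid rather than the raw pixels, which is where the reduction in computational load comes from.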

➡️ Continue reading here!

Trending Feeds…

➡️ Hugging Face Speech-to-Speech Library: A Modular and Efficient Solution for Real-Time Voice Processing [Tweet]

➡️ Jina AI Introduced ‘Late Chunking’: A Simple AI Approach to Embed Short Chunks by Leveraging the Power of Long-Context Embedding Models [Tweet]

➡️ SarcasmBench: A Comprehensive Evaluation Framework Revealing the Challenges and Performance Gaps of Large Language Models in Understanding Subtle Sarcastic Expressions [Tweet]

➡️ MaVEn: An Effective Multi-granularity Hybrid Visual Encoding Framework for Multimodal Large Language Models (MLLMs) [Tweet]

➡️ The latest Gemini (Pro/Flash/Flash-9b) results are now live, with over 20K community votes! [Tweet]
