
Marktechpost AI Newsletter: BigCodeBench by BigCode + Together AI Introduces Mixture of Agents (MoA) + many more....

Featured Research…

Meet BigCodeBench by BigCode: The New Gold Standard for Evaluating Large Language Models on Real-World Coding Tasks

BigCodeBench contains 1,140 function-level tasks that challenge LLMs to follow user-oriented instructions and compose multiple function calls from 139 diverse libraries. Each task is meticulously designed to mimic real-world scenarios, requiring complex reasoning and problem-solving skills. The tasks are further validated through an average of 5.6 test cases per task, achieving a branch coverage of 99% to ensure thorough evaluation.

BigCodeBench is divided into two main components: BigCodeBench-Complete and BigCodeBench-Instruct. BigCodeBench-Complete focuses on code completion, where LLMs must finish implementing a function based on detailed docstring instructions. This tests the models' ability to generate functional and correct code snippets from partial information.

BigCodeBench-Instruct, on the other hand, is designed to evaluate instruction-tuned LLMs that follow natural-language instructions. This component presents a more conversational approach to task descriptions, reflecting how real users might interact with these models in practical applications…
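To make the setup concrete, here is a minimal sketch of what a BigCodeBench-Complete-style item looks like: a function signature with a docstring instruction that the model must complete, verified by unit tests. The task and tests below are illustrative assumptions, not actual items from the benchmark.

```python
# Hypothetical BigCodeBench-Complete-style task: the model sees the
# signature and docstring and must fill in the body, which is then
# checked against the benchmark's test cases.

def task_func(records):
    """Group a list of (category, amount) pairs by category and
    return a dict mapping each category to its total amount."""
    totals = {}
    for category, amount in records:
        totals[category] = totals.get(category, 0) + amount
    return totals

# Benchmark-style test cases verifying the completion:
assert task_func([("a", 1), ("b", 2), ("a", 3)]) == {"a": 4, "b": 2}
assert task_func([]) == {}
```

In the real benchmark, each such task averages 5.6 test cases and the suite reaches 99% branch coverage of the reference solutions.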

Editor's Picks…

Fireworks AI Releases Firefunction-v2: An Open-Weights Function-Calling Model on Par with GPT-4o at 2.5x the Speed and 10% of the Cost

Fireworks AI releases Firefunction-v2, an open-weights function-calling model designed to excel in real-world applications. It supports multi-turn conversations, instruction following, and parallel function calling, offering a robust and efficient solution that rivals high-end models like GPT-4o at a fraction of the cost and with superior speed.

LLMs' capabilities have improved substantially in recent years, particularly with releases like Llama 3. These advancements have underscored the importance of function calling, allowing models to interact with external APIs and enhancing their utility beyond static data handling. Firefunction-v2 builds on these advancements, offering a model for real-world scenarios involving multi-turn conversations, instruction following, and parallel function calling.

Firefunction-v2 retains Llama 3's multi-turn instruction capability while significantly outperforming it in function-calling tasks. It scores 0.81 on a medley of public benchmarks compared to GPT-4o's 0.80, all while being far more cost-effective and faster. Specifically, Firefunction-v2 costs $0.9 per million output tokens, compared to GPT-4o's $15, and operates at 180 tokens per second versus GPT-4o's 69 tokens per second.
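Function calling works by declaring tools in the request so the model can emit structured calls (possibly several in parallel) instead of plain text. Below is a sketch of an OpenAI-style chat request body of the kind Fireworks' API accepts; the model identifier and the `get_weather` tool are illustrative assumptions, not documented values.

```python
import json

# Sketch of an OpenAI-style function-calling request. The model id
# and tool definition are hypothetical, for illustration only.
request = {
    "model": "accounts/fireworks/models/firefunction-v2",  # assumed id
    "messages": [
        {"role": "user",
         "content": "What's the weather in Paris and in Tokyo?"}
    ],
    # Declaring tools lets the model respond with structured tool
    # calls (one per city here, hence "parallel function calling").
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical tool
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
}

print(json.dumps(request, indent=2))
```

A client would send this payload to the chat-completions endpoint and then execute whichever tool calls come back in the response.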

ADVERTISEMENT

Meet Gretel Navigator: the first compound AI system built to create, edit, and augment tabular data. 🚀🚀🚀

Get inspired by popular Navigator use cases:

  • Empower frontier AI teams with high-quality datasets to train LLMs.

  • Safeguard sensitive, proprietary datasets when evaluating public ML models.

  • Teach LLMs new tasks or domains for new generative AI-powered applications.

  • Augment real-world data to build more performant intelligent applications.

  • Generate synthetic question-truth pairs to evaluate RAG models.

    [Sign Up for Free] Try Gretel Navigator, the first compound AI system built to create, edit, and augment tabular data.

Together AI Introduces Mixture of Agents (MoA): An AI Framework that Leverages the Collective Strengths of Multiple LLMs to Improve State-of-the-Art Quality

In a significant leap forward for AI, Together AI has introduced an innovative Mixture of Agents (MoA) approach, Together MoA. This new model harnesses the collective strengths of multiple large language models (LLMs) to enhance state-of-the-art quality and performance, setting new benchmarks in AI.

MoA employs a layered architecture, with each layer comprising several LLM agents. These agents utilize outputs from the previous layer as auxiliary information to generate refined responses. This method allows MoA to integrate diverse capabilities and insights from various models, resulting in a more robust and versatile combined model. The implementation has proven successful, achieving a remarkable score of 65.1% on the AlpacaEval 2.0 benchmark, surpassing the previous leader, GPT-4o, which scored 57.5%.

A critical insight driving the development of MoA is the concept of "collaborativeness" among LLMs. This phenomenon suggests that an LLM tends to generate better responses when presented with outputs from other models, even if those models are less capable. By leveraging this insight, MoA's architecture categorizes models into "proposers" and "aggregators." Proposers generate initial reference responses, offering nuanced and diverse perspectives, while aggregators synthesize these responses into high-quality outputs. This iterative process continues through several layers until a comprehensive and refined response is achieved.
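The layered proposer/aggregator flow described above can be sketched as a simple loop, with toy stand-ins where real implementations would call LLM APIs. Everything below is an illustrative assumption about the control flow, not Together AI's actual code.

```python
# Minimal sketch of a Mixture-of-Agents pipeline: each layer's
# proposers receive the previous layer's outputs as auxiliary
# references, and a final aggregator synthesizes the last layer.

def make_proposer(name):
    # Stand-in for an LLM proposer; a real one would call a model API
    # with the prompt plus the reference responses in its context.
    def propose(prompt, references):
        context = " | ".join(references)
        return f"{name} answer to '{prompt}' given [{context}]"
    return propose

def aggregate(prompt, references):
    # Stand-in aggregator; a real one would prompt an LLM to merge
    # the reference responses into a single high-quality answer.
    return f"synthesis of {len(references)} responses to '{prompt}'"

def mixture_of_agents(prompt, layers):
    references = []
    for proposers in layers:  # each layer refines the previous one
        references = [p(prompt, references) for p in proposers]
    return aggregate(prompt, references)

layers = [
    [make_proposer("model-A"), make_proposer("model-B")],
    [make_proposer("model-C"), make_proposer("model-D")],
]
print(mixture_of_agents("Explain MoA", layers))
```

Because each layer conditions on all of the previous layer's outputs, even weaker proposers can contribute useful signal, which is the "collaborativeness" effect the article describes.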

Anthropic AI Releases Claude 3.5: A New AI Model that Surpasses GPT-4o on Multiple Benchmarks While Being 2x Faster than Claude 3 Opus

Anthropic AI has launched Claude 3.5 Sonnet, marking the first release in its new Claude 3.5 model family. This latest iteration of Claude brings significant advancements in AI capabilities, setting a new benchmark in the industry for intelligence and performance.

Claude 3.5 Sonnet is available for free on Claude.ai and the Claude iOS app. The model is accessible via the Anthropic API, Amazon Bedrock, and Google Cloud's Vertex AI. Enhanced rate limits are provided for Claude Pro and Team plan subscribers. The pricing structure is set at $3 per million input tokens and $15 per million output tokens, with a 200K token context window, making it cost-effective and highly efficient.
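For a feel of what those per-million-token prices mean in practice, here is a small cost calculation using the figures quoted above (the example request sizes are arbitrary):

```python
# Cost of one request at Claude 3.5 Sonnet's quoted prices:
# $3 per million input tokens, $15 per million output tokens.

def request_cost(input_tokens, output_tokens,
                 in_price=3.0, out_price=15.0):
    """Dollar cost of one request at per-million-token prices."""
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# e.g. filling half the 200K context window and getting a 2K reply:
cost = request_cost(100_000, 2_000)
print(f"${cost:.2f}")  # → $0.33
```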

Trending AI Social Media Posts