
Marktechpost AI Newsletter: GLM-4-9B-Chat-1M + Whisper WebGPU + Qwen2-72B + Nomic AI Releases Nomic Embed Vision and many more...

Featured Research…

Meet Tsinghua University’s GLM-4-9B-Chat-1M: An Outstanding Language Model Challenging GPT-4V, Gemini Pro (on vision), Mistral and Llama 3 8B

Tsinghua University’s Knowledge Engineering Group (KEG) has unveiled GLM-4 9B, a powerful new language model that outperforms GPT-4 and Gemini on a range of benchmarks. Developed by Tsinghua’s THUDM team, this open-source model marks a significant milestone in the field of natural language processing.

At its core, GLM-4 9B is a large language model trained on an unprecedented 10 trillion tokens spanning 26 languages. It supports a wide range of capabilities, including multi-turn dialogue in Chinese and English, code execution, web browsing, and custom tool calling through Function Call. The model’s architecture builds on the latest advances in deep learning, incorporating techniques such as attention mechanisms within a transformer architecture. The base version supports a context window of up to 128,000 tokens, while a specialized variant extends that to an impressive 1 million tokens.

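For readers who want to try the model, here is a minimal sketch using the Hugging Face transformers library. The THUDM/glm-4-9b-chat repo id and the trust_remote_code flag reflect how THUDM typically publishes its checkpoints; they are assumptions for illustration, not details from this announcement.

```python
# Minimal sketch: multi-turn chat with GLM-4 9B via Hugging Face transformers.
# Assumes the checkpoint is published as "THUDM/glm-4-9b-chat" and ships
# custom modeling code (hence trust_remote_code=True).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "THUDM/glm-4-9b-chat"  # assumed Hugging Face repo id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # half precision to fit on a single large GPU
    device_map="auto",
    trust_remote_code=True,
).eval()

# Multi-turn dialogue is expressed through the tokenizer's chat template.
messages = [{"role": "user", "content": "Summarize the GLM-4 9B release in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(output[0][inputs.shape[1]:], skip_special_tokens=True))
```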

Editor’s Picks…

Whisper WebGPU: Real-Time In-Browser 🎙️ Speech Recognition with OpenAI Whisper

Achieving real-time speech recognition directly within a web browser has long been a sought-after milestone. Whisper WebGPU, built by a Hugging Face engineer known as ‘Xenova’, leverages OpenAI’s Whisper model to bring real-time, in-browser speech recognition to fruition. It marks a significant shift in how users interact with AI-driven web applications.

The core of Whisper WebGPU lies in the Whisper-base model, a 73-million-parameter speech recognition model meticulously optimized for web inference. With a model size of approximately 200 MB, Whisper-base is designed to be lightweight yet powerful, making it ideal for real-time applications. Once the model is downloaded, it is cached for future use, ensuring that subsequent interactions are swift and seamless.

The true innovation of Whisper WebGPU is its ability to run entirely within the user’s browser. Built on Hugging Face Transformers.js and ONNX Runtime Web, it performs all computation locally, eliminating the need to send audio to a server. This enhances privacy and enables functionality even when the device is offline: users can disconnect from the internet after the initial model load and still benefit from Whisper’s robust speech recognition capabilities.
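
Whisper WebGPU itself runs as JavaScript via Transformers.js, but the same Whisper-base checkpoint can be exercised server-side with the Python transformers pipeline, which mirrors the in-browser flow. A minimal sketch, assuming the standard openai/whisper-base Hugging Face id and a hypothetical local audio file:

```python
# Minimal server-side analogue of what Whisper WebGPU does in the browser:
# load the lightweight Whisper-base checkpoint once, then transcribe locally.
from transformers import pipeline

# "openai/whisper-base" is the standard Hugging Face id for Whisper-base.
transcriber = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-base",
)

# "recording.wav" is a hypothetical local audio file; after the first call the
# model weights are cached, so later runs need no network access at all.
result = transcriber("recording.wav")
print(result["text"])
```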

Meet Qwen2-72B: An Advanced AI Model With 72B Parameters, 128K Token Support, Multilingual Mastery, and SOTA Performance

Qwen2-72B is part of the Qwen2 series, a family of large language models (LLMs) spanning a range of parameter sizes. As the name suggests, Qwen2-72B boasts an impressive 72 billion parameters, making it one of the most powerful models in the series. The Qwen2 series aims to improve upon its predecessor, Qwen1.5, with more robust capabilities in language understanding, generation, and multilingual tasks.

Qwen2-72B is built on the Transformer architecture and features advanced components such as SwiGLU activation, attention QKV bias, and grouped query attention (GQA). These enhancements enable the model to handle complex language tasks more efficiently. The improved tokenizer adapts to multiple natural and programming languages, broadening the model’s applicability across domains.
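
A minimal sketch of generating with the instruction-tuned variant through transformers follows. The Qwen/Qwen2-72B-Instruct repo id follows the Qwen team’s naming convention; note that serving a 72B-parameter model realistically requires multiple GPUs, which device_map="auto" shards across automatically.

```python
# Minimal sketch: generating with Qwen2-72B-Instruct via transformers.
# A 72B-parameter model needs several GPUs; device_map="auto" shards the layers.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2-72B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",    # use the dtype stored in the checkpoint
    device_map="auto",     # shard layers across available GPUs
)

messages = [{"role": "user", "content": "Explain grouped query attention in two sentences."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=128)
# Decode only the generated continuation, skipping the prompt tokens.
print(tokenizer.decode(output[0][inputs.shape[1]:], skip_special_tokens=True))
```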

Nomic AI Releases Nomic Embed Vision v1 and Nomic Embed Vision v1.5: CLIP-like Vision Models That Can Be Used Alongside Their Popular Text Embedding Models

Nomic AI has recently unveiled two significant releases in multimodal embedding models: Nomic Embed Vision v1 and Nomic Embed Vision v1.5. These models are designed to provide high-quality, fully replicable vision embeddings that integrate seamlessly with the existing Nomic Embed Text v1 and v1.5 models. The integration creates a unified embedding space that improves performance on multimodal and text tasks, outperforming competitors such as OpenAI CLIP and OpenAI Text Embedding 3 Small.

Nomic Embed Vision aims to address the limitations of existing multimodal models such as CLIP, which, while impressive in zero-shot multimodal capabilities, underperform on tasks outside image retrieval. By aligning a vision encoder with the existing Nomic Embed Text latent space, Nomic has created a unified multimodal latent space that excels at both image and text tasks. This unified space has shown superior performance on benchmarks such as ImageNet zero-shot classification, MTEB, and Datacomp, making it the first open-weights model to achieve such results.
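
Because the two encoders share one latent space, cross-modal retrieval reduces to cosine similarity between an image embedding and a text embedding. A minimal sketch using Nomic’s hosted Python client, assuming `pip install nomic`, a configured API key, and a hypothetical local image file cat.png:

```python
# Minimal sketch: cross-modal similarity in the shared Nomic embedding space.
# Requires `pip install nomic` and a Nomic API key (`nomic login`).
import numpy as np
from nomic import embed

# Embed an image with the vision model...
image_out = embed.image(
    images=["cat.png"],                 # hypothetical local image file
    model="nomic-embed-vision-v1.5",
)

# ...and a query with the matching text model. task_type tells the text
# encoder this is a retrieval query rather than a document.
text_out = embed.text(
    texts=["a photo of a cat"],
    model="nomic-embed-text-v1.5",
    task_type="search_query",
)

img_vec = np.array(image_out["embeddings"][0])
txt_vec = np.array(text_out["embeddings"][0])

# Both vectors live in the same latent space, so cosine similarity is meaningful.
cosine = img_vec @ txt_vec / (np.linalg.norm(img_vec) * np.linalg.norm(txt_vec))
print(f"image-text similarity: {cosine:.3f}")
```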
