AI Dev and Research News
Posts
⏰ Featured AIs: Microsoft AI Introduces Claimify and NVIDIA AI Open Sources Dynamo.......

⏰ Featured AIs: Microsoft AI Introduces Claimify and NVIDIA AI Open Sources Dynamo.......

March 21, 2025

miniCON 2025- OPEN SOURCE AI

Hi There,

Dive into the hottest AI breakthroughs of the week—handpicked just for you!

Featured AI Update 🛡️🛡️🛡️

🔥 IBM and Hugging Face Researchers Release SmolDocling: A 256M Open-Source Vision Language Model for Complete Document OCR

Researchers from IBM and Hugging Face have recently addressed these challenges by releasing SmolDocling, a 256M open-source vision-language model (VLM) designed explicitly for end-to-end multi-modal document conversion tasks. Unlike larger foundational models, SmolDocling provides a streamlined solution that processes entire pages through a single model, significantly reducing complexity and computational demands. Its ultra-compact nature, at just 256 million parameters, makes it notably lightweight and resource-efficient. The researchers also developed a universal markup format called DocTags, which precisely captures page elements, their structures, and spatial contexts in a highly compact and clear form.

SmolDocling leverages Hugging Face’s compact SmolVLM-256M as its architecture base, which features significant reductions in computational complexity through optimized tokenization and aggressive visual feature compression methods. Its main strength lies in the innovative DocTags format, providing structured markup that distinctly separates document layout, textual content, and visual information such as equations, tables, code snippets, and charts. SmolDocling utilizes curriculum learning for efficient training, which initially involves freezing its vision encoder and gradually fine-tuning it using enriched datasets that enhance visual-semantic alignment across different document elements. Additionally, the model’s efficiency allows it to process entire document pages at lightning-fast speeds, averaging just 0.35 seconds per page on a consumer GPU while consuming under 500MB of VRAM....……

Super Important AI News 🔥 🔥 🔥

🧵 miniCON 2025- OPEN SOURCE AI: Free Event for 2 HOURs- A Dynamic 2-hour event, showcasing the latest breakthroughs shaping the future of Open Source AI [ e-Certificate of Attendance]

🚨 IBM and Hugging Face Researchers Release SmolDocling: A 256M Open-Source Vision Language Model for Complete Document OCR

💡 Microsoft AI Introduces Claimify: A Novel LLM-based Claim-Extraction Method that Outperforms Prior Solutions to Produce More Accurate, Comprehensive, and Substantiated Claims from LLM Outputs

📢 NVIDIA AI Just Open Sourced Canary 1B and 180M Flash – Multilingual Speech Recognition and Translation Models

🧲 NVIDIA AI Open Sources Dynamo: An Open-Source Inference Library for Accelerating and Scaling AI Reasoning Models in AI Factories

Project Implementations </>

✅ A Coding Implementation to Build a Document Search Agent (DocSearchAgent) with Hugging Face, ChromaDB, and Langchain [Colab Notebook Included]

✅ A Step-by-Step Guide to Building a Semantic Search Engine with Sentence Transformers, FAISS, and all-MiniLM-L6-v2 [Colab Notebook Included]

✅ Building an Interactive Bilingual (Arabic and English) Chat Interface with Open Source Meraj-Mini by Arcee AI: Leveraging GPU Acceleration, PyTorch, Transformers, Accelerate, BitsAndBytes, and Gradio [Colab Notebook Included]

✅ A Step by Step Guide to Build an Interactive Health Data Monitoring Tool Using Hugging Face Transformers and Open Source Model Bio_ClinicalBERT [Colab Notebook Included]