
AI News: πŸš€ Stanford Alpaca 7B, GPT4, Anthropic releases Claude, Google's PaLM API, Pytorch 2.0 and MidjourneyV5

This newsletter brings AI research news that is more technical than most resources, yet still digestible and applicable.

Memoji on Steroids: This AI Model Can Reconstruct 3D Avatars from Videos. Time to meet Vid2Avatar, a tool that can generate high-fidelity 3D avatars from videos captured in the wild. Vid2Avatar learns 3D human avatars from in-the-wild videos without ground-truth supervision, priors extracted from large datasets, or any external segmentation modules. You just give it a video of someone, and it generates a robust 3D avatar for you.

LERF (Language Embedded Radiance Fields): LERF optimizes a dense, multi-scale language 3D field by volume rendering CLIP embeddings along training rays, supervising these embeddings with multi-scale CLIP features across multi-view training images. After optimization, LERF can extract 3D relevancy maps for language queries interactively in real-time. LERF enables pixel-aligned queries of the distilled 3D CLIP embeddings without relying on region proposals, masks, or fine-tuning, supporting long-tail open-vocabulary queries hierarchically across the volume.
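To make the relevancy-map idea concrete, here is a minimal sketch of LERF-style relevancy scoring. The embeddings are random stand-ins; in the actual system, per-pixel language embeddings are volume-rendered from the optimized 3D field, and the query/canonical embeddings come from the CLIP text encoder. The canonical phrases, temperature, and dimensions below are illustrative assumptions.

```python
# Illustrative sketch of LERF-style relevancy scoring.
# NOTE: `rendered`, `query`, and `canonical` are random placeholders here;
# the real method volume-renders CLIP embeddings along rays in a radiance field.
import numpy as np

rng = np.random.default_rng(0)

def normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Pretend these are rendered per-pixel language embeddings (N, D),
# a CLIP text embedding for the query, and canonical negative phrases
# (e.g. "object", "things", "stuff") -- all assumptions for illustration.
rendered = normalize(rng.normal(size=(4, 512)))
query = normalize(rng.normal(size=512))
canonical = normalize(rng.normal(size=(3, 512)))

def relevancy(embed, query, canonical, temp=10.0):
    # Pairwise softmax of the query score against each canonical phrase;
    # the relevancy is the minimum over canonical phrases, so a pixel is
    # relevant only if the query beats every generic alternative.
    s_q = embed @ query * temp                   # (N,)
    s_c = embed @ canonical.T * temp             # (N, K)
    pair = np.exp(s_q[:, None]) / (np.exp(s_q[:, None]) + np.exp(s_c))
    return pair.min(axis=1)                      # (N,) values in (0, 1)

scores = relevancy(rendered, query, canonical)
```

Thresholding these per-pixel scores yields the 3D relevancy map for a given language query.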

Automatic Reasoning & Tool-Use of LLMs: Automatic Reasoning and Tool-use (ART) is a framework that uses frozen LLMs to automatically generate intermediate reasoning steps as a program. Given a new task to solve, ART selects demonstrations of multi-step reasoning and tool use from a task library. At test time, ART seamlessly pauses generation whenever external tools are called, and integrates their output before resuming generation. ART achieves a substantial improvement over few-shot prompting and automatic CoT on unseen tasks in the BigBench and MMLU benchmarks, and matches the performance of hand-crafted CoT prompts on a majority of these tasks.

Microsoft Proposes VALL-E X: Cross-lingual speech synthesis is an approach for transferring a speaker's voice from one language to another. The cross-lingual neural codec language model the researchers introduce is called VALL-E X. It extends the VALL-E text-to-speech model, inheriting its strong in-context learning capabilities.
