AI News Thread 👇🧵: Large Language Models (LMs) can teach themselves to use external tools via simple APIs; a solution to the high cost of human dialogue data for training LLMs; 🚨 If 2022 is the year of pixels for generative AI, then 2023 is the year of sound waves...

Hi there! Today we are sharing research updates on: large language models (LMs) that can teach themselves to use external tools via simple APIs; a solution to the high cost of human dialogue data for training LLMs; whether you can diagnose and rectify a vision model using language; why, if 2022 was the year of pixels for generative AI, 2023 is the year of sound waves; OpenAI's closest competitor (no, it is not Google); and more. So, let's start...

🔥 Human dialogue data for training LLMs is expensive, so what can be done? A promising direction is to generate synthetic dialogues by prompting large language models. Researchers from Columbia University and Amazon use a small set of expert-written conversations as in-context examples to prompt an LLM to synthesize a social conversation dataset. The resulting synthetic multi-party conversations were rated more favorably across all measured dimensions than conversation excerpts sampled from a human-collected multi-party dataset.
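
To make the recipe concrete, here is a minimal sketch of prompting an LLM with expert-written conversations as in-context examples, assuming the OpenAI Python client; the seed dialogues, the prompt wording, and the model name are illustrative assumptions, not the authors' actual setup.

```python
# Sketch: synthesize a new multi-party conversation by prompting an LLM
# with a few expert-written conversations as in-context examples.
# The seed dialogues and instructions below are illustrative placeholders.
from openai import OpenAI

client = OpenAI()

# A small set of expert-written conversations used as in-context examples.
seed_dialogues = [
    "Alice: I just got back from the farmers market.\n"
    "Bob: Oh nice, did you find anything good?\n"
    "Carol: Save some tomatoes for me!",
    # ... more expert-written examples ...
]

prompt = (
    "Here are examples of natural multi-party social conversations:\n\n"
    + "\n\n".join(f"Example {i + 1}:\n{d}" for i, d in enumerate(seed_dialogues))
    + "\n\nWrite a new multi-party conversation between three friends "
      "in the same style, on a different everyday topic."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",                 # any capable chat model works here
    messages=[{"role": "user", "content": prompt}],
    temperature=0.9,                     # higher temperature for dialogue diversity
)
synthetic_dialogue = response.choices[0].message.content
print(synthetic_dialogue)
```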

Stanford: Can you diagnose and rectify a vision model using language? This latest analysis by Stanford researchers reveals when and how text embeddings can be used as a proxy for image embeddings to debug vision models. Recent multi-modal contrastive learning models have demonstrated the ability to learn an embedding space suitable for building strong vision classifiers. This research highlights another distinct advantage: the ability to diagnose vision classifiers through natural language. The proposed approach leverages the shared multi-modal contrastive representation space by first training a classifier on image embeddings; because text embeddings live in the same space, the classifier can then be probed and debugged with natural-language inputs.
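
The core trick can be sketched with an off-the-shelf contrastive model such as CLIP. In the sketch below, a linear head assumed to have been trained on CLIP image embeddings is probed with CLIP text embeddings; the probe phrases, class names, and head are illustrative assumptions, not the paper's exact protocol.

```python
# Sketch: probe an image classifier trained on CLIP image embeddings with
# CLIP *text* embeddings, using text as a proxy for images when debugging.
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Suppose `linear_head` was trained on CLIP image embeddings (dim 512)
# to classify, e.g., {0: "landbird", 1: "waterbird"}.
linear_head = torch.nn.Linear(512, 2).to(device)   # pretrained weights assumed

# Because image and text embeddings share a contrastive space, we can feed
# text embeddings through the same head to ask which language concepts the
# classifier has latched onto (e.g., spurious background features).
probe_phrases = ["a bird on a lake", "a bird in a forest",
                 "a photo of water", "a photo of trees"]
with torch.no_grad():
    tokens = clip.tokenize(probe_phrases).to(device)
    text_emb = model.encode_text(tokens).float()
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
    logits = linear_head(text_emb)

for phrase, pred in zip(probe_phrases, logits.argmax(dim=-1).tolist()):
    print(f"{phrase!r} -> predicted class {pred}")
```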

🚨 If 2022 is the year of pixels for generative AI, then 2023 is the year of sound waves: The previous year saw a significant increase in work concentrated on Computer Vision (CV) and Natural Language Processing (NLP). Because of this, academics worldwide are now looking at the potential benefits deep learning and large language models (LLMs) might bring to audio generation. In the last few weeks alone, five new papers (MusicLM, SingSong, Moûsai, AudioLDM, and EPIC-SOUNDS) have been published, each introducing a potentially useful audio model or dataset that can make further research in this area much easier.

👀 Meet the closest competitor to OpenAI (no, it is not Google): German startup Aleph Alpha is Europe's closest comparison to OpenAI. It has built a 300-billion-parameter LLM (bigger than OpenAI's GPT-3) with a fraction of the funding: while OpenAI has raised a cool $11bn, Aleph Alpha has raised only $31.1m. In terms of model size, Aleph Alpha's largest model has 300bn parameters, whereas OpenAI's GPT-3 has 175bn.

Stanford: "Theory of Mind ToM" may have spontaneously emerged in Large Language Models.Stanford professor publishes paper on GPT-3's ability to ascribe mental states to persons. This research shows that models published before 2022 show virtually no ability to solve ToM tasks. Yet, the January 2022 version of GPT3 (davinci-002) solved 70% of ToM tasks, a performance comparable with that of seven-year-old children. Moreover, its November 2022 version (davinci-003), solved 93% of ToM tasks, a performance comparable with that of nine-year-old children. These findings suggest that ToMlike ability (thus far considered to be uniquely human) may have spontaneously emerged as a byproduct of language models’ improving language skills.

🚀 Meta AI: This latest research from Meta AI shows that large language models (LMs) can teach themselves to use external tools via simple APIs and achieve the best of both worlds. The researchers introduce Toolformer, a model trained to decide which APIs to call, when to call them, what arguments to pass, and how best to incorporate the results into future token prediction. This is done in a self-supervised way, requiring nothing more than a handful of demonstrations for each API.
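
The flavor of this is easiest to see in the inline API-call format: the model emits a call such as [Calculator(400 / 1400)] in the middle of its text, the call is executed, and the result is spliced back in before generation continues. The sketch below mirrors that idea; the regex executor and toy calculator are simplified illustrations, not Meta AI's implementation.

```python
# Sketch: Toolformer-style inline API calls, executed and spliced back into
# the text so the result becomes context for predicting the next tokens.
import re

def calculator(expr: str) -> str:
    # Toy arithmetic tool; a real system would sanitize/parse the expression.
    return f"{eval(expr, {'__builtins__': {}}):.2f}"

TOOLS = {"Calculator": calculator}

# Pattern for an inline call such as: [Calculator(400 / 1400)]
CALL = re.compile(r"\[(\w+)\((.*?)\)\]")

def execute_calls(text: str) -> str:
    """Replace each API call with '[Tool(args) -> result]'."""
    def run(match: re.Match) -> str:
        tool, args = match.group(1), match.group(2)
        result = TOOLS[tool](args)
        return f"[{tool}({args}) -> {result}]"
    return CALL.sub(run, text)

draft = ("Out of 1400 participants, 400 passed the test, "
         "i.e. [Calculator(400 / 1400)] of them.")
print(execute_calls(draft))
# -> "... i.e. [Calculator(400 / 1400) -> 0.29] of them."
```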

DeepMind/University of Haifa: Previously, this research team from DeepMind had introduced ‘functa,’ a framework for representing data as neural functions (aka neural fields or implicit neural representations, INRs) and doing deep learning directly on them. In their recent work, ‘spatial functa,’ they show how to scale the approach up to ImageNet-1k at 256x256 resolution. ‘Spatial functa’ is a spatially arranged representation of functa whose features capture local signal information in input space.
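
As a rough illustration of the idea, here is a small neural field conditioned on a spatially arranged latent grid: each query coordinate looks up its local latent via interpolation before the MLP predicts an RGB value. The layer sizes and conditioning scheme are illustrative assumptions, not the paper's architecture.

```python
# Sketch: a neural field (INR) whose conditioning latents are arranged on a
# spatial grid, so each latent captures local signal information.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialFuncta(nn.Module):
    def __init__(self, grid_size: int = 16, latent_dim: int = 64, hidden: int = 256):
        super().__init__()
        # Per-image latents arranged on a grid; each feature is local in input space.
        self.latent_grid = nn.Parameter(torch.zeros(1, latent_dim, grid_size, grid_size))
        self.mlp = nn.Sequential(
            nn.Linear(2 + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),   # RGB output per coordinate
        )

    def forward(self, coords: torch.Tensor) -> torch.Tensor:
        # coords: (N, 2) in [-1, 1]; bilinearly sample the local latent per coordinate.
        grid = coords.view(1, -1, 1, 2)                        # (1, N, 1, 2)
        local = F.grid_sample(self.latent_grid, grid, align_corners=True)
        local = local.squeeze(-1).squeeze(0).t()               # (N, latent_dim)
        return self.mlp(torch.cat([coords, local], dim=-1))    # (N, 3)

# Query all pixel coordinates of a 256x256 image.
ys, xs = torch.meshgrid(
    torch.linspace(-1, 1, 256), torch.linspace(-1, 1, 256), indexing="ij"
)
coords = torch.stack([xs, ys], dim=-1).view(-1, 2)
rgb = SpatialFuncta()(coords)   # (256*256, 3)
```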