🚀 AI News: TII Open-Sourced Falcon LLMs: A New AI Model That Uses Only 75 Percent of GPT-3’s Training Compute, 40 Percent of Chinchilla’s, and 80 Percent of PaLM-62B’s | Large Language Models as Tool Makers......

This newsletter brings you AI research news that is more technical than most resources yet still digestible and applicable.

Technology Innovation Institute Open-Sourced Falcon LLMs: A New AI Model That Uses Only 75 Percent of GPT-3’s Training Compute, 40 Percent of Chinchilla’s, and 80 Percent of PaLM-62B’s. Falcon-40B is a powerful causal decoder-only model developed by the Technology Innovation Institute (TII) and trained on 1,000B tokens drawn from RefinedWeb and curated corpora. Its smaller sibling, Falcon-7B, uses the same causal decoder-only architecture with 7B parameters and was trained on an even larger dataset of 1,500B tokens from RefinedWeb, further enhanced with curated corpora. Both models are released under the TII Falcon LLM License.
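For readers who want to try the models, a minimal loading sketch with the Hugging Face transformers library might look like the following; the Hub id (tiiuae/falcon-7b) and the trust_remote_code flag are assumptions based on how the checkpoints were published, so check the model cards before running this.

```python
# Sketch: load Falcon-7B with Hugging Face transformers and generate text.
# The model id and the need for trust_remote_code are assumptions; verify on the model card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/falcon-7b"  # assumed Hub id; tiiuae/falcon-40b if you have the hardware
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # keeps a 7B model within a single large GPU
    device_map="auto",
    trust_remote_code=True,       # Falcon originally shipped custom modeling code
)

inputs = tokenizer(
    "The Technology Innovation Institute released Falcon to",
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```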

How Small Can Language Models Be and Still Speak Coherent English? Meet TinyStories: Language models such as GPT-Neo (small) or GPT-2 (small) often struggle to generate coherent English text beyond a few words due to their small size, around 125M parameters. This raises the question of whether larger-scale models with complex architectures are necessary to produce fluent and coherent text. In response, Microsoft researchers introduced TinyStories, a synthetic dataset of short stories comprehensible to 3- to 4-year-olds and generated by GPT-3.5 and GPT-4. This dataset can be used to train and evaluate much smaller language models (below 10 million total parameters) with simpler architectures (a single transformer block). These smaller models have demonstrated the ability to produce fluent, grammatically correct stories that exhibit diversity and reasoning capabilities.
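For a sense of scale, here is a minimal sketch of what a sub-10M-parameter, single-transformer-block causal language model looks like in PyTorch; the hyperparameters are illustrative choices, not the exact configuration from the paper.

```python
# Minimal sketch of a one-block causal language model in PyTorch.
# Vocab size, embedding width, and head count are illustrative only.
import torch
import torch.nn as nn

class OneBlockLM(nn.Module):
    def __init__(self, vocab_size=8192, d_model=256, n_heads=8, max_len=512):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
        self.block = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model,
            batch_first=True, norm_first=True,
        )
        self.lm_head = nn.Linear(d_model, vocab_size, bias=False)

    def forward(self, idx):
        T = idx.size(1)
        pos = torch.arange(T, device=idx.device)
        x = self.tok_emb(idx) + self.pos_emb(pos)
        causal_mask = nn.Transformer.generate_square_subsequent_mask(T).to(idx.device)
        x = self.block(x, src_mask=causal_mask)   # the single transformer block
        return self.lm_head(x)                    # next-token logits

model = OneBlockLM()
print(sum(p.numel() for p in model.parameters()))  # a few million parameters
```

Counting the parameters of a configuration like this lands in the low millions, which is the regime TinyStories is designed to probe.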

LLMs Outperform Reinforcement Learning: Meet SPRING, an Innovative Prompting Framework for LLMs Designed to Enable In-Context Chain-of-Thought Planning and Reasoning. SPRING is an LLM-based policy that outperforms reinforcement learning algorithms in an interactive environment requiring multi-task planning and reasoning. A group of researchers from Carnegie Mellon University, NVIDIA, Ariel University, and Microsoft investigated the use of large language models (LLMs) for understanding and reasoning with human knowledge in the context of games. They propose a two-stage approach called SPRING: the model first studies an academic paper describing the game, and then uses a question-answer (QA) framework in which a chain of intermediate answers justifies how that knowledge is applied to each decision.
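A schematic of those two stages, using a hypothetical llm() completion helper and an illustrative question chain (neither is an API from the paper's code, which organizes its questions into a directed graph), might look like this:

```python
# Rough sketch of SPRING's two stages, under the assumptions stated above.

def llm(prompt: str) -> str:
    raise NotImplementedError("plug in your preferred LLM client here")

# Stage 1: study the academic paper describing the game/environment.
def extract_knowledge(paper_text: str, topic: str) -> str:
    return llm(
        f"Here is an excerpt from a paper about the game:\n{paper_text}\n\n"
        f"Summarize everything relevant to: {topic}"
    )

# Stage 2: at each environment step, walk a chain of questions whose
# intermediate answers justify the final choice of action.
QUESTIONS = [
    "What objects and resources are currently visible?",
    "Which subgoal is most valuable to pursue right now?",
    "What single action best advances that subgoal?",
]

def choose_action(observation: str, knowledge: str) -> str:
    context = f"Game knowledge:\n{knowledge}\n\nCurrent observation:\n{observation}\n"
    answers = []
    for q in QUESTIONS:
        answers.append(llm(context + "\n".join(answers) + f"\nQ: {q}\nA:"))
    return answers[-1]  # the last answer is parsed as the action to execute
```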

Meet BiomedGPT: a unified biomedical generative pre-trained transformer model for vision, language, and multimodal tasks. The research paper describes how BiomedGPT leverages self-supervision on large and diverse datasets to accept multi-modal inputs and perform a range of downstream tasks. BiomedGPT delivers expansive and inclusive representations of biomedical data, outperforming most preceding state-of-the-art models across five distinct tasks, evaluated on 20 public datasets spanning over 15 unique biomedical modalities.

Reinventing RNNs for the Transformer Era: the RWKV paper proposes an approach that combines the efficient parallelizable training of Transformers with the efficient inference of RNNs. The model leverages a linear attention mechanism and can be formulated as either a Transformer or an RNN, which parallelizes computation during training and maintains constant computational and memory complexity during inference, making RWKV the first non-Transformer architecture to be scaled to tens of billions of parameters. The presented experiments show that RWKV performs on par with similarly sized Transformers, suggesting that future work can leverage this architecture to create more efficient models. This work is a significant step toward reconciling the trade-off between computational efficiency and model performance in sequence-processing tasks.
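A simplified, single-channel view of the recurrence behind that linear attention (ignoring the paper's per-channel parameters and numerical-stability tricks) is sketched below; w is a learned decay applied to past tokens and u a bonus weight for the current one.

```python
# Simplified WKV-style recurrence, per the assumptions stated above.
# At inference only two running sums (num, den) are carried forward per channel,
# which is what gives constant memory and compute per generated token.
import numpy as np

def wkv_recurrent(k, v, w, u):
    """k, v: per-token keys/values (length T); w: decay >= 0; u: current-token bonus."""
    T = len(k)
    out = np.zeros(T)
    num, den = 0.0, 0.0                       # decayed weighted sums over past tokens
    for t in range(T):
        e_cur = np.exp(u + k[t])              # the current token gets the bonus u
        out[t] = (num + e_cur * v[t]) / (den + e_cur)
        e_k = np.exp(k[t])
        num = np.exp(-w) * num + e_k * v[t]   # decay the past, then add the current token
        den = np.exp(-w) * den + e_k
    return out

T = 6
print(wkv_recurrent(np.random.randn(T), np.random.randn(T), w=0.5, u=0.3))
```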

Large Language Models as Tool Makers: A research group from Stanford, Google, and Princeton introduced LATM, a closed-loop framework that empowers large language models (LLMs) to create and utilize their own tools for diverse tasks. Inspired by humanity’s evolutionary strides in tool creation, the approach employs two key stages: Tool Making, in which a powerful model writes a reusable tool as a Python utility function, and Tool Using, in which a lightweight model applies that tool to new requests. This division of labor harnesses the capabilities of advanced LLMs while significantly reducing computational costs.
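An illustrative skeleton of that division of labor, using a hypothetical llm(model, prompt) completion helper rather than anything from the paper's code, could be structured as follows:

```python
# Illustrative LATM-style skeleton, under the assumptions stated above.

def llm(model: str, prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

def make_tool(task_examples: list[str]) -> str:
    """Tool Making: a strong (expensive) model writes a reusable Python utility."""
    prompt = (
        "Write a self-contained Python function `solve(instance)` that solves "
        "tasks like the following examples, and include simple unit tests:\n"
        + "\n".join(task_examples)
    )
    return llm("strong-model", prompt)  # returns source code for the tool

def use_tool(tool_source: str, new_instance: str) -> str:
    """Tool Using: a lightweight (cheap) model is prompted to call the tool."""
    prompt = (
        f"You are given this utility:\n{tool_source}\n\n"
        f"Show the call to `solve(...)` that answers:\n{new_instance}"
    )
    return llm("cheap-model", prompt)
```

Because the expensive model is invoked only once per task type, the per-instance cost falls to roughly that of the lightweight tool user.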

Using an artificial intelligence algorithm, researchers at MIT and McMaster University have identified a new antibiotic that can kill Acinetobacter baumannii, a type of bacteria responsible for many drug-resistant infections. If developed for use in patients, the drug could help combat this species, which is often found in hospitals and can lead to pneumonia, meningitis, and other serious infections. The microbe is also a leading cause of infections in wounded soldiers in Iraq and Afghanistan.

SPONSORED SECTION

A Banksy got everyday investors 32% returns?

Mm-hmm, sure. So, what’s the catch?

We know it may sound too good to be true. But thousands of investors are already smiling all the way to the bank, thanks to the fine-art investing platform Masterworks.

These results aren’t cherry-picking. This is the whole bushel. Masterworks has built a track record of 13 exits, including net returns of +10.4%, +27.3%, and +35.0%, even while financial markets plummeted.

But art? Really? Okay, skeptics, here are the numbers. Contemporary art prices:

  • outpaced the S&P 500 by 131% over the last 26 years

  • have the lowest correlation to equities of any asset class

  • remained stable through the dot-com bubble and ’08 crisis

Got your attention yet? Marktechpost readers can skip the waitlist with this exclusive link.

See important disclosures at masterworks.com/cd