AI News: Is ChatGPT Really 175 Billion Parameters?; Largest Vision-Language Model + Robot Experience; Flan-UL2 20B...

This newsletter brings you AI research news that is more technical than most resources but still digestible and applicable.

In this edition of our newsletter, we bring you the latest updates on language and vision models. Discover the new and improved Flan-UL2 20B from Google, the largest vision-language model PaLM-E, and the controversy surrounding ChatGPT's parameter count. We also introduce you to VoxFormer, a novel sparse voxel transformer, and discuss a new AI research proposal from UC Berkeley. Stay informed about cutting-edge developments in the field by reading on!

Did you like Flan-T5? Now check out the new open-source Flan-UL2 20B: Flan-UL2 (20B parameters) from Google is so far the best open-source LLM out there, as measured on MMLU (55.7) and BigBench Hard (45.9), surpassing Flan-T5-XXL (11B). It has been instruction fine-tuned with a 2048-token context window.
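
If you want to poke at it yourself, here is a minimal sketch using the Hugging Face transformers library; the model id google/flan-ul2, half-precision loading, and device_map="auto" are our assumptions about a typical way to load a 20B checkpoint, not an official recipe.

```python
# Minimal sketch: run Flan-UL2 20B with Hugging Face transformers.
# Assumes the checkpoint is published as "google/flan-ul2" and that enough
# GPU/CPU memory is available; half precision roughly halves the footprint.
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/flan-ul2")
model = T5ForConditionalGeneration.from_pretrained(
    "google/flan-ul2", torch_dtype=torch.float16, device_map="auto"
)

prompt = "Answer the following question step by step: what is 17 * 24?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```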

What happens when we train the largest vision-language model and add in robot experiences? Meet PaLM-E 🌴, a 562-billion-parameter, general-purpose, embodied visual-language generalist spanning robotics, vision, and language. PaLM-E enables robot planning directly from pixels, all in a single model trained end-to-end. PaLM-E is the largest VLM reported to date.
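
To make the "planning directly from pixels" idea concrete, here is a conceptual sketch (not PaLM-E's actual code) of injecting projected image features into a language model's token-embedding sequence as a multimodal prompt; all shapes and the linear projector are illustrative assumptions.

```python
# Conceptual sketch of a multimodal prompt: project visual features into the
# LM's embedding space and interleave them with text token embeddings.
# All dimensions and the random projector are illustrative assumptions.
import numpy as np

d_vision, d_model = 1024, 4096                    # assumed feature / LM embedding sizes
projector = np.random.randn(d_vision, d_model) * 0.02   # learned in a real model

def embed_multimodal_prompt(text_token_embs, image_features, insert_at):
    """Interleave projected image features with text token embeddings."""
    image_embs = image_features @ projector       # map vision features into LM space
    return np.concatenate(
        [text_token_embs[:insert_at], image_embs, text_token_embs[insert_at:]],
        axis=0,
    )

# e.g. prompt "Given <img> ... pick up the green block", one 256-patch image
text_embs = np.random.randn(12, d_model)
img_feats = np.random.randn(256, d_vision)
prompt_embs = embed_multimodal_prompt(text_embs, img_feats, insert_at=2)
print(prompt_embs.shape)                          # (12 + 256, 4096)
```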

Is ChatGPT really 175 billion parameters? 🤔 This blog post from Owen concretely disproves that claim using publicly available information and verifiable, reproducible analysis. It is typical to store LLM weights as 8-bit integers (INT8) for lower-latency inference, higher throughput, and a 2x lower memory footprint compared to storing them in float16. At 1 byte per INT8 parameter, simple math shows that a 175-billion-parameter model would take about 175 GB of space to store.
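
The back-of-the-envelope memory math from that argument fits in a few lines:

```python
# Storage footprint of a 175B-parameter model under two common weight formats.
params = 175e9                       # claimed parameter count
int8_bytes = params * 1              # 1 byte per INT8 weight
float16_bytes = params * 2           # 2 bytes per float16 weight

print(f"INT8:    {int8_bytes / 1e9:.0f} GB")     # ~175 GB
print(f"float16: {float16_bytes / 1e9:.0f} GB")  # ~350 GB
```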

VoxFormer: a novel sparse voxel transformer that improves the efficiency and accuracy of camera-based 3D semantic scene completion. The method uses a two-stage approach: it first predicts a sparse set of 3D voxels from RGB images and then refines them to obtain the final dense voxel representations. It incorporates a transformer-based encoder-decoder network designed to handle both local and global context.
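
A rough structural sketch of that two-stage pipeline, with stand-in networks and assumed shapes (this is not VoxFormer's actual implementation), could look like this:

```python
# Structural sketch of a two-stage sparse-voxel pipeline: stage 1 proposes a
# sparse set of likely-occupied voxels, stage 2 refines only those voxels and
# scatters the results back into a dense grid. Shapes, the threshold, and the
# stand-in "refine" step are illustrative assumptions.
import numpy as np

def stage1_propose(occupancy_logits, threshold=0.5):
    """Keep only voxels whose predicted occupancy exceeds the threshold."""
    probs = 1.0 / (1.0 + np.exp(-occupancy_logits))
    return np.argwhere(probs > threshold)          # (N_sparse, 3) voxel indices

def stage2_refine(sparse_indices, image_features, num_classes=20):
    """Stand-in for a transformer decoder producing per-voxel semantic logits."""
    n = len(sparse_indices)
    return np.random.randn(n, num_classes)         # a real model attends to image_features

grid = (64, 64, 8)
occupancy_logits = np.random.randn(*grid)          # stage-1 network output (assumed)
image_features = np.random.randn(100, 256)         # 2D backbone features (assumed)

sparse_voxels = stage1_propose(occupancy_logits)
semantics = stage2_refine(sparse_voxels, image_features)

dense = np.zeros(grid + (semantics.shape[1],))     # scatter refined voxels back
dense[tuple(sparse_voxels.T)] = semantics
print(sparse_voxels.shape, dense.shape)
```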

🚀 How can meta-learning, self-attention, and JAX power the next generation of evolutionary optimizers? A new AI paper from DeepMind leverages the insight that Evolution Strategy (ES) updates are inherently set operations and parameterizes a new family of ES optimization algorithms using tiny Set Transformers. The weights of this attention-based, gradient-free optimizer are meta-evolved on a class of diverse optimization tasks. The resulting learned evolution strategy (LES) is capable of strong transfer to previously unseen tasks, longer horizons, and larger search spaces. Importantly, it generalizes to high-dimensional neural network tasks without having been meta-trained on them, which is a strong form of generalization!
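
As a toy illustration of the core idea (not DeepMind's code), here is an ES step whose recombination weights come from a small learned, permutation-invariant function of the population's z-scored fitnesses; the tiny scoring network stands in for the paper's Set Transformer and is left untrained here, so the snippet only shows the structure of the update.

```python
# Toy "learned evolution strategy" sketch: recombination weights are produced
# by a learned, permutation-invariant function of z-scored fitnesses instead of
# fixed ranking weights. The scoring weights W, V would be meta-evolved across
# many tasks in the real method; here they are random placeholders.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(1, 8))             # placeholder "meta-evolved" weights
V = rng.normal(scale=0.1, size=(8, 1))

def learned_recombination_weights(fitnesses):
    z = (fitnesses - fitnesses.mean()) / (fitnesses.std() + 1e-8)  # set-invariant input
    h = np.tanh(z[:, None] @ W)                    # per-member features
    scores = (h @ V).squeeze(-1)
    scores = np.exp(scores - scores.max())         # softmax -> recombination weights
    return scores / scores.sum()

def es_step(mean, sigma, objective, pop_size=32):
    eps = rng.normal(size=(pop_size, mean.size))   # sample perturbations
    candidates = mean + sigma * eps
    fitnesses = np.array([objective(c) for c in candidates])
    w = learned_recombination_weights(-fitnesses)  # minimisation: lower is better
    return mean + sigma * (w[:, None] * eps).sum(axis=0)

mean = np.zeros(10)
for _ in range(50):                                # structural demo only: with
    mean = es_step(mean, 0.1, lambda x: np.sum(x**2))  # untrained W, V no convergence
print(np.sum(mean**2))                             # is guaranteed
```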

D5 task: new AI research from UC Berkeley proposes the D5 task and a benchmark dataset to make LLMs do research. The D5 task, proposed by the researchers, is a goal-driven method of discovering differences between two text distributions via language descriptions. A discovery must meet two criteria: (1) it must be true, that is, the predicate is more often true for corpus A than for corpus B; and (2) it must be driven by the research goal and hence be relevant, innovative, and noteworthy.
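
To illustrate the first criterion, here is a tiny sketch that scores how much more often a predicate holds in corpus A than in corpus B; the keyword check is a trivial stand-in for the judgment an LLM would make in the actual setup.

```python
# Tiny illustration of the validity criterion: a discovered predicate should be
# true more often for corpus A than for corpus B. The substring check below is
# a stand-in for an LLM judging whether the predicate holds for each sample.
def holds(predicate, text):
    return predicate.lower() in text.lower()

def validity(predicate, corpus_a, corpus_b):
    frac_a = sum(holds(predicate, t) for t in corpus_a) / len(corpus_a)
    frac_b = sum(holds(predicate, t) for t in corpus_b) / len(corpus_b)
    return frac_a - frac_b                          # > 0: more often true in A

corpus_a = ["great service and friendly staff", "friendly and quick"]
corpus_b = ["waited an hour for a table", "food was cold"]
print(validity("friendly", corpus_a, corpus_b))     # 1.0 on this toy data
```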