
🚀 Exciting AI Updates: OpenAI just launched the official ChatGPT app for iOS + Google AI Introduces their latest PaLM model, PaLM 2 + Can pretrained language models (LMs) go beyond learning from labels and scalar rewards? ....

This newsletter brings you AI research news that is more technical than most resources yet still digestible and applicable.

OpenAI just launched the official ChatGPT app for iOS: According to Statista, there are currently 6.92 billion smartphone users worldwide, about 86.29% of the world’s population, all of whom will soon have access to ChatGPT. Now you can instantly get answers to practically anything without logging into the web version of ChatGPT. Another new feature unique to the iPhone app is the history search bar, so you’ll never lose track of your old chats.

Salesforce AI Introduces CodeT5+: LLMs for Code Understanding and Generation. CodeT5+ is a family of encoder-decoder LLMs for code whose component modules can be flexibly combined to suit a wide range of downstream code tasks. CodeT5+ 16B achieves new state-of-the-art results of 35.0% pass@1 and 54.5% pass@10 on the HumanEval code generation task against other open code LLMs, even surpassing OpenAI’s code-cushman-001 model.
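For readers unfamiliar with the pass@k metric reported above: it estimates the probability that at least one of k sampled programs passes a task’s unit tests. A minimal sketch of the standard unbiased estimator (a general evaluation metric, not code from the CodeT5+ release):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator.

    n: total samples generated per problem
    c: number of samples that pass all unit tests
    k: evaluation budget (e.g. 1 or 10)
    """
    if n - c < k:
        # Too few failing samples to fill a size-k draw: success guaranteed.
        return 1.0
    # Probability that a random size-k subset contains at least one pass.
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For example, if 3 of 10 generated samples pass, pass@1 is 0.3; averaging this quantity over all benchmark problems gives the reported score.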

Meet VideoChat: An End-to-End Chat-Centric Video Understanding System Developed by Merging Language and Visual Models. Researchers from the Shanghai AI Laboratory’s OpenGVLab, Nanjing University, the University of Hong Kong, the Shenzhen Institute of Advanced Technology, and the Chinese Academy of Sciences collaborated to create VideoChat. This innovative end-to-end chat-centric video understanding system employs state-of-the-art video and language models to enhance spatiotemporal reasoning, event localization, and causal relationship inference. The group also developed a novel dataset containing thousands of videos paired with densely captioned descriptions and discussions, presented to ChatGPT in chronological order.

Google AI introduces its latest PaLM model, PaLM 2, which builds on Google’s fundamental research and latest infrastructure. It is highly capable across a wide range of tasks and easy to deploy. PaLM 2 displays competitive performance in mathematical reasoning compared to GPT-4, and the instruction-tuned variant, Flan-PaLM 2, also performs well. PaLM 2 achieves state-of-the-art results on reasoning benchmarks such as WinoGrande and BIG-Bench Hard. It is significantly more multilingual than its predecessor, PaLM, achieving better results on benchmarks such as XSum, WikiLingua, and XLSum. PaLM 2 also improves translation capability over PaLM and Google Translate in languages like Portuguese and Chinese.

Can pretrained language models (LMs) go beyond learning from labels and scalar rewards? LeTI is a new LM fine-tuning paradigm that explores LMs’ potential to learn from textual interactions and feedback, allowing LMs to understand not just whether they were wrong but why. LeTI focuses on code generation tasks where models produce code from natural-language instructions. This setting makes automatic textual feedback available in a natural and scalable way: error messages and stack traces from a Python interpreter.
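To make the feedback source concrete, here is a hypothetical sketch (not the authors’ implementation) of how one might harvest the kind of textual signal LeTI learns from: run the generated code and capture the interpreter’s error message and stack trace on failure.

```python
import traceback

def textual_feedback(generated_code: str) -> str:
    """Execute model-generated code in a fresh namespace and return
    textual feedback: an empty string on success, otherwise the
    interpreter's error message and stack trace."""
    try:
        exec(compile(generated_code, "<generated>", "exec"), {})
        return ""
    except Exception:
        # The traceback names the error type, the offending line, and
        # the call chain, i.e. *why* the code failed, not just that it did.
        return traceback.format_exc()
```

A fine-tuning loop could then condition the model on the instruction, its previous attempt, and this feedback string when sampling the next attempt.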

What You See is What You Read? Improving Text-Image Alignment Evaluation. In this research from Google and the Hebrew University, researchers first introduce SeeTRUE: a comprehensive evaluation set spanning multiple datasets from both text-to-image and image-to-text generation tasks, with human judgments of whether a given text-image pair is semantically aligned. They then describe two automatic methods to determine alignment: the first uses a pipeline based on question generation and visual question answering models, and the second takes an end-to-end classification approach by fine-tuning multimodal pretrained models.
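The first method can be sketched abstractly: generate yes/no questions from the caption, ask a VQA model each one about the image, and score alignment as the fraction answered affirmatively. Everything below is illustrative; `gen_questions` and `answer_vqa` are hypothetical stand-ins for real QG and VQA models, not the paper’s code.

```python
from typing import Callable, List

def qg_vqa_alignment(
    text: str,
    image,
    gen_questions: Callable[[str], List[str]],   # QG model stand-in
    answer_vqa: Callable[[object, str], str],    # VQA model stand-in
) -> float:
    """Score text-image alignment as the fraction of text-derived
    yes/no questions the VQA model answers 'yes' for the image."""
    questions = gen_questions(text)  # e.g. ["Is there a dog?", ...]
    if not questions:
        return 0.0
    yes = sum(1 for q in questions if answer_vqa(image, q) == "yes")
    return yes / len(questions)
```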

ETH Zurich unveils MaskFreeVIS: a novel high-performing video instance segmentation (VIS) method without any mask annotations. This research proposes MaskFreeVIS, which achieves highly competitive VIS performance while using only bounding-box annotations for the object state. The method leverages the rich temporal mask consistency constraints in videos by introducing the Temporal KNN-patch Loss (TK-Loss), providing strong mask supervision without any mask labels.
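The core idea behind a temporal KNN-patch consistency term can be illustrated abstractly: match each patch in one frame to its k most similar patches in the next frame, then penalize disagreement between the matched patches’ predicted mask probabilities. This is a simplified conceptual sketch under those assumptions, not the authors’ TK-Loss implementation.

```python
import numpy as np

def knn_patch_consistency(feat_t, feat_t1, mask_t, mask_t1, k=3):
    """Illustrative temporal KNN-patch consistency term.

    feat_t, feat_t1: (N, D) patch features for consecutive frames
    mask_t, mask_t1: (N,) predicted mask probabilities per patch
    """
    # Pairwise squared feature distances between the two frames' patches.
    d = ((feat_t[:, None, :] - feat_t1[None, :, :]) ** 2).sum(-1)
    knn = np.argsort(d, axis=1)[:, :k]        # (N, k) nearest neighbours
    # Matched patches likely show the same surface, so their mask
    # probabilities should agree across time.
    diff = mask_t[:, None] - mask_t1[knn]     # (N, k)
    return float((diff ** 2).mean())
```

Minimizing such a term pushes mask predictions to be temporally consistent, which is the kind of free supervision videos provide even when no ground-truth masks exist.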

Featured AI Tools For This Newsletter Issue:

Bright Data

DoNotPay

AdCreative.ai

BuzzSumo

tinyEinstein

Find hundreds of cool artificial intelligence (AI) tools. Our expert team reviews and provides insights into some of the most cutting-edge AI tools available. Check out AI Tools Club.