AI News: 🚀 Instruct-NeRF2NeRF | LLM training steps can be reduced by up to 80% | Detect text written by LMs with DIPPER | Ablating Concepts in Text-to-Image Diffusion Models....

This newsletter brings you AI research news that is more technical than most resources but still digestible and applicable.

GPT-4 Passes Medical Exams: Large language models (LLMs) have demonstrated remarkable capabilities in natural language understanding and generation across many domains, including medicine. A new study presents a comprehensive evaluation of GPT-4, a state-of-the-art LLM, on medical competency examinations and benchmark datasets. The study shows that GPT-4 exceeds the passing score on the US Medical Licensing Exam by more than 20 points, finds no evidence of training-data memorization, and outperforms LLMs fine-tuned on medical data. GPT-4 is also much better calibrated than GPT-3 at predicting the likelihood that its answers are correct.

Meet Instruct-NeRF2NeRF: a new AI method for editing 3D scenes with text instructions. Given a NeRF of a scene and the collection of images used to reconstruct it, the method uses an image-conditioned diffusion model (InstructPix2Pix) to iteratively edit the input images while optimizing the underlying scene, resulting in an optimized 3D scene that respects the edit instruction. The research team demonstrated that the proposed method can edit large-scale, real-world scenes and accomplish more realistic, targeted edits than prior work.
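
To make the loop concrete, here is a minimal Python sketch of the iterative "edit a view, keep optimizing" idea. The `nerf` trainer, `editor`, and their methods are hypothetical placeholders, not the authors' API.

```python
def instruct_edit_nerf(nerf, editor, images, poses, instruction,
                       num_steps=10_000, edit_every=10):
    """Sketch of the edit-and-optimize loop (placeholder objects, not the authors' code).

    `nerf` stands in for a NeRF trainer, `editor` for an InstructPix2Pix-style
    image-conditioned diffusion model.
    """
    dataset = list(images)                      # working copies, progressively replaced by edits
    for step in range(num_steps):
        nerf.train_step(dataset, poses)         # fit the scene to the current (partly edited) images
        if step % edit_every == 0:
            i = (step // edit_every) % len(dataset)
            render = nerf.render(poses[i])      # current scene seen from view i
            # Edit the render, conditioned on the original capture and the text
            # instruction, then swap it back into the training set so the NeRF
            # gradually absorbs the edit.
            dataset[i] = editor.edit(render,
                                     condition_image=images[i],
                                     instruction=instruction)
    return nerf
```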

A group of researchers from Columbia University, Google, and UCLA proposes CoBIT, a unicoder-decoder architecture pre-trained jointly on a contrastive loss, an image-to-text generation loss, and a text-to-image generation loss. CoBIT can address a variety of vision and vision-language tasks in both zero-shot and fine-tuned settings. In the accompanying figure, the right-hand side shows images CoBIT generates zero-shot from novel prompts, together with the captions it then generates zero-shot when given those images as input.
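
For intuition, the joint objective is simply a weighted combination of the three losses. The sketch below assumes a hypothetical `model` object that exposes the three objectives over the shared unicoder-decoder backbone; the loss weights are illustrative defaults, not the paper's settings.

```python
def cobit_training_step(model, images, texts, w_con=1.0, w_i2t=1.0, w_t2i=1.0):
    """One joint pre-training step (hypothetical `model` API; weights are illustrative)."""
    l_con = model.contrastive_loss(images, texts)    # CLIP-style alignment of the two unicoders
    l_i2t = model.image_to_text_loss(images, texts)  # caption generation through the decoder
    l_t2i = model.text_to_image_loss(images, texts)  # image-token generation through the decoder
    return w_con * l_con + w_i2t * l_i2t + w_t2i * l_t2i
```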

This AI research from DFKI GmbH shows that LLM training steps can be reduced by up to 80% by reusing public and smaller models. The researchers introduce a cross-lingual and progressive transfer learning approach, called CLP-Transfer, that transfers models from a source language for which pretrained models are publicly available, such as English, to a new target language. Whereas prior work focused on cross-lingual transfer between two languages, the team extends the transfer across model sizes as well: given a pretrained model in the source language, they aim for a same-sized model in the target language. Instead of training that model from scratch, they exploit a smaller model in the target language that requires far fewer resources. Both the smaller target-language model and the source model are then used to initialize the token embeddings of the larger model, based on the overlapping vocabulary of the source and target languages; all remaining weights are reused from the source-language model. This approach outperforms cross-lingual transfer alone and can save up to 80% of the training steps compared to random initialization.
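
The embedding initialization described above can be sketched as follows. This is a plausible reading of the method rather than a verified reimplementation: vocabularies are assumed to be dicts mapping tokens to row indices, the small target-language model is assumed to share the target tokenizer, and the similarity weighting is illustrative.

```python
import numpy as np

def clp_init_embeddings(src_emb, src_vocab, small_tgt_emb, tgt_vocab):
    """Initialize the large target-language model's token embeddings.

    src_emb:       [|V_src|, d] embeddings of the large source-language model
    small_tgt_emb: [|V_tgt|, d_small] embeddings of the small target-language model
    src_vocab / tgt_vocab: dicts mapping token -> row index (assumed format).
    """
    tgt_emb = np.zeros((len(tgt_vocab), src_emb.shape[1]), dtype=src_emb.dtype)
    overlap = [t for t in tgt_vocab if t in src_vocab]   # tokens shared by both vocabularies
    ov_tgt = np.array([tgt_vocab[t] for t in overlap])
    ov_src = np.array([src_vocab[t] for t in overlap])

    # 1) Overlapping tokens: copy the source model's embeddings directly.
    tgt_emb[ov_tgt] = src_emb[ov_src]

    # 2) Remaining tokens: a similarity-weighted combination of the copied rows,
    #    with weights computed in the *small* target model's embedding space.
    small_ov = small_tgt_emb[ov_tgt]                     # [n_overlap, d_small]
    for tok, tid in tgt_vocab.items():
        if tok in src_vocab:
            continue
        sims = small_ov @ small_tgt_emb[tid]             # similarity to every overlap token
        w = np.maximum(sims, 0.0)
        w /= w.sum() + 1e-8                              # normalize to a convex combination
        tgt_emb[tid] = w @ src_emb[ov_src]
    return tgt_emb
```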

The inventor of the LSTM turns to conformal prediction in a new paper, "Conformal Prediction for Time Series with Modern Hopfield Networks". Conformal prediction for time series has reached a whole new level of 🔥. This isn't your typical deep learning paper for time series: across a comprehensive set of experiments, the authors benchmark their method, HopCPT, against essentially every previous SOTA conformal prediction approach for time series. HopCPT is a serious entrant, and it has also put Austria on the conformal prediction map.

To detect text written by LMs like ChatGPT, many methods have recently emerged: DetectGPT, watermarking, and GPTZero. A new paper presents DIPPER, a discourse paraphrase generation model that can rewrite multiple sentences of text and optionally leverage surrounding context. The researchers use DIPPER to stress-test current AI-generated-text detectors and find that DIPPER paraphrases easily evade them while approximately preserving the input semantics.
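
The stress-test protocol itself is simple to sketch. In the snippet below, `detector` and `paraphraser` are placeholder objects standing in for any of the detectors mentioned above and for DIPPER, respectively; the method names are assumptions, not a real API.

```python
def detection_rate(detector, texts, threshold=0.5):
    """Fraction of texts the detector flags as machine-generated (hypothetical API)."""
    return sum(detector.score(t) >= threshold for t in texts) / len(texts)

def stress_test(detector, paraphraser, machine_texts):
    """Compare detection rates on raw vs. paraphrased machine-generated text."""
    before = detection_rate(detector, machine_texts)
    after = detection_rate(detector, [paraphraser.paraphrase(t) for t in machine_texts])
    return before, after   # a large drop means the paraphraser evades the detector
```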

Unit Scaling: In the recent paper 'Unit Scaling: Out-of-the-Box Low-Precision Training,' researchers describe a scheme for designing neural networks whose tensors have approximately unit variance after every operation in the forward and backward passes. It can be seen as an alternative to (static) loss scaling, or to its automatic variant as used in Automatic Mixed Precision. Whereas both of those schemes rely on a single, global scaling factor for all gradients, unit scaling is more fine-grained: a unit-scaled model adds scaling factors (constant scalar multiplications) to each operation in the computational graph to achieve the unit-variance property. The result is a model that naturally produces tensors in the middle of the dynamic range provided by floating-point formats, with no extra loss-scale hyperparameter to tune: it works out of the box.
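
As a concrete illustration (a minimal sketch, not the paper's implementation), here is a unit-scaled matrix multiply in PyTorch, where the forward output and the two backward gradients each get their own scaling factor so that all three tensors come out with roughly unit variance:

```python
import torch

class ScaledMatmul(torch.autograd.Function):
    """Matmul with separate forward/backward scales (illustrative sketch)."""

    @staticmethod
    def forward(ctx, x, w):
        ctx.save_for_backward(x, w)
        d_in = w.shape[0]
        # Scale by 1/sqrt(fan_in): a sum of d_in unit-variance products ends up ~unit variance.
        return (x @ w) / d_in ** 0.5

    @staticmethod
    def backward(ctx, grad_out):
        x, w = ctx.saved_tensors
        d_in, d_out = w.shape
        batch = x.shape[0]
        # Different scales in the backward pass so gradient tensors are also ~unit variance.
        grad_x = (grad_out @ w.t()) / d_out ** 0.5
        grad_w = (x.t() @ grad_out) / batch ** 0.5
        return grad_x, grad_w

# Weights drawn from a unit normal: the scaling lives in the op, not the initialization.
x = torch.randn(32, 256, requires_grad=True)
w = torch.randn(256, 512, requires_grad=True)
y = ScaledMatmul.apply(x, w)
y.backward(torch.randn_like(y))               # feed a unit-variance upstream gradient
print(y.std(), x.grad.std(), w.grad.std())    # all roughly 1.0
```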

Ablating Concepts in Text-to-Image Diffusion Models: CMU researchers propose an efficient method for ablating concepts in a pretrained model, i.e., preventing the generation of a target concept. Their algorithm learns to match the image distribution for a given target style, instance, or text prompt that one wishes to ablate to the distribution corresponding to an anchor concept, e.g., mapping Grumpy Cat to generic cats.
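
One plausible way to write down such an objective (an illustrative sketch, not necessarily the paper's exact formulation) is to push the fine-tuned model's noise prediction for the target prompt toward a frozen copy's prediction for the anchor prompt; `model` and `frozen_model` below are placeholder noise-prediction networks.

```python
import torch
import torch.nn.functional as F

def ablation_loss(model, frozen_model, x_t, t, target_prompt_emb, anchor_prompt_emb):
    """Match the target concept's denoising behaviour to the anchor concept's.

    e.g. target prompt = "Grumpy Cat", anchor prompt = "cat": after fine-tuning,
    asking for the target produces images from the anchor's distribution.
    """
    with torch.no_grad():
        anchor_eps = frozen_model(x_t, t, anchor_prompt_emb)  # what the anchor would produce
    pred_eps = model(x_t, t, target_prompt_emb)               # what the target currently produces
    return F.mse_loss(pred_eps, anchor_eps)
```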
