Large Multimodal Models: Notes on CVPR 2023 Tutorial

06/26/2023
by Chunyuan Li, et al.

This tutorial note summarizes the presentation on “Large Multimodal Models: Towards Building and Surpassing Multimodal GPT-4”, part of the CVPR 2023 tutorial on “Recent Advances in Vision Foundation Models”. The tutorial consists of three parts. We first introduce the background on recent GPT-like large models for vision-and-language modeling to motivate research on instruction-tuned large multimodal models (LMMs). As a prerequisite, we then describe the basics of instruction tuning in large language models and show how it extends to the multimodal space. Lastly, we illustrate how to build a minimal prototype of a multimodal GPT-4-like model with open-source resources, and review recently emerged topics.
