Sun, March 3, 9:00 AM
90 MINUTES
Large Multimodal Models: From Flamingo to GPT-4V and Gemini
Multimodal architectures can interpret and generate content across several data types, including text, images, and in some cases audio. This versatility opens new frontiers for AI applications, enabling interactions that feel more natural and closer to human perception. Our exploration begins with Flamingo, proceeds through LLaVA and MoE-LLaVA, and culminates in an examination of GPT-4V and Gemini, which are at the forefront of Large Multimodal Models (LMMs).
PhD Candidate @ University of Tehran