Sun, March 3, 9:00 AM
90 MINUTES
Large Multimodal Models: From Flamingo to GPT-4V and Gemini

Multimodal models can interpret and generate content across multiple data types, including text, images, and sometimes audio. This versatility opens new frontiers for AI applications, making interactions more natural and better aligned with human perception. Our exploration begins with Flamingo, proceeds through LLaVA and MoE-LLaVA, and culminates in an examination of GPT-4V and Gemini, which are at the forefront of Large Multimodal Models (LMMs).

Hamed Ghasemi

PhD Candidate @ University of Tehran