Fri, March 1, 9:00 AM
90 MINUTES
Towards Understanding of Complex Activity Videos

Humans perform a wide range of complex activities, such as cooking hour-long recipes, assembling and repairing devices, and performing surgeries. Many of these activities are procedural: they consist of sequences of steps that must be followed to achieve a desired goal. Learning complex procedural activities from videos allows us to design intelligent task assistants, robots, and coaching platforms that either perform tasks or guide users through them. However, learning from complex activity videos poses many challenges: the videos are long and uncurated and contain many task-irrelevant activities; different videos show different ways of performing the same task or step; gathering framewise annotations does not scale to many videos and tasks; and steps are often fine-grained. In this talk, I will discuss methods for efficient and robust learning from complex activity videos that address these challenges. I will present new methods that bring together and extend sequence alignment, deep attention models, and compositional learning, and that enable learning in an unsupervised or weakly supervised fashion.

Ehsan Elhamifar

Associate Professor @ Northeastern University