Forecasting comes naturally to humans when they navigate: people routinely plan their paths while anticipating the future movements of those around them. To reach comparable sophistication and safety, autonomous systems must be equally predictive. Deep generative models have played a pivotal role in advancing these systems in recent years. The presentation begins by introducing generative models for trajectory forecasting. It then presents a novel automated assessment, described as an essential but previously unexplored approach, for objectively evaluating these models; the assessment reveals that state-of-the-art models can generate forecasts that violate social norms and scene constraints. To mitigate such violations, the presentation explores the impact of additional visual cues that humans subconsciously exhibit when navigating space. Moving to a finer-grained representation, it then discusses human body pose forecasting and introduces a generic model that handles not only clean environments but also noisy real-world observations. The model takes a diffusion-based approach, framing pose forecasting as a denoising problem. The presentation closes with a brief look at future directions in the field.
Computer Vision Researcher @ EPFL (École polytechnique fédérale de Lausanne)