Fri, April 11, 5:45 PM
60 MINUTES
Towards Next Generation of Deep Learning Architectures

Over more than a decade, there has been an extensive research effort on how to effectively utilize recurrent neural networks and attention. While recurrent models aim to compress the data into a fixed-size memory (called the hidden state), attention allows attending to the entire context window, capturing the direct dependencies between all tokens. This more accurate modeling of dependencies, however, comes with a quadratic cost, limiting the model to a fixed-length context. In this talk, we first review recent advancements in deep learning architectures and their (dis)advantages, and then discuss a new perspective on Transformers and recurrent neural networks, providing new insights for designing more powerful large language models (LLMs) in the future.
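As a rough illustration of the contrast the abstract draws (not part of the talk material), the minimal NumPy sketch below shows a recurrent update that compresses the whole history into a fixed-size hidden state, versus full attention whose score matrix grows quadratically with sequence length. All names and shapes here are hypothetical.

```python
import numpy as np

def rnn_step(h, x, W_h, W_x):
    # Recurrent update: the entire history is compressed into a
    # fixed-size hidden state h, regardless of sequence length.
    return np.tanh(W_h @ h + W_x @ x)

def attention(Q, K, V):
    # Full attention: every token attends to every other token,
    # so the score matrix is (T x T), i.e. quadratic in length T.
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

# Toy example (hypothetical shapes): T tokens of dimension d.
T, d = 8, 4
rng = np.random.default_rng(0)
X = rng.normal(size=(T, d))

# Recurrent pass: memory stays O(d) no matter how long the sequence is.
h = np.zeros(d)
W_h, W_x = rng.normal(size=(d, d)), rng.normal(size=(d, d))
for x in X:
    h = rnn_step(h, x, W_h, W_x)

# Attention pass: compute and memory scale as O(T^2).
out = attention(X, X, X)
```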

Ali Behrouz

PhD student @ Cornell University