April 25, 2024 17:48


This talk will be held in a hybrid format: in person at the AIP Open Space of RIKEN AIP (Nihonbashi office) and online via Zoom. AIP Open Space: *only available to AIP researchers.

May 29, 2024: 14:30 – 15:30 (JST)

Adaptive Methods in Machine Learning and Why Adam Works so Well

Frederik Kunstner (University of British Columbia)

The success of the Adam optimizer has made it the default in settings where stochastic gradient descent (SGD) performs poorly. However, our theoretical understanding of why Adam performs better is lagging. The literature presents many competing interpretations and hypotheses, but we do not yet have a clear understanding of which (if any) captures the key problem that Adam “fixes” to outperform SGD. This talk presents empirical results that evaluate recently developed assumptions to model difficulties of modern architectures such as large language models, where a large performance gap between SGD and Adam has been observed. We isolate a key property of language problems — a large vocabulary with a heavy-tailed, unbalanced distribution of output classes — as a potential cause of this performance gap.
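To give a rough sense of the kind of imbalance the abstract refers to, the sketch below quantifies how concentrated a Zipfian distribution over output classes is. The vocabulary size (10,000) and Zipf exponent (1.0) are illustrative assumptions, not values from the talk; word frequencies in natural language are well known to be approximately Zipfian.

```python
import numpy as np


def zipf_distribution(vocab_size: int, exponent: float = 1.0) -> np.ndarray:
    """Normalized Zipfian probabilities over `vocab_size` classes."""
    ranks = np.arange(1, vocab_size + 1)
    weights = ranks ** (-exponent)
    return weights / weights.sum()


p = zipf_distribution(10_000)

# How much probability mass sits in the head vs. the tail of the
# class distribution.
head_mass = p[:100].sum()    # most frequent 1% of classes
tail_mass = p[5_000:].sum()  # least frequent half of classes
print(f"top 100 classes:     {head_mass:.2f} of probability mass")
print(f"bottom 5000 classes: {tail_mass:.2f} of probability mass")
```

Under these assumptions the most frequent 1% of classes capture over half of the probability mass, while the rarer half of the vocabulary captures well under 10% — the sort of unbalanced output distribution the talk proposes as a source of the SGD/Adam gap.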

Frederik Kunstner is a 5th-year PhD student at the University of British Columbia, working with Mark Schmidt. His work is at the intersection of the theory of optimization methods and their application to machine learning, focusing on modeling the difficulties involved in training modern models. Prior to his PhD, Frederik studied at EPFL in Switzerland and had the opportunity to intern at the RIKEN Center for Advanced Intelligence Project in Japan with Emtiyaz Khan and at the Max Planck Institute for Intelligent Systems in Germany with Philipp Hennig.

More Information

Date: May 29, 2024 (Wed) 14:30 – 15:30
URL: https://c5dc59ed978213830355fc8978.doorkeeper.jp/events/172656

last updated on June 13, 2024 10:33