October 8, 2021 16:06


Title: Stochastic Gradient Descent with Multiplicative Noise

Stochastic gradient descent (SGD) is the main optimization algorithm behind the success of deep learning. It has recently been shown that the stochastic noise in SGD is multiplicative, i.e., the strength of the noise crucially depends on the model parameter. In this talk, we show that the dynamics of SGD can be very surprising and unintuitive when the noise is multiplicative. For example, we show that (1) SGD may converge to a local maximum; (2) SGD may escape a saddle point arbitrarily slowly; (3) SGD may prefer sharp minima over flat ones; and (4) AMSGrad may converge to a local maximum. If time allows, we also present some recent results that shed light on how SGD works under multiplicative noise. This presentation is mainly based on the following three works of the speaker.
[1] https://arxiv.org/abs/2107.11774
[2] https://arxiv.org/abs/2105.09557
[3] https://arxiv.org/abs/2012.03636
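
To make the notion of "multiplicative noise" concrete, here is a minimal sketch (my own illustration, not taken from the talk or the papers above) using a toy quadratic objective. For data x and the loss L(w) = E[(w·x)²]/2, the per-sample stochastic gradient is g(w) = x²·w, so the standard deviation of the gradient noise scales with |w|: the noise strength depends on the parameter itself.

```python
import random
import statistics

random.seed(0)

# Toy population of inputs x ~ N(0, 1); the loss L(w) = E[(w*x)^2]/2
# has its minimum at w = 0.
xs = [random.gauss(0.0, 1.0) for _ in range(10_000)]

def stochastic_grad(w, batch_size=1):
    """Minibatch gradient of L at w: average of x_i^2 * w over the batch."""
    batch = random.sample(xs, batch_size)
    return sum(x * x * w for x in batch) / batch_size

def grad_noise_std(w, trials=5_000):
    """Empirical standard deviation of the stochastic gradient at w."""
    grads = [stochastic_grad(w) for _ in range(trials)]
    return statistics.pstdev(grads)

# The noise standard deviation is proportional to |w|: doubling the
# parameter roughly doubles the noise -- this is multiplicative noise,
# in contrast to additive noise whose strength is parameter-independent.
for w in (0.5, 1.0, 2.0):
    print(f"w = {w}: gradient-noise std ~ {grad_noise_std(w):.3f}")
```

The contrast with the additive-noise picture (where the noise term is independent of w) is what drives the counterintuitive behaviors listed in the abstract: near w = 0 the noise vanishes along with the gradient, so the signal-to-noise structure of SGD changes with the parameter.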

Liu Ziyin. http://cat.phys.s.u-tokyo.ac.jp/~zliu/

last updated on October 13, 2021 13:19