Talk by Mr. Ziyin Liu, University of Tokyo on “Stochastic Gradient Descent with Multiplicative Noise”

October 8, 2021 16:06

Description

Title: Stochastic Gradient Descent with Multiplicative Noise

Abstract:
Stochastic gradient descent (SGD) is the main optimization algorithm behind the success of deep learning. Recently, it has been shown that the stochastic noise in SGD is multiplicative, i.e., the strength of the noise depends crucially on the model parameters. In this talk, we show that the dynamics of SGD can be surprising and unintuitive when the noise is multiplicative. For example, we show that (1) SGD may converge to a local maximum; (2) SGD may escape a saddle point arbitrarily slowly; (3) SGD may prefer sharp minima over flat ones; and (4) AMSGrad may converge to a local maximum. If time allows, we also present some recent results that shed light on how SGD works under multiplicative noise. This presentation is mainly based on the following three works by the speaker:
[1] https://arxiv.org/abs/2107.11774
[2] https://arxiv.org/abs/2105.09557
[3] https://arxiv.org/abs/2012.03636
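
As a rough illustration of what "multiplicative" means in the abstract above, the following toy sketch measures how the spread of per-sample gradients changes with the parameter. It is not taken from the speaker's papers: the one-dimensional noise-free regression problem, the function names `per_sample_grads` and `grad_noise_std`, and the optimum `w_star` are all assumptions chosen for illustration. For the per-sample loss 0.5 * (w*x - w_star*x)**2, the gradient is x**2 * (w - w_star), so its standard deviation across samples scales with |w - w_star|.

```python
import numpy as np

def per_sample_grads(w, x, w_star=0.0):
    """Per-sample gradients of 0.5 * (w*x - w_star*x)**2 with respect to w.

    Hypothetical toy model: labels are noise-free, y = w_star * x.
    """
    return x**2 * (w - w_star)

def grad_noise_std(w, x, w_star=0.0):
    """Standard deviation of the per-sample gradient at parameter w."""
    return per_sample_grads(w, x, w_star).std()

# Synthetic inputs (an assumption for this sketch, not real data).
rng = np.random.default_rng(0)
x = rng.normal(size=10_000)

# The noise strength grows in proportion to the distance from w_star
# and vanishes exactly at w_star.
for w in (0.0, 0.1, 1.0, 10.0):
    print(f"w = {w:5.1f}  gradient noise std = {grad_noise_std(w, x):.4f}")
```

In this toy setting the noise is purely multiplicative: it is zero at the optimum and grows linearly away from it, unlike an additive-noise model in which the noise strength would be the same at every w.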

Bio:
Liu Ziyin. http://cat.phys.s.u-tokyo.ac.jp/~zliu/

Related Laboratories

last updated on June 19, 2025 14:26

Laboratory

High-Dimensional Statistical Modeling Team (2017/3--2022/3)


Center for Advanced Intelligence Project
