EPFL CIS and RIKEN AIP have been running a joint seminar series, titled the “EPFL CIS – RIKEN AIP Joint Seminar series”, since October 2021.
EPFL is located in Switzerland and is one of the most vibrant and cosmopolitan science and technology institutions. EPFL has both a Swiss and international vocation and focuses on three missions: teaching, research and innovation.
The Center for Intelligent Systems (CIS) at EPFL, a joint initiative of the schools ENAC, IC, SB, STI and SV, seeks to advance research and practice in the strategic field of intelligent systems.
RIKEN is Japan’s largest comprehensive research institution, renowned for high-quality research across a diverse range of scientific disciplines.
The RIKEN Center for Advanced Intelligence Project (AIP) houses more than 40 research teams, ranging from the fundamentals of machine learning and optimization, through applications in medicine, materials, and disaster response, to analyses of the ethics and social impact of artificial intelligence.
【The 3rd Seminar】
Date and Time: October 20th, 5:00 pm – 6:00 pm (JST)
Speaker: Taiji Suzuki (The University of Tokyo and RIKEN AIP, Japan)
Title: Optimization theories of neural networks with its statistical perspective
In this talk, I discuss some optimization theories of deep learning and their impact on generalization ability. First, I present a deep learning optimization framework based on noisy gradient descent in an infinite-dimensional Hilbert space (gradient Langevin dynamics), and show generalization error and excess risk bounds for the solution obtained by this optimization procedure. The proposed framework can handle finite- and infinite-width networks simultaneously, unlike existing ones such as the neural tangent kernel and mean-field analysis. It can be shown that deep learning avoids the curse of dimensionality in a teacher–student setting, and eventually achieves better excess risk than kernel methods.
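As a rough intuition for the update the talk builds on, here is a minimal finite-dimensional sketch of gradient Langevin dynamics (the talk's actual analysis works in an infinite-dimensional Hilbert space; the objective, step size, and inverse temperature below are illustrative choices, not the speaker's setup):

```python
import numpy as np

def gradient_langevin_dynamics(grad, theta0, eta=0.01, beta=100.0, steps=1000, seed=0):
    """Noisy gradient descent (gradient Langevin dynamics):
    theta <- theta - eta * grad(theta) + sqrt(2 * eta / beta) * N(0, I).
    For small eta, iterates approximately sample from exp(-beta * L)."""
    rng = np.random.default_rng(seed)
    theta = theta0.astype(float).copy()
    for _ in range(steps):
        noise = rng.standard_normal(theta.shape)
        theta = theta - eta * grad(theta) + np.sqrt(2.0 * eta / beta) * noise
    return theta

# Illustrative quadratic loss L(theta) = 0.5 * ||theta - target||^2;
# the iterates concentrate near target with fluctuations of order 1/sqrt(beta).
target = np.array([1.0, -2.0])
theta = gradient_langevin_dynamics(lambda th: th - target, np.zeros(2))
```

The injected Gaussian noise is what distinguishes this from plain gradient descent: it lets the dynamics escape shallow traps and gives the iterates a well-defined stationary distribution, which is what the generalization and excess risk bounds are stated for.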
Next, I present a particle-based optimization technique for two-layer neural networks in the mean-field regime. The proposed method, called particle dual averaging (PDA), generalizes the dual averaging method from finite-dimensional convex optimization to optimization over probability distributions, and is justified by a quantitative global convergence theory. In addition, I present a stochastic dual coordinate ascent version of PDA which, unlike PDA, achieves exponential convergence in the number of outer loops.
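For readers unfamiliar with the finite-dimensional method that PDA generalizes, here is a minimal sketch of Nesterov-style dual averaging on a toy quadratic objective (the objective and step schedule are illustrative assumptions; PDA itself lifts this scheme to probability distributions over network parameters):

```python
import numpy as np

def dual_averaging(grad, d, gamma=1.0, steps=10000):
    """Dual averaging for unconstrained convex minimization.

    Accumulate (sub)gradients z_t = sum_s g_s and set
    x_{t+1} = argmin_x <z_t, x> + (beta_t / 2) * ||x||^2 = -z_t / beta_t,
    with beta_t = gamma * sqrt(t)."""
    x = np.zeros(d)
    z = np.zeros(d)
    for t in range(1, steps + 1):
        z += grad(x)                   # accumulate the gradient history
        x = -z / (gamma * np.sqrt(t))  # closed-form regularized minimizer
    return x

# Illustrative objective f(x) = 0.5 * ||x - b||^2 (hypothetical choice);
# the iterates approach the minimizer b at a sublinear rate.
b = np.array([2.0, -1.0])
x = dual_averaging(lambda x: x - b, d=2)
```

The key structural feature, which PDA preserves in the distributional setting, is that each iterate is the exact minimizer of a regularized linear model built from the whole gradient history rather than a single local step.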
Finally, time permitting, I will discuss the generalization error of preconditioned ridgeless regression in the overparameterized regime. In particular, I will discuss the preconditioner that is optimal for both the bias and the variance, and how it depends on the label noise and the shape of the signal.
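To fix ideas, here is a small numerical sketch of the object being studied: in the overparameterized regime (more features than samples), the preconditioned ridgeless estimator is the interpolator picked out by preconditioned gradient descent started from zero. The data and the diagonal preconditioner below are illustrative assumptions, not the talk's setting:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 100            # overparameterized: more features than samples
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

# Hypothetical diagonal preconditioner; P = I recovers the ordinary
# ridgeless (minimum-norm) interpolator.
P = np.diag(rng.uniform(0.5, 2.0, size=d))

# Preconditioned ridgeless estimator: among all interpolators, the one
# minimizing the weighted norm theta^T P^{-1} theta, equivalently the
# limit of preconditioned gradient descent initialized at zero.
theta = P @ X.T @ np.linalg.solve(X @ P @ X.T, y)

print(np.allclose(X @ theta, y))  # fits the training data exactly
```

Different choices of P select different interpolators of the same data, which is why the bias and variance of the resulting estimator, and hence the optimal P, depend on the label noise and the shape of the signal.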
Taiji Suzuki is currently an Associate Professor in the Department of Mathematical Informatics at the University of Tokyo. He also serves as the leader of the “Deep learning theory” team at RIKEN AIP. He received his Ph.D. in information science and technology from the University of Tokyo in 2009. His research interests span statistical learning theory for deep learning, kernel methods, and sparse estimation, as well as stochastic optimization for large-scale machine learning problems. He has served as an area chair for premier conferences such as NeurIPS, ICML, ICLR, and AISTATS, and as a program chair of ACML. He received an Outstanding Paper Award at ICLR in 2021, the MEXT Young Scientists’ Prize, the Outstanding Achievement Award from the Japan Statistical Society in 2017, the Outstanding Achievement Award from the Japan Society for Industrial and Applied Mathematics in 2016, and the Best Paper Award from IBISML in 2012.