Deep Learning Theory Team (Team Leader: Taiji Suzuki)

2021/4/14 12:00

Description

Deep Learning Theory Team (https://aip.riken.jp/labs/generic_tech/deep_learn_theory/?lang=en) at RIKEN AIP

Speaker 1: Taiji Suzuki (30 mins)
Title: Overview of Recent Advances of Deep Learning Theory Researches
Abstract:
In this talk, I will give an overview of recent developments in our research on deep learning theory. The main problems of deep learning theory are roughly divided into (1) representation ability, (2) generalization ability, and (3) optimization, and we have conducted several studies on each. As for (1) representation ability, we consider a setting where the true target function belongs to a special function class such as a Besov space, and analyze the approximation ability of deep neural networks. We can show that deep learning achieves so-called adaptive approximation and, as a consequence, attains a better rate of convergence when estimating such “complicated” functions. As for (2) generalization ability, we briefly introduce generalization analysis based on compression ability, and we also discuss generalization by revealing its connection to optimization. As for (3) optimization, we discuss whether a gradient descent technique can attain the globally optimal solution. In particular, we consider global optimality in the mean-field regime and discuss the generalization error of the solution obtained by gradient-descent-type methods.

Speaker 2: Atsushi Nitanda (30 mins)
Title: Optimization for two-layer neural networks with quantitative global convergence analysis
Abstract:
Gradient-based methods are known to achieve vanishing training error on overparameterized neural networks, despite the nonconvexity of the objective function. Recently, many studies have been devoted to explaining this global convergence property. A common idea is to use the overparameterization of neural networks to translate the training dynamics into function space and exploit the convexity of the objective with respect to the function. These approaches fall mainly into two categories, the neural tangent kernel (NTK) and mean-field (MF) regimes, which deal with different dynamics determined by the scaling factor of the neural network. In this presentation, I would like to talk about our recent advances in both regimes for overparameterized two-layer neural networks. First, for the NTK regime, we show that averaged stochastic gradient descent can achieve a fast convergence rate under assumptions on the complexities of the target function and the RKHS associated with the NTK. Second, for the MF regime, we give the first quantitative global convergence rate analysis by proposing a new method for entropy-regularized empirical risk minimization in the probability space.
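To make the "scaling factor" mentioned above concrete, here is a minimal sketch of a two-layer network evaluated under the two standard prefactors: 1/sqrt(M) for the NTK regime and 1/M for the MF regime, where M is the width. The function name `two_layer_output`, the tanh activation, the random initialization, and the sizes are illustrative assumptions, not details from the talk.

```python
import numpy as np


def two_layer_output(x, w, a, scaling):
    """Evaluate f(x) = alpha * sum_j a_j * tanh(w_j . x).

    The prefactor alpha is 1/sqrt(M) for "ntk" and 1/M for "mf",
    where M is the number of hidden neurons (hypothetical helper).
    """
    m = a.shape[0]
    if scaling == "ntk":
        alpha = 1.0 / np.sqrt(m)
    elif scaling == "mf":
        alpha = 1.0 / m
    else:
        raise ValueError("scaling must be 'ntk' or 'mf'")
    return alpha * float(a @ np.tanh(w @ x))


# Illustrative random parameters (assumed, not from the source).
rng = np.random.default_rng(0)
m, d = 1024, 4
w = rng.normal(size=(m, d))          # hidden-layer weights
a = rng.choice([-1.0, 1.0], size=m)  # output-layer signs
x = rng.normal(size=d)               # a single input point

f_ntk = two_layer_output(x, w, a, "ntk")
f_mf = two_layer_output(x, w, a, "mf")
```

With identical parameters the two outputs differ only by a factor of sqrt(M). Under the 1/M prefactor the network is an average over neurons, which is what allows it to be read as an integral against a distribution over parameters.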

Speaker 3: Kenta Oono (30 mins)
Title: On over-smoothing of graph neural networks
Abstract:
Graph neural networks (GNNs) is a collective term for deep learning models for graph-structured data. Recent studies have shown empirically that GNNs perform well in many application fields such as biochemistry, computer vision, and knowledge graph analysis. However, the theoretical characteristics of GNNs have been less investigated than those of classical deep learning models such as fully connected or convolutional neural networks. Over-smoothing is one of the challenges of current GNN models: the node representations a GNN produces become indistinguishable as the number of layers increases. This problem prevents us from making GNN models deep. In this talk, I introduce the over-smoothing problem and explain recent research on it.
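The over-smoothing effect described above can be seen in a deliberately simplified setting. The sketch below is a toy model, not the analysis presented in the talk: it assumes a linear GNN with no weights or nonlinearity, where each layer only averages a node's features with its neighbors' using the row-normalized adjacency matrix with self-loops. The graph (a 5-node path), the feature sizes, and all function names are illustrative assumptions.

```python
import numpy as np


def propagation_matrix(adj):
    """Row-normalized adjacency with self-loops: P = D^{-1} (A + I)."""
    a_tilde = adj + np.eye(adj.shape[0])
    return a_tilde / a_tilde.sum(axis=1, keepdims=True)


def feature_spread(x):
    """Largest per-feature range (max minus min) across nodes."""
    return float(np.max(x.max(axis=0) - x.min(axis=0)))


def spread_by_depth(adj, x, num_layers):
    """Apply x <- P x repeatedly and record the spread after each layer."""
    p = propagation_matrix(adj)
    spreads = [feature_spread(x)]
    for _ in range(num_layers):
        x = p @ x
        spreads.append(feature_spread(x))
    return spreads


# Toy graph: a path on 5 nodes (assumed for illustration).
adj = np.zeros((5, 5))
for i in range(4):
    adj[i, i + 1] = adj[i + 1, i] = 1.0

rng = np.random.default_rng(0)
x0 = rng.normal(size=(5, 3))  # random initial node features
spreads = spread_by_depth(adj, x0, num_layers=64)
```

Because every row of the propagation matrix is a set of averaging weights, the spread can never grow from one layer to the next, and on a connected graph it shrinks toward zero, so all nodes end up with nearly the same feature vector. Real GNN layers also include learned weights and nonlinearities, which this sketch omits.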

Speaker 4: Sho Sonoda (30 mins)
Title: Functional analysis methods for neural network theory
Abstract:
Characterization of the typical solutions obtained by deep learning is an important open problem in machine learning theory. The speaker has been addressing this problem from the viewpoint of functional analysis by using the integral representation of neural networks. The integral representation is known to have a closed-form right inverse operator, called the ridgelet transform, which is related to both the Radon and the wavelet transforms. The speaker has recently shown with his collaborators that, for ridge regression by finite two-layer neural networks, the empirical risk minimizers are given by the ridgelet transform in the limit of over-parametrization (S-Ishikawa-Ikeda, AISTATS 2021). In this talk, the speaker will introduce the ridgelet transform and present recent results that characterize deep learning solutions.
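For orientation, the objects named in this abstract can be written out as follows. This is a sketch in one common notation and is not taken from the talk: $\sigma$ is the activation function, $\rho$ is an auxiliary function, $\gamma$ is a distribution over the hidden parameters $(a, b)$, and $m$ is the input dimension. The constant $c_{\sigma,\rho}$ and the admissibility condition on the pair $(\sigma, \rho)$ are left unspecified here.

```latex
% Integral representation of a continuous-width two-layer network
S[\gamma](x) = \int_{\mathbb{R}^m \times \mathbb{R}} \gamma(a, b)\, \sigma(a \cdot x - b)\, \mathrm{d}a\, \mathrm{d}b

% Ridgelet transform of a target function f
R[f](a, b) = \int_{\mathbb{R}^m} f(x)\, \overline{\rho(a \cdot x - b)}\, \mathrm{d}x

% Right-inverse property, for an admissible pair (\sigma, \rho)
S[R[f]] = c_{\sigma,\rho}\, f
```

Read this way, the ridgelet transform gives an explicit parameter distribution whose network reproduces $f$ up to a constant, which is the sense in which it is a right inverse of the integral representation.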

