#### Description

**Deep Learning Theory Team** (https://aip.riken.jp/labs/generic_tech/deep_learn_theory/?lang=en) at RIKEN AIP

**Speaker 1**: Taiji Suzuki (30 mins)

**Title**: Overview of Recent Advances of Deep Learning Theory Researches

**Abstract**:

In this talk, I will give an overview of recent developments in our research on deep learning theory. The main problems of deep learning theory are roughly divided into (1) representation ability, (2) generalization ability, and (3) optimization. We have conducted several studies on these issues. As for (1) representation ability, we consider a setting where the true target function belongs to a special function class such as a Besov space, and analyze the approximation ability of deep neural networks. We can show that deep learning achieves so-called adaptive approximation. As a result, deep learning can achieve a better rate of convergence when estimating such “complicated” functions. As for (2) generalization ability, we briefly introduce a generalization analysis based on compression ability, and we also discuss generalization ability by revealing its connection to optimization ability. As for (3) optimization, we discuss whether the globally optimal solution can be reached by gradient descent techniques. In particular, we consider global optimality in the mean-field regime and discuss the generalization error of the solution obtained by gradient-descent-type methods.

**Speaker 2**: Atsushi Nitanda (30 mins)

**Title**: Optimization for two-layer neural networks with quantitative global convergence analysis

**Abstract**:

Gradient-based methods are known to achieve vanishing training error on overparameterized neural networks, despite the nonconvexity of the objective function. Recently, many studies have been devoted to explaining this global convergence property. A common idea is to use the overparameterization of neural networks to translate the training dynamics into function space and exploit the convexity of the objective with respect to the function. These approaches fall mainly into two categories: the neural tangent kernel (NTK) regime and the mean-field (MF) regime, which exhibit different dynamics depending on the scaling factor of the neural network. In this presentation, I would like to talk about our recent advances in both regimes for overparameterized two-layer neural networks. First, for the NTK regime, we show that averaged stochastic gradient descent can achieve a fast convergence rate under assumptions on the complexities of the target function and the RKHS associated with the NTK. Second, for the MF regime, we give the first quantitative global convergence rate analysis by proposing a new method for entropy-regularized empirical risk minimization in the probability space.
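
As a rough illustration of how a scaling factor separates the two regimes, the following uses a common convention from the literature. The symbols $M$ (width), $a_j$, $w_j$, $\sigma$, and $\beta$ are assumptions introduced here and are not necessarily the notation used in the talk.

$$
f_\Theta(x) = \frac{1}{M^{\beta}} \sum_{j=1}^{M} a_j \, \sigma(w_j^\top x),
\qquad
\beta = \tfrac{1}{2} \ \text{(NTK regime)},
\qquad
\beta = 1 \ \text{(MF regime)}.
$$

With $\beta = 1/2$, the parameters typically stay close to their initialization and the dynamics are approximately those of a kernel method. With $\beta = 1$, the network can be viewed as an average over a distribution of neurons, and training becomes an optimization problem over probability measures.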

**Speaker 3**: Kenta Oono (30 mins)

**Title**: On over-smoothing of graph neural networks

**Abstract**:

Graph Neural Networks (GNNs) are a collective term for deep learning models for graph-structured data. Recent studies have empirically shown that GNNs perform well in many application fields such as biochemistry, computer vision, and knowledge graph analysis. However, the theoretical characteristics of GNNs have been investigated less than those of classical deep learning models such as fully-connected neural networks or convolutional neural networks. Over-smoothing is one of the challenges of current GNN models: the node representations produced by a GNN become indistinguishable as we increase the number of layers. This problem prevents us from making a GNN model deep. In this talk, I will introduce the over-smoothing problem and explain recent research on it.
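
The over-smoothing effect can be sketched with a deliberately simplified model. The code below is an illustration under stated assumptions, not the setting analyzed in the talk: it uses plain mean aggregation with self-loops, no learned weights, and no nonlinearity, on a hypothetical 5-node toy graph with random features. All function and variable names are made up for this example.

```python
import numpy as np


def propagate(features, adjacency, num_layers):
    """Apply `num_layers` rounds of mean aggregation over each node's neighborhood.

    Simplifying assumption: no weight matrices and no activation function.
    """
    a_hat = adjacency + np.eye(adjacency.shape[0])   # add self-loops
    p = a_hat / a_hat.sum(axis=1, keepdims=True)     # row-normalize (mean aggregation)
    h = features
    for _ in range(num_layers):
        h = p @ h
    return h


def max_pairwise_distance(h):
    """Largest Euclidean distance between any two node representations."""
    diffs = h[:, None, :] - h[None, :, :]
    return float(np.linalg.norm(diffs, axis=-1).max())


# Hypothetical toy graph: a triangle (nodes 0, 1, 2) with a tail 2-3-4.
adjacency = np.array(
    [
        [0, 1, 1, 0, 0],
        [1, 0, 1, 0, 0],
        [1, 1, 0, 1, 0],
        [0, 0, 1, 0, 1],
        [0, 0, 0, 1, 0],
    ],
    dtype=float,
)

rng = np.random.default_rng(0)
features = rng.normal(size=(5, 3))  # random 3-dimensional node features

depths = (0, 1, 2, 4, 8, 16, 32, 64)
spreads = {
    k: max_pairwise_distance(propagate(features, adjacency, k)) for k in depths
}

for k, spread in spreads.items():
    print(f"layers={k:2d}  max pairwise distance={spread:.3e}")
```

In this simplified setting, each layer replaces a node's representation with an average over its neighborhood, so the largest distance between two nodes never grows and shrinks toward zero as layers are stacked. Real GNNs include weights and nonlinearities, which is part of what makes the theoretical analysis harder.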

**Speaker 4**: Sho Sonoda (30 mins)

**Title**: Functional analysis methods for neural network theory

**Abstract**:

Characterizing the typical solutions obtained by deep learning is an important open problem in machine learning theory. The speaker has been addressing this problem from the viewpoint of functional analysis by using the integral representation of neural networks. The integral representation is known to have a closed-form right inverse operator, called the ridgelet transform, which is related to both the Radon and the wavelet transforms. The speaker and his collaborators have recently shown that, for ridge regression by finite two-layer neural networks, the empirical risk minimizers are given by the ridgelet transform in the limit of over-parametrization (S-Ishikawa-Ikeda, AISTATS 2021). In this talk, the speaker will introduce the ridgelet transform and present recent results that characterize deep learning solutions.
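
For orientation, the two operators mentioned above are commonly written as follows in the ridgelet literature. The notation ($S$, $R$, $\gamma$, $\sigma$, $\rho$, $c_{\sigma,\rho}$, input dimension $m$) is an assumption made for this sketch and may differ from the talk.

$$
S[\gamma](x) = \int_{\mathbb{R}^m \times \mathbb{R}} \gamma(a, b)\, \sigma(a \cdot x - b)\, da\, db,
\qquad
R[f](a, b) = \int_{\mathbb{R}^m} f(x)\, \overline{\rho(a \cdot x - b)}\, dx .
$$

Here $S$ is the integral representation of a two-layer network with activation $\sigma$ and coefficient function $\gamma$, and $R$ is the ridgelet transform with respect to a function $\rho$. Under a suitable admissibility condition on the pair $(\sigma, \rho)$, one has the reconstruction formula

$$
S[R[f]] = c_{\sigma,\rho}\, f
$$

for a nonzero constant $c_{\sigma,\rho}$, which is the sense in which $R$ acts as a right inverse of $S$ up to normalization.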