Training Deep Neural Networks in a Subspace
Dnr:

NAISS 2024/22-189

Type:

NAISS Small Compute

Principal Investigator:

Sourav Sarkar

Affiliation:

Uppsala universitet

Start Date:

2024-02-12

End Date:

2024-10-01

Primary Classification:

10201: Computer Sciences

Webpage:

Abstract

Stochastic Gradient Descent (SGD), a first-order optimization scheme, is universally used to train Deep Neural Networks (DNNs). Despite its success, it is not yet well understood why it works so well on the non-convex loss landscapes that are typical in Deep Learning. Moreover, generalizing the method to higher-order optimization schemes is infeasible because of the huge parameter space of a DNN. Some progress has been made on both of these questions through empirical results in the literature, which show that when SGD is performed on DNN-based image classification models, the trajectory of the model on the loss landscape evolves within only a small subspace of the entire parameter space. We aim to develop a low-rank optimization strategy that leverages this low-rank structure of the training dynamics. This would allow us to apply second-order optimization techniques to train Neural Networks more efficiently, and also to speed up training, since it is restricted to a small subspace. We plan to carry out our investigations on DNNs designed for image classification.
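
To make the intended strategy concrete, the following is a minimal sketch, not the project's actual implementation: plain gradient descent on a toy logistic-regression problem (standing in for a DNN) records parameter snapshots, an SVD of that trajectory yields a low-dimensional basis, and subsequent Newton steps are taken only within the spanned subspace. The toy problem, the subspace dimension k, and all function names are assumptions made for illustration; a real DNN would use mini-batch gradients and Hessian-vector products from automatic differentiation.

import numpy as np

rng = np.random.default_rng(0)

# Toy problem: logistic regression on synthetic data (stand-in for a DNN).
n_samples, n_features = 200, 50
X = rng.normal(size=(n_samples, n_features))
w_true = rng.normal(size=n_features)
y = (X @ w_true + 0.1 * rng.normal(size=n_samples) > 0).astype(float)

def loss_and_grad(w):
    z = X @ w
    p = 1.0 / (1.0 + np.exp(-z))
    loss = -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    grad = X.T @ (p - y) / n_samples
    return loss, grad

# Phase 1: plain (full-batch) gradient descent, recording the trajectory.
w = np.zeros(n_features)
snapshots = []
for step in range(100):
    loss, g = loss_and_grad(w)
    w -= 0.5 * g
    snapshots.append(w.copy())

# Phase 2: extract a low-dimensional subspace from the trajectory via SVD.
k = 5                                  # assumed subspace dimension
W = np.stack(snapshots) - w            # center snapshots on the current iterate
_, _, Vt = np.linalg.svd(W, full_matrices=False)
P = Vt[:k].T                           # (n_features, k) orthonormal basis

# Hessian-vector product via central finite differences of the gradient.
def hvp(w, v, eps=1e-4):
    _, g_plus = loss_and_grad(w + eps * v)
    _, g_minus = loss_and_grad(w - eps * v)
    return (g_plus - g_minus) / (2 * eps)

# Phase 3: second-order (Newton) steps restricted to the subspace.
for step in range(10):
    loss, g = loss_and_grad(w)
    g_sub = P.T @ g                    # gradient in subspace coordinates
    H_sub = np.stack([P.T @ hvp(w, P[:, j]) for j in range(k)], axis=1)
    delta = np.linalg.solve(H_sub + 1e-6 * np.eye(k), g_sub)
    w -= P @ delta                     # map the Newton step back to full space
    print(f"subspace Newton step {step}: loss = {loss:.4f}")

Inside the subspace the projected Hessian is only k x k, so forming and inverting it costs k Hessian-vector products; this is what would make second-order steps affordable even though the full model has far more parameters.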