Large deep networks achieve state-of-the-art performance on several classification tasks while also being able to fully memorize arbitrary labelings of the training data. Recently, state-of-the-art models have been observed to express smooth functions of their input data, and this regularity has been linked to a potential implicit regularization effect induced by the model architecture and stochastic gradient optimizers.
In this project, we study how model size affects the geometry of learning in the overparameterized regime, and how that geometry relates to test error, for deep neural networks trained in practice. The goal is to understand how model size biases learning towards smooth interpolating functions of the training data that also generalize to unseen data.
After establishing a relationship between robust interpolation and generalization in supervised learning, we extend our study to self-supervised learning.