Group equivariant CNNs

Dnr:

NAISS 2023/6-349

Type:

NAISS Medium Storage

Principal Investigator:

Daniel Persson

Affiliation:

Chalmers tekniska högskola

Start Date:

2023-12-01

End Date:

2024-12-01

Primary Classification:

10799: Other Natural Sciences not elsewhere specified

Despite the overwhelming success of deep neural networks, we are still at a loss to explain exactly how deep learning works and why it works so well. What is the mathematical framework underlying deep learning? One promising direction is to consider symmetries as an underlying design principle for network architectures. This can be implemented by constructing deep neural networks on a group G that acts transitively on the input data. This is directly relevant, for instance, in the case of spherical signals, where G is a rotation group.
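To make the symmetry principle concrete, the following is a minimal sketch of an equivariance check, under simplifying assumptions that are not part of the project itself: G is taken to be the group C4 of planar rotations by multiples of 90 degrees rather than a rotation group of the sphere, the signal lives on a small periodic square grid, and the function names (`rot90`, `correlate`, `lifting_layer`) are hypothetical. The layer correlates the input with all four rotated copies of a filter; rotating the input then rotates each feature map and cyclically shifts the rotation channel.

```python
def rot90(a):
    """Rotate a square grid by 90 degrees counterclockwise."""
    n = len(a)
    return [[a[j][n - 1 - i] for j in range(n)] for i in range(n)]


def correlate(x, k):
    """Cross-correlate grid x with an odd-sized filter k, using periodic boundaries."""
    n, m = len(x), len(k)
    c = m // 2  # filter centre
    return [[sum(x[(i + u - c) % n][(j + v - c) % n] * k[u][v]
                 for u in range(m) for v in range(m))
             for j in range(n)]
            for i in range(n)]


def lifting_layer(x, k):
    """Return four feature maps, one per rotation of the filter (channel r uses k rotated r times)."""
    out, kr = [], k
    for _ in range(4):
        out.append(correlate(x, kr))
        kr = rot90(kr)
    return out


# Illustrative integer data, so that the comparison below is exact.
x = [[(3 * i * i + 5 * j + i * j) % 7 for j in range(5)] for i in range(5)]
k = [[1, 2, 0], [0, -1, 3], [2, 0, 1]]

feats = lifting_layer(x, k)
rotated_input_feats = lifting_layer(rot90(x), k)

# Equivariance: channel r of the rotated input equals the rotated channel r - 1 of the original.
expected = [rot90(feats[(r - 1) % 4]) for r in range(4)]
equivariant = rotated_input_feats == expected
print("C4-equivariant:", equivariant)
```

The periodic boundary is a design choice made here so that the identity holds exactly on a finite grid; with zero padding it holds only away from the border.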
Even more generally, it is natural to consider the question of how to train neural networks in the case of "non-Euclidean data". Relevant applications include omnidirectional computer vision, biomedicine, and climate observations, just to mention a few situations where data is naturally "non-flat". Mathematically, this calls for developing a theory of deep learning on manifolds, or even more exotic structures, like graphs or algebraic varieties.
The aim of this project is to use techniques and theorems from mathematics and physics to develop a general framework for efficiently applying convolutional neural networks (CNNs) to non-Euclidean data. The project aims to apply this formalism to concrete problems arising in autonomous driving, where such a framework is highly desirable. In particular, it will be applied to image recognition and object detection from fisheye cameras, as well as to interpolated point clouds from the lidars mounted on the self-driving vehicle. This part of the project will be pursued in collaboration with Zenseact.
As is clear from our previous results presented in the report, equivariance improves per-sample efficiency, reducing the need for data augmentation. Group equivariance has been successfully implemented in convolutional neural networks. Recently, however, transformer networks have increased in popularity, in particular with impressive results in natural language processing. We have therefore initiated studies exploring equivariance in transformers. As a first step, we have developed HEAL-SWIN, a model for training the Swin transformer on the sphere using the HEALPix grid developed by NASA. The model was successfully tested on the WoodScape dataset of traffic fisheye images, with natural applications to autonomous driving. A preprint is available here:
https://arxiv.org/abs/2307.07313.
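As a rough illustration of why the HEALPix grid pairs naturally with a windowed transformer, the sketch below groups a spherical signal into patches using the grid's nested pixel ordering. It relies on two standard properties of HEALPix: a grid of resolution `nside` has 12 * nside^2 equal-area pixels, and in nested ordering the four children of pixel p are numbered 4p to 4p + 3. The function names and the patch size are hypothetical choices for this example, and the code is not taken from the HEAL-SWIN implementation.

```python
def healpix_npix(nside):
    """Number of pixels of a HEALPix grid with resolution parameter nside."""
    return 12 * nside * nside


def is_power_of_four(p):
    """True if p is 1, 4, 16, 64, ..."""
    if p < 1:
        return False
    while p % 4 == 0:
        p //= 4
    return p == 1


def patch_nested(signal, patch_size=4):
    """Split a signal given in nested ordering into patches of consecutive pixels.

    With patch_size = 4**k, each patch is exactly the set of descendants of one
    pixel k resolution levels coarser, so patches are compact regions on the sphere.
    """
    npix = len(signal)
    if npix % 12 != 0 or not is_power_of_four(npix // 12):
        raise ValueError("signal length must be 12 * nside**2 with nside a power of two")
    if not is_power_of_four(patch_size) or patch_size > npix // 12:
        raise ValueError("patch_size must be a power of four that fits in a base pixel")
    return [signal[s:s + patch_size] for s in range(0, npix, patch_size)]


# Usage: an nside = 2 grid has 48 pixels; patches of 4 recover the 12 base pixels.
signal = list(range(healpix_npix(2)))  # pixel indices stand in for pixel values
patches = patch_nested(signal, patch_size=4)
print(len(patches), patches[3])
```

Because patching reduces to reshaping a one-dimensional array, the same grouping can be repeated at coarser levels to form windows, which is what makes a hierarchical grid convenient for this kind of architecture.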