Scale-covariant deep networks

Dnr:

NAISS 2025/5-378

Type:

NAISS Medium Compute

Principal Investigator:

Tony Lindeberg

Affiliation:

Kungliga Tekniska högskolan

Start Date:

2025-08-02

End Date:

2026-09-01

Primary Classification:

10207: Computer graphics and computer vision (System engineering aspects at 20208)

Secondary Classification:

20208: Computer Vision and learning System (Computer Sciences aspects in 10207)

Webpage:

https://www.kth.se/profile/tony/page/deep-networks

Allocation

Dardel-GPU at PDC: 2000 GPU-h/month
Klemming at PDC: 500 GiB
Alvis at C3SE: 500 GPU-h/month
Mimer at C3SE: 500 GiB
Dardel at PDC: 10 x 1000 core-h/month

Abstract

Because objects in the world may be of different sizes and at different distances from the camera, image data generated from a natural environment may in general contain substantial, a priori unknown, scaling variabilities. Traditional deep networks are, however, by default not robust to such scaling variabilities. To address this problem, we will in this project develop scale-covariant deep networks, which obey provable covariance properties under spatial scaling transformations. Specifically, we will study extensions of a previously proposed notion of scale-covariant and scale-invariant Gaussian derivative networks, to enable classification at scales that are not spanned by the training data. The research will comprise extensions of the network architecture, including design parameters, to handle more complex datasets than considered in our previous work (Lindeberg 2022, Perzanowski and Lindeberg 2025), as well as extensive experimental work comparing different scale-covariant network architectures on both single-scale image classification tasks and multi-scale scale generalisation tasks. We need better GPUs than those we have previously had access to in order to explore larger networks, which have more parameters and thereby a much better ability to learn the image structures needed to handle more complex datasets, which we believe could substantially improve the scale generalisation properties. In the work performed with the computing resources provided by our previous 6-month grant NAISS 2025/5-22, we developed new scale-covariant and scale-invariant deep network architectures and showed that these architectures lead to both significantly better accuracy and better scale generalisation on one dataset compared to our previous work (Lindeberg 2022, Perzanowski and Lindeberg 2025).
We have also performed extensive ablation studies on two datasets, and initiated work on a third dataset containing more complex image data than considered in our previous work. In this proposal, we will continue our work on the new dataset, as well as on a possible fourth dataset, and also perform a set of additional ablation studies, which requires access to high-performance GPUs to be able to handle larger deep networks. Complementary GPU resources are specifically necessary to complete the work performed under the previous 6-month project into a solid journal publication. When that goal has been completed, we will use any remaining GPU resources to initiate our next planned subproject within the project "Covariant and invariant deep networks", supported by Vetenskapsrådet, which finances this research.

References:
Lindeberg (2022) "Scale-covariant and scale-invariant Gaussian derivative networks", Journal of Mathematical Imaging and Vision, 64(3): 223-242.
Perzanowski and Lindeberg (2025) "Scale generalisation properties of extended scale-covariant and scale-invariant Gaussian derivative networks on image datasets with spatial scaling variations", Journal of Mathematical Imaging and Vision, 67(29): 1-39.