ML for Computer Vision

Dnr:

NAISS 2023/6-307

Type:

NAISS Medium Storage

Principal Investigator:

Fredrik Kahl

Affiliation:

Chalmers tekniska högskola

Start Date:

2023-11-01

End Date:

2024-11-01

Primary Classification:

10207: Computer Vision and Robotics (Autonomous Systems)

Machine learning research conducted by the computer vision group at Electrical Engineering, Chalmers.
More specifically, we are currently working on the following three problems, all related to 3D scene perception. For this work we need access to the GPU cluster and the ability to store large image datasets such as MegaDepth and ImageNet.
1. Generalization and invariance.
One of the key factors for the success of deep learning for 2D image interpretation is the convolutional layer, which induces translation equivariance: if the input is translated, the output is translated in the same way. Compared to a fully connected layer, it drastically reduces the number of learnable parameters and improves generalization performance. For 3D data such as point clouds, meshes and graphs, other equivariances have been investigated for classification but, apart from convolutions, they have not been exploited in encoder-decoder models. We analyze, construct and develop efficient implementations of other equivariances, both for the encoder, hϕ ◦ g(X) = g ◦ hϕ(X), and for the decoder, fθ ◦ g(Z) = g ◦ fθ(Z), for all g belonging to a group. We will investigate the groups SO(3) and SE(3) and their subgroups, for instance azimuthal rotations, especially for the application of image matching.
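The translation equivariance property described above can be checked numerically. The following is a minimal sketch, assuming a 1D signal with circular boundary conditions; the helper names `circular_conv1d` and `translate` are hypothetical and chosen only for illustration. It compares a convolution (3 weights) with a generic dense layer (256 weights) on the same input.

```python
import numpy as np

def circular_conv1d(x, kernel):
    """Circular 1D cross-correlation: out[i] = sum_k kernel[k] * x[(i + k) mod n]."""
    n = len(x)
    return np.array([
        sum(kernel[k] * x[(i + k) % n] for k in range(len(kernel)))
        for i in range(n)
    ])

def translate(x, shift):
    """Group action g: cyclic translation of the signal by `shift` samples."""
    return np.roll(x, shift)

rng = np.random.default_rng(0)
x = rng.normal(size=16)
kernel = rng.normal(size=3)      # 3 learnable parameters
W = rng.normal(size=(16, 16))    # 256 learnable parameters
shift = 5

# Equivariance: h(g(x)) == g(h(x))
conv_equivariant = np.allclose(
    circular_conv1d(translate(x, shift), kernel),
    translate(circular_conv1d(x, kernel), shift),
)

# A generic dense layer does not commute with translation.
dense_equivariant = np.allclose(
    W @ translate(x, shift),
    translate(W @ x, shift),
)

print("convolution equivariant:", conv_equivariant)
print("dense layer equivariant:", dense_equivariant)
```

The same pattern, applying the group element before and after the map and comparing, could in principle be used to test a candidate encoder or decoder against rotations in SO(3), with `translate` replaced by the corresponding group action.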
2. Dynamics and deformation.
Earlier attempts to extend implicit neural 3D scene representations to dynamic scenes and deformable objects typically learn a model that controls how projection rays are bent to account for the deformations observed in each input frame, with no consistency enforced over time and no ability to interpolate between frames. To prevent the model from collapsing into a set of flat surfaces, some regularization of the deformation is required, which limits the amount of change that can be tolerated. We explore methods to overcome these limitations by providing regularization in terms of trajectories modelled in a learned deformation space. We intend to learn latent space representations of deformation fields and enforce continuity by exploiting the sequential nature of the data. The ambition is to decouple deformations from geometry, aggregate information over multiple scenes, and, by relying on meta-learning, learn deformation models that can be quickly adapted to novel scenes.
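One simple way to picture continuity over a sequence of per-frame latent deformation codes is a penalty on consecutive differences, combined with interpolation between frames. The sketch below is an illustrative assumption, not the method proposed here: the functions `smoothness_penalty` and `interpolate`, the 2D latent codes and the linear interpolation are all hypothetical choices.

```python
import numpy as np

def smoothness_penalty(latents):
    """Sum of squared differences between consecutive per-frame latent codes."""
    diffs = np.diff(latents, axis=0)
    return float(np.sum(diffs ** 2))

def interpolate(latents, t):
    """Linearly interpolate a latent code at fractional frame index t."""
    lo = int(np.floor(t))
    hi = min(lo + 1, len(latents) - 1)
    w = t - lo
    return (1.0 - w) * latents[lo] + w * latents[hi]

rng = np.random.default_rng(0)

# Hypothetical trajectory: 8 frames, 2D latent code per frame.
frames = np.linspace(0.0, 1.0, 8)[:, None]
smooth = np.hstack([frames, frames ** 2])                   # slowly varying codes
jittery = smooth + rng.normal(scale=0.5, size=smooth.shape)  # temporally inconsistent codes

smooth_cost = smoothness_penalty(smooth)
jittery_cost = smoothness_penalty(jittery)

# Latent code for an unobserved time halfway between frames 2 and 3.
midpoint = interpolate(smooth, 2.5)

print("penalty (smooth): ", smooth_cost)
print("penalty (jittery):", jittery_cost)
print("interpolated code:", midpoint)
```

A trajectory that varies slowly in the latent space receives a low penalty, and a code for an intermediate time can be read off the trajectory, which is the kind of behaviour that per-frame ray bending without temporal consistency does not provide.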
3. Flexible and controllable 3D generative models.
Given a trained encoder-decoder model, we need to be able to perform inference on incomplete data. Typically, objects are only visible from a few viewpoints or even a single viewpoint, whereas training of the encoder-decoder model usually relies on complete data. There are several possible approaches to cope with missing data: (i) adapt the encoder so as to accommodate missing data, or (ii) optimize the latent variable representation Z such that it matches the given observations. More specifically, for a given 2D observation I, we add the loss function ||π ◦ fθ(Z) − I||² and minimize over Z. We explore both approaches, including hybrid variants depending on the application scenario. We are currently focusing on controllable outdoor scenes, where one can change conditions such as lighting, season and weather.
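Approach (ii) can be illustrated on a toy problem. The sketch below assumes a fixed linear map as a stand-in for the trained decoder fθ and a coordinate-selection operator as a stand-in for π (read here as the map from the decoded scene to the 2D observation). Both are hypothetical simplifications, as are the names `decoder`, `project`, `loss` and `grad`. The latent code Z is then fitted to a partial observation by plain gradient descent on the loss ||π ◦ fθ(Z) − I||².

```python
import numpy as np

rng = np.random.default_rng(0)

latent_dim, full_dim = 4, 12
A = rng.normal(size=(full_dim, latent_dim))  # hypothetical frozen decoder weights

def decoder(z):
    """Toy stand-in for f_theta: maps a latent code to a full scene vector."""
    return A @ z

observed = np.arange(0, full_dim, 2)  # only every other coordinate is observed

def project(scene):
    """Toy stand-in for pi: a partial observation of the decoded scene."""
    return scene[observed]

def loss(z, obs):
    """Squared error || pi(f_theta(z)) - I ||^2."""
    r = project(decoder(z)) - obs
    return float(r @ r)

def grad(z, obs):
    """Analytic gradient of the loss with respect to z (linear case only)."""
    M = A[observed]
    return 2.0 * M.T @ (M @ z - obs)

# Synthetic observation generated from a latent code that is treated as unknown.
z_true = rng.normal(size=latent_dim)
obs = project(decoder(z_true))

# Step size 1/L, with L = 2 * sigma_max(M)^2 the Lipschitz constant of the gradient.
M = A[observed]
step = 1.0 / (2.0 * np.linalg.norm(M, 2) ** 2)

z = np.zeros(latent_dim)
initial_loss = loss(z, obs)
for _ in range(500):
    z = z - step * grad(z, obs)  # the decoder stays fixed; only z is updated
final_loss = loss(z, obs)

print("initial loss:", initial_loss)
print("final loss:  ", final_loss)
```

With a real, nonlinear decoder the gradient would come from automatic differentiation rather than a closed-form expression, and the problem would generally be non-convex, so the result may depend on the initialization of Z.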