Semi-supervised multi-task deep learning

NAISS 2023/5-460


NAISS Medium Compute

Principal Investigator:

Atsuto Maki


Kungliga Tekniska högskolan

Start Date:


End Date:


Primary Classification:

10207: Computer Vision and Robotics (Autonomous Systems)




In robotics perception, autonomous driving, and smart cities, a complete understanding of the surroundings is needed and can be achieved by combining several computer vision techniques. Multitask learning (MTL) is a natural direction towards complete scene understanding in a compact and resource-efficient way, since it shares computation between tasks while also promising improved results. Current MTL approaches face three main challenges: (i) only partially labeled data is available for training, i.e. each sample can be labeled for none, some, or all tasks, (ii) training becomes a joint optimization problem over tasks with different characteristics, and (iii) the network architecture has to balance performance against resource usage when deciding which layers to share. In this project, we focus on the first challenge: how to learn from partially labeled data. In previous projects, we have worked on semi-supervised learning (SSL) methods for single-task image classification and on how sampling data for training impacts learning. We have also extended this SSL method to dense prediction tasks such as semantic segmentation. We now use the method to learn multiple tasks, adding object detection, and jointly train one model for all tasks under the different labeling scenarios that arise in practice, e.g. different amounts of labeled samples per task, or full or partial overlap between the labeled sets of each task. We use two common datasets with labeled and unlabeled data for the different tasks, Cityscapes and BDD100K, both in the domain of perception for autonomous driving. In SSL, the labeled and unlabeled sets are expected to belong to the same domain, e.g. data obtained from the same city or under the same weather conditions.
To move beyond this limitation, we will then turn to unsupervised domain adaptation, where the unlabeled data belongs to a different domain and the model is evaluated on all domains, again in the multitask setting. BDD100K allows this because it contains day and night data and different weather conditions, and we can also treat Cityscapes and BDD100K as two distinct domains. Finally, we will apply our method to a more specific problem that comes with its own challenges: road damage segmentation, i.e. segmentation of defects on the road pavement. In road damage segmentation, the class imbalance is even larger than in the aforementioned datasets, since most unlabeled images do not contain any damage. In addition, the largest public dataset for this problem contains data from six different countries collected under different conditions, which offers a good way to evaluate the domain adaptation capabilities of our method.
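
The partial-labeling setting described above, where each sample can be labeled for none, some, or all tasks, can be illustrated with a minimal sketch. It is not the project's SSL method: it only shows how a supervised multi-task loss might skip missing labels. The function name `masked_multitask_loss`, the dictionary layout, and the task weights are assumptions made for illustration.

```python
from typing import Dict, List, Optional


def masked_multitask_loss(
    per_sample_losses: Dict[str, List[Optional[float]]],
    task_weights: Optional[Dict[str, float]] = None,
) -> float:
    """Combine per-task losses over a partially labeled batch.

    per_sample_losses maps a task name to one loss value per sample in
    the batch; None marks a sample that has no label for that task.
    (Hypothetical layout, chosen for illustration.)
    """
    total = 0.0
    for task, losses in per_sample_losses.items():
        # Keep only the samples that actually carry a label for this task.
        labeled = [loss for loss in losses if loss is not None]
        if not labeled:
            # No labeled sample for this task in the batch: it contributes nothing.
            continue
        weight = 1.0 if task_weights is None else task_weights.get(task, 1.0)
        # Average over labeled samples so that tasks with few labels are
        # not drowned out by tasks with many.
        total += weight * sum(labeled) / len(labeled)
    return total


# Batch of three samples: the first is labeled for segmentation only,
# the second for no task, the third for both tasks.
batch = {
    "segmentation": [0.5, None, 1.5],
    "detection": [None, None, 2.0],
}
loss = masked_multitask_loss(batch)
weighted_loss = masked_multitask_loss(batch, {"detection": 0.5})
```

Averaging per task over its labeled samples is one possible design choice; in a semi-supervised setup, the unlabeled entries would additionally feed an unsupervised loss term rather than being ignored.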