Classical machine learning assumes that the training data distribution is representative of the inputs the trained system will encounter in deployment. In real-world applications, this assumption rarely holds. For example, for a classifier trained to identify lesions or diagnose patients from chest X-ray images, the distribution of images depends on the equipment and processes of the radiology department in which they were collected. An ML system trained in one hospital may therefore be inaccurate when used in a different hospital.
Domain adaptation (DA) is the problem of generalizing across input domains. Traditional DA requires strong assumptions about the similarity between training and test distributions, and these assumptions are known to be violated in practical problems. In this project, we aim to develop theory and algorithms that are guaranteed to work well under much weaker assumptions. To do this, it is vital that the problems we study are representative of important application domains. For this reason, we will train deep neural network systems that learn to classify realistic images and generalize between domains.