SUPR
Training AI models for Interactive BioImage Analysis
Dnr: NAISS 2023/6-197
Type: NAISS Medium Storage
Principal Investigator: Wei Ouyang
Affiliation: Kungliga Tekniska högskolan
Start Date: 2023-06-29
End Date: 2024-07-01
Primary Classification: 10799: Other Natural Sciences not elsewhere specified
Secondary Classification: 10299: Other Computer and Information Science
Tertiary Classification: 10610: Bioinformatics and Systems Biology (methods development to be 10203)

Abstract

In the rapidly advancing life sciences, interpreting large-scale bioimage data has become a central challenge. As the volume of generated data grows, traditional local storage and processing methods are increasingly ill-suited to advanced tasks such as AI-powered image analysis. This project addresses these issues by combining recent developments in deep learning with cloud-based data management techniques.
Our primary goal is to train foundation models for interactive bioimage analysis. We aim to harness recent advances in self-supervised learning, large language models, and diffusion models to build general models capable of a wide variety of image transformation tasks. These models will be trained on existing public bioimage datasets and on new data contributed over time by users of the BioImage.IO portal, so that their capabilities can grow with the data.
Robust data storage is essential to these aims. Efficient storage and retrieval of high-quality bioimages are instrumental to the project, and we are therefore seeking a significant allocation of storage on the Alvis computing cluster. A large storage capacity will ensure a smooth flow of data during model training and validation while maintaining the quality and integrity of our growing image dataset.
To further address the data management challenge, we propose the BioEngine platform, a web-based platform built on Hypha that integrates containerized services for scalable data management and AI model serving. The platform, which will use the requested storage on Alvis, provides adaptable solutions for bioimage data management and model serving in the cloud and handles very large datasets efficiently. BioEngine will also support the test run feature of the BioImage Model Zoo website, part of the AI4Life project. To facilitate user engagement, we intend to provide a deployment toolkit that lets users set up their own servers, either on an institutional Kubernetes cluster or on a workstation. Our objective is to establish a standard for managing and sharing bioimage data, in harmony with the BioImage Archive. The requested storage on the Alvis cluster is crucial to the feasibility of this goal.
This project, generously supported by AI4Life and the KAW Data-Driven Life Science Fellows program, aims to revolutionize biological image analysis by leveraging deep learning techniques and high-capacity, efficient data storage. With the storage capabilities of Alvis, we look forward to contributing significantly to data-driven life science research.