AI for Molecular Engineering


Dnr: NAISS 2026/3-549
Type: NAISS Medium
Principal Investigator: Rocio Mercado
Affiliation: Chalmers tekniska högskola (Chalmers University of Technology)
Start Date: 2026-07-01
End Date: 2027-07-01
Primary Classification: 10201: Computer Sciences
Secondary Classification: 10203: Bioinformatics (Computational Biology) (Applications at 10610)
Tertiary Classification: 10403: Materials Chemistry
Webpage: https://ailab.bio/

Allocation

Arrhenius Disk at NAISS: 35,000 GiB
Arrhenius GPU at NAISS: 5,000 GPU-h/month
Dardel-GH at PDC: 1,000 GPU-h/month
Dardel-GPU at PDC: 1,000 GPU-h/month
Klemming at PDC: 500 GiB
Arrhenius CPU at NAISS: 30 × 1,000 core-h/month

Abstract

The AI Laboratory for Molecular Engineering (AIME), led by Assist. Prof. Rocío Mercado Oropeza in the Section for Data Science and AI, Department of Computer Science and Engineering at Chalmers University of Technology, develops AI-driven methods for molecular engineering at the intersection of machine learning, chemistry, and the life sciences. This proposal requests a Medium Compute allocation to support the group's research for 2026–2027. The group currently comprises one faculty member, four postdoctoral researchers, eleven PhD students, and approximately eight MSc/BSc thesis students, together with eleven co-advised students and postdocs, working across AI-driven drug and materials discovery and molecular simulation. This allocation marks our migration to the new national supercomputer Arrhenius. We are consolidating our GPU compute, CPU compute, and project storage on Arrhenius and requesting an increase in each, supplemented by GPU time on Dardel (Dardel-GH and Dardel-GPU). We also intend to fold our existing Large Storage resources into this application at a reduced size.

Our research objectives are to: (1) train deep generative and language models for molecular design and optimization, including synthesizability-constrained generation, retrosynthesis, and the design of multi-target therapeutic modalities such as PROTACs and molecular glues; (2) develop large-scale multi-modal and representation-learning models for single-cell data and cell-image (phenotypic) analysis; (3) apply atomistic and coarse-grained molecular dynamics and ab initio methods both to understand biomolecular interactions (e.g., ternary-complex formation, membrane mechanics) and to generate training data for surrogate property models; and (4) discover sustainable materials, including PFAS alternatives and battery electrolytes, by coupling simulation-derived datasets with generative and predictive models.
These efforts have produced a substantial body of peer-reviewed and open-access publications (https://ailab.bio/publications) and software tools, with further publications in preparation for leading computer science, cheminformatics, and bioinformatics venues. Consistent with the group's open-science commitment, all code, models, and datasets are released open-source (GitHub, Hugging Face, Zenodo) and acknowledge the NAISS resources used to produce them. Our demonstrated usage motivates the requested scale-up: over the current project, our team consumed roughly 15,000 GPU-h/month of GPU compute (≈211% of our Alvis allocation and ≈886% of our Dardel-GPU allocation) and ~85,000 core-h/month of CPU compute (≈300% of our Dardel allocation and ≈150% of our Tetralith allocation), while our project storage on Klemming reached ~81% of its allocation. For 2026–2027 we therefore request, at the Medium ceilings, 5,000 GPU-h/month on Arrhenius GPU plus 1,000 GPU-h/month each on Dardel-GH and Dardel-GPU; 30 × 1,000 core-h/month (30,000 core-h/month) on Arrhenius CPU; 35,000 GiB of project storage on Arrhenius Disk, about 25% above the 28,000 GiB of active data we currently store on Mimer; and 500 GiB on Klemming at PDC. We have a data management plan in place.