Representation Learning and Transfer Dynamics in Structured Data Regimes

Dnr:

NAISS 2026/3-425

Type:

NAISS Medium

Principal Investigator:

Stefano Sarao Mannelli

Affiliation:

Chalmers tekniska högskola

Start Date:

2026-06-01

End Date:

2027-06-01

Primary Classification:

10210: Artificial Intelligence

Secondary Classification:

10308: Statistical physics and complex systems

Tertiary Classification:

10105: Computational Mathematics

Webpage:

https://stefsmlab.github.io/research/

Allocation

Arrhenius GPU at NAISS: 5000 GPU-h/month
Arrhenius Disk at NAISS: 5000 GiB
Arrhenius Flash at NAISS: 1000 GiB

Abstract

We request compute resources on both Alvis and Tetralith to support the continuation of an active research project focused on understanding representation learning dynamics in modern deep learning architectures. Specifically, this work investigates how heterogeneous, structured, and imbalanced data dictate the temporal evolution of generalization and memorization in diffusion models and transformers.

We train generative architectures (primarily U-Nets for score-based diffusion, and transformers) across structurally varied data distributions (e.g., controlled Gaussian mixtures, Fashion-MNIST, and structured mathematical operations). Training is repeated across specific controlled variations (e.g., class variance, centroid geometry, sampling imbalance, and curriculum ordering) to systematically probe learning hierarchies. Consequently, the experimental pipeline is highly compute-intensive.

GPU compute on Alvis (C3SE) is essential for the training and sample-generation phases. Having completed the initial infrastructure setup, our utilization has scaled significantly alongside our experimental throughput, with recent usage surpassing 10,000 GPU-hours in a single month. To sustain this momentum, specifically for tracking sample-level memorization at high resolution in diffusion models and executing computationally demanding multi-task curriculum learning schedules in transformers, we require an increased allocation. Based on current consumption trajectories, we anticipate a required usage of approximately 8,000 GPU-hours/month on Alvis, leveraging A100 and A40 nodes to accommodate large batch sizes and extensive generative sampling.

CPU compute on Tetralith (NSC) remains equally critical for the theoretical and analytical components of the project. We expect to utilize a consistent share of the Tetralith allocation (approximately 100,000–200,000 core-hours/month) to support two primary tasks:

1. **Post-training evaluation:** Tracking sample-level memorization requires optimized, large-scale L2 distance computations (nearest-neighbor algorithms) across tens of thousands of generated and training samples at multiple log-spaced checkpoints.
2. **Theoretical model simulations:** Developing analytical frameworks for high-dimensional learning dynamics requires vectorized simulations, numerical diagonalizations, and the evaluation of complex spectral equations for Gaussian Equivalent Process matrices. These solvers demand the multi-core parallelism and high-memory capacity of Tetralith nodes.

Our research group comprises the PI, two postdoctoral researchers, three PhD students, and one visiting student. Our methodology relies entirely on the synergy between theoretical derivations (CPU) and large-scale empirical validation (GPU). The combination of Alvis and Tetralith provides the necessary infrastructure for this dual-natured research. The requested increase to 8,000 GPU-hours/month remains well within the NAISS Medium Compute bounds, reflecting a documented, evidence-based scaling of our research output.

To illustrate the scope and recent productivity of this specific allocation, below are representative publications directly supported by this project:

1. Sharp description of local minima in the loss landscape of high-dimensional two-layer ReLU neural networks (Accepted to ICML 2026)
2. The Interplay of Data Structure and Imbalance in the Learning Dynamics of Diffusion Models (Submitted to NeurIPS 2026)
3. A Theory of Initialisation's Impact on Specialisation (ICLR 2025)
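
For readers unfamiliar with the post-training evaluation task named in the abstract (nearest-neighbor L2 distances between generated and training samples, evaluated at log-spaced checkpoints), the following is a minimal sketch of what such a computation could look like. It is not the project's actual pipeline: the function names `nearest_train_distance` and `log_spaced_checkpoints`, the chunk size, and the choice of NumPy are all assumptions made for illustration.

```python
import numpy as np


def nearest_train_distance(generated, train, chunk=1024):
    """Hypothetical helper: for each generated sample, return the L2 distance
    to its nearest training sample and the index of that training sample."""
    generated = np.asarray(generated, dtype=np.float64)
    train = np.asarray(train, dtype=np.float64)
    # Flatten images (or any other sample shape) to vectors.
    generated = generated.reshape(generated.shape[0], -1)
    train = train.reshape(train.shape[0], -1)

    train_sq = np.einsum("ij,ij->i", train, train)  # squared norms of training samples
    dists = np.empty(generated.shape[0])
    idx = np.empty(generated.shape[0], dtype=np.int64)

    # Process generated samples in chunks so the distance matrix stays small.
    for start in range(0, generated.shape[0], chunk):
        g = generated[start:start + chunk]
        g_sq = np.einsum("ij,ij->i", g, g)
        # ||g - t||^2 = ||g||^2 - 2 g.t + ||t||^2
        d2 = g_sq[:, None] - 2.0 * (g @ train.T) + train_sq[None, :]
        np.maximum(d2, 0.0, out=d2)  # clip tiny negatives caused by round-off
        j = d2.argmin(axis=1)
        idx[start:start + chunk] = j
        dists[start:start + chunk] = np.sqrt(d2[np.arange(g.shape[0]), j])
    return dists, idx


def log_spaced_checkpoints(n_steps, n_checkpoints):
    """Hypothetical helper: unique integer training steps, log-spaced in [1, n_steps]."""
    raw = np.logspace(0.0, np.log10(n_steps), n_checkpoints)
    return np.unique(np.round(raw).astype(int))
```

Chunking keeps memory use proportional to `chunk` times the number of training samples, which is one plausible reason such an evaluation is run on high-memory CPU nodes. A production version would more likely rely on a dedicated nearest-neighbor library.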