SUPR
Data storage for unprocessed simulation data
Dnr:

NAISS 2024/4-5

Type:

NAISS Large Storage

Principal Investigator:

Shahab Fatemi

Affiliation:

Umeå universitet

Start Date:

2025-01-01

End Date:

2026-01-01

Primary Classification:

10303: Fusion, Plasma and Space Physics

Abstract

We produce substantial simulation datasets with our three-dimensional, time-dependent plasma model (www.amitiscode.com). These datasets consist mainly of raw, unprocessed output generated on Vega, a EuroHPC center in Slovenia, which we access through a Regular Access EuroHPC project with Fatemi as the project leader. Our storage capacity on Vega is only 30,000 GB, yet we generate ~4,000 GB of data per week on average, so data must be transferred continuously from Vega to another data center, such as dCache/SweStore. Over our one year of access to Vega in 2024-2025, we anticipate generating around 200,000 GB of data (~4,000 GB/week over ~50 weeks). Although we do not hold the 200,000 GB of data today, we will accumulate it over the allocation period.

Our strategy is to transfer and store this data continuously from Vega to dCache. From there, we will move smaller chunks of data incrementally to our local servers at the Department of Physics at Umeå University (UmU) for post-processing, after which the processed data will be archived on our local file servers at the Department of Physics. To ensure data safety, we prefer to keep the raw data on dCache until our data processing is complete.

As a short-term solution, we applied for a medium-sized storage project of 100 TB, nearly 60% of which is already full. This is not sufficient for our needs. We would greatly appreciate obtaining 200 TB of space on dCache; this storage would allow us to manage and process our data effectively without the constant need to free up space. Our project on Vega is ongoing, and managing the growing data volume is crucial to the success of our research and the efficiency of our data-processing workflow.
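As a rough consistency check on the figures above (a constant ~4,000 GB/week accumulating over a one-year allocation), a minimal sketch; the 52-week period and decimal TB convention are assumptions for illustration:

```python
# Projected cumulative raw-data volume over the one-year Vega allocation,
# assuming a constant average generation rate of ~4,000 GB/week.
WEEKLY_GB = 4_000   # average data generated per week (from the proposal)
WEEKS = 52          # assumed one-year allocation period

total_gb = WEEKLY_GB * WEEKS
total_tb = total_gb / 1_000  # decimal units: 1 TB = 1,000 GB

print(f"Projected total: {total_gb:,} GB = {total_tb:.0f} TB")
```

This yields roughly 208,000 GB, consistent with the ~200,000 GB (200 TB) request in the proposal.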