Data storage for unprocessed simulation data
Dnr: NAISS 2024/6-163
Type: NAISS Medium Storage
Principal Investigator: Shahab Fatemi
Affiliation: Umeå universitet
Start Date: 2024-05-29
End Date: 2025-06-01
Primary Classification: 10303: Fusion, Plasma and Space Physics


Abstract

We are producing substantial simulation datasets from our three-dimensional, time-dependent plasma model (www.amitiscode.com). These datasets consist mainly of raw, unprocessed output generated on Vega, a EuroHPC center located in Slovenia, which we access through a EuroHPC Regular Access project. Our storage quota on Vega is only 30,000 GB, yet we generate ~6,000 GB of data per week, so data must be transferred continuously from Vega to another data center, such as dCache. Over the next nine months, the duration of our access to Vega, we anticipate generating over 200,000 GB of data (~6,000 GB/week over roughly 39 weeks amounts to ~234,000 GB). Although we do not hold the 200,000 GB today, we will accumulate it over that period.

Our strategy is to transfer this data continuously from Vega to dCache. From there, we will incrementally move smaller chunks to our local servers at the Department of Physics at Umeå University (UmU) for post-processing and archiving. Once the data is on our local servers, we will analyze it and store the processed results on our local file servers at the Department of Physics. To ensure data safety, we prefer to keep the raw data on dCache until our processing procedures are complete.

We are aware that the maximum storage capacity we can obtain on dCache is 100 TB, only half of the total data we expect to generate. This is a significant challenge, but it is currently the only solution available to us: our computational project on Vega cannot be paused, we must use our granted time, and simulation output will therefore keep accumulating. We are in discussions with HPC2N at UmU to explore possible solutions; indeed, it was their suggestion that we apply for dCache. If feasible, we would greatly appreciate 200 TB of space on dCache instead of the current 100 TB limit, as this would allow us to manage and process our data without the constant need to free up space. Nevertheless, access to 100 TB now is highly valuable and will solve our storage problem until this autumn, by which time we aim to have arranged the additional storage we need.

Our project on Vega has just started, and managing the growing data volume is crucial both for the success of our research and for the efficiency of our data-processing workflow.
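For concreteness, below is a minimal sketch of the kind of staged transfer loop we have in mind, written in Python around rclone (which supports WebDAV, one of the interfaces dCache exposes). All remote names, paths, and run identifiers are hypothetical placeholders rather than actual project configuration; the real workflow will use whatever endpoints HPC2N and the dCache operators provide.

    #!/usr/bin/env python3
    """Illustrative staging loop: copy finished simulation runs from
    Vega scratch to dCache, verify the copy, then free space on Vega.

    The rclone remotes "vega" and "dcache" and the run names below are
    hypothetical; they stand in for the actual configured endpoints."""

    import subprocess
    import sys

    RUNS = ["run_0042", "run_0043"]      # hypothetical run directories
    SRC = "vega:scratch/amitis"          # hypothetical remote on Vega
    DST = "dcache:amitis_raw"            # hypothetical dCache remote

    def transfer(run: str) -> None:
        # Copy the run with checksum verification enabled.
        subprocess.run(
            ["rclone", "copy", f"{SRC}/{run}", f"{DST}/{run}",
             "--transfers", "8", "--checksum"],
            check=True,
        )
        # Independently confirm source and destination match.
        subprocess.run(
            ["rclone", "check", f"{SRC}/{run}", f"{DST}/{run}"],
            check=True,
        )
        # Free scratch space on Vega only after verification succeeds.
        subprocess.run(["rclone", "purge", f"{SRC}/{run}"], check=True)

    if __name__ == "__main__":
        for run in RUNS:
            try:
                transfer(run)
            except subprocess.CalledProcessError as err:
                sys.exit(f"transfer of {run} failed: {err}")

The verify-then-delete ordering reflects the data-safety preference stated above: at every point in the pipeline, each run exists in full on at least one system, and space on Vega is reclaimed only once the dCache copy has been checked.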