There is strong consensus that alterations in the pancreatic tissue microenvironment (e.g., immune cell infiltration, β-cell death, fibrosis) play a central role in pancreatic disease progression.
My current project is focused on building computational infrastructure to analyze large-scale spatial transcriptomics data from pancreatic tissue. The aim is to optimize and benchmark analysis pipelines for spatial transcriptomics datasets derived from both healthy donors and individuals with pancreatic diseases. This work involves generating and processing raw spatial transcriptomics data, starting from FASTQ files, through Snakemake/Nextflow pipelines run in Singularity containers, and continuing with downstream analysis using computationally intensive tools such as Cell2Deconvolution.
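As a simplified illustration of one such pipeline step, the sketch below shows a single Snakemake rule executed inside a Singularity container. The rule name, file paths, container image, and command are illustrative placeholders, not the actual project configuration:

```
# Minimal Snakemake rule sketch: process one sample's FASTQ files inside a
# Singularity/Apptainer container (run with --use-singularity).
# All names below (rule, paths, image, command) are illustrative placeholders.
rule align_sample:
    input:
        fq1="fastq/{sample}_R1.fastq.gz",
        fq2="fastq/{sample}_R2.fastq.gz",
    output:
        bam="aligned/{sample}.bam",
    container:
        "docker://example.org/spatial-aligner:latest"
    threads: 16
    shell:
        "aligner --threads {threads} {input.fq1} {input.fq2} -o {output.bam}"
```

Each such step reads large inputs and writes comparably large outputs, which is the main driver of the storage footprint described below.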
The size and complexity of these data present a major storage and computational challenge. Processing a single dataset produces extensive intermediate files, which cannot be immediately discarded, since iterative reanalysis and troubleshooting are essential.
From my previous usage patterns, I have realized that I require significantly more storage than initially anticipated. In particular:
- Each sample generates multiple large files (BAM, TAB, H5AD, etc.) that together demand several hundred GB of space. Moreover, the data cannot be compressed or archived too early, as iterative reanalysis and comparison between pipeline versions are necessary.
- Finally, the storage load grows with the number of samples, since simultaneous processing requires keeping multiple datasets accessible at once (a rough estimate of this scaling is sketched below).
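To make this scaling concrete, a back-of-envelope calculation is sketched below. The per-sample sizes and the number of retained pipeline versions are assumed values for illustration, not measured figures from the actual datasets:

```
# Rough storage estimate; all figures are illustrative assumptions,
# not measured per-sample sizes from the actual datasets.
PER_SAMPLE_GB = {
    "FASTQ": 100,          # raw reads
    "BAM": 150,            # aligned reads
    "tables_h5ad": 50,     # count tables and AnnData (H5AD) objects
    "intermediates": 100,  # QC reports, deconvolution inputs/outputs, temp files
}

def required_storage_gb(n_samples: int, pipeline_versions: int = 2) -> float:
    """Total space needed when n_samples are kept accessible at once and
    intermediate results from several pipeline versions are retained."""
    per_sample = sum(PER_SAMPLE_GB.values())
    return n_samples * per_sample * pipeline_versions

# Example: 10 samples kept online, results retained for 2 pipeline versions
print(f"{required_storage_gb(10):.0f} GB")  # ~8000 GB under these assumptions
```

Even under conservative assumptions, keeping several samples and pipeline versions accessible at once quickly reaches the multi-TB scale.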
For these reasons, I am requesting additional storage on Klemming to enable my research.