Cactus: a user-friendly and reproducible ATAC-Seq and mRNA-Seq analysis pipeline for data preprocessing, differential analysis, and enrichment analysis
Dnr:
NAISS 2024/22-874
Type:
NAISS Small Compute
Principal Investigator:
Christian Riedel
Affiliation:
Karolinska Institutet
Start Date:
2024-07-01
End Date:
2025-07-01
Primary Classification:
10203: Bioinformatics (Computational Biology) (applications to be 10610)
The widespread use of ATAC-Seq and mRNA-Seq creates a need for methods that thoroughly analyze the resulting data. Here we introduce Cactus (Chromatin accessibility and transcriptomics unifying software), a pipeline that streamlines the individual or integrated analysis of ATAC-Seq and mRNA-Seq data. The pipeline is written in Nextflow for efficient scaling, caching, and parallelization, and all tools are packaged in Singularity/Docker containers or Conda/Mamba virtual environments for simple installation and high reproducibility. Cactus preprocesses raw sequencing reads and then performs differential analysis between conditions. Results are split into Differential Analysis Subsets (DASs) based on significance thresholds, direction of change, annotated genomic regions, and experiment type. Cactus then computes the enrichment in DASs of entries from internal databases (i.e., other DASs) and external databases (i.e., ontologies, pathways, DNA binding motifs, chromatin immunoprecipitation sequencing (ChIP-Seq) binding sites, and chromatin states), and presents the results as barplots and customizable heatmaps. Merged PDFs, merged tables, and formatted Excel tables make the results easier to browse. In conclusion, Cactus can help researchers gain comprehensive insights from chromatin accessibility and gene expression data in a quick, user-friendly, and reproducible manner.
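
The splitting of differential analysis results into DASs can be illustrated with a minimal Python sketch. This is not Cactus code: the function name `split_into_subsets`, the record fields (`gene`, `experiment`, `region`, `log2fc`, `padj`), and the threshold values are all hypothetical and chosen only to show how results might be grouped by experiment type, annotated region, direction of change, and significance threshold.

```python
from collections import defaultdict


def split_into_subsets(records, padj_thresholds=(0.05, 0.01)):
    """Group differential-analysis records into subsets.

    Each subset is keyed by (experiment type, genomic region,
    direction of change, significance threshold). All field names
    are assumptions made for this illustration.
    """
    subsets = defaultdict(list)
    for rec in records:
        # Direction of change from the sign of the log2 fold change;
        # a fold change of exactly zero is counted as "down" here.
        direction = "up" if rec["log2fc"] > 0 else "down"
        for threshold in padj_thresholds:
            # A record joins every subset whose threshold it passes.
            if rec["padj"] <= threshold:
                key = (rec["experiment"], rec["region"], direction, threshold)
                subsets[key].append(rec["gene"])
    return dict(subsets)


# Made-up example records, for illustration only.
records = [
    {"gene": "geneA", "experiment": "ATAC", "region": "promoter",
     "log2fc": 1.8, "padj": 0.002},
    {"gene": "geneB", "experiment": "ATAC", "region": "intron",
     "log2fc": -1.1, "padj": 0.03},
    {"gene": "geneC", "experiment": "mRNA", "region": "gene",
     "log2fc": 2.3, "padj": 0.20},
]

subsets = split_into_subsets(records)
for key, genes in sorted(subsets.items()):
    print(key, genes)
```

In this sketch a record can belong to several subsets at once (one per threshold it passes), and records that pass no threshold, such as `geneC`, are left out of every subset.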