The systemic inflammatory autoimmune diseases (SIADs) constitute a group of diseases characterized by several common clinical manifestations. Studies of the genetic background of SIADs have so far revealed that several risk genes are involved in inflammatory pathways; however, much of the genetic risk is still unknown.
In this project we will combine data generated by targeted sequencing of 1000 patients with each of the SIADs SLE, Sjögren's syndrome and myositis, together with 1000 healthy individuals, within the project "DISSECT" (PI Lars Rönnblom) to identify genetic variants associated with susceptibility to SIAD.
Target enrichment of 1900 immune-related genes for the 4000 samples has been performed in Kerstin Lindblad Toh's lab at BMC, Uppsala, using a NimbleGen SeqCapEZ custom-made Immunoarray. Libraries were prepared and pooled in batches of eight or ten before delivery to the SNP&SEQ platform for sequencing. All sequence data has now been delivered, and in this project the data will be analysed together for all three diseases. We also aim to include SweGen data as additional healthy control individuals to boost the study’s statistical power. In addition to diagnosis, genetic associations with sub-phenotypes (clinical variables) will also be investigated. The phenotypic data contains sensitive personal information retrieved from the patients’ medical records.
During the 6 months we have had the project on Bianca, we have been working on migrating our sensitive data stored on Milou to this project. Data analysis was delayed because the project was under-staffed during September-October, but since November we have been performing genotype calling separately and together for UK and Scandinavian samples to investigate at what level the sequence/genotype data can be combined. The data analyses will continue in 2018 with joint genotype calling and variant quality control for all 4000 samples in the project. We are also submitting an application to use individual-level data from the SweGen project as additional control individuals in our analyses.
For this project we would need around 80 TB of storage to hold the individual fastq, BAM and genomic VCF files for the 4000 project samples as well as the files that will be generated during the analysis, especially in the analyses that also include SweGen data. The individual genomic VCF files are needed for the combined genotyping of the cohort. The BAM files are used during sample QC for LASER analysis of ancestry. We would appreciate it if the fastq files could also be stored in this project, since this is currently the solution best suited for storing sensitive patient data securely. The fastq files are used only if we encounter problems and need to go back to the raw data. For the analysis we would need around 10 000 core-hours/month on average.
Ethical permissions: Dnr 2015/450, Dnr 00-227, Dnr 2016/155, Dnr 2009/013, Dnr 2007/1121-32, 2009/1934-32, 2012/736-32.