

This README describes the contents of the delivered data.

Root level
The root folder, which is named after the project ID, contains one report folder
and one folder for each sample. Each sample folder is accompanied by a .lst file
containing a list of the files in the folder and a .md5 file containing the
MD5 checksums of those files. Use the MD5 checksums to verify the integrity of
the files after transfer.
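The integrity check can be sketched in Python with the standard hashlib module. This is a minimal sketch, not a tool shipped with the delivery: it assumes the .md5 file uses the md5sum output format ("<hex digest>  <path>" per line) with paths relative to a base directory you pass in, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5_file(md5_file: Path, base_dir: Path) -> dict:
    """Check every entry of an md5sum-style checksum file.

    Assumption: each line is '<hex digest>  <path>', with paths relative
    to base_dir. Returns {path: True if the checksum matches, else False}.
    """
    results = {}
    for line in md5_file.read_text().splitlines():
        if not line.strip():
            continue  # skip blank lines
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # md5sum marks binary mode with a leading '*'
        results[name] = md5_of(base_dir / name) == expected.lower()
    return results
```

If the file does follow the md5sum format, `md5sum -c` from GNU coreutils performs the same check on the command line.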


ProjectID -> 00-Report
The 00-Report folder contains sequence QC data, aggregate statistics and a report
of the software versions used.


A tab delimited file with sequence, alignment and variant statistics for each
sample in the project.
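The tab-delimited statistics file can be loaded with Python's standard csv module. In this sketch the function name `read_sample_stats` is hypothetical, and it assumes the first line is a header and the first column holds the sample name; the actual column names depend on the delivered file.

```python
import csv
from pathlib import Path


def read_sample_stats(path: Path) -> dict:
    """Load a tab-delimited statistics table into {sample: {column: value}}.

    Assumption: the first line is a header and the first column holds the
    sample name. All values are returned as strings.
    """
    with path.open(newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        key = reader.fieldnames[0]  # assumed sample-name column
        return {row[key]: row for row in reader}
```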

The SequenceQC folder contains sequence QC data, which provides information about
the quality and other features of the FASTQ files, at both sample and lane level.
The reports are organized by sequencing run.

Information from piper about the data sources and software versions that have
been used.

ProjectID -> SampleN
Each sample folder contains the following subfolders:


ProjectID -> SampleN -> 00-Reports
Contains two types of reports: the snpEff summary report and the sample report.


--snpEff summary report
Reports generated by SnpEff: summary.csv shows basic statistics about the
analyzed variants, and genes.txt is a tab-separated file with counts of the
number of variants affecting each transcript and gene.
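A minimal Python sketch for reading genes.txt, assuming that comment lines start with '#' and that the last such line holds the column names (for example `#GeneName`, `TranscriptId` and impact-count columns prefixed `variants_impact_`). These column names and the function names are assumptions; check them against the header of the delivered file.

```python
def read_snpeff_genes(path):
    """Parse a snpEff genes.txt file into a list of {column: value} dicts.

    Assumption: the last line starting with '#' is the header row.
    """
    header = None
    rows = []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if not line:
                continue
            if line.startswith("#"):
                header = line.lstrip("#").split("\t")  # keep the latest comment line
                continue
            rows.append(dict(zip(header, line.split("\t"))))
    return rows


def variants_per_transcript(rows, prefix="variants_impact_"):
    """Sum all impact-count columns for each (gene, transcript) pair.

    Assumption: the columns 'GeneName' and 'TranscriptId' exist and the
    impact counts share the given prefix.
    """
    return {
        (row["GeneName"], row["TranscriptId"]): sum(
            int(value) for key, value in row.items() if key.startswith(prefix)
        )
        for row in rows
    }
```

Totals are kept per transcript because summing transcripts of the same gene could count a variant more than once.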

--Sample report
Report summarizing information about the analysis, alignment and variants.

ProjectID -> SampleN -> 01-QC
Qualimap QC report from mapping

Picard MarkDuplicates output

GATK variant calling evaluation for SNPs

GATK variant calling evaluation for indels

ProjectID -> SampleN -> 02-FASTQ
SLURM (or bash) script that can be used for generating FASTQ files from
a BAM file. The script is formatted to be submitted and run on a
UPPMAX SLURM cluster. See the top of the script file for further usage instructions.

ProjectID -> SampleN -> 03-BAM
BAM file (and index file) generated by piper. It is named using the sample
name and the modifications that have been applied to it, according to:
  -clean => GATK IndelRealigner has been applied to the BAM file
  -dedup => duplicates have been marked in the BAM file
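The naming convention above can be decoded programmatically. This is a sketch only: it assumes the name tokens are separated by dots, for example the hypothetical name `SampleN.clean.dedup.bam`, and the function `describe_bam` is illustrative rather than part of piper.

```python
from pathlib import PurePath

# Name tokens documented in this README, mapped to their meaning.
MODIFICATIONS = {
    "clean": "GATK IndelRealigner applied",
    "dedup": "duplicates marked",
}


def describe_bam(filename: str) -> tuple:
    """Split a BAM file name into (sample name, list of applied modifications).

    Assumption: tokens are separated by '.', e.g. 'SampleN.clean.dedup.bam'.
    """
    name = PurePath(filename).name
    if name.endswith(".bam"):
        name = name[: -len(".bam")]
    tokens = name.split(".")
    mods = [MODIFICATIONS[t] for t in tokens if t in MODIFICATIONS]
    sample = ".".join(t for t in tokens if t not in MODIFICATIONS)
    return sample, mods
```

For example, `describe_bam("03-BAM/SampleN.clean.dedup.bam")` returns `("SampleN", ["GATK IndelRealigner applied", "duplicates marked"])`.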

Note that variant calling has been performed after recalibrating the base quality
scores of this BAM file (BQSR). However, because base quality recalibration
drastically increases the file size, the BAM file is delivered without
recalibrated base quality scores. If recalibrated base qualities are required
for downstream applications, the script and resources below can be used to
obtain a recalibrated BAM file.
SLURM (or bash) script that can be used to apply the recalibration data and
generate a recalibrated BAM file. See the top of the script file for further usage instructions.

Recalibration covariate data to be used for obtaining a recalibrated BAM
file, using the script above.

ProjectID -> SampleN -> 04-VCF
Contains the final VCF files and their indexes, 6 files in total.

Genomic VCF (gVCF) (and index file) containing sequencing information for both
variant and non-variant positions. Can be used for downstream cohort calling.

VCF file containing variants called from the recalibrated BAM file and
annotated with variation effects generated with snpEff.
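To inspect the snpEff annotations, the INFO column of each VCF data line can be parsed. This sketch assumes the annotations are stored in the `ANN` INFO field, whose pipe-separated entries begin with Allele, Annotation, Annotation_Impact, Gene_Name and Gene_ID; older snpEff versions write an `EFF` field with a different layout instead, so check the VCF header first. The function name is hypothetical.

```python
# Leading sub-fields of a snpEff ANN entry (remaining sub-fields are ignored).
ANN_FIELDS = ["Allele", "Annotation", "Annotation_Impact", "Gene_Name", "Gene_ID"]


def parse_ann(vcf_line: str) -> list:
    """Return one dict per snpEff annotation on a VCF data line.

    Assumption: annotations are in an 'ANN=' INFO entry, with annotations
    separated by ',' and sub-fields by '|'. Returns [] if there is no ANN.
    """
    columns = vcf_line.rstrip("\n").split("\t")
    info = columns[7]  # INFO is the 8th column of a VCF data line
    for entry in info.split(";"):
        if entry.startswith("ANN="):
            annotations = entry[len("ANN="):].split(",")
            return [dict(zip(ANN_FIELDS, ann.split("|"))) for ann in annotations]
    return []
```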