


Fig. 1. Stepwise description, run time, disk footprint, and hardware specifications for the four investigated read mapping and variant calling pipelines. Solid black and dotted red outlines indicate population calling and trio analysis options, respectively. We mapped reads to the GRCh37-like reference genome hs37d5 (8), except for the Isaac pipeline, which ran on BaseSpace Onsite without support for custom reference genomes and therefore used GRCh37. Notably, hs37d5 contains noncanonical bases that PEMapper/PECaller (downloaded March 29, 2017) was unable to interpret; these were therefore replaced with Ns for this pipeline. Run times shown are for single-sample analyses of the downloaded NA12878 Genome in a Bottle (GIAB) data (legend of Fig. 2). Minimal disk footprints for variant calling (†) were assessed; thus, for GENALICE MAP, the size of the optional BAM file was not counted. Analysis parameters: PEMapper/PECaller, according to ref. 1; BWA/GATK 3.5, best practices; Isaac, default; GENALICE MAP, best practices except for max_cigar_complexity = 18, max_context_call_density = 3, and min_map_quality = 1.

In our benchmarking, PEMapper/PECaller was, although powerful, neither the fastest pipeline (Fig. 1) nor as sensitive in variant calling as BWA/GATK (Fig. 2A). Indeed, PEMapper/PECaller resulted in the highest number of false-negative calls (Fig. 2A), making it less suitable for clinical sequencing. As expected, BWA/GATK showed the highest sensitivity but fell behind the other three pipelines regarding run time and disk footprint. GENALICE MAP showed sensitivity comparable to BWA/GATK (Fig. 2A) but with a 112× faster total run time and a 45× lower disk footprint (Fig. 1). In precision, only minor differences were observed among pipelines, except for the PEMapper/PECaller population calling and the GENALICE MAP single-sample calling pipelines, which performed with the lowest and with distinctly lower precision, respectively, using downloaded FASTQ files (Fig. 2B). The difference between downloaded and our in-house data was pronounced in the sensitivity of the PEMapper/PECaller single-sample pipeline as well (Fig. 2A), suggesting considerable influence of input sequencing reads on PEMapper/PECaller.

Fig. 2. Variant (SNP + indel) calling performance of the four investigated pipelines in single-sample analyses as well as population (pop.) calling and trio analyses. (A) Sensitivity/Recall [TP/(TP + FN); TP = true-positive and FN = false-negative calls] and number of FN. (B) Precision [TP/(TP + FP); FP = false-positive calls] and number of FP. WGS (Illumina HiSeq 2500, PE150, PCR-free) FASTQ files for NA12878 (NA12878 GIAB) and the Ashkenazim trio (NA24143, NA24149, and NA24385) were downloaded (.gov/giab, 300×) (4) and downsampled to ∼60×. In addition, we analyzed our in-house NA12878 WGS data (NA12878 in-house) sequenced at ∼60× (Illumina X Ten, PE150, PCR-free) (6).
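For readers who want to reproduce the two performance measures compared across pipelines, the following is a minimal sketch of how sensitivity (recall) and precision follow from true-positive (TP), false-negative (FN), and false-positive (FP) call counts. The function names are hypothetical and the counts in the usage lines are made-up illustrative numbers, not results from this study; how calls were matched against the truth set is not shown here.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Sensitivity/recall: share of truth-set variants that were called."""
    if tp + fn == 0:
        raise ValueError("no truth-set variants (TP + FN is zero)")
    return tp / (tp + fn)


def precision(tp: int, fp: int) -> float:
    """Precision: share of called variants that are in the truth set."""
    if tp + fp == 0:
        raise ValueError("no variant calls (TP + FP is zero)")
    return tp / (tp + fp)


# Illustrative counts only (hypothetical, not taken from the benchmark).
example_tp, example_fn, example_fp = 990, 10, 5
print(f"sensitivity = {sensitivity(example_tp, example_fn):.4f}")
print(f"precision   = {precision(example_tp, example_fp):.4f}")
```

Reporting the raw FN and FP counts alongside the ratios is useful because, at genome scale, small differences in the ratios can correspond to many individual variants.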

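The downsampling of the downloaded 300× data to ∼60× can be illustrated with a short sketch. The text does not state which tool or procedure was used, so the approach below (keeping each read pair with probability target/original, using a seeded random generator) is an assumption for illustration only, and all names are hypothetical.

```python
import random
from typing import Iterable, Iterator, Tuple

# A read pair is represented here simply as (read 1, read 2) records.
ReadPair = Tuple[str, str]


def downsample_fraction(original_depth: float, target_depth: float) -> float:
    """Fraction of read pairs to keep to go from original to target depth."""
    if original_depth <= 0 or target_depth <= 0:
        raise ValueError("depths must be positive")
    # Never keep more than everything if the target exceeds the original.
    return min(1.0, target_depth / original_depth)


def subsample_pairs(
    pairs: Iterable[ReadPair], fraction: float, seed: int = 0
) -> Iterator[ReadPair]:
    """Keep each pair with probability `fraction`; mates stay together."""
    rng = random.Random(seed)  # fixed seed makes the subsample reproducible
    for pair in pairs:
        if rng.random() < fraction:
            yield pair


# Going from 300x to roughly 60x means keeping about one pair in five.
keep = downsample_fraction(300, 60)
toy_pairs = [(f"r{i}/1", f"r{i}/2") for i in range(1000)]  # toy data only
kept = list(subsample_pairs(toy_pairs, keep, seed=1))
print(f"fraction kept: {keep}, pairs kept: {len(kept)} of {len(toy_pairs)}")
```

Sampling whole pairs rather than individual reads keeps the paired-end structure intact, which paired-end mappers rely on.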