The following report details the methods used to determine appropriate filter thresholds for SNV variant calls.
At the site level, three major filters were applied to obtain a high-quality variant set:
- `QD`: variant quality normalized by read depth
- `SOR`: strand odds ratio
- `FS`: Fisher strand

To find the optimal filter thresholds, we created simulated data in which the positions of real variants are known, and examined each metric independently and in combination.
Note: filter thresholds were only optimized for SNVs.
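
As a minimal sketch of how the three site-level filters combine, the snippet below marks a site as passing only when all three metrics fall on the accepted side of their cutoffs. The function name `passes_site_filters` is hypothetical, and the default cutoffs are an assumption: they are GATK's generic hard-filter recommendations for SNPs (fail when QD < 2.0, SOR > 3.0, or FS > 60.0), not the optimized thresholds from this report.

```python
def passes_site_filters(qd, sor, fs, min_qd=2.0, max_sor=3.0, max_fs=60.0):
    """Return True if a site passes all three hard filters.

    Higher QD is better; lower SOR and FS indicate less strand bias.
    The default cutoffs are placeholders, not this report's optimized values.
    """
    return qd >= min_qd and sor <= max_sor and fs <= max_fs


# Hypothetical sites, for illustration only.
good_site = passes_site_filters(qd=25.0, sor=1.0, fs=5.0)
low_qd_site = passes_site_filters(qd=1.5, sor=1.0, fs=5.0)
strand_biased_site = passes_site_filters(qd=25.0, sor=4.2, fs=75.0)
```

Because each metric has its own keyword argument, a single threshold can be varied while the others are held fixed, which matches the idea of examining each metric independently.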

- Variants were inserted into a BAM file (`N2.bam`) using bamsurgeon (20d431e). The genome browser snapshot below shows an inserted variant (in red).
- The simulated data were then processed with the `wi-gatk-nf` pipeline.

To reduce complexity, the filter thresholds were optimized independently.
The optimal `QD` threshold was determined as follows:

For each variant, we checked whether its `QD` value passed the threshold. For example, the table below illustrates a few variants with a filter threshold of `QD > 10`. Variants whose `QD` failed the filter are classified as undetected.

CHROM | POS | QD | sim1_genotype | sim2_genotype | sim3_genotype | ==> | pass_QD_filter | is_detected |
---|---|---|---|---|---|---|---|---|
I | 1352 | 110 | 1/1 | 1/1 | 1/1 | QD threshold is 10 | yes | yes |
I | 2566 | 90 | 1/1 | 0/0 | 1/1 | QD threshold is 10 | yes | yes |
I | 3847 | 2 | 0/0 | 1/1 | 0/0 | QD threshold is 10 | no | no |
I | 4975 | 38 | 1/1 | 0/0 | 0/0 | QD threshold is 10 | no | no |
I | 5590 | 298 | 1/1 | 1/1 | 1/1 | QD threshold is 10 | yes | yes |

Each variant was then compared against the truth set and assigned a category:

CHROM | POS | is_detected | is_in_truth | category |
---|---|---|---|---|
I | 1352 | yes | yes | true positive |
I | 2566 | yes | no | false positive |
I | 3847 | no | no | true negative |
I | 4975 | no | yes | false negative |
I | 5590 | yes | yes | true positive |
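
The category assignment in the table above can be sketched as a small function of two booleans. The function name `categorize` and the tuple layout are hypothetical; the rows reproduce the `is_detected` and `is_in_truth` columns of the table.

```python
def categorize(is_detected: bool, is_in_truth: bool) -> str:
    """Map (detected, in truth set) to a confusion-matrix category."""
    if is_detected:
        return "true positive" if is_in_truth else "false positive"
    return "false negative" if is_in_truth else "true negative"


# (CHROM, POS, is_detected, is_in_truth), copied from the table above.
rows = [
    ("I", 1352, True, True),
    ("I", 2566, True, False),
    ("I", 3847, False, False),
    ("I", 4975, False, True),
    ("I", 5590, True, True),
]

categories = [categorize(detected, truth) for _, _, detected, truth in rows]
```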

The categories were then tallied into a confusion matrix:

| | in_truth | not_in_truth |
---|---|---|
detected | count of true positives | count of false positives |
not_detected | count of false negatives | count of true negatives |
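
A minimal sketch of tallying the confusion matrix from per-variant calls follows. The helper names are hypothetical, and summarizing the counts as precision and recall is an assumption made for illustration; this section of the report names only the four counts.

```python
from collections import Counter


def confusion_counts(calls):
    """Tally TP/FP/FN/TN from an iterable of (is_detected, is_in_truth)."""
    counts = Counter({"TP": 0, "FP": 0, "FN": 0, "TN": 0})
    for is_detected, is_in_truth in calls:
        if is_detected:
            counts["TP" if is_in_truth else "FP"] += 1
        else:
            counts["FN" if is_in_truth else "TN"] += 1
    return counts


def precision_recall(counts):
    """Return (precision, recall); 0.0 when a denominator is zero."""
    tp, fp, fn = counts["TP"], counts["FP"], counts["FN"]
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# The five example variants from the classification table.
calls = [(True, True), (True, False), (False, False), (False, True), (True, True)]
counts = confusion_counts(calls)
precision, recall = precision_recall(counts)
```

For the five example variants this gives two true positives and one each of the other three categories.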
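
These counts were used to evaluate the `QD` filter.

The same procedure was used to determine the `SOR` threshold.

The same procedure was used to determine the `FS` threshold.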
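
One way to turn the per-threshold evaluation into a threshold choice is to repeat it over a set of candidate cutoffs and keep the best-scoring one. The sketch below does this for a `QD`-style metric, where a site is detected when its value exceeds the cutoff. The selection criterion (F1 score), the function names, and the example sites are all assumptions for illustration, not taken from this report.

```python
def f1_at_threshold(sites, threshold):
    """F1 score when sites with value > threshold count as detected.

    sites: list of (metric_value, is_in_truth) pairs.
    """
    tp = sum(1 for value, truth in sites if value > threshold and truth)
    fp = sum(1 for value, truth in sites if value > threshold and not truth)
    fn = sum(1 for value, truth in sites if value <= threshold and truth)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(sites, candidates):
    """Return the candidate threshold with the highest F1 score."""
    return max(candidates, key=lambda t: f1_at_threshold(sites, t))


# Made-up (QD, is_in_truth) pairs, for illustration only.
sites = [(30.0, True), (25.0, True), (12.0, True),
         (8.0, False), (5.0, False), (3.0, True)]
candidates = [2, 5, 10, 20]
best = best_threshold(sites, candidates)
```

For `SOR` and `FS`, where lower values are better, the comparison would be reversed so that a site is detected when its value is at or below the cutoff.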