The following report details the methods used to determine appropriate filter thresholds for SNV calls.
At the site level, three major filters were applied to obtain a high-quality variant set:
- QD - variant quality normalized by read depth
- SOR - strand odds ratio
- FS - Fisher strand

To find the optimal filter thresholds, we created simulated data in which positions of real variants are known, and examined each metric independently and in combination.
Note: filter thresholds were only optimized for SNVs.
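
As a rough illustration of how the three site-level filters combine, the sketch below flags a site that fails any of them. It assumes the usual direction of each metric (low QD, high SOR, and high FS indicate a less reliable site). The names `SiteThresholds` and `failed_filters` are hypothetical, and the default threshold values are placeholders, not the thresholds selected in this report.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SiteThresholds:
    # Placeholder values for illustration only; NOT the optimized
    # thresholds determined in this report.
    min_qd: float = 2.0
    max_sor: float = 3.0
    max_fs: float = 60.0


def failed_filters(qd: float, sor: float, fs: float,
                   t: SiteThresholds = SiteThresholds()) -> List[str]:
    """Return the names of the site-level filters that a variant fails."""
    failed = []
    if qd < t.min_qd:    # low quality relative to read depth
        failed.append("QD")
    if sor > t.max_sor:  # strong strand bias (strand odds ratio)
        failed.append("SOR")
    if fs > t.max_fs:    # strong strand bias (Fisher strand)
        failed.append("FS")
    return failed


if __name__ == "__main__":
    print(failed_filters(qd=25.0, sor=0.8, fs=1.2))
    print(failed_filters(qd=1.5, sor=4.2, fs=1.2))
```

A site is kept only when the returned list is empty. Returning the names of the failed filters, rather than a single boolean, makes it easy to see which metric removed a site.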
- N2.bam) using bamsurgeon (20d431e). The genome browser snapshot below shows an inserted variant (in red).
- wi-gatk-nf pipeline.

To reduce complexity, the filter thresholds were optimized independently.
The optimal QD threshold was determined as follows:
QD value passed the threshold. For example, the table below illustrates a few variants with a filter threshold QD > 10. Variants with QD that failed the filter are classified as undetected.

| CHROM | POS | QD | sim1_genotype | sim2_genotype | sim3_genotype | ==> | pass_QD_filter | is_detected |
|---|---|---|---|---|---|---|---|---|
| I | 1352 | 110 | 1/1 | 1/1 | 1/1 | QD threshold is 10 | yes | yes |
| I | 2566 | 90 | 1/1 | 0/0 | 1/1 | QD threshold is 10 | yes | yes |
| I | 3847 | 2 | 0/0 | 1/1 | 0/0 | QD threshold is 10 | no | no |
| I | 4975 | 38 | 1/1 | 0/0 | 0/0 | QD threshold is 10 | no | no |
| I | 5590 | 298 | 1/1 | 1/1 | 1/1 | QD threshold is 10 | yes | yes |

| CHROM | POS | is_detected | is_in_truth | category |
|---|---|---|---|---|
| I | 1352 | yes | yes | true positive |
| I | 2566 | yes | no | false positive |
| I | 3847 | no | no | true negative |
| I | 4975 | no | yes | false negative |
| I | 5590 | yes | yes | true positive |

| | in_truth | not_in_truth |
|---|---|---|
| detected | count of true positive | count of false positive |
| not_detected | count of false negative | count of true negative |
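
The categorization and tally shown in the tables above can be sketched as follows. This is a minimal example, not the code used for the report: the function name `categorize` is hypothetical, and the input rows are copied from the example table (CHROM, POS, is_detected, is_in_truth).

```python
from collections import Counter


def categorize(is_detected: bool, is_in_truth: bool) -> str:
    """Assign a site to one cell of the confusion matrix."""
    if is_detected:
        return "true positive" if is_in_truth else "false positive"
    return "false negative" if is_in_truth else "true negative"


# Rows from the example table: (CHROM, POS, is_detected, is_in_truth)
sites = [
    ("I", 1352, True, True),
    ("I", 2566, True, False),
    ("I", 3847, False, False),
    ("I", 4975, False, True),
    ("I", 5590, True, True),
]

# Count how many sites fall into each category.
counts = Counter(categorize(detected, in_truth)
                 for _, _, detected, in_truth in sites)

if __name__ == "__main__":
    for category in ("true positive", "false positive",
                     "false negative", "true negative"):
        print(f"{category}: {counts[category]}")
```

For the five example sites this gives 2 true positives, 1 false positive, 1 false negative, and 1 true negative.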

- QD filter.
- SOR threshold.
- FS threshold.