Overview

This report details the methods used to determine appropriate filter thresholds for SNV calls.


At the site level, three major filters (QD, SOR, and FS) were applied to obtain a high-quality variant set.

To find the optimal filter thresholds, we created simulated data in which the positions of the true variants are known, and examined each metric both independently and in combination.

Note: filter thresholds were only optimized for SNVs.


Creating simulated data




QD filter threshold

To reduce complexity, the filter thresholds were optimized independently.

The optimal QD threshold was determined as follows:

QD threshold is 10

CHROM  POS   QD   sim1_genotype  sim2_genotype  sim3_genotype  pass_QD_filter  is_detected
I      1352  110  1/1            1/1            1/1            yes             yes
I      2566  90   1/1            0/0            1/1            yes             yes
I      3847  2    0/0            1/1            0/0            no              no
I      4975  38   1/1            0/0            0/0            no              no
I      5590  298  1/1            1/1            1/1            yes             yes
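
The thresholding step can be sketched as follows. This is a minimal illustration only: it assumes a site passes when its QD is at or above the threshold, and the function name `passes_qd_filter` and the example QD values are hypothetical, not taken from the actual pipeline or data.

```python
def passes_qd_filter(qd: float, threshold: float = 10.0) -> bool:
    """Return True when a site's QD meets the threshold.

    Assumption: higher QD is better, so sites below the threshold fail.
    """
    return qd >= threshold


# Hypothetical QD values keyed by (CHROM, POS); not real data.
sites = {("I", 100): 25.0, ("I", 200): 3.5}

# Flag each site as passing or failing the QD filter.
flags = {site: passes_qd_filter(qd) for site, qd in sites.items()}
```

The default of 10 mirrors the example threshold shown in the table; in practice the threshold would be passed in explicitly for each candidate value being tested.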


CHROM  POS   is_detected  is_in_truth  category
I      1352  yes          yes          true positive
I      2566  yes          no           false positive
I      3847  no           no           true negative
I      4975  no           yes          false negative
I      5590  yes          yes          true positive
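
The mapping from `is_detected` and `is_in_truth` to a category can be written as a small function. The sketch below is illustrative; the function name `classify` is hypothetical, and the rows are the five example sites from the table above.

```python
def classify(is_detected: bool, is_in_truth: bool) -> str:
    """Map a site's detection status and truth status to a category."""
    if is_detected and is_in_truth:
        return "true positive"
    if is_detected:
        return "false positive"
    if is_in_truth:
        return "false negative"
    return "true negative"


# Example sites from the table: (POS, is_detected, is_in_truth).
rows = [
    (1352, True, True),
    (2566, True, False),
    (3847, False, False),
    (4975, False, True),
    (5590, True, True),
]

categories = {pos: classify(detected, truth) for pos, detected, truth in rows}
```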


              in_truth                 not_in_truth
detected      count of true positive   count of false positive
not_detected  count of false negative  count of true negative
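
Filling in this matrix amounts to counting sites per category. A minimal sketch, assuming each site is represented as an (is_detected, is_in_truth) pair; the function name `confusion_counts` is hypothetical, and the input is the five example sites shown earlier.

```python
from collections import Counter


def confusion_counts(calls):
    """Tally the four confusion-matrix cells.

    calls: iterable of (is_detected, is_in_truth) boolean pairs.
    """
    counts = Counter(calls)
    return {
        "true positive": counts[(True, True)],
        "false positive": counts[(True, False)],
        "false negative": counts[(False, True)],
        "true negative": counts[(False, False)],
    }


# The five example sites: (is_detected, is_in_truth).
calls = [
    (True, True),    # I 1352
    (True, False),   # I 2566
    (False, False),  # I 3847
    (False, True),   # I 4975
    (True, True),    # I 5590
]

matrix = confusion_counts(calls)
```

For the five example sites this gives two true positives and one each of the other three categories.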



SOR threshold


FS threshold


QD, SOR, FS filter thresholds in combination