Bias model training and quality check report

Preprocessing report

The image below should closely resemble a Tn5 or DNase enzyme bias motif.

Training report

The validation loss (val loss) should decrease and then saturate after a few epochs.
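One way to make "saturate" concrete is a plateau check over the per-epoch validation losses. The sketch below is only an illustration: the function name, the `patience` and `min_delta` parameters, and the example loss values are all assumptions and are not taken from this report or from the training code.

```python
def has_saturated(val_losses, patience=3, min_delta=1e-3):
    """Return True if the last `patience` epochs improved on the earlier
    best validation loss by less than `min_delta` (hypothetical rule)."""
    if len(val_losses) <= patience:
        return False  # not enough epochs to judge
    best_before = min(val_losses[:-patience])
    best_recent = min(val_losses[-patience:])
    return best_before - best_recent < min_delta


# Made-up loss curve for illustration only: a steep drop, then a plateau.
example_losses = [5.0, 3.0, 2.5, 2.4999, 2.4998, 2.4999]
print(has_saturated(example_losses))
```

A curve that is still dropping steeply in the last few epochs would return False, which suggests that training stopped before the loss levelled off.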

Bias model performance in peaks and non-peaks

Counts Metrics: The pearsonr (Pearson correlation) in non-peaks should be greater than 0 (the higher the better). The pearsonr in peaks should be greater than -0.3; otherwise the bias model could be capturing AT bias. MSE (mean squared error) is expected to be high in peaks.
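The two correlation rules of thumb above can be written as a small check. This is a minimal sketch: the function name and the warning strings are hypothetical, and only the thresholds (0 and -0.3) come from the text.

```python
def check_counts_metrics(nonpeaks_pearsonr, peaks_pearsonr):
    """Return a list of warnings for the two count-correlation rules of thumb."""
    warnings = []
    if nonpeaks_pearsonr <= 0:
        warnings.append("non-peak pearsonr is not above 0")
    if peaks_pearsonr <= -0.3:
        warnings.append("peak pearsonr is not above -0.3 (possible AT bias)")
    return warnings


# Values taken from the counts_metrics row of this report.
report_warnings = check_counts_metrics(nonpeaks_pearsonr=0.61, peaks_pearsonr=-0.27)
print(report_warnings or "counts metrics pass both checks")
```

An empty list means both rules are satisfied. The peak value of -0.27 in this report passes, but it sits close to the -0.3 limit.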

Profile Metrics: Median JSD (Jensen-Shannon divergence between the observed and predicted profiles): lower is better. Median norm JSD is the median of the min-max normalized JSD, where the min JSD is the worst-case JSD (i.e. the JSD of the observed profile against a uniform profile) and the max JSD is the best-case JSD (i.e. 0); higher is better. Both median JSD and median norm JSD are sensitive to read depth: higher read depth results in better metrics.
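The min-max normalization described above reduces to 1 - JSD(observed, predicted) / JSD(observed, uniform) for a single region. The sketch below shows this under stated assumptions: the function names are hypothetical, the divergence is computed in base 2 with NumPy, and the pipeline itself may use a different variant (for example the Jensen-Shannon distance, which is the square root of the divergence).

```python
import numpy as np


def jsd(p, q):
    """Jensen-Shannon divergence in base 2 (range 0 to 1) between two
    non-negative profiles, each with a non-zero sum."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # terms with a == 0 contribute nothing
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def norm_jsd(observed, predicted):
    """Min-max normalized JSD: 0 means no better than a uniform profile,
    1 means a perfect match. Undefined if `observed` is itself uniform."""
    uniform = np.ones(len(observed))
    worst = jsd(observed, uniform)  # the "min JSD" in the text
    best = 0.0                      # the "max JSD" in the text
    return (jsd(observed, predicted) - worst) / (best - worst)


observed = [0, 1, 5, 1, 0]  # toy read profile, for illustration only
print(norm_jsd(observed, observed))          # perfect prediction
print(norm_jsd(observed, [1, 1, 1, 1, 1]))   # uniform prediction
```

The report's median norm JSD would then be the median of this per-region value over all peaks or all non-peaks.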

                 nonpeaks.pearsonr  nonpeaks.mse  peaks.pearsonr  peaks.mse
counts_metrics   0.61               1.36          -0.27           10.44

                 nonpeaks.median_jsd  nonpeaks.median_norm_jsd  peaks.median_jsd  peaks.median_norm_jsd
profile_metrics  0.58                 0.21                      0.39              0.37

TFModisco motifs learnt from the bias model (bias.h5)

TFModisco motifs generated from the profile contribution scores of the bias model. cwm_fwd and cwm_rev are the forward and reverse-complemented consolidated motifs derived from contribution scores in a subset of random peaks. These CWM motifs should be free of any transcription factor (TF) motifs and should contain only bias motifs or random repeats. For each motif, we use TOMTOM to find the top 3 closest matches (match0, match1, match2) in a database containing both MEME TF motifs and heterogeneous enzyme bias motifs that we have repeatedly seen in our datasets. The q-values (qval0, qval1, qval2) should be high (> 0.0001) if the closest hit is a TF motif, indicating that the closest match is not a correct match; this can generally be verified by eye, since the closest match will look nothing like the CWM. The q-values should be low if the closest hit is an enzyme bias motif, and the top match should then visibly resemble the CWM. The first 3-5 motifs in the list below should look like enzyme bias motifs.
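The q-value rule above can be applied to the top match of each pattern as a first-pass triage before inspecting the logos by eye. This is a rough sketch with several assumptions: the function names and labels are hypothetical, bias motifs are recognised by "TN5" appearing in the match name (as in this report's tables; DNase bias motifs would need their own rule), and the same 0.0001 cutoff is reused to decide what counts as a "low" q-value for bias hits, which the text does not specify.

```python
QVAL_THRESHOLD = 1e-4  # cutoff quoted in the report text


def is_bias_motif(name):
    # Assumption: enzyme bias motifs carry "TN5" in their database name.
    return "TN5" in name.upper()


def classify_top_match(match0, qval0):
    """Label a pattern from its closest TOMTOM hit and q-value (hypothetical labels)."""
    if is_bias_motif(match0):
        return "bias" if qval0 <= QVAL_THRESHOLD else "weak bias match"
    return "possible TF motif" if qval0 <= QVAL_THRESHOLD else "no confident match"


# A few (pattern, match0, qval0) rows copied from the profile table of this report.
rows = [
    ("pos__0", "TN5_2", 7.879600e-04),
    ("pos__4", "TN5_6", 6.714620e-22),
    ("pos__8", "ZNF384_MA1125.1", 5.886680e-02),
]
for pattern, match0, qval0 in rows:
    print(pattern, classify_top_match(match0, qval0))
```

Any pattern labelled "possible TF motif" deserves a closer look at its logos. The labels are not a substitute for visual comparison of the CWM against the match.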

What to do if you find an obvious TF motif in the list?
Do not use this bias model, as it will regress out the contribution of the TF motifs (along with the bias motifs) from chrombpnet_nobias.h5. Lower the bias_threshold_factor argument passed to the chrombpnet bias pipeline or chrombpnet bias train command that was used to train the bias model, then retrain a new bias model. For more intuition about this argument, refer to the FAQ section of the wiki.

What to do if you are unsure whether a given CWM motif resembles, for example, the match0 logo?
Get the marginal footprint on the match0 motif (using the chrombpnet footprints command) and make sure that the bias model's footprint is close to that of the controls with no motif inserted. For examples, see the FAQ.

pattern NumSeqs match0 qval0 match1 qval1 match2 qval2
(cwm_fwd, cwm_rev, match0_logo, match1_logo and match2_logo are image columns)
pos__0 12199 TN5_2 7.879600e-04 TN5_1 2.010440e-03 TN5_3 0.010525
pos__1 11227 TN5_2 1.366560e-09 TN5_1 3.963980e-07 TN5_7 0.000165
pos__2 2094 TN5_3 8.212560e-12 TN5_1 4.261020e-04 TN5_2 0.004530
pos__3 1047 TN5_1 4.619900e-02 TN5_7 4.619900e-02 TN5_3 0.220736
pos__4 695 TN5_6 6.714620e-22 TN5_8 8.525420e-01 TEAD1_HUMAN.H11MO.0.A 0.873030
pos__5 651 TN5_3 5.074320e-14 TN5_4 1.767550e-04 TN5_5 0.000177
pos__6 571 TN5_3 3.792580e-07 TN5_4 2.079370e-05 TN5_5 0.000021
pos__7 544 TN5_7 1.095890e-06 TN5_3 9.273860e-03 TN5_4 0.031471
pos__8 252 ZNF384_MA1125.1 5.886680e-02 PRDM6_HUMAN.H11MO.0.C 5.886680e-02 STAT1_MOUSE.H11MO.0.A 0.206505
pos__9 78 TN5_8 1.333560e-09 TN5_2 3.271250e-03 TN5_1 0.042943
pos__10 78 SPDEF_ETS_3 5.358310e-01 SPDEF_ETS_6 5.358310e-01 Hoxc10.mouse_homeodomain_2 0.535831
pos__11 62 TN5_4 2.317790e-01 TN5_5 2.317790e-01 KLF4_MA0039.3 0.857639
pos__12 45 ZNF384_MA1125.1 1.170330e-02 SOX7_HMG_2 5.004340e-01 PRDM6_HUMAN.H11MO.0.C 0.500434
pos__13 24 TN5_1 8.920400e-02 ZN322_HUMAN.H11MO.0.B 8.647850e-01 RHOXF1_homeodomain_1 0.864785

TFModisco motifs generated from the counts contribution scores of the bias model. cwm_fwd and cwm_rev are the forward and reverse-complemented consolidated motifs derived from contribution scores in a subset of random peaks. These motifs should be free of any transcription factor (TF) motifs and should contain only motifs weakly related to bias motifs or random repeats. For each motif, we use TOMTOM to find the top 3 closest matches (match0, match1, match2) in a database containing both MEME TF motifs and heterogeneous enzyme bias motifs that we have repeatedly seen in our datasets. The q-values should be high (> 0.0001) if the closest hit is a TF motif, indicating that the closest match is not a correct match; this can generally be verified by eye by making sure that the closest match looks nothing like the CWM.

What to do if you find an obvious TF motif in the list?
Do not use this bias model, as it will regress out the contribution of the TF motifs (along with the bias motifs) from chrombpnet_nobias.h5. Lower the bias_threshold_factor argument passed to the chrombpnet bias pipeline or chrombpnet bias train command that was used to train the bias model, then retrain a new bias model. For more intuition about this argument, refer to the FAQ section of the wiki.

What to do if you are unsure whether a given CWM motif resembles, for example, the match0 logo?
Get the marginal footprint on the match0 motif (using the chrombpnet footprints command) and make sure that the bias model's footprint is close to that of the controls with no motif inserted. For examples, see the FAQ.

pattern NumSeqs match0 qval0 match1 qval1 match2 qval2
(cwm_fwd, cwm_rev, match0_logo, match1_logo and match2_logo are image columns)
pos__0 113 TN5_2 0.180941 NR2C1_HUMAN.H11MO.0.C 0.180941 NR2C1_MOUSE.H11MO.0.C 0.180941
pos__1 107 SP2_HUMAN.H11MO.0.A 0.000101 SP2_MOUSE.H11MO.0.B 0.000101 SP3_HUMAN.H11MO.0.B 0.001941
pos__2 106 SP1_HUMAN.H11MO.0.A 0.001531 SP2_HUMAN.H11MO.0.A 0.001531 SP2_MOUSE.H11MO.0.B 0.001531
pos__3 102 SP2_HUMAN.H11MO.0.A 0.000183 SP2_MOUSE.H11MO.0.B 0.000183 SP1_MOUSE.H11MO.0.A 0.001056
pos__4 102 SP2_HUMAN.H11MO.0.A 0.000132 SP2_MOUSE.H11MO.0.B 0.000132 ZFX_MOUSE.H11MO.0.B 0.001439
pos__5 97 ZFX_MOUSE.H11MO.0.B 0.153356 TN5_2 0.153356 ZN331_HUMAN.H11MO.0.C 0.153356
pos__6 92 FEZF1_HUMAN.H11MO.0.C 0.064121 PTF1A_HUMAN.H11MO.0.B 0.137296 PTF1A_MOUSE.H11MO.0.A 0.137296
pos__7 88 KLF6_HUMAN.H11MO.0.A 0.324902 KLF6_MOUSE.H11MO.0.B 0.324902 KLF3_HUMAN.H11MO.0.B 0.503348
pos__8 85 TN5_2 0.026556 CONVENTIONAL_TN5_1 0.026556 ZFX_MOUSE.H11MO.0.B 0.244101
pos__9 85 CONVENTIONAL_TN5_1 0.029943 TN5_2 0.059287 ZN335_HUMAN.H11MO.0.A 0.059287
pos__10 82 KLF3_HUMAN.H11MO.0.B 0.985254 KLF3_MOUSE.H11MO.0.A 0.985254 KLF5_MOUSE.H11MO.0.A 0.985254
pos__11 74 MTF1_HUMAN.H11MO.0.C 0.189276 MTF1_MOUSE.H11MO.0.C 0.189276 MTF1_C2H2_1 0.189276
pos__12 71 NKX2-8_MA0673.1 0.462193 NKX2-8_homeodomain_2 0.462193 SMAD4_HUMAN.H11MO.0.B 0.462193
pos__13 69 TN5_2 0.013038 HTF4_MOUSE.H11MO.0.A 0.028343 KLF6_HUMAN.H11MO.0.A 0.028343
pos__14 64 TN5_2 0.016168 ZFX_MOUSE.H11MO.0.B 0.037342 CTCF_MOUSE.H11MO.0.A 0.131206
pos__15 58 TN5_2 0.243212 SP1_HUMAN.H11MO.0.A 0.243212 ZFX_MOUSE.H11MO.0.B 0.348652
pos__16 56 SP2_HUMAN.H11MO.0.A 0.018828 SP2_MOUSE.H11MO.0.B 0.018828 KLF6_HUMAN.H11MO.0.A 0.033111
pos__17 49 NFIB_MOUSE.H11MO.0.C 0.067635 MXI1_HUMAN.H11MO.0.A 0.067635 MXI1_MOUSE.H11MO.0.A 0.067635
pos__18 46 NFIB_MOUSE.H11MO.0.C 0.097052 HES7_MA0822.1 0.132777 HES7_bHLH_1 0.132777
pos__19 42 SP1_MOUSE.H11MO.0.A 0.010706 KLF16_C2H2_1 0.010706 KLF16_MA0741.1 0.010706
pos__20 36 SP2_HUMAN.H11MO.0.A 0.001043 SP2_MOUSE.H11MO.0.B 0.001043 SP3_HUMAN.H11MO.0.B 0.001163
pos__21 33 CTCF_HUMAN.H11MO.0.A 0.585457 CTCF_MOUSE.H11MO.0.A 0.585457 RFX5_HUMAN.H11MO.0.A 0.585457
pos__22 33 TBX20_MOUSE.H11MO.0.C 0.060229 ASCL1_MA1100.1 0.060229 ASCL2_MOUSE.H11MO.0.C 0.060229
pos__23 28 SP2_HUMAN.H11MO.0.A 0.000110 SP2_MOUSE.H11MO.0.B 0.000110 SP1_HUMAN.H11MO.0.A 0.000241
pos__24 24 CONVENTIONAL_TN5_1 0.010219 TN5_2 0.010219 THAP1_HUMAN.H11MO.0.C 0.092492
pos__25 24 ASCL1_MA1100.1 0.003704 BHA15_HUMAN.H11MO.0.B 0.003704 BHA15_MOUSE.H11MO.0.A 0.003704
pos__26 23 HES5_bHLH_2 0.002660 HES5_MA0821.1 0.002660 HES5_bHLH_1 0.002660