Experimental

This module serves two functions: as a staging area for Hail extensions that are not yet ready for inclusion in the main package, and as a library of lightly reviewed community submissions.

Contribution Guidelines

Submissions from the community are welcome! The criteria for inclusion in the experimental module are loose and subject to change:

  1. Function docstrings are required. Hail uses NumPy-style docstrings.
  2. Tests are not required, but are encouraged. If you do include tests, they must run in no more than a few seconds. Add each test as a method on the Tests class in python/hail/tests/test_experimental.py.
  3. Code style is not strictly enforced, aside from egregious violations. We do recommend using autopep8 though!
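
As a rough illustration of guidelines 1 and 2, the sketch below pairs a NumPy-style docstring with a fast unit test. The function `clamp` is hypothetical and exists only to show the layout; the real Tests class in test_experimental.py may be set up differently from the bare `unittest.TestCase` assumed here.

```python
import unittest


def clamp(value, lower, upper):
    """Restrict a number to a closed interval.

    Hypothetical helper, used only to illustrate the docstring layout.

    Parameters
    ----------
    value : float
        Number to restrict.
    lower : float
        Smallest allowed result.
    upper : float
        Largest allowed result.

    Returns
    -------
    float
        `value` if it lies in ``[lower, upper]``, otherwise the nearer bound.
    """
    return max(lower, min(value, upper))


class Tests(unittest.TestCase):
    # Assumption: the real Tests class may inherit from something else.
    def test_clamp(self):
        # Small in-memory inputs keep the test well under a second.
        self.assertEqual(clamp(5.0, 0.0, 1.0), 1.0)
        self.assertEqual(clamp(-2.0, 0.0, 1.0), 0.0)
        self.assertEqual(clamp(0.5, 0.0, 1.0), 0.5)
```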

Genetics Methods

ld_score(entry_expr, annotation_exprs, …) Calculate LD scores.
hail.experimental.ld_score(entry_expr, annotation_exprs, position_expr, window_size) → hail.table.Table

Calculate LD scores.

Example

>>> # Load genetic data into MatrixTable
>>> mt = hl.import_plink(bed='data/ldsc.bed',
...                      bim='data/ldsc.bim',
...                      fam='data/ldsc.fam')
>>> # Create locus-keyed Table with numeric variant annotations
>>> ht = hl.import_table('data/ldsc.annot',
...                      types={'BP': hl.tint,
...                             'binary': hl.tfloat,
...                             'continuous': hl.tfloat})
>>> ht = ht.annotate(locus=hl.locus(ht.CHR, ht.BP))
>>> ht = ht.key_by('locus')
>>> # Annotate MatrixTable with external annotations
>>> mt = mt.annotate_rows(univariate_annotation=1,
...                       binary_annotation=ht[mt.locus].binary,
...                       continuous_annotation=ht[mt.locus].continuous)
>>> # Annotate MatrixTable with alt allele count stats
>>> mt = mt.annotate_rows(stats=hl.agg.stats(mt.GT.n_alt_alleles()))
>>> # Create standardized genotype entry
>>> mt = mt.annotate_entries(GT_std=hl.or_else(
...     (mt.GT.n_alt_alleles() - mt.stats.mean)/mt.stats.stdev, 0.0))
>>> # Calculate LD scores using standardized genotypes
>>> ht_scores = hl.experimental.ld_score(entry_expr=mt.GT_std,
...                                      annotation_exprs=[
...                                         mt.univariate_annotation,
...                                         mt.binary_annotation,
...                                         mt.continuous_annotation],
...                                      position_expr=mt.cm_position,
...                                      window_size=1)

Warning

ld_score() will fail if entry_expr results in any missing values. The special float value nan is not considered a missing value.
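
The distinction between missing and nan can be sketched in plain Python. This is an analogy, not Hail code: `None` stands in for a Hail missing value, and the local `or_else` function is a stand-in written for this example that mimics how hl.or_else is used in the example above. The point is that filling missing values leaves nan untouched, so nan values need a separate check (in Hail, hl.is_nan would be the natural candidate).

```python
import math


def or_else(value, default):
    """Return `default` only when `value` is missing (None in this analogy)."""
    return default if value is None else value


# One ordinary value, one missing value, one nan, one ordinary value.
values = [0.5, None, math.nan, -1.2]

# Filling missing values replaces None but does not touch nan.
filled = [or_else(v, 0.0) for v in values]

n_missing = sum(v is None for v in filled)
n_nan = sum(math.isnan(v) for v in filled)
print(n_missing, n_nan)
```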

Further reading

For more in-depth discussion of LD scores, see:

Parameters:
  • entry_expr (NumericExpression) – Expression for entries of genotype matrix (e.g. mt.GT.n_alt_alleles()).
  • annotation_exprs (NumericExpression or list of NumericExpression) – Annotation expression(s) to partition LD scores.
  • position_expr (NumericExpression) – Expression for position of variant (e.g. mt.cm_position or mt.locus.position).
  • window_size (int or float) – Size of variant window used to calculate LD scores, in units of position.
Returns:

Table – Locus-keyed table with LD scores for each variant and annotation.
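
To make the roles of the four parameters concrete, here is a toy pure-Python sketch of a windowed LD score. It assumes the score for a variant is the sum of squared correlations with every variant within window_size of its position, each weighted by that neighbor's annotation value. This is an illustration under that assumption, not Hail's implementation, which may differ in details such as bias adjustment of the correlations or how window boundaries are handled. The function name `toy_ld_scores` and all inputs are hypothetical.

```python
def toy_ld_scores(columns, positions, annotation, window_size):
    """Toy windowed LD score for one annotation.

    columns     : per-variant lists of standardized genotypes
                  (mean 0 and population variance 1 within each variant)
    positions   : per-variant positions, in the same units as window_size
    annotation  : per-variant numeric annotation values
    window_size : largest position difference that still counts as a neighbor
    """
    n_samples = len(columns[0])
    scores = []
    for j, x_j in enumerate(columns):
        total = 0.0
        for k, x_k in enumerate(columns):
            # Skip variants outside the window around variant j.
            if abs(positions[j] - positions[k]) > window_size:
                continue
            # For standardized inputs, the correlation is the mean product.
            r = sum(a * b for a, b in zip(x_j, x_k)) / n_samples
            total += r * r * annotation[k]
        scores.append(total)
    return scores


# Variants 0 and 1 are perfectly correlated and close together;
# variant 2 is far away, so only its self-correlation counts.
columns = [[1.0, -1.0, 1.0, -1.0],
           [1.0, -1.0, 1.0, -1.0],
           [1.0, 1.0, -1.0, -1.0]]
positions = [0.0, 0.5, 5.0]
annotation = [1.0, 1.0, 1.0]  # analogous to univariate_annotation=1

scores = toy_ld_scores(columns, positions, annotation, window_size=1)
print(scores)  # [2.0, 2.0, 1.0]
```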