Experimental
This module serves two functions: as a staging area for extensions of Hail not ready for inclusion in the main package, and as a library of lightly reviewed community submissions.
Contribution Guidelines
Submissions from the community are welcome! The criteria for inclusion in the experimental module are loose and subject to change:
- Function docstrings are required. Hail uses NumPy style docstrings.
- Tests are not required, but are encouraged. If you do include tests, they must run in no more than a few seconds. Place each test as a method on the Tests class in python/hail/tests/test_experimental.py.
- Code style is not strictly enforced, aside from egregious violations. We do recommend using autopep8 though!
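As a rough illustration of the guidelines above, a submission might look like the sketch below. The function `allele_frequency` and the test name are hypothetical and not part of Hail. In the real repository, the test would be added as a method on the existing Tests class in python/hail/tests/test_experimental.py rather than on a newly defined class.

```python
import unittest


def allele_frequency(alt_allele_counts, ploidy=2):
    """Compute the alternate allele frequency from per-sample counts.

    Hypothetical helper used only to illustrate a NumPy style docstring.

    Parameters
    ----------
    alt_allele_counts : list of int
        Number of alternate alleles carried by each sample.
    ploidy : int
        Number of allele copies per sample.

    Returns
    -------
    float
        Alternate allele frequency in the interval [0, 1].
    """
    if not alt_allele_counts:
        raise ValueError("alt_allele_counts must not be empty")
    return sum(alt_allele_counts) / (ploidy * len(alt_allele_counts))


class Tests(unittest.TestCase):
    # Stand-in for the existing Tests class; the test uses a tiny in-memory
    # input so that it finishes well within a few seconds.
    def test_allele_frequency(self):
        self.assertAlmostEqual(allele_frequency([0, 1, 2, 1]), 0.5)
```

The docstring follows the NumPy section layout (Parameters, Returns), and the test needs no external data, which keeps its runtime short.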
Genetics Methods
ld_score(entry_expr, annotation_exprs, …) | Calculate LD scores.
hail.experimental.ld_score(entry_expr, annotation_exprs, position_expr, window_size) → hail.table.Table
Calculate LD scores.
Example
>>> import hail as hl

>>> # Load genetic data into MatrixTable
>>> mt = hl.import_plink(bed='data/ldsc.bed',
...                      bim='data/ldsc.bim',
...                      fam='data/ldsc.fam')

>>> # Create locus-keyed Table with numeric variant annotations
>>> ht = hl.import_table('data/ldsc.annot',
...                      types={'BP': hl.tint,
...                             'binary': hl.tfloat,
...                             'continuous': hl.tfloat})
>>> ht = ht.annotate(locus=hl.locus(ht.CHR, ht.BP))
>>> ht = ht.key_by('locus')

>>> # Annotate MatrixTable with external annotations
>>> mt = mt.annotate_rows(univariate_annotation=1,
...                       binary_annotation=ht[mt.locus].binary,
...                       continuous_annotation=ht[mt.locus].continuous)

>>> # Annotate MatrixTable with alt allele count stats
>>> mt = mt.annotate_rows(stats=hl.agg.stats(mt.GT.n_alt_alleles()))

>>> # Create standardized genotype entry; missing calls are set to 0.0
>>> mt = mt.annotate_entries(GT_std=hl.or_else(
...     (mt.GT.n_alt_alleles() - mt.stats.mean)/mt.stats.stdev, 0.0))

>>> # Calculate LD scores using standardized genotypes
>>> ht_scores = hl.experimental.ld_score(entry_expr=mt.GT_std,
...                                      annotation_exprs=[
...                                          mt.univariate_annotation,
...                                          mt.binary_annotation,
...                                          mt.continuous_annotation],
...                                      position_expr=mt.cm_position,
...                                      window_size=1)
Warning
ld_score() will fail if entry_expr results in any missing values. The special float value nan is not considered a missing value.
Further reading
For more in-depth discussion of LD scores, see:
- LD Score regression distinguishes confounding from polygenicity in genome-wide association studies (Bulik-Sullivan et al., 2015)
- Partitioning heritability by functional annotation using genome-wide association summary statistics (Finucane et al., 2015)
Parameters:
- entry_expr (NumericExpression) – Expression for entries of genotype matrix (e.g. mt.GT.n_alt_alleles()).
- annotation_exprs (NumericExpression or list of NumericExpression) – Annotation expression(s) to partition LD scores.
- position_expr (NumericExpression) – Expression for position of variant (e.g. mt.cm_position or mt.locus.position).
- window_size (int or float) – Size of variant window used to calculate LD scores, in units of position.
Returns: Table – Locus-keyed table with LD scores for each variant and annotation.
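To make the quantity being computed more concrete, the following is a minimal pure-Python sketch of the textbook LD score definition: for each variant, the annotation-weighted sum of squared correlations with all variants inside the position window. It is not Hail's implementation, which may differ (for example by applying a bias adjustment to the squared correlations or by computing on distributed matrices). The helper names `standardize`, `correlation`, and `ld_scores`, the inclusive window boundary, and the use of the population standard deviation are all assumptions of this sketch. Missing calls are represented as `None` and set to 0.0 after standardization, mirroring the `hl.or_else(..., 0.0)` step in the example above.

```python
import math


def standardize(alt_counts):
    """Standardize one variant's alt allele counts; missing (None) becomes 0.0.

    Assumes the variant is polymorphic, so the standard deviation is non-zero.
    """
    observed = [c for c in alt_counts if c is not None]
    mean = sum(observed) / len(observed)
    # Population standard deviation over observed calls (assumption).
    stdev = math.sqrt(sum((c - mean) ** 2 for c in observed) / len(observed))
    return [0.0 if c is None else (c - mean) / stdev for c in alt_counts]


def correlation(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def ld_scores(std_genotypes, positions, annotation, window_size):
    """For each variant j, sum annotation[k] * r_jk^2 over variants k in the window."""
    scores = []
    for j, pos_j in enumerate(positions):
        total = 0.0
        for k, pos_k in enumerate(positions):
            # Window measured in the same units as the positions (assumed inclusive).
            if abs(pos_j - pos_k) <= window_size:
                r = correlation(std_genotypes[j], std_genotypes[k])
                total += annotation[k] * r * r
        scores.append(total)
    return scores


# Three variants (rows) by four samples (columns); None marks a missing call.
raw = [
    [0, 1, 2, 1],
    [0, 1, 2, None],
    [1, 0, 1, 2],
]
std_genotypes = [standardize(variant) for variant in raw]
positions = [0.0, 0.5, 3.0]      # e.g. centimorgan positions
univariate = [1.0, 1.0, 1.0]     # every variant counts equally
scores = ld_scores(std_genotypes, positions, univariate, window_size=1)
print(scores)
```

In this toy input the first two variants sit within one position unit of each other and are perfectly correlated after standardization, so each contributes to the other's score, while the third variant has only itself in its window. Passing a 0/1 annotation list instead of `univariate` restricts each sum to the variants carrying that annotation, which is how scores are partitioned.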