Datasets
Warning
All functionality described on this page is experimental. Datasets and methods are subject to change.
This page describes genetic datasets that are hosted in a public repository
on Google Cloud Platform and are available for use through Hail’s
load_dataset() function.
To load a dataset from this repository into a Hail pipeline, provide the name,
version, and reference genome build of the dataset you would like to use as
strings to the load_dataset() function. The available dataset names,
versions, and reference genome builds are listed in the table below.
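
As a rough illustration of the call described above, the following sketch checks a requested name, version, and reference genome build against a few rows copied from the dataset table on this page, then passes them to load_dataset() as strings. The AVAILABLE dictionary and the validate_request() and load() helpers are hypothetical names introduced only for this example. The sketch also assumes that load_dataset() is reached through hl.experimental and accepts the keyword arguments name, version, and reference_genome. Because this functionality is experimental, the exact signature may differ between Hail releases.

```python
# A few rows copied from the dataset table on this page:
# dataset name -> (version, supported reference genome builds).
AVAILABLE = {
    "1000_genomes": ("phase3", {"GRCh37", "GRCh38"}),
    "GERP_scores": ("GERP++", {"GRCh37", "GRCh38"}),
    "GTEx_eQTL_associations": ("v7", {"GRCh37"}),
}


def validate_request(name, version, reference_genome):
    """Hypothetical helper: check the three strings against AVAILABLE."""
    if name not in AVAILABLE:
        raise ValueError(f"unknown dataset name: {name!r}")
    known_version, builds = AVAILABLE[name]
    if version != known_version:
        raise ValueError(f"{name} has no version {version!r}")
    if reference_genome not in builds:
        raise ValueError(f"{name} is not available for {reference_genome!r}")
    return name, version, reference_genome


def load(name, version, reference_genome):
    """Hypothetical wrapper: validate first, then defer to Hail."""
    validate_request(name, version, reference_genome)

    # Imported lazily so the validation above works without Hail installed.
    import hail as hl

    # Assumption: load_dataset() lives under hl.experimental and takes
    # these three keyword arguments, all given as strings.
    return hl.experimental.load_dataset(
        name=name,
        version=version,
        reference_genome=reference_genome,
    )
```

Under these assumptions, load("1000_genomes", "phase3", "GRCh38") would request the phase 3 1000 Genomes data on GRCh38, while validate_request("GTEx_eQTL_associations", "v7", "GRCh38") raises ValueError because the table lists only GRCh37 for that dataset.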
Name | Versions | Reference Genomes |
---|---|---|
1000_genomes | phase3 | GRCh37, GRCh38 |
Ensembl_CDS_regions | release_93 | GRCh37, GRCh38 |
Ensembl_cDNA_regions | release_93 | GRCh37, GRCh38 |
Ensembl_human_reference_genome | release_93 | GRCh37, GRCh38 |
Ensembl_low_complexity_regions | release_93 | GRCh37, GRCh38 |
Ensembl_ncRNA_regions | release_93 | GRCh37, GRCh38 |
Ensembl_peptide_sequences | release_93 | GRCh37, GRCh38 |
GERP_elements | GERP++ | GRCh37, GRCh38 |
GERP_scores | GERP++ | GRCh37, GRCh38 |
GTEx_eQTL_associations | v7 | GRCh37 |
GTEx_exons | v7 | GRCh37, GRCh38 |
GTEx_genes | v7 | GRCh37, GRCh38 |
GTEx_transcripts | v7 | GRCh37, GRCh38 |