Python API
This is the API documentation for Hail, and provides detailed information on the Python programming interface.
Use import hail as hl to access this functionality.
hail.Table | Hail’s distributed implementation of a dataframe or SQL table.
hail.GroupedTable | Table grouped by row that can be aggregated into a new table.
hail.MatrixTable | Hail’s distributed implementation of a structured matrix.
hail.GroupedMatrixTable | Matrix table grouped by row or column that can be aggregated into a new matrix table.
Modules
Module functions
hail.init(sc=None, app_name='Hail', master=None, local='local[*]', log='hail.log', quiet=False, append=False, min_block_size=1, branching_factor=50, tmp_dir='/tmp', default_reference='GRCh37', force_ir=False)
Initialize Hail and Spark.
Parameters:
- sc (pyspark.SparkContext, optional) – Spark context. By default, a Spark context will be created.
- app_name (str) – Spark application name.
- master (str) – Spark master.
- local (str) – Local-mode master, used if master is not defined here or in the Spark configuration.
- log (str) – Local path for Hail log file. Does not currently support distributed file systems like Google Storage, S3, or HDFS.
- quiet (bool) – Print fewer log messages.
- append (bool) – Append to the end of the log file.
- min_block_size (int) – Minimum file block size in MB.
- branching_factor (int) – Branching factor for tree aggregation.
- tmp_dir (str) – Temporary directory for Hail files. Must be a network-visible file path.
- default_reference (str) – Default reference genome. Either 'GRCh37' or 'GRCh38'.
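The parameters above can be assembled as in the following minimal sketch. The helpers build_init_kwargs and start_hail are hypothetical names introduced for illustration and are not part of Hail; only the keyword names passed to hail.init come from the signature documented here. The sketch assumes Hail and Spark are installed when start_hail is actually called.

```python
# Hypothetical helpers for illustration only; they are not part of Hail.
VALID_REFERENCES = ("GRCh37", "GRCh38")


def build_init_kwargs(log_path="hail.log", reference="GRCh37", quiet=False):
    """Collect keyword arguments for hail.init, checking the reference name.

    Only keyword names that appear in the documented init signature are used.
    """
    if reference not in VALID_REFERENCES:
        raise ValueError(
            f"default_reference must be one of {VALID_REFERENCES}, got {reference!r}"
        )
    return {
        "app_name": "Hail",
        "log": log_path,  # must be a local path, not a distributed file system
        "quiet": quiet,
        "default_reference": reference,
    }


def start_hail(**overrides):
    """Initialize Hail and Spark with the arguments built above."""
    # Imported lazily so build_init_kwargs can be used without Hail installed.
    import hail as hl

    hl.init(**build_init_kwargs(**overrides))
    return hl
```

Calling start_hail(reference='GRCh38', quiet=True) would then initialize Hail with GRCh38 as the default reference and fewer log messages. Every argument not set explicitly keeps the default shown in the signature.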
hail.default_reference()
Returns the default reference genome, which is 'GRCh37' unless a different default_reference was passed to init.
Returns: ReferenceGenome
hail.get_reference(name) → hail.genetics.reference_genome.ReferenceGenome
Returns the reference genome corresponding to name.
Notes
Hail’s built-in references are 'GRCh37' and 'GRCh38'. The contig names and lengths come from the GATK resource bundle: human_g1k_v37.dict and Homo_sapiens_assembly38.dict.
If name='default', the value of default_reference() is returned.
Parameters: name (str) – Name of a previously loaded reference genome or one of Hail’s built-in references: 'GRCh37', 'GRCh38', and 'default'.
Returns: ReferenceGenome
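The name lookup described in the notes can be modeled as in the sketch below. It is a plain-Python illustration of the documented behavior, not Hail's implementation: resolve_reference_name and fetch_reference are hypothetical names, and the loaded argument stands in for whatever reference genomes were previously loaded. The fetch_reference wrapper assumes Hail is installed and has been initialized.

```python
# Plain-Python model of the documented lookup rules; not Hail's own code.
BUILT_IN_REFERENCES = ("GRCh37", "GRCh38")


def resolve_reference_name(name, default_name="GRCh37", loaded=()):
    """Return the reference name that a lookup of `name` would resolve to.

    'default' maps to the current default reference; built-in names and
    previously loaded names map to themselves; anything else is an error.
    """
    if name == "default":
        return default_name
    if name in BUILT_IN_REFERENCES or name in loaded:
        return name
    raise KeyError(f"unknown reference genome: {name!r}")


def fetch_reference(name="default"):
    """Fetch the ReferenceGenome object from Hail itself."""
    # Imported lazily so the model above can run without Hail installed.
    import hail as hl

    return hl.get_reference(name)
```

With Hail initialized, fetch_reference('default') and hl.default_reference() should refer to the same reference genome, as the notes above state.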