Basic Methods for Working with Hail Data
Get Data Into and Out of Hail
Import
Import data from a non-Hail format into a Hail format, using one of the import_* methods.
description: | Import a .tsv file as a table. |
---|---|
code: | >>> table = hl.import_table('data/kt_example1.tsv', impute=True, key='ID')
>>> table.show()
+-------+-------+-----+-------+-------+-------+-------+-------+
|    ID |    HT | SEX |     X |     Z |    C1 |    C2 |    C3 |
+-------+-------+-----+-------+-------+-------+-------+-------+
| int32 | int32 | str | int32 | int32 | int32 | int32 | int32 |
+-------+-------+-----+-------+-------+-------+-------+-------+
|     1 |    65 | M   |     5 |     4 |     2 |    50 |     5 |
|     2 |    72 | M   |     6 |     3 |     2 |    61 |     1 |
|     3 |    70 | F   |     7 |     3 |    10 |    81 |    -5 |
|     4 |    60 | F   |     8 |     2 |    11 |    90 |   -10 |
+-------+-------+-----+-------+-------+-------+-------+-------+
|
dependencies: |
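When the column types are known in advance, they can be passed explicitly instead of being imputed. The following is a minimal sketch, assuming the same file and column names as the example above. The helper name `import_example_table` is hypothetical, and Hail is imported inside the function so that defining the helper does not itself require a Hail installation.

```python
def import_example_table(path='data/kt_example1.tsv'):
    """Import a TSV file as a Hail table with explicit column types.

    Hypothetical helper for illustration; the default path and the column
    names are taken from the example above.
    """
    import hail as hl  # deferred so that defining the helper does not load Hail

    # Columns not listed in `types` are read as strings unless impute=True.
    return hl.import_table(
        path,
        types={'ID': hl.tint32, 'HT': hl.tint32, 'SEX': hl.tstr},
        key='ID',
    )
```

Calling `import_example_table().describe()` should then list `ID` and `HT` as `int32`, while the columns left out of `types` remain `str`.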
Export
Export Hail data to a non-Hail format, using one of the export_* methods.
description: | Export a matrix table as a VCF. |
---|---|
code: | >>> hl.export_vcf(mt, 'output/example.vcf.bgz')
|
dependencies: |
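Tables can also be written out as delimited text with their own `export()` method. The following is a minimal sketch, assuming `ht` is an existing Hail table; the helper name `export_table_as_tsv` and the default output path are hypothetical and chosen only for illustration.

```python
def export_table_as_tsv(ht, path='output/example.tsv.bgz'):
    """Export a Hail table as tab-delimited text.

    Hypothetical helper; `ht` is assumed to be a hail.Table and the
    default path is a placeholder.
    """
    # Table.export writes one line per row, with a header line by default.
    # A path ending in .bgz is written with block-gzip compression.
    ht.export(path)
    return path
```

With a table `ht` in scope, `export_table_as_tsv(ht)` writes the file and returns the path it wrote to.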
Write
Write data in a Hail format to disk using one of the write() methods, e.g. Table.write() or MatrixTable.write().
description: | Write a matrix table to disk. |
---|---|
code: | >>> mt.write('output/example.mt')
|
dependencies: |
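Data written in Hail's native format can be read back with the matching read function. The following is a minimal sketch of a write-then-read round trip, assuming `mt` is an existing matrix table. The helper name `write_and_reload` is hypothetical, and `overwrite=True` is an assumption made here so that the call can be repeated against the same path.

```python
def write_and_reload(mt, path='output/example.mt'):
    """Write a matrix table to disk and read it back.

    Hypothetical helper; `mt` is assumed to be a hail.MatrixTable and the
    default path is a placeholder.
    """
    import hail as hl  # deferred so that defining the helper does not load Hail

    # overwrite=True replaces any existing data at `path` (assumption: that
    # is acceptable here); without it, writing to an existing path fails.
    mt.write(path, overwrite=True)

    # Reading the written copy gives a matrix table backed by the files on disk.
    return hl.read_matrix_table(path)
```

With a matrix table `mt` in scope, `mt = write_and_reload(mt)` continues the pipeline from the copy on disk rather than from the original source.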
Examine your data
Explore the schema
Matrix Table
description: | Get information about the fields and keys of a matrix table. |
---|---|
code: | >>> mt.describe()
----------------------------------------
Global fields:
    'populations': array<str>
----------------------------------------
Column fields:
    's': str
    'is_case': bool
    'pheno': struct {
        is_case: bool,
        is_female: bool,
        age: float64,
        height: float64,
        blood_pressure: float64,
        cohort_name: str
    }
----------------------------------------
Row fields:
    'locus': locus<GRCh37>
    'alleles': array<str>
    'rsid': str
    'qual': float64
----------------------------------------
Entry fields:
    'GT': call
    'AD': array<int32>
    'DP': int32
    'GQ': int32
    'PL': array<int32>
----------------------------------------
Column key: ['s']
Row key: ['locus', 'alleles']
Partition key: ['locus']
----------------------------------------
|
dependencies: |
Table
description: | Get information about the fields and keys of a table. |
---|---|
code: | >>> ht.describe()
----------------------------------------
Global fields:
    None
----------------------------------------
Row fields:
    'locus': locus<GRCh37>
    'alleles': array<str>
----------------------------------------
Key: ['locus', 'alleles']
----------------------------------------
|
dependencies: |
Expression
description: | Get information about a specific field in a table or matrix table. |
---|---|
code: | >>> mt.s.describe()
--------------------------------------------------------
Type:
    str
--------------------------------------------------------
Source:
    <hail.matrixtable.MatrixTable object at 0x60e42f518>
Index:
    ['column']
--------------------------------------------------------
|
dependencies: | |
understanding: | We can select fields from a table or matrix table with an expression like mt.s, and then call describe() on that expression.
|
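A field can be selected either as an attribute (`mt.s`) or by name with index syntax (`mt['s']`); the latter is convenient when the field name is held in a variable. The following is a minimal sketch, assuming `dataset` is a Hail table or matrix table; the helper name `describe_field` is hypothetical.

```python
def describe_field(dataset, field_name):
    """Print type and source information for one field, selected by name.

    Hypothetical helper; `dataset` is assumed to be a hail.Table or
    hail.MatrixTable that has a field called `field_name`.
    """
    # dataset['s'] selects the same field expression as dataset.s
    expr = dataset[field_name]

    # describe() prints the expression's type, source object, and index
    expr.describe()
    return expr
```

With the matrix table from the example above, `describe_field(mt, 's')` prints the same information as `mt.s.describe()`.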
View your data locally
Table
description: | View the first n rows of a table. |
---|---|
code: | >>> ht.show(5)
+-------+-------+-----+-------+-------+-------+-------+-------+
|    ID |    HT | SEX |     X |     Z |    C1 |    C2 |    C3 |
+-------+-------+-----+-------+-------+-------+-------+-------+
| int32 | int32 | str | int32 | int32 | int32 | int32 | int32 |
+-------+-------+-----+-------+-------+-------+-------+-------+
|     1 |    65 | M   |     5 |     4 |     2 |    50 |     5 |
|     2 |    72 | M   |     6 |     3 |     2 |    61 |     1 |
|     3 |    70 | F   |     7 |     3 |    10 |    81 |    -5 |
|     4 |    60 | F   |     8 |     2 |    11 |    90 |   -10 |
+-------+-------+-----+-------+-------+-------+-------+-------+
|
dependencies: |
Matrix Table
description: | View the columns, rows, or entries of a matrix table. |
---|---|
code: | >>> mt.rows().show()
>>> mt.cols().show()
>>> mt.entries().show()
|
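understanding: | Unlike tables, matrix tables do not have a show() method; call show() on one of the component tables returned by rows(), cols(), or entries() instead. |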
dependencies: |
|
Expression
description: | View an expression. |
---|---|
code: | >>> mt.rsid.show()
+---------------+--------------+-------------+
| locus         | alleles      | rsid        |
+---------------+--------------+-------------+
| locus<GRCh37> | array<str>   | str         |
+---------------+--------------+-------------+
| 20:10579373   | ["C","T"]    | rs78689061  |
| 20:13695607   | ["T","G"]    | rs34414644  |
| 20:13698129   | ["G","A"]    | rs78509779  |
| 20:14306896   | ["G","A"]    | rs6042672   |
| 20:14306953   | ["G","T"]    | rs6079391   |
| 20:15948325   | ["AG","A"]   | NA          |
| 20:15948326   | ["GAAA","G"] | NA          |
| 20:17479423   | ["T","C"]    | rs185188648 |
| 20:17600357   | ["G","A"]    | rs11960     |
| 20:17640833   | ["A","C"]    | NA          |
+---------------+--------------+-------------+
|
dependencies: | |
understanding: |
|