ReferenceGenome
class hail.genetics.ReferenceGenome(name, contigs, lengths, x_contigs=[], y_contigs=[], mt_contigs=[], par=[])
An object that represents a reference genome.
Examples
>>> import hail as hl
>>> contigs = ["1", "X", "Y", "MT"]
>>> lengths = {"1": 249250621, "X": 155270560, "Y": 59373566, "MT": 16569}
>>> par = [("X", 60001, 2699521)]
>>> my_ref = hl.ReferenceGenome("my_ref", contigs, lengths, "X", "Y", "MT", par)
Parameters:
- name (str) – Name of reference. Must be unique and NOT one of Hail’s predefined references: 'GRCh37', 'GRCh38', and 'default'.
- contigs (list of str) – Contig names.
- lengths (dict of str to int) – Dict of contig names to contig lengths.
- x_contigs (str or list of str) – Contigs to be treated as X chromosomes.
- y_contigs (str or list of str) – Contigs to be treated as Y chromosomes.
- mt_contigs (str or list of str) – Contigs to be treated as mitochondrial DNA.
- par (list of tuple of (str, int, int)) – List of tuples with (contig, start, end).
Attributes
- contigs – Contig names.
- lengths – Dict of contig name to contig length.
- mt_contigs – Mitochondrial contigs.
- name – Name of reference genome.
- par – Pseudoautosomal regions.
- x_contigs – X contigs.
- y_contigs – Y contigs.
Methods
- __init__ – Initialize self.
- add_liftover – Register a chain file for liftover.
- add_sequence – Load the reference sequence from a FASTA file.
- contig_length – Contig length.
- from_fasta_file – Create reference genome from a FASTA file.
- has_liftover – True if a liftover chain file is available from this reference genome to the destination reference.
- has_sequence – True if the reference sequence has been loaded.
- read – Load reference genome from a JSON file.
- remove_liftover – Remove liftover to dest_reference_genome.
- remove_sequence – Remove the reference sequence.
- write – Write this reference genome to a file in JSON format.
add_liftover(chain_file, dest_reference_genome)
Register a chain file for liftover.
Notes
This method can only be run once per reference genome. Use has_liftover() to test whether a chain file has been registered.
The chain file format is described here.
Chain files are hosted on Google Cloud for Hail’s built-in references:
GRCh37 to GRCh38 gs://hail-common/references/grch37_to_grch38.over.chain.gz
GRCh38 to GRCh37 gs://hail-common/references/grch38_to_grch37.over.chain.gz
Public download links are available here.
Parameters:
- chain_file (str) – Path to chain file. Can be compressed (GZIP) or uncompressed.
- dest_reference_genome (str or ReferenceGenome) – Reference genome to convert to.
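Because add_liftover() can only be run once, it may help to guard the call with has_liftover(). The following is a minimal sketch of that pattern, not part of Hail: ensure_liftover and the _StubReferenceGenome class are hypothetical names, and the stub exists only so the example runs without Hail installed. With Hail, a real ReferenceGenome would be passed instead of the stub.

```python
def ensure_liftover(rg, chain_file, dest_reference_genome):
    """Register chain_file on rg unless a liftover to the destination exists.

    Returns True if this call registered the chain file, False otherwise.
    """
    if rg.has_liftover(dest_reference_genome):
        return False
    rg.add_liftover(chain_file, dest_reference_genome)
    return True


class _StubReferenceGenome:
    """Hypothetical stand-in exposing only the two methods used above."""

    def __init__(self):
        self._liftovers = {}

    def has_liftover(self, dest_reference_genome):
        return dest_reference_genome in self._liftovers

    def add_liftover(self, chain_file, dest_reference_genome):
        # Mimic the documented "once per reference genome" restriction.
        if dest_reference_genome in self._liftovers:
            raise RuntimeError("liftover already registered")
        self._liftovers[dest_reference_genome] = chain_file


rg = _StubReferenceGenome()
chain = "gs://hail-common/references/grch37_to_grch38.over.chain.gz"
first = ensure_liftover(rg, chain, "GRCh38")   # registers the chain file
second = ensure_liftover(rg, chain, "GRCh38")  # skipped, already registered
```

The helper returns a flag rather than raising, so it can be called repeatedly in notebook-style workflows.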
add_sequence(fasta_file, index_file)
Load the reference sequence from a FASTA file.
Notes
This method can only be run once per reference genome. Use has_sequence() to test whether a sequence is loaded.
FASTA and index files are hosted on Google Cloud for Hail’s built-in references:
GRCh37
- FASTA file: gs://hail-common/references/human_g1k_v37.fasta.gz
- Index file: gs://hail-common/references/human_g1k_v37.fasta.fai
GRCh38
- FASTA file: gs://hail-common/references/Homo_sapiens_assembly38.fasta.gz
- Index file: gs://hail-common/references/Homo_sapiens_assembly38.fasta.fai
Public download links are available here.
Parameters:
- fasta_file (str) – Path to FASTA file. Can be compressed (GZIP) or uncompressed.
- index_file (str) – Path to FASTA index file. Must be uncompressed.
contig_length(contig)
Contig length.
Parameters: contig (str) – Contig name.
Returns: int – Length of contig.
contigs
Contig names.
Returns: list of str
classmethod from_fasta_file(name, fasta_file, index_file, x_contigs=[], y_contigs=[], mt_contigs=[], par=[])
Create reference genome from a FASTA file.
Parameters:
- name (str) – Name for new reference genome.
- fasta_file (str) – Path to FASTA file. Can be compressed (GZIP) or uncompressed.
- index_file (str) – Path to FASTA index file. Must be uncompressed.
- x_contigs (str or list of str) – Contigs to be treated as X chromosomes.
- y_contigs (str or list of str) – Contigs to be treated as Y chromosomes.
- mt_contigs (str or list of str) – Contigs to be treated as mitochondrial DNA.
- par (list of tuple of (str, int, int)) – List of tuples with (contig, start, end).
Returns: ReferenceGenome
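To see what a FASTA index contributes, the sketch below reads contig names and lengths from index text. It assumes the common samtools faidx layout, in which each line is tab-separated and the first two columns are the contig name and its length. It is not a description of how from_fasta_file is implemented; contigs_and_lengths_from_fai is a hypothetical helper, and the offsets in the sample text are made-up illustrative values.

```python
import io


def contigs_and_lengths_from_fai(lines):
    """Return (contigs, lengths) parsed from the lines of a .fai index."""
    contigs = []
    lengths = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        name, length = fields[0], int(fields[1])
        contigs.append(name)      # preserve the order given in the index
        lengths[name] = length
    return contigs, lengths


# Two illustrative index lines: name, length, offset, line bases, line width.
sample_fai = io.StringIO(
    "1\t249250621\t52\t60\t61\n"
    "MT\t16569\t253404903\t70\t71\n"
)
contigs, lengths = contigs_and_lengths_from_fai(sample_fai)
```

The resulting list and dict have the same shapes as the contigs and lengths arguments of the ReferenceGenome constructor.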
has_liftover(dest_reference_genome)
True if a liftover chain file is available from this reference genome to the destination reference.
Parameters: dest_reference_genome (str or ReferenceGenome)
Returns: bool
lengths
Dict of contig name to contig length.
Returns: dict of str to int
mt_contigs
Mitochondrial contigs.
Returns: list of str
name
Name of reference genome.
Returns: str
classmethod read(path)
Load reference genome from a JSON file.
Notes
The JSON file must have the following format:
{"name": "my_reference_genome",
 "contigs": [{"name": "1", "length": 10000000},
             {"name": "2", "length": 20000000},
             {"name": "X", "length": 19856300},
             {"name": "Y", "length": 78140000},
             {"name": "MT", "length": 532}],
 "xContigs": ["X"],
 "yContigs": ["Y"],
 "mtContigs": ["MT"],
 "par": [{"start": {"contig": "X", "position": 60001}, "end": {"contig": "X", "position": 2699521}},
         {"start": {"contig": "Y", "position": 10001}, "end": {"contig": "Y", "position": 2649521}}]
}
name must be unique and must not match any of Hail’s pre-instantiated references: 'GRCh37', 'GRCh38', and 'default'.
The contig names in xContigs, yContigs, and mtContigs must be present in contigs. The intervals listed in par must have contigs in either xContigs or yContigs and must have positions between 0 and the contig length given in contigs.
Parameters: path (str) – Path to JSON file.
Returns: ReferenceGenome
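The constraints listed for the JSON file can be checked before calling read. The sketch below is a standalone illustration using only the standard library. validate_reference_json is a hypothetical helper, not a Hail function, and treating the position bounds as inclusive is an assumption.

```python
import json

PREDEFINED_NAMES = {"GRCh37", "GRCh38", "default"}


def validate_reference_json(text):
    """Return a list of problems found; an empty list means all checks passed."""
    doc = json.loads(text)
    problems = []
    if doc["name"] in PREDEFINED_NAMES:
        problems.append(f"name {doc['name']!r} clashes with a predefined reference")
    lengths = {c["name"]: c["length"] for c in doc["contigs"]}
    # xContigs, yContigs and mtContigs must all be present in contigs.
    for key in ("xContigs", "yContigs", "mtContigs"):
        for contig in doc.get(key, []):
            if contig not in lengths:
                problems.append(f"{key} entry {contig!r} is not in contigs")
    # PAR endpoints must lie on an X or Y contig and within its length.
    sex_contigs = set(doc.get("xContigs", [])) | set(doc.get("yContigs", []))
    for interval in doc.get("par", []):
        for endpoint in ("start", "end"):
            contig = interval[endpoint]["contig"]
            position = interval[endpoint]["position"]
            if contig not in sex_contigs:
                problems.append(f"par {endpoint} contig {contig!r} is not an X or Y contig")
            elif contig in lengths and not 0 <= position <= lengths[contig]:
                problems.append(f"par {endpoint} position {position} is outside {contig!r}")
    return problems


good = json.dumps({
    "name": "my_reference_genome",
    "contigs": [{"name": "1", "length": 10000000}, {"name": "X", "length": 19856300}],
    "xContigs": ["X"], "yContigs": [], "mtContigs": [],
    "par": [{"start": {"contig": "X", "position": 60001},
             "end": {"contig": "X", "position": 2699521}}],
})
bad = json.dumps({
    "name": "GRCh37",
    "contigs": [{"name": "1", "length": 100}],
    "xContigs": ["X"],
    "par": [{"start": {"contig": "1", "position": 5},
             "end": {"contig": "1", "position": 500}}],
})
good_problems = validate_reference_json(good)
bad_problems = validate_reference_json(bad)
```

Collecting problems in a list instead of raising on the first one makes it easier to fix a hand-written file in a single pass.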
remove_liftover(dest_reference_genome)
Remove liftover to dest_reference_genome.
Parameters: dest_reference_genome (str or ReferenceGenome)
Returns: bool
write(output)
Write this reference genome to a file in JSON format.
Examples
>>> my_rg = hl.ReferenceGenome("new_reference", ["x", "y", "z"], {"x": 500, "y": 300, "z": 200})
>>> my_rg.write("output/new_reference.json")
Notes
Use read to reimport the exported reference genome in a new HailContext session.
Parameters: output (str) – Path of JSON file to write.
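For orientation, the sketch below builds JSON text in the layout documented under read, starting from constructor-style arguments. It is an illustration only: reference_to_json is a hypothetical helper, and whether write emits exactly these keys, ordering, or whitespace is an assumption.

```python
import json


def reference_to_json(name, contigs, lengths,
                      x_contigs=(), y_contigs=(), mt_contigs=(), par=()):
    """Serialize constructor-style arguments into the documented JSON layout."""
    doc = {
        "name": name,
        # One entry per contig, in the order given.
        "contigs": [{"name": c, "length": lengths[c]} for c in contigs],
        "xContigs": list(x_contigs),
        "yContigs": list(y_contigs),
        "mtContigs": list(mt_contigs),
        # Each (contig, start, end) tuple becomes a start/end pair.
        "par": [
            {"start": {"contig": c, "position": start},
             "end": {"contig": c, "position": end}}
            for c, start, end in par
        ],
    }
    return json.dumps(doc, indent=2)


text = reference_to_json(
    "new_reference", ["x", "y", "z"], {"x": 500, "y": 300, "z": 200}
)
parsed = json.loads(text)
```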
x_contigs
X contigs.
Returns: list of str
y_contigs
Y contigs.
Returns: list of str