Interval

class hail.representation.Interval(start, end)

A genomic interval marked by start and end loci.

Parameters:
  • start (Locus) – inclusive start locus
  • end (Locus) – exclusive end locus

Attributes

  • end – Locus object referring to the end of the interval (exclusive).
  • start – Locus object referring to the start of the interval (inclusive).

Methods

  • __init__
  • contains – True if the supplied locus is contained within the interval.
  • overlaps – True if the supplied interval contains any locus in common with this one.
  • parse – Parses a genomic interval from its string representation.

contains(locus)

True if the supplied locus is contained within the interval.

This membership check is left-inclusive, right-exclusive. This means that the interval 1:100-101 includes 1:100 but not 1:101.

Parameters: locus (Locus)
Return type: bool

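The left-inclusive, right-exclusive rule can be modeled in plain Python. This is a sketch, not Hail's implementation. `LocusSketch`, `IntervalSketch`, and `contig_rank` are hypothetical names. The contig order it uses (1 to 22 numerically, then X, Y, MT) is an assumption and may not match Hail's own ordering.

```python
from typing import NamedTuple

# Hypothetical contig order assumed for this sketch: autosomes numerically,
# then X, Y, MT. Hail's actual ordering may differ.
_NON_NUMERIC_RANK = {"X": 23, "Y": 24, "MT": 25}


def contig_rank(contig: str) -> int:
    """Return a sort key for a contig name under the assumed ordering."""
    if contig.isdigit():
        return int(contig)
    return _NON_NUMERIC_RANK[contig]


class LocusSketch(NamedTuple):
    contig: str
    position: int

    def key(self) -> tuple:
        # Loci are ordered by contig first, then by position.
        return (contig_rank(self.contig), self.position)


class IntervalSketch(NamedTuple):
    start: LocusSketch  # inclusive
    end: LocusSketch    # exclusive

    def contains(self, locus: LocusSketch) -> bool:
        # Left-inclusive, right-exclusive: start <= locus < end.
        return self.start.key() <= locus.key() < self.end.key()


# The docstring's example: 1:100-101 includes 1:100 but not 1:101.
iv = IntervalSketch(LocusSketch("1", 100), LocusSketch("1", 101))
print(iv.contains(LocusSketch("1", 100)))  # True
print(iv.contains(LocusSketch("1", 101)))  # False
```

Loci are compared as (contig, position) pairs. Under this assumed ordering, an interval spanning several contigs (for example 1:0 up to 22:max int) therefore contains every locus on chromosome 7 but none on chromosome X.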
end

Locus object referring to the end of the interval (exclusive).

Return type: Locus

overlaps(interval)

True if the supplied interval contains any locus in common with this one.

The statement

>>> interval1.overlaps(interval2)

is equivalent to

>>> interval1.contains(interval2.start) or interval2.contains(interval1.start)

Parameters: interval (Interval)
Return type: bool

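For non-empty intervals on a single contig, the formulation above agrees with the usual half-open overlap test `a.start < b.end and b.start < a.end`. The sketch below checks this exhaustively over small ranges. Intervals here are hypothetical `(start, end)` position pairs, not Hail `Interval` objects. Empty intervals (start == end) are excluded because the two tests can disagree on them.

```python
from itertools import product


def contains(start: int, end: int, pos: int) -> bool:
    # Left-inclusive, right-exclusive membership on a single contig.
    return start <= pos < end


def overlaps_doc(a, b) -> bool:
    # The formulation given in the docstring.
    return contains(*a, b[0]) or contains(*b, a[0])


def overlaps_std(a, b) -> bool:
    # Standard half-open overlap test.
    return a[0] < b[1] and b[0] < a[1]


def check_equivalence(max_pos: int = 8) -> bool:
    """Compare both tests over every pair of non-empty intervals within [0, max_pos]."""
    intervals = [(s, e) for s in range(max_pos) for e in range(s + 1, max_pos + 1)]
    return all(overlaps_doc(a, b) == overlaps_std(a, b)
               for a, b in product(intervals, repeat=2))


print(check_equivalence())                   # True
print(overlaps_doc((100, 101), (101, 102)))  # False: adjacent, no shared position
```

Adjacent intervals such as 1:100-101 and 1:101-102 do not overlap. Their shared boundary, 1:101, is excluded from the first interval.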
static parse(string)

Parses a genomic interval from its string representation.

Examples:

>>> interval_1 = Interval.parse('X:100005-X:150020')
>>> interval_2 = Interval.parse('16:29500000-30200000')
>>> interval_3 = Interval.parse('16:29.5M-30.2M')  # same as interval_2
>>> interval_4 = Interval.parse('16:30000000-END')
>>> interval_5 = Interval.parse('16:30M-END')  # same as interval_4
>>> interval_6 = Interval.parse('1-22')  # autosomes
>>> interval_7 = Interval.parse('X')  # all of chromosome X

There are several acceptable representations.

CHR1:POS1-CHR2:POS2 is the fully specified representation, and we use this to define the various shortcut representations.

In a POS field, start (Start, START) stands for 0.

In a POS field, end (End, END) stands for max int.

In a POS field, the qualifiers m (M) and k (K) multiply the given number by 1,000,000 and 1,000, respectively. 1.6K is short for 1600, and 29M is short for 29000000.

CHR:POS1-POS2 stands for CHR:POS1-CHR:POS2

CHR1-CHR2 stands for CHR1:START-CHR2:END

CHR stands for CHR:START-CHR:END

Note that the start locus must precede the end locus.

Return type: Interval

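The shorthand rules can be sketched as a function that expands a string into the fully specified CHR1:POS1-CHR2:POS2 form. `parse_pos`, `expand_interval`, and `MAX_INT` are hypothetical names. The sketch assumes that "max int" means 2**31 - 1 and that contig names contain neither '-' nor ':'. It does not build Hail objects, and it does not check that the start precedes the end.

```python
from decimal import Decimal, InvalidOperation

# Assumption: "max int" is the 32-bit signed maximum (Scala/Java Int.MaxValue).
MAX_INT = 2**31 - 1

_MULTIPLIERS = {"k": 1_000, "K": 1_000, "m": 1_000_000, "M": 1_000_000}


def parse_pos(field: str) -> int:
    """Interpret one POS field: start/end keywords, integers, k/m suffixes."""
    if field in ("start", "Start", "START"):
        return 0
    if field in ("end", "End", "END"):
        return MAX_INT
    multiplier = 1
    if field and field[-1] in _MULTIPLIERS:
        multiplier = _MULTIPLIERS[field[-1]]
        field = field[:-1]
    try:
        value = Decimal(field) * multiplier  # Decimal keeps 29.5M or 1.6K exact
    except InvalidOperation:
        raise ValueError(f"invalid position: {field!r}") from None
    if value != value.to_integral_value():
        raise ValueError(f"position is not a whole number: {field!r}")
    return int(value)


def expand_interval(text: str):
    """Expand a shorthand interval string into (chr1, pos1, chr2, pos2)."""
    left, sep, right = text.partition("-")
    if not sep:
        # CHR  ->  CHR:START-CHR:END
        return (left, 0, left, MAX_INT)
    if ":" not in left:
        # CHR1-CHR2  ->  CHR1:START-CHR2:END
        if ":" in right:
            raise ValueError(f"unsupported form: {text!r}")
        return (left, 0, right, MAX_INT)
    chr1, _, pos1 = left.partition(":")
    if ":" in right:
        # CHR1:POS1-CHR2:POS2 (fully specified)
        chr2, _, pos2 = right.partition(":")
    else:
        # CHR:POS1-POS2  ->  CHR:POS1-CHR:POS2
        chr2, pos2 = chr1, right
    return (chr1, parse_pos(pos1), chr2, parse_pos(pos2))


print(expand_interval("16:29.5M-30.2M"))  # ('16', 29500000, '16', 30200000)
print(expand_interval("X"))               # ('X', 0, 'X', 2147483647)
```

Using Decimal rather than float keeps fractional shorthands such as 29.5M and 1.6K exact before they are converted to integers.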
start

Locus object referring to the start of the interval (inclusive).

Return type: Locus