Missingness

All values in Hail can be missing.

Expressions deal with missingness in a natural way. For example:

  • a missing value plus another value is always missing.
  • a conditional with a missing predicate is missing.
  • when aggregating a sum of values, the missing values are ignored.
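The three rules above can be sketched in plain Python, using None to stand in for a missing value. This is only an analogy under that assumption, not how Hail is implemented; the helper names add, cond, and agg_sum are hypothetical and are not part of the Hail API.

```python
from typing import Iterable, Optional


def add(a: Optional[int], b: Optional[int]) -> Optional[int]:
    """Hypothetical helper: a missing operand makes the sum missing."""
    if a is None or b is None:
        return None
    return a + b


def cond(predicate: Optional[bool], if_true, if_false):
    """Hypothetical helper: a missing predicate makes the conditional missing."""
    if predicate is None:
        return None
    return if_true if predicate else if_false


def agg_sum(values: Iterable[Optional[int]]) -> int:
    """Hypothetical helper: an aggregated sum skips missing values."""
    return sum(v for v in values if v is not None)


print(add(None, 5))           # None
print(cond(None, 1, 2))       # None
print(agg_sum([1, None, 3]))  # 4
```

Note the asymmetry: element-wise operations propagate missingness, while the aggregation drops missing inputs and still returns a defined result.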

Hail has a collection of primitive operations for dealing with missingness.

To start, let’s create expressions representing missing and non-missing values.

In [1]:
import hail as hl
hl.init()
Running on Apache Spark version 2.2.0
SparkUI available at http://172.31.20.142:4040
Welcome to
     __  __     <>__
    / /_/ /__  __/ /
   / __  / _ `/ / /
  /_/ /_/\_,_/_/_/   version devel-f7631a0c96cd
NOTE: This is a beta version. Interfaces may change
  during the beta period. We recommend pulling
  the latest changes weekly.
In [2]:
na = hl.null(hl.tint32)
x = hl.literal(5)

To evaluate an expression, access its value attribute. Let’s look at a few expressions involving missingness.

In [3]:
print(na.value)
None
In [4]:
print(x.value)
5
In [5]:
hl.is_defined(na).value
Out[5]:
False
In [6]:
hl.is_defined(x).value
Out[6]:
True
In [7]:
hl.is_missing(na).value
Out[7]:
True
In [8]:
hl.or_else(na, x).value
Out[8]:
5
In [9]:
hl.or_else(x, na).value
Out[9]:
5
In [10]:
hl.or_missing(True, x).value
Out[10]:
5
In [11]:
print(hl.or_missing(False, x).value)
None

The above is equivalent to:

In [12]:
print(hl.case().when(False, x).or_missing().value)
None
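As a rough mental model of the or_else and or_missing cells above, the following plain-Python sketch mirrors their observed results, again assuming None stands in for a missing value. The two functions are hypothetical stand-ins that reuse the Hail names for readability; they are not Hail's implementation.

```python
def or_else(a, b):
    """Hypothetical stand-in: return a unless it is missing, else b."""
    return b if a is None else a


def or_missing(predicate, value):
    """Hypothetical stand-in: return value only when predicate is True.

    A False or missing predicate yields a missing result.
    """
    if predicate is None or not predicate:
        return None
    return value


na, x = None, 5
print(or_else(na, x))        # 5
print(or_else(x, na))        # 5
print(or_missing(True, x))   # 5
print(or_missing(False, x))  # None
```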

Missingness propagates up

In Python, None + 5 raises a TypeError. In Hail, operating on a missing value does not raise an error; it produces a missing result.

In [13]:
(x + 5).value
Out[13]:
10
In [14]:
print((na + 5).value)
None
In [15]:
a = hl.array([1, 1, 2, 3, 5, 8, 13, 21])
In [16]:
a[x].value
Out[16]:
8
In [17]:
print(a[na].value)
None
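The propagation seen in the cells above, for both addition and array indexing, can be imitated in plain Python with a small wrapper that returns None whenever any argument is None. This is a sketch under the assumption that None models a missing value; propagate_missing is a hypothetical name, not a Hail function.

```python
import functools
import operator


def propagate_missing(fn):
    """Hypothetical decorator: return None if any argument is None."""
    @functools.wraps(fn)
    def wrapper(*args):
        if any(arg is None for arg in args):
            return None
        return fn(*args)
    return wrapper


# Wrap ordinary Python operations so they propagate missingness.
add = propagate_missing(operator.add)
index = propagate_missing(operator.getitem)

a = [1, 1, 2, 3, 5, 8, 13, 21]
print(add(5, 5))       # 10
print(add(None, 5))    # None
print(index(a, 5))     # 8
print(index(a, None))  # None
```

Wrapping the operation once keeps the missingness check out of each call site, which is roughly the convenience Hail expressions provide by default.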