BlockMatrix

class hail.linalg.BlockMatrix(jbm)[source]

Hail’s block-distributed matrix of tfloat64 elements.

Danger

This functionality is experimental. It may not be tested as well as other parts of Hail and the interface is subject to change.

A block matrix is a distributed analogue of a two-dimensional NumPy ndarray with shape (n_rows, n_cols) and NumPy dtype float64. Import the class with:

>>> from hail.linalg import BlockMatrix

Under the hood, block matrices are partitioned like a checkerboard into square blocks whose side length is a common block size. Blocks in the final row or column of blocks may be truncated, so the block size need not evenly divide the matrix dimensions. Block size defaults to the value given by default_block_size().
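The checkerboard layout can be illustrated without Hail. The sketch below is plain Python arithmetic, not Hail's implementation; the function names block_grid and block_shape are hypothetical.

```python
def block_grid(n_rows, n_cols, block_size):
    """Number of block rows and block columns in the checkerboard."""
    # Integer ceiling division: a partial final block still counts as a block.
    n_block_rows = -(-n_rows // block_size)
    n_block_cols = -(-n_cols // block_size)
    return n_block_rows, n_block_cols


def block_shape(n_rows, n_cols, block_size, block_row, block_col):
    """Shape of one block; blocks in the final block row or column may be truncated."""
    height = min(block_size, n_rows - block_row * block_size)
    width = min(block_size, n_cols - block_col * block_size)
    return height, width


# A 10 x 7 matrix with block size 4 has a 3 x 2 grid of blocks.
grid = block_grid(10, 7, 4)
# The bottom-right block is truncated in both directions.
corner = block_shape(10, 7, 4, 2, 1)
print(grid, corner)  # (3, 2) (2, 3)
```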

Operations and broadcasting

The core operations are consistent with NumPy: +, -, *, and / for element-wise addition, subtraction, multiplication, and division; @ for matrix multiplication; T for transpose; and ** for element-wise exponentiation to a scalar power.

For element-wise binary operations, each operand may be a block matrix, an ndarray, or a scalar (int or float). For matrix multiplication, each operand may be a block matrix or an ndarray. If either operand is a block matrix, the result is a block matrix. Binary operations between block matrices require that both operands have the same block size.

To interoperate with block matrices, ndarray operands must be one or two dimensional with dtype convertible to float64. One-dimensional ndarrays of shape (n) are promoted to two-dimensional ndarrays of shape (1, n), i.e. a single row.

Block matrices support broadcasting of +, -, *, and / between matrices of different shapes, consistent with the NumPy broadcasting rules. There is one exception: block matrices do not currently support element-wise “outer product” of a single row and a single column, although the same effect can be achieved for * by using @.
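As a rough illustration of the shape rules above, the following sketch works on shapes only and does not call Hail. The helper names promote and broadcast_shape are hypothetical, and the error messages are invented for the example.

```python
def promote(shape):
    """Promote a 1-D shape (n,) to a single row (1, n); 2-D shapes pass through."""
    if len(shape) == 1:
        return (1, shape[0])
    if len(shape) == 2:
        return tuple(shape)
    raise ValueError("operands must be one or two dimensional")


def broadcast_shape(left, right):
    """Result shape of an element-wise binary operation, or raise if unsupported."""
    left, right = promote(left), promote(right)
    result = []
    for a, b in zip(left, right):
        # NumPy rule: dimensions must match, or one of them must be 1.
        if a != b and 1 not in (a, b):
            raise ValueError(f"incompatible shapes {left} and {right}")
        result.append(max(a, b))
    result = tuple(result)
    if result != left and result != right:
        # A single row combined with a single column: the "outer product" case.
        raise ValueError("outer-product broadcasting is not supported; use @ for *")
    return result


print(broadcast_shape((10, 20), (20,)))    # (10, 20): the 1-D operand acts as one row
print(broadcast_shape((10, 20), (10, 1)))  # (10, 20): a single column is broadcast
```

Calling broadcast_shape((10, 1), (1, 20)) raises ValueError in this sketch, mirroring the exception described above.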

Warning

For binary operations, if the first operand is an ndarray and the second operand is a block matrix, the result will be an ndarray of block matrices. To achieve the desired behavior for + and *, place the block matrix operand first; for -, /, and @, first convert the ndarray to a block matrix using from_numpy().

Indexing and slicing

Block matrices also support NumPy-style 2-dimensional indexing and slicing, with two differences. First, slices start:stop:step must be non-empty with positive step. Second, even if only one index is a slice, the resulting block matrix is still 2-dimensional.

For example, for a block matrix bm with 10 rows and 10 columns:

  • bm[0, 0] is the element in row 0 and column 0 of bm.
  • bm[0:1, 0] is a block matrix with 1 row, 1 column, and element bm[0, 0].
  • bm[2, :] is a block matrix with 1 row, 10 columns, and elements from row 2 of bm.
  • bm[:3, -1] is a block matrix with 3 rows, 1 column, and the first 3 elements of the last column of bm.
  • bm[::2, ::2] is a block matrix with 5 rows, 5 columns, and all evenly-indexed elements of bm.
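The shapes in the list above can be reproduced with a small shape calculator. This is a sketch of the stated rules only; result_shape and axis_length are hypothetical helpers, not part of Hail.

```python
def axis_length(index, size):
    """Number of positions selected along one axis by an int or a slice."""
    if isinstance(index, int):
        if not -size <= index < size:
            raise IndexError("index out of range")
        return 1
    start, stop, step = index.indices(size)
    if step <= 0:
        raise ValueError("slice step must be positive")
    length = len(range(start, stop, step))
    if length == 0:
        raise ValueError("slice must be non-empty")
    return length


def result_shape(row_index, col_index, n_rows=10, n_cols=10):
    """None stands for a scalar element; otherwise the result is always 2-dimensional."""
    if isinstance(row_index, int) and isinstance(col_index, int):
        return None
    return axis_length(row_index, n_rows), axis_length(col_index, n_cols)


print(result_shape(0, 0))                                      # None (a single element)
print(result_shape(2, slice(None)))                            # (1, 10)
print(result_shape(slice(None, 3), -1))                        # (3, 1)
print(result_shape(slice(None, None, 2), slice(None, None, 2)))  # (5, 5)
```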

Use filter(), filter_rows(), and filter_cols() to subset to non-slice subsets of rows and columns, e.g. to rows [0, 2, 5].

Block-sparse representation

By default, block matrices compute and store all blocks explicitly. However, some applications involve block matrices in which:

  • some blocks consist entirely of zeroes.
  • some blocks are not of interest.

For example, statistical geneticists often want to compute and manipulate a banded correlation matrix capturing “linkage disequilibrium” between nearby variants along the genome. In this case, working with the full correlation matrix for tens of millions of variants would be prohibitively expensive, and in any case, entries far from the diagonal are either not of interest or ought to be zeroed out before downstream linear algebra.

To enable such computations, block matrices do not require that all blocks be realized explicitly. Implicit (dropped) blocks behave as blocks of zeroes, so we refer to a block matrix in which at least one block is implicitly zero as a block-sparse matrix. Otherwise, we say the matrix is block-dense. The property is_sparse encodes this state.

Dropped blocks are not stored in memory or on write(). In fact, blocks that are dropped prior to an action like export() or to_numpy() are never computed in the first place, nor are any blocks of upstream operands on which only dropped blocks depend! In addition, linear algebra is accelerated by avoiding, for example, explicit addition of or multiplication by blocks of zeroes.

Block-sparse matrices may be created with sparsify_band(), sparsify_rectangles(), sparsify_row_intervals(), and sparsify_triangle().

The following methods naturally propagate block-sparsity:

  • Addition and subtraction “union” realized blocks.
  • Element-wise multiplication “intersects” realized blocks.
  • Transpose “transposes” realized blocks.
  • abs() and sqrt() preserve the realized blocks.
  • sum() along an axis realizes those blocks for which at least one block summand is realized.
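One way to picture these propagation rules is to track only the set of realized block coordinates. The sketch below models each matrix as a set of (block_row, block_col) pairs; the function names are hypothetical and no Hail code is involved.

```python
def add_blocks(left, right):
    """Addition and subtraction realize the union of realized blocks."""
    return left | right


def multiply_blocks(left, right):
    """Element-wise multiplication realizes the intersection of realized blocks."""
    return left & right


def transpose_blocks(blocks):
    """Transpose swaps the block row and block column of each realized block."""
    return {(j, i) for i, j in blocks}


def sum_blocks(blocks, axis):
    """Summing along an axis realizes a block if at least one summand block is realized."""
    if axis == 0:
        return {(0, j) for _, j in blocks}  # result is a single row of blocks
    return {(i, 0) for i, _ in blocks}      # result is a single column of blocks


# Realized blocks of a block-upper-triangular and a block-lower-triangular 2 x 2 grid.
upper = {(0, 0), (0, 1), (1, 1)}
lower = {(0, 0), (1, 0), (1, 1)}

print(sorted(add_blocks(upper, lower)))       # [(0, 0), (0, 1), (1, 0), (1, 1)]
print(sorted(multiply_blocks(upper, lower)))  # [(0, 0), (1, 1)]
```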

The following methods always result in a block-dense matrix:

The following methods fail if any operand is block-sparse, but can be forced by first applying densify():

  • Element-wise division between two block matrices.
  • Multiplication by a scalar or broadcasted vector which includes an infinite or nan value.
  • Division by a scalar or broadcasted vector which includes a zero, infinite or nan value.
  • Division of a scalar or broadcasted vector by a block matrix.
  • Element-wise exponentiation by a negative exponent.
  • Natural logarithm, log().

Attributes

T Matrix transpose.
block_size Block size.
is_sparse Returns True if block-sparse.
n_cols Number of columns.
n_rows Number of rows.
shape Shape of matrix.

Methods

__init__ Initialize self.
abs Element-wise absolute value.
cache Persist this block matrix in memory.
default_block_size Default block side length.
densify Restore all dropped blocks as explicit blocks of zeros.
diagonal Extracts diagonal elements as ndarray.
entries Returns a table with the indices and value of each block matrix entry.
export Exports a stored block matrix as a delimited text file.
export_rectangles Export rectangular regions from a stored block matrix to delimited text files.
fill Creates a block matrix with all elements the same value.
filter Filters matrix rows and columns.
filter_cols Filters matrix columns.
filter_rows Filters matrix rows.
from_entry_expr Create a block matrix using a matrix table entry expression.
from_numpy Distributes a NumPy ndarray as a block matrix.
fromfile Creates a block matrix from a binary file.
log Element-wise natural logarithm.
persist Persists this block matrix in memory or on disk.
random Creates a block matrix with standard normal or uniform random entries.
read Reads a block matrix.
sparsify_band Filter to a diagonal band.
sparsify_rectangles Filter to blocks overlapping the union of rectangular regions.
sparsify_row_intervals Creates a block-sparse matrix by filtering to an interval for each row.
sparsify_triangle Filter to the upper or lower triangle.
sqrt Element-wise square root.
sum Sums array elements over one or both axes.
to_numpy Collects the block matrix into a NumPy ndarray.
tofile Collects and writes data to a binary file.
unpersist Unpersists this block matrix from memory/disk.
write Writes the block matrix.
write_from_entry_expr Writes a block matrix from a matrix table entry expression.
T

Matrix transpose.

Returns:BlockMatrix
abs()[source]

Element-wise absolute value.

Returns:BlockMatrix
block_size

Block size.

Returns:int
cache()[source]

Persist this block matrix in memory.

Notes

This method is an alias for persist("MEMORY_ONLY").

Returns:BlockMatrix – Cached block matrix.
static default_block_size()[source]

Default block side length.

densify()[source]

Restore all dropped blocks as explicit blocks of zeros.

Returns:BlockMatrix
diagonal()[source]

Extracts diagonal elements as ndarray.

Returns:numpy.ndarray
entries()[source]

Returns a table with the indices and value of each block matrix entry.

Examples

>>> import numpy as np
>>> block_matrix = BlockMatrix.from_numpy(np.array([[5, 7], [2, 8]]), 2)
>>> entries_table = block_matrix.entries()
>>> entries_table.show()
+-------+-------+-------------+
|     i |     j |       entry |
+-------+-------+-------------+
| int64 | int64 |     float64 |
+-------+-------+-------------+
|     0 |     0 | 5.00000e+00 |
|     0 |     1 | 7.00000e+00 |
|     1 |     0 | 2.00000e+00 |
|     1 |     1 | 8.00000e+00 |
+-------+-------+-------------+

Notes

The resulting table may be filtered, aggregated, and queried, but should only be directly exported to disk if the block matrix is very small.

For block-sparse matrices, only realized blocks are included. To force inclusion of zeroes in dropped blocks, apply densify() first.

The resulting table has the following fields:

  • i (tint64, key field) – Row index.
  • j (tint64, key field) – Column index.
  • entry (tfloat64) – Value of entry.
Returns:Table – Table with a row for each entry.
static export(path_in, path_out, delimiter='\t', header=None, add_index=False, parallel=None, partition_size=None, entries='full')[source]

Exports a stored block matrix as a delimited text file.

Examples

Consider the following matrix.

>>> import numpy as np
>>> nd = np.array([[1.0, 0.8, 0.7],
...                [0.8, 1.0, 0.3],
...                [0.7, 0.3, 1.0]])
>>> BlockMatrix.from_numpy(nd).write('output/example.bm', force_row_major=True)

Export the full matrix as a file with tab-separated values:

>>> BlockMatrix.export('output/example.bm', 'output/example.tsv')

Export the upper-triangle of the matrix as a block gzipped file of comma-separated values.

>>> BlockMatrix.export(path_in='output/example.bm',
...                    path_out='output/example.csv.bgz',
...                    delimiter=',',
...                    entries='upper')

Export the full matrix with row indices in parallel as a folder of gzipped files, each with a header line for columns idx, A, B, and C.

>>> BlockMatrix.export(path_in='output/example.bm',
...                    path_out='output/example.gz',
...                    header=' '.join(['idx', 'A', 'B', 'C']),
...                    add_index=True,
...                    parallel='header_per_shard',
...                    partition_size=2)

This produces two compressed files which uncompress to:

idx A   B   C
0   1.0 0.8 0.7
1   0.8 1.0 0.3
idx A   B   C
2   0.7 0.3 1.0

Warning

The block matrix must be stored in row-major format, as results from BlockMatrix.write() with force_row_major=True and from BlockMatrix.write_from_entry_expr(). Otherwise, export() will fail.

Notes

The five options for entries are illustrated below.

Full:

1.0 0.8 0.7
0.8 1.0 0.3
0.7 0.3 1.0

Lower triangle:

1.0
0.8 1.0
0.7 0.3 1.0

Strict lower triangle:

0.8
0.7 0.3

Upper triangle:

1.0 0.8 0.7
1.0 0.3
1.0

Strict upper triangle:

0.8 0.7
0.3
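The five selections can be reproduced with a short pure-Python sketch. The helper export_rows is hypothetical; it omits rows with no retained entries to match the illustrations, which is an assumption about how export() treats such rows.

```python
def export_rows(matrix, entries="full"):
    """Return the entries of each row that the given `entries` option would keep."""
    keep = {
        "full": lambda i, j: True,
        "lower": lambda i, j: j <= i,
        "strict_lower": lambda i, j: j < i,
        "upper": lambda i, j: j >= i,
        "strict_upper": lambda i, j: j > i,
    }[entries]
    rows = []
    for i, row in enumerate(matrix):
        kept = [value for j, value in enumerate(row) if keep(i, j)]
        if kept:  # assumption: rows with nothing to write are skipped
            rows.append(kept)
    return rows


nd = [[1.0, 0.8, 0.7],
      [0.8, 1.0, 0.3],
      [0.7, 0.3, 1.0]]

for option in ["full", "lower", "strict_lower", "upper", "strict_upper"]:
    print(option)
    for row in export_rows(nd, option):
        print(" ".join(str(value) for value in row))
```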

The number of columns must be less than \(2^{31}\).

The number of partitions (file shards) exported equals the ceiling of n_rows / partition_size. By default, there is one partition per row of blocks in the block matrix. The number of partitions should be at least the number of cores for efficient parallelism. Setting the partition size to an exact (rather than approximate) divisor or multiple of the block size reduces superfluous shuffling of data.
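The shard count is simple arithmetic. The sketch below (hypothetical helper names, no Hail calls) computes it and, assuming rows are grouped consecutively as in the example above, the row range held by each shard.

```python
def n_shards(n_rows, partition_size):
    """Ceiling of n_rows / partition_size, using exact integer arithmetic."""
    return -(-n_rows // partition_size)


def shard_row_ranges(n_rows, partition_size):
    """Half-open row ranges [start, stop) per shard, assuming consecutive grouping."""
    return [(start, min(start + partition_size, n_rows))
            for start in range(0, n_rows, partition_size)]


# The 3-row example exported with partition_size=2 yields two shards.
print(n_shards(3, 2))          # 2
print(shard_row_ranges(3, 2))  # [(0, 2), (2, 3)]
```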

If parallel is None, these file shards are then serially concatenated by one core into one file, a slow process. See other options below.

It is highly recommended to export large files with a .bgz extension, which will use a block gzipped compression codec. These files can be read natively with Python’s gzip.open and R’s read.table.
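Block gzip files are made of concatenated gzip-compatible blocks, which is why Python's gzip.open can read them. The sketch below uses a stand-in: it concatenates two ordinary gzip members rather than real block-gzip output, and the file name is hypothetical.

```python
import gzip
import os
import tempfile

# Two independently compressed pieces, standing in for the blocks of a .bgz file.
part_1 = gzip.compress(b"1.0\t0.8\t0.7\n0.8\t1.0\t0.3\n")
part_2 = gzip.compress(b"0.7\t0.3\t1.0\n")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.tsv.bgz")  # hypothetical file name
    with open(path, "wb") as f:
        f.write(part_1 + part_2)

    # gzip.open reads through all concatenated members transparently.
    with gzip.open(path, "rt") as f:
        rows = [[float(x) for x in line.rstrip("\n").split("\t")] for line in f]

print(rows)
```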

Parameters:
  • path_in (str) – Path to input block matrix, stored row-major on disk.
  • path_out (str) – Path for export. Use extension .gz for gzip or .bgz for block gzip.
  • delimiter (str) – Column delimiter.
  • header (str, optional) – If provided, header is prepended before the first row of data.
  • add_index (bool) – If True, add an initial column with the absolute row index.
  • parallel (str, optional) – If 'header_per_shard', create a folder with one file per partition, each with a header if provided. If 'separate_header', create a folder with one file per partition without a header; write the header, if provided, in a separate file. If None, serially concatenate the header and all partitions into one file; export will be slower. If header is None then 'header_per_shard' and 'separate_header' are equivalent.
  • partition_size (int, optional) – Number of rows to group per partition for export. Default given by block size of the block matrix.
  • entries (str) – Describes which entries to export. One of: 'full', 'lower', 'strict_lower', 'upper', 'strict_upper'.
static export_rectangles(path_in, path_out, rectangles, delimiter='\t', n_partitions=None)[source]

Export rectangular regions from a stored block matrix to delimited text files.

Examples

Consider the following block matrix:

>>> import numpy as np
>>> nd = np.array([[ 1.0,  2.0,  3.0,  4.0],
...                [ 5.0,  6.0,  7.0,  8.0],
...                [ 9.0, 10.0, 11.0, 12.0],
...                [13.0, 14.0, 15.0, 16.0]])

Filter to the three rectangles and write.

>>> rectangles = [[0, 1, 0, 1], [0, 3, 0, 2], [1, 2, 0, 4]]
>>>
>>> (BlockMatrix.from_numpy(nd)
...     .sparsify_rectangles(rectangles)
...     .write('output/example.bm', force_row_major=True))

Export the three rectangles to TSV files:

>>> BlockMatrix.export_rectangles(
...     path_in='output/example.bm',
...     path_out='output/example',
...     rectangles=rectangles)

This produces three files in the folder output/example.

The first file is rect-0_0-1-0-1:

1.0

The second file is rect-1_0-3-0-2:

1.0 2.0
5.0 6.0
9.0 10.0

The third file is rect-2_1-2-0-4:

5.0 6.0 7.0 8.0

Warning

The block matrix must be stored in row-major format, as results from BlockMatrix.write() with force_row_major=True and from BlockMatrix.write_from_entry_expr(). Otherwise, export_rectangles() will fail.

Notes

This method exports rectangular regions of a stored block matrix to delimited text files, in parallel by region.

The block matrix can be sparse so long as all blocks overlapping the rectangles are present, i.e. this method does not currently support implicit zeros.

Each rectangle is encoded as a list of length four of the form [row_start, row_stop, col_start, col_stop], where starts are inclusive and stops are exclusive. These must satisfy 0 <= row_start <= row_stop <= n_rows and 0 <= col_start <= col_stop <= n_cols.

For example [0, 2, 1, 3] corresponds to the row-index range [0, 2) and column-index range [1, 3), i.e. the elements at positions (0, 1), (0, 2), (1, 1), and (1, 2).

Each file name encodes the index of the rectangle in rectangles and the bounds as formatted in the example.
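The rectangle encoding and the file naming can be sketched in plain Python. The helper names are hypothetical, and the name pattern is inferred from the three example file names above rather than taken from a specification.

```python
def rectangle_positions(rectangle):
    """Positions (i, j) covered by [row_start, row_stop, col_start, col_stop]."""
    row_start, row_stop, col_start, col_stop = rectangle
    # Starts are inclusive and stops are exclusive, like Python ranges.
    return [(i, j)
            for i in range(row_start, row_stop)
            for j in range(col_start, col_stop)]


def rectangle_file_name(index, rectangle):
    """File name pattern inferred from the example: rect-<index>_<bounds joined by '-'>."""
    return "rect-{}_{}".format(index, "-".join(str(bound) for bound in rectangle))


print(rectangle_positions([0, 2, 1, 3]))  # [(0, 1), (0, 2), (1, 1), (1, 2)]

rectangles = [[0, 1, 0, 1], [0, 3, 0, 2], [1, 2, 0, 4]]
names = [rectangle_file_name(i, r) for i, r in enumerate(rectangles)]
print(names)  # ['rect-0_0-1-0-1', 'rect-1_0-3-0-2', 'rect-2_1-2-0-4']
```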

The number of rectangles must be less than \(2^{29}\).

Parameters:
  • path_in (str) – Path to input block matrix, stored row-major on disk.
  • path_out (str) – Path for folder of exported files.
  • rectangles (list of list of int) – List of rectangles of the form [row_start, row_stop, col_start, col_stop].
  • delimiter (str) – Column delimiter.
  • n_partitions (int, optional) – Maximum parallelism of export. Defaults to (and cannot exceed) the number of rectangles.
classmethod fill(n_rows, n_cols, value, block_size=None)[source]

Creates a block matrix with all elements the same value.

Examples

Create a block matrix with 10 rows, 20 columns, and all elements equal to 1.0:

>>> bm = BlockMatrix.fill(10, 20, 1.0)
Parameters:
  • n_rows (int) – Number of rows.
  • n_cols (int) – Number of columns.
  • value (float) – Value of all elements.
  • block_size (int, optional) – Block size. Default given by default_block_size().
Returns:

BlockMatrix

filter(rows_to_keep, cols_to_keep)[source]

Filters matrix rows and columns.

Notes

This method has the same effect as BlockMatrix.filter_cols() followed by BlockMatrix.filter_rows() (or vice versa), but filters the block matrix in a single pass which may be more efficient.
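The equivalence can be checked on nested lists. The functions below are hypothetical stand-ins that mimic the filtering semantics on a small in-memory matrix; they say nothing about how Hail performs the single pass.

```python
def filter_rows(matrix, rows_to_keep):
    """Keep the listed rows, in the given (increasing) order."""
    return [matrix[i] for i in rows_to_keep]


def filter_cols(matrix, cols_to_keep):
    """Keep the listed columns, in the given (increasing) order."""
    return [[row[j] for j in cols_to_keep] for row in matrix]


def filter_both(matrix, rows_to_keep, cols_to_keep):
    """Select rows and columns in one pass over the retained entries."""
    return [[matrix[i][j] for j in cols_to_keep] for i in rows_to_keep]


nd = [[1.0, 2.0, 3.0, 4.0],
      [5.0, 6.0, 7.0, 8.0],
      [9.0, 10.0, 11.0, 12.0],
      [13.0, 14.0, 15.0, 16.0]]

one_pass = filter_both(nd, [0, 2], [1, 3])
two_pass = filter_rows(filter_cols(nd, [1, 3]), [0, 2])
print(one_pass)  # [[2.0, 4.0], [10.0, 12.0]]
```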

Parameters:
  • rows_to_keep (list of int) – Indices of rows to keep. Must be non-empty and increasing.
  • cols_to_keep (list of int) – Indices of columns to keep. Must be non-empty and increasing.
Returns:

BlockMatrix

filter_cols(cols_to_keep)[source]

Filters matrix columns.

Parameters:cols_to_keep (list of int) – Indices of columns to keep. Must be non-empty and increasing.
Returns:BlockMatrix
filter_rows(rows_to_keep)[source]

Filters matrix rows.

Parameters:rows_to_keep (list of int) – Indices of rows to keep. Must be non-empty and increasing.
Returns:BlockMatrix
classmethod from_entry_expr(entry_expr, block_size=None)[source]

Create a block matrix using a matrix table entry expression.

Examples

>>> import hail as hl
>>> mt = hl.balding_nichols_model(3, 25, 50)
>>> bm = BlockMatrix.from_entry_expr(mt.GT.n_alt_alleles())

Notes

Do not use this method if you want to store a copy of the resulting block matrix. Instead, use write_from_entry_expr() followed by BlockMatrix.read().

If a pipelined transformation significantly downsamples the rows of the underlying matrix table, then repartitioning the matrix table ahead of this method will greatly improve its performance.

The method will fail if any values are missing. To be clear, special float values like nan are not missing values.

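Parameters:
  • entry_expr – Matrix table entry expression providing the matrix entries.
  • block_size (int, optional) – Block size. Default given by default_block_size().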
Returns:

BlockMatrix

classmethod from_numpy(ndarray, block_size=None)[source]

Distributes a NumPy ndarray as a block matrix.

Examples

>>> import numpy as np
>>> a = np.random.rand(10, 20)
>>> bm = BlockMatrix.from_numpy(a)

Notes

The ndarray must have two dimensions, each of non-zero size.

The number of entries must be less than \(2^{31}\).

Parameters:
  • ndarray (numpy.ndarray) – ndarray with two dimensions, each of non-zero size.
  • block_size (int, optional) – Block size. Default given by default_block_size().
Returns:

BlockMatrix

classmethod fromfile(uri, n_rows, n_cols, block_size=None)[source]

Creates a block matrix from a binary file.

Examples

>>> import numpy as np
>>> a = np.random.rand(10, 20)
>>> a.tofile('/local/file') 

To create a block matrix of the same dimensions:

>>> bm = BlockMatrix.fromfile('file:///local/file', 10, 20) 

Notes

This method, analogous to numpy.fromfile, reads a binary file of float64 values in row-major order, such as that produced by numpy.tofile or BlockMatrix.tofile().
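To make the file layout concrete, the sketch below packs a small matrix as native-endian float64 values in row-major order using only the standard library. The helper names are hypothetical; the point is that entry (i, j) sits at byte offset 8 * (i * n_cols + j).

```python
import struct


def pack_row_major(matrix):
    """Serialize a nested list as native-endian float64 values, row after row."""
    flat = [float(value) for row in matrix for value in row]
    return struct.pack(f"={len(flat)}d", *flat)


def read_entry(data, n_cols, i, j):
    """Read entry (i, j) directly from its byte offset in the row-major layout."""
    offset = 8 * (i * n_cols + j)  # each float64 occupies 8 bytes
    return struct.unpack_from("=d", data, offset)[0]


matrix = [[1.0, 2.0, 3.0],
          [4.0, 5.0, 6.0]]
data = pack_row_major(matrix)

print(len(data))                 # 48 bytes for 6 float64 values
print(read_entry(data, 3, 1, 2))  # 6.0
```

Because the file itself records neither the shape nor the dtype, the reader must supply n_rows and n_cols, which is why fromfile() takes them as arguments.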

Binary files produced and consumed by tofile() and fromfile() are not platform independent, so should only be used for inter-operating with NumPy, not storage. Use BlockMatrix.write() and BlockMatrix.read() to save and load block matrices, since these methods write and read blocks in parallel and are platform independent.

A NumPy ndarray must have type float64 for the output of numpy.tofile to be a valid binary input to fromfile(). This is not checked.

The number of entries must be less than \(2^{31}\).

Parameters:
  • uri (str) – URI of binary input file.
  • n_rows (int) – Number of rows.
  • n_cols (int) – Number of columns.
  • block_size (int, optional) – Block size. Default given by default_block_size().

See also

from_numpy()

is_sparse

Returns True if block-sparse.

Notes

A block matrix is block-sparse if at least one of its blocks is dropped, i.e. implicitly a block of zeros.

Returns:bool
log()[source]

Element-wise natural logarithm.

Returns:BlockMatrix
n_cols

Number of columns.

Returns:int
n_rows

Number of rows.

Returns:int
persist(storage_level='MEMORY_AND_DISK')[source]

Persists this block matrix in memory or on disk.

Notes

The BlockMatrix.persist() and BlockMatrix.cache() methods store the current block matrix on disk or in memory temporarily to avoid redundant computation and improve the performance of Hail pipelines. This method is not a substitute for BlockMatrix.write(), which stores a permanent file.

Most users should use the “MEMORY_AND_DISK” storage level. See the Spark documentation for a more in-depth discussion of persisting data.

Parameters:storage_level (str) – Storage level. One of: NONE, DISK_ONLY, DISK_ONLY_2, MEMORY_ONLY, MEMORY_ONLY_2, MEMORY_ONLY_SER, MEMORY_ONLY_SER_2, MEMORY_AND_DISK, MEMORY_AND_DISK_2, MEMORY_AND_DISK_SER, MEMORY_AND_DISK_SER_2, OFF_HEAP
Returns:BlockMatrix – Persisted block matrix.
classmethod random(n_rows, n_cols, block_size=None, seed=0, uniform=False)[source]

Creates a block matrix with standard normal or uniform random entries.

Examples

Create a block matrix with 10 rows, 20 columns, and standard normal entries:

>>> bm = BlockMatrix.random(10, 20)
Parameters:
  • n_rows (int) – Number of rows.
  • n_cols (int) – Number of columns.
  • block_size (int, optional) – Block size. Default given by default_block_size().
  • seed (int) – Random seed.
  • uniform (bool) – If True, entries are drawn from the uniform distribution on [0,1]. If False, entries are drawn from the standard normal distribution.
Returns:

BlockMatrix

classmethod read(path)[source]

Reads a block matrix.

Parameters:path (str) – Path to input file.
Returns:BlockMatrix
shape

Shape of matrix.

Returns:(int, int) – Number of rows and number of columns.
sparsify_band(lower=0, upper=0, blocks_only=False)[source]

Filter to a diagonal band.

Examples

Consider the following block matrix:

>>> import numpy as np
>>> nd = np.array([[ 1.0,  2.0,  3.0,  4.0],
...                [ 5.0,  6.0,  7.0,  8.0],
...                [ 9.0, 10.0, 11.0, 12.0],
...                [13.0, 14.0, 15.0, 16.0]])
>>> bm = BlockMatrix.from_numpy(nd, block_size=2)

Filter to a band from one below the diagonal to two above the diagonal and collect to NumPy:

>>> bm.sparsify_band(lower=-1, upper=2).to_numpy()
array([[ 1.,  2.,  3.,  0.],
       [ 5.,  6.,  7.,  8.],
       [ 0., 10., 11., 12.],
       [ 0.,  0., 15., 16.]])

Set all blocks fully outside the diagonal to zero and collect to NumPy:

>>> bm.sparsify_band(lower=0, upper=0, blocks_only=True).to_numpy()
array([[ 1.,  2.,  0.,  0.],
       [ 5.,  6.,  0.,  0.],
       [ 0.,  0., 11., 12.],
       [ 0.,  0., 15., 16.]])

Notes

This method creates a block-sparse matrix by zeroing out all blocks which are disjoint from a diagonal band. By default, all elements outside the band but inside blocks that overlap the band are set to zero as well.

The band is defined in terms of inclusive lower and upper indices relative to the diagonal. For example, the indices -1, 0, and 1 correspond to the sub-diagonal, diagonal, and super-diagonal, respectively. The diagonal band contains the elements at positions \((i, j)\) such that

\[\mathrm{lower} \leq j - i \leq \mathrm{upper}.\]

lower must be less than or equal to upper, but their values may exceed the dimensions of the matrix, the band need not include the diagonal, and the matrix need not be square.
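The band condition can be applied element by element in plain Python. This sketch (hypothetical helper names, element-level behavior only, no block handling) reproduces the first example above.

```python
def in_band(i, j, lower, upper):
    """True if position (i, j) satisfies lower <= j - i <= upper."""
    return lower <= j - i <= upper


def band_elements(matrix, lower=0, upper=0):
    """Zero out every element outside the band; the matrix need not be square."""
    return [[value if in_band(i, j, lower, upper) else 0.0
             for j, value in enumerate(row)]
            for i, row in enumerate(matrix)]


nd = [[1.0, 2.0, 3.0, 4.0],
      [5.0, 6.0, 7.0, 8.0],
      [9.0, 10.0, 11.0, 12.0],
      [13.0, 14.0, 15.0, 16.0]]

banded = band_elements(nd, lower=-1, upper=2)
for row in banded:
    print(row)
```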

Parameters:
  • lower (int) – Index of lowest band relative to the diagonal.
  • upper (int) – Index of highest band relative to the diagonal.
  • blocks_only (bool) – If False, set all elements outside the band to zero. If True, only set all blocks outside the band to blocks of zeros; this is more efficient.
Returns:

BlockMatrix – Sparse block matrix.

sparsify_rectangles(rectangles)[source]

Filter to blocks overlapping the union of rectangular regions.

Examples

Consider the following block matrix:

>>> import numpy as np
>>> nd = np.array([[ 1.0,  2.0,  3.0,  4.0],
...                [ 5.0,  6.0,  7.0,  8.0],
...                [ 9.0, 10.0, 11.0, 12.0],
...                [13.0, 14.0, 15.0, 16.0]])
>>> bm = BlockMatrix.from_numpy(nd, block_size=2)

Filter to blocks covering three rectangles and collect to NumPy:

>>> bm.sparsify_rectangles([[0, 1, 0, 1], [0, 3, 0, 2], [1, 2, 0, 4]]).to_numpy()
array([[ 1.,  2.,  3.,  4.],
       [ 5.,  6.,  7.,  8.],
       [ 9., 10.,  0.,  0.],
       [13., 14.,  0.,  0.]])

Notes

This method creates a block-sparse matrix by zeroing out (dropping) all blocks which are disjoint from the union of a set of rectangular regions. Partially overlapping blocks are not modified.

Each rectangle is encoded as a list of length four of the form [row_start, row_stop, col_start, col_stop], where starts are inclusive and stops are exclusive. These must satisfy 0 <= row_start <= row_stop <= n_rows and 0 <= col_start <= col_stop <= n_cols.

For example [0, 2, 1, 3] corresponds to the row-index range [0, 2) and column-index range [1, 3), i.e. the elements at positions (0, 1), (0, 2), (1, 1), and (1, 2).
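Which blocks survive depends on the block size. The sketch below assumes a block size of 2 for the 4-by-4 example and computes the set of (block_row, block_col) coordinates overlapping the rectangles; the helper name is hypothetical and empty rectangles are assumed to touch no block.

```python
def blocks_overlapping(rectangles, block_size):
    """Coordinates of all blocks that overlap at least one rectangle."""
    kept = set()
    for row_start, row_stop, col_start, col_stop in rectangles:
        if row_start == row_stop or col_start == col_stop:
            continue  # assumption: an empty rectangle overlaps nothing
        # Stops are exclusive, so the last covered index is stop - 1.
        for bi in range(row_start // block_size, (row_stop - 1) // block_size + 1):
            for bj in range(col_start // block_size, (col_stop - 1) // block_size + 1):
                kept.add((bi, bj))
    return kept


rectangles = [[0, 1, 0, 1], [0, 3, 0, 2], [1, 2, 0, 4]]
kept = blocks_overlapping(rectangles, block_size=2)
print(sorted(kept))  # [(0, 0), (0, 1), (1, 0)]
```

Under these assumptions only block (1, 1) is dropped, which corresponds to the zeroed bottom-right 2-by-2 corner in the example output.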

The number of rectangles must be less than \(2^{29}\).

Parameters:rectangles (list of list of int) – List of rectangles of the form [row_start, row_stop, col_start, col_stop].
Returns:BlockMatrix – Sparse block matrix.
sparsify_row_intervals(starts, stops, blocks_only=False)[source]

Creates a block-sparse matrix by filtering to an interval for each row.

Examples

Consider the following block matrix:

>>> import numpy as np
>>> nd = np.array([[ 1.0,  2.0,  3.0,  4.0],
...                [ 5.0,  6.0,  7.0,  8.0],
...                [ 9.0, 10.0, 11.0, 12.0],
...                [13.0, 14.0, 15.0, 16.0]])
>>> bm = BlockMatrix.from_numpy(nd, block_size=2)

Set all elements outside the given row intervals to zero and collect to NumPy:

>>> (bm.sparsify_row_intervals(starts=[1, 0, 2, 2],
...                            stops= [2, 0, 3, 4])
...    .to_numpy())
array([[ 0.,  2.,  0.,  0.],
       [ 0.,  0.,  0.,  0.],
       [ 0.,  0., 11.,  0.],
       [ 0.,  0., 15., 16.]])

Set all blocks fully outside the given row intervals to blocks of zeros and collect to NumPy:

>>> (bm.sparsify_row_intervals(starts=[1, 0, 2, 2],
...                            stops= [2, 0, 3, 4],
...                            blocks_only=True)
...    .to_numpy())
array([[ 1.,  2.,  0.,  0.],
       [ 5.,  6.,  0.,  0.],
       [ 0.,  0., 11., 12.],
       [ 0.,  0., 15., 16.]])

Notes

This method creates a block-sparse matrix by zeroing out all blocks which are disjoint from all row intervals. By default, all elements outside the row intervals but inside blocks that overlap the row intervals are set to zero as well.

starts and stops must both have length equal to the number of rows. The interval for row i is [starts[i], stops[i]). In particular, 0 <= starts[i] <= stops[i] <= n_cols is required for all i.
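Both behaviors in the examples above can be mimicked in plain Python. In this sketch the helper names are hypothetical: interval_elements applies the per-row intervals element by element, and interval_blocks lists the blocks that overlap any interval for a given block size.

```python
def interval_elements(matrix, starts, stops):
    """Keep element (i, j) only if starts[i] <= j < stops[i]."""
    return [[value if starts[i] <= j < stops[i] else 0.0
             for j, value in enumerate(row)]
            for i, row in enumerate(matrix)]


def interval_blocks(starts, stops, block_size):
    """Coordinates of blocks overlapping at least one row interval."""
    kept = set()
    for i, (start, stop) in enumerate(zip(starts, stops)):
        if start < stop:  # an empty interval overlaps no block
            for bj in range(start // block_size, (stop - 1) // block_size + 1):
                kept.add((i // block_size, bj))
    return kept


nd = [[1.0, 2.0, 3.0, 4.0],
      [5.0, 6.0, 7.0, 8.0],
      [9.0, 10.0, 11.0, 12.0],
      [13.0, 14.0, 15.0, 16.0]]
starts, stops = [1, 0, 2, 2], [2, 0, 3, 4]

filtered = interval_elements(nd, starts, stops)
kept_blocks = interval_blocks(starts, stops, block_size=2)
print(filtered)
print(sorted(kept_blocks))  # [(0, 0), (1, 1)]
```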

This method requires the number of rows to be less than \(2^{31}\).

Parameters:
  • starts (list of int) – Start indices for each row (inclusive).
  • stops (list of int) – Stop indices for each row (exclusive).
  • blocks_only (bool) – If False, set all elements outside row intervals to zero. If True, only set all blocks outside row intervals to blocks of zeros; this is more efficient.
Returns:

BlockMatrix – Sparse block matrix.

sparsify_triangle(lower=False, blocks_only=False)[source]

Filter to the upper or lower triangle.

Examples

Consider the following block matrix:

>>> import numpy as np
>>> nd = np.array([[ 1.0,  2.0,  3.0,  4.0],
...                [ 5.0,  6.0,  7.0,  8.0],
...                [ 9.0, 10.0, 11.0, 12.0],
...                [13.0, 14.0, 15.0, 16.0]])
>>> bm = BlockMatrix.from_numpy(nd, block_size=2)

Filter to the upper triangle and collect to NumPy:

>>> bm.sparsify_triangle().to_numpy()
array([[ 1.,  2.,  3.,  4.],
       [ 0.,  6.,  7.,  8.],
       [ 0.,  0., 11., 12.],
       [ 0.,  0.,  0., 16.]])

Set all blocks fully outside the upper triangle to zero and collect to NumPy:

>>> bm.sparsify_triangle(blocks_only=True).to_numpy()
array([[ 1.,  2.,  3.,  4.],
       [ 5.,  6.,  7.,  8.],
       [ 0.,  0., 11., 12.],
       [ 0.,  0., 15., 16.]])

Notes

This method creates a block-sparse matrix by zeroing out all blocks which are disjoint from the (non-strict) upper or lower triangle. By default, all elements outside the triangle but inside blocks that overlap the triangle are set to zero as well.
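The blocks-only behavior can be sketched by testing whether each block touches the triangle. The helper below is hypothetical and works on nested lists: a block overlaps the upper triangle when its last column index is at least its first row index, and the lower triangle when its first column index is at most its last row index.

```python
def triangle_blocks_only(matrix, block_size, lower=False):
    """Zero out only those blocks that lie entirely outside the chosen triangle."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    result = [[0.0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for j in range(n_cols):
            # Index bounds of the block containing (i, j); edge blocks may be truncated.
            first_row = (i // block_size) * block_size
            first_col = (j // block_size) * block_size
            last_row = min(first_row + block_size, n_rows) - 1
            last_col = min(first_col + block_size, n_cols) - 1
            overlaps = first_col <= last_row if lower else last_col >= first_row
            if overlaps:
                result[i][j] = matrix[i][j]
    return result


nd = [[1.0, 2.0, 3.0, 4.0],
      [5.0, 6.0, 7.0, 8.0],
      [9.0, 10.0, 11.0, 12.0],
      [13.0, 14.0, 15.0, 16.0]]

upper_blocks = triangle_blocks_only(nd, block_size=2)
for row in upper_blocks:
    print(row)
```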

Parameters:
  • lower (bool) – If False, keep the upper triangle. If True, keep the lower triangle.
  • blocks_only (bool) – If False, set all elements outside the triangle to zero. If True, only set all blocks outside the triangle to blocks of zeros; this is more efficient.
Returns:

BlockMatrix – Sparse block matrix.

sqrt()[source]

Element-wise square root.

Returns:BlockMatrix
sum(axis=None)[source]

Sums array elements over one or both axes.

Examples

>>> import numpy as np
>>> nd = np.array([[ 1.0,  2.0,  3.0],
...                [ 4.0,  5.0,  6.0]])
>>> bm = BlockMatrix.from_numpy(nd)
>>> bm.sum()
21.0
>>> bm.sum(axis=0).to_numpy()
array([[5., 7., 9.]])
>>> bm.sum(axis=1).to_numpy()
array([[ 6.],
       [15.]])
Parameters:axis (int, optional) – Axis over which to sum. By default, sum all elements. If 0, sum over rows. If 1, sum over columns.
Returns:float or BlockMatrix – If axis is None, returns a float. If axis is 0, returns a block matrix with a single row. If axis is 1, returns a block matrix with a single column.
to_numpy()[source]

Collects the block matrix into a NumPy ndarray.

Examples

>>> bm = BlockMatrix.random(10, 20)
>>> a = bm.to_numpy()

Notes

The number of entries must be less than \(2^{31}\).

The resulting ndarray will have the same shape as the block matrix.

Returns:numpy.ndarray
tofile(uri)[source]

Collects and writes data to a binary file.

Examples

>>> import numpy as np
>>> bm = BlockMatrix.random(10, 20)
>>> bm.tofile('file:///local/file') 

To create a numpy.ndarray of the same dimensions:

>>> a = np.fromfile('/local/file').reshape((10, 20)) 

Notes

This method, analogous to numpy.tofile, produces a binary file of float64 values in row-major order, which can be read by functions such as numpy.fromfile (if a local file) and BlockMatrix.fromfile().

Binary files produced and consumed by tofile() and fromfile() are not platform independent, so should only be used for inter-operating with NumPy, not storage. Use BlockMatrix.write() and BlockMatrix.read() to save and load block matrices, since these methods write and read blocks in parallel and are platform independent.

The number of entries must be less than \(2^{31}\).

Parameters:uri (str) – URI of binary output file.

See also

to_numpy()

unpersist()[source]

Unpersists this block matrix from memory/disk.

Notes

This method has no effect on a block matrix that was not previously persisted.

Returns:BlockMatrix – Unpersisted block matrix.
write(path, force_row_major=False)[source]

Writes the block matrix.

Parameters:
  • path (str) – Path for output file.
  • force_row_major (bool) – If True, transform blocks in column-major format to row-major format before writing. If False, write blocks in their current format.
static write_from_entry_expr(entry_expr, path, block_size=None)[source]

Writes a block matrix from a matrix table entry expression.

Examples

>>> import hail as hl
>>> mt = hl.balding_nichols_model(3, 25, 50)
>>> BlockMatrix.write_from_entry_expr(mt.GT.n_alt_alleles(),
...                                   'output/model.bm')

Notes

The resulting file can be loaded with BlockMatrix.read().

If a pipelined transformation significantly downsamples the rows of the underlying matrix table, then repartitioning the matrix table ahead of this method will greatly improve its performance.

The method will fail if any values are missing. To be clear, special float values like nan are not missing values.

Parameters: