String functions

json(x) Convert an expression to a JSON string expression.
hamming(s1, s2) Returns the Hamming distance between the two strings.
delimit(collection[, delimiter]) Joins elements of collection into a single string delimited by delimiter.
entropy(s) Returns the Shannon entropy of the character distribution defined by the string.
hail.expr.functions.json(x) → hail.expr.expressions.typed_expressions.StringExpression

Convert an expression to a JSON string expression.

Examples

>>> hl.json([1,2,3,4,5]).value
'[1,2,3,4,5]'
>>> hl.json(hl.struct(a='Hello', b=0.12345, c=[1,2], d={'hi', 'bye'})).value
'{"a":"Hello","c":[1,2],"b":0.12345,"d":["bye","hi"]}'
Parameters: x – Expression to convert.
Returns: StringExpression – String expression with JSON representation of x.
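
For orientation, the compact output format in the examples above can be approximated in plain Python with the standard library's json module. This is only a rough analogue under stated assumptions, not Hail's implementation: the helper name to_compact_json is hypothetical, sets are encoded as sorted lists to mirror the second example, and the order of struct fields in Hail's output may differ from Python's dictionary order.

```python
import json


def to_compact_json(value):
    """Hypothetical helper: serialize a Python value as compact JSON."""

    def encode_set(obj):
        # Sets are not a JSON type. Encoding them as sorted lists is an
        # assumption made here to mirror the array form shown in the docs.
        if isinstance(obj, (set, frozenset)):
            return sorted(obj)
        raise TypeError(f"cannot serialize {type(obj).__name__}")

    # separators=(',', ':') removes the spaces json.dumps inserts by default.
    return json.dumps(value, separators=(',', ':'), default=encode_set)


print(to_compact_json([1, 2, 3, 4, 5]))          # [1,2,3,4,5]
print(to_compact_json({'d': {'hi', 'bye'}}))     # {"d":["bye","hi"]}
```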
hail.expr.functions.hamming(s1, s2) → hail.expr.expressions.typed_expressions.Int32Expression

Returns the Hamming distance between the two strings.

Examples

>>> hl.hamming('ATATA', 'ATGCA').value
2
>>> hl.hamming('abcdefg', 'zzcdefz').value
3

Notes

This method will fail if the two strings have different lengths.
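
The quantity being computed is the number of positions at which the two strings differ. A minimal plain-Python reference, useful for checking the examples above, might look like the sketch below. The name hamming_distance is hypothetical, and raising ValueError on a length mismatch is an illustrative choice; how Hail itself reports the failure is not specified here.

```python
def hamming_distance(s1: str, s2: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(s1) != len(s2):
        # Illustrative choice: the Hamming distance is undefined for
        # strings of different lengths, so refuse rather than truncate.
        raise ValueError("strings must have the same length")
    # Each comparison yields True (1) or False (0); sum counts mismatches.
    return sum(c1 != c2 for c1, c2 in zip(s1, s2))


print(hamming_distance('ATATA', 'ATGCA'))      # 2
print(hamming_distance('abcdefg', 'zzcdefz'))  # 3
```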

Parameters:
s1 (StringExpression) – First string.
s2 (StringExpression) – Second string.

Returns: Expression of type tint32

hail.expr.functions.delimit(collection, delimiter=', ') → hail.expr.expressions.typed_expressions.StringExpression

Joins elements of collection into a single string delimited by delimiter.

Examples

>>> a = ['Bob', 'Charlie', 'Alice', 'Bob', 'Bob']
>>> hl.delimit(a).value
'Bob,Charlie,Alice,Bob,Bob'

Notes

If the element type of collection is not tstr, then the str() function will be called on each element before joining with the delimiter.
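
The join-with-conversion behavior described in this note can be sketched in plain Python as follows. This is an analogue under assumptions, not Hail's implementation: the name delimit_py is hypothetical, the default delimiter is copied from the signature above, and Python's built-in str() may format some values differently from Hail's str().

```python
from typing import Iterable


def delimit_py(collection: Iterable, delimiter: str = ', ') -> str:
    """Hypothetical analogue: join elements, converting each to a string."""
    # str() is applied to every element, so non-string elements are
    # converted before joining; string elements pass through unchanged.
    return delimiter.join(str(element) for element in collection)


print(delimit_py(['Bob', 'Charlie', 'Alice'], delimiter=','))  # Bob,Charlie,Alice
print(delimit_py([1, 2.5, True], delimiter='|'))               # 1|2.5|True
```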

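Parameters:
collection (ArrayExpression or SetExpression) – Collection.
delimiter (str or StringExpression) – Field delimiter.

Returns: StringExpression – Joined string expression.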

hail.expr.functions.entropy(s) → hail.expr.expressions.typed_expressions.Float64Expression

Returns the Shannon entropy of the character distribution defined by the string.

Examples

>>> hl.entropy('ac').value
1.0
>>> hl.entropy('accctg').value
1.79248

Notes

For a string of length \(n\) with \(k\) unique characters \(\{ c_1, \dots, c_k \}\), let \(p_i\) be the probability that a randomly chosen character is \(c_i\), i.e. the number of instances of \(c_i\) divided by \(n\). Then the base-2 Shannon entropy is given by

\[H = -\sum_{i=1}^k p_i \log_2(p_i).\]
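
The definition can be checked against the examples above with a short plain-Python sketch. It computes \(-\sum_i p_i \log_2(p_i)\), written in the equivalent form \(\sum_i p_i \log_2(1/p_i)\) so that a string with a single distinct character yields 0.0 rather than -0.0. The name shannon_entropy is hypothetical, and returning 0.0 for the empty string is an assumption made here, not documented Hail behavior.

```python
import math
from collections import Counter


def shannon_entropy(s: str) -> float:
    """Base-2 Shannon entropy of the character distribution of s."""
    n = len(s)
    if n == 0:
        # Assumption: treat the empty string as having zero entropy.
        return 0.0
    counts = Counter(s)  # occurrences of each distinct character
    # p_i = count / n, and p_i * log2(1 / p_i) == -p_i * log2(p_i).
    return sum((c / n) * math.log2(n / c) for c in counts.values())


print(shannon_entropy('ac'))                # 1.0
print(round(shannon_entropy('accctg'), 5))  # 1.79248
```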
Parameters: s (StringExpression)
Returns: Expression of type tfloat64