User-Defined Functions (UDFs)

User-Defined Functions (UDFs) are user-programmable routines that operate on a single row and emit a value, just like regular TQL expressions. UDF function bodies may contain multiple statements, delimited by ;, as long as the last statement returns a value. This allows you to extend the TQL Expression Language to encode custom pieces of functionality specific to your problem space.

UDFs are defined as a Python string.

/** 
 * This is the optional java-style function doc for udf 
 * @param some parameters 
 * @return return value of the function 
 */ 
function foo() { 
  function body 
} 

UDF’s contain two parts:

  1. Function doc(optional)

  2. Function definition(starts with keyword function)

There are three levels of UDFs, which we will cover in this guide:

  1. Table/Project Level UDFs

  2. Query level UDFs

  3. Inline expression UDFs

from zeenk.tql import *
-----------------------------------------------------------------
           Version: 20.1.17-SNAPSHOT
 Version Timestamp: 2021-08-28 15:05:50
       Version Age: 6 days, 2 hours, 18 minutes, 37 seconds
   Filesystem Root: /Users/smaguire/.tql/files
 Working Directory: /Users/smaguire/.tql
Configuration File: /Users/smaguire/.tql/tql-conf.yml
       Api Gateway: http://localhost:9000
    Service Status: Icarus: ONLINE, Daedalus: ONLINE
    Service Uptime: 1 days, 1 hours, 13 minutes, 34 seconds
-----------------------------------------------------------------

Table-level UDFs

Table UDFs can be added via the TQL function upload_udf(src1,src2, ...) and will be available to any subsequent queries just like any builtin function.

You can validate/upload/delete/get udfs on a table level

Defining UDF function source strings

Tip: multiple function strings can be provided and they will be uploaded in the order of declaration. If a UDF calls another UDF in it's source, make sure all it's dependencies are declared first.
div2_func_src = '''
/**
* given an integer, long, or floating point number, return the value / 2
* @param x Number
* @return Double
*/

function div2(x) { 
  x / 2 
}
'''

div6_func_src = '''
/**
* given an integer, long, or floating point number, return the value / 6
* @param x Number
* @return Double
*/

function div6(x) { 
  div2(x) / 3 
}
'''

UDF Validation

Validate a UDF to make sure there are no compilation errors, before uploading it to a table.

validate_udf(1, div2_func_src, div6_func_src)

Uploading a UDF to a table/project

Table/project-level UDFs are uploaded using the tql function upload_udf(project_id, src). Validation on the UDF will be performed during the upload.

# clean up udfs from previous run
delete_udf(1, 'div2')
delete_udf(1, 'div6')

# upload this UDF
upload_udf(1, div2_func_src, div6_func_src)

Once the UDF has been successfully saved on the table, it may be used in TQL queries, just like any other native TQL function:

select('div2(3)', 'div6(18)').from_events(1).limit(3)

Query results:

partition "_default"
_c0 _c1
0 1.5 3.0
1 1.5 3.0
2 1.5 3.0
query produced 3 rows x 2 columns in 0.3 seconds

Looking up Table-level UDFs

Find a Table-level UDF function definition by name:

get_udf(1, 'div6')
            <tr>
                            <td id="T_12707_row0_col0" class="data row0 col0" >div6</td>
                    <td id="T_12707_row0_col1" class="data row0 col1" >/**
  • given an integer, long, or floating point number, return the value / 6

  • @param x Number

  • @return Double */

function div6(x) { div2(x) / 3 }

</table style=”display:inline; padding-right: 20px”>

Deleting a UDF

# delete a udf by function name
delete_udf(1, 'div6')

# trying to retrieve it again will result in a TQLNotFoundException
get_udf(1, 'div6')
function div6 doesn't exist on project: 1 (404)



---------------------------------------------------------------------------

TQLAnalysisNotFound                       Traceback (most recent call last)

<ipython-input-1-f53f5e57b2fc> in <module>
      3 
      4 # trying to retrieve it again will result in a TQLNotFoundException
----> 5 get_udf(1, 'div6')


~/dev/nanigans/noumena/noumena-python/noumena/noumena/tql/udf.py in get_udf(project_id, function_name)
     79 
     80     """
---> 81     function_src = icarus.get(f'projects/{project_id}/udf/{function_name}')
     82     return UDF({function_name: function_src})
     83 


~/dev/nanigans/noumena/noumena-python/noumena/noumena/tql/icarus.py in get(url, params, print_json)
     20     url = os.path.join(ICARUS_BASE, url)
     21     resp = requests.get(url, params=params)
---> 22     return _handle_response(url, resp, print_json=print_json)
     23 
     24 


~/dev/nanigans/noumena/noumena-python/noumena/noumena/tql/icarus.py in _handle_response(url, resp, print_json)
     54                 reason = resp.content
     55             print(f'{reason} (404)')
---> 56             raise TQLAnalysisNotFound(f'{url} {reason} (404)')
     57         if resp.status_code == 400:
     58             try:


TQLAnalysisNotFound: http://localhost:9000/v1/projects/1/udf/div6 function div6 doesn't exist on project: 1 (404)

List all existing UDFs on a table

list_udfs(1)
name source

name source
STRING_LIST /** * Convert all my_list elements to strings. * * Examples: * STRING_LIST([0.2,-1.3,5]) = ['0.2','-1.3','5'] * * @param my_list A list. * @return my_string_list my_list converted to strings. */

function string_list(my_list) { MAP(my_list, (x) -> TO_STRING(x)) }

LIST_INSERT /**

  • Insert insert_list into my_list at index, pushing subsequent elements down

  • by the number of elements of insert_list.

  • Examples:

  • LIST_INSERT(RANGE(-3,3), RANGE(99,102), 3) = [-3,-2,-1,99,100,101,0,1,2]

  • @param my_list A list.

  • @param insert_list List to be inserted into my_list.

  • @param index Index at which insert_list will be inserted.

  • @return my_longer_list my_list with insert_list inserted at index. */

function list_insert(my_list, insert_list, index) { list_size = SIZE(my_list); insert_size = SIZE(insert_list); indices = RANGE(0, list_size + insert_size); MAP(indices, (i) -> IF(i < index, my_list[i], IF(i < index + insert_size, insert_list[i - index], my_list[i - insert_size]))) }

LIST_UNION /**

  • Union list1 and list2 by appending list2 to list1.

  • Identical to LIST_APPEND().

  • Examples:

  • LIST_UNION(RANGE(-3,3), RANGE(99,102)) = [-3,-2,-1,0,1,2,99,100,101]

  • @param list1 A list.

  • @param list2 List to be appended to list1.

  • @return my_longer_list list1 with list2 appended. */

function list_union(list1, list2) { LIST_INSERT(list1, list2, SIZE(list1)) }

LIST_CONCAT_WS /**

  • Concatenate the elements of my_list using delimiter.

  • Examples:

  • LIST_CONCAT_WS(‘,’,[3,2,1,11,-1,-20]) = ‘3,2,1,11,-1,-20’ // CSV numeric list.

  • @param delimiter String.

  • @param my_list List to be concatenated using delimiter.

  • @return my_list_concat_ws String with the elements of my_list concatenated, separated by delimiter. */

function list_concat_ws(delimiter, my_list) { REDUCE(MAP(my_list, (x) -> TO_STRING(x)), (x,y) -> CONCAT_WS(delimiter, x, y)) }

LISTS_TO_MAP /**

  • Turn two lists of equal-length keys and values into JSON then into a map.

  • Examples:

  • LISTS_TO_MAP([‘v’],[1]) = {‘v’:1}

  • LISTS_TO_MAP([‘a’,’b’,’c’],[1,2,3]) = {‘a’:1, ‘b’:2, ‘c’:3}

  • LISTS_TO_MAP([-2.0, 10, 40], [0,0,0]) = {‘-2’:0, ‘10’:0, ‘40’:0}

  • @param keys A list of keys.

  • @param values A list of values.

  • @return my_map keys and values combined and converted into a map. */

function lists_to_map(keys, values) { idx = RANGE(0,SIZE(keys)); output = MAP(idx, (i) -> CONCAT(‘”’, TO_STRING(keys[i]), ‘”:’, JSON(values[i])));

JSON_TO_MAP(CONCAT(‘{’, LIST_CONCAT_WS(‘,’,output), ‘}’)) }

MAP_KEYS /**

  • Retrieve a list of the keys of my_map.

  • Examples:

  • MAP_KEYS(JSON_TO_MAP(‘{“key1”:”value1”,”key2”:”value2”}’)) = [‘key1’,’key2’]

  • MAP_KEYS({‘foo’: 1, ‘bar’: 2}) = [‘bar’, ‘foo’]

  • MAP_KEYS({‘’: 1, ‘bar’: 2}) = [‘’, ‘bar’]

  • MAP_KEYS({‘’: 1, ‘’: 2}) = [‘’]

  • @param my_map A map, e.g., {cat=1, num=2}.

  • @return keys List of keys, e.g., [‘cat’,’num’]. */

function map_keys(my_map) { MAP_TO_LIST(my_map, (k,v) -> k) }

RANGE_STEP /**

  • Return a list of numbers from start to end in steps of step.

  • Examples:

  • RANGE_STEP(-2,10,2) = [-2,0,2,4,6,8]

  • RANGE_STEP(10,-2,-2) = [10,8,6,4,2,0]

  • start=MILLIS(TO_DATETIME(‘2020-01-01 00:00:00.000’));

  • end=MILLIS(TO_DATETIME('2020-01-06 00:00:00.001'));
    
  • step=86400000; // 1 day in milliseconds
    
  • RANGE_STEP(start, end, step)
    
  •   = [1577836800000, 1577923200000, 1578009600000, 1578096000000, 1578182400000, 1578268800000]
    
  • @param start Numeric scalar, e.g., -2.

  • @param end Numeric scalar, e.g., 10.

  • @param step Numeric scalar, e.g., 2. Nonzero allowed.

  • @return step_list Numeric list, e.g., [-2,0,2,4,6,8]. */

function range_step(start, end, step) { num_steps = (end - start) / step; steps = RANGE(0,CEIL(num_steps)); step_list = MAP(steps, (x) -> x * step + start); step_list }

TIMESTAMP_RANGE /**

  • Return a list of integer millisecond timestamps from start to

  • end in steps of step.

  • Examples:

  • start=’2020-01-01 00:00:00.000’;

  • end='2020-01-06 00:00:00.001';
    
  • step=86400000; // 1 day in milliseconds
    
  • TIMESTAMP_RANGE(start, end, step)
    
  •   = [1577836800000, 1577923200000, 1578009600000, 1578096000000, 1578182400000, 1578268800000]
    
  • TIMESTAMP_RANGE(‘2020-05-01’,’2020-05-06’,86400*1000)

  •   = [1588291200000, 1588377600000, 1588464000000, 1588550400000, 1588636800000]
    
  • @param start Datetime-parseable string, e.g., ‘2020-01-01 00:00:00.000’.

  • @param end Datetime-parseable string, e.g., ‘2020-02-02 00:00:00.001’.

  • @param step Numeric scalar in milliseconds.

  • @return step_list Millisecond timestamp list, e.g., [1588291200000, 1588377600000, …]. */

function timestamp_range(start, end, step) { start = MILLIS(TO_DATETIME(start)); end = MILLIS(TO_DATETIME(end)); RANGE_STEP(start, end, step) }

PERIODIC_BIN /**

  • Round the remainder of x into periodic bins (e.g., % 7, 24, 365 for weekly, hourly,

  • or annual use cases). Bins start at 0,1,2,….

  • Scale the periodic remainder of x % period integers num_bins bins. To

  • preserve or adjust the scale/units of period, do pre-/post-adjustments: e.g.,

  • scale * PERIODIC_BIN() + location.

  • Examples:

  • // Bin a number in (0,30) days.

  • PERIODIC_BIN(30 * 0.8, 1, 24) = 0 // Numeric hour of day.

  • PERIODIC_BIN(30 * 0.8, 7, 7) = 3 // Numeric day of week.

  • PERIODIC_BIN(30 * 0.8, 1, 4) = 0 // Four 6-hour blocks during each day.

  • @param x Scalar.

  • @param period Scalar.

  • @param num_bins Scalar.

  • @return x_bin The periodic integer bin that x falls into. */

function periodic_bin(x, period, num_bins) { FLOOR(num_bins * (x % period)/period ) }

BETWEEN /**

  • If start <= x < end, return 1. Otherwise return 0.

  • Examples:

  • start = 700; end = -700; // Days

  • reference_time = MILLIS(TO_DATETIME('2022-01-01 9:00:00.000'));
    
  • // Replace this expression...
    
  • IF(reference_time - end*24*3600*1000 &lt;= timestamp
    
  •    AND timestamp &lt; reference_time - start*24*3600*1000, 1, 0)
    
  • =
    
  • // With this more readable expression using the UDF.
    
  • BETWEEN(reference_time - timestamp,
    
  •         start*24*3600*1000,
    
  •         end*24*3600*1000)
    
  • BETWEEN(1, 0, 0) = 0

  • BETWEEN(0, 0, 0) = 0

  • BETWEEN(0, 0, 1) = 1

  • BETWEEN(0, 1, -1) = 0

  • BETWEEN(0, -1, 1) = 1

  • BETWEEN(‘b’, ‘a’, ‘c’) = 1

  • BETWEEN(‘a’, ‘b’, ‘c’) = 0

  • BETWEEN(NOW(), 0, 1) = 0

  • BETWEEN(NOW(), 0, null) = 0

  • BETWEEN(NOW(), 0, 2000000000000) = 1

  • BETWEEN(NOW(), null, 2000000000000) = 1

  • BETWEEN(NOW(), ‘’, ‘2000000000000’) = 1

  • BETWEEN(NOW(), 0, null) = 0

  • @param x Scalar.

  • @param start Scalar.

  • @param end Scalar.

  • @return is_x_between IF(start <= x < end, 1, 0). */

function between(x, start, end) { IF(start <= x AND x < end, 1, 0) }

HISTOGRAM_BIN /**

  • Count the number of elements of my_list in [lower_edge, upper_edge).

  • Examples:

  • HISTOGRAM_BIN([1,1,2,4],1,2) = 2

  • HISTOGRAM_BIN([1,1,2,4],1,2.01) = 3

  • HISTOGRAM_BIN([1,1,2,4],2,4) = 1

  • @param my_list A numeric list.

  • @param lower_edge Scalar.

  • @param upper_edge Scalar.

  • @return count Number of elements of my_list in [lower_edge, upper_edge). */

function histogram_bin(my_list, lower_edge, upper_edge) { count = SUM(MAP(my_list, (x) -> IF(BETWEEN(x, lower_edge, upper_edge), 1, 0))); count }

RANK /**

  • Retrieve the value of rank r from my_list. Rank is

  • determined by top as descending (TRUE) or ascending (FALSE).

  • Examples:

  • RANK(RANGE(90,101), 3, TRUE) = 98 // (3rd in […98,99,100])

  • // (3rd order statistic of 100 draws from random uniform.

  • RANK(RANDOM_LIST(100), 3, TRUE) > 0.8

  • @param my_list A list.

  • @param r Rank to retrieve in sorted list.

  • @param top Descending (TRUE) or ascending (FALSE).

  • @return r_value The value of the r^th-rank element of my_list */

function rank(my_list, r, top) { index = IF(top, SIZE(my_list) - r, r - 1); value = SORT(my_list)[index]; value }

LIST_APPEND /**

  • Append insert_list to my_list (at the end).

  • Examples:

  • LIST_APPEND(RANGE(-3,3), RANGE(99,102)) = [-3,-2,-1,0,1,2,99,100,101]

  • @param my_list A list.

  • @param insert_list List to be appended to my_list.

  • @return my_longer_list my_list with insert_list appended. */

function list_append(my_list, insert_list) { LIST_INSERT(my_list, insert_list, SIZE(my_list)) }

DROP_NULLS /**

  • Remove all null elements from my_list, reducing its length.

  • Examples:

  • DROP_NULLS([null]) = []

  • DROP_NULLS([2,2,4,4]) = [2,2,4,4]

  • DROP_NULLS([null,null,2,2,4,4]) = [2,2,4,4]

  • DROP_NULLS([‘’]) = [‘’]

  • DROP_NULLS([‘null’]) = [‘null’]

  • DROP_NULLS([‘a’, null, null]) = [‘a’]

  • DROP_NULLS([‘a’, null, ‘a’]) = [‘a’, ‘a’]

  • @param my_list A list, e.g., [null,null,2,2,4,4].

  • @return my_drop_nulls_list A list containing only the non-null elements, e.g., [2,2,4,4] */

function DROP_NULLS(my_list) { FILTER(my_list, (x) -> x!=null) }

WHICH /**

  • Return a list of indices of my_list elements that are truthy.

  • Examples:

  • WHICH([0,10,2,-4.3]) = [1,2,3]

  • WHICH([0,1,0,1]) = [1,3]

  • // Strings are false, except LOWER(‘tRuE’)=’true’.

  • WHICH([FALSE,’hi’,FALSE,’hi’]) = []

  • WHICH([‘tRuE’,TRUE,null]) = [0,1]

  • WHICH(null) = null

  • WHICH([]) = []

  • WHICH([null]) = []

  • WHICH(‘string’) = null

  • WHICH(‘tRuE’) = null

  • @param my_list Numeric list, e.g. [0,10,2,-4.3].

  • @return indices List of indices of elements with truthy values, e.g., [1,2,3]. */

function which(my_list) { // Return the list indices that are truthy. idx = RANGE(0,SIZE(my_list)); FILTER(idx, (x) -> IF(my_list[x], TRUE, null)) }

WHICH_MAX /**

  • Return the index corresponding to the maximum value of my_list.

  • Examples:

  • WHICH_MAX([0,10,2,-4.3]) = [1]

  • WHICH_MAX([1,3,4,5,2,-4]) = [3]

  • WHICH_MAX([]) = []

  • WHICH_MAX(null) = null

  • WHICH_MAX([null,-2]) = [1]

  • WHICH_MAX([-2,-2.0,null,-3]) = [0,1]

  • WHICH_MAX([‘c’,-2]) = [1]

  • WHICH_MAX([‘c’,’d’]) = []

  • @param my_list Numeric list, e.g. [0,10,2,-4.3].

  • @return index Numeric scalar, e.g., 1. */

function which_max(my_list) { // Return the index of the maximum value. max_value = AG_MAX(my_list); index = WHICH(MAP(my_list, (x) -> IF(x=max_value, 1, 0))); index }

IS_NUMERIC /**

  • Test whether x can be cast to numeric/double: TO_DOUBLE(x)!=null.

  • Examples:

  • IS_NUMERIC(‘c’) = FALSE

  • IS_NUMERIC([‘c’]) = FALSE

  • IS_NUMERIC({‘c’:1}) = FALSE

  • IS_NUMERIC(null) = FALSE

  • IS_NUMERIC(‘1’) = TRUE

  • IS_NUMERIC(‘-1.23’) = TRUE

  • IS_NUMERIC(1) = TRUE

  • IS_NUMERIC(0) = TRUE

  • IS_NUMERIC(-1) = TRUE

  • IS_NUMERIC(-1.23) = TRUE

  • @param x An object.

  • @return is_numeric A boolean denoting whether x can be cast to numeric/double. */

function IS_NUMERIC(x) { // If x can be cast to double without returning null. IF(TO_DOUBLE(x)=null, FALSE, TRUE) }

MAP_VALUES /**

  • Retrieve a list of the keys of my_map.

  • Examples:

  • MAP_VALUES(JSON_TO_MAP(‘{“key1”:”value1”,”key2”:”value2”}’)) = [‘value1’,’value2’]

  • MAP_VALUES({‘foo’: 1, ‘bar’: 2}) = [2, 1]

  • MAP_VALUES({‘’: 1, ‘bar’: 2}) = [1, 2]

  • MAP_VALUES({‘’: 1, ‘’: 2}) = [2]

  • @param my_map A map, e.g., {cat=1, num=2}.

  • @return keys List of values, e.g., [‘1’,’2’]. */

function map_values(my_map) { MAP_TO_LIST(my_map, (k,v) -> v) }

MODE /**

  • Return a list containing the mode or most common value(s), in case

  • of ties, of my_list.

  • Examples:

  • MODE([1,2,2,2,4,4,4]) = [2,4]

  • MODE([1,2.0,2,2,4,4,4]) = [2,4]

  • MODE([‘1’,’2’,’2’,’2’,’4’,’4’,’4’]) = [2,4]

  • MODE([‘1’,’L2’,’L2’,’L2’,’4’,’4’,’4’]) = [4,’L2’]

  • MODE([1]) = [1]

  • MODE(null) = [] // TODO: MODE() appears to ignore nulls by default.

  • MODE([null]) = [] // TODO: MODE() appears to ignore nulls by default.

  • MODE([‘’]) = [‘’]

  • MODE([‘c’]) = [‘c’]

  • MODE([‘c’,1]) = [1,’c’]

  • MODE([1, ‘’]) = [‘’,1]

  • MODE([]) = []

  • MODE([1, null]) = [1]

  • @param my_list A list, e.g., [1,2,2,2,4,4,4].

  • @return mode A list containing the mode or most common value of my_list. */

function mode(my_list) { my_map_hist = COUNT_UNIQ(STRING_LIST(my_list));

mode_indices = WHICH_MAX(MAP_VALUES(my_map_hist)); my_map_keys = MAP_KEYS(my_map_hist); values = MAP(mode_indices, (x) -> v = my_map_keys[x]; IF(IS_NUMERIC(v), TO_DOUBLE(v), v)); values }

MODE_NS /**

  • Return a list containing the mode or most common value(s), in case

  • of ties, of my_list, ignoring nulls.

  • Examples:

  • MODE([null,null,2,2,2,4,4,4]) = [2,4]

  • MODE([null,null,1,2.0,2,2,4,4,4]) = [2,4]

  • MODE([null,null,’1’,’2’,’2’,’2’,’4’,’4’,’4’]) = [2,4]

  • MODE([null,null,’1’,’L2’,’L2’,’L2’,’4’,’4’,’4’]) = [4,’L2’]

  • MODE(null) = [] // TODO: MODE() appears to ignore nulls by default.

  • MODE([null]) = [] // TODO: MODE() appears to ignore nulls by default.

  • MODE([‘’]) = [‘’]

  • MODE([‘c’]) = [‘c’]

  • MODE([‘c’,1]) = [1,’c’]

  • MODE([]) = []

  • @param my_list A list, e.g., [null,null,2,2,2,4,4,4].

  • @return mode A list containing the mode or most common value of my_list. */

function mode_ns(my_list) { MODE(DROP_NULLS(my_list)) }

BIN_INDEX /**

  • Return the index of the bin_edges where x is between bin_edges[x_index] and

  • bin_edges[x_index + 1]. Return -1 if x is less than bin_edges[0].

  • Examples:

  • BIN_INDEX(2, [-2,0,3,10,20]) = 1 // (index) because 0<=2<3.

  • BIN_INDEX(30 * 0.8 - 5, [-2,0,3,10,20]) = 3

  • BIN_INDEX(null, [0,1]) = -1

  • BIN_INDEX(‘’, [0,1]) = -1

  • BIN_INDEX(‘abc’, [0,1]) = 1

  • BIN_INDEX(‘-1’, [0,1]) = -1

  • BIN_INDEX(‘2’, [0,1]) = 1

  • BIN_INDEX(‘0’, [null,1]) = 0

  • BIN_INDEX(‘0’, [null,null]) = 1

  • BIN_INDEX(‘0’, [‘’,’’]) = 1

  • BIN_INDEX(‘0’, [-100,-1]) = 1

  • BIN_INDEX(‘0’, [-1,-100]) = 1 // TODO: -1?

  • BIN_INDEX(‘1’, [1,3,2]) = 0 // TODO: -1?

  • @param x Scalar.

  • @param base A pre-sorted (increasing) numeric list, e.g., [-2,0,3,10,40].

  • @return x_index The log-bin that x falls into in units of x. */

function bin_index(x, bin_edges) { num_edges = SIZE(bin_edges); idx=RANGE(0,num_edges - 1); indices = FILTER(idx, (i) -> BETWEEN(x, bin_edges[i], bin_edges[i + 1]));

// Deal with the first and last bin edges. IF(x < bin_edges[0], -1, IF(bin_edges[num_edges - 1] <= x, num_edges - 1, indices[0])) }

LIST_INSERT_VALUE /**

  • Insert value into my_list at index, pushing subsequent elements down by one

  • (+1) index.

  • Examples:

  • LIST_INSERT_VALUE(RANGE(-3,3), 100, 3) = [-3,-2,-1,100,0,1,2]

  • LIST_INSERT_VALUE([], null, 0) = [null]

  • LIST_INSERT_VALUE([], ‘’, 0) = [‘’]

  • LIST_INSERT_VALUE([null], ‘’, 0) = [‘’, null]

  • LIST_INSERT_VALUE([‘’], null, 0) = [null, ‘’]

  • LIST_INSERT_VALUE([‘’], ‘’, 0) = [‘’, ‘’]

  • LIST_INSERT_VALUE([0,1], null, 0) = [null, 0, 1]

  • LIST_INSERT_VALUE([], ‘a’, 0) = [‘a’]

  • LIST_INSERT_VALUE([], [], 0) = [[]]

  • @param my_list A list.

  • @param value Value to be inserted into my_list.

  • @param index Index at which value will be inserted.

  • @return my_longer_list my_list with value inserted at index. */

function list_insert_value(my_list, value, index) { list_size = SIZE(my_list); indices = RANGE(0,list_size + 1); MAP(indices, (i) -> IF(i < index, my_list[i], IF(i = index, value, my_list[i - 1]))) }

LIST_PREPEND_VALUE /**

  • Append value to my_list (at the beginning), pushing subsequent elements down

  • by one (+1) index.

  • Examples:

  • LIST_PREPEND_VALUE(RANGE(-3,3), 100) = [100,-3,-2,-1,0,1,2]

  • LIST_PREPEND_VALUE([], null) = [null]

  • LIST_PREPEND_VALUE([], ‘’) = [‘’]

  • LIST_PREPEND_VALUE([null], ‘’) = [‘’, null]

  • LIST_PREPEND_VALUE([‘’], null) = [null, ‘’]

  • LIST_PREPEND_VALUE([‘’], ‘’) = [‘’, ‘’]

  • LIST_PREPEND_VALUE([0,1], null) = [null, 0, 1]

  • LIST_PREPEND_VALUE([], ‘a’) = [‘a’]

  • LIST_PREPEND_VALUE([], []) = [[]]

  • @param my_list A list.

  • @param value Value to be prepended to my_list.

  • @return my_longer_list my_list with value prepended. */

function list_prepend_value(my_list, value) { LIST_INSERT_VALUE(my_list, value, 0) }

BIN_VALUE /**

  • Return the value of the largest bin_edge <= x.

  • For x < the smallest bin_edge, return smallest bin_edge - 1.

  • Examples:

  • BIN_VALUE(2, [-2,0,3,10,20]) = 0 // (value) because 0<=2<3.

  • BIN_VALUE(30 * 0.8 - 5, [-2,0,3,10,20]) = 10

  • BIN_VALUE(0, [0, 1]) = 0

  • BIN_VALUE(-2, [0, 1]) = -1

  • BIN_VALUE(1, [0, 1]) = 1

  • BIN_VALUE(2, [0, 1]) = 1

  • BIN_VALUE(null, [0, 1]) = -1

  • BIN_VALUE(0, [3, null]) = 2

  • BIN_VALUE(0, [10, 1]) = 9 // TODO: bins should be monotonic?

  • @param x The numerical value you would like to bin.

  • @param bin_edges A list of increasing numbers that define the smaller (inclusive) edges of the bins.

  • @return bin_edge The value of the largest bin_edge <= x. */

function bin_value(x, bin_edges) { // TODO: Check if redundant with WHICH_BIN(). // bin_edges = [0, 1, 2, 3, 4, 5, 8, 10]; bin_index = BIN_INDEX(x, bin_edges);

// Set the bin for ‘less than bin_edges[0]’ to bin_edges[0] - 1, since // we can’t know the true minimum value of all numbers in the bin. bin_edges = LIST_PREPEND_VALUE(bin_edges, bin_edges[0] - 1);

// Return the value of the bin, not the index. // Increase the bin_index by 1—BIN_INDEX() returns -1, 0, … SIZE(bin_index) - 1. // We prepended a value for the smallest bin, so we shift up all indices by 1. bin_edges[bin_index + 1] }

STR /**

  • Rename to_string() as just str().

  • @param x Scalar or string.

  • @return y Numeric scalar. */

function str(x) { TO_STRING(x) }

WHICH_BIN_INDEX /**

  • Return the maximum index of the bin_edges where x is between bin_edges[x_index]

  • and bin_edges[x_index + 1] (smaller and larger edges of the bin). Return -1 if x is

  • less than bin_edges[0].

  • Note: If bin_edges is not sorted, return the largest index where bid_edges[index]<=x.

  • Examples:

  • WHICH_BIN_INDEX(2, [-2,0,3,10,20]) = 1 // (index) because 0<=2<3.

  • WHICH_BIN_INDEX(30 * 0.8 - 5, [-2,0,3,10,20]) = 3

  • WHICH_BIN_INDEX(0, [1,2]) = -1

  • WHICH_BIN_INDEX(1, [1,2]) = 0

  • WHICH_BIN_INDEX(2, [1,2]) = 1

  • WHICH_BIN_INDEX(3, [1,2]) = 1

  • WHICH_BIN_INDEX(1, [2,1]) = 1 // TODO: -1. Bins not sorted.

  • WHICH_BIN_INDEX(1, [0,2,1]) = 2 // TODO: -1. Bins not sorted.

  • WHICH_BIN_INDEX(1, [null,1,2]) = 1

  • WHICH_BIN_INDEX(1, [1,null,2]) = 1 // TODO: 0. null should be ignored.

  • WHICH_BIN_INDEX(1, [1,2,null]) = 2 // TODO: 0. null should be ignored.

  • WHICH_BIN_INDEX(1, [‘c’,1,2]) = 1

  • WHICH_BIN_INDEX(1, [1,’c’,2]) = 0

  • WHICH_BIN_INDEX(1, [1,2,’c’]) = 0

  • WHICH_BIN_INDEX(null, [1,2]) = -1 // TODO: null. null should be null.

  • WHICH_BIN_INDEX([], [1,2]) = 1 // TODO: null. [] should be null. Or throw an error.

  • WHICH_BIN_INDEX(1, []) = -1 // TODO: null or -1? No bins should be null, or less than the smallest bin.

  • WHICH_BIN_INDEX(1, null) = -1 // TODO: null or -1? No bins should be null, or less than the smallest bin.

  • @param x Scalar.

  • @param bin_edges A pre-sorted (increasing) numeric list, e.g., [-2,0,3,10,40].

  • @return x_index The index of the bin that x falls into in units of x. */

function which_bin_index(x, bin_edges) { COALESCE(AG_MAX(WHICH(MAP(bin_edges, (b) -> IF(b <= x, 1, 0)))), -1) }

WHICH_BIN /**

  • Retrieve the left (smaller) edge of the bin where bin_edges[k] <= x < bind_edges[k+1].

  • Return the value bin_edge where x is between bin_edges[k] and bin_edges[k + 1]

  • (smaller and larger edges of the bin). Return bin_edges[0] - 1 if x is less than

  • bin_edges[0].

  • Examples:

  • WHICH_BIN(2, [-2,0,3,10,20]) = 0 // (value) because 0<=2<3.

  • WHICH_BIN(30 * 0.8 - 5, [-2,0,3,10,20]) = 10

  • WHICH_BIN(0, [10,20]) = 9

  • WHICH_BIN(10, [10,20]) = 10

  • WHICH_BIN(20, [10,20]) = 20

  • WHICH_BIN(30, [10,20]) = 20

  • WHICH_BIN(10, [20,10]) = 10 // TODO: 10? Bins not sorted.

  • WHICH_BIN(10, [0,20,10]) = 10 // TODO: 10? Bins not sorted.

  • WHICH_BIN(10, [null,10,20]) = 10

  • WHICH_BIN(10, [10,null,20]) = null // TODO: 10. null should be ignored.

  • WHICH_BIN(10, [10,20,null]) = null // TODO: 10. null should be ignored.

  • WHICH_BIN(10, [‘c’,10,20]) = 10

  • WHICH_BIN(10, [10,’c’,20]) = 10

  • WHICH_BIN(10, [10,20,’c’]) = 10

  • WHICH_BIN(null, [10,20]) = 9 // TODO: null. null should be null.

  • WHICH_BIN([], [10,20]) = 20 // TODO: null. [] should be null. Or throw an error.

  • WHICH_BIN(10, []) = null // No bins, no bin value.

  • WHICH_BIN(10, null) = null // No bins, no bin value.

  • @param x Scalar.

  • @param base A pre-sorted (increasing) numeric list, e.g., [-2,0,3,10,40].

  • @return bin_edge The bin that x falls into in units of x. */

function which_bin(x, bin_edges) { index = WHICH_BIN_INDEX(x, bin_edges); IF(index>=0, bin_edges[index], bin_edges[0] - 1) }

LIST_SUB /**

  • Subset my_list to the elements in indices.

  • Examples:

  • LIST_SUB([‘a’,’b’,’c’,’d’], [0,3]) = [‘a’,’d’]

  • LIST_SUB(null,[]) = null

  • LIST_SUB([],null) = null

  • [][null] = null

  • null[null] = null

  • null[[]] = null

  • @param my_list A list.

  • @param indices A list of integer indices.

  • @return list_subset List with only elements in indices. */

function list_sub(my_list, indices) { IF(my_list=null, null, MAP(indices, (x) -> my_list[x])) }

MOST_RECENT_TIES /**

  • Find and return a list of the most recent event(s) of a given type.

  • Examples:

  • // No recent records from the first record.

  • MOST_RECENT_TIES(timestamp, ‘request’) = []

  • // Current record is the most recent from itself + 1 millisecond.

  • SIZE(MOST_RECENT_TIES(timestamp+1, type)) = 1

  • // First record is the most recent from itself + 1 millisecond.

  • MOST_RECENT_TIES(timestamp+1, type) = [${@timeline.events[0]}]

  • // First record has no events preceding it.

  • MOST_RECENT_TIES(timestamp, type) = []

  • // First record is not most recent to itself.

  • MOST_RECENT_TIES(timestamp, type) != timeline.events[0] = TRUE

  • MOST_RECENT_TIES(timestamp - 1, type) != timeline.events[0] = TRUE

  • // Erroneous inputs return the empty list (no recent event).

  • MOST_RECENT_TIES(timestamp, ‘INVALID_TYPE’) = []

  • MOST_RECENT_TIES(timestamp, ‘’) = []

  • MOST_RECENT_TIES(timestamp, null) = []

  • MOST_RECENT_TIES(null, type) = []

  • MOST_RECENT_TIES(null, null) = []

  • MOST_RECENT_TIES(0, type) = []

  • // // Collect the third event.

  • // third_event = timeline.events[2];

  • // // Overwrite the first two events with the third event’s timestamp & type.

  • // ignore_output_var = MAP([0,1],

  • // (i) -> SET_PROPERTY(timeline.events[i],’type’,third_event.type);

  • // SET_PROPERTY(timeline.events[i],’timestamp’,third_event.timestamp);

  • // ‘ignored_output’);

  • //

  • // // See whether there are multiple most recent events

  • // // and whether the 3rd one is last in the list.

  • // recent_events = MOST_RECENT_TIES(third_event.timestamp + 1, third_event.type);

  • // (recent_events[2] = third_event AND SIZE(recent_events) = 3)

  • TRUE

  • @param current_time Timestamp in milliseconds.

  • @param filter_type String denoting the event type, e.g., ‘request’.

  • @return event_list_including_ties A list of events to accommodate potential ties. */

function most_recent_ties(current_time, filter_type) { filtered_events = FILTER(timeline.events, (x) -> x.type=filter_type); time_filtered_events = FILTER(filtered_events, (x) -> x.timestamp < current_time); filtered_timestamps = MAP(time_filtered_events, (x) -> x.timestamp); event_list_including_ties = MAP(WHICH_MAX(filtered_timestamps), (x) -> time_filtered_events[x]); event_list_including_ties }

MOST_RECENT /**

  • Find and return the most recent event of a given type using a uniform random

  • tie-breaker.

  • Examples:

  • // No recent records from the first record.

  • MOST_RECENT(timestamp, ‘request’) = null

  • // First record is the most recent from itself + 1 millisecond.

  • MOST_RECENT(timestamp+1, type) = ${@timeline.events[0]}

  • // First record has no events preceding it.

  • MOST_RECENT(timestamp, type) = null

  • // First record is not most recent to itself.

  • MOST_RECENT(timestamp, type) != timeline.events[0]

  • MOST_RECENT(timestamp - 1, type) != timeline.events[0]

  • // Erroneous inputs return null (no recent event).

  • MOST_RECENT(timestamp, ‘INVALID_TYPE’) = null

  • MOST_RECENT(timestamp, ‘’) = null

  • MOST_RECENT(timestamp, null) = null

  • MOST_RECENT(null, type) = null

  • MOST_RECENT(null, null) = null

  • MOST_RECENT(0, type) = null

  • // // Collect the third event.

  • // third_event = timeline.events[2];

  • // // Overwrite the first two events with the third event’s timestamp & type.

  • // ignore_output_var = MAP([0,1],

  • // (i) -> SET_PROPERTY(timeline.events[i],’type’,third_event.type);

  • // SET_PROPERTY(timeline.events[i],’timestamp’,third_event.timestamp);

  • // ‘ignored_output’);

  • //

  • // // Confirm that the most recent event from 1 millisecond after the third event

  • // // is the third event and that the third event is not most recent from itself.

  • // // TODO: Swap the first logical below with = third_event. Go backwards in the

  • // // list of events, rather than grabbing the first tie.

  • // (MOST_RECENT(third_event.timestamp + 1, third_event.type) = timeline.events[0] // first_event

  • // AND MOST_RECENT(third_event.timestamp, third_event.type) != third_event)

  • TRUE

  • @param current_time Timestamp in milliseconds.

  • @param filter_type String denoting the event type, e.g., ‘request’.

  • @return most_recent_event A single event using a uniform random tie-breaker. */

function most_recent(current_time, filter_type) { most_recent_events = MOST_RECENT_TIES(current_time, filter_type); idx = FLOOR(RANDOM() * SIZE(most_recent_events)); most_recent_event = most_recent_events[idx]; most_recent_event }

PREV /**

  • Return the next event with the same type as my_event.

  • Examples:

  • NEXT(self) != self // The next event is not the current event.

  • NEXT(timeline.events[0]) = NEXT_EVENT_(timeline.events[0].timestamp, timeline.events[0].type)

  • @param my_event An event from timeline.events or the current in-focus event, event / self.

  • @return next_event The next event of the same type as my_event. */

function prev(my_event) { MOST_RECENT(my_event.timestamp, my_event.type) }

// Future version. // function prev(my_event) { // idx = FILTER(RANGE_STEP(my_event.index, -1, -1), // (i) -> timeline.events[i].type = my_event.type, // (n,x) -> n > 1)[0] // timeline.events[idx] // }

CUMSUM /**

  • Return a list of the partial cumulative sum of the elements of my_list.

  • Examples:

  • CUMSUM(RANGE(0,6)) = [0,1,3,6,10,15]

  • CUMSUM([null]) = [null] // TODO: [null] or null?

  • CUMSUM([null, null]) = [null, null] // TODO: [null] or null?

  • CUMSUM([‘’, null]) = [null,null]

  • CUMSUM([‘’, ‘’]) = [null,null]

  • CUMSUM([1, 2, ‘3’]) = [1,3,6.0]

  • CUMSUM([1, 2, null]) = [1, 3, null]

  • CUMSUM([1, null, 2]) = [1, null, null]

  • CUMSUM([1, 2, ‘’]) = [1, 3, null]

  • CUMSUM([‘a’]) = [null]

  • CUMSUM([‘1/0’]) = [null]

  • @param my_list A numeric list.

  • @return my_list_cumsum The partial/cumulative sum of my_list for elements 0 through index. */

function cumsum(my_list) { my_sum = 0; // Initialize in the surrounding context. // my_sum locally increments using the surrounding context of ‘my_sum’ to initialize. MAP(my_list, (x) -> my_sum = my_sum + x; my_sum) // my_sum = 0 since all of the changes to my_sum were local inside the lambda expression. }

EXPAND_COLS_FORMAT /**

  • Return a list of [prefix0:value[0], prefix1:value[1], …]. Such lists of

  • key:value pairs can be easily expanded into multiple columns when building

  • datasets.

  • prefix=’bs’ is a common use case when working with bootstrap

  • ensembles.

  • Examples:

  • EXPAND_COLS_FORMAT(‘bs’,[-1,0,1]) = [‘bs0:-1.0’,’bs1:0’,’bs2:1’]

  • EXPAND_COLS_FORMAT(‘z’,[‘A’,’B’,’C’]) = [‘z0:A’,’z1:B’,’z2:C’]

  • @param prefix An input string such as ‘bs’.

  • @param values A list of values to be prefixed and enumerated into key-value pairs.

  • @return expandable_list A list of key:value pairs compatible with expand_columns. */

function expand_cols_format(prefix, values) { // List of [bs0:values[0], bs1:values[1], …] for prefix=’bs’. MAP(RANGE(0,SIZE(values)), (i) -> CONCAT(prefix, i, ‘:’, values[i])) }

DATETIME_RANGE /**

  • Return a list of numbers from start to end in steps of step.

  • Examples:

  • start=’2020-01-01 00:00:00.000’;

  • end='2020-01-03 00:00:00.001';
    
  • step=86400000; // 1 day in milliseconds
    
  • DATETIME_RANGE(start, end, step)
    
  •   = ['2020-01-01 00:00:00', '2020-01-02 00:00:00', '2020-01-03 00:00:00']
    
  • // Without RANGE_STEP().

  • start=MILLIS(TO_DATETIME(‘2020-01-01 00:00:00.000’));

  • end=MILLIS(TO_DATETIME('2020-01-03 00:00:00.001'));
    
  • step=86400000; // 1 day in milliseconds
    
  • num_steps=(end-start) / step;
    
  • MAP(RANGE(0,CEIL(num_steps)),
    
  •   (x) -&gt; FROM_UNIXTIME(x * step + start))
    
  •   = ['2020-01-01 00:00:00', '2020-01-02 00:00:00', '2020-01-03 00:00:00']
    
  • @param start Datetime-parseable string, e.g., ‘2020-01-01 00:00:00.000’.

  • @param end Datetime-parseable string, e.g., ‘2020-02-02 00:00:00.001’.

  • @param step Numeric scalar in milliseconds.

  • @return step_list Datetime string list, e.g., [‘2020-05-01 00:00:00’, ‘2020-05-02 00:00:00’, …]. */

function datetime_range(start, end, step) { start = MILLIS(TO_DATETIME(start)); end = MILLIS(TO_DATETIME(end)); MAP(RANGE_STEP(start, end, step), (x) -> FROM_UNIXTIME(x)) }

ANY /**

  • Return TRUE if any element of my_list is truthy.

  • Examples:

  • ANY([0,10,2,-4.3]) = TRUE

  • ANY([1,3,4,5,2,-4]) = TRUE

  • ANY([0]) = FALSE

  • ANY([‘tRuE’]) = TRUE

  • ANY([‘hi’]) = FALSE // All other strings are false.

  • ANY([null]) = FALSE

  • ANY([]) = FALSE

  • ANY(null) = null

  • @param my_list List, e.g. [0,10,2,-4.3].

  • @return any_true Boolean */

function any(my_list) { any_true = False; out = MAP(my_list, (x) -> any_true = any_true OR x; any_true); // Always return False for empty lists, []. IF(SIZE(my_list)=0, False, out[-1]) }

WHICH_MIN /**

  • Return the index corresponding to the minimum value of my_list.

  • Examples:

  • WHICH_MIN([0,10,2,-4.3]) = [3]

  • WHICH_MIN([1,3,4,5,2,-4]) = [5]

  • WHICH_MIN([]) = []

  • WHICH_MIN(null) = null

  • WHICH_MIN([null,-2]) = [1]

  • WHICH_MIN([-2,-2.0,null,-3]) = [3]

  • WHICH_MIN([‘c’,-2]) = [1]

  • WHICH_MIN([‘c’,’d’]) = []

  • @param my_list Numeric list, e.g. [0,10,2,-4.3].

  • @return index Numeric scalar, e.g., 3. */

function which_min(my_list) { min_value = AG_MIN(my_list); index = WHICH(MAP(my_list, (x) -> IF(x=min_value, 1, 0))); index }

NEXT_EVENT_TIES /**

  • Find and return a list of the next event(s) of a given type.

  • Examples:

  • // There should be a next event of type=’request’ after the first record.

  • NEXT_EVENT_TIES(timestamp, ‘request’) != []

  • // Current record is the next event from itself - 1 millisecond.

  • SIZE(NEXT_EVENT_TIES(timestamp-1, type)) >= 1

  • // First record is the next event from itself - 1 millisecond.

  • NEXT_EVENT_TIES(timestamp-1, type) = [timeline.events[0]]

  • @param current_time Timestamp in milliseconds.

  • @param filter_type String denoting the event type, e.g., ‘request’.

  • @return event_list_including_ties A list of events to accommodate potential ties. */

function next_event_ties(current_time, filter_type) { filtered_events = FILTER(timeline.events, (x) -> x.type=filter_type); time_filtered_events = FILTER(filtered_events, (x) -> x.timestamp > current_time); filtered_timestamps = MAP(time_filtered_events, (x) -> x.timestamp); event_list_including_ties = MAP(WHICH_MIN(filtered_timestamps), (x) -> time_filtered_events[x]); event_list_including_ties }

NEXT_EVENT_ /**

  • Deprecated: use next_event(filter_type)

  • Find and return the next event of a given type using a uniform random tie-breaker.

  • Examples:

  • // There should be a next event of type=’request’ after the first record.

  • NEXT_EVENT_(timestamp, ‘request’) != []

  • // First record is the next event from itself - 1 millisecond.

  • NEXT_EVENT_(timestamp-1, type) = ${@timeline.events[0]}

  • @param current_time Timestamp in milliseconds.

  • @param filter_type String denoting the event type, e.g., ‘request’.

  • @return next_event A single event using a uniform random tie-breaker. */

function next_event_(current_time, filter_type) { next_events = NEXT_EVENT_TIES(current_time, filter_type); idx = FLOOR(RANDOM() * SIZE(next_events)); next_event = next_events[idx]; next_event }

INT_LIST /**

  • Takes a list of numbers and returns a list with the numbers

  • converted to integers.

  • Examples:

  • INT_LIST([0.2,-1.3,5]) = [0,-1,5] // Toward zero.

  • @param my_list A list of numbers.

  • @return my_int_list my_list converted to integers. */

function int_list(my_list) { MAP(my_list, (x) -> TO_INT(x)) }

LIST_REPLACE /**

  • Replace the elements of my_list with replace_list starting

  • at index. If replace_list is longer than the remaining

  • elements of my_list, allow overflow, returning a longer list.

  • Examples:

  • LIST_REPLACE(RANGE(-3,3), RANGE(99,102), 3) = [-3,-2,-1,99,100,101]

  • @param my_list A list.

  • @param replace_list Replacement list elements.

  • @param index Starting index for replacing my_list elements with replace_list.

  • @return my_replace_list my_list with replacements by replace_list starting at index. */

function list_replace(my_list, insert_list, index) { list_size = SIZE(my_list); insert_size = SIZE(insert_list); indices = RANGE(0, index + insert_size); MAP(indices, (i) -> IF(i < index, my_list[i], IF(i < index + insert_size, insert_list[i - index], my_list[i]))) }

IN /**

  • Return TRUE if x is equal to any element of my_list.

  • Examples:

  • IN(1, [‘a’,’b’,’c’]) = False

  • IN(‘b’, [‘a’,’b’,’c’,’c’]) = True

  • IN(1, [1,2,3]) = True

  • IN(1.0, [1,2,3]) = True

  • IN(1.2, [1.2,2,3]) = True

  • NOT IN(1, [1,2,3]) = False // NOT operator

  • NOT(IN(1, [1,2,3])) = False // NOT() UDF

  • IN(1.0, null) = null

  • IN(1.0, []) = False

  • IN(1.0, [null]) = False

  • IN(null, [null]) = True

  • IN(null, [1,2,3]) = False

  • IN(null, null) = null

  • @param my_list List, e.g. [0,10,2,-4.3].

  • @return x_in_list Boolean */

function IN(x,list) { IF(list=null, null, ANY(MAP(list, (v) -> x=v))) }

INV_EXP /**

  • Return the quantile q of the exponential distribution of the probability p.

  • Examples:

  • INV_EXP(0.5) = 0.6931471805599453 // Median = natLog(2)

  • @param p Numeric scalar, e.g., 0.5.

  • @return q Numeric scalar, e.g., natLog(2)=0.69314718056. */

function inv_exp(p) { // Inverse CDF (quantile function) of the exponential distribution. -1 * LOG(1 - p, EXP(1)) }

LIST_REMOVE_VALUE /**

  • Remove the element of my_list at index, pulling subsequent

  • elements up by one (-1) index.

  • Examples:

  • LIST_REMOVE_VALUE(RANGE(-3,3), 3) = [-3,-2,-1,1,2]

  • @param my_list A list.

  • @param index Index that value will replace.

  • @return my_remove_list my_list with the value of at index removed. */

function list_remove_value(my_list, index) { list_size = SIZE(my_list); indices=LIST_UNION(RANGE(0,index), RANGE(index + 1,list_size)); LIST_SUB(my_list, indices) }

STDEV_BOOTSTRAP /**

  • Computes the standard deviation of the list of values, omitting the first

  • (0th) element, presuming it comes from the base model and, hence, is not a

  • bootstrap iteration.

  • Examples:

  • STDEV_BOOTSTRAP([-1,0,1,2]) = STDEV([0,1,2])

  • @param values A list of values.

  • @return stdev Standard deviation of the elements of values, except the first. */

function STDEV_BOOTSTRAP(values) { // Remove the 0th index, the main model, which is not one of the bootstraps. STDEV(LIST_REMOVE_VALUE(values,0)) }

STRLEN /**

  • Return the number of characters in my_string.

  • Examples:

  • STRLEN(‘Hello World!’) = 12

  • @param my_string A string.

  • @return num_chars Integer number of characters in my_string. */

function strlen(my_string) { // TODO: This should use a Java built-in function rather than counting regex matches. SIZE(STRING_SPLIT(my_string,’’)) }

SUM_NS /**

  • Return the sum of my_list, ignoring nulls.

  • Examples:

  • SUM_NS([null,null,2,2,2,4,4,4]) = 18

  • SUM_NS(null) = null

  • SUM_NS([null]) = 0

  • SUM_NS([‘’]) = null // TODO: 0. Strings should be ignored.

  • SUM_NS([‘c’]) = null // TODO: 0. Strings should be ignored.

  • SUM_NS([‘c’,1]) = null // TODO: 1. Strings should be ignored.

  • SUM_NS([]) = 0

  • @param my_list A list, e.g., [null,null,2,2,2,4,4,4].

  • @return sum The sum of non-null elements of my_list. */

function sum_ns(my_list) { SUM(DROP_NULLS(my_list)) }

ZERO_IMPUTE /**

  • Replace all null values of my_list with zero.

  • Examples:

  • ZERO_IMPUTE([null,null,2,4]) = [0,0,2,4]

  • @param my_list A list, e.g., [null,null,2,4].

  • @return my_imputed_list A list with nulls imputed using zero. */

function zero_impute(my_list) { MAP(my_list, (x) -> COALESCE(x,0)) }

PARSE_KV /**

  • Parse a key:value pair and return a map {‘cat’:’key’, ‘num’:value}. The UDF handles

  • certain cases where the key or value are empty. For example, if k:v is only v,

  • assume v is numerical.

  • Examples:

  • PARSE_KV(‘key:3.0’) = {‘cat’:’key’, ‘num’:3.0}

  • PARSE_KV(‘key:’) = {‘cat’:’c’, ‘num’:1.0}

  • PARSE_KV(‘key’) = {‘cat’:’c’, ‘num’:1.0}

  • PARSE_KV(‘3.0’) = {‘cat’:’c’, ‘num’:3.0}

  • PARSE_KV(3.0) = {‘cat’:’c’, ‘num’:3.0}

  • PARSE_KV(‘key:3.0:1’) = {‘cat’:’key’, ‘num’:3.0}

  • PARSE_KV(‘key:key2:1’) = {‘cat’:’key’, ‘num’:1.0} // Default numeric since ‘key2’ is nonnumeric.

  • @param kv Key-value pair string, e.g., key:3.0, where the value is numerical.

  • @return kv_map A map with {‘cat’:’key’, ‘num’:value}. */

function PARSE_KV(kv) { // Parse a key:value pair and return a map {‘cat’:’key’, ‘num’:value}. default_categorical=’c’; default_numerical=1.0;

key_values = STRING_SPLIT(TO_STRING(kv), ‘:’); // Split k:v.

// Try to find a categorical key in k:v—default to ‘c’. categorical = IF(SIZE(key_values)>1, key_values[0], default_categorical);

// Multiply by the value in the categorical:numerical k:v pair. // If k:v is only v, assume it is numerical. idx = IF(SIZE(key_values) > 2, 1, -1); // If ‘k:v1:v2’ or ‘k1:k2:v’, just use the second key. numerical = COALESCE(TO_DOUBLE(key_values[idx]), default_numerical);

// Create map: {cat:key, num:value}. kv_map = {‘cat’:categorical, ‘num’:numerical}; kv_map }

PARSE_KV_NUM /**

  • Return the numerical (value) component of a key-value pair string kv.

  • Examples:

  • PARSE_KV_NUM(‘key:3.0’) = 3.0

  • @param kv Key-value pair string, e.g., ‘key:3.0’, where the value is numerical.

  • @return num The value component of kv. */

function parse_kv_num(kv) { TO_DOUBLE(MAP_VALUE(PARSE_KV(kv),’num’)) }

SUM_MAP /**

  • Return the sum of values of a map of key=value pairs.

  • Examples:

  • my_map = JSON_TO_MAP(‘{“key1”:10,”key2”:2}’);

  • SUM_MAP(my_map) = 'sum:12.0'
    
  • my_list = [1,1,2,2,3,3,3,3];

  • my_map = COUNT_UNIQ(INT_LIST(my_list));
    
  • SUM_MAP(my_map) = 'sum:8.0'
    
  • @param my_map A map, e.g., {cat=1, num=2}.

  • @return kv_string A key-value pair as a string, e.g., num:2. */

function sum_map(my_map) { // SUM(my_list) // Works for lists. // SUM(my_map) // Doesn’t work for maps.

map_string = MAP_TO_LIST(my_map, (k,v) -> CONCAT_WS(‘:’,k,v));

REDUCE(map_string, (x,z) -> CONCAT_WS(‘:’, ‘sum’, PARSE_KV_NUM(x) + PARSE_KV_NUM(z))) }

LIST_MINUS /**

  • Elementwise list subtraction/’minus’: list1 - list2.

  • Examples:

  • LIST_MINUS([1,-1],[1,1]) = [0,-2]

  • @param list1 A list of numbers.

  • @param list2 A list of numbers.

  • @return list_out list1 - list2 */

function list_minus(list1, list2) { MAP(RANGE(0,SIZE(list1)), (i) -> list1[i] - list2[i]) }

COUNT /**

  • Return the number of elements of my_list. Alias for SIZE().

  • @param my_list A list.

  • @return count Integer number of elements in my_list. */

function count(my_list) { // Alias for SIZE(). SIZE(my_list) }

WINDOW_EVENTS /**

  • Return a list of all events of a given event_type occurring between

  • days_start and days_end, relative to current_time. Included

  • events have timestamps between current_time + (days_start, days_end).

  • Examples:

  • // For testing with the first event, no records fall in the window.

  • WINDOW_EVENTS(timestamp, ‘request’, -7, -3) = [] // Between 3 and 7 days ago.

  • // However, 5 days in the future from the first event, the current record falls in the window.

  • SIZE(WINDOW_EVENTS(timestamp + 5864001000, type, -7, -3)) >= 1 // Between 3 and 7 days ago.

  • // The current record falls inside a window starting looking back 1 day from 1 millisecond in the future.

  • WINDOW_EVENTS(timestamp + 1, type, -1, 0) = [${@timeline.events[0]}] // Within the past day, + 1 millisecond.

  • WINDOW_EVENTS(timestamp + 1, ‘BAD_TYPE’, -7, 0) = []

  • // Adjust window by -1, 0, and +1 milliseconds.

  • WINDOW_EVENTS(timestamp - 1, type, 0, 1)[0] = ${@timeline.events[0]} // First event is first returned.

  • WINDOW_EVENTS(timestamp, type, 0, 1)[0] = ${@timeline.events[0]} // First event is first returned.

  • WINDOW_EVENTS(timestamp + 1, type, 0, 1)[0] = null

  • // Adjust the current timestamp to be outside of the window for the current sample.

  • current = timestamp; SET_PROPERTY(event, ‘timestamp’, 0);

  • WINDOW_EVENTS(current + 1, type, -1, 0)[0] = null
    
  • @param current_time Timestamp in milliseconds.

  • @param event_type String denoting the event type, e.g., ‘request’.

  • @param days_start The number of days to add to current_time. Event timestamps are >=.

  • @param days_end The number of days to add to current_time. Event timestamps are < (strictly).

  • @return window_events A list of events. */

function window_events(current_time, filter_type, days_start, days_end) { millis_per_day = 24 * 3600 * 1000;

// window_events FILTER(timeline.events, (x) -> days_diff = (x.timestamp - current_time) / millis_per_day; IFRETURN(days_diff > days_end,0); x.type = filter_type AND BETWEEN(days_diff, days_start, days_end), (n,x) -> x.timestamp > current_time + days_end * millis_per_day) // Break upon reaching the end of the window. }

RECENT_EVENTS /**

  • Return a list of all events of a given filter_type occurring in recent days.

  • Examples:

  • // No records in the past 7 days from the first record.

  • RECENT_EVENTS(timestamp, ‘request’, 7) = []

  • // Current record is within the past 7 days from the first record + 1 millisecond.

  • SIZE(RECENT_EVENTS(timestamp+1, type, 7)) >= 1

  • // First record is the only record within 7 days of itself + 1 millisecond.

  • RECENT_EVENTS(timestamp+1, type, 7) = [${@timeline.events[0]}]

  • @param current_time Timestamp in milliseconds.

  • @param filter_type String denoting the event type, e.g., ‘request’.

  • @param days The number of days to look back to include events.

  • @return recent_events A list of events. */

function recent_events(current_time, filter_type, days) { // Return a list of all recent events of a given ‘filter_type’ in last ‘days’. recent_events = WINDOW_EVENTS(current_time, filter_type, -1 * days, 0); recent_events }

INV_RESCALE /**

  • Inverse the RESCALE() transformation on y using location

  • and scale parameters: (y - location) / scale.

  • Examples:

  • INV_RESCALE(11,1,2) = 5 // (11 - 1) / 2

  • @param y Scalar to be transformed.

  • @param location Scalar.

  • @param scale Scalar.

  • @return x (y - location) / scale. */

function inv_rescale(y, location, scale) { (y - location) / scale }

PARSE_KV_CAT /**

  • Return the categorical (key) component of a key-value pair string kv.

  • Examples:

  • PARSE_KV_CAT(‘key:3.0’) = ‘key’

  • @param kv Key-value pair string, e.g., ‘key:3.0’.

  • @return cat The categorical (key) component of kv. */

function parse_kv_cat(kv) { MAP_VALUE(PARSE_KV(kv),’cat’) }

LENGTH /**

  • Return the number of elements of my_list. Alias for SIZE().

  • @param my_list A list.

  • @return length Integer number of elements in my_list. */

function length(my_list) { // Alias for SIZE(). SIZE(my_list) }

LIST_REVERSE_ /**

  • Deprecated in favor of the faster built-in function: LIST_REVERSE().

  • Reverse the order of elements in my_list from first-to-last to last-to-first.

  • Examples:

  • LIST_REVERSE([‘a’,’b’,’c’,’d’]) = [‘d’,’c’,’b’,’a’]

  • @param my_list A list.

  • @return my_list_reverse my_list with elements in reverse order. */

function list_reverse_(my_list) { reverse_indices = RANGE_STEP(SIZE(my_list) - 1, -1, -1); LIST_SUB(my_list, reverse_indices) }

SUBSET_KV /**

  • Given k, a key, extract values[i] corresponding to k==keys[i].

  • Examples:

  • SUBSET_KV(‘a’,[‘a’,’a’,’b’,’c’],[1,2,3,4]) = [1,2]

  • @param k One key.

  • @param keys A list of keys.

  • @param values A list of values.

  • @return k_values The subset of the values list with key k. */

function subset_kv(k, keys, values) { // TODO: Is this redundant with SWITCH(x,cases,outputs)? matches = MAP(keys, (x) -> x=k); which_matches = WHICH(matches);

match_values = MAP(which_matches, (i) -> values[i]); match_values }

UNIQ /**

  • Return a list of the unique values in my_list.

  • Examples:

  • UNIQ([‘c’,’a’,’b’,’c’,’b’,’c’,’c’]) = [‘a’,’b’,’c’]

  • UNIQ([1,1.0,2,2.000]) = [1,2]

  • UNIQ([‘L1’,1.0,2,2.000]) = [1,2,’L1’]

  • @param my_list A list.

  • @return uniq_keys A sorted list of the unique values in my_list. */

function uniq(my_list) { counts = COUNT_UNIQ(STRING_LIST(my_list)); keys = MAP_KEYS(counts); MAP(keys, (x) -> IF(IS_NUMERIC(x), TO_DOUBLE(x), x)) }

GROUP_BY_SUM /**

  • Like COUNT_UNIQ(), but sum across values, rather than just counting the number of

  • times each unique element of groups appears.

  • Examples:

  • GROUP_BY_SUM([‘1’,’go’,’1’,’go’],[1,2,3,4]) = {‘1’:4.0, ‘go’:6.0}

  • GROUP_BY_SUM([‘1’,’go’,1,’go’],[1,2,3,4]) = {‘1’:4.0, ‘go’:6.0}

  • GROUP_BY_SUM([1,2,1.0,2.0],[1,2,3,4]) = {‘1’:4.0, ‘2’:6.0}

  • GROUP_BY_SUM([1,’2’,’1.0’,2.0],[1,2,3,4]) = {‘1’:4.0, ‘2’:6.0}

  • @param groups A list of group keys, e.g., [‘1’,’go’,’1’,’go’].

  • @param values A list of values to be summed within group, e.g., [1,2,3,4].

  • @return my_map A map of unique groups with values summed across values, e.g., {1=4.0, go=6.0}. */

function group_by_sum(groups, values) { // Collect unique group names.DATE // For each g in groups: // group:SUM(values) // values = [value[i] for x in groups if x==g] groups = MAP(groups, (g) -> IF(IS_NUMERIC(g), TO_DOUBLE(g), g)); unique_groups = UNIQ(groups); output = MAP(unique_groups, (g) -> SUM(SUBSET_KV(TO_STRING(g), STRING_LIST(groups), values)));

LISTS_TO_MAP(unique_groups, output) }

SPLIT /**

  • Rename SPLIT_STRING() as just SPLIT().

  • @param my_string String.

  • @param pattern Regular-expression pattern on which to split my_string, e.g., ‘ +’.

  • @return y List of strings resulting from split. */

function split(string, pattern) { STRING_SPLIT(string, pattern) }

REXP /**

  • Draw a random exponential with a scale parameter.

  • @param scale Scalar.

  • @return y Random exponential draw. */

function rexp(scale) { INV_EXP(RANDOM()) * scale }

CLIP /**

  • Clip/censor x between lower and upper.

  • Examples:

  • CLIP(-1, 0, 100) = 0

  • CLIP(111, 0, 100) = 100

  • CLIP(50, 0, 100) = 50

  • @param x Scalar to be transformed.

  • @param lower Scalar.

  • @param upper Scalar.

  • @return y y = MIN(upper,MAX(lower,x)). */

function clip(x, lower, upper) { MIN(upper, MAX(lower, x)) }

MAP_UNION /**

  • Union two maps.

  • Note: Keys and values cannot have commas ‘,’, equals ‘=’, or braces ‘{’ and ‘}’ due to

  • JSON parsing.

  • Examples:

  • MAP_UNION({‘a’:1,’b’:2}, {‘c’:3}) = {‘a’:1, ‘b’:2, ‘c’:3}

  • MAP_UNION({‘a’:1,’b’:2}, {}) = {‘a’:1, ‘b’:2}

  • MAP_UNION({}, {‘c’:3}) = {‘c’:3}

  • MAP_UNION({}, {}) = {}

  • MAP_UNION({“key1a”:”value1a”,”key1b”:”value1b”},

  •         {"key2a":"value2a","key2b":"value2b"})
    
  • = {"key1a":"value1a", "key1b":"value1b", "key2a":"value2a", "key2b":"value2b"}
    
  • @param my_map A map, e.g., {cat=1, num=2}.

  • @return my_appended_map my_map with the key:value pair appended. */

function map_union(my_map1, my_map2) { // Transform the map to a string. string_map1 = JSON(my_map1); string_map2 = JSON(my_map2); string_map1 = REGEX_REPLACE(string_map1, [‘^[{]’,’’, ‘[}]\(','']); string_map2 = REGEX_REPLACE(string_map2, ['^[{]','', '[}]\)’,’’]);

string_map = CONCAT_WS(‘,’, string_map1, string_map2); string_map = REGEX_REPLACE(string_map, [‘^,’,’’, ‘,$’,’’]); string_map = CONCAT(‘{’, string_map, ‘}’); JSON_TO_MAP(string_map) }

LIST_PREPEND /**

  • Prepend insert_list to my_list (at the beginning), pushing subsequent elements

  • down by the number of elements of insert_list.

  • Examples:

  • LIST_PREPEND(RANGE(-3,3), RANGE(99,102)) = [99,100,101,-3,-2,-1,0,1,2]

  • @param my_list A list.

  • @param insert_list List to be prepended to my_list.

  • @return my_longer_list my_list with insert_list prepended. */

function list_prepend(my_list, insert_list) { LIST_INSERT(my_list, insert_list, 0) }

MD5_PARTITIONS /**

  • Use MD5() to map partition_var into a partition denoted by names. The mapping is

  • quasi-random with respect to partition_var due to MD5()’s hashing properties. Use

  • shares to assign the probability of assignment to each partition. Doubles as a

  • weighted uniform hash-deterministic random sampler (e.g., random uniform given a seed).

  • Examples:

  • SET_PROPERTY(timeline,’id’,’my_timeline_id’); // For a deterministic example.

  • // New succinct, extendable, statistically robust UDF.
    
  • MD5_PARTITIONS(timeline.id, 'my_seed', [0.008, 0.002], ['train','test']) = 'test'
    
  • // Old succinct, undocumented approach with questionable statistical properties.
    
  • // IF(HASH(timeline.id) % 100 &gt; 20, "train", "test")
    
  • @param partition_var A partitioning expression, e.g., timeline.id.

  • @param seed String to append to partition_var prior to hashing in order to customize

  • (synchronize or make independent) the randomization for the application/model/etc.

  • @param shares List of (relative) shares of the split, e.g., [0.8, 0.2].

  • @param names List of partition names, e.g., [‘train’, ‘test’].

  • @return name A single, MD5-hash-weighted-random element of names. */

function md5_partitions(partition_var, seed, shares, names) { sum_shares = SUM(shares); upper_bin_edges = MAP(CUMSUM(shares), (x) -> x / sum_shares); // Normalize relative shares to sum to one. lower_bin_edges = LIST_PREPEND_VALUE(upper_bin_edges,0);

md5_string = CONCAT(TO_STRING(partition_var), TO_STRING(seed)); // var + seed. probability = MD5_MOD(md5_string, 1000) / 1000; // Create 1000 bins (e.g., 0.1% bins).

// Two approaches to getting the record’s partition bin index. // prob_bin = BIN_INDEX(probability, lower_bin_edges); prob_which_bin = WHICH_BIN_INDEX(probability, lower_bin_edges);

name = names[prob_which_bin]; name }

GSUB /**

  • Simple substitution of find with replace. Replaces all matches using

  • REGEX_REPLACE().

  • Examples:

  • GSUB(’ ‘,’_’,’Hello World!’) = ‘Hello_World!’

  • @param find String. Can be a regular expression.

  • @param replace String.

  • @param my_string String.

  • @return y String with replacements made. */

function gsub(find, replace, my_string) { // Simple substitution REGEX_REPLACE(my_string, [find, replace]) }

NEXT_EVENT_SORTED /**

  • Find and return the next event of a given type, assuming that

  • timeline.events is sorted by each event’s timestamp.

  • Examples:

  • // There should be a next event of type=’request’ after the first record.

  • NEXT_EVENT_SORTED(timestamp, ‘request’) != []

  • // First record is the next event from itself - 1 millisecond.

  • NEXT_EVENT_SORTED(timestamp-1, type) = ${@timeline.events[0]}

  • @param current_time Timestamp in milliseconds.

  • @param filter_type String denoting the event type, e.g., ‘request’.

  • @return next_event A single event. */

function next_event_sorted(current_time, filter_type) { filtered_events = FILTER(timeline.events, (x) -> x.type=filter_type); time_filtered_events = FILTER(filtered_events, (x) -> x.timestamp > current_time);

// Return the first event in the list, since events are sorted by timestamp. No tie breaker. next_event = time_filtered_events[0]; next_event }

DOUBLE /**

  • Rename to_double() as just double().

  • @param x Scalar or string.

  • @return y Numeric scalar. */

function double(x) { TO_DOUBLE(x) }

RUNIF /**

  • Draw a random uniform with location and scale parameters:

  • (0,1)*scale + location.

  • @param location Scalar.

  • @param scale Scalar.

  • @return y Random uniform draw from (0,1)*scale + location. */

function runif(location, scale) { RANDOM() * scale + location }

OCTAL_INT /**

  • Rename to_int/int32() as octal_int() because leading zeros in strings are treated as

  • octal: TO_INT(‘08’) is invalid, TO_INT(‘010’) = 8 base 10.

  • Examples:

  • OCTAL_INT(‘08’) = null // Invalid octal.

  • OCTAL_INT(‘010’) = 8 // Octal ‘010’ is 8 base 10.

  • OCTAL_INT(‘027’) = 23 // Octal ‘027’ is 2POW(8,1)+7POW(8,0) = 23 base 10.

  • OCTAL_INT(‘0127’) = 1POW(8,2)+2POW(8,1)+7*POW(8,0) // Octal ‘0127’ is 87 base 10.

  • OCTAL_INT(‘0127’) = TO_INT(‘0127’) // Alias

  • @param x Scalar or string.

  • @return y Integer. */

function octal_int(x) { TO_INT(x) }

AVG_MAP /**

  • Return the average of the values of a map of key=value

  • pairs.

  • Examples:

  • my_map = JSON_TO_MAP(‘{“key1”:10,”key2”:2}’);

  • AVG_MAP(my_map) = 'avg:6.0'
    
  • my_list = [1,1,2,2,3,3,3,3];

  • my_map = COUNT_UNIQ(INT_LIST(my_list));
    
  • AVG_MAP(my_map) = 'avg:2.6666666666666665'
    
  • @param my_map A map, e.g., {cat=1, num=2}.

  • @return kv_string A key-value pair as a string, e.g., num:2. */

function avg_map(my_map) { // AVG(my_list) // Works for lists. // AVG(my_map) // Doesn’t work for maps.

my_map_sum = SUM_MAP(my_map); CONCAT_WS(‘:’, ‘avg’, PARSE_KV_NUM(my_map_sum) / SIZE(my_map)) }

COUNT_NS /**

  • Return the number of elements in my_list, ignoring nulls.

  • Examples:

  • COUNT_NS([null,null,2,2,2,4,4,4]) = 6

  • COUNT_NS([null]) = 0

  • COUNT_NS([‘’]) = 1

  • COUNT_NS([]) = 0

  • COUNT_NS([‘null’]) = 1

  • @param my_list A list, e.g., [null,null,2,2,2,4,4,4].

  • @return count The number of non-null elements of my_list. */

function count_ns(my_list) { SIZE(DROP_NULLS(my_list)) }

ROUND10 /**

  • Round x to the base_10_decimal_place.

  • Examples:

  • ROUND10(100*PI(),-1) = 310.0

  • ROUND10(100*PI(),3) = 314.159

  • @param x Scalar.

  • @param base_10_decimal_place Decimal place to round, e.g., -2 (100s), -1 (10s), 0 (1s), 1 (0.1s), 2 (0.01s).

  • @return x_rounded x rounded to base_10_decimal_place. */

function round10(x, base_10_decimal_place) { round_units = POW(10, base_10_decimal_place); ROUND(x * round_units) / round_units }

LOG_BIN /**

  • Scale x into powers of base. To adjust location, do pre-/post-adjustments: e.g.,

  • LOG_BIN() + location or LOG_BIN(x + location, base).

  • Examples:

  • LOG_BIN(1024 * 0.8, 2) = 512 // Powers of 2.

  • LOG_BIN(1024 * 0.8, 4) = 256 // Powers of 4.

  • LOG_BIN(1024 * 0.8, 10) = 100 // Powers of 10.

  • @param x Scalar.

  • @param base Scalar.

  • @return x_bin The log-bin that x falls into in units of x. */

function log_bin(x, base) { POW(base, FLOOR(LOG(x, base))) }

QUANTILE /**

  • Retrieve the quantile value for which p proportion of my_list

  • is smaller / ranked / sorted lower.

  • Examples:

  • QUANTILE(RANGE(90,101), 0.5) = 95 // Median between 90 and 100.

  • median = QUANTILE(RANDOM_LIST(500), 0.5); BETWEEN(median,0.4,0.6) = 1

  • @param my_list A list.

  • @param p Probability in [0,1] at which to retrieve the quantile. 0=MIN,

  • 1=MAX.

  • @return q_value The quantile value corresponding to p. */

function quantile(my_list, p) { num_elements = SIZE(my_list) - 1; q = SORT(my_list)[FLOOR(p * num_elements)]; q }

LIST_SUBTRACT /**

  • Subtract list1 and list2 by removing all elements of

  • list1 having values in list2.

  • Examples:

  • LIST_SUBTRACT(RANGE(-3,3), RANGE(-1,2)) = [-3,-2,2]

  • @param list1 A list.

  • @param list2 List against which each element of list1 will be checked.

  • @return my_intersect_list list1 with list2’s common values removed. */

function list_subtract(list1, list2) { // Quadratic list search without sorting. // Take each element of list1 and count how many times that // same element shows up in list2. FILTER(list1, (x) -> SIZE(FILTER(list2, (y) -> x = y)) = 0) }

LIST_REMOVE /**

  • Remove the elements of my_list at each of indices,

  • pulling subsequent elements up by one (-1) index for each

  • element above them removed.

  • Examples:

  • LIST_REMOVE(RANGE(-3,3), [1,3]) = [-3,-1,1,2]

  • @param my_list A list.

  • @param indices List of indices to remove.

  • @return my_remove_list my_list with the elements at indices removed. */

function list_remove(my_list, indices) { idx = RANGE(0,SIZE(my_list));

// Could use the REDUCE() for-loop if indices is sorted. include_indices = LIST_SUBTRACT(idx,indices);

LIST_SUB(my_list, include_indices) }

LIST_MULTIPLY /**

  • Elementwise list multiplication: list1 * list2.

  • Examples:

  • LIST_MULTIPLY([1,-1],[1,2]) = [1,-2]

  • @param list1 A list of numbers.

  • @param list2 A list of numbers.

  • @return list_out list1 * list2 */

function list_multiply(list1, list2) { MAP(RANGE(0,SIZE(list1)), (i) -> list1[i] * list2[i]) }

LIST_PLUS /**

  • Elementwise list addition/’plus’: list1 + list2.

  • Examples:

  • LIST_PLUS([1,-1],[1,1]) = [2,0]

  • @param list1 A list of numbers.

  • @param list2 A list of numbers.

  • @return list_out list1 + list2 */

function list_plus(list1, list2) { MAP(RANGE(0,SIZE(list1)), (i) -> list1[i] + list2[i]) }

BAYESIAN_BOOTSTRAP /**

  • Use MD5() to map the input string cluster_var into a uniform random number.

  • Transform that uniform random number into a random exponential to create a random

  • weight for the Bayesian bootstrap.

  • Using an identifier such as timeline.id approximates the statistical behavior

  • of using clustered standard errors or a block-bootstrap.

  • Examples:

  • SET_PROPERTY(timeline,’id’,’my_timeline_id’); // For a deterministic example.

  • // bootstrap_cluster_var is defined by "cluster_var_expression": "timeline.id"
    
  • cluster_var = timeline.id; bootstrap_id = 3;
    
  • // Succinct UDF:
    
  • BAYESIAN_BOOTSTRAP(cluster_var, 'bootstraps', bootstrap_id) = 0.4263795482392406
    
  • // Old approach:
    
  • // -1 * LOG(1 - MD5_MOD(CONCAT('string', bootstrap_id, cluster_var), 1000000) / 1000000 + 0.0000005, EXP(1))
    
  • @param cluster_var An input string, e.g., timeline.id.

  • @param seed String to prepend to cluster_var prior to hashing in order to customize

  • (synchronize or make independent) the randomization for the application/model/etc.

  • @param bootstrap_id Bootstrap run identifier. Typically 1 through the number of bootstraps.

  • @return weight_multiplier A single, MD5-hash-produced, and exponentially-distributed random number. */

function bayesian_bootstrap(cluster_var, seed, bootstrap_id) { // Hash string. md5_string = CONCAT(seed, bootstrap_id, cluster_var);

// 6 digits of precision for the uniform hash. uniform_random_hash = MD5_MOD(md5_string, 1000000) / 1000000;

// Input range to the exponential inverse CDF: 0.0000005 to 0.9999995, by 1e-6. weight_multiplier = INV_EXP(uniform_random_hash + 0.0000005); weight_multiplier }

WEAKLY_BETWEEN /**

  • If start <= x <= end, return 1. Otherwise return 0.

  • Examples:

  • start = 700; end = -700; // Days

  • reference_time = MILLIS(TO_DATETIME('2022-01-01 9:00:00.000'));
    
  • // Replace this expression...
    
  • IF(reference_time - end*24*3600*1000 &lt;= timestamp
    
  •    AND timestamp &lt;= reference_time - start*24*3600*1000, 1, 0)
    
  • =
    
  • // With this more readable expression using the UDF.
    
  • BETWEEN(reference_time - timestamp,
    
  •         start*24*3600*1000,
    
  •         end*24*3600*1000)
    
  • @param x Scalar.

  • @param start Scalar.

  • @param end Scalar.

  • @return is_x_weakly_between IF(start <= x <= end, 1, 0). */

function weakly_between(x, start, end) { IF(start <= x AND x <= end, 1, 0) }

ROUND_BASE /**

  • Round x to the decimal_place in base; return the result in base 10.

  • Examples:

  • ROUND_BASE(57,2,-1) = 58.0 // Round to the nearest 2.

  • ROUND_BASE(57,5,-1) = 55.0 // Round to the nearest 5.

  • ROUND_BASE(57,5,-2) = 50.0 // Round to the nearest 25.

  • ROUND_BASE(57.25,5,1) = 57.2 // Round to the nearest 1/5.

  • ROUND_BASE(100*PI(),10,-1) = 310.0

  • ROUND_BASE(100*PI(),10,3) = 314.159

  • @param x Scalar.

  • @param base Numeric base for rounding.

  • @param decimal_place Decimal place to round, e.g., -2 (100s), -1 (10s), 0 (1s), 1 (0.1s), 2 (0.01s),

  • where decimal place means base^(-1*decimal_place).

  • @return x_rounded x rounded to decimal_place in base, returned in base 10. */

function round_base(x, base, decimal_place) { round_units = POW(base, decimal_place); ROUND(x * round_units) / round_units }

KTH_INDEX /**

  • Return the k^th index of my_list. Designed to handle nulls and empty lists.

  • k^th index, k in [0, SIZE() - 1]

  • Same as just using k, but protects against empty lists/nulls/out-of-range indices.

  • Returns null because LIST_VALUE([], null) returns null, but LIST_VALUE([],0) does not

  • (bugfix in progress).

  • Examples:

  • KTH_INDEX([-2,0,3,10,20],0) = 0

  • KTH_INDEX([-2,0,3,10,20],3) = 3

  • KTH_INDEX([],0) = null

  • KTH_INDEX(null,0) = null

  • @param my_list A list.

  • @param k The index to confirm.

  • @return kth_index The index of the k^th element, if any. */

function kth_index(my_list, k) { list_size = SIZE(my_list); IF(0 <= k AND k < list_size, k, null) // Start at 0 for the first element. }

FIRST_INDEX /**

  • Return the first index of my_list. Designed to handle nulls and empty lists.

  • Examples:

  • FIRST_INDEX([-2,0,3,10,20]) = 0

  • FIRST_INDEX([]) = null

  • FIRST_INDEX(null) = null

  • @param my_list A list.

  • @return first_index The index of the first element, if any. */

function first_index(my_list) { // Maybe there could be a special notation for last? (first=’min’, last=’max’?) // E.g., \({@my_list[0]} = my_list[0] // If SIZE(my_list) &gt; 0. // \){@my_list[‘min’]} = my_list[FIRST_INDEX(my_list)]. // ${@my_list[‘max’]} = my_list[LAST_INDEX(my_list)]. KTH_INDEX(my_list, 0) }

ALL /**

  • Return TRUE if all elements of my_list are truthy.

  • Examples:

  • ALL([0,10,2,-4.3]) = FALSE

  • ALL([1,3,4,5,2,-4]) = TRUE

  • ALL([0]) = FALSE

  • ALL([‘tRuE’]) = TRUE

  • ALL([‘hi’]) = FALSE // All other strings are false.

  • ALL([null]) = FALSE

  • ALL([]) = TRUE // Vacuously true.

  • ALL(null) = null

  • @param my_list List, e.g. [0,10,2,-4.3].

  • @return all_true Boolean */

function all(my_list) { all_true = True; out = MAP(my_list, (x) -> all_true = all_true AND x; all_true); // Always return (vacuously) True for empty lists, []. IF(SIZE(my_list)=0, True, out[-1]) }

LN /**

  • Return the natural log, log base e of input x. Alias for NATLOG().

  • Examples:

  • LN(1) = 0

  • LN(EXP(1)) = 1

  • @param x Numeric scalar in (0, infinity).

  • @return ln_x Numeric scalar. */

function ln(x) { natLog(x) }

EVENTS /**

  • Return a list of all events of a given event_type with no time filtering.

  • See other UDFs like WINDOW_EVENTS() for time-based filtering.

  • Examples:

  • EVENTS(‘request’) != [] // There is at least one event of type ‘request’.

  • EVENTS(‘INVALID_TYPE’) = [] // There are none of type ‘INVALID_TYPE’.

  • @param event_type String denoting the event type, e.g., ‘request’.

  • @return events_of_type A list of events of type = event_type. */

function events(event_type) { FILTER(timeline.events, (e) -> e.type=event_type) }

STR_WHICH /**

  • Return a list of indices of my_list elements that are not equal to the empty

  • string ‘’.

  • Examples:

  • STR_WHICH([‘’,’hi’,’’,’bye’]) = [1,3]

  • STR_WHICH([0,’hi’,FALSE,’bye’,null,-1,’’,3]) = [1,3,5,7]

  • @param my_list List of strings, e.g. [‘’,’hi’,’’,’bye’].

  • @return indices List of indices of non-empty-string elements, e.g., [1,3]. */

function str_which(string_list) { WHICH(MAP(string_list, (x) -> IF(x!=’’ AND x!=FALSE AND x!=0 AND x!=null, TRUE, FALSE))) }

ROUND_DIGIT /**

  • Round x to the base_10_decimal_place.

  • Examples:

  • ROUND_DIGIT(100*PI(),-1) = 310.0

  • ROUND_DIGIT(100*PI(),3) = 314.159

  • @param x Scalar.

  • @param base_10_decimal_place Decimal place to round, e.g., -2 (100s), -1 (10s), 0 (1s), 1 (0.1s), 2 (0.01s).

  • @return x_rounded x rounded to base_10_decimal_place. */

function round_digit(x, base_10_decimal_place) { round_units = POW(10, base_10_decimal_place); ROUND(x * round_units) / round_units }

REPLACE /**

  • Rename REGEX_REPLACE() as just REPLACE().

  • @param my_string String.

  • @param replacements List of replacement string pairs, alternating between regular expressions and their

  • substitutions, e.g., [’ +’,’space’,’[0-9]’,’digit’], applied first to last.

  • @return y String with subtitutions made. */

function replace(my_string, replacements) { REGEX_REPLACE(my_string, replacements) }

LIST_RANGE /**

  • Subset my_list to a range of indices from start_index to end_index,

  • excluding end_index.

  • Uses RANGE() function, so the indices are Pythonic: 0, …, SIZE() - 1.

  • Recycles based on the size of the list using the modulus % operator. Consider using

  • my_list[RANGE(start,end)] for subsetting without recycling.

  • Examples:

  • LIST_RANGE([‘a’,’b’,’c’,’d’],2,4) = [‘c’,’d’]

  • LIST_RANGE(RANGE(-2,2),1,3) = [-1,0]

  • LIST_RANGE(RANGE(0,10),3,7) = [3,4,5,6]

  • LIST_RANGE(RANGE(0,10),-7,-3) = [3,4,5,6]

  • LIST_RANGE(RANGE(0,10),7,3) = []

  • LIST_RANGE(RANGE(0,10),3,17) = [3,4,5,6,7,8,9,0,1,2,3,4,5,6]

  • // LIST_SUB() allows negative indices but does not recycle.

  • LIST_SUB([‘a’,’b’],RANGE(-4,4)) = [null, null, ‘a’, ‘b’, ‘a’, ‘b’, null, null]

  • // LIST_RANGE() allows negative indices and recycles.

  • LIST_RANGE([‘a’,’b’],-4,4) = [‘a’, ‘b’, ‘a’, ‘b’, ‘a’, ‘b’, ‘a’, ‘b’]

  • LIST_RANGE([‘a’,’b’,’c’,’d’],2,’4’) = [‘c’,’d’]

  • LIST_RANGE([‘a’,’b’,’c’,’d’],’c’,4) = null

  • LIST_RANGE([‘a’,’b’,’c’,’d’],null,null) = []

  • LIST_RANGE([‘a’,’b’,’c’,’d’],null,4) = []

  • LIST_RANGE([‘a’,’b’,’c’,’d’],2,null) = []

  • LIST_RANGE([],2,4) = [null, null] // TODO: null or []. Is this desirable?

  • LIST_RANGE(null,2,4) = null // TODO: null or []. Is this desirable?

  • LIST_RANGE([null],2,4) = [null, null] // Recycling.

  • LIST_RANGE([null],0,1) = [null]

  • @param my_list A list.

  • @param start_index Integer.

  • @param end_index Integer. Should be > start_index.

  • @return list_subset List with only elements in indices. */

function list_range(my_list, start, end) { // Use RANGE() then mod % to cleanly apply recycling. indices = COALESCE(RANGE(start, end),[]); // Invalid inputs mean empty indices. list_size = SIZE(my_list); recycled_indices = MAP(indices, (x) -> x % list_size);

// If the original inputs should return an empty list, create an empty list with RANGE(). // Otherwise, return the subsetted list. IF(start >= end AND start!=null and end!=null, RANGE(start, end), LIST_SUB(my_list, recycled_indices)) }

// function list_range_old(my_list, start_index, end_index) { // list_size = SIZE(my_list); // start = start_index % list_size; // end = (end_index - 1) % list_size; // - 1 because the end index is exclusive. // indices = RANGE(start, end + 1); // + 1 because the end index is exclusive. // // // If the original inputs should return an empty list, create an empty list with RANGE(). // // Otherwise, return the subsetted list. // IF(start_index > end_index, // RANGE(start_index, end_index), // LIST_SUB(my_list, indices)) // }

LOG_ROUND_BASE /**

  • Round the logarithm of x in base to the decimal_place; return the rounded,

  • base-exponentiated result in base 10:

  • base^ROUND_BASE(log(x,base),base,decimal_place).

  • This is especially useful for creating steps that are logarithmic and exponential while

  • preserving the original units of highly skewed data.

  • Examples:

  • // Convert 9 to base 2, round to -1 (‘twos’) digit, convert back to base 10.

  • // 9, round to the nearest ‘twos’ power of 2 –> 9 = 2^3.x –> 4 –> 2^4 –> 16.

  • LOG_ROUND_BASE(9,2,-1) = 16

  • MAP([5, 9, 27, 62], (x) -> TO_INT(LOG_ROUND_BASE(x,2,-1))) = [4, 16, 16, 64]

  • LOG_ROUND_BASE(120,10,0) = POW(10,2) // (rounding the exponent 2.1 –> 2)

  • LOG_ROUND_BASE(1000000,10,-1) = POW(10,10) // (rounding the exponent 6 –> 10)

  • MAP(RANGE(1,9), (x) -> ROUND10(LOG_ROUND_BASE(x,2,1),3))

  • = [1.0, 2.0, 2.828, 4.0, 5.657, 5.657, 8.0, 8.0]
    
  • @param x Scalar.

  • @param base Numeric base for rounding.

  • @param decimal_place Decimal place to round, e.g., -2 (100s), -1 (10s), 0 (1s), 1 (0.1s), 2 (0.01s),

  • where decimal place means base^(-1*decimal_place).

  • @return x_rounded x rounded to decimal_place in base, returned in base 10. */

function log_round_base(x, base, decimal_place) { POW(base, ROUND_BASE(LOG(x,base), base, decimal_place)) }

HISTOGRAM /**

  • Compute a histogram bin for my_list: count the number of elements

  • of my_list between two elements of knots. index and

  • cumulative_counts determine which two knots to use. return_counts

  • determines whether to return counts or a rate = counts / width between

  • the two knots.

  • Examples:

  • my_list = [1,2,3,4,7,10,12]; // Numerical list.

  • index = 0; // Do loops from 0 to SIZE(knots) - 2 (Which bin to count?)
    
  • knots = [0, 7];
    
  • cumulative_counts = FALSE; // Count always from knot[0] or knot[index] to knot[index+1]?
    
  • return_counts = FALSE; // Counts or time rates (divide by duration)?
    
  • HISTOGRAM(my_list, index, knots, cumulative_counts, return_counts) = 4/7
    
  • my_list = [1,2,3,4,7,10,12]; // Numerical list.

  • index = 1; // Do loops from 0 to SIZE(knots) - 2 (Which bin to count?)
    
  • knots = [0, 7, 14];
    
  • cumulative_counts = TRUE; // Count always from knot[0] or knot[index] to knot[index+1]?
    
  • return_counts = FALSE; // Counts or time rates (divide by duration)?
    
  • HISTOGRAM(my_list, index, knots, cumulative_counts, return_counts) = 7/14
    
  • @param my_list A numeric list.

  • @param index Knot index/bin to be evaluated, e.g., from 0 to SIZE(knots) - 2.

  • @param knots List of knots to be considered, e.g., [0, 7].

  • @param cumulative_counts Count from knot[0] (TRUE) or knot[index] (FALSE) to knot[index+1]?

  • @param return_counts Counts (TRUE) or time rates / divide by duration (FALSE)?

  • @return y Scalar count or rate, depending on counts. */

function histogram(my_list, index, knots, cumulative_counts, return_counts) { // my_list = [1,2,3,4,10,12]; // Numerical list. // index = 0; // Do loops from 0 to SIZE(knots) - 2 (Which bin to count?) // knots = [0, 7]; // In days. // cumulative_counts = FALSE; // Count always from knot[0] or knot[index] to knot[index+1]? // return_counts = FALSE; // Counts or time rates (divide by duration)?

window_start = IF(cumulative_counts, knots[0], knots[index]); window_end = knots[index + 1];

// ‘Duration’ is just the difference between knots. window_duration = window_end - window_start;

count_between_knots = HISTOGRAM_BIN(my_list, window_start, window_end);

elements_per_duration = count_between_knots / window_duration;

// Return the counts or rate of events per day. IF(return_counts, count_between_knots, elements_per_duration) }

LIST_APPEND_VALUE /**

  • Append value to my_list (at the end).

  • Examples:

  • LIST_APPEND_VALUE(RANGE(-3,3), 100) = [-3,-2,-1,0,1,2,100]

  • @param my_list A list.

  • @param value Value to be appended to my_list.

  • @return my_longer_list my_list with value appended. */

function list_append_value(my_list, value) { LIST_INSERT_VALUE(my_list, value, SIZE(my_list)) }

LIST_INTERSECT /**

  • Intersect list1 and list2 by retaining all elements of

  • list1 having values in list2.

  • Examples:

  • LIST_INTERSECT(RANGE(-3,3), RANGE(-1,2)) = [-1,0,1]

  • @param list1 A list.

  • @param list2 List against which each element of list1 will be checked.

  • @return my_intersect_list list1 with list2’s common values retained. */

function list_intersect(list1, list2) { // Quadratic list search without sorting. // Take each element of list1 and count how many times that // same element shows up in list2. FILTER(list1, (x) -> SIZE(FILTER(list2, (y) -> x = y)) > 0) }

LIST_DIVIDE /**

  • Elementwise list division: list1 / list2.

  • Examples:

  • LIST_DIVIDE([1,-1],[1,2]) = [1,-1/2]

  • @param list1 A list of numbers.

  • @param list2 A list of numbers.

  • @return list_out list1 / list2 */

function list_divide(list1, list2) { MAP(RANGE(0,SIZE(list1)), (i) -> list1[i] / list2[i]) }

SWITCH /**

  • Obtain the index for which x = cases[index].

  • Return the value of outputs[index].

  • Examples:

  • SWITCH(3, [1,2,3], [‘a’,’b’,’c’]) = ‘c’

  • @param x A value to compare with the list of cases.

  • @param cases A list of values to compare to x.

  • @param outputs A list of outputs.

  • @return x_output The output value for x’s case. */

function switch(x, cases, outputs) { // Cases list should match the type of x. // Outputs list should have a common output type. // Defaults to the last value of outputs.

// ENUM_BUCKET() in a UDF causes this the fail (bugfix, // similar to MAP_TO_LIST() in progress). This logic is // valid and functional, though. // outputs[ENUM_BUCKET(x, cases, SIZE(outputs) - 1)]

// Probably slower. idx = RANGE(0,SIZE(cases));

// Using WHICH(). // matched_index = COALESCE(WHICH(MAP(cases, // (c) -> x=c))[0], // SIZE(outputs) - 1);

matched_index = COALESCE(FILTER(idx, (i) -> x=cases[i])[0], SIZE(outputs) - 1); outputs[matched_index] }

RANGE_N /**

  • Return a list of n evenly numbers from start to end (inclusive).

  • Examples:

  • RANGE_N(-2,8,6) = [-2,0,2,4,6,8]

  • @param start Numeric scalar, e.g., -2.

  • @param end Numeric scalar, e.g., 8.

  • @param n Number of evenly spaced numbers to return.

  • @return range_list Numeric list, e.g., [-2,0,2,4,6,8]. */

function range_n(start, end, n) { idx = RANGE(0,n); step = (end - start) / (n - 1); // n-1 spaces between [start, end]. range_list = MAP(idx, (x) -> x * step + start); range_list }

AVG_NS /**

  • Return the mean of my_list, ignoring nulls.

  • Examples:

  • AVG_NS([null,null,2,2,2,4,4,4]) = 3

  • AVG_NS(null) = null

  • TRUE // TODO: null. AVG_NS([null]) = NaN // TODO: null. Divide by zero?

  • AVG_NS([‘’]) = null

  • AVG_NS([‘c’]) = null

  • AVG_NS([‘c’,1]) = null // TODO: 1. Ignore strings.

  • TRUE // TODO: NaNs. AVG_NS([]) = NaN // TODO: null. Divide by zero?

  • @param my_list A list, e.g., [null,null,2,2,2,4,4,4].

  • @return mean The mean of non-null elements of my_list. */

function avg_ns(my_list) { AVG(DROP_NULLS(my_list)) }

AG_MAX_NS /**

  • Return the maximum value of my_list, ignoring nulls.

  • Examples:

  • AG_MAX_NS([null,null,2,2,2,4,4,4]) = 4

  • AG_MAX_NS(null) = null

  • AG_MAX_NS([null]) = null

  • AG_MAX_NS([‘’]) = null

  • AG_MAX_NS([‘c’]) = null

  • AG_MAX_NS([‘c’,1]) = 1

  • AG_MAX_NS([]) = null

  • @param my_list A list, e.g., [null,null,2,2,2,4,4,4].

  • @return max The maximum value of non-null elements of my_list. */

function ag_max_ns(my_list) { AG_MAX(DROP_NULLS(my_list)) }

MODE_IMPUTE /**

  • Replace all null values of my_list with the mode of the

  • non-null elements.

  • Examples:

  • MODE_IMPUTE([null,null,2,2,4]) = [2,2,2,2,4]

  • @param my_list A list, e.g., [null,null,2,2,4].

  • @return my_imputed_list A list with nulls imputed using the mode. */

function mode_impute(my_list) { mode = MODE_NS(my_list)[0]; // Extract the first mode, in case of ties. MAP(my_list, (x) -> COALESCE(x,mode)) }

MAP_APPEND /**

  • Append a key:value pair to my_map.

  • Note: Keys and values cannot have commas ‘,’, equals ‘=’, or braces ‘{’ and ‘}’ due to

  • JSON parsing.

  • Examples:

  • MAP_APPEND({‘v’:1},’v0’,1) = {‘v0’:1,’v’:1}

  • MAP_APPEND(null,’v’,1) = {‘v’:1}

  • MAP_APPEND({},’v’,1) = {‘v’:1}

  • my_map = JSON_TO_MAP(‘{“key1”:”value1”,”key2”:”value2”}’);

  • MAP_APPEND(my_map, 'key3', 123.0)
    
  •   = {'key1':'value1', 'key2':'value2', 'key3':123.0}
    
  • @param my_map A map, e.g., {cat=1, num=2}.

  • @return my_appended_map my_map with the key:value pair appended. */

function map_append(my_map, key, value) { MAP_UNION(my_map, {key:value}) }

KTH_LAST_INDEX /**

  • Return the k^th-from-last index of my_list. Designed to handle nulls and empty

  • lists.

  • k^th index, k in [0, SIZE() - 1]

  • Same as just using k, but protects against empty lists/nulls/out-of-range indices.

  • Returns null because LIST_VALUE([], null) returns null, but LIST_VALUE([],0) does not

  • (bugfix in progress).

  • Examples:

  • KTH_LAST_INDEX([-2,0,3,10,20],3) = 1

  • KTH_LAST_INDEX([],0) = null

  • KTH_LAST_INDEX(null,0) = null

  • @param my_list A list.

  • @param k The index to confirm.

  • @return kth_last_index The index of the k^th element, if any. */

function kth_last_index(my_list, k) { // k^th from last index, k in [0, SIZE() - 1] list_size = SIZE(my_list); index = list_size - 1 - k; // Start at 0 for the last element.

// Returns null because LIST_VALUE([], null) returns null, but LIST_VALUE([],0) does not (bugfix in progress). IF(0 <= index AND index < list_size, index, null) }

LAST_INDEX /**

  • Return the last index of my_list. Designed to handle nulls and empty lists.

  • Examples:

  • LAST_INDEX([-2,0,3,10,20]) = 4

  • LAST_INDEX([]) = null

  • LAST_INDEX(null) = null

  • @param my_list A list.

  • @return last_index The index of the last element, if any. */

function last_index(my_list) { // Maybe there could be a special notation for last? (first=’min’, last=’max’?) // E.g., \({@my_list[0]} = my_list[0] // If SIZE(my_list) &gt; 0. // \){@my_list[‘min’]} = my_list[FIRST_INDEX(my_list)]. // ${@my_list[‘max’]} = my_list[LAST_INDEX(my_list)]. KTH_LAST_INDEX(my_list, 0) }

PERCENTILE /**

  • Retrieve the quantile value for which p proportion of my_list

  • is smaller / ranked / sorted lower.

  • Examples:

  • PERCENTILE(RANGE(90,101), 50) = 95 // Median between 90 and 100.

  • median = PERCENTILE(RANDOM_LIST(500), 50); BETWEEN(median,0.4,0.6) = 1

  • @param my_list A list.

  • @param percent Number between 0 and 100.

  • @return percentile The value corresponding to the percent-ile of my_list. */

function percentile(my_list, percent) { // my_list : Numerical list. // percent : Number between 0 and 100. QUANTILE(my_list, percent / 100) }

INT /**

  • Rename to_int/int32() and strip out any leading zeros because leading zeros in strings

  • are treated as octal: TO_INT(‘08’) is invalid, TO_INT(‘010’) = 8 base 10. This behavior

  • is an artifact of Java’s casting of strings to integers.

  • Examples::

  • INT(0) = 0

  • INT(‘0’) = 0

  • INT(‘000’) = 0

  • INT(‘08’) = 8

  • INT(‘010’) = 10

  • INT(‘027’) = 27

  • INT(‘0127’) = 127

  • INT(‘0-1’) = -1

  • INT(‘00-001’) = -1 // Strip leading zeros, TO_INT(‘-001’) = -1.

  • @param x Scalar or string.

  • @return y Integer. */

function int(x) { // Strip out leading zeros to avoid any octal issues. IF(TO_INT(x)!=0, TO_INT(REGEX_REPLACE(TO_STRING(x),[‘^0+’,’’])), 0) }

LIST_REPLACE_VALUE /**

  • Replace the value of my_list at index.

  • Examples:

  • LIST_REPLACE_VALUE(RANGE(-3,3), 100, 3) = [-3,-2,-1,100,1,2]

  • LIST_REPLACE_VALUE([], null, 0) = []

  • LIST_REPLACE_VALUE([], ‘’, 0) = []

  • LIST_REPLACE_VALUE([null], ‘’, 0) = [‘’]

  • LIST_REPLACE_VALUE([‘’], null, 0) = [null]

  • LIST_REPLACE_VALUE([‘’], ‘’, 0) = [‘’]

  • LIST_REPLACE_VALUE([0,1], null, 0) = [null, 1]

  • LIST_REPLACE_VALUE([], ‘a’, 0) = []

  • LIST_REPLACE_VALUE([], [], 0) = []

  • @param my_list A list.

  • @param value Replacement value.

  • @param index Index that value will replace.

  • @return my_replace_list my_list with the value of at index replaced by value. */

function list_replace_value(my_list, value, index) { list_size = SIZE(my_list); indices = RANGE(0,list_size); MAP(indices, (i) -> IF(i = index, value, my_list[i])) }

IS_NULL /**

  • Test whether x is null. (Warning: TO BE DEPRECATED. Use =null instead.)

  • Examples:

  • IS_NULL(null) = TRUE

  • IS_NULL(2) = FALSE

  • IS_NULL(‘hi’) = FALSE

  • IS_NULL([null,null,2,2,4,4]) = FALSE

  • @param x An object.

  • @return is_null A boolean denoting whether x is null. */

function IS_NULL(x) { // Warning: TO BE DEPRECATED. Use =null instead. x = null }

NEXT /**

  • Return the next event with the same type as my_event.

  • Examples:

  • NEXT(self) != self // The next event is not the current event.

  • NEXT(timeline.events[0]) = NEXT_EVENT_(timeline.events[0].timestamp, timeline.events[0].type)

  • @param my_event An event from timeline.events or the current in-focus event, event / self.

  • @return next_event The next event of the same type as my_event. */

function next(my_event) { NEXT_EVENT_(my_event.timestamp, my_event.type) }

// Future version. // function next(my_event) { // idx = FILTER(RANGE(my_event.index + 1, SIZE(timeline.events)), // (i) -> timeline.events[i].type = my_event.type, // (n,x) -> n > 1)[0] // timeline.events[idx] // }

MOST_RECENT_SORTED /**

  • Find and return the most recent event of a given type, assuming that

  • timeline.events is sorted by each event’s timestamp.

  • Examples:

  • // No recent records from the first record.

  • MOST_RECENT_SORTED(timestamp, ‘request’) = null

  • // First record is the most recent from itself + 1 millisecond.

  • MOST_RECENT_SORTED(timestamp+1, type) = ${@timeline.events[0]}

  • @param current_time Timestamp in milliseconds.

  • @param filter_type String denoting the event type, e.g., ‘request’.

  • @return event_list_including_ties A single event. */

function most_recent_sorted(current_time, filter_type) { time_filtered_events = FILTER(timeline.events, (x) -> x.timestamp < current_time AND x.type=filter_type, (n,x) -> x.timestamp >= current_time);

// Return the last event in the list, since events are sorted by timestamp. No tie breaker. last_event = time_filtered_events[-1]; last_event }

BUILT_IN_TESTS /**

  • A list of test cases for UDF built-in functions dependencies. Add to the

  • list of examples in the documentation below.

  • Examples:

  • // Add UDF-related built-in test cases here.

  • BUILT_IN_TESTS()

  • current = timestamp; SET_PROPERTY(event, ‘timestamp’, 0); current != 0

  • current = timestamp; SET_PROPERTY(event, ‘timestamp’, 0); event.timestamp = 0

  • current = timestamp; SET_PROPERTY(event, ‘timestamp’, null); current != null

  • current = timestamp; SET_PROPERTY(event, ‘timestamp’, null); event.timestamp != null // TODO: Should = null

  • SUM(null) = null

  • SUM([null]) = 0

  • // TODO: FAILS: SUM([‘’]) // TODO: 0. Non-numeric strings should be ignored.

  • // TODO: FAILS: SUM([‘c’]) // TODO: 0. Non-numeric strings should be ignored.

  • // TODO: FAILS: SUM([‘c’,1]) // TODO: 1. Non-numeric strings should be ignored.

  • SUM([]) = 0

  • SUM([‘2’]) = 2

  • AVG(null) = null

  • // AVG([null]) = NaN // TODO: null. Divide by zero? = TRUE

  • // TODO: FAILS: AVG([‘’]) // TODO: 0. Non-numeric strings should be ignored.

  • // TODO: FAILS: AVG([‘c’]) // TODO: 0. Non-numeric strings should be ignored.

  • // TODO: FAILS: AVG([‘c’,1]) // TODO: 1. Non-numeric strings should be ignored.

  • // TODO: NaNs. AVG([]) = NaN // TODO: null. Divide by zero? = TRUE

  • AVG([‘2’]) = 2

  • AG_MIN(null) = null

  • AG_MIN([null]) = null

  • AG_MIN([‘’]) = null

  • AG_MIN([‘c’]) = null

  • AG_MIN([‘c’,1]) = 1 // Ignores strings.

  • AG_MIN([]) = null

  • AG_MAX(null) = null

  • AG_MAX([null]) = null

  • AG_MAX([‘’]) = null

  • AG_MAX([‘c’]) = null

  • AG_MAX([‘c’,1]) = 1 // Ignores strings.

  • AG_MAX([]) = null

  • COUNT(null) = null

  • COUNT([null]) = 1

  • COUNT([‘’]) = 1

  • COUNT([‘c’]) = 1

  • COUNT([‘c’,1]) = 2

  • COUNT([]) = 0

  • COALESCE(null,’coalesced’) = ‘coalesced’

  • COALESCE([null],’coalesced’) = [null]

  • COALESCE([‘’],’coalesced’) = [‘’]

  • COALESCE([],’coalesced’) = []

  • COALESCE(null,’coalesced’) = ‘coalesced’

  • COALESCE(1,’coalesced’) = 1

  • COALESCE(AVG([]),’coalesced’) = ‘coalesced’ // NaN coalesces.

  • // SIZE(1) // Invalid input: Error.

  • // SIZE(‘c’) // Invalid input: Error.

  • SIZE(null) = null

  • SIZE([]) = 0

  • SIZE([null]) = 1

  • SIZE([null,1]) = 2

  • SIZE([‘’]) = 1

  • SIZE([‘c’]) = 1

  • SIZE([‘c’,1]) = 2

  • function test() { IFRETURN(TRUE, ‘early’); ‘late’} test() = ‘early’

  • function test() { IFRETURN(FALSE, ‘early’); ‘late’} test() = ‘late’

  • function test() { IFRETURN(‘tRuE’, ‘early’); ‘late’} test() = ‘early’

  • function test() { IFRETURN(0, ‘early’); ‘late’} test() = ‘late’

  • function test() { IFRETURN(1, ‘early’); ‘late’} test() = ‘early’

  • function test() { IFRETURN(‘c’, ‘early’); ‘late’} test() = ‘late’

  • function test() { IFRETURN(null, ‘early’); ‘late’} test() = ‘late’

  • function test() { IFRETURN([], ‘early’); ‘late’} test() = ‘late’

  • function test() { IFRETURN([null], ‘early’); ‘late’} test() = ‘early’

  • function test() { IFRETURN([‘’], ‘early’); ‘late’} test() = ‘early’

  • function test() { IFRETURN([‘c’], ‘early’); ‘late’} test() = ‘early’

  • function test() { IFRETURN([‘c’,1], ‘early’); ‘late’} test() = ‘early’

  • function test() { IFRETURN([null], ‘early’); ‘late’} test() = ‘early’

  • function test() { IFRETURN([null,1], ‘early’); ‘late’} test() = ‘early’

  • SIZE(PARSE_KERNELS(‘0.5m,4h,1d,3d’)) = 4

  • SIZE(PARSE_KERNELS(‘4h, 1d’)) = 2

  • PARSE_KERNELS(‘1d’)[0].distribution = null

  • PARSE_KERNELS(‘1d-e’)[0].distribution != null // EXP is a distribution, not a string.

  • PARSE_KERNELS(‘1d,1d-e’) != null

  • PARSE_KERNELS(‘0.5m-u’) != null

  • PARSE_KERNELS(‘0.5m-U’) != null

  • TRUE // PARSE_KERNELS(‘0.5m-X’) = null // Returns failure-to-parse distribution ‘x’ warning.

  • FLOOR(null) = null

  • FLOOR(-1) = -1

  • FLOOR(-1.0) = -1

  • FLOOR(-1.01) = -2

  • TRUE // Fails: FLOOR(‘c’) // Warning: Null pointer exception.

  • TRUE // Fails: FLOOR([]) // Warning: Null pointer exception.

  • TRUE // Fails: FLOOR([1.1]) // Warning: Null pointer exception.

  • SET_PROPERTY(event, ‘timestamp’, 0); event.timestamp = 0 // Replaces with 0.

  • x = event.timestamp; SET_PROPERTY(event, ‘timestamp’, null); event.timestamp = x // Does nothing.

  • // Standard 1-parameter: categorical.

  • kernels=’1d’; TO_STRING(SUM_OPPORTUNITIES(‘c’)[0]) = ‘c-1d-e:1’

  • kernels=’1d’; TO_STRING(SUM_OPPORTUNITIES(‘c’)[0]) = ‘c-1d-e:1’

  • kernels=’1d’; TO_STRING(SUM_OPPORTUNITIES(‘c’)) = ‘[c-1d-e:1]’

  • kernels=’1d’; TO_STRING(SUM_OPPORTUNITIES(3)) = ‘[3-1d-e:1]’

  • kernels=’1d’; TO_STRING(SUM_OPPORTUNITIES(‘3’)) = ‘[3-1d-e:1]’

  • kernels=’1d’; TO_STRING(SUM_OPPORTUNITIES(3.0)) = ‘[3.0-1d-e:1]’

  • kernels=’1d’; TO_STRING(SUM_OPPORTUNITIES(TO_STRING(3.0))) = ‘[3-1d-e:1]’

  • kernels=’1d’; TO_STRING(SUM_OPPORTUNITIES(‘:-^%&’)) = ‘[:-^%&-1d-e:1]’

  • kernels=’1d’; TO_STRING(SUM_OPPORTUNITIES(‘::’)) = ‘[::-1d-e:1]’

  • // Standard 2-parameter: categorical and numerical inputs.

  • kernels=’1d’; TO_STRING(SUM_OPPORTUNITIES(‘c’,0)) = ‘[]’

  • kernels=’1d’; TO_STRING(SUM_OPPORTUNITIES(‘c’,1)) = ‘[c-1d-e:1]’

  • kernels=’1d’; TO_STRING(SUM_OPPORTUNITIES(‘c’,-2)) = ‘[c-1d-e:-2]’

  • // Manipulating the ‘kernels’ background variable.

  • kernels=’1d-t’; TO_STRING(SUM_OPPORTUNITIES(‘c’)) = ‘[c-1d-t:1]’

  • // kernels=’1d-x’; TO_STRING(SUM_OPPORTUNITIES(‘c’)) = null // Warning: Null pointer exception: ‘x’ is invalid

  • kernels=’1d-t,1d-u’; TO_STRING(SUM_OPPORTUNITIES(‘c’)) = ‘[c-1d-t:1, c-1d-u:1]’

  • // Edge cases with invalid and/or unexpected inputs.

  • kernels=’1d’; TO_STRING(SUM_OPPORTUNITIES(null)) = null

  • kernels=’1d’; TO_STRING(SUM_OPPORTUNITIES(‘’)) = null

  • kernels=’1d’; TO_STRING(SUM_OPPORTUNITIES([null])) = ‘[[null]-1d-e:1]’

  • kernels=’1d’; TO_STRING(SUM_OPPORTUNITIES([‘’])) = ‘[[]-1d-e:1]’

  • kernels=’1d’; TO_STRING(SUM_OPPORTUNITIES([‘c’,1])) = ‘[[c, 1]-1d-e:1]’

  • kernels=’1d’; TO_STRING(SUM_OPPORTUNITIES([])) = ‘[[]-1d-e:1]’

  • // kernels=’1d’; TO_STRING(SUM_OPPORTUNITIES(‘c’,null)) = null // Warning: Null pointer exception.

  • // kernels=’1d’; TO_STRING(SUM_OPPORTUNITIES(‘c’,’d’)) = null // Warning: Null pointer exception.

  • // kernels=’1d’; TO_STRING(SUM_OPPORTUNITIES(‘c’,[])) = null // Warning: Null pointer exception.

  • TRUE

  • // List subsetting and reversal.

  • [0,1,2][0] = 0

  • [0,1,2][[0]][[0]][0] = 0

  • // RANGE(0,3)[[0]][[0]][0] = 0 // Fails

  • // timeline.events[[0]][[0]][0] = timeline.events[0] // Fails

  • RANGE(0,3)[0] = 0

  • [0,10,20][[0,1,2]][0] = 0

  • [0,10,20][RANGE(0,3)][0] = 0

  • // RANGE(0,3)[[0,1,2]][0] = 0 // Fails

  • // RANGE(0,3)[RANGE(0,3)][0] = 0 // Fails

  • // timeline.events[[0,1,2]][0] = timeline.events[0] // Fails

  • [timeline.events[0]][0] = timeline.events[0]

  • LIST_REVERSE(RANGE(0,5)) = [4,3,2,1,0]

  • FILTER(LIST_REVERSE(RANGE(0,5)), (x) -> TRUE) = [4,3,2,1,0]

  • MAP(LIST_REVERSE(RANGE(0,5)), (x) -> x) = [4,3,2,1,0]

  • my_list = [0,1,2,3,4]; LIST_REVERSE(my_list)[my_list] = [4,3,2,1,0]

  • // LIST_REVERSE(timeline.events[RANGE(0,5)])[0] = timeline.events[4] // FAILS

  • // timeline.events[RANGE(0,5)][0] = timeline.events[0] // FAILS

  • // my_list = [0,1,2,3,4]; timeline.events[my_list][0] = timeline.events[0] // FAILS

  • // my_list = [0,1,2,3,4]; LIST_REVERSE(timeline.events)[my_list][0] = timeline.events[4] // FAILS

  • FILTER(LIST_REVERSE(timeline.events), (x) -> FALSE) = []

  • @return TRUE Returns TRUE */

function BUILT_IN_TESTS() { TRUE }

MEAN_IMPUTE /**

  • Replace all null values of my_list with the mean of the

  • non-null elements.

  • Examples:

  • MEAN_IMPUTE([null,null,2,4]) = [3,3,2,4]

  • @param my_list A list, e.g., [null,null,2,4].

  • @return my_imputed_list A list with nulls imputed using the mean. */

function mean_impute(my_list) { mean = AVG_NS(my_list); MAP(my_list, (x) -> COALESCE(x,mean)) }

BIN /**

  • Scale x into integers by dividing by width. To adjust location, do

  • pre-/post-adjustments: e.g., BIN() + location.

  • Examples:

  • SET_PROPERTY(‘timestamp’,1587772800000);

  • BIN(timestamp, 7*24*3600*1000) = 2625  // Weeks.
    
  • SET_PROPERTY(‘timestamp’,1587772800000);

  • BIN(timestamp, 24*3600*1000) = 18377  // Days.
    
  • SET_PROPERTY(‘timestamp’,1587772800000);

  • BIN(timestamp, 6*3600*1000) = 73508  // 6-hour blocks.
    
  • SET_PROPERTY(‘timestamp’,1587772800000);

  • BIN(timestamp, 3600*1000) = 441048  // Hours.
    
  • @param x Scalar.

  • @param width Scalar.

  • @return x_bin The integer bin that x falls into: FLOOR(x/width). */

function bin(x, width) { FLOOR(x / width) }

LIST_HISTOGRAM /**

  • Apply WHICH_BIN() to a list and use COUNT_UNIQ() to

  • tabulate the results into a histogram map.

  • Examples:

  • LIST_HISTOGRAM([1,1,2,4],[-2,0,3,10,40]) = {‘0’:3, ‘3’:1, ‘-2’:0, ‘10’:0, ‘40’:0}

  • rng = RNG(‘test’);

  • my_list = MAP(RANGE(0,100), (x) -&gt; rng.next);  // Numerical
    
  • bin_edges = RANGE_STEP(0,1.001,0.1);
    
  • LIST_HISTOGRAM(my_list,bin_edges)
    
  •   = {'0':5, '0.1':9, '0.2':13, '0.30000000000000004':12, '0.4':7, '0.5':12,
    
  •      '0.6000000000000001':9, '0.7000000000000001':13, '0.8':6, '0.9':14, '1':0}
    
  • @param my_list A list, e.g., [1,1,2,4].

  • @param bin_edges A pre-sorted (increasing) numeric list, e.g., [-2,0,3,10,40].

  • @return my_histogram A map of {bin_edge=count} including all histogram bins. */

function list_histogram(my_list, bin_edges) { obs_bin = MAP(my_list, (x) -> WHICH_BIN(x, bin_edges)); my_histogram = COUNT_UNIQ(STRING_LIST(obs_bin));

// Include bin_edges with zero counts? // Which bins are missing? missing_bins = FILTER(bin_edges, (x) -> IF(MAP_VALUE(my_histogram,TO_STRING(x)),0,1)); map_zeros = LISTS_TO_MAP(missing_bins, MAP(missing_bins, (x) -> 0)); my_histogram = MAP_UNION(my_histogram, map_zeros);

my_histogram }

AG_MIN_MAP /**

  • Return the key:value pair with the minimum value as

  • a string, e.g., num:2.

  • Examples:

  • my_map = JSON_TO_MAP(‘{“key1”:10,”key2”:2}’);

  • AG_MIN_MAP(my_map) = 'key2:2'
    
  • my_list = [1,1,2,2,3,3,3,3];

  • my_map = COUNT_UNIQ(INT_LIST(my_list));
    
  • result = AG_MIN_MAP(my_map);
    
  • (result = '1:2' OR result = '2:2')  // Random tiebreaker.
    
  • @param my_map A map, e.g., {cat=1, num=2}.

  • @return kv_string A key-value pair as a string, e.g., num:2. */

function ag_min_map(my_map) { // AG_MAX(my_list) // Works for lists. // AG_MAX(my_map) // Doesn’t work for maps.

map_string = MAP_TO_LIST(my_map, (k,v) -> CONCAT_WS(‘:’,k,v));

REDUCE(map_string, (x,z) -> IF(PARSE_KV_NUM(x) < PARSE_KV_NUM(z), x, z)) }

EVENT_HISTOGRAM /**

  • Given a list of timestamps, compute the HISTOGRAM().

  • causmos and offset determine the reference_time to use. All times

  • look back (e.g., seconds/days ago). Use negative time to look forward into the

  • future.

  • Examples:

  • // Inputs to Set

  • causmos = FALSE; // Which type of model? Event or treatment effect model?

  • index = 0; // Loops from 0 to SIZE(knots) - 2.
    
  • knots = [0, 7]; // In days.
    
  • offset = - 60 / (24*3600); // In seconds.
    
  • cumulative = FALSE; // Count always from knot[0] or knot[index] to knot[index+1]?
    
  • counts = FALSE; // Counts or time rates (divide by duration)?
    
  • // Pre-filter the events, extract timestamps.
    
  • filtered_timestamps = MAP(FILTER(timeline.events,
    
  •   (x) -&gt; x.type = type),  // Current record's type.
    
  •   (x) -&gt; x.timestamp);
    
  • // We set the offset forward in time and the type to include the current record's
    
  • // type, so we should find at least one record in the [0,7]-day window. We then
    
  • // set counts=FALSE, so we will compute a rate = 1 record over the 7 days = 1/7.
    
  • EVENT_HISTOGRAM(filtered_timestamps, index, knots, offset,
    
  •                 cumulative, counts, causmos) &gt;= 1/7
    
  • @param filtered_timestamps List of timestamps.

  • @param index Knot index/bin to be evaluated, e.g., from 0 to SIZE(knots) - 2.

  • @param knots List of knots to be considered, e.g., [0, 7] // Days.

  • @param offset Timestamp adjustment in days, e.g., 60 / (24*3600) for 60 seconds.

  • @param cumulative Count from knot[0] (TRUE) or knot[index] (FALSE) to knot[index+1]?

  • @param counts Counts (TRUE) or time rates / divide by duration (FALSE)?

  • @param causmos Use outcome.timestamp (TRUE) or timestamp (FALSE) for

  • the reference timestamp? Also, model type for feature: Event-level

  • (FALSE) or Causmos treatment model (TRUE)?)

  • @return y Scalar count or rate, depending on counts. */

function event_histogram(filtered_timestamps, index, knots, offset, cumulative, counts, causmos) { // Given a list of timestamps, compute the HISTOGRAM(). // causmos and offset determine the reference_time to use. // // causmos = FALSE; // Which type of model? Event or treatment effect model? // offset = 60 / (24*3600); // In days. // // index = 0; // Do loops from 0 to SIZE(knots) - 2 (Which bin to count?) // knots = [0, 7]; // In days. // cumulative_counts = FALSE; // Count always from knot[0] or knot[index] to knot[index+1]? // return_counts = FALSE; // Counts or time rates (divide by duration)?

millis_day = 2436001000;

// Which type of model? Event or treatment effect model? // Determine the source of and set the reference time for counting. reference_time = IF(causmos, outcome.timestamp, timestamp) - offset * millis_day;

// Reference time is ‘How long since this event?’ or ‘This event happened X days ago.’ events_relative_days = MAP(filtered_timestamps, (x) -> (reference_time - x)/millis_day);

HISTOGRAM(events_relative_days, index, knots, cumulative, counts) }

LEN /**

  • Return the number of elements of my_list. Alias for SIZE().

  • @param my_list A list.

  • @return length Integer number of elements in my_list. */

function len(my_list) { // Alias for SIZE(). SIZE(my_list) }

AG_MAX_MAP /**

  • Return the key:value pair with the maximum value as

  • a string, e.g., num:2.

  • Examples:

  • my_map = JSON_TO_MAP(‘{“key1”:10,”key2”:2}’);

  • AG_MAX_MAP(my_map) = 'key1:10'
    
  • my_list = [1,1,2,2,3,3,3,3];

  • my_map = COUNT_UNIQ(INT_LIST(my_list));
    
  • AG_MAX_MAP(my_map) = '3:4'
    
  • @param my_map A map, e.g., {cat=1, num=2}.

  • @return kv_string A key-value pair as a string, e.g., num:2. */

function ag_max_map(my_map) { // AG_MAX(my_list) // Works for lists. // AG_MAX(my_map) // Doesn’t work for maps.

map_string = MAP_TO_LIST(my_map, (k,v) -> CONCAT_WS(‘:’,k,v));

REDUCE(map_string, (x,z) -> IF(PARSE_KV_NUM(x) > PARSE_KV_NUM(z), x, z)) }

RESCALE /**

  • Transform x by location and scale parameters:

  • x * scale + location.

  • Examples:

  • RESCALE(5,1,2) = 11 // 5 * 2 + 1

  • @param x Scalar to be transformed.

  • @param location Scalar.

  • @param scale Scalar.

  • @return y x * scale + location. */

function rescale(x, location, scale) { x * scale + location }

AG_MIN_NS /**

  • Return the minimum value of my_list, ignoring nulls.

  • Examples:

  • AG_MIN_NS([null,null,2,2,2,4,4,4]) = 2

  • AG_MIN_NS(null) = null

  • AG_MIN_NS([null]) = null

  • AG_MIN_NS([‘’]) = null

  • AG_MIN_NS([‘c’]) = null

  • AG_MIN_NS([‘c’,1]) = 1

  • AG_MIN_NS([]) = null

  • @param my_list A list, e.g., [null,null,2,2,2,4,4,4].

  • @return min The minimum value of non-null elements of my_list. */

function ag_min_ns(my_list) { AG_MIN(DROP_NULLS(my_list)) }

RANDOM_LIST /**

  • Create a list with n draws from RANDOM().

  • Examples:

  • SIZE(RANDOM_LIST(5)) = 5

  • SUM(RANDOM_LIST(5)) > 0

  • SUM(RANDOM_LIST(5)) < 5

  • @param n Integer. Number of random draws.

  • @return random_list List with n draws from RANDOM(). */

function random_list(n) { MAP(RANGE(0,n), (x) -> RANDOM()) }

STRICTLY_BETWEEN /**

  • If start < x < end, return 1. Otherwise return 0.

  • Examples:

  • start = 700; end = -700; // Days

  • reference_time = MILLIS(TO_DATETIME('2022-01-01 9:00:00.000'));
    
  • // Replace this expression...
    
  • IF(reference_time - end*24*3600*1000 &lt; timestamp
    
  •    AND timestamp &lt; reference_time - start*24*3600*1000, 1, 0)
    
  • =
    
  • // With this more readable expression using the UDF.
    
  • BETWEEN(reference_time - timestamp,
    
  •         start*24*3600*1000,
    
  •         end*24*3600*1000)
    
  • @param x Scalar.

  • @param start Scalar.

  • @param end Scalar.

  • @return is_x_strictly_between IF(start < x < end, 1, 0). */

function strictly_between(x, start, end) { IF(start < x AND x < end, 1, 0) }

SPLINE /**

  • j^th-order polynomial spline at knot x_k: (x-x_k)^j * 1(x>x_k).

  • Example:

  • SPLINE(20, 40, 0) = 0 // Knot at 40.

  • SPLINE(20, 40, 1) = 0 // Knot at 40.

  • SPLINE(40, 40, 1) = 0 // Knot at 40.

  • SPLINE(50, 40, 0) = 1 // Knot at 40.

  • SPLINE(50, 40, 1) = 10 // Knot at 40.

  • SPLINE(50, 40, 2) = 100 // Knot at 40.

  • @param x Scalar.

  • @param knot Scalar knot/breakpoint/kink in the piecewise spline.

  • @param order Order/degree of polynomial spline term.

  • @return y Value of the spline, given x, knot, and order. */

function spline(x, knot, order) { // j^th-order polynomial spline at knot x_k: (x-x_k)^j * 1(x>x_k) IF(x > knot, POW(x - knot, order), 0) }

LIST_HIST /**

  • Use LIST_HISTOGRAM() with num_bins, setting the bin

  • range to the minimum and maximum values of my_list.

  • Examples:

  • LIST_HIST([1,1,2,4],4) = {‘1’:2, ‘2’:1, ‘3’:0, ‘4’:1}

  • LIST_HIST([1.0,1,2.0,4],4) = {‘1’:2, ‘2’:1, ‘3’:0, ‘4’:1}

  • @param my_list A list, e.g., [1,1,2,4].

  • @param num_bins Number of histogram bins to create.

  • @return my_histogram A map of {bin_edge=count} for each histogram bin. */

function list_hist(my_list, num_bins) { // Evenly spaced num_bins between min and max of ‘my_list’. min_edge = AG_MIN(my_list); max_edge = AG_MAX(my_list);

edges = RANGE_N(min_edge,max_edge,num_bins);

my_histogram = LIST_HISTOGRAM(my_list, edges); my_histogram }

LIST_CONCAT /**

  • Concatenate the elements of my_list.

  • Examples:

  • LIST_CONCAT([‘3’,’2’,’1’]) = ‘321’

  • LIST_CONCAT(MAP([3,2,1],(y) -> TO_INT(y))) = ‘321’

  • LIST_CONCAT([‘’]) = ‘’

  • LIST_CONCAT(null) = null

  • @param my_list List to be concatenated.

  • @return my_list_concat String with the elements of my_list concatenated. */

function list_concat(my_list) { REDUCE(MAP(my_list, (x) -> TO_STRING(x)), (x,y) -> CONCAT(x, y)) }

DIV2 /**

  • given an integer, long, or floating point number, return the value / 2

  • @param x Number

  • @return Double */

function div2(x) { x / 2 }

Query-level UDFs

Query-level UDFs can be attached to a query via select(…).udf(src1, src2, ...) and will be available to any expression in the query, but not beyond.

udf_src = '''
function is_zero(val, epsilon) {
  ABS(val) < epsilon 
}
'''
select('is_zero(0.00062, .01)', 'is_zero(0.62, .01)').from_events(1).limit(3).udf(udf_src)

Query results:

partition "_default"
_c0 _c1
0 true false
1 true false
2 true false
query produced 3 rows x 2 columns in 0.42 seconds

Inline expression UDFs

UDFs can also be placed in any expression, and will only be available for the scope of that expression:

select('function say_hello(name) { concat("Hello, ", name, "!") } say_hello("Bob")').from_events(1).limit(3)

Query results:

partition "_default"
_c0
0 Hello, Bob!
1 Hello, Bob!
2 Hello, Bob!
query produced 3 rows x 1 columns in 0.32 seconds