# The TQL Expression Language
The expression language is the core of TQL's data processing engine. Expressions are written as
Python strings and express a one-to-one transformation between an input timeline event object and
a piece of output data. These transformations form the building blocks of all TQL data processing
queries.
The language has strong syntactic similarities to SQL (basic functions, record-level structure)
and Python (variables, list operations/aggregations, lambda/inline functions, user-defined
functions). However, the expression language has several advantages:
- **Performant**: TQL Expressions are compiled into Java byte code and hence, are very fast.
- **Lazily Evaluated**: TQL Expressions are applied directly to raw timelines and events at
query time, meaning all datasets are dynamically extracted from Timelines, eliminating the need
for hard-to-maintain ETL pipelines.
- **Built for ML Model Evaluation**: Trained ML Models deployed using TQL also store all the
expressions expressions used to transform raw events into the training dataset, meaning feature
extraction at scoring time is already built into the model.
TQL Expressions only exist in the context of a project and query. For this guide, we will the
demo project **lethe4** to learn the language. Lethe represents a typical online advertising
funnel, with a set of online ad auctions `bid`, a set of website activity logs `activity`, and
static user demographics `user`. Each Timeline represents the total of all observed events for
a particular user. For this guide we will be manipulating data primarily from the `bid` TimeSeries.
Further reference:
**Expression language cheat sheet**, and the
full [**expression function reference**](functions).
## Literals
TQL accepts string, boolean, integer, decimal, array, and null literals. For example:
```python
from zeenk.tql import *
select(
'"Hello World!"',
1,
1.3,
True,
"[1,2,3,4]",
None
).from_events('lethe4').limit(3)
```
Query results:
partition "_default"
|
_c0 |
_c1 |
_c2 |
_c3 |
_c4 |
_c5 |
0 |
Hello World! |
1 |
1.3 |
true |
[1,2,3,4] |
None |
1 |
Hello World! |
1 |
1.3 |
true |
[1,2,3,4] |
None |
2 |
Hello World! |
1 |
1.3 |
true |
[1,2,3,4] |
None |
query produced 3 rows x 6 columns in 0.69 seconds
## Event Attributes
In TQL expressions, attributes of timeline events are addressable as variables. For example, the lethe4
Project is composed of the `bid` TimeSeries, which contains the attributes `ad_size`, `bid`, `event_time`,
`ghosted`, `request_id`, `user_id`, and `won`. These attributes are directly selectable using the
expression language as variables, for example:
```python
select(
'bid.ad_size',
'bid.bid',
'bid.ghosted',
'bid.request_id',
'bid.user_id',
'bid.won'
).from_events('lethe4').limit(3)
```
Query results:
partition "_default"
|
_c0 |
_c1 |
_c2 |
_c3 |
_c4 |
_c5 |
0 |
None |
None |
None |
None |
None |
None |
1 |
big |
1.0 |
false |
r13260 |
u252 |
true |
2 |
big |
1.0 |
false |
r13376 |
u252 |
true |
query produced 3 rows x 6 columns in 0.36 seconds
## Booleans and Logical Operators
The expression language contains the case-insensitive primitives `true` and `false`. Non-null, non-empty,
non-zero values are also considered "truthy" and will be intepreted as true in a boolean context, such
as in a where clause, The expression language has the native boolean operators `AND` and `OR` to express
compound conditionals. These may be used in isolation, or in conjunction with the `IF()` function to
emit conditional values in columns.
```python
select(
'true', #primitives of true
'FaLsE', #case insensitive primitives
'1 OR "non-empty string!"', # "truthy" values with compound conditionals
'0 AND ""', # "falsey" values
'IF(bid.ad_size == "big", "HUGE", "default")' #the IF function
).from_events('lethe4').limit(3)
```
Query results:
partition "_default"
|
_c0 |
_c1 |
_c2 |
_c3 |
_c4 |
0 |
true |
false |
true |
false |
default |
1 |
true |
false |
true |
false |
HUGE |
2 |
true |
false |
true |
false |
HUGE |
query produced 3 rows x 5 columns in 0.63 seconds
## TQL Functions
TQL provides a rich set of functions for transforming event attributes into pieces of derived data,
which will be covered in the following sections.
## Function Reference
The expression function reference can be found [here](functions). Additionally, you can look up
expression function usage directly in your Python code using the `helper` object.
`helper.find_function('search_term')` lists functions matching the given search term:
```python
helper.find_function('LOG')
```
['LOG', 'LOG10', 'LOG1P', 'LOG2', 'NATLOG']
`heper.function_usage('function_name')` displays the usage of any function:
```python
helper.function_usage('LOG')
```
Usage: |
LOG(number, base) |
description |
Return the logarithm of a {number} for a given {base}. Any null, a base of 0, or a base of 1 will return null. |
parameters |
number: Double
base: Double |
return_type |
Double |
function_type |
native |
## Mathematical Operators
The Expression Language contains a basic set of mathematical operators and functions to cover basic
numerical transformations.
```python
select(
'1 * (3 + 4) / 2 % 2',
'EXP(1)',
'LOG(23, 2)',
'PI()',
'SIN(45)',
).from_events('lethe4').limit(3)
```
Query results:
partition "_default"
|
_c0 |
_c1 |
_c2 |
_c3 |
_c4 |
0 |
1.5 |
2.718281828459045 |
4.523561956057013 |
3.141592653589793 |
0.8509035245341184 |
1 |
1.5 |
2.718281828459045 |
4.523561956057013 |
3.141592653589793 |
0.8509035245341184 |
2 |
1.5 |
2.718281828459045 |
4.523561956057013 |
3.141592653589793 |
0.8509035245341184 |
query produced 3 rows x 5 columns in 0.71 seconds
## Probabilibity Distributions
```python
select(
#computes the probability density using the specified, distribution, X value, and scale
'PDF("exp", 1, 1.0)',
#computes the cumulative distribution using the specified, distribution, X value, and scale
'CDF("exp", 1, 1.0)',
#computes the cumulative distribution using the specified, distribution, X value, and scale
'iCDF("exp", .95, 1.0)',
).from_events('lethe4').limit(3)
```
Query results:
partition "_default"
|
_c0 |
_c1 |
_c2 |
0 |
0.36787944117144233 |
0.6321205588285577 |
2.99573227355399 |
1 |
0.36787944117144233 |
0.6321205588285577 |
2.99573227355399 |
2 |
0.36787944117144233 |
0.6321205588285577 |
2.99573227355399 |
query produced 3 rows x 3 columns in 0.68 seconds
## Generating Random Numbers
```python
select(
#generate a random number
'RANDOM()',
#generate a random number from the given distribution
'SAMPLE("exp")',
#create a random number generator from a known seed that can be reused multiple times in an expression.
'rng = RNG("my_seed"); [rng.next, rng.next]',
).from_events('lethe4').limit(3)
```
Query results:
partition "_default"
|
_c0 |
_c1 |
0 |
[0.028381221364210618,0.2602691934427571] |
1.5990840325478177 |
1 |
[0.028381221364210618,0.2602691934427571] |
0.23555180093113778 |
2 |
[0.028381221364210618,0.2602691934427571] |
3.4662128831513432 |
query produced 3 rows x 2 columns in 0.31 seconds
## Hashing
```python
select(
'HASH("foo")',#hash the input to a number using java hashcode
'MD5("foo")', #compute the string MD5 sum of the input
'MD5_LONG("foo")',#compute the long MD5 sum of the input
).from_events('lethe4').limit(3)
```
Query results:
partition "_default"
|
_c0 |
_c1 |
_c2 |
0 |
101574 |
ACBD18DB4CC2F85CEDEF654FCCC4A4D8 |
1301710373874522920 |
1 |
101574 |
ACBD18DB4CC2F85CEDEF654FCCC4A4D8 |
1301710373874522920 |
2 |
101574 |
ACBD18DB4CC2F85CEDEF654FCCC4A4D8 |
1301710373874522920 |
query produced 3 rows x 3 columns in 0.69 seconds
## String Manipulation
The Expression Language provides the ability to perform string manipulation operations commonly needed
for transforming categorical features:
`CONCAT()`, `UPPER()`, `LOWER()`, `COALESCE()`, `STRING_SPLIT()`, `REGEX_EXTRACT()`, `REGEX_REPLACE()`
```python
select(
'bid.ad_size', #base attribute of ad size
'UPPER(bid.ad_size)', #upper case
'lower(bid.ad_size)', #lower case
'COALESCE(bid.ad_size, "UNKNOWN")', #coalesce nulls
'STRING_SPLIT(bid.ad_size, "i")', #split strings into arrays
'REGEX_EXTRACT(bid.ad_size, "[a-zA-Z0-9]", 0)', #extract a regex group
'REGEX_REPLACE(bid.ad_size, ["b", "_", "i", "xx"])', #replace characters in a string
'TO_STRING(1.3)', #cast anything to a string
'bid.ad_size == "BIG"' #string equality
).from_events('lethe4').limit(3)
```
## Date/Time Manipulation
TODO`NOW()`, `DATE()`, `TO_DATETIME()`, `DAY_OF_WEEK()`, `MILLIS()`, `FROM_UNIXTIME()`
```python
select(
'NOW()', #returns the current time in milliseconds
'DATE(NOW())', #parses the input into a date string
'TO_DATETIME("2021-03-21 12:51:00")', #parses the input string or number into a datetime object
'DAY_OF_WEEK(TO_DATETIME("2021-03-21"))', #emits the day of the week of the input in the range of [1,7]
'MILLIS(TO_DATETIME("2021-03-21"))', #emits the epoch milliseconds of the input
'FROM_UNIXTIME(NOW(), "yyyy/MM/dd")' #formats the input epoch timestamp using Joda date format.
).from_events('lethe4').limit(3)
```
Query results:
partition "_default"
|
_c0 |
_c1 |
_c2 |
_c3 |
_c4 |
_c5 |
0 |
1630353837028 |
2021-08-30 |
2021-03-21T12:51:00.000Z |
7 |
1616284800000 |
2021/08/30 |
1 |
1630353837028 |
2021-08-30 |
2021-03-21T12:51:00.000Z |
7 |
1616284800000 |
2021/08/30 |
2 |
1630353837028 |
2021-08-30 |
2021-03-21T12:51:00.000Z |
7 |
1616284800000 |
2021/08/30 |
query produced 3 rows x 6 columns in 0.36 seconds
## Array Manipulation
Arrays first class citizens of TQL. Arrays can be defined using bracket notation `[obj1, obj2, ...]`
individual data from an array can be addressed using brackets array[idx].
```python
select(
'RANGE(0, 5)', #generate an array of numbers in the specified range
'[1,2,3,4][3]' #get the nth element of the array
'SORT([6, 1, 234, 2])', #sort the array
'SIZE([3, 4, 58])', #find the size of the array
'MIN([3, 4, 58])', #find the min value of the array
'MAX([3, 4, 58])', #find the max value of the array
'AVG([3, 4, 58])', #find the average value of the array
'STDEV([3, 4, 58])', #take the standard deviation of the array
'MAP([1, 2, 3, 4], (x) -> x*2)', #transform an array to another array
'FILTER([1, 2, 3, 4], (x) -> x > 3)', #filter an array to a smaller array
'REDUCE(["a", "b", "c"], (x,y) -> CONCAT(x, y))', #combine an array to a single value
).from_events('lethe4').limit(3)
```
Query results:
partition "_default"
|
_c0 |
_c1 |
_c2 |
_c3 |
_c4 |
_c5 |
_c6 |
_c7 |
_c8 |
_c9 |
_c10 |
0 |
[2,4,6,8] |
abc |
[4] |
[0,1,2,3,4] |
[1,2,6,234] |
3.0 |
3 |
58 |
21.666666666666668 |
25.69478978746902 |
4 |
1 |
[2,4,6,8] |
abc |
[4] |
[0,1,2,3,4] |
[1,2,6,234] |
3.0 |
3 |
58 |
21.666666666666668 |
25.69478978746902 |
4 |
2 |
[2,4,6,8] |
abc |
[4] |
[0,1,2,3,4] |
[1,2,6,234] |
3.0 |
3 |
58 |
21.666666666666668 |
25.69478978746902 |
4 |
query produced 3 rows x 11 columns in 0.62 seconds
## Array Manipulation of Timeline Events
special variable `timeline.events` is also an array, and is commonly transformed using the array manipulation
functions. For example, we can extract the number of small ad size bids we made for each timeline:
```python
select(
'SIZE(FILTER(timeline.events, (x) -> x.type == "bid" and x.bid.ad_size == "small"))',
).from_events('lethe4').limit(3)
```
Query results:
partition "_default"
|
_c0 |
0 |
221.0 |
1 |
221.0 |
2 |
221.0 |
query produced 3 rows x 1 columns in 0.37 seconds
## Local Variables and Multi-Statement Expressions
Long transformations can be broken up into multiple statements, delimited by a `;` for readability. For example:
```python
select('''
positive_value=1.0;
negative_value=-1.0;
ad_size=COALESCE(UPPER(bid.ad_size), 'UNKNOWN');
if(bid.ad_size == 'BIG', positive_value, negative_value)
''').from_events('lethe4').limit(3)
#In multi-statement expressions, the last statement in the expression must return a value.
```
Query results:
partition "_default"
query produced 3 rows x 1 columns in 0.29 seconds
## The PREDICT Function
Models are first-class citizens of TQL, and are accessible via the native function `PREDICT()`. Models are
published into TQL with the column expressions used to produce the features, and because TQL expressions
are applied to timeline events only when a query is executed, TQL can create result sets with model predictions
as well as other attributes or metrics. For example:
```python
select(
'bid.ad_size',
'PREDICT("my_model")'
).from_events('lethe4').limit(3)
```
Since the output of `PREDICT()` is simply a floating point number, it can be further manipulated using the
expression language. Multiple models can even be composed together to compute derived values. For example,
perhaps you would like to join together your predictions of winrate and conversion rate:
```python
select(
"PREDICT('winrate_model') * PREDICT('coversion_rate_model')"
).from_events('lethe4').limit(3)
```
Models must be previously published into TQL. Refer to [Model Training and Publishing](modeling) guide
for more information on this topic.
## The SET_PROPERTY Function
The native function `SET_PROPERTY()`can be used to temporarily change an attribute of the input event for the
rest of the execution of the current expression. For example:
```python
select(
'bid.ad_size',
'SET_PROPERTY("bid.ad_size", "tiny"); bid.ad_size'
).from_events('lethe4').limit(3)
```
Query results:
partition "_default"
|
_c0 |
_c1 |
0 |
None |
tiny |
1 |
big |
tiny |
2 |
big |
tiny |
query produced 3 rows x 2 columns in 0.28 seconds
This functionality is powerful in conjunction with the [Model Predict Function](#Model-Predict-Function), allowing
you to easily write queries that make counterfactual predictions. For example:
```python
select(
'bid.ad_size',
'SET_PROPERTY("bid.ad_size", "tiny"); bid.ad_size'
'PREDICT("my_winrate_model")',
'SET_PROPERTY("bid.ad_size", "tiny"); PREDICT("my_winrate_model")'
).from_events('lethe4').limit(3)
```
## Comments
Expressions may contain comments. Both inline and multi-line comments are supported:
```python
select('''
ad_size = bid.ad_size; //this is a comment
/*
this is a comment...
...on multiple lines
*/
ad_size
''').from_events('lethe4').limit(3)
```
Query results:
partition "_default"
query produced 3 rows x 1 columns in 0.3 seconds
## User Defined Functions (UDFs)
In addition to TQL's library of built-in functions, the user can write additional functions to extend the
language's capabilities with custom behavior. Functions are defined with the syntax `function(...){ ... }`.
A function may also provide an optional doc string preceding it. For example:
```python
select(
'clean_ad_size()'
).from_events('lethe4').udf('''
function clean_ad_size() {
COALESCE(UPPER(bid.ad_size), 'UNKNOWN')
}
''').limit(3)
```
Query results:
partition "_default"
|
_c0 |
0 |
UNKNOWN |
1 |
BIG |
2 |
BIG |
query produced 3 rows x 1 columns in 0.27 seconds
Just like regular expressions, UDFs may contain multiple statements, delimited by a `;`, in which case the last
statement must return a value. UDFs may also define input arguments. For example:
```python
select(
'circumference(3)'
).from_events('lethe4').udf('''
function circumference(r) {
diameter = 2 * r;
circ = diameter * PI();
circ
}
''').limit(3)
```
Query results:
partition "_default"
|
_c0 |
0 |
18.84955592153876 |
1 |
18.84955592153876 |
2 |
18.84955592153876 |
query produced 3 rows x 1 columns in 0.26 seconds
Read full guide to writing user-defined functions [here](user-defined-functions).
## Compile and Runtime Error Handling
If an expression fails to compile, an error will be shown when trying to submit the query.
```python
#For example, 'blah' has no meaning here:
select('blah').from_events('lethe4')
```
Encountered 1 expression compilation error: Line 1:0 Unknown variable '@blah' (in column _c0)
Expression error: Line 1:0 Unknown variable '@blah' (in column _c0)
blah
^
---------------------------------------------------------------------------
TQLAnalysisException Traceback (most recent call last)
~/dev/.virtualenvs/noumena-dev/lib/python3.7/site-packages/IPython/core/formatters.py in __call__(self, obj)
343 method = get_real_method(obj, self.print_method)
344 if method is not None:
--> 345 return method()
346 return None
347 else:
~/dev/nanigans/noumena/noumena-python/noumena/noumena/tql/query.py in _repr_html_(self)
893
894 def _repr_html_(self):
--> 895 return self.submit(interactive=self._interactive)._repr_html_()
896
897 def __iter__(self):
~/dev/nanigans/noumena/noumena-python/noumena/noumena/tql/query.py in submit(self, interactive, wait, print_payloads, analyze, spark)
779 from noumena.tql import icarus
780 start = time.time()
--> 781 submit_reply = icarus.post('query/submit', self.json(), print_json=print_payloads)
782 if interactive:
783 if self.spec.get('allow_invalid_column_expressions') is None:
~/dev/nanigans/noumena/noumena-python/noumena/noumena/tql/icarus.py in post(url, json_data, files_data, data, print_json)
14 print(json.dumps(json_data))
15 resp = requests.post(url, data=data, json=json_data, files=files_data)
---> 16 return _handle_response(url, resp, print_json=print_json)
17
18
~/dev/nanigans/noumena/noumena-python/noumena/noumena/tql/icarus.py in _handle_response(url, resp, print_json)
70 else:
71 print(reason)
---> 72 raise TQLAnalysisException(reason)
73 else:
74 msg = ''
TQLAnalysisException: Encountered 1 expression compilation error: Line 1:0 Unknown variable '@blah' (in column _c0)Expression error: Line 1:0 Unknown variable '@blah' (in column _c0)
blah
^
Query
```python
#In this example, the `LOG` function requires two operands, the input number and the log base to use:
select('log(3)').from_events('lethe4')
```
Encountered 1 expression compilation error: Line 1:0 LOG function needs 2 operands (in column _c0)
Expression error: Line 1:0 LOG function needs 2 operands (in column _c0)
log(3)
^
---------------------------------------------------------------------------
TQLAnalysisException Traceback (most recent call last)
~/dev/.virtualenvs/noumena-dev/lib/python3.7/site-packages/IPython/core/formatters.py in __call__(self, obj)
343 method = get_real_method(obj, self.print_method)
344 if method is not None:
--> 345 return method()
346 return None
347 else:
~/dev/nanigans/noumena/noumena-python/noumena/noumena/tql/query.py in _repr_html_(self)
893
894 def _repr_html_(self):
--> 895 return self.submit(interactive=self._interactive)._repr_html_()
896
897 def __iter__(self):
~/dev/nanigans/noumena/noumena-python/noumena/noumena/tql/query.py in submit(self, interactive, wait, print_payloads, analyze, spark)
779 from noumena.tql import icarus
780 start = time.time()
--> 781 submit_reply = icarus.post('query/submit', self.json(), print_json=print_payloads)
782 if interactive:
783 if self.spec.get('allow_invalid_column_expressions') is None:
~/dev/nanigans/noumena/noumena-python/noumena/noumena/tql/icarus.py in post(url, json_data, files_data, data, print_json)
14 print(json.dumps(json_data))
15 resp = requests.post(url, data=data, json=json_data, files=files_data)
---> 16 return _handle_response(url, resp, print_json=print_json)
17
18
~/dev/nanigans/noumena/noumena-python/noumena/noumena/tql/icarus.py in _handle_response(url, resp, print_json)
70 else:
71 print(reason)
---> 72 raise TQLAnalysisException(reason)
73 else:
74 msg = ''
TQLAnalysisException: Encountered 1 expression compilation error: Line 1:0 LOG function needs 2 operands (in column _c0)Expression error: Line 1:0 LOG function needs 2 operands (in column _c0)
log(3)
^
Query
```python
#In this example, we simply forgot a closing parenthesis:
select('log(3').from_events('lethe4')
```
Encountered 1 expression compilation error: Line 1:5 missing ')' at '' (in column _c0)
Expression error: Line 1:5 missing ')' at '' (in column _c0)
log(3
-----^
---------------------------------------------------------------------------
TQLAnalysisException Traceback (most recent call last)
~/dev/.virtualenvs/noumena-dev/lib/python3.7/site-packages/IPython/core/formatters.py in __call__(self, obj)
343 method = get_real_method(obj, self.print_method)
344 if method is not None:
--> 345 return method()
346 return None
347 else:
~/dev/nanigans/noumena/noumena-python/noumena/noumena/tql/query.py in _repr_html_(self)
893
894 def _repr_html_(self):
--> 895 return self.submit(interactive=self._interactive)._repr_html_()
896
897 def __iter__(self):
~/dev/nanigans/noumena/noumena-python/noumena/noumena/tql/query.py in submit(self, interactive, wait, print_payloads, analyze, spark)
779 from noumena.tql import icarus
780 start = time.time()
--> 781 submit_reply = icarus.post('query/submit', self.json(), print_json=print_payloads)
782 if interactive:
783 if self.spec.get('allow_invalid_column_expressions') is None:
~/dev/nanigans/noumena/noumena-python/noumena/noumena/tql/icarus.py in post(url, json_data, files_data, data, print_json)
14 print(json.dumps(json_data))
15 resp = requests.post(url, data=data, json=json_data, files=files_data)
---> 16 return _handle_response(url, resp, print_json=print_json)
17
18
~/dev/nanigans/noumena/noumena-python/noumena/noumena/tql/icarus.py in _handle_response(url, resp, print_json)
70 else:
71 print(reason)
---> 72 raise TQLAnalysisException(reason)
73 else:
74 msg = ''
TQLAnalysisException: Encountered 1 expression compilation error: Line 1:5 missing ')' at '' (in column _c0)Expression error: Line 1:5 missing ')' at '' (in column _c0)
log(3
-----^
Query
However, if an expression compiles successfully but fails during evaluation, then the result for that row will
be null. In this example, `upper(bid.ad_size)` returns null because some events rows do not have ad size
defined at all.
```python
select(
'bid.ad_size',
'upper(bid.ad_size)'
).from_events('lethe4').limit(3)
```
Query results:
partition "_default"
|
_c0 |
_c1 |
0 |
None |
None |
1 |
big |
BIG |
2 |
big |
BIG |
query produced 3 rows x 2 columns in 0.29 seconds
## Expression Language Cheat Sheet

## Where To Next?
Browse the full [**expression function reference**](functions).