# Quick Start
This tutorial provides a basic overview of TQL. In this guide we will first build Timelines from a set of online
advertising auctions and a set of user site activity data, then write a TQL query that produces a dataset we can
use to train a simple model that predicts probability of winning a bid based on ad size, and finally deploy that
model to make batch and realtime predictions.
#### Table of Contents
```{contents}
:depth: 3
```
In order to follow along with this guide, first install TQL using the instructions provided on the
[TQL Homepage](overview.html#installation). Afterwords, we recommend
reading the [guide to creating timelines](timelines), the [manual on TQL queries](queries), and on how to
use the [TQL expression language](expressions). Additionally, for useful bakground on TQL's design and data
processing model, refer to the [TQL Concepts and Architecture Guide](concepts).
## Starting the TQL Backend
After installing TQL, start the backend from the command line using `tql start`. Learn more about configuring
the TQL backend [here](configuration).
## Importing TQL
Always start your work by importing TQL into your namespace. If you did not previously start the backend from the
command line, one will be automatically started upon import.
```python
from zeenk.tql import *
```
----------------------------------------------------------------
Version: 20.1.17-SNAPSHOT
Version Timestamp: 2021-08-31 17:30:26
Version Age: 3 days, 21 minutes, 5 seconds
Filesystem Root: /Users/clin/.tql/files
Working Directory: /Users/clin/.tql
Configuration File: /Users/clin/.tql/tql-conf.yml
Api Gateway: http://localhost:9000
Service Status: Icarus: ONLINE, Daedalus: ONLINE
Service Uptime: 1 days, 6 hours, 6 minutes, 29 seconds
----------------------------------------------------------------
## Creating Timelines
The first step of any TQL project is to map time series data from your data lake into TQL. We will use online
advertising log data generated with TQL's [data-simulator](data-simulator) package. Every TimeSeries must
have a source of data, either from flat files or from an [SQL store](timelines#Loading-From-an-External-SQL-Store),
as well as a timestamp and id field, passed to `identified_by(id_col, ts_col)`. Here we create two TimeSeries
objects, from a source of advertising bid logs, and a source of site activity logs, some of which may have been
caused by the ads. We then use the `create_project()` and `from_timeseries()` functions to attach them to a project
named `tql_quickstart`, and the `build_timelines()` function to pull the raw data into timelines.
```python
from zeenk.tql.demo_projects import get_project4_files_dir
# this directory contains the time series data files for demo project 4.
base_dir = get_project4_files_dir()
ad_auctions = create_timeseries('bid')
ad_auctions.from_files(f'{base_dir}/bid*.csv', has_csv_header=True)
ad_auctions.identified_by('user_id', 'event_time')
site_activity = create_timeseries('activity')
site_activity.from_files(f'{base_dir}/activity*.csv', has_csv_header=True)
site_activity.identified_by('user_id', 'event_time')
project = create_project('tql_quickstart', or_update=True)
project.from_timeseries(ad_auctions, site_activity)
project.build_timelines()
```
Click to Abort Job
NOTE: Abort link will open the result of the call in a new tab/window
SUCCESS: 230/230|██████████|, TOTAL ELAPSED=0:00:23, WAIT=0:00:00, PENDING=0:00:02, COMPILING=0:00:21
Timeline Statistics
0 | 1 |
Total Timelines |
993 |
Smallest Timeline |
5 events |
Largest Timeline |
903 events |
Min Timestamp |
2021-01-01 00:00:01 |
Max Timestamp |
2021-01-14 23:59:59 |
Total Events |
197393 |
Activity Events |
25416 |
Bid Events |
171977 |
Attributes
TQL Expression | Description | Type |
activity.activity_id |
None |
string |
activity.conversion |
None |
string |
activity.event_time |
None |
string |
activity.user_id |
None |
string |
bid.ad_size |
None |
string |
bid.bid |
None |
double |
bid.event_time |
None |
string |
bid.ghosted |
None |
boolean |
bid.request_id |
None |
string |
bid.user_id |
None |
string |
bid.won |
None |
boolean |
duration |
The duration of the event. Typically used with generated events. |
long |
event |
The current event object |
event |
generated |
This event was created by generate_events() |
boolean |
generated_weight |
The weight to apply to the the sample |
double |
id |
The auto-generated event id |
long |
index |
The index of the current event in the timeline |
integer |
outcome |
The outcome event. Only used in conjunction with SUM_OPPORTUNITIES() and CAUSAL_EFFFCT |
event |
timeline.events |
Time-ordered array of events |
array |
timeline.id |
Shared timeline id |
string |
timestamp |
The timestamp of the event |
long |
type |
The event type of this event |
string |
## Introduction to Queries
### Creating and executing queries
Queries are created by chaining together `select()`, `from()`, and various other operators, ending with a call to
`submit()`, which returns a TQL ResultSet. For example:
```python
resultset = select('bid.*').from_events('tql_quickstart').limit(5).submit()
resultset
```
Query results:
partition "_default"
|
bid_ad_size |
bid_bid |
bid_event_time |
bid_ghosted |
bid_request_id |
bid_user_id |
bid_won |
0 |
big |
1.0 |
2021-01-02 00:02:36 |
false |
r13260 |
u252 |
true |
1 |
big |
1.0 |
2021-01-02 00:16:41 |
false |
r13376 |
u252 |
true |
2 |
small |
1.0 |
2021-01-02 00:30:55 |
false |
r13509 |
u252 |
true |
3 |
None |
None |
None |
None |
None |
None |
None |
4 |
small |
1.0 |
2021-01-02 01:04:01 |
false |
r13737 |
u252 |
true |
query produced 5 rows x 7 columns in 0.28 seconds
### Interactive Query Execution in Jupyter
In a Jupyter notebook environment, if a Query is the last line of code in a cell when it is evaluated, `submit()`
will automatically be called, and the resulting dataset will be displayed as the output. In Jupyter, the above
query/resultset code is identical to the following, simpler code:
```python
select('bid.*').from_events('tql_quickstart').limit(5)
```
Query results:
partition "_default"
|
bid_ad_size |
bid_bid |
bid_event_time |
bid_ghosted |
bid_request_id |
bid_user_id |
bid_won |
0 |
big |
1.0 |
2021-01-02 00:02:36 |
false |
r13260 |
u252 |
true |
1 |
big |
1.0 |
2021-01-02 00:16:41 |
false |
r13376 |
u252 |
true |
2 |
small |
1.0 |
2021-01-02 00:30:55 |
false |
r13509 |
u252 |
true |
3 |
None |
None |
None |
None |
None |
None |
None |
4 |
small |
1.0 |
2021-01-02 01:04:01 |
false |
r13737 |
u252 |
true |
query produced 5 rows x 7 columns in 0.12 seconds
### The TQL Data Processing Model
TQL ResultSets contain ***exactly one row per event***, and ***exactly one column per TQL expression*** defined
in the `select()` clause.

In the following query, the expression `bid.ad_size` will contain null values for some rows, because the timelines
contain both `bid` and `activity` events. The type of the row can be retrieved with the special event attribute `type`.
```python
select('type', 'bid.ad_size', 'activity.conversion').from_events('tql_quickstart').limit(5)
```
Query results:
partition "_default"
|
_c0 |
_c1 |
_c2 |
0 |
bid |
big |
None |
1 |
bid |
big |
None |
2 |
bid |
small |
None |
3 |
activity |
None |
product_view |
4 |
bid |
small |
None |
query produced 5 rows x 3 columns in 0.09 seconds
## The TQL Expression Language
The `select(...)` operator takes one or more TQL expression strings, corresponding to columns in the output
ResultSet. TQL expressions are a written in a custom DSL (domain-specific language) that describe one-to-one
translations from an event object to a feature or column value. The attributes of each event are selectable
using `timeseries_name` + `.` + `column_name`, and can be further transformed with TQL expression functions.
The TQL Expression Language contains a rich set math, string, and array manipulation functions to perform
common "data-wrangling" operations.
For example, here we select the bid ad_size using the attribute `bid.ad_size`, then further manipulate it
into a standardized feature:
```python
select('COALESCE(UPPER(bid.ad_size), "UNKNOWN")').from_events('tql_quickstart').limit(5)
```
Query results:
partition "_default"
|
_c0 |
0 |
BIG |
1 |
BIG |
2 |
SMALL |
3 |
UNKNOWN |
4 |
SMALL |
query produced 5 rows x 1 columns in 0.08 seconds
### Timeline Expressions
Because of TQL's [unique data processing model](concepts.html), the entire context of the timeline is available
to each event at extraction time, using the special attribute `timeline`. Each timeline has an id, accessible by
`timeline.id`, as well as an array of events, accessible by `timeline.events`. Here is an example of some
timeline expressions:
```python
select(
#the id of each timeline
'timeline.id',
#the type of the current event
'type',
#the type of the first event
'timeline.events[0].type',
#the type of the last event
'timeline.events[-1].type',
#the number of bids that preceeded this event
'SIZE(FILTER(timeline.events, (x) -> x.type == "bid" and x.timestamp < timestamp))',
#the total number of purchase events on the timeline
'SIZE(FILTER(timeline.events, (x) -> x.type == "activity" AND x.activity.conversion == "purchase"))',
)\
.from_events('tql_quickstart')\
.limit(5)
```
Query results:
partition "_default"
|
_c0 |
_c1 |
_c2 |
_c3 |
_c4 |
_c5 |
0 |
u252 |
bid |
bid |
bid |
0.0 |
5.0 |
1 |
u252 |
bid |
bid |
bid |
1.0 |
5.0 |
2 |
u252 |
bid |
bid |
bid |
2.0 |
5.0 |
3 |
u252 |
activity |
bid |
bid |
3.0 |
5.0 |
4 |
u252 |
bid |
bid |
bid |
3.0 |
5.0 |
query produced 5 rows x 6 columns in 0.16 seconds
For a comprehensive guide to TQL expressions, please refer to the [expression language manual](expressions.html),
or the [expression language cheat sheat](expression-language#expression-language-cheat-sheet) for quick reference.
## Building a Machine Learning Dataset
To build a machine learning dataset, we will use a few more pieces of the query API, including typed column
operators `label()` and `categorical()`, the `where()` operator, and the `partition_by()` operator:
```python
select(
label('if(bid.won, 1, 0)', 'won_bid'),
categorical('lower(bid.ad_size)', 'ad_size')
)\
.from_events('tql_quickstart')\
.where('type == "bid"')\
.partition_by('IF(random() < 0.2, "test", "train")')\
.options(expand_numerical_features=True, fill_na=True)\
.limit(10)
```
Query results:
partition "test"
|
won_bid |
ad_size |
0 |
1.0 |
small |
1 |
1.0 |
big |
2 |
1.0 |
small |
partition "train"
|
won_bid |
ad_size |
0 |
1.0 |
big |
1 |
1.0 |
big |
2 |
1.0 |
small |
3 |
1.0 |
big |
4 |
1.0 |
big |
5 |
1.0 |
big |
6 |
1.0 |
big |
query produced 10 rows x 2 columns in 0.13 seconds
Now that our query results looks good, we store the output of the query in a ResultSet using `submit()`.
We will also remove the `limit(10)` to make sure we have enough samples to train a model with:
```python
rs = select(
label('if(bid.won, 1, 0)', 'won_bid'),
categorical('lower(bid.ad_size)', 'ad_size')
)\
.from_events('tql_quickstart')\
.where('type == "bid"')\
.partition_by('IF(random() < 0.2, "test", "train")')\
.options(expand_numerical_features=True, fill_na=True)\
.submit()
```
## Training A Machine Learning Model
We now will pass our resultset directly into TQL's model trainer. We provide a convienence wrapper for H2O,
but drop-in support for other training packages can be made avaiable
[upon request](mailto:zkozick@nanigans.com?subject=Request%20For%20Additional%20TQL%20Training%20Engine%20Support).
```python
from zeenk.tql.modeling import H2OEstimator
model = H2OEstimator('logistic', rs, 'glm').train()
```
Checking whether there is an H2O instance running at http://localhost:54321 . connected.
H2O_cluster_uptime: |
3 days 22 hours 59 mins |
H2O_cluster_timezone: |
America/Denver |
H2O_data_parsing_timezone: |
UTC |
H2O_cluster_version: |
3.32.1.5 |
H2O_cluster_version_age: |
9 days |
H2O_cluster_name: |
H2O_from_python_zkozick_t4cxa3 |
H2O_cluster_total_nodes: |
1 |
H2O_cluster_free_memory: |
3.237 Gb |
H2O_cluster_total_cores: |
8 |
H2O_cluster_allowed_cores: |
8 |
H2O_cluster_status: |
locked, healthy |
H2O_connection_url: |
http://localhost:54321 |
H2O_connection_proxy: |
{"http": null, "https": null} |
H2O_internal_security: |
False |
H2O_API_Extensions: |
Amazon S3, XGBoost, Algos, AutoML, Core V3, TargetEncoder, Core V4 |
Python_version: |
3.7.3 final |
Parse progress: |█████████████████████████████████████████████████████████| 100%
Parse progress: |█████████████████████████████████████████████████████████| 100%
## Model Deployment
In order to use our model to make either batch or online predictions, we first must deploy the model into TQL
using `.publish()`, assigning it a unique model type, which we will use to reference it later in queries:
```python
model.publish('my_first_model')
```
329
## Making Predictions
### Batch Predictions
We can now write a query that leverages this model using the TQL function `PREDICT('model_type')`:
```python
select('PREDICT("my_first_model")').from_events('tql_quickstart').limit(5)
```
Query results:
partition "_default"
|
_c0 |
0 |
0.7102528549392564 |
1 |
0.7102528549392564 |
2 |
0.7101808325686126 |
3 |
0.7101808325686126 |
4 |
0.7101808325686126 |
query produced 5 rows x 1 columns in 0.35 seconds
### Real-Time Predictions
Additionally, we can make new predictions against previously unseen events using TQL's `external_timelines()`
operator. With this operator, the TQL query will be executed only against the provided timeline, instead of
against the full set of Project Timelines.
```python
new_bid = {
'type':'bid',
'timestamp': '1628095396',
'bid': {
'ad_size': 'BIG',
}
}
timeline = {
'id':'',
'events': [new_bid]
}
select('PREDICT("my_first_model")')\
.from_events('tql_quickstart')\
.where('type=="bid"')\
.external_timelines([timeline])\
.submit()\
.dataframe()[0][0]
```
'0.7102528549392564'
### Prediction As A Service
All TQL operations occur by making requests to a REST API, the gateway to TQL's Java backend. This means by
publishing, your model is already deployed to a real-time prediction service, which can be invoked through
Python as shown above, or directly over HTTP if your client is written in another language:
```python
select('PREDICT("my_first_model")')\
.from_events('tql_quickstart')\
.where('type=="bid"')\
.external_timelines([timeline])\
.json()
```
{'columns': [{'name': None,
'expression': 'PREDICT("my_first_model")',
'type': 'METADATA'}],
'project_id': 104,
'where_filters': ['type=="bid"'],
'external_timelines': [{'id': '',
'events': [{'type': 'bid',
'timestamp': '1628095396',
'bid': {'ad_size': 'BIG'}}]}]}
```python
!curl\
--header "Content-Type: application/json" \
--request POST \
--data '{"columns": [{"name": null, "expression": "PREDICT(\"my_first_model\")", "type": "METADATA"}], "project_id": 4, "where_filters": ["type==\"bid\""], "external_timelines": [{"id": "", "events": [{"type": "bid", "timestamp": "1628095396", "bid": {"ad_size": "tiny"}}]}]}'\
http://localhost:9000/v1/query/submit
```
{"status_code":200,"data":{"query":{"id":-1,"project_id":4,"timeline_id":-1,"query_udfs":[],"timelines":{"id":138,"project_id":4,"project_history_id":27,"compute_stats":true,"is_universal_timelines":true,"statistics":{"timeline_count":1000,"event_count":198393,"event_min_timestamp":"2021-01-01 00:00:00.000","event_max_timestamp":"2021-01-14 23:59:59.000","event_count_min":1,"event_count_max":904,"event_counts_by_type":{"activity":25416,"bid":171977,"user":1000},"time_series_min_max_timestamps":{"activity":{"event_name":"activity","event_min_timestamp":"2021-01-01 00:00:00.000","event_max_timestamp":"2021-01-14 23:58:20.000","column_stats":[],"event_count":0,"timeline_count":0},"bid":{"event_name":"bid","event_min_timestamp":"2021-01-01 00:00:00.000","event_max_timestamp":"2021-01-14 23:59:59.000","column_stats":[],"event_count":0,"timeline_count":0},"user":{"event_name":"user","event_min_timestamp":"2021-01-01 00:00:00.000","event_max_timestamp":"2021-01-01 00:00:00.000","column_stats":[],"event_count":0,"timeline_count":0}}},"dot_notations":["activity.activity_id","activity.conversion","activity.event_time","activity.user_id","bid.ad_size","bid.bid","bid.event_time","bid.ghosted","bid.request_id","bid.user_id","bid.won","duration","event","generated","generated_weight","id","index","outcome","timeline.events","timeline.id","timestamp","type","user.age","user.gender","user.group","user.region","user.timestamp","user.user_id"],"schema":{"fields":[{"name":"id","type":{"type":"string"},"nullable":true},{"name":"events","type":{"element_type":{"fields":[{"name":"timestamp","type":{"type":"long"},"nullable":true},{"name":"id","type":{"type":"long"},"nullable":true},{"name":"type","type":{"type":"string"},"nullable":true},{"name":"activity","type":{"fields":[{"name":"activity_id","type":{"type":"string"},"nullable":true},{"name":"event_time","type":{"type":"string"},"nullable":true},{"name":"user_id","type":{"type":"string"},"nullable":true},{"name":"conversion","type":{"type":"string"},"nullable":true}],"type":"schema"},"nullable":true},{"name":"bid","type":{"fields":[{"name":"request_id","type":{"type":"string"},"nullable":true},{"name":"event_time","type":{"type":"string"},"nullable":true},{"name":"user_id","type":{"type":"string"},"nullable":true},{"name":"ad_size","type":{"type":"string"},"nullable":true},{"name":"bid","type":{"type":"double"},"nullable":true},{"name":"ghosted","type":{"type":"boolean"},"nullable":true},{"name":"won","type":{"type":"boolean"},"nullable":true}],"type":"schema"},"nullable":true},{"name":"user","type":{"fields":[{"name":"user_id","type":{"type":"string"},"nullable":true},{"name":"gender","type":{"type":"string"},"nullable":true},{"name":"region","type":{"type":"string"},"nullable":true},{"name":"age","type":{"type":"string"},"nullable":true},{"name":"group","type":{"type":"string"},"nullable":true},{"name":"timestamp","type":{"type":"string"},"nullable":true}],"type":"schema"},"nullable":true},{"name":"generated_weight","type":{"type":"double"},"nullable":true},{"name":"generated","type":{"type":"boolean"},"nullable":true},{"name":"index","type":{"type":"long"},"nullable":true},{"name":"duration","type":{"type":"long"},"nullable":true}],"type":"schema"},"type":"array"},"nullable":true}],"type":"schema"},"universal_timeline_sample_data_location":"file:/Users/zkozick/.noumena/files/universal_timelines/2021/08/05/project_id=4/id=138/data_sample/","universal_timeline_data_location":"file:/Users/zkozick/.noumena/files/universal_timelines/2021/08/05/project_id=4/id=138/data/","state":"SUCCESS","retry_count":0,"created":"2021-08-04 17:00:14.000","updated":"2021-08-04 10:01:14.000","metadata":{}},"interactive":true,"timeline_sample_rate":1.0,"timeline_limit":-1,"row_limit":-1,"from_filters":[],"where_filters":["type==\"bid\""],"columns":[{"type":"METADATA","name":"_c0","expression":"PREDICT(\"my_first_model\")"}],"global_variables":{},"timeline_variables":{},"event_variables":{},"max_columns":-1,"global_min_total_count":-1,"apply_feature_filters":true,"apply_charset_filter":true,"drop_empty_rows":false,"expand_numerical_features":false,"drop_numerical_zero_features":false,"drop_constant_feature_columns":false,"throw_expression_errors":false,"debug_expressions":false,"allow_invalid_column_expressions":false,"external_timelines":[{"id":"","events":[{"type":"bid","timestamp":"1628095396","bid":{"ad_size":"tiny"}}]}],"output_format":"PARQUET","published_model_context":{"published_models_table":{"328":{"model_id":328,"project_id":4,"model_type":"my_first_model","enabled":true,"columns":[{"type":"LABEL","name":"won_bid","expression":"if(bid.won, 1, 0)"},{"type":"CATEGORICAL","name":"ad_size","expression":"lower(bid.ad_size)"}],"model_estimates":[{"engine":"H2O","artifact_path":"/Users/zkozick/.noumena/files/model_artifacts/artifact_986adb49-cef1-468f-bac5-fe4fd7b11f4d.nml","test_cases":[]}],"constant_enabled":true,"keep_flags":[],"ignore_flags":[],"is_universal_timelines":true,"created":"2021-08-13 05:54:01.000","updated":"2021-08-13 05:54:01.000"},"my_first_model":{"model_id":328,"project_id":4,"model_type":"my_first_model","enabled":true,"columns":[{"type":"LABEL","name":"won_bid","expression":"if(bid.won, 1, 0)"},{"type":"CATEGORICAL","name":"ad_size","expression":"lower(bid.ad_size)"}],"model_estimates":[{"engine":"H2O","artifact_path":"/Users/zkozick/.noumena/files/model_artifacts/artifact_986adb49-cef1-468f-bac5-fe4fd7b11f4d.nml","test_cases":[]}],"constant_enabled":true,"keep_flags":[],"ignore_flags":[],"is_universal_timelines":true,"created":"2021-08-13 05:54:01.000","updated":"2021-08-13 05:54:01.000"}},"published_models":[{"model_id":328,"project_id":4,"model_type":"my_first_model","enabled":true,"columns":[{"type":"LABEL","name":"won_bid","expression":"if(bid.won, 1, 0)"},{"type":"CATEGORICAL","name":"ad_size","expression":"lower(bid.ad_size)"}],"model_estimates":[{"engine":"H2O","artifact_path":"/Users/zkozick/.noumena/files/model_artifacts/artifact_986adb49-cef1-468f-bac5-fe4fd7b11f4d.nml","test_cases":[]}],"constant_enabled":true,"keep_flags":[],"ignore_flags":[],"is_universal_timelines":true,"created":"2021-08-13 05:54:01.000","updated":"2021-08-13 05:54:01.000"}]},"published_models":[],"numerical_feature_precision":16,"numerical_feature_epsilon":1.0E-9,"metadata":{},"fill_na":true},"resultset":{"columns":[{"type":"METADATA","name":"_c0","expression":"PREDICT(\"my_first_model\")"}],"id":-1,"result_set_metrics":{"total_example_count":1,"column_metrics":[{"categorical_cardinality":0,"name":"_c0","expression":"PREDICT(\"my_first_model\")","debug_info":[],"column_value_metrics":{},"expression_timing_stats":{"min":674,"max":674,"mean":674,"total":674,"p50":674.0,"p75":674.0,"p95":674.0,"p99":674.0,"unit":"micros"}}],"dropped_empty_row_count":0,"event_runtime":{"min":681,"max":681,"mean":681,"total":681,"p50":681.0,"p75":681.0,"p95":681.0,"p99":681.0,"unit":"micros"},"timeline_runtime":{"min":6736,"max":6736,"mean":6736,"total":6736,"p50":6736.0,"p75":6736.0,"p95":6736.0,"p99":6736.0,"unit":"micros"},"query_runtime_micros":10055,"outcome_generated":0,"timelines_processed":1,"real_events_processed":1,"event_passed_filters":1,"opportunity_event_count":0,"null_label_evaluation_count":0,"null_weight_evaluation_count":0,"events_processed":1,"column_expanded":false,"dropped_expanded_columns":{},"dropped_column_count":0,"dropped_empty_rows_after_conversion":0,"partition_metrics":{"_default":{"pos_row_count":0,"total_row_count":1}},"warnings":[],"ml_dataset_id":0,"positive_example_count":0,"negative_example_count":1,"empty_example_count":1,"label_sum":0.0,"weighted_label_sum":0.0},"partitions":[{"name":"_default","rows":[["0.7097314931031173"]],"row_count":1,"positive_row_count":0}]},"state":"SUCCESS"},"meta":{"service_name":"icarus-webservice","deployment_level":"standalone","version":"20.1.17-SNAPSHOT"},"data_count":1}
## What Next?
Congratulations! You have now built your first TQL application.
- Learn more about getting your data into TQL with the [Building Timelines Guide](timelines).
- For an in-depth view of the query API, head over to the [Query Manual](queries).
- Learn more about writing TQL expressions with the [TQL Expression Language Manual](expression-language).
- Understand the background and design of TQL, check out the [Core Concepts And Architecture Guide](concepts).
- Scale TQL to big data applications with the [TQL Scaling Guide](scaling).