Using ResultSets
TQL ResultSet is the result of calling the Query object’s submit() method. ResultSets contain exactly one row per event, and exactly one column per TQL expression defined in the select() clause.
When submitting a Query, you can define it to be either interactive with submit(interactive=True) or
asynchronous with submit(interative=False). Interactive queries store their ResultSet data in memory. Asynchronous
queries store their ResultSets on disk. ResultSets stored on disk can be recalled at a later time, see the
load() function below.
from zeenk.tql import *
------------------------------------------------------------------
Version: 20.1.17-SNAPSHOT
Version Timestamp: 2021-08-27 20:56:17
Version Age: 4 days, 13 hours, 39 minutes, 56 seconds
Filesystem Root: /Users/zkozick/.tql/files
Working Directory: /Users/zkozick/.tql
Configuration File: /Users/zkozick/.tql/tql-conf.yml
Api Gateway: http://localhost:9000
Service Status: Icarus: ONLINE, Daedalus: ONLINE
Service Uptime: 1 days, 18 hours, 7 minutes, 55 seconds
------------------------------------------------------------------
Loading ResultSet Data
Loading Data as a Python List
The .dataframe() function returns the dataframe as a Python list of lists from all partitions:
resultset = select('bid.*').from_events('lethe4').limit(5).submit()
# first row of the dataframe
resultset.dataframe()
[[None, None, None, None, None, None, None],
['big', '1.0', '2021-01-02 00:02:36', 'false', 'r13260', 'u252', 'true'],
['big', '1.0', '2021-01-02 00:16:41', 'false', 'r13376', 'u252', 'true'],
['small', '1.0', '2021-01-02 00:30:55', 'false', 'r13509', 'u252', 'true'],
[None, None, None, None, None, None, None]]
Loading Data as a Pandas Dataframe
Similar to dataframe, .pandas_dataframe() returns the data, but as a Pandas dataframe.
resultset = select('bid.*').from_events('lethe4').limit(5).submit()
resultset.pandas_dataframe()
| bid_ad_size | bid_bid | bid_event_time | bid_ghosted | bid_request_id | bid_user_id | bid_won | |
|---|---|---|---|---|---|---|---|
| 0 | None | None | None | None | None | None | None |
| 1 | big | 1.0 | 2021-01-02 00:02:36 | false | r13260 | u252 | true |
| 2 | big | 1.0 | 2021-01-02 00:16:41 | false | r13376 | u252 | true |
| 3 | small | 1.0 | 2021-01-02 00:30:55 | false | r13509 | u252 | true |
| 4 | None | None | None | None | None | None | None |
Loading Data as a Spark Dataframe
Similar to dataframe, .spark_dataframe() returns the data, but as a Spark dataframe. The backend Java process that manipulated the data outputs it as strings so Spark cannot always automatically infer data types.
resultset = select("'test'", '1').from_events('lethe4').limit(5).submit()
spark_df = resultset.spark_dataframe()
spark_df.show(2)
+----+---+
| _1| _2|
+----+---+
|test| 1|
|test| 1|
+----+---+
only showing top 2 rows
ResultSet Partitions
ResultSet data is always partitioned, though if the query did not define a partition_key(expr), then there will
only be a single “default” partition with name _default. If .partition_key(expr) was defined, then there will
be N partitions, one for each unique value output by the partition key expression.
Default Partition
The method default_partition() outputs the contents of the default partition.
resultset = select('bid.*').from_events('lethe4').limit(5).submit()
resultset.default_partition()
| bid_ad_size | bid_bid | bid_event_time | bid_ghosted | bid_request_id | bid_user_id | bid_won | |
|---|---|---|---|---|---|---|---|
| 0 | None | None | None | None | None | None | None |
| 1 | big | 1.0 | 2021-01-02 00:02:36 | false | r13260 | u252 | true |
| 2 | big | 1.0 | 2021-01-02 00:16:41 | false | r13376 | u252 | true |
| 3 | small | 1.0 | 2021-01-02 00:30:55 | false | r13509 | u252 | true |
| 4 | None | None | None | None | None | None | None |
Partition Names
The partition_names() method outputs a tuple of partition names.
resultset = select('activity.*')\
.from_events('lethe4')\
.limit(7)\
.partition_by('IF(RANDOM() < 0.5, "train", "test")')\
.submit()
resultset.partition_names()
('test', 'train')
Partitions
The partitions() method outputs Partition objects.
for part in resultset.partitions():
print(f"""
name: {part.name()}
row_count: {part.row_count()}""")
name: test
row_count: 5
name: train
row_count: 2
Partition
The partition() method gets the partition supplied as an argument. If no argument is supplied, it tries to get the default partition.
resultset.partition('test')
| activity_activity_id | activity_conversion | activity_event_time | activity_user_id | |
|---|---|---|---|---|
| 0 | None | None | None | None |
| 1 | None | None | None | None |
| 2 | None | None | None | None |
| 3 | None | None | None | None |
| 4 | None | None | None | None |
Retrieving Column Names
Calling column_names() retrieves the columns of the ResultSet.
resultset = select('bid.*').from_events('lethe4').limit(5).submit()
resultset.column_names()
Row Count
The row_count() method returns the total number of rows in the ResultSet over all partitions.
resultset.row_count()
7
Positive row count
The positive_row_count() method returns the number of positive rows.
resultset.positive_row_count()
0
Get Query
The method get_query() returns the query that was used to generate the ResultSet
query = resultset.get_query()
query.describe()
| Attribute | Value |
|---|---|
| project_id | 4 |
| timeline_limit | -1 |
| row_limit | 7 |
| partition_key_expression | IF(RANDOM() < 0.5, "train", "test") |
| interactive | True |
| allow_invalid_column_expressions | True |
| name | expression | type |
|---|---|---|
| activity_activity_id | activity.activity_id | METADATA |
| activity_conversion | activity.conversion | METADATA |
| activity_event_time | activity.event_time | METADATA |
| activity_user_id | activity.user_id | METADATA |
Get ID
The get_id method returns the ID of the ResultSet if the query was executed as a non-interactive job. Otherwise, it returns -1.
resultset = select('bid.*').from_events('lethe4').limit(5).submit(interactive=False)
resultset.get_id()
Click to Abort Job
NOTE: Abort link will open the result of the call in a new tab/window
SUCCESS: 8/8|██████████|, TOTAL ELAPSED=0:00:17, WAIT=0:00:00, PENDING=0:00:01, RUNNING=0:00:16
33
Load
The load() method takes the ResultSet ID as an argument and loads it into the notebook. You can optionally set analyze parameter to True to get an analysis of the Query.
resultset = select('bid.*').from_events('lethe4').limit(5).submit(interactive=False)
rs_id = resultset.get_id()
load_resultset(rs_id).pandas_dataframe()
Click to Abort Job
NOTE: Abort link will open the result of the call in a new tab/window
SUCCESS: 8/8|██████████|, TOTAL ELAPSED=0:00:06, WAIT=0:00:00, PENDING=0:00:01, RUNNING=0:00:05
| bid_ad_size | bid_bid | bid_event_time | bid_ghosted | bid_request_id | bid_user_id | bid_won | |
|---|---|---|---|---|---|---|---|
| 0 | None | None | None | None | None | None | None |
| 1 | big | 1.0 | 2021-01-02 00:02:36 | false | r13260 | u252 | true |
| 2 | big | 1.0 | 2021-01-02 00:16:41 | false | r13376 | u252 | true |
| 3 | small | 1.0 | 2021-01-02 00:30:55 | false | r13509 | u252 | true |
| 4 | None | None | None | None | None | None | None |
Refresh
The refresh() method refreshes non-interactive ResultSet fields after query execution finishes.
resultset = select('bid.*').from_events('lethe4').limit(5).submit(interactive=False)
resultset.refresh()
resultset
Click to Abort Job
NOTE: Abort link will open the result of the call in a new tab/window
SUCCESS: 8/8|██████████|, TOTAL ELAPSED=0:00:07, WAIT=0:00:00, PENDING=0:00:01, RUNNING=0:00:06
Query results:
partition "_default"| key | value | |
|---|---|---|
| 0 | name | _default |
| 1 | data_path | file:/Users/smaguire/.tql/files/datasets/2021-08-31/94096521418144/data/partition=_default |
| 2 | row_count | 40 |
| 3 | columns | [bid_ad_size, bid_bid, bid_event_time, bid_ghosted, bid_request_id, bid_user_id, bid_won] |
| 4 | positive_row_count | 0 |
Metrics
The method metrics() returns timing and runtime metadata about the ResultSet.
resultset = select(
# column value metrics are computed when using FeatureColumn such as numerical() and categorical()
numerical('1'),
categorical('type', name='my_categorical_col')
).from_events('lethe4').limit(5).submit(analyze=True)
resultset.metrics()
Expression timing stats:
| name | expression | mean(ms) | p95(ms) | |
|---|---|---|---|---|
| 0 | _c0 | 1 | 0.007 | 0.022 |
| 1 | my_categorical_col | type | 0.015 | 0.046 |
Timeline & Event runtime:
| min | max | mean | total | p50 | p75 | p95 | p99 | unit | |
|---|---|---|---|---|---|---|---|---|---|
| timeline_runtime | 7405 | 7405 | 7405 | 7405 | 7405.0 | 7405.0 | 7405.0 | 7405.00 | micros |
| event_runtime | 29 | 652 | 193 | 773 | 42.0 | 50.0 | 531.6 | 627.92 | micros |
Query summary:
| key | value | |
|---|---|---|
| 0 | query_runtime_micros | 21732 |
| 1 | outcome_generated | 0 |
| 2 | timelines_processed | 1 |
| 3 | real_events_processed | 5 |
| 4 | event_passed_filters | 5 |
| 5 | opportunity_event_count | 0 |
| 6 | null_label_evaluation_count | 0 |
| 7 | null_weight_evaluation_count | 0 |
| 8 | events_processed | 5 |
| 9 | column_expanded | False |
| 10 | dropped_empty_rows_after_conversion | 0 |
| 11 | positive_example_count | 0 |
| 12 | negative_example_count | 5 |
| 13 | empty_example_count | 0 |
| 14 | label_sum | 0.0 |
| 15 | weighted_label_sum | 0.0 |
Column value metrics:
| name | column_index | positive_count | negative_count | non_null_count | non_zero_count | positive_sum | negative_sum | positive_weighted_count | negative_weighted_count | label_sum | weighted_label_sum | min_numerical_value | max_numerical_value | total_weighted_count | total_sum | total_count | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | _c0 | 0 | 0 | 5 | 5 | 5 | 0.0 | 5.0 | 0.0 | 5.0 | 0.0 | 0.0 | 1.0 | 1.0 | 5.0 | 5.0 | 5 |
| 1 | my_categorical_col_EQ_bid | 1 | 0 | 3 | 3 | 0 | 0.0 | 0.0 | 0.0 | 3.0 | 0.0 | 0.0 | NaN | NaN | 3.0 | 0.0 | 3 |
| 2 | my_categorical_col_EQ_activity | 1 | 0 | 1 | 1 | 0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | NaN | NaN | 1.0 | 0.0 | 1 |
| 3 | my_categorical_col_EQ_user | 1 | 0 | 1 | 1 | 0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | NaN | NaN | 1.0 | 0.0 | 1 |