# Using ResultSets
TQL ResultSet is the result of calling the [Query](queries) object's `submit()` method. ResultSets contain ***exactly one row per event***, and ***exactly one column per TQL expression*** defined in the `select()` clause.
When submitting a [Query](queries), you can define it to be either interactive with `submit(interactive=True)` or
asynchronous with `submit(interative=False)`. Interactive queries store their ResultSet data in memory. Asynchronous
queries store their ResultSets on disk. ResultSets stored on disk can be recalled at a later time, see the
[load() function](#Load) below.
```python
from zeenk.tql import *
```
------------------------------------------------------------------
Version: 20.1.17-SNAPSHOT
Version Timestamp: 2021-08-27 20:56:17
Version Age: 4 days, 13 hours, 39 minutes, 56 seconds
Filesystem Root: /Users/zkozick/.tql/files
Working Directory: /Users/zkozick/.tql
Configuration File: /Users/zkozick/.tql/tql-conf.yml
Api Gateway: http://localhost:9000
Service Status: Icarus: ONLINE, Daedalus: ONLINE
Service Uptime: 1 days, 18 hours, 7 minutes, 55 seconds
------------------------------------------------------------------
## Loading ResultSet Data
### Loading Data as a Python List
The `.dataframe()` function returns the dataframe as a Python list of lists from all partitions:
```python
resultset = select('bid.*').from_events('lethe4').limit(5).submit()
# first row of the dataframe
resultset.dataframe()
```
[[None, None, None, None, None, None, None],
['big', '1.0', '2021-01-02 00:02:36', 'false', 'r13260', 'u252', 'true'],
['big', '1.0', '2021-01-02 00:16:41', 'false', 'r13376', 'u252', 'true'],
['small', '1.0', '2021-01-02 00:30:55', 'false', 'r13509', 'u252', 'true'],
[None, None, None, None, None, None, None]]
### Loading Data as a Pandas Dataframe
Similar to dataframe, `.pandas_dataframe()` returns the data, but as a [Pandas](https://pandas.pydata.org/) dataframe.
```python
resultset = select('bid.*').from_events('lethe4').limit(5).submit()
resultset.pandas_dataframe()
```
|
bid_ad_size |
bid_bid |
bid_event_time |
bid_ghosted |
bid_request_id |
bid_user_id |
bid_won |
| 0 |
None |
None |
None |
None |
None |
None |
None |
| 1 |
big |
1.0 |
2021-01-02 00:02:36 |
false |
r13260 |
u252 |
true |
| 2 |
big |
1.0 |
2021-01-02 00:16:41 |
false |
r13376 |
u252 |
true |
| 3 |
small |
1.0 |
2021-01-02 00:30:55 |
false |
r13509 |
u252 |
true |
| 4 |
None |
None |
None |
None |
None |
None |
None |
### Loading Data as a Spark Dataframe
Similar to dataframe, `.spark_dataframe()` returns the data, but as a [Spark](https://spark.apache.org/docs/3.1.1/sql-programming-guide.html#untyped-dataset-operations-aka-dataframe-operations) dataframe. The backend Java process that manipulated the data outputs it as strings so Spark cannot always automatically infer data types.
```python
resultset = select("'test'", '1').from_events('lethe4').limit(5).submit()
spark_df = resultset.spark_dataframe()
spark_df.show(2)
```
+----+---+
| _1| _2|
+----+---+
|test| 1|
|test| 1|
+----+---+
only showing top 2 rows
## ResultSet Partitions
ResultSet data is always partitioned, though if the query did not define a `partition_key(expr)`, then there will
only be a single "default" partition with name `_default`. If .`partition_key(expr)` was defined, then there will
be N partitions, one for each unique value output by the partition key expression.
### Default Partition
The method `default_partition()` outputs the contents of the default partition.
```python
resultset = select('bid.*').from_events('lethe4').limit(5).submit()
resultset.default_partition()
```
|
bid_ad_size |
bid_bid |
bid_event_time |
bid_ghosted |
bid_request_id |
bid_user_id |
bid_won |
| 0 |
None |
None |
None |
None |
None |
None |
None |
| 1 |
big |
1.0 |
2021-01-02 00:02:36 |
false |
r13260 |
u252 |
true |
| 2 |
big |
1.0 |
2021-01-02 00:16:41 |
false |
r13376 |
u252 |
true |
| 3 |
small |
1.0 |
2021-01-02 00:30:55 |
false |
r13509 |
u252 |
true |
| 4 |
None |
None |
None |
None |
None |
None |
None |
### Partition Names
The `partition_names()` method outputs a tuple of partition names.
```python
resultset = select('activity.*')\
.from_events('lethe4')\
.limit(7)\
.partition_by('IF(RANDOM() < 0.5, "train", "test")')\
.submit()
resultset.partition_names()
```
('test', 'train')
### Partitions
The `partitions()` method outputs [Partition](tql.html#tql.resultset.Partition) objects.
```python
for part in resultset.partitions():
print(f"""
name: {part.name()}
row_count: {part.row_count()}""")
```
name: test
row_count: 5
name: train
row_count: 2
### Partition
The `partition()` method gets the partition supplied as an argument. If no argument is supplied, it tries to get the default partition.
```python
resultset.partition('test')
```
|
activity_activity_id |
activity_conversion |
activity_event_time |
activity_user_id |
| 0 |
None |
None |
None |
None |
| 1 |
None |
None |
None |
None |
| 2 |
None |
None |
None |
None |
| 3 |
None |
None |
None |
None |
| 4 |
None |
None |
None |
None |
## Retrieving Column Names
Calling column_names() retrieves the columns of the ResultSet.
```python
resultset = select('bid.*').from_events('lethe4').limit(5).submit()
resultset.column_names()
```
## Row Count
The `row_count()` method returns the total number of rows in the ResultSet over all partitions.
```python
resultset.row_count()
```
7
## Positive row count
The `positive_row_count()` method returns the number of positive rows.
```python
resultset.positive_row_count()
```
0
## Get Query
The method `get_query()` returns the query that was used to generate the ResultSet
```python
query = resultset.get_query()
query.describe()
```
Query
| Attribute |
Value |
| project_id |
4 |
| timeline_limit |
-1 |
| row_limit |
7 |
| partition_key_expression |
IF(RANDOM() < 0.5, "train", "test") |
| interactive |
True |
| allow_invalid_column_expressions |
True |
Query.columns
| name |
expression |
type |
| activity_activity_id |
activity.activity_id |
METADATA |
| activity_conversion |
activity.conversion |
METADATA |
| activity_event_time |
activity.event_time |
METADATA |
| activity_user_id |
activity.user_id |
METADATA |
## Get ID
The `get_id` method returns the ID of the ResultSet if the query was executed as a non-interactive job. Otherwise, it returns `-1`.
```python
resultset = select('bid.*').from_events('lethe4').limit(5).submit(interactive=False)
resultset.get_id()
```
Click to Abort Job
NOTE: Abort link will open the result of the call in a new tab/window
SUCCESS: 8/8|██████████|, TOTAL ELAPSED=0:00:17, WAIT=0:00:00, PENDING=0:00:01, RUNNING=0:00:16
33
## Load
The `load()` method takes the ResultSet ID as an argument and loads it into the notebook. You can optionally set `analyze` parameter to `True` to get an analysis of the Query.
```python
resultset = select('bid.*').from_events('lethe4').limit(5).submit(interactive=False)
rs_id = resultset.get_id()
load_resultset(rs_id).pandas_dataframe()
```
Click to Abort Job
NOTE: Abort link will open the result of the call in a new tab/window
SUCCESS: 8/8|██████████|, TOTAL ELAPSED=0:00:06, WAIT=0:00:00, PENDING=0:00:01, RUNNING=0:00:05
|
bid_ad_size |
bid_bid |
bid_event_time |
bid_ghosted |
bid_request_id |
bid_user_id |
bid_won |
| 0 |
None |
None |
None |
None |
None |
None |
None |
| 1 |
big |
1.0 |
2021-01-02 00:02:36 |
false |
r13260 |
u252 |
true |
| 2 |
big |
1.0 |
2021-01-02 00:16:41 |
false |
r13376 |
u252 |
true |
| 3 |
small |
1.0 |
2021-01-02 00:30:55 |
false |
r13509 |
u252 |
true |
| 4 |
None |
None |
None |
None |
None |
None |
None |
## Refresh
The `refresh()` method refreshes non-interactive ResultSet fields after query execution finishes.
```python
resultset = select('bid.*').from_events('lethe4').limit(5).submit(interactive=False)
resultset.refresh()
resultset
```
Click to Abort Job
NOTE: Abort link will open the result of the call in a new tab/window
SUCCESS: 8/8|██████████|, TOTAL ELAPSED=0:00:07, WAIT=0:00:00, PENDING=0:00:01, RUNNING=0:00:06
Query results:
partition "_default"
|
key |
value |
| 0 |
name |
_default |
| 1 |
data_path |
file:/Users/smaguire/.tql/files/datasets/2021-08-31/94096521418144/data/partition=_default |
| 2 |
row_count |
40 |
| 3 |
columns |
[bid_ad_size, bid_bid, bid_event_time, bid_ghosted, bid_request_id, bid_user_id, bid_won] |
| 4 |
positive_row_count |
0 |
query produced 5 rows x 7 columns in 8.2 seconds
## Metrics
The method `metrics()` returns timing and runtime metadata about the ResultSet.
```python
resultset = select(
# column value metrics are computed when using FeatureColumn such as numerical() and categorical()
numerical('1'),
categorical('type', name='my_categorical_col')
).from_events('lethe4').limit(5).submit(analyze=True)
resultset.metrics()
```
Expression timing stats:
|
name |
expression |
mean(ms) |
p95(ms) |
| 0 |
_c0 |
1 |
0.007 |
0.022 |
| 1 |
my_categorical_col |
type |
0.015 |
0.046 |
Timeline & Event runtime:
|
min |
max |
mean |
total |
p50 |
p75 |
p95 |
p99 |
unit |
| timeline_runtime |
7405 |
7405 |
7405 |
7405 |
7405.0 |
7405.0 |
7405.0 |
7405.00 |
micros |
| event_runtime |
29 |
652 |
193 |
773 |
42.0 |
50.0 |
531.6 |
627.92 |
micros |
Query summary:
|
key |
value |
| 0 |
query_runtime_micros |
21732 |
| 1 |
outcome_generated |
0 |
| 2 |
timelines_processed |
1 |
| 3 |
real_events_processed |
5 |
| 4 |
event_passed_filters |
5 |
| 5 |
opportunity_event_count |
0 |
| 6 |
null_label_evaluation_count |
0 |
| 7 |
null_weight_evaluation_count |
0 |
| 8 |
events_processed |
5 |
| 9 |
column_expanded |
False |
| 10 |
dropped_empty_rows_after_conversion |
0 |
| 11 |
positive_example_count |
0 |
| 12 |
negative_example_count |
5 |
| 13 |
empty_example_count |
0 |
| 14 |
label_sum |
0.0 |
| 15 |
weighted_label_sum |
0.0 |
Column value metrics:
|
name |
column_index |
positive_count |
negative_count |
non_null_count |
non_zero_count |
positive_sum |
negative_sum |
positive_weighted_count |
negative_weighted_count |
label_sum |
weighted_label_sum |
min_numerical_value |
max_numerical_value |
total_weighted_count |
total_sum |
total_count |
| 0 |
_c0 |
0 |
0 |
5 |
5 |
5 |
0.0 |
5.0 |
0.0 |
5.0 |
0.0 |
0.0 |
1.0 |
1.0 |
5.0 |
5.0 |
5 |
| 1 |
my_categorical_col_EQ_bid |
1 |
0 |
3 |
3 |
0 |
0.0 |
0.0 |
0.0 |
3.0 |
0.0 |
0.0 |
NaN |
NaN |
3.0 |
0.0 |
3 |
| 2 |
my_categorical_col_EQ_activity |
1 |
0 |
1 |
1 |
0 |
0.0 |
0.0 |
0.0 |
1.0 |
0.0 |
0.0 |
NaN |
NaN |
1.0 |
0.0 |
1 |
| 3 |
my_categorical_col_EQ_user |
1 |
0 |
1 |
1 |
0 |
0.0 |
0.0 |
0.0 |
1.0 |
0.0 |
0.0 |
NaN |
NaN |
1.0 |
0.0 |
1 |