# Using ResultSets

A TQL ResultSet is the result of calling the [Query](queries) object's `submit()` method. ResultSets contain ***exactly one row per event***, and ***exactly one column per TQL expression*** defined in the `select()` clause.

When submitting a [Query](queries), you can define it to be either interactive with `submit(interactive=True)` or asynchronous with `submit(interactive=False)`. Interactive queries store their ResultSet data in memory; asynchronous queries store their ResultSets on disk. ResultSets stored on disk can be recalled at a later time; see the [load() function](#Load) below.

```python
from zeenk.tql import *
```

------------------------------------------------------------------
Version:            20.1.17-SNAPSHOT
Version Timestamp:  2021-08-27 20:56:17
Version Age:        4 days, 13 hours, 39 minutes, 56 seconds
Filesystem Root:    /Users/zkozick/.tql/files
Working Directory:  /Users/zkozick/.tql
Configuration File: /Users/zkozick/.tql/tql-conf.yml
Api Gateway:        http://localhost:9000
Service Status:     Icarus: ONLINE, Daedalus: ONLINE
Service Uptime:     1 days, 18 hours, 7 minutes, 55 seconds
------------------------------------------------------------------

## Loading ResultSet Data

### Loading Data as a Python List

The `.dataframe()` function returns the data as a Python list of lists, drawn from all partitions:

```python
resultset = select('bid.*').from_events('lethe4').limit(5).submit()

# all rows of the dataframe, as a list of lists
resultset.dataframe()
```

[[None, None, None, None, None, None, None],
 ['big', '1.0', '2021-01-02 00:02:36', 'false', 'r13260', 'u252', 'true'],
 ['big', '1.0', '2021-01-02 00:16:41', 'false', 'r13376', 'u252', 'true'],
 ['small', '1.0', '2021-01-02 00:30:55', 'false', 'r13509', 'u252', 'true'],
 [None, None, None, None, None, None, None]]

### Loading Data as a Pandas Dataframe

Similar to `dataframe()`, `.pandas_dataframe()` returns the data, but as a [Pandas](https://pandas.pydata.org/) dataframe.
```python
resultset = select('bid.*').from_events('lethe4').limit(5).submit()
resultset.pandas_dataframe()
```
bid_ad_size bid_bid bid_event_time bid_ghosted bid_request_id bid_user_id bid_won
0 None None None None None None None
1 big 1.0 2021-01-02 00:02:36 false r13260 u252 true
2 big 1.0 2021-01-02 00:16:41 false r13376 u252 true
3 small 1.0 2021-01-02 00:30:55 false r13509 u252 true
4 None None None None None None None
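As the listing above shows, `pandas_dataframe()` delivers every value as a string (`'1.0'`, `'false'`, `'true'`). A minimal post-processing sketch in plain pandas (the column names and sample values below just mirror the example output; none of this is TQL API):

```python
import pandas as pd

# Stand-in for resultset.pandas_dataframe(): TQL delivers all values as strings
df = pd.DataFrame({
    'bid_bid': ['1.0', '1.0', '1.0'],
    'bid_won': ['true', 'true', 'false'],
    'bid_event_time': ['2021-01-02 00:02:36',
                       '2021-01-02 00:16:41',
                       '2021-01-02 00:30:55'],
})

# Coerce the string columns to numeric, boolean, and datetime dtypes
df['bid_bid'] = pd.to_numeric(df['bid_bid'])
df['bid_won'] = df['bid_won'].map({'true': True, 'false': False})
df['bid_event_time'] = pd.to_datetime(df['bid_event_time'])

print(df.dtypes)
```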
### Loading Data as a Spark Dataframe

Similar to `dataframe()`, `.spark_dataframe()` returns the data, but as a [Spark](https://spark.apache.org/docs/3.1.1/sql-programming-guide.html#untyped-dataset-operations-aka-dataframe-operations) dataframe. The backend Java process that manipulates the data outputs it as strings, so Spark cannot always automatically infer data types.

```python
resultset = select("'test'", '1').from_events('lethe4').limit(5).submit()
spark_df = resultset.spark_dataframe()
spark_df.show(2)
```

+----+---+
|  _1| _2|
+----+---+
|test|  1|
|test|  1|
+----+---+
only showing top 2 rows

## ResultSet Partitions

ResultSet data is always partitioned. If the query did not define a `partition_by(expr)`, there will be a single "default" partition named `_default`. If `partition_by(expr)` was defined, there will be N partitions, one for each unique value output by the partition key expression.

### Default Partition

The `default_partition()` method outputs the contents of the default partition.

```python
resultset = select('bid.*').from_events('lethe4').limit(5).submit()
resultset.default_partition()
```
bid_ad_size bid_bid bid_event_time bid_ghosted bid_request_id bid_user_id bid_won
0 None None None None None None None
1 big 1.0 2021-01-02 00:02:36 false r13260 u252 true
2 big 1.0 2021-01-02 00:16:41 false r13376 u252 true
3 small 1.0 2021-01-02 00:30:55 false r13509 u252 true
4 None None None None None None None
### Partition Names

The `partition_names()` method outputs a tuple of partition names.

```python
resultset = select('activity.*')\
    .from_events('lethe4')\
    .limit(7)\
    .partition_by('IF(RANDOM() < 0.5, "train", "test")')\
    .submit()

resultset.partition_names()
```

('test', 'train')

### Partitions

The `partitions()` method outputs [Partition](tql.html#tql.resultset.Partition) objects.

```python
for part in resultset.partitions():
    print(f"""
    name: {part.name()}
    row_count: {part.row_count()}""")
```

name: test
row_count: 5

name: train
row_count: 2

### Partition

The `partition()` method gets the partition whose name is supplied as an argument. If no argument is supplied, it tries to get the default partition.

```python
resultset.partition('test')
```
activity_activity_id activity_conversion activity_event_time activity_user_id
0 None None None None
1 None None None None
2 None None None None
3 None None None None
4 None None None None
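The partition key expression used above, `IF(RANDOM() < 0.5, "train", "test")`, assigns each row independently at query time, which is why the split can be uneven (5 vs. 2) over only 7 rows. The per-row logic is equivalent to this plain-Python sketch (stand-in code, not the TQL evaluator):

```python
import random
from collections import Counter

random.seed(0)  # seed chosen only to make this sketch reproducible

# Mirror IF(RANDOM() < 0.5, "train", "test") for each of 7 rows
assignments = ['train' if random.random() < 0.5 else 'test' for _ in range(7)]

# Every row lands in exactly one partition; counts vary run to run without a seed
print(Counter(assignments))
```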
## Retrieving Column Names

Calling `column_names()` retrieves the column names of the ResultSet.

```python
resultset = select('bid.*').from_events('lethe4').limit(5).submit()
resultset.column_names()
```

## Row Count

The `row_count()` method returns the total number of rows in the ResultSet across all partitions.

```python
resultset.row_count()
```

7

## Positive Row Count

The `positive_row_count()` method returns the number of positive rows.

```python
resultset.positive_row_count()
```

0

## Get Query

The `get_query()` method returns the query that was used to generate the ResultSet.

```python
query = resultset.get_query()
query.describe()
```
Query
Attribute Value
project_id 4
timeline_limit -1
row_limit 7
partition_key_expression IF(RANDOM() < 0.5, "train", "test")
interactive True
allow_invalid_column_expressions True
Query.columns
name expression type
activity_activity_id activity.activity_id METADATA
activity_conversion activity.conversion METADATA
activity_event_time activity.event_time METADATA
activity_user_id activity.user_id METADATA
## Get ID

The `get_id()` method returns the ID of the ResultSet if the query was executed as a non-interactive job. Otherwise, it returns `-1`.

```python
resultset = select('bid.*').from_events('lethe4').limit(5).submit(interactive=False)
resultset.get_id()
```


SUCCESS: 8/8|██████████|, TOTAL ELAPSED=0:00:17, WAIT=0:00:00, PENDING=0:00:01, RUNNING=0:00:16

33

## Load

The `load_resultset()` function takes a ResultSet ID as an argument and loads the corresponding ResultSet into the notebook. You can optionally set the `analyze` parameter to `True` to get an analysis of the query.

```python
resultset = select('bid.*').from_events('lethe4').limit(5).submit(interactive=False)
rs_id = resultset.get_id()
load_resultset(rs_id).pandas_dataframe()
```


SUCCESS: 8/8|██████████|, TOTAL ELAPSED=0:00:06, WAIT=0:00:00, PENDING=0:00:01, RUNNING=0:00:05
bid_ad_size bid_bid bid_event_time bid_ghosted bid_request_id bid_user_id bid_won
0 None None None None None None None
1 big 1.0 2021-01-02 00:02:36 false r13260 u252 true
2 big 1.0 2021-01-02 00:16:41 false r13376 u252 true
3 small 1.0 2021-01-02 00:30:55 false r13509 u252 true
4 None None None None None None None
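The workflow above — submit non-interactively, keep the ID, recall the ResultSet later — can be pictured with a toy registry. Everything here (`_STORE`, `save_result`, `load_result`) is a hypothetical stand-in for illustration, not the TQL API:

```python
# Toy registry keyed by result ID, standing in for on-disk ResultSet storage
_STORE = {}

def save_result(rs_id, rows):
    """Persist rows under an ID, as a non-interactive submit() would."""
    _STORE[rs_id] = rows

def load_result(rs_id):
    """Recall previously stored rows by ID, analogous to load_resultset(rs_id)."""
    return _STORE[rs_id]

save_result(33, [['big', '1.0'], ['small', '1.0']])
print(load_result(33))  # the same rows, recalled later by ID
```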
## Refresh

The `refresh()` method refreshes the fields of a non-interactive ResultSet after query execution finishes.

```python
resultset = select('bid.*').from_events('lethe4').limit(5).submit(interactive=False)
resultset.refresh()
resultset
```


SUCCESS: 8/8|██████████|, TOTAL ELAPSED=0:00:07, WAIT=0:00:00, PENDING=0:00:01, RUNNING=0:00:06

Query results:

partition "_default"
key value
0 name _default
1 data_path file:/Users/smaguire/.tql/files/datasets/2021-08-31/94096521418144/data/partition=_default
2 row_count 40
3 columns [bid_ad_size, bid_bid, bid_event_time, bid_ghosted, bid_request_id, bid_user_id, bid_won]
4 positive_row_count 0
query produced 5 rows x 7 columns in 8.2 seconds

## Metrics

The `metrics()` method returns timing and runtime metadata about the ResultSet.

```python
resultset = select(
    # column value metrics are computed when using FeatureColumns such as numerical() and categorical()
    numerical('1'),
    categorical('type', name='my_categorical_col')
).from_events('lethe4').limit(5).submit(analyze=True)

resultset.metrics()
```

Expression timing stats:

name expression mean(ms) p95(ms)
0 _c0 1 0.007 0.022
1 my_categorical_col type 0.015 0.046

Timeline & Event runtime:

min max mean total p50 p75 p95 p99 unit
timeline_runtime 7405 7405 7405 7405 7405.0 7405.0 7405.0 7405.00 micros
event_runtime 29 652 193 773 42.0 50.0 531.6 627.92 micros

Query summary:

key value
0 query_runtime_micros 21732
1 outcome_generated 0
2 timelines_processed 1
3 real_events_processed 5
4 event_passed_filters 5
5 opportunity_event_count 0
6 null_label_evaluation_count 0
7 null_weight_evaluation_count 0
8 events_processed 5
9 column_expanded False
10 dropped_empty_rows_after_conversion 0
11 positive_example_count 0
12 negative_example_count 5
13 empty_example_count 0
14 label_sum 0.0
15 weighted_label_sum 0.0

Column value metrics:

name column_index positive_count negative_count non_null_count non_zero_count positive_sum negative_sum positive_weighted_count negative_weighted_count label_sum weighted_label_sum min_numerical_value max_numerical_value total_weighted_count total_sum total_count
0 _c0 0 0 5 5 5 0.0 5.0 0.0 5.0 0.0 0.0 1.0 1.0 5.0 5.0 5
1 my_categorical_col_EQ_bid 1 0 3 3 0 0.0 0.0 0.0 3.0 0.0 0.0 NaN NaN 3.0 0.0 3
2 my_categorical_col_EQ_activity 1 0 1 1 0 0.0 0.0 0.0 1.0 0.0 0.0 NaN NaN 1.0 0.0 1
3 my_categorical_col_EQ_user 1 0 1 1 0 0.0 0.0 0.0 1.0 0.0 0.0 NaN NaN 1.0 0.0 1
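Several of the query-summary counters follow mechanically from the per-row labels and weights. A small stand-in sketch in plain Python (the labels are assumed, and the counting semantics here — e.g. treating label 0 as negative — are an assumption, not documented TQL behavior):

```python
# Five rows, all with label 0.0 and weight 1.0, mirroring the example query
rows = [(0.0, 1.0)] * 5

# Assumed definitions: positive = label > 0, negative = label == 0
positive_example_count = sum(1 for label, _ in rows if label > 0)
negative_example_count = sum(1 for label, _ in rows if label == 0)
label_sum = sum(label for label, _ in rows)
weighted_label_sum = sum(label * weight for label, weight in rows)

print(positive_example_count, negative_example_count, label_sum, weighted_label_sum)
```

This reproduces the summary above: 0 positive examples, 5 negative examples, and zero label sums.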