Query Options
The following options are available:
max_columns: the maximum number of columns to return; the top N columns are chosen by their count of non-null values.
global_min_total_count: require that at least this many rows contain a non-null value, or drop the column.
apply_feature_filters: turn feature filtering on or off. (default true)
apply_charset_filter: turn cleaning of the values of numerical and categorical feature columns on or off. (default true)
drop_empty_rows: if on, remove rows that have no non-null values. (default false)
expand_numerical_features: expand numerical feature arrays into multiple columns. (default false)
drop_numerical_zero_features: drop numerical feature columns that contain all zeros. (default false)
drop_constant_feature_columns: drop numerical feature columns that are constant (default false)
throw_expression_errors: use “fail fast” behavior with invalid expressions (default false)
debug_expressions: return extended debugging information about TQL expression evaluation with the result set.
fill_na: replace every non-numeric value in a numerical feature column with 0.0.
numerical_feature_precision: the number of decimal places to return for numerical feature values.
numerical_feature_epsilon: values with abs(val) < eps are rounded to zero.
fix_column_names: specify if backend should rename duplicate column names. (default true)
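The column-selection rules described for max_columns and global_min_total_count can be sketched in plain Python. This is only an illustration of the semantics stated above, not the TQL backend: the function name filter_columns, the dict-of-lists table layout, the use of None for null, and the tie-breaking rule (ties keep their original order) are all assumptions made for the example.

```python
from typing import Dict, List, Optional

Column = List[Optional[float]]


def non_null_count(values: Column) -> int:
    """Count the entries that are not None (None stands in for null here)."""
    return sum(1 for v in values if v is not None)


def filter_columns(
    columns: Dict[str, Column],
    max_columns: Optional[int] = None,
    global_min_total_count: int = 0,
) -> Dict[str, Column]:
    """Hypothetical helper mirroring the two options described above."""
    counts = {name: non_null_count(vals) for name, vals in columns.items()}

    # global_min_total_count: drop columns with too few non-null rows.
    kept = [name for name in columns if counts[name] >= global_min_total_count]

    if max_columns is not None:
        # max_columns: keep the N columns with the most non-null values.
        # sorted() is stable, so ties keep their original order (an assumption).
        top = set(sorted(kept, key=lambda n: counts[n], reverse=True)[:max_columns])
        kept = [name for name in kept if name in top]

    return {name: columns[name] for name in kept}


table = {
    "a": [1.0, 2.0, 3.0],     # 3 non-null values
    "b": [None, None, 5.0],   # 1 non-null value
    "c": [1.0, None, 2.0],    # 2 non-null values
    "d": [None, None, None],  # 0 non-null values
}

result = filter_columns(table, max_columns=2, global_min_total_count=1)
print(list(result))  # ['a', 'c']
```

Column "d" falls below the minimum count, and "b" is cut because only the two most populated columns are kept.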
from zeenk.tql import *
The drop_constant_feature_columns Option
With this option enabled, feature columns that hold the same value in every row are dropped from the result.
# constant feature columns will be dropped
select(
    numerical(1.01),  # constant numerical feature columns will be dropped
    categorical('"blue"'),  # constant categorical feature columns will be dropped
    categorical('if(random() < .5, "cat", "dog")'),  # non-constant columns will NOT be dropped
)\
.from_events('lethe4')\
.options(drop_constant_feature_columns=1)\
.limit(3)
Query results:
partition "_default"
0 |
---|
1 |
2 |
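As a rough illustration of what "constant" means here, the following plain-Python sketch drops every column whose values are all identical. It is not the TQL implementation: the helper name drop_constant_columns, the dict-of-lists layout, and the column names are hypothetical, and treating an empty column as constant is an assumption.

```python
from typing import Dict, Hashable, List


def drop_constant_columns(columns: Dict[str, List[Hashable]]) -> Dict[str, List[Hashable]]:
    """Keep only columns that hold more than one distinct value.

    A column with zero or one distinct value is treated as constant
    (the empty-column case is an assumption of this sketch).
    """
    return {
        name: values
        for name, values in columns.items()
        if len(set(values)) > 1
    }


table = {
    "_c0": [1.01, 1.01, 1.01],        # constant numerical column
    "_c1": ["blue", "blue", "blue"],  # constant categorical column
    "_c2": ["cat", "dog", "cat"],     # varies, so it is kept
}

remaining = drop_constant_columns(table)
print(list(remaining))  # ['_c2']
```

Note that a column built from a random expression can still come out constant on a small sample, in which case this sketch would drop it as well.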
The numerical_feature_epsilon Option
Numerical features, labels, and weights whose absolute value is less than epsilon will be rounded to zero. Categorical and metadata columns are left unchanged.
select(
    label(.01),  # epsilon will be applied to label columns
    weight(.01),  # epsilon will be applied to weight columns
    numerical(.01),  # epsilon will be applied to numerical feature columns
    numerical('"foo:.01"'),  # epsilon will be applied to mixed numerical feature columns
    categorical(.01),  # epsilon will NOT be applied here since the column type is CATEGORICAL
    .01  # epsilon will NOT be applied here since the column type is METADATA
)\
.from_events('lethe4')\
.options(numerical_feature_epsilon=1)\
.limit(3)
Query results:
partition "_default"
 | _label | _weight | _c2 | _c3 | _c4 | _c5 |
---|---|---|---|---|---|---|
0 | 0.0 | 0.0 | 0 | foo:0 | 0.01 | 0.01 |
1 | 0.0 | 0.0 | 0 | foo:0 | 0.01 | 0.01 |
2 | 0.0 | 0.0 | 0 | foo:0 | 0.01 | 0.01 |
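The rounding rule itself is small enough to sketch in plain Python. The code below is an illustration under assumptions, not the TQL backend: the helper names are hypothetical, the type names LABEL, WEIGHT, and NUMERICAL are assumed by analogy with the CATEGORICAL and METADATA types mentioned in the comments above, and mixed values such as "foo:.01" are not handled.

```python
from typing import List, Union

Value = Union[float, str]

# Column types that receive epsilon rounding in this sketch (assumed names).
EPSILON_TYPES = {"LABEL", "WEIGHT", "NUMERICAL"}


def apply_epsilon(value: float, epsilon: float) -> float:
    """Return 0.0 when abs(value) < epsilon, otherwise the value unchanged."""
    return 0.0 if abs(value) < epsilon else value


def apply_epsilon_to_row(row: List[Value], column_types: List[str], epsilon: float) -> List[Value]:
    """Round only the cells whose column type is in EPSILON_TYPES."""
    return [
        apply_epsilon(cell, epsilon) if col_type in EPSILON_TYPES else cell
        for cell, col_type in zip(row, column_types)
    ]


types = ["LABEL", "WEIGHT", "NUMERICAL", "CATEGORICAL", "METADATA"]
row = [0.01, 0.01, 0.01, 0.01, 0.01]

print(apply_epsilon_to_row(row, types, epsilon=1.0))
# [0.0, 0.0, 0.0, 0.01, 0.01]
```

The comparison is strict, so a value exactly equal to epsilon is kept, and negative values close to zero are rounded to zero as well.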