Query Options

The following options are available:

  • max_columns: the maximum number of columns to return; the top N columns are chosen by their count of non-null values.

  • global_min_total_count: require that at least this many rows contain a non-null value, or drop the column.

  • apply_feature_filters: turn feature filtering on or off (default true)

  • apply_charset_filter: turn cleaning of the values in numerical and categorical feature columns on or off (default true)

  • drop_empty_rows: if on, remove rows that have no non-null values. (default false)

  • expand_numerical_features: expand numerical feature arrays into multiple columns. (default false)

  • drop_numerical_zero_features: drop numerical feature columns that contain all zeros. (default false)

  • drop_constant_feature_columns: drop feature columns (numerical or categorical) that are constant. (default false)

  • throw_expression_errors: use “fail fast” behavior with invalid expressions (default false)

  • debug_expressions: return extended debugging information about TQL expression evaluation with the result set.

  • fill_na: replace non-numeric values in numerical features with 0.0.

  • numerical_feature_precision: how many decimal places to return.

  • numerical_feature_epsilon: values with abs(val) < eps are rounded to zero.

  • fix_column_names: specify if backend should rename duplicate column names. (default true)
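
The column-count options above can be pictured with a small plain-Python sketch. It is only an illustration of the described behavior, not the TQL backend: the function name filter_columns, the dict-of-lists data layout, and the tie-breaking rule (ties keep their original order) are all assumptions made for this example.

```python
from typing import Dict, List, Optional


def non_null_count(values: List[Optional[object]]) -> int:
    """Count the entries in a column that are not None."""
    return sum(v is not None for v in values)


def filter_columns(
    columns: Dict[str, List[Optional[object]]],
    max_columns: Optional[int] = None,
    global_min_total_count: Optional[int] = None,
) -> Dict[str, List[Optional[object]]]:
    """Hypothetical helper mimicking max_columns / global_min_total_count."""
    counts = {name: non_null_count(vals) for name, vals in columns.items()}
    kept = list(columns)

    # Drop columns that have fewer non-null values than the minimum.
    if global_min_total_count is not None:
        kept = [n for n in kept if counts[n] >= global_min_total_count]

    # Keep only the top N columns ranked by non-null count.
    # sorted() is stable, so ties keep their original order (an assumption).
    if max_columns is not None:
        top = set(sorted(kept, key=lambda n: -counts[n])[:max_columns])
        kept = [n for n in kept if n in top]

    return {n: columns[n] for n in kept}


example = {
    "a": [1, None, None],
    "b": [1, 2, 3],
    "c": [1, 2, None],
}
print(list(filter_columns(example, max_columns=2)))
print(list(filter_columns(example, global_min_total_count=3)))
```

In this sketch the minimum-count filter runs before the top-N cut; the order in which the real backend applies the two options is not stated in this document.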

from zeenk.tql import *

The drop_constant_feature_columns Option

When this option is enabled, constant feature columns are dropped from the results.

# constant feature columns will be dropped
select(
  numerical(1.01), # constant numerical feature columns will be dropped
  categorical('"blue"'), # constant categorical feature columns will be dropped
  categorical('if(random() < .5, "cat", "dog")'), # non-constant columns will NOT be dropped
)\
.from_events('lethe4')\
.options(drop_constant_feature_columns=1)\
.limit(3)

Query results:

partition "_default"
0
1
2
query produced 3 rows x 0 columns in 0.46 seconds
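
As a rough picture of what "constant" means here, the sketch below drops every column that has a single distinct value. It is plain Python, not TQL: the helper name drop_constant_columns and the dict-of-lists layout are assumptions for illustration, and the real backend may define constancy differently (for example with respect to nulls).

```python
from typing import Dict, Hashable, List


def drop_constant_columns(
    columns: Dict[str, List[Hashable]],
) -> Dict[str, List[Hashable]]:
    """Hypothetical helper: keep only columns with more than one distinct value."""
    return {
        name: values
        for name, values in columns.items()
        if len(set(values)) > 1
    }


table = {
    "num": [1.01, 1.01, 1.01],       # constant numerical column
    "color": ["blue", "blue", "blue"],  # constant categorical column
    "pet": ["cat", "dog", "cat"],    # varies, so it is kept
}
print(list(drop_constant_columns(table)))
```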

The numerical_feature_epsilon Option

Numerical features, labels, and weights whose absolute value is less than epsilon will be rounded to zero.

select(
  label(.01), # epsilon will be applied to label columns
  weight(.01), # epsilon will be applied to weight columns
  numerical(.01), # epsilon will be applied to numerical feature columns
  numerical('"foo:.01"'), # epsilon will be applied to mixed numerical feature columns
  categorical(.01), # epsilon will NOT be applied here since the column type is CATEGORICAL
  .01 # epsilon will NOT be applied here since the column type is METADATA
)\
.from_events('lethe4')\
.options(numerical_feature_epsilon=1)\
.limit(3)

Query results:

partition "_default"
   _label  _weight  _c2  _c3    _c4   _c5
0  0.0     0.0      0    foo:0  0.01  0.01
1  0.0     0.0      0    foo:0  0.01  0.01
2  0.0     0.0      0    foo:0  0.01  0.01
query produced 3 rows x 6 columns in 0.48 seconds
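
The rounding rule, abs(val) < eps, can be sketched in plain Python as follows. This is an illustration only, not the TQL implementation: the helper names round_to_zero and round_named_feature are hypothetical, and the "name:value" handling is inferred from the foo:0 cells in the results above.

```python
def round_to_zero(value: float, eps: float) -> float:
    """Return 0.0 when the magnitude of value is below eps, else value."""
    return 0.0 if abs(value) < eps else value


def round_named_feature(feature: str, eps: float) -> str:
    """Apply the same rule to a 'name:value' string (assumed format)."""
    name, sep, raw = feature.rpartition(":")
    if not sep:
        # No 'name:' prefix, so leave the string untouched.
        return feature
    return f"{name}:0" if abs(float(raw)) < eps else feature


print(round_to_zero(0.01, 1))
print(round_to_zero(2.5, 1))
print(round_named_feature("foo:.01", 1))
```

Because the comparison uses the absolute value, small negative values are also mapped to zero in this sketch.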