causmos package

causmos.categorical(expr, name: Optional[str] = None, filters: Optional[dict] = None) → zeenk.tql.column.FeatureColumn

Creates a categorical feature column from a TQL expression, optionally providing a name and filters.

Parameters

expr – A TQL expression string
name (str) – Optionally provide a name for the column. If not provided, a default name will be assigned
filters (dict) – Optionally provide a dictionary of column filter conditions for categorical or numerical filters

Returns

A FeatureColumn object to be provided to select(...)

Return type

FeatureColumn

causmos.causmos(project_id: Optional[int] = None, name: Optional[str] = None, author: Optional[str] = None)

Initialize a Causmos model. See the Causmos class for more detailed documentation.

Parameters

project_id (int) – Use timelines from this project_id.
name (str) – Name of the Causmos model that will be published.
author (str) – Author of the Causmos model.

Returns

A Causmos model configuration object.

causmos.col(c: any, name: Optional[str] = None, type: Optional[str] = None, filters: Optional[dict] = None)

Creates a column from a TQL expression or given input, optionally providing name, type, and filters. The first argument can be a variety of formats, including:

objects of type Column or any of its subclasses

dictionary containing keys for name, expression, and type

raw strings, which will be interpreted as TQL expressions.

The created column is by default unnamed, and will be assigned a name if used in a TQL select(...) statement. The default type of columns is ‘METADATA’.

Parameters

c (any) – A TQL expression string, column object, dictionary, list, tuple, or FeatureColumn object
name (str) – Optionally provide a name for the column. If not provided, a default name will be assigned
type (str) – Optionally provide a type for the column. If not provided, the column type will be METADATA
filters (dict) – Optionally provide a dictionary of column filter conditions for categorical or numerical filters

Returns

A Column or FeatureColumn object to be provided to select(...)

causmos.constant()

Used in select(…) statements to select the FeatureColumn “1.0” with name “constant”

Returns: A tuple with the FeatureColumn “1.0” with name “constant”
Return type: tuple

causmos.create_project(name: str, or_update: bool = False) → zeenk.tql.timelines.Project

Creates a new project. This project will require further configuration before timelines can be created. If the project already exists and you wish to update it, use load_project(name), or pass or_update=True here.

Parameters

name (str) – The name of the Project
or_update (bool) – Whether to update the Project if it already exists

Returns

The Project that has just been created

Return type

Project

causmos.create_timeseries(name: str) → zeenk.tql.timeseries.TimeSeries

Creates a TimeSeries with the given name

Parameters: name (str) – The name of the TimeSeries
Returns: A new TimeSeries with the given name
Return type: TimeSeries

causmos.debugger(project, expression: str = '', theme: str = 'light')

Creates an expression debugger widget. A default row limit of 10 is applied automatically. Works only in Jupyter Notebook or JupyterLab.

Parameters

project – The Project ID or name to run the expression against
expression (str) – The initial value of the expression, if any
theme (str) – The editor theme, ‘light’ or ‘dark’

Returns

A Jupyter Notebook/Lab widget

causmos.delete_udf(project_id: int, function_name: str)

Deletes a UDF for a specific Project by function name

Parameters

project_id (int) – ID of the Project to delete the UDF from
function_name (str) – Name of the UDF to be deleted

causmos.describe_project(project_identifier, fail_if_not_found: bool = True) → zeenk.tql.timelines.Project

Loads the specified project by name or ID. If the project is not found and fail_if_not_found is True, throws TQLAnalysisException.

Parameters

project_identifier – The name (str) or ID (int) of the Project
fail_if_not_found (bool) – Whether to throw an exception if the Project is not found

Returns

The Project with the specified name or ID

Return type

Project

causmos.drop_project(name_or_id, if_exists: bool = False)

Deletes the project by name or ID

Parameters

name_or_id – The name (str) or ID (int) of the Project
if_exists (bool) – Only drop the Project if it exists

causmos.event_metadata() → tuple

To be used in select(…) statements for returning timeline.id, id, datetime, and type

Returns: A tuple of Columns
Return type: tuple

causmos.event_time() → tuple

To be used in select(…) statements for returning timestamp and duration

Returns: A tuple of Columns
Return type: tuple

causmos.generate_events(sample_generator_expression: str, functions: Optional[list] = None, variables: Optional[dict] = None, attribute_expressions: Optional[dict] = None, inherit_attributes: bool = False) → zeenk.tql.sampling.SamplingConf

Manual sampling is similar to importance sampling but allows the user to manually override parameters such as the list of sample events (e.g., timestamps at which sampling occurs) and other properties of each generated sample event. See generate_importance_events() for an advanced use case.

causmos.generate_importance_events(num_samples: float = -1.0, min_ts: Optional[str] = None, max_ts: Optional[str] = None, time_shift_factor: float = 0.0, fraction_uniform_time: float = 0.0, sampling_distribution: str = 'exponential', sampling_events_expression: str = 'FILTER(timeline.events, (x) -> x.type=null)', sampling_kernels: str = '5m,15m,1h,4h,1d,3d,7d') → zeenk.tql.sampling.SamplingConf

Importance sampling can be used to keep modeling unbiased (avoiding selection bias when sampling records) while still increasing the number of records where the modeling is most interesting. For example, when modeling the causal effect of a treatment on an outcome, we would like most of our records (whether ‘positive’, an outcome event, or ‘negative’, a non-outcome sampling event) to be in the vicinity of a treatment opportunity or an outcome. Doing so increases the model’s statistical power to decipher the relationship between the two. In contrast, if most outcomes happen during only 10% of the sample time period and we sample uniformly, most of our observations will fall in the “boring” portion of the timeline when no events of interest are occurring.

generate_importance_events helps you configure what time periods are “interesting.” You configure how many records to randomly sample for each timeline, which timestamps or events you want to increase your sampling around, the distribution (shape and scale) around each event from which you would like to randomly sample, and the probability you would like to draw from the background uniform distribution (e.g., a random point in the timeline).

Put simply, one way to generate negative (non-outcome) records would be to draw uniformly between the start and end of the timeline’s observation window. We can improve on that by instructing the extractor to generate these negatives with a configurable, time-importance-weighted sampling methodology around the times of timeline events.
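To make the idea concrete, the following is a minimal, self-contained Python sketch of importance-weighted sampling of timestamps. It is not the Causmos implementation, and importance_sample_times, its arguments, and the weighting scheme are illustrative assumptions: it draws times from a mixture of a uniform background (probability fraction_uniform) and forward-decaying exponential kernels at each event, then attaches self-normalized weights (uniform density divided by proposal density). This is one standard way to concentrate samples near events while keeping weighted averages unbiased relative to uniform sampling; whether Causmos reweights records this way is not stated here.

```python
import math
import random


def importance_sample_times(event_times, min_ts, max_ts, num_samples,
                            scales, fraction_uniform=0.1, seed=0):
    """Hypothetical sketch, not part of causmos.

    Returns (timestamp, weight) pairs. Times come from a mixture of a uniform
    background over [min_ts, max_ts] and exponential kernels starting at each
    event; weights are uniform_density / proposal_density, rescaled to mean 1
    (self-normalized importance sampling).
    """
    if not event_times:
        fraction_uniform = 1.0  # nothing to concentrate around
    span = float(max_ts - min_ts)
    n_kernels = len(event_times) * len(scales)

    def proposal_density(t):
        # Uniform background component.
        dens = fraction_uniform / span
        # Each (event, scale) kernel is chosen with probability 1 / n_kernels.
        for e in event_times:
            for s in scales:
                if t >= e:
                    dens += (1.0 - fraction_uniform) / n_kernels * math.exp(-(t - e) / s) / s
        return dens

    rng = random.Random(seed)
    times = []
    while len(times) < num_samples:
        if rng.random() < fraction_uniform:
            t = rng.uniform(min_ts, max_ts)
        else:
            t = rng.choice(event_times) + rng.expovariate(1.0 / rng.choice(scales))
        if min_ts <= t <= max_ts:  # reject draws outside the observation window
            times.append(t)

    # Self-normalizing cancels the constant lost to rejection.
    raw = [(1.0 / span) / proposal_density(t) for t in times]
    mean_w = sum(raw) / len(raw)
    return [(t, w / mean_w) for t, w in zip(times, raw)]


samples = importance_sample_times([100.0, 5000.0], 0.0, 10000.0, 500,
                                  scales=[60.0, 600.0], fraction_uniform=0.2)
```

Samples that land right after an event receive small weights and samples in quiet stretches receive large ones, which is what lets the denser sampling near events coexist with unbiased averages.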

causmos.get_project()

causmos.get_udf(project_id: int, function_name: str) → zeenk.tql.udf.UDF

Retrieve a specific UDF from a Project

Parameters

project_id (int) – ID of the Project to retrieve the UDF from
function_name (str) – Name of the UDF to retrieve

Returns

The requested UDF; throws a TQLAnalysisNotFound exception if it does not exist

Return type

UDF

causmos.intervention(where, value='1', name=None, project_id=None, generate=None, dedup=None, dynamics=None, attributes=None, vars: Optional[dict] = None)

Creates an intervention event group from a TQL expression. EventGroup objects also have a number of additional parameters that can be customized for modeling.

Parameters

where (str) – a TQL expression string which subsets the timeline to this event group.
value (str) – a TQL expression string with the return value of the event for modeling.
name (str) – name of the event group. If not provided a default name will be assigned.
project_id (str) – project id from which to pull the timelines for modeling.
generate (str) – a SamplingConf object or simple expression that can be coerced to generate events.
dedup (dict) – for now, a dictionary with several keys to configure deduplication (future: a DedupEvents object):
    all (bool): use all events, or limit to the first event matching the “where” expression.
    zero_value (bool): skip any events whose ‘value’ expression is zero.
    where (str): skip any events that do not satisfy this expression for deduplication.
    window (str): skip any events that have a non-skipped event within ‘window’ milliseconds prior.
    Default: {'all': False, 'zero_value': False, 'where': "TRUE", 'window': "0"}
    Example: {'all': True, 'zero_value': True, 'where': "TRUE", 'window': "7 * 24 * 3600 * 1000"} (a 7-day window)
dynamics (dict) – for now, a dictionary with ‘scale’, ‘shape’, ‘filters’, and ‘epsilon’ (future: a DynamicEvents object):
    scale (str): the time scales over which an event may have predictive power or causal effects (default ‘5m,1h,4h,1d,3d,7d’).
    shape (str): the decay function shape (default ‘exponential’).
    filters (dict): a default feature filters configuration dictionary.
    epsilon (float): the threshold value at which SUM_OPPORTUNITIES() rounds to zero.
    Example: {'scale': '5m,1h,4h,1d,3d,7d', 'shape': 'exponential', 'filters': {'min_total_count': 200}, 'epsilon': 1e-9}
attributes (list) – a list of expressions describing each event in the group for modeling heterogeneity. If columns with feature filter configurations are provided, the filter configurations are preserved when applying dynamic filters.
vars (dict) – a dictionary keyed with any additional ‘global_vars’, ‘timeline_vars’, or ‘event_vars’ required to evaluate any expression in the event group. See also Query.vars().

Returns

an EventGroup object
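The dedup keys above interact in a specific order, which is easier to see in code. The sketch below is a hypothetical plain-Python reading of those rules, not the TQL implementation: dedup_events and its arguments are illustrative, events are assumed to already match the event group’s “where” expression, the window comparison is assumed to be strict (so the default window of 0 removes nothing), and all=False is assumed to keep the first event that survives the other filters.

```python
def dedup_events(events, use_all=False, zero_value=False,
                 where=lambda ev: True, window_ms=0):
    """Hypothetical sketch of the documented dedup rules.

    events: list of dicts with 'timestamp' (epoch milliseconds) and 'value'.
    """
    kept = []
    last_kept_ts = None
    for ev in sorted(events, key=lambda e: e["timestamp"]):
        if zero_value and ev["value"] == 0:
            continue  # zero_value: skip events whose value is zero
        if not where(ev):
            continue  # dedup 'where': skip events failing the expression
        if last_kept_ts is not None and ev["timestamp"] - last_kept_ts < window_ms:
            continue  # window: a kept event occurred less than window_ms earlier
        kept.append(ev)
        last_kept_ts = ev["timestamp"]
        if not use_all:
            break  # all=False: keep only the first surviving event
    return kept


# The example 'window' expression from the docs: 7 days in milliseconds.
WEEK_MS = 7 * 24 * 3600 * 1000  # 604800000
DAY_MS = 24 * 3600 * 1000
```

For example, with events at day 0, day 1, and day 8, a 7-day window keeps days 0 and 8 and drops day 1.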

causmos.kernelize(features, name: Optional[str] = None, treatment_filter_event: Optional[str] = None, event_group_where_event: Optional[str] = None, intended_action: Optional[list] = None, actual_action: Optional[list] = None, treatment_model: Optional[str] = None, kernel_types: Optional[list] = None, opportunity_conf: Optional[zeenk.tql.opportunity_conf.OpportunityConf] = None, kernel_parameters: Optional[str] = None, kernel_distribution: Optional[str] = None, filters: Optional[dict] = None, parse_kv: bool = False)

Takes a list of opportunity-level features based on user-, opportunity-, treatment-, or proxy-outcome-level fields and creates transformed features with SUM_OPPORTUNITIES(), accumulating the effects across each opportunity. These “kernelized” features are for use in a Causmos timeline-based causal model in which each observation record in a dataset is an outcome or a potential outcome (e.g., a moment in time when an outcome could have occurred). For example, dataset generation can be powered by a treatment-propensity model to reduce potential bias when treatment and the outcome are correlated. Apply this function to each list of features to be kernelized. There are several types of transformations: KD (‘treatment’), BKD (‘baseline’), GKD (‘ghost’), and NKD (‘nonrandom’).

Parameters

features (list) – List of col() objects to be kernelized (KD, BKD, GKD, NKD). Examples: ["1.0", "IF(1.0,'L','R')"], [{"name":"constant", "expression":"1.0", "type":"NUMERICAL"}]. Columns can be “CATEGORICAL” or “NUMERICAL”. Each categorical expression creates an expansion of features; each numerical expression multiplies/reweights the opportunity’s contribution to the SUM_OPPORTUNITIES() sum.
name (str) – Base string for the feature sets. The kernel type is appended, and “w” is prepended to denote non-incrementality features for BKD, GKD, and NKD. If name is '' or not provided (the default), the list of features is returned with the default prefixes: AKD_..., wBKD_....
treatment_filter_event (str) – Column that defines whether a treatment opportunity resulted in treatment. For example, was the advertiser’s impression shown after bidding? In a sample dataset, request.impression is the relevant field.
event_group_where_event (str) – A second where condition within the passed EventGroup (currently set from the output of filtering such as deduplication; this should be made more explicit).
intended_action (list) – List of numerical/categorical expressions defining the intended/optimal decisions, e.g., bid_amount or eligible_treatment_groups.
actual_action (list) – List of numerical/categorical expressions defining the actual action/decision taken, e.g., IF(ghost_bid, 0, bid_amount) or assigned_treatment_group.
treatment_model (str) – String for the deployed/published treatment prediction model (GKD and NKD only). In practice, this will be the win-rate model based on the leaves.
kernel_types (list) – List of kernel types to include in the feature set; a subset of ['KD','BKD','GKD','NKD'].
opportunity_conf – An OpportunityConf() object that contains the defaults for opportunity_filter, kernel_parameters, and kernel_distribution.
kernel_parameters (str) – Calibration for kernel.days in the kernel parameters. Arrays of kernel features are created with suffixes that combine a number with a unit: seconds (s), minutes (m), hours (h), or days (d). For example, with '15m,4h,3d', 'name_feature' becomes 'name_feature-15m', 'name_feature-4h', and 'name_feature-3d'.
kernel_distribution (str) – Positive-support distributions (short-form abbreviations): exponential (e, exp), uniform (u, unif), triangular (t, tri), halfnormal (h, hnorm), and halflogistic (l, hlog). Positive- and negative-support distributions are the symmetric analogs of the positive-support distributions: laplace (a, lap), rectangular (r, rect), symmetrictriangular (s, stri), normal (n, norm), and logistic (o, log). A time-independent constant kernel (c, const) is also useful for cases such as static models, where all opportunities and treatments are accumulated for each outcome.
filters (dict) – Column filters to apply to all of the features.
parse_kv (bool) – Try to parse ‘NUMERICAL’ input features as ‘k:v’ key-value pairs. This adds overhead, but supports more complex mixed ‘categorical:numerical’ input features.

Returns

A list of feature sets for each kernel_type, each containing a list of kernelized features.

Return type

list
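The suffix naming rule described for kernel_parameters can be sketched in plain Python. expand_kernel_features is a hypothetical helper, not part of causmos; it assumes each comma-separated token is a number followed by exactly one of the documented units (s, m, h, d) and returns each suffixed feature name together with its time scale in seconds.

```python
import re

# Seconds per documented unit suffix.
_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}
_TOKEN = re.compile(r"^(\d+(?:\.\d+)?)([smhd])$")


def expand_kernel_features(feature_name, kernel_parameters):
    """Hypothetical sketch: 'f' with '15m,4h' -> [('f-15m', 900.0), ('f-4h', 14400.0)]."""
    pairs = []
    for token in kernel_parameters.split(","):
        token = token.strip()
        match = _TOKEN.match(token)
        if match is None:
            raise ValueError(f"unrecognized kernel parameter: {token!r}")
        number, unit = match.groups()
        pairs.append((f"{feature_name}-{token}", float(number) * _SECONDS[unit]))
    return pairs


features = expand_kernel_features("name_feature", "15m,4h,3d")
```

Here features holds one entry per kernel scale, named 'name_feature-15m', 'name_feature-4h', and 'name_feature-3d', with scales of 900, 14400, and 259200 seconds.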

causmos.label(expr, name: str = '_label') → zeenk.tql.column.Column

Creates a label column from a TQL expression. The expression provided to label() is expected to return a numeric value for all rows. If a row returns a non-numeric, NaN, or infinite value, that value is replaced with the default label value of 0.0. Label columns are automatically named “_label”. A dataset is expected to have at most one label column.

Parameters

expr – A TQL expression string
name (str) – Optionally provide a name for the column. Defaults to “_label”

Returns

A Column object to be provided to select(...)

Return type

Column

causmos.list_udfs(project_id: int) → zeenk.tql.udf.UDF

Retrieves all UDFs defined for a Project

Parameters: project_id (int) – ID of the Project to retrieve UDFs from
Returns: All existing UDFs for the Project
Return type: UDF

causmos.load_project(project_identifier, fail_if_not_found: bool = True) → zeenk.tql.timelines.Project

Loads the specified project by name or ID. If the project is not found and fail_if_not_found is True, throws TQLAnalysisException.

Parameters

project_identifier – The name (str) or ID (int) of the Project
fail_if_not_found (bool) – Whether to throw an exception if the Project is not found

Returns

The Project with the specified name or ID

Return type

Project

causmos.load_query(id: int) → zeenk.tql.query.Query

Loads the query from the given ID

Parameters: id (int) – The ID of the query
Returns: A new Query instance
Return type: Query

causmos.load_resultset(id: int) → zeenk.tql.resultset.ResultSet

Loads the ResultSet with the given ID

Parameters: id (int) – The ID of the ResultSet to be loaded
Returns: The ResultSet
Return type: ResultSet

causmos.metadata(expr, name: Optional[str] = None) → zeenk.tql.column.Column

Creates a metadata column from a TQL expression. Metadata columns return the expression values “as is”, meaning they are not post-processed with charset filtering or expansion of numerical columns. Metadata columns are also not subject to column filters. Metadata columns are the default column type and are often the correct choice for arbitrary datasets that are not specifically intended to be consumed by an ML training package.

Parameters

expr – A TQL expression string
name (str) – Optionally provide a name for the column. If not provided, a default name will be assigned

Returns

A Column object to be provided to select(...)

Return type

Column

causmos.numerical(expr, name: Optional[str] = None, filters: Optional[dict] = None) → zeenk.tql.column.FeatureColumn

Creates a numerical feature column from a TQL expression, optionally providing a name and filters.

Parameters

expr – A TQL expression string
name (str) – Optionally provide a name for the column. If not provided, a default name will be assigned
filters (dict) – Optionally provide a dictionary of column filter conditions for categorical or numerical filters

Returns

A FeatureColumn object to be provided to select(...)

Return type

FeatureColumn

causmos.opportunity(where: str, value: str = '1', name: Optional[str] = None, project_id: Optional[str] = None, generate=None, dedup: Optional[dict] = None, dynamics: Optional[dict] = None, attributes: Optional[list] = None, vars: Optional[dict] = None)

Creates an opportunity event group from a TQL expression. EventGroup objects also have a number of additional parameters that can be customized for modeling.

Parameters

where (str) – a TQL expression string which subsets the timeline to this event group.
value (str) – a TQL expression string with the return value of the event for modeling.
name (str) – name of the event group. If not provided a default name will be assigned.
project_id (str) – project id from which to pull the timelines for modeling.
generate (str) – a SamplingConf object or simple expression that can be coerced to generate events.
dedup (dict) – for now, a dictionary with several keys to configure deduplication (future: a DedupEvents object):
    all (bool): use all events, or limit to the first event matching the “where” expression.
    zero_value (bool): skip any events whose ‘value’ expression is zero.
    where (str): skip any events that do not satisfy this expression for deduplication.
    window (str): skip any events that have a non-skipped event within ‘window’ milliseconds prior.
    Default: {'all': False, 'zero_value': False, 'where': "TRUE", 'window': "0"}
    Example: {'all': True, 'zero_value': True, 'where': "TRUE", 'window': "7 * 24 * 3600 * 1000"} (a 7-day window)
dynamics (dict) – for now, a dictionary with ‘scale’, ‘shape’, ‘filters’, and ‘epsilon’ (future: a DynamicEvents object):
    scale (str): the time scales over which an event may have predictive power or causal effects (default ‘5m,1h,4h,1d,3d,7d’).
    shape (str): the decay function shape (default ‘exponential’).
    filters (dict): a default feature filters configuration dictionary.
    epsilon (float): the threshold value at which SUM_OPPORTUNITIES() rounds to zero.
    Example: {'scale': '5m,1h,4h,1d,3d,7d', 'shape': 'exponential', 'filters': {'min_total_count': 200}, 'epsilon': 1e-9}
attributes (list) – a list of expressions describing each event in the group for modeling heterogeneity. If columns with feature filter configurations are provided, the filter configurations are preserved when applying dynamic filters.
vars (dict) – a dictionary keyed with any additional ‘global_vars’, ‘timeline_vars’, or ‘event_vars’ required to evaluate any expression in the event group. See also Query.vars().

Returns

an EventGroup object
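The dynamics settings (an exponential decay shape, a list of time scales, and an epsilon below which SUM_OPPORTUNITIES() rounds to zero) suggest a time-decayed accumulation of prior opportunities. The sketch below illustrates that idea in plain Python and is not TQL’s SUM_OPPORTUNITIES(): decayed_opportunity_sum and multi_scale_features are hypothetical helpers, timestamps are assumed to be epoch milliseconds, and applying epsilon to the total (rather than per opportunity) is an assumption.

```python
import math

HOUR_MS = 3_600_000  # '1h' in milliseconds
MINUTE_MS = 60_000   # '1m' in milliseconds


def decayed_opportunity_sum(outcome_ts, opportunity_ts, scale_ms, epsilon=1e-9):
    """Hypothetical sketch: exponentially decayed count of prior opportunities.

    Each opportunity at or before outcome_ts contributes exp(-elapsed / scale_ms);
    a total below epsilon is rounded to zero (assumed reading of 'epsilon').
    """
    total = sum(
        math.exp(-(outcome_ts - t) / scale_ms)
        for t in opportunity_ts
        if t <= outcome_ts  # later opportunities cannot affect this outcome
    )
    return 0.0 if total < epsilon else total


def multi_scale_features(outcome_ts, opportunity_ts, scales_ms, epsilon=1e-9):
    """One decayed sum per time scale, e.g. for a scale list such as '5m,1h'."""
    return [decayed_opportunity_sum(outcome_ts, opportunity_ts, s, epsilon)
            for s in scales_ms]


feats = multi_scale_features(HOUR_MS, [0, HOUR_MS], [5 * MINUTE_MS, HOUR_MS])
```

Short scales respond mostly to very recent opportunities, while long scales still credit older ones, which is why a list of several scales is configured rather than a single one.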

causmos.outcome(where, value='1', name=None, project_id=None, generate=None, dedup=None, dynamics=None, attributes=None, vars: Optional[dict] = None)

Creates an outcome event group from a TQL expression. EventGroup objects also have a number of additional parameters that can be customized for modeling.

Parameters

where (str) – a TQL expression string which subsets the timeline to this event group.
value (str) – a TQL expression string with the return value of the event for modeling.
name (str) – name of the event group. If not provided a default name will be assigned.
project_id (str) – project id from which to pull the timelines for modeling.
generate (str) – a SamplingConf object or simple expression that can be coerced to generate events.
dedup (dict) – for now, a dictionary with several keys to configure deduplication (future: a DedupEvents object):
    all (bool): use all events, or limit to the first event matching the “where” expression.
    zero_value (bool): skip any events whose ‘value’ expression is zero.
    where (str): skip any events that do not satisfy this expression for deduplication.
    window (str): skip any events that have a non-skipped event within ‘window’ milliseconds prior.
    Default: {'all': False, 'zero_value': False, 'where': "TRUE", 'window': "0"}
    Example: {'all': True, 'zero_value': True, 'where': "TRUE", 'window': "7 * 24 * 3600 * 1000"} (a 7-day window)
dynamics (dict) – for now, a dictionary with ‘scale’, ‘shape’, ‘filters’, and ‘epsilon’ (future: a DynamicEvents object):
    scale (str): the time scales over which an event may have predictive power or causal effects (default ‘5m,1h,4h,1d,3d,7d’).
    shape (str): the decay function shape (default ‘exponential’).
    filters (dict): a default feature filters configuration dictionary.
    epsilon (float): the threshold value at which SUM_OPPORTUNITIES() rounds to zero.
    Example: {'scale': '5m,1h,4h,1d,3d,7d', 'shape': 'exponential', 'filters': {'min_total_count': 200}, 'epsilon': 1e-9}
attributes (list) – a list of expressions describing each event in the group for modeling heterogeneity. If columns with feature filter configurations are provided, the filter configurations are preserved when applying dynamic filters.
vars (dict) – a dictionary keyed with any additional ‘global_vars’, ‘timeline_vars’, or ‘event_vars’ required to evaluate any expression in the event group. See also Query.vars().

Returns

an EventGroup object

causmos.query(project='lethe4')

A simple example query with event_metadata() and a limit of 10

Parameters: project – The Project or project ID to run the Query against
Returns: An example Query object
Return type: Query

causmos.random_partition(partition_var: str = 'timeline.id', seed: str = '', shares: list = [0.8, 0.2], names: list = ['train', 'test'])

Uses the arguments to create a random partition TQL expression such as col("MD5_PARTITIONS(timeline.id, 'my hashing seed', [0.008, 0.002], ['train','test'])"). The result can be used as a column or with .partition_by().

Parameters

partition_var (str) – Single-line TQL subexpression
seed (str) – String to serve as the seed for the hash partitioning
shares (list) – List of the relative shares of each partition. Relative shares do not need to sum to one
names (list) – List of the names for each partition

Returns

A metadata column with the random_partition expression for use in .partition_by() or as a column

Return type

Column
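The exact hashing scheme behind MD5_PARTITIONS is not documented here, so the following is a hypothetical Python analog that only illustrates the idea: hash the seed together with the partition value, map the digest to [0, 1), and pick the partition whose cumulative normalized share covers that point. random_partition_name and the "seed:value" key format are assumptions. The sketch shows why the same value always lands in the same partition and why relative shares such as [0.008, 0.002] behave like [0.8, 0.2].

```python
import hashlib


def random_partition_name(partition_value, seed="", shares=(0.8, 0.2),
                          names=("train", "test")):
    """Hypothetical sketch of deterministic hash partitioning (not MD5_PARTITIONS)."""
    if len(shares) != len(names):
        raise ValueError("shares and names must have the same length")
    total = float(sum(shares))  # relative shares need not sum to one
    digest = hashlib.md5(f"{seed}:{partition_value}".encode("utf-8")).hexdigest()
    u = int(digest[:15], 16) / 16**15  # map 60 bits of the hash to [0, 1)
    cumulative = 0.0
    for share, name in zip(shares, names):
        cumulative += share / total
        if u < cumulative:
            return name
    return names[-1]  # guard against floating-point round-off


ids = [f"timeline-{i}" for i in range(10000)]
assignments = [random_partition_name(i, seed="my hashing seed") for i in ids]
```

Hashing the identifier instead of drawing a random number keeps assignments reproducible across runs, so every event of a timeline stays in the same split.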

causmos.select(*cols) → zeenk.tql.query.Query

Create a new query object from one or more columns. Columns can be defined as TQL expression strings, or wrapped using one of the provided TQL column functions label(), weight(), tag(), categorical(), numerical(), or metadata(). Typically select(...) will immediately be followed by .from_timelines(...) or .from_events(...) during query construction.

Parameters: cols – One or more TQL column objects or expressions
Returns: A new tql.query.Query instance for further chaining/modification
Return type: Query

causmos.set_project(p_id)

causmos.show_projects(): Shows the available projects

causmos.tag(expr, name: str = '_tag') → zeenk.tql.column.Column

Creates a tag column from a TQL expression. The expression provided to tag() is expected to return a non-null value for all rows. Typically this expression will uniquely identify the row, which is useful for debugging and tracing datasets later. Uniqueness is not required for the return value, but highly encouraged. Tag columns will automatically be named “_tag”. It is expected that a dataset will have at most one tag column.

Parameters

expr – A TQL expression string that returns a unique identifier for the row
name (str) – Optionally provide a name for the column. Defaults to “_tag”

Returns

A Column object to be provided to select(...)

Return type

Column

causmos.treatment(where, value='1', name=None, project_id=None, generate=None, dedup=None, dynamics=None, attributes=None, vars: Optional[dict] = None)

Creates a treatment event group from a TQL expression. EventGroup objects also have a number of additional parameters that can be customized for modeling.

Parameters

where (str) – a TQL expression string which subsets the timeline to this event group.
value (str) – a TQL expression string with the return value of the event for modeling.
name (str) – name of the event group. If not provided a default name will be assigned.
project_id (str) – project id from which to pull the timelines for modeling.
generate (str) – a SamplingConf object or simple expression that can be coerced to generate events.
dedup (dict) – for now, a dictionary with several keys to configure deduplication (future: a DedupEvents object):
    all (bool): use all events, or limit to the first event matching the “where” expression.
    zero_value (bool): skip any events whose ‘value’ expression is zero.
    where (str): skip any events that do not satisfy this expression for deduplication.
    window (str): skip any events that have a non-skipped event within ‘window’ milliseconds prior.
    Default: {'all': False, 'zero_value': False, 'where': "TRUE", 'window': "0"}
    Example: {'all': True, 'zero_value': True, 'where': "TRUE", 'window': "7 * 24 * 3600 * 1000"} (a 7-day window)
dynamics (dict) – for now, a dictionary with ‘scale’, ‘shape’, ‘filters’, and ‘epsilon’ (future: a DynamicEvents object):
    scale (str): the time scales over which an event may have predictive power or causal effects (default ‘5m,1h,4h,1d,3d,7d’).
    shape (str): the decay function shape (default ‘exponential’).
    filters (dict): a default feature filters configuration dictionary.
    epsilon (float): the threshold value at which SUM_OPPORTUNITIES() rounds to zero.
    Example: {'scale': '5m,1h,4h,1d,3d,7d', 'shape': 'exponential', 'filters': {'min_total_count': 200}, 'epsilon': 1e-9}
attributes (list) – a list of expressions describing each event in the group for modeling heterogeneity. If columns with feature filter configurations are provided, the filter configurations are preserved when applying dynamic filters.
vars (dict) – a dictionary keyed with any additional ‘global_vars’, ‘timeline_vars’, or ‘event_vars’ required to evaluate any expression in the event group. See also Query.vars().

Returns

an EventGroup object

causmos.update_udf(project_id: int, *function_src)

Updates UDF(s) for the specified Project

Parameters

project_id (int) – ID of the Project to upload UDFs to
function_src – Source strings of one or more functions

Throws TQLAnalysisException if a function does not yet exist in the Project

causmos.upload_udf(project_id: int, *function_src)

Uploads UDF(s) to the specified Project

Parameters

project_id (int) – ID of the Project to upload UDFs to
function_src – Source strings of one or more functions

Throws TQLAnalysisException if a function already exists in the Project

causmos.validate_udf(project_id: int, *function_src)

Validates UDF(s)

Parameters

project_id (int) – ID of the Project to validate UDFs on
function_src – Source strings of one or more functions

Throws TQLAnalysisException and prints the error message and location if any UDF fails to compile

causmos.weight(expr, name: str = '_weight') → zeenk.tql.column.Column

Creates a weight column from a TQL expression. The expression provided to weight() is expected to return a numeric value for all rows. If a row returns a non-numeric, NaN, or infinite value, that value is replaced with the default weight value of 1.0. Weight columns are automatically named “_weight”. A dataset is expected to have at most one weight column.

Parameters

expr – A TQL expression string that returns a numeric value
name (str) – Optionally provide a name for the column. Defaults to “_weight”

Returns

A Column object to be provided to select(...)

Return type

Column
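label() and weight() share the same replacement rule and differ only in their defaults (0.0 for labels, 1.0 for weights). The hypothetical helper below mirrors that rule in plain Python so the behavior is easy to check; it is not how TQL evaluates expressions, and treating any Python Real (including bool) as numeric is an assumption.

```python
import math
from numbers import Real

LABEL_DEFAULT = 0.0   # documented default label value
WEIGHT_DEFAULT = 1.0  # documented default weight value


def sanitize(value, default):
    """Hypothetical sketch: keep finite numeric values, otherwise use the default."""
    if not isinstance(value, Real):
        return default  # non-numeric (e.g. None or a string)
    value = float(value)
    return value if math.isfinite(value) else default  # NaN or +/-inf


labels = [sanitize(v, LABEL_DEFAULT) for v in [1, None, float("nan"), 2.5]]
weights = [sanitize(v, WEIGHT_DEFAULT) for v in [3, "x", float("inf")]]
```

With these inputs, labels becomes [1.0, 0.0, 0.0, 2.5] and weights becomes [3.0, 1.0, 1.0].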

Subpackages

causmos.tests package

Submodules

causmos.causmos_model module

class causmos.causmos_model.Causmos(project_id: Optional[int] = None, name: Optional[str] = None, author: Optional[str] = None, type: Optional[str] = None, outcome: Optional[causmos.event_group.EventGroup] = None, treatment: Optional[causmos.event_group.EventGroup] = None, opportunity: Optional[causmos.event_group.EventGroup] = None, intervention: Optional[causmos.event_group.EventGroup] = None, predictor: Optional[causmos.event_group.EventGroup] = None, baseline: Optional[causmos.event_group.EventGroup] = None)

Bases: object

Causmos Model Factory: Estimand & Sampling

- Provide a configuration for an outcome.
- (Optional) Provide a configuration for treatment, opportunity, intervention, and/or predictor/baseline.
- Choose a specific sampling methodology for model training/estimation.

Causmos is the core modeling question:

- my_causmos.pre_post() -> Modeling instance -> Query/Dataset -> Training -> Publish -> Metrics
- my_causmos.panel() -> Modeling instance -> Query/Dataset…
- my_causmos.uplift() -> Modeling instance…

Create a Causmos model configuration object.

Parameters

project_id (int) – Use timelines from this project_id.
name (str) – Name of the Causmos model that will be published.
author (str) – Author of the Causmos model.
type (str) – Causmos model type.
outcome (EventGroup) – An EventGroup object of type==’OUTCOME’ defining the outcome events in the model.
treatment (EventGroup) – An EventGroup object of type==’TREATMENT’ defining the treatment events in the model.
opportunity (EventGroup) – An EventGroup object of type==’OPPORTUNITY’ defining the opportunity events.
intervention (EventGroup) – An EventGroup object of type==’INTERVENTION’ defining the intervention events.
predictor (EventGroup) – An EventGroup object of type==’EVENT_GROUP’ defining additional predictor events.
baseline (EventGroup) – An EventGroup object of type==’EVENT_GROUP’ defining additional baseline events.

MODEL_TYPES = ['TREATED', 'UPLIFT', 'AB', 'AB_TOT', 'PRE_POST', 'DIFF_IN_DIFF', 'AB_DIFF', 'AB_TOT_DIFF', 'PRE_POST_PANEL', 'DIFF_IN_DIFF_PANEL', 'AB_DIFF_PANEL', 'AB_TOT_DIFF_PANEL', 'PRE_POST_CONTINUOUS', 'DIFF_IN_DIFF_CONTINUOUS', 'AB_DIFF_CONTINUOUS', 'AB_TOT_DIFF_CONTINUOUS']

PANEL_TYPES = ('SIMPLE', 'CROSS', 'PANEL2', 'PANEL', 'CONTINUOUS')

ab(duration): Create a cross-sectional AB model. Given Outcome event group, compute the effect of an Intervention, assuming that the Intervention also defines Treatment.

ab_diff(duration): Create a two-time-period AB difference model. Given Outcome and Intervention event groups, estimate the effect of Treatment = Intervention, subsetting the sample to timelines with at least one intervention.

ab_diff_continuous(continuous_rescale: Optional[float] = None)

ab_diff_panel(duration)

ab_tot(duration): Create a cross-sectional AB Treatment-on-the-Treated (ToT) model. Given Outcome and Intervention event groups, estimate the effect of Treatment induced by the Intervention.

ab_tot_diff(duration)

ab_tot_diff_continuous(continuous_rescale: Optional[float] = None)

ab_tot_diff_panel(duration)

baseline(my_baseline)

Specify the baseline using a Baseline EventGroup object.

Parameters: my_baseline (EventGroup) – A Baseline EventGroup object specifying predictor/baseline events.

cluster_by(cluster_expression=None): Define the cluster variable to return in order to compute clustered standard errors, either analytically or using the Bayesian bootstrap. The default is to use timeline.id.

continuous()

cross(post, duration, cohorts): Create a model using generic cross-sectional panel sampling.

diff_in_diff(duration): Create a two-time-period Difference-in-Differences model. Given Outcome and Opportunity event groups, estimate the effect of Treatment, subsetting the sample to timelines with at least one opportunity.

diff_in_diff_continuous(continuous_rescale: Optional[float] = None)

diff_in_diff_panel(duration)

get_baseline()

get_cluster_by()

get_intervention()

get_name()

get_opportunity()

get_outcome()

get_partition_by()

get_predictor()

get_treatment()

get_type()

intervention(my_intervention)

Specify the intervention using an Intervention EventGroup object.

Parameters: my_intervention (EventGroup) – An Intervention EventGroup object specifying the intervention events.

json()

model(model_type)

name(name): Set the name of the Causmos model.

observables(observables)

Set the model’s observables. This determines the groups of features that are generated for treatment models.

Parameters

observables (list of str) – List of the SUM_OPPORTUNITIES() feature transformations to apply to the ‘opportunity_features’. Elements should be drawn from ['KD','BKD','GKD','NKD']:

KD: Treatment (e.g., Impression/Dose); always included
BKD: Opportunity (e.g., Bids)
GKD: Ghost bids / Propensity Score / Expected win rate (requires ‘treatment_model’)
NKD: Control function / Endogeneity / Unexpected win rate (requires ‘treatment_model’)
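The constraints listed above (allowed codes, KD always included, GKD and NKD needing a ‘treatment_model’) can be checked before calling observables(). The sketch below is a hypothetical pre-check, `normalize_observables`; whether Causmos itself adds KD silently or rejects an invalid list is not stated here, so this only mirrors the documented rules.

```python
# Transformation codes documented for observables(); 'KD' is always included.
VALID_OBSERVABLES = ('KD', 'BKD', 'GKD', 'NKD')
# Per the docs, GKD and NKD require a 'treatment_model'.
REQUIRES_TREATMENT_MODEL = {'GKD', 'NKD'}


def normalize_observables(observables, has_treatment_model=False):
    """Hypothetical pre-check for a list passed to observables()."""
    unknown = [o for o in observables if o not in VALID_OBSERVABLES]
    if unknown:
        raise ValueError(f"unknown observables {unknown}; choose from {list(VALID_OBSERVABLES)}")
    needs_model = REQUIRES_TREATMENT_MODEL.intersection(observables)
    if needs_model and not has_treatment_model:
        raise ValueError(f"{sorted(needs_model)} require a treatment_model")
    chosen = set(observables) | {'KD'}
    # Return in the documented order, with KD added if it was omitted.
    return [o for o in VALID_OBSERVABLES if o in chosen]


print(normalize_observables(['BKD']))
```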

opportunity(my_opportunity)

Specify the opportunity using an Opportunity EventGroup object.

Parameters: my_opportunity (EventGroup) – An Opportunity EventGroup object specifying the opportunity events.

outcome(my_outcome: causmos.event_group.EventGroup)

Specify the outcome using an Outcome EventGroup object.

Parameters: my_outcome (EventGroup) – An Outcome EventGroup object specifying the outcome events.

panel()

panel2(pre, post, duration, cohorts)

Create a model using generic two-time-period panel sampling.

Parameters

pre – An expression that generates a list of pre-period samples.
post – An expression that generates a list of post-period samples.
duration – Time, in milliseconds, over which to aggregate features.
cohorts – A filter expression that defines the cohorts to be included.
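Because duration is expressed in milliseconds, a small conversion helper avoids hand-multiplying time units. `to_millis` below is a hypothetical convenience built on the standard library; whether a given Causmos method expects this value as an integer or as a TQL expression string (as the dedup ‘window’ key does) is not stated here.

```python
from datetime import timedelta


def to_millis(**kwargs) -> int:
    """Convert timedelta keyword arguments (days=, hours=, minutes=, ...) to whole milliseconds."""
    return int(timedelta(**kwargs) / timedelta(milliseconds=1))


week_ms = to_millis(days=7)
print(week_ms)  # the same value as the expression 7 * 24 * 3600 * 1000
```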

partition_by(partition_key_expression=None): Define how to partition a model’s records. Default is to randomly partition using the random_partition() columnset’s defaults.

pre_post(duration): Create a two-time-period Pre-Post model. Given an Outcome event group, estimate the effect of Treatment, subsetting the sample to timelines with at least one treatment.

pre_post_continuous(continuous_rescale: Optional[float] = None)

pre_post_panel(duration)

predictor(my_predictor)

Specify predictors using a Predictor EventGroup object.

Parameters: my_predictor (EventGroup) – A Predictor EventGroup object specifying predictor/baseline events.

simple(duration)

(Non-causal) Prediction model with time-bucket sampling of ‘duration’. Compatible with PREDICT(‘SIMPLE_MODEL_NAME’) but not with CAUSAL_EFFECT(‘SIMPLE_MODEL_NAME’) as there is no definition of treatment, only a definition of an Outcome event group.

For example, a daily sales-volume model, where samples are taken at the end of each day and labeled with that day’s outcomes.
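As an illustration of time-bucket sampling, the plain-Python sketch below produces one sample per ‘duration’-wide bucket, labeled at the end of the bucket with the number of outcome events inside it. It assumes outcome events are bare millisecond timestamps and that buckets are aligned to a chosen `start`; `bucket_outcomes` is hypothetical and says nothing about how Causmos generates its TQL.

```python
from collections import Counter

DAY_MS = 24 * 3600 * 1000


def bucket_outcomes(outcome_ts, duration, start, end):
    """Count outcome events in consecutive buckets of `duration` ms covering [start, end).

    Returns (bucket_end, count) pairs, one per bucket, so each sample is
    labeled at the end of its bucket (e.g. end of day for duration=DAY_MS).
    """
    n_buckets = -(-(end - start) // duration)  # ceiling division
    counts = Counter((ts - start) // duration for ts in outcome_ts if start <= ts < end)
    return [(start + (i + 1) * duration, counts[i]) for i in range(n_buckets)]


sales = [0, 1_000, DAY_MS + 5, 2 * DAY_MS + 1]
print(bucket_outcomes(sales, DAY_MS, 0, 3 * DAY_MS))
```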

simple_continuous()

(Non-causal) Prediction model with event-level sampling. Compatible with PREDICT(‘SIMPLE_MODEL_NAME’) but not with CAUSAL_EFFECT(‘SIMPLE_MODEL_NAME’) as there is no definition of treatment, only a definition of an Outcome event group.

For example, a click-through rate (CTR) model, where each impression is a sample and the label is either a field on the impression or a look-forward to a click event related to that impression.
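The CTR example can be sketched as event-level sampling in plain Python: one sample per impression, labeled 1 when a click linked to that impression (by a shared id) occurs within a look-forward window. The id-based link, the window length and the name `label_impressions` are assumptions for illustration only.

```python
def label_impressions(impressions, clicks, lookahead_ms):
    """Build one (impression_id, timestamp, label) sample per impression.

    impressions:  (impression_id, timestamp_ms) pairs
    clicks:       (impression_id, timestamp_ms) pairs, linked to impressions by id
    label = 1 if a related click occurs within lookahead_ms after the impression.
    """
    clicks_by_imp = {}
    for imp_id, ts in clicks:
        clicks_by_imp.setdefault(imp_id, []).append(ts)
    samples = []
    for imp_id, ts in impressions:
        clicked = any(ts <= c <= ts + lookahead_ms for c in clicks_by_imp.get(imp_id, []))
        samples.append((imp_id, ts, int(clicked)))
    return samples


imps = [('a', 0), ('b', 100), ('c', 200)]
clicks = [('a', 50), ('c', 200 + 10 * 3_600_000)]  # the click on 'c' comes 10 hours later
samples = label_impressions(imps, clicks, lookahead_ms=3_600_000)
ctr = sum(label for _, _, label in samples) / len(samples)
print(samples, ctr)
```

With a one-hour look-forward, the late click on ‘c’ falls outside the window, so only ‘a’ is labeled positive.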

tql(): Build and return the TQL query based on the choice of analysis method’s panel sampling.

treated(duration): Create a cross-sectional treated model. This will collect all users who are treated and create a model to predict their time-aggregated Outcomes.

treatment(my_treatment)

Specify the treatment using a Treatment EventGroup object.

Parameters: my_treatment (EventGroup) – A Treatment EventGroup object specifying the treatment events.

type(type): Set the model Analysis Method. See self.MODEL_TYPES.

uplift(duration): Create a cross-sectional uplift model. Given Outcome and Opportunity event groups, compute the effect of Treatment events by comparing timelines.

class causmos.causmos_model.Continuous(start: Optional[str] = None, end: Optional[str] = None, interval: Optional[str] = None, sample: Optional[str] = None, cohorts: Optional[str] = None, heterogeneous: bool = False, truncate: bool = True, balance: bool = False, initialize: Optional[str] = None, trim: Optional[str] = None, continuous_rescale: Optional[float] = None)

Bases: causmos.causmos_model.Panel

Create a continuous-time panel object. This usually includes multiple samples for outcome events (e.g., “positives”) and samples for non-outcome events (e.g., “negatives”), which reflect the measure of time against which the positives are compared to compute conversion rates. For example, if the non-outcome samples cover 10 days and there are 5 outcome events, the average outcome rate is 0.5 outcomes per day. A model seeks to explain why certain times and contexts lead to higher or lower outcome rates; a causal model, in particular, quantifies the effects of treatment events on those outcomes.
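The rate arithmetic above can be written out directly. The sketch below computes outcomes per day of exposure, optionally split by a context label, to show how pooled and per-context rates differ. It is a hand calculation with a hypothetical function name, not the model Causmos fits.

```python
from collections import Counter, defaultdict

DAY_MS = 24 * 3600 * 1000


def outcome_rates_per_day(exposures, outcomes):
    """Outcome rate per day of exposure, broken down by context.

    exposures: (context, duration_ms) pairs, the non-outcome ("negative") time samples
    outcomes:  the context of each outcome ("positive") event
    Outcomes in contexts with no recorded exposure are ignored.
    """
    exposure_ms = defaultdict(int)
    for ctx, ms in exposures:
        exposure_ms[ctx] += ms
    counts = Counter(outcomes)
    return {ctx: counts[ctx] / (ms / DAY_MS) for ctx, ms in exposure_ms.items()}


# The example from the text: 5 outcomes over 10 days of exposure.
print(outcome_rates_per_day([('all', 10 * DAY_MS)], ['all'] * 5))
# Splitting the same 10 days by context gives rates that differ from the pooled rate.
print(outcome_rates_per_day([('weekday', 8 * DAY_MS), ('weekend', 2 * DAY_MS)],
                            ['weekday'] * 3 + ['weekend'] * 2))
```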

causmos.event_group module

class causmos.event_group.EventGroup(where, value='1', name=None, type='EVENT_GROUP', project_id=None, generate=None, dedup=None, dynamics=None, attributes=None, vars: Optional[dict] = None)

Bases: object

Creates an event group from TQL expressions. EventGroup objects also have a number of additional parameters that can be customized for modeling.

Create a new EventGroup object.

Parameters

where (str) – a TQL expression string which subsets the timeline to this event group.
value (str) – a TQL expression string with the return value of the event for modeling.
name (str) – name of the event group. If not provided a default name will be assigned.
type (str) – optionally provide a type for the event group. If not provided, the event group type will be EVENT_GROUP.
project_id (str) – project id from which to pull the timelines for modeling.
generate (str) – a SamplingSettings object or simple expression that can be coerced to generate events.
dedup (dict) – for now, a dictionary with several keys to configure deduplication (future: a DedupEvents object):
all (bool): if True, uses all events; otherwise limits to the first event matching the “where” expression.
zero_value (bool): skips any events whose ‘value’ expression is zero.
where (str): skips any events that do not satisfy this expression for deduping.
window (str): skips any events that have a non-skipped event within ‘window’ milliseconds prior.
Default: {'all': False, 'zero_value': False, 'where': "TRUE", 'window': "0"}
Example: {'all': True, 'zero_value': True, 'where': "TRUE", 'window': "7 * 24 * 3600 * 1000"}
dynamics (dict) – for now, a dictionary with ‘scale’, ‘shape’, ‘filters’, and ‘epsilon’ keys (future: a DynamicEvents object):
scale (str): the time scales over which an event may have predictive power or causal effects (default ‘5m,1h,4h,1d,3d,7d’).
shape (str): the decay function shape (default ‘exponential’).
filters (dict): a default feature filters configuration dictionary.
epsilon (float): the threshold value at which SUM_OPPORTUNITIES() rounds to zero.
Example: {'scale': '5m,1h,4h,1d,3d,7d', 'shape': 'exponential', 'filters': {'min_total_count': 200}, 'epsilon': 1e-9}
attributes (list) – a list of expressions describing each event in the group for modeling heterogeneity. If columns with feature filter configurations are provided, the filter configurations are preserved when applying dynamic filters.
vars (dict) – a dictionary keyed with any additional ‘global_vars’, ‘timeline_vars’, or ‘event_vars’ required to evaluate any expression in the event group. See also Query.vars().

Returns

an EventGroup object

attributes(attributes): Set the event group’s numerical() and categorical() attribute columns. These are used for heterogeneous effects and subgroup/slicing analyses of predictions and causal effects.

dedup(dedup): Set the dedup configuration. Default is {‘all’: True, ‘zero_value’: True, ‘where’: “TRUE”, ‘window’: “0”}. See help(EventGroup).

describe(): Describe the event group using TQL’s query.describe().

dynamics(dynamics): Set the dynamics configuration. Default is {‘scale’: ‘3h,1d,7d’, ‘shape’: ‘exp’, ‘filters’: {‘min_total_count’: 200}, ‘epsilon’: 1e-9}. See help(EventGroup).

generate(generate)

Attach a list of generate_events() configurations.

Parameters: generate (list of SamplingConf) – List of generate_events sampling configurations.

get_attributes()

get_dedup()

get_dynamics()

get_generate()

get_name()

get_project_id()

get_timeline_vars()

get_type()

get_value()

get_vars()

get_where()

get_where_event()

json(): Format this object into JSON.

name(name): Set the event group’s name.

project_id(project_id): Set the event group’s project.

tql(apply_dedup=True): Build a TQL query to inspect the event group object’s configuration.

type(type): Set the event group’s type. Options: ‘OUTCOME’, ‘TREATMENT’, ‘OPPORTUNITY’, ‘INTERVENTION’, ‘EVENT_GROUP’.

validate(): Validate this EventGroup is configured correctly, checking that all the TQL expressions compile. Internally, this function utilizes self.tql().validate() and throws a TQLAnalysisException if not valid.

value(expression): Set the event group’s value column.

vars(vars: dict): Set additional TQL query.vars() for the expressions used in the event group. Useful for constructing complex expressions such as event groups that depend on other event groups.

where(expression): Set the .where() expression for the event group.

causmos.event_group.intervention(where, value='1', name=None, project_id=None, generate=None, dedup=None, dynamics=None, attributes=None, vars: Optional[dict] = None)

Creates an intervention event group from a TQL expression. EventGroup objects also have a number of additional parameters that can be customized for modeling.

Parameters

where (str) – a TQL expression string which subsets the timeline to this event group.
value (str) – a TQL expression string with the return value of the event for modeling.
name (str) – name of the event group. If not provided a default name will be assigned.
project_id (str) – project id from which to pull the timelines for modeling.
generate (str) – a SamplingSettings object or simple expression that can be coerced to generate events.
dedup (dict) – for now, a dictionary with several keys to configure deduplication (future: a DedupEvents object):
all (bool): if True, uses all events; otherwise limits to the first event matching the “where” expression.
zero_value (bool): skips any events whose ‘value’ expression is zero.
where (str): skips any events that do not satisfy this expression for deduping.
window (str): skips any events that have a non-skipped event within ‘window’ milliseconds prior.
Default: {'all': False, 'zero_value': False, 'where': "TRUE", 'window': "0"}
Example: {'all': True, 'zero_value': True, 'where': "TRUE", 'window': "7 * 24 * 3600 * 1000"}
dynamics (dict) – for now, a dictionary with ‘scale’, ‘shape’, ‘filters’, and ‘epsilon’ keys (future: a DynamicEvents object):
scale (str): the time scales over which an event may have predictive power or causal effects (default ‘5m,1h,4h,1d,3d,7d’).
shape (str): the decay function shape (default ‘exponential’).
filters (dict): a default feature filters configuration dictionary.
epsilon (float): the threshold value at which SUM_OPPORTUNITIES() rounds to zero.
Example: {'scale': '5m,1h,4h,1d,3d,7d', 'shape': 'exponential', 'filters': {'min_total_count': 200}, 'epsilon': 1e-9}
attributes (list) – a list of expressions describing each event in the group for modeling heterogeneity. If columns with feature filter configurations are provided, the filter configurations are preserved when applying dynamic filters.
vars (dict) – a dictionary keyed with any additional ‘global_vars’, ‘timeline_vars’, or ‘event_vars’ required to evaluate any expression in the event group. See also Query.vars().

Returns

an EventGroup object

causmos.event_group.opportunity(where: str, value: str = '1', name: Optional[str] = None, project_id: Optional[str] = None, generate=None, dedup: Optional[dict] = None, dynamics: Optional[dict] = None, attributes: Optional[list] = None, vars: Optional[dict] = None)

Creates an opportunity event group from a TQL expression. EventGroup objects also have a number of additional parameters that can be customized for modeling.

Parameters

where (str) – a TQL expression string which subsets the timeline to this event group.
value (str) – a TQL expression string with the return value of the event for modeling.
name (str) – name of the event group. If not provided a default name will be assigned.
project_id (str) – project id from which to pull the timelines for modeling.
generate (str) – a SamplingSettings object or simple expression that can be coerced to generate events.
dedup (dict) – for now, a dictionary with several keys to configure deduplication (future: a DedupEvents object):
all (bool): if True, uses all events; otherwise limits to the first event matching the “where” expression.
zero_value (bool): skips any events whose ‘value’ expression is zero.
where (str): skips any events that do not satisfy this expression for deduping.
window (str): skips any events that have a non-skipped event within ‘window’ milliseconds prior.
Default: {'all': False, 'zero_value': False, 'where': "TRUE", 'window': "0"}
Example: {'all': True, 'zero_value': True, 'where': "TRUE", 'window': "7 * 24 * 3600 * 1000"}
dynamics (dict) – for now, a dictionary with ‘scale’, ‘shape’, ‘filters’, and ‘epsilon’ keys (future: a DynamicEvents object):
scale (str): the time scales over which an event may have predictive power or causal effects (default ‘5m,1h,4h,1d,3d,7d’).
shape (str): the decay function shape (default ‘exponential’).
filters (dict): a default feature filters configuration dictionary.
epsilon (float): the threshold value at which SUM_OPPORTUNITIES() rounds to zero.
Example: {'scale': '5m,1h,4h,1d,3d,7d', 'shape': 'exponential', 'filters': {'min_total_count': 200}, 'epsilon': 1e-9}
attributes (list) – a list of expressions describing each event in the group for modeling heterogeneity. If columns with feature filter configurations are provided, the filter configurations are preserved when applying dynamic filters.
vars (dict) – a dictionary keyed with any additional ‘global_vars’, ‘timeline_vars’, or ‘event_vars’ required to evaluate any expression in the event group. See also Query.vars().

Returns

an EventGroup object

causmos.event_group.outcome(where, value='1', name=None, project_id=None, generate=None, dedup=None, dynamics=None, attributes=None, vars: Optional[dict] = None)

Creates an outcome event group from a TQL expression. EventGroup objects also have a number of additional parameters that can be customized for modeling.

Parameters

where (str) – a TQL expression string which subsets the timeline to this event group.
value (str) – a TQL expression string with the return value of the event for modeling.
name (str) – name of the event group. If not provided a default name will be assigned.
project_id (str) – project id from which to pull the timelines for modeling.
generate (str) – a SamplingConf object or simple expression that can be coerced to generate events.
dedup (dict) – for now, a dictionary with several keys to configure deduplication (future: a DedupEvents object):
all (bool): if True, uses all events; otherwise limits to the first event matching the “where” expression.
zero_value (bool): skips any events whose ‘value’ expression is zero.
where (str): skips any events that do not satisfy this expression for deduping.
window (str): skips any events that have a non-skipped event within ‘window’ milliseconds prior.
Default: {'all': False, 'zero_value': False, 'where': "TRUE", 'window': "0"}
Example: {'all': True, 'zero_value': True, 'where': "TRUE", 'window': "7 * 24 * 3600 * 1000"}
dynamics (dict) – for now, a dictionary with ‘scale’, ‘shape’, ‘filters’, and ‘epsilon’ keys (future: a DynamicEvents object):
scale (str): the time scales over which an event may have predictive power or causal effects (default ‘5m,1h,4h,1d,3d,7d’).
shape (str): the decay function shape (default ‘exponential’).
filters (dict): a default feature filters configuration dictionary.
epsilon (float): the threshold value at which SUM_OPPORTUNITIES() rounds to zero.
Example: {'scale': '5m,1h,4h,1d,3d,7d', 'shape': 'exponential', 'filters': {'min_total_count': 200}, 'epsilon': 1e-9}
attributes (list) – a list of expressions describing each event in the group for modeling heterogeneity. If columns with feature filter configurations are provided, the filter configurations are preserved when applying dynamic filters.
vars (dict) – a dictionary keyed with any additional ‘global_vars’, ‘timeline_vars’, or ‘event_vars’ required to evaluate any expression in the event group. See also Query.vars().

Returns

an EventGroup object

causmos.event_group.treatment(where, value='1', name=None, project_id=None, generate=None, dedup=None, dynamics=None, attributes=None, vars: Optional[dict] = None)

Creates a treatment event group from a TQL expression. EventGroup objects also have a number of additional parameters that can be customized for modeling.

Parameters

where (str) – a TQL expression string which subsets the timeline to this event group.
value (str) – a TQL expression string with the return value of the event for modeling.
name (str) – name of the event group. If not provided a default name will be assigned.
project_id (str) – project id from which to pull the timelines for modeling.
generate (str) – a SamplingSettings object or simple expression that can be coerced to generate events.
dedup (dict) – for now, a dictionary with several keys to configure deduplication (future: a DedupEvents object):
all (bool): if True, uses all events; otherwise limits to the first event matching the “where” expression.
zero_value (bool): skips any events whose ‘value’ expression is zero.
where (str): skips any events that do not satisfy this expression for deduping.
window (str): skips any events that have a non-skipped event within ‘window’ milliseconds prior.
Default: {'all': False, 'zero_value': False, 'where': "TRUE", 'window': "0"}
Example: {'all': True, 'zero_value': True, 'where': "TRUE", 'window': "7 * 24 * 3600 * 1000"}
dynamics (dict) – for now, a dictionary with ‘scale’, ‘shape’, ‘filters’, and ‘epsilon’ keys (future: a DynamicEvents object):
scale (str): the time scales over which an event may have predictive power or causal effects (default ‘5m,1h,4h,1d,3d,7d’).
shape (str): the decay function shape (default ‘exponential’).
filters (dict): a default feature filters configuration dictionary.
epsilon (float): the threshold value at which SUM_OPPORTUNITIES() rounds to zero.
Example: {'scale': '5m,1h,4h,1d,3d,7d', 'shape': 'exponential', 'filters': {'min_total_count': 200}, 'epsilon': 1e-9}
attributes (list) – a list of expressions describing each event in the group for modeling heterogeneity. If columns with feature filter configurations are provided, the filter configurations are preserved when applying dynamic filters.
vars (dict) – a dictionary keyed with any additional ‘global_vars’, ‘timeline_vars’, or ‘event_vars’ required to evaluate any expression in the event group. See also Query.vars().

Returns

an EventGroup object

causmos.globals module

causmos.globals.get_project()

causmos.globals.set_project(p_id)

causmos.validation module

causmos.validation.validate_attributes(attributes, group_name: Optional[str] = None): Validate a list of attributes. Each attribute must be either NUMERICAL or CATEGORICAL.
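The rule above can be illustrated with a stand-in check over dictionary-style columns (keys name, expression and type, the dictionary form accepted by col()). `check_attribute_types` is hypothetical, the expressions are placeholders rather than real TQL for any project, and the real validate_attributes() works on Causmos column objects whose internals are not documented here.

```python
ALLOWED_ATTRIBUTE_TYPES = {'NUMERICAL', 'CATEGORICAL'}


def check_attribute_types(attributes, group_name=None):
    """Hypothetical stand-in for validate_attributes() over dict-style columns."""
    location = f" in event group {group_name!r}" if group_name else ""
    for attr in attributes:
        col_type = attr.get('type') if isinstance(attr, dict) else None
        if col_type not in ALLOWED_ATTRIBUTE_TYPES:
            raise ValueError(
                f"attribute {attr!r}{location} must be NUMERICAL or CATEGORICAL, got {col_type!r}")


# Placeholder expressions; only the 'type' values matter for this check.
check_attribute_types([
    {'name': 'site', 'expression': 'placeholder_site_expr', 'type': 'CATEGORICAL'},
    {'name': 'price', 'expression': 'placeholder_price_expr', 'type': 'NUMERICAL'},
], group_name='bids')
```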

causmos.validation.validate_dedup(dedup): Validate a ‘dedup’ dictionary configuration.
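A dedup dictionary can be checked against the keys documented for the EventGroup ‘dedup’ parameter. The sketch below is a hypothetical stand-in, `check_dedup`, that rejects unknown keys and wrong value types; the type expectations (booleans for ‘all’ and ‘zero_value’, strings for ‘where’ and ‘window’) are inferred from the documented default and example, not from the validate_dedup() source.

```python
# Keys and value types inferred from the dedup parameter description; the type checks are assumptions.
DEDUP_SCHEMA = {'all': bool, 'zero_value': bool, 'where': str, 'window': str}


def check_dedup(dedup):
    """Hypothetical stand-in for validate_dedup(): reject unknown keys and wrong value types."""
    if not isinstance(dedup, dict):
        raise TypeError(f"dedup must be a dict, got {type(dedup).__name__}")
    unknown = sorted(set(dedup) - set(DEDUP_SCHEMA))
    if unknown:
        raise ValueError(f"unknown dedup keys: {unknown}")
    for key, expected in DEDUP_SCHEMA.items():
        if key in dedup and not isinstance(dedup[key], expected):
            raise TypeError(f"dedup[{key!r}] must be {expected.__name__}, got {type(dedup[key]).__name__}")


# The documented example configuration passes.
check_dedup({'all': True, 'zero_value': True, 'where': "TRUE", 'window': "7 * 24 * 3600 * 1000"})
```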