causmos package

causmos.categorical(expr, name: Optional[str] = None, filters: Optional[dict] = None) zeenk.tql.column.FeatureColumn

Creates a categorical feature column from a TQL expression, optionally providing name and filters

Parameters
  • expr – A TQL expression string

  • name (str) – Optionally provide a name for the column. If not provided a default name will be assigned

  • filters (dict) – Optionally provide a dictionary of column filter conditions for categorical or numerical filters

Returns

A FeatureColumn object to be provided to select(...)

Return type

FeatureColumn

causmos.causmos(project_id: Optional[int] = None, name: Optional[str] = None, author: Optional[str] = None)

Initialize a Causmos model. See the Causmos class for more detailed documentation.

Parameters
  • project_id (int) – Use timelines from this project_id.

  • name (str) – Name of the Causmos model that will be published.

  • author (str) – Author of the Causmos model.

Returns

A Causmos model configuration object.

causmos.col(c: any, name: Optional[str] = None, type: Optional[str] = None, filters: Optional[dict] = None)

Creates a column from a TQL expression or given input, optionally providing name, type, and filters. The first argument can be a variety of formats, including:

  • classes or subclasses of type Column

  • dictionary containing keys for name, expression, and type

  • raw strings, which will be interpreted as TQL expressions.

The created column is by default unnamed, and will be assigned a name if used in a TQL select(...) statement. The default type of columns is ‘METADATA’.

Parameters
  • c (any) – A TQL expression string, column object, dictionary, list, tuple, or FeatureColumn object

  • name (str) – Optionally provide a name for the column. If not provided a default name will be assigned

  • type (str) – Optionally provide a type for the column. If not provided, the column type will be METADATA

  • filters (dict) – Optionally provide a dictionary of column filter conditions for categorical or numerical filters

Returns

A Column or FeatureColumn object to be provided to select(...)
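
To illustrate the accepted input formats, the following minimal sketch normalizes a raw string or a dictionary into a plain column specification. ColumnSpec and normalize_column are hypothetical names used only for this example. They are not part of causmos, and the real col() may handle more input types or behave differently.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class ColumnSpec:
    """Hypothetical stand-in for a column; not a causmos class."""
    expression: str
    name: Optional[str] = None   # unnamed by default, as described above
    type: str = "METADATA"       # documented default column type


def normalize_column(c: Union[str, dict, ColumnSpec],
                     name: Optional[str] = None,
                     type: Optional[str] = None) -> ColumnSpec:
    """Turn a string, dict, or ColumnSpec into a ColumnSpec (illustrative only)."""
    if isinstance(c, ColumnSpec):
        spec = ColumnSpec(c.expression, c.name, c.type)
    elif isinstance(c, dict):
        # Dictionaries carry keys for name, expression, and type.
        spec = ColumnSpec(c["expression"], c.get("name"), c.get("type") or "METADATA")
    elif isinstance(c, str):
        spec = ColumnSpec(c)  # raw strings are treated as TQL expressions
    else:
        raise TypeError(f"unsupported column input: {c!r}")
    # Explicit keyword arguments override whatever the input carried.
    if name is not None:
        spec.name = name
    if type is not None:
        spec.type = type
    return spec
```

Under these assumptions, normalize_column("1.0") yields an unnamed METADATA column, while a dictionary input keeps its own name and type unless they are overridden.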

causmos.constant()

Used in select(…) statements to select the FeatureColumn “1.0” with name “constant”

Returns

A tuple with the FeatureColumn “1.0” with name “constant”

Return type

tuple

causmos.create_project(name: str, or_update: bool = False) zeenk.tql.timelines.Project

Creates a new project. This project will require further configuration before timelines can be created. If the project already exists and you wish to update it, use load_project(name), or give the or_update=True option here.

Parameters
  • name (str) – The name of the Project

  • or_update (bool) – Whether to update the Project if it already exists

Returns

The Project that has just been created

Return type

Project

causmos.create_timeseries(name: str) zeenk.tql.timeseries.TimeSeries

Creates a TimeSeries with the given name

Parameters

name (str) – The name of the TimeSeries

Returns

A new TimeSeries with the given name

Return type

TimeSeries

causmos.debugger(project, expression: str = '', theme: str = 'light')

Creates an expression debugger widget. A default row limit of 10 is applied automatically. Only works in Jupyter Notebook or JupyterLab.

Parameters
  • project – The Project ID or name to run the expression against

  • expression (str) – The initial value of the expression, if any

  • theme (str) – The editor theme, ‘light’ or ‘dark’

Returns

A Jupyter Notebook/Lab widget

causmos.delete_udf(project_id: int, function_name: str)

Deletes a UDF for a specific Project by function name

Parameters
  • project_id (int) – ID of the Project to delete the UDF from

  • function_name (str) – Name of the UDF to be deleted

causmos.describe_project(project_identifier, fail_if_not_found: bool = True) zeenk.tql.timelines.Project

Loads the specified project by name or ID. Throws TQLAnalysisException if the project is not found and fail_if_not_found is True.

Parameters
  • project_identifier – The name (str) or ID (int) of the Project

  • fail_if_not_found (bool) – Whether to throw an exception if the Project is not found

Returns

The Project with the specified name or ID

Return type

Project

causmos.drop_project(name_or_id, if_exists: bool = False)

Deletes the project by name or ID

Parameters
  • name_or_id – The name (str) or ID (int) of the Project

  • if_exists (bool) – Only drop the Project if it exists

causmos.event_metadata() tuple

To be used in select(…) statements for returning timeline.id, id, datetime, and type

Returns

A tuple of Columns

Return type

tuple

causmos.event_time() tuple

To be used in select(…) statements for returning timestamp and duration

Returns

A tuple of Columns

Return type

tuple

causmos.generate_events(sample_generator_expression: str, functions: Optional[list] = None, variables: Optional[dict] = None, attribute_expressions: Optional[dict] = None, inherit_attributes: bool = False) zeenk.tql.sampling.SamplingConf

Manual sampling is similar to importance sampling but allows the user to manually override parameters such as the list of sample events (e.g., timestamps at which sampling occurs) and other properties of each generated sample event. See generate_importance_events() for an advanced use case.

causmos.generate_importance_events(num_samples: float = -1.0, min_ts: Optional[str] = None, max_ts: Optional[str] = None, time_shift_factor: float = 0.0, fraction_uniform_time: float = 0.0, sampling_distribution: str = 'exponential', sampling_events_expression: str = 'FILTER(timeline.events, (x) -> x.type=null)', sampling_kernels: str = '5m,15m,1h,4h,1d,3d,7d') zeenk.tql.sampling.SamplingConf

Importance sampling can be used to retain modeling unbiasedness (avoid introducing selection bias when sampling records) while still increasing the number of records where the modeling is most interesting. For example, when modeling the causal effect of a treatment on an outcome, we would like to ensure that most of our records (whether ‘positive’, an outcome event, or ‘negative’, a non-outcome sampling event) are in the vicinity of a treatment opportunity or an outcome. By doing so, we increase the model’s statistical power to decipher the relationship between the two. In contrast, if most outcomes happen during only 10% of the sample time period, most of our uniformly sampled observations will fall in the “boring” portion of the timeline, when no events of interest are occurring.

generate_importance_events helps you configure what time periods are “interesting.” You configure how many records to randomly sample for each timeline, which timestamps or events you want to increase your sampling around, the distribution (shape and scale) around each event from which you would like to randomly sample, and the probability you would like to draw from the background uniform distribution (e.g., a random point in the timeline).

In summary, one way to generate negative (non-outcome) records would be to simply draw uniformly between the start and end of the timeline’s observation window. However, we can improve upon that by instructing the extractor to generate these negatives by a configurable time-importance-weighted sampling methodology around times of timeline events.
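
The mixture described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the extractor's implementation: sample_times and its parameters are hypothetical, timestamps are assumed to be in seconds, offsets are drawn forward in time from a single exponential kernel, and draws that land outside the observation window are simply rejected.

```python
import random


def sample_times(start, end, event_times, num_samples,
                 scale=3600.0, fraction_uniform=0.2, seed=0):
    """Draw sample timestamps from a uniform background mixed with
    exponential offsets around the given event times (illustrative only)."""
    rng = random.Random(seed)
    samples = []
    attempts = 0
    while len(samples) < num_samples:
        attempts += 1
        if attempts > 1000 * max(num_samples, 1):
            raise RuntimeError("too many rejected draws; check the inputs")
        if not event_times or rng.random() < fraction_uniform:
            # Background draw: any point in the observation window.
            t = rng.uniform(start, end)
        else:
            # Importance draw: pick an event, then step forward by an
            # exponentially distributed offset with mean `scale`.
            t = rng.choice(event_times) + rng.expovariate(1.0 / scale)
        if start <= t <= end:  # reject draws outside the window
            samples.append(t)
    return samples
```

With one event in a one-day window and fraction_uniform=0.2, most of the returned timestamps cluster in the hours after the event, while a minority remain spread over the whole day.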

causmos.get_project()

causmos.get_udf(project_id: int, function_name: str) zeenk.tql.udf.UDF

Retrieve a specific UDF from a Project

Parameters
  • project_id (int) – ID of the Project to retrieve UDF on

  • function_name (str) – Name of the UDF to retrieve

Returns

The requested UDF. Throws a TQLAnalysisNotFound exception if the UDF is not found.

Return type

UDF

causmos.intervention(where, value='1', name=None, project_id=None, generate=None, dedup=None, dynamics=None, attributes=None, vars: Optional[dict] = None)

Creates an event group from a TQL expression. EventGroup objects also have a number of additional parameters that can be customized for modeling.

Parameters
  • where (str) – a TQL expression string which subsets the timeline to this event group.

  • value (str) – a TQL expression string with the return value of the event for modeling.

  • name (str) – name of the event group. If not provided a default name will be assigned.

  • project_id (str) – project id from which to pull the timelines for modeling.

  • generate (str) – a SamplingConf object or simple expression that can be coerced to generate events.

  • dedup (dict) – for now, a dictionary with several keys to configure deduplication. Future: a DedupEvents object. all (bool): uses all events or limits to the first event matching the “where” expression. zero_value (bool): skips any events whose ‘value’ expression is zero. where (str): skips any events which do not satisfy the expression for deduping. window (str): skips any events which have a non-skipped event within ‘window’ milliseconds prior. Default: {'all': False, 'zero_value': False, 'where': "TRUE", 'window': "0"} Example: {'all': True, 'zero_value': True, 'where': "TRUE", 'window': "7 * 24 * 3600 * 1000"}.

  • dynamics (dict) – for now, a dictionary with ‘scale’, ‘shape’, ‘filters’, and ‘epsilon’. Future: a DynamicEvents object. scale (str): the time scale over which an event may have predictive power or causal effects (default ‘5m,1h,4h,1d,3d,7d’). shape (str): the decay function shape (default ‘exponential’). filters (dict): a default feature filters configuration dictionary. epsilon (float): the threshold value at which SUM_OPPORTUNITIES() rounds to zero. Example: {'scale': '5m,1h,4h,1d,3d,7d', 'shape': 'exponential', 'filters': {'min_total_count': 200}, 'epsilon': 1e-9}.

  • attributes (list) – a list of expressions describing each event in the group for modeling heterogeneity. If columns with feature filter configurations are provided, the filter configurations are preserved when applying dynamic filters.

  • vars (dict) – a dictionary keyed with any additional ‘global_vars’, ‘timeline_vars’, or ‘event_vars’ required to evaluate any expression in the event group. See also Query.vars().

Returns

an EventGroup object
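
The ‘window’ rule of the dedup dictionary can be illustrated with a small standalone function. This is a sketch of the documented behavior only: dedup_by_window is a hypothetical helper, timestamps are assumed to be in milliseconds, and the boundary is treated as strict so that a window of 0 keeps every event. The actual boundary handling in causmos is not specified here.

```python
def dedup_by_window(timestamps_ms, window_ms):
    """Keep an event only if no previously kept event lies strictly
    within `window_ms` milliseconds before it (illustrative only)."""
    kept = []
    for ts in sorted(timestamps_ms):
        if kept and ts - kept[-1] < window_ms:
            continue  # a non-skipped event is too recent: skip this one
        kept.append(ts)
    return kept
```

For example, with a seven-day window (7 * 24 * 3600 * 1000 milliseconds), a second event one second after the first is skipped, while an event just over a week later is kept and starts a new window.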

causmos.kernelize(features, name: Optional[str] = None, treatment_filter_event: Optional[str] = None, event_group_where_event: Optional[str] = None, intended_action: Optional[list] = None, actual_action: Optional[list] = None, treatment_model: Optional[str] = None, kernel_types: Optional[list] = None, opportunity_conf: Optional[zeenk.tql.opportunity_conf.OpportunityConf] = None, kernel_parameters: Optional[str] = None, kernel_distribution: Optional[str] = None, filters: Optional[dict] = None, parse_kv: bool = False)

Takes a list of opportunity-level features based on user/opportunity/treatment/proxy-outcome-level fields and creates a transformed feature with SUM_OPPORTUNITIES() accumulating the effects across each opportunity. These “kernelized” features are for use in a Causmos timeline-based causal model where each observation record in a dataset is an outcome or a potential outcome (e.g., a moment in time when an outcome could have occurred). For example, we can power dataset generation using a treatment-propensity model to reduce the potential bias if treatment and the outcome are correlated. Apply this function to each list of features to be kernelized. Four types of transformation are available: KD (‘treatment’), BKD (‘baseline’), GKD (‘ghost’), and NKD (‘nonrandom’).

Parameters
  • features (list) – List of col() to be kernelized (KD,BKD,GKD,NKD). Examples: ["1.0", "IF(1.0,'L','R')"], [{"name":"constant", "expression":"1.0", "type":"NUMERICAL"}]. Columns can be “CATEGORICAL” or “NUMERICAL”. Each categorical expression creates an expansion of features; each numerical expression multiplies/reweights the opportunity’s contribution to the SUM_OPPORTUNITIES() sum.

  • name (str) – Base string for the feature sets. Will append type of kernel and prepend “w” to denote non-incrementality features for BKD, GKD, and NKD. If name=’’ (default), just return the list of features with the default prefix: AKD_..., wBKD_....

  • treatment_filter_event (str) – Column that defines whether a treatment opportunity resulted in treatment. For example, was the advertiser’s impression shown after bidding? In a sample dataset, request.impression is the relevant field.

  • event_group_where_event (str) – second where condition within passed EventGroup (currently this is set to be output from filtering like dedupe - but should be made more explicit).

  • intended_action – List of numerical/categorical expressions defining the intended/optimal decisions, e.g., bid_amount or eligible_treatment_groups.

  • actual_action – List of numerical/categorical expressions defining the actual action/decision taken, e.g., IF(ghost_bid, 0, bid_amount) or assigned_treatment_group.

  • treatment_model (str) – String for the deployed/published treatment prediction model (GKD, NKD-only). In practice, this will be the win-rate model based on the leaves.

  • kernel_types (list) – List of kernels types to include in the feature set. Subset of ['KD','BKD','GKD','NKD'].

  • opportunity_conf – An OpportunityConf() object that contains the defaults for opportunity_filter, kernel_parameters, and kernel_distribution.

  • kernel_parameters (str) – Calibration for kernel.days in the kernel parameters. Arrays of kernel features are created with suffixes using values such as seconds (s), minutes (m), hours (h), and days (d) in combination with a number, such as '15m,4h,3d': 'name_feature' becomes 'name_feature-15m', etc.

  • kernel_distribution (str) – Positive-support distributions (short-form abbreviations): exponential (e, exp), uniform (u, unif), triangular (t, tri), halfnormal (h, hnorm), and halflogistic (l, hlog). Positive- & negative-support distributions are the symmetric analogs of the positive-support distributions: laplace (a, lap), rectangular (r, rect), symmetrictriangular (s, stri), normal (n, norm), and logistic (o, log). Time-independent constant kernel (c, const) is also useful for various use cases such as static models in order to accumulate all opportunities and treatments for each outcome.

  • filters (dict) – column filters to apply to all of the features.

  • parse_kv (bool) – Try to parse ‘NUMERICAL’ input features as key-value pairs ‘k:v’. This adds overhead but supports more complex mixed ‘categorical:numerical’ input features.

Returns

A list of feature sets for each kernel_type, each containing a list of kernelized features.

Return type

list
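
The kernel_parameters naming convention and the idea of accumulating decayed opportunity contributions can be sketched as follows. Everything here is an assumption made for illustration: parse_kernels and kernelized_features are hypothetical helpers, an unnormalized exponential decay stands in for the kernel, and only opportunities at or before the outcome time contribute. The real SUM_OPPORTUNITIES() may normalize or weight contributions differently.

```python
import math

# Seconds per unit for the documented suffixes s, m, h, d.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_kernels(kernel_parameters):
    """Map each token such as '15m' to its length in seconds."""
    scales = {}
    for token in kernel_parameters.split(","):
        token = token.strip()
        scales[token] = int(token[:-1]) * _UNIT_SECONDS[token[-1]]
    return scales


def kernelized_features(name, kernel_parameters, opportunities, outcome_time):
    """Sum exponentially decayed opportunity weights for each kernel scale.

    `opportunities` is a list of (timestamp_seconds, weight) pairs; only
    opportunities at or before `outcome_time` contribute.
    """
    features = {}
    for suffix, scale in parse_kernels(kernel_parameters).items():
        total = 0.0
        for ts, weight in opportunities:
            if ts <= outcome_time:
                total += weight * math.exp(-(outcome_time - ts) / scale)
        features[f"{name}-{suffix}"] = total
    return features
```

Calling kernelized_features('name_feature', '15m,4h,3d', ...) produces one value per kernel, keyed 'name_feature-15m', 'name_feature-4h', and 'name_feature-3d', which mirrors the suffix convention described for kernel_parameters.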

causmos.label(expr, name: str = '_label') zeenk.tql.column.Column

Creates a label column from a TQL expression. The expression provided to label() is expected to return a numeric value for all rows. If a non-numeric, NaN, or infinite value is returned for any row, it will be replaced with the default label value of 0.0. Label columns will automatically be named “_label”. It is expected that a dataset will have at most one label column.

Parameters
  • expr – A TQL expression string

  • name (str) – Optionally provide a name for the column. If not provided a default name will be assigned

Returns

A Column object to be provided to select(...)

Return type

Column
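
The replacement rule for invalid label values can be expressed as a small standalone function. This is a sketch of the documented behavior, not the library's code: sanitize is a hypothetical helper, and what counts as "numeric" inside TQL may differ from Python's int and float.

```python
import math


def sanitize(value, default=0.0):
    """Return `value` as a float, or `default` when it is missing,
    non-numeric, NaN, or infinite (illustrative only)."""
    if not isinstance(value, (int, float)):
        return default
    try:
        number = float(value)
    except OverflowError:  # integers too large to represent as a float
        return default
    return number if math.isfinite(number) else default
```

With default=0.0 this mirrors the label() rule. Passing default=1.0 gives the analogous rule documented for weight().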

causmos.list_udfs(project_id: int) zeenk.tql.udf.UDF

Retrieves all UDFs defined for a Project

Parameters

project_id (int) – ID of the Project to retrieve UDFs on

Returns

All existing UDFs for the Project

Return type

UDF

causmos.load_project(project_identifier, fail_if_not_found: bool = True) zeenk.tql.timelines.Project

Loads the specified project by name or ID. Throws TQLAnalysisException if the project is not found and fail_if_not_found is True.

Parameters
  • project_identifier – The name (str) or ID (int) of the Project

  • fail_if_not_found (bool) – Whether to throw an exception if the Project is not found

Returns

The Project with the specified name or ID

Return type

Project

causmos.load_query(id: int) zeenk.tql.query.Query

Loads the query from the given ID

Parameters

id (int) – The ID of the query

Returns

A new Query instance

Return type

Query

causmos.load_resultset(id: int) zeenk.tql.resultset.ResultSet

Loads the ResultSet with the given ID

Parameters

id (int) – The ID of the ResultSet to be loaded

Returns

The ResultSet

Return type

ResultSet

causmos.metadata(expr, name: Optional[str] = None) zeenk.tql.column.Column

Creates a metadata column from a TQL expression. Metadata columns will return the expression values “as is”, meaning they will not be post-processed with charset filtering or expansion of numerical columns. Metadata columns are also not subject to column filters. Metadata columns are the default column type and are often the correct choice for arbitrary datasets that are not specifically intended to be consumed by a ML training package.

Parameters
  • expr – A TQL expression string

  • name (str) – Optionally provide a name for the column. If not provided a default name will be assigned

Returns

A Column object to be provided to select(...)

Return type

Column

causmos.numerical(expr, name: Optional[str] = None, filters: Optional[dict] = None) zeenk.tql.column.FeatureColumn

Creates a numerical feature column from a TQL expression, optionally providing name and filters

Parameters
  • expr – A TQL expression string

  • name (str) – Optionally provide a name for the column. If not provided a default name will be assigned

  • filters (dict) – Optionally provide a dictionary of column filter conditions for categorical or numerical filters

Returns

A FeatureColumn object to be provided to select(...)

Return type

FeatureColumn

causmos.opportunity(where: str, value: str = '1', name: Optional[str] = None, project_id: Optional[str] = None, generate=None, dedup: Optional[dict] = None, dynamics: Optional[dict] = None, attributes: Optional[list] = None, vars: Optional[dict] = None)

Creates an opportunity event group from a TQL expression. EventGroup objects also have a number of additional parameters that can be customized for modeling.

Parameters
  • where (str) – a TQL expression string which subsets the timeline to this event group.

  • value (str) – a TQL expression string with the return value of the event for modeling.

  • name (str) – name of the event group. If not provided a default name will be assigned.

  • project_id (str) – project id from which to pull the timelines for modeling.

  • generate (str) – a SamplingConf object or simple expression that can be coerced to generate events.

  • dedup (dict) – for now, a dictionary with several keys to configure deduplication. Future: a DedupEvents object. all (bool): uses all events or limits to the first event matching the “where” expression. zero_value (bool): skips any events whose ‘value’ expression is zero. where (str): skips any events which do not satisfy the expression for deduping. window (str): skips any events which have a non-skipped event within ‘window’ milliseconds prior. Default: {'all': False, 'zero_value': False, 'where': "TRUE", 'window': "0"} Example: {'all': True, 'zero_value': True, 'where': "TRUE", 'window': "7 * 24 * 3600 * 1000"}.

  • dynamics (dict) – for now, a dictionary with ‘scale’, ‘shape’, ‘filters’, and ‘epsilon’. Future: a DynamicEvents object. scale (str): the time scale over which an event may have predictive power or causal effects (default ‘5m,1h,4h,1d,3d,7d’). shape (str): the decay function shape (default ‘exponential’). filters (dict): a default feature filters configuration dictionary. epsilon (float): the threshold value at which SUM_OPPORTUNITIES() rounds to zero. Example: {'scale': '5m,1h,4h,1d,3d,7d', 'shape': 'exponential', 'filters': {'min_total_count': 200}, 'epsilon': 1e-9}.

  • attributes (list) – a list of expressions describing each event in the group for modeling heterogeneity. If columns with feature filter configurations are provided, the filter configurations are preserved when applying dynamic filters.

  • vars (dict) – a dictionary keyed with any additional ‘global_vars’, ‘timeline_vars’, or ‘event_vars’ required to evaluate any expression in the event group. See also Query.vars().

Returns

an EventGroup object

causmos.outcome(where, value='1', name=None, project_id=None, generate=None, dedup=None, dynamics=None, attributes=None, vars: Optional[dict] = None)

Creates an outcome event group from a TQL expression. EventGroup objects also have a number of additional parameters that can be customized for modeling.

Parameters
  • where (str) – a TQL expression string which subsets the timeline to this event group.

  • value (str) – a TQL expression string with the return value of the event for modeling.

  • name (str) – name of the event group. If not provided a default name will be assigned.

  • project_id (str) – project id from which to pull the timelines for modeling.

  • generate (str) – a SamplingConf object or simple expression that can be coerced to generate events.

  • dedup (dict) – for now, a dictionary with several keys to configure deduplication. Future: a DedupEvents object. all (bool): uses all events or limits to the first event matching the “where” expression. zero_value (bool): skips any events whose ‘value’ expression is zero. where (str): skips any events which do not satisfy the expression for deduping. window (str): skips any events which have a non-skipped event within ‘window’ milliseconds prior. Default: {'all': False, 'zero_value': False, 'where': "TRUE", 'window': "0"} Example: {'all': True, 'zero_value': True, 'where': "TRUE", 'window': "7 * 24 * 3600 * 1000"}.

  • dynamics (dict) – for now, a dictionary with ‘scale’, ‘shape’, ‘filters’, and ‘epsilon’. Future: a DynamicEvents object. scale (str): the time scale over which an event may have predictive power or causal effects (default ‘5m,1h,4h,1d,3d,7d’). shape (str): the decay function shape (default ‘exponential’). filters (dict): a default feature filters configuration dictionary. epsilon (float): the threshold value at which SUM_OPPORTUNITIES() rounds to zero. Example: {'scale': '5m,1h,4h,1d,3d,7d', 'shape': 'exponential', 'filters': {'min_total_count': 200}, 'epsilon': 1e-9}.

  • attributes (list) – a list of expressions describing each event in the group for modeling heterogeneity. If columns with feature filter configurations are provided, the filter configurations are preserved when applying dynamic filters.

  • vars (dict) – a dictionary keyed with any additional ‘global_vars’, ‘timeline_vars’, or ‘event_vars’ required to evaluate any expression in the event group. See also Query.vars().

Returns

an EventGroup object

causmos.query(project='lethe4')

A simple example query with event_metadata() and a limit of 10

Parameters

project – The Project or project ID to run the Query against

Returns

An example Query object

Return type

Query

causmos.random_partition(partition_var: str = 'timeline.id', seed: str = '', shares: list = [0.8, 0.2], names: list = ['train', 'test'])

Use the arguments to create a random partition TQL expression like col("MD5_PARTITIONS(timeline.id, 'my hashing seed', [0.008, 0.002], ['train','test'])"). This can be used as a column or with .partition_by().

Parameters
  • partition_var (str) – Single-line TQL subexpression

  • seed (str) – String to serve as the seed for the hash partitioning

  • shares (list) – List of the relative shares of each partition. Relative shares do not need to sum to one

  • names (list) – List of the names for each partition

Returns

A metadata column with the random_partition expression for use in .partition_by() or as a column

Return type

Column
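
The two ideas in random_partition (assembling the expression string and splitting keys by relative shares) can be sketched in plain Python. Both helpers are hypothetical. In particular, assign_partition only illustrates how a seeded MD5 hash can split keys according to normalized shares. How MD5_PARTITIONS actually combines the seed and the key is an assumption here, so the assignments will not necessarily match those produced by TQL.

```python
import hashlib


def partition_expression(partition_var="timeline.id", seed="",
                         shares=(0.8, 0.2), names=("train", "test")):
    """Assemble an MD5_PARTITIONS(...) expression string."""
    shares_txt = ", ".join(str(s) for s in shares)
    names_txt = ",".join(f"'{n}'" for n in names)
    return f"MD5_PARTITIONS({partition_var}, '{seed}', [{shares_txt}], [{names_txt}])"


def assign_partition(key, seed="", shares=(0.8, 0.2), names=("train", "test")):
    """Hash `key` with `seed` to a point in the unit interval and pick the
    partition whose cumulative normalized share covers that point."""
    digest = hashlib.md5(f"{seed}{key}".encode("utf-8")).hexdigest()
    point = int(digest, 16) / 16 ** 32
    total = float(sum(shares))  # shares are relative, so normalize them
    cumulative = 0.0
    for share, name in zip(shares, names):
        cumulative += share / total
        if point < cumulative:
            return name
    return names[-1]  # guard against floating-point round-off
```

Because the shares are normalized, [0.8, 0.2] and [8, 2] describe the same split, and a given key always lands in the same partition for a fixed seed.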

causmos.select(*cols) zeenk.tql.query.Query

Create a new query object from one or more columns. Columns can be defined as TQL expression strings, or wrapped using one of the provided TQL column functions label(), weight(), tag(), categorical(), numerical(), or metadata(). Typically select(...) will immediately be followed by .from_timelines(...) or .from_events(...) during query construction.

Parameters

cols – One or more TQL column objects or expressions

Returns

A new tql.query.Query instance for further chaining/modification

Return type

Query

causmos.set_project(p_id)

causmos.show_projects()

Shows the available projects

causmos.tag(expr, name: str = '_tag') zeenk.tql.column.Column

Creates a tag column from a TQL expression. The expression provided to tag() is expected to return a non-null value for all rows. Typically this expression will uniquely identify the row, which is useful for debugging and tracing datasets later. Uniqueness is not required for the return value, but highly encouraged. Tag columns will automatically be named “_tag”. It is expected that a dataset will have at most one tag column.

Parameters
  • expr – A TQL expression string that returns a unique identifier for the row

  • name (str) – Optionally provide a name for the column. If not provided a default name will be assigned

Returns

A Column object to be provided to select(...)

Return type

Column

causmos.treatment(where, value='1', name=None, project_id=None, generate=None, dedup=None, dynamics=None, attributes=None, vars: Optional[dict] = None)

Creates a treatment event group from a TQL expression. EventGroup objects also have a number of additional parameters that can be customized for modeling.

Parameters
  • where (str) – a TQL expression string which subsets the timeline to this event group.

  • value (str) – a TQL expression string with the return value of the event for modeling.

  • name (str) – name of the event group. If not provided a default name will be assigned.

  • project_id (str) – project id from which to pull the timelines for modeling.

  • generate (str) – a SamplingConf object or simple expression that can be coerced to generate events.

  • dedup (dict) – for now, a dictionary with several keys to configure deduplication. Future: a DedupEvents object. all (bool): uses all events or limits to the first event matching the “where” expression. zero_value (bool): skips any events whose ‘value’ expression is zero. where (str): skips any events which do not satisfy the expression for deduping. window (str): skips any events which have a non-skipped event within ‘window’ milliseconds prior. Default: {'all': False, 'zero_value': False, 'where': "TRUE", 'window': "0"} Example: {'all': True, 'zero_value': True, 'where': "TRUE", 'window': "7 * 24 * 3600 * 1000"}.

  • dynamics (dict) –

    for now, a dictionary with ‘scale’, ‘shape’, ‘filters’, and ‘epsilon’. Future: a DynamicEvents object. scale (str): the time scale over which an event may have predictive power or causal effects (default ‘5m,1h,4h,1d,3d,7d’). shape (str): the decay function shape (default ‘exponential’). filters (dict): a default feature filters configuration dictionary. epsilon (float): the threshold value at which SUM_OPPORTUNITIES() rounds to zero. Example: {'scale': '5m,1h,4h,1d,3d,7d', 'shape': 'exponential', 'filters': {'min_total_count': 200}, 'epsilon': 1e-9}.

  • attributes (list) – a list of expressions describing each event in the group for modeling heterogeneity. If columns with feature filter configurations are provided, the filter configurations are preserved when applying dynamic filters.

  • vars (dict) – a dictionary keyed with any additional ‘global_vars’, ‘timeline_vars’, or ‘event_vars’ required to evaluate any expression in the event group. See also Query.vars().

Returns

an EventGroup object

causmos.update_udf(project_id: int, *function_src)

Updates UDF(s) for the specified Project

Parameters
  • project_id (int) – ID of the Project whose UDFs are updated

  • function_src – One or more function source strings

Throws TQLAnalysisException if a function does not yet exist in the Project

causmos.upload_udf(project_id: int, *function_src)

Uploads UDF(s) to the specified Project

Parameters
  • project_id (int) – ID of the Project to upload UDFs to

  • function_src – One or more function source strings

Throws TQLAnalysisException if a function already exists in the Project

causmos.validate_udf(project_id: int, *function_src)

Validates UDF(s)

Parameters
  • project_id (int) – ID of the Project to validate UDFs on

  • function_src – One or more function source strings

Throws TQLAnalysisException and prints the error message and location if there is a compilation error in a UDF

causmos.weight(expr, name: str = '_weight') zeenk.tql.column.Column

Creates a weight column from a TQL expression. The expression provided to weight() is expected to return a numeric value for all rows. If a non-numeric, NaN, or infinite value is returned for any row, it will be replaced with the default weight value of 1.0. Weight columns will automatically be named “_weight”. It is expected that a dataset will have at most one weight column.

Parameters
  • expr – A TQL expression string that returns a numeric value

  • name (str) – Optionally provide a name for the column. If not provided a default name will be assigned

Returns

A Column object to be provided to select(...)

Return type

Column

Subpackages

Submodules

causmos.causmos_model module

class causmos.causmos_model.Causmos(project_id: Optional[int] = None, name: Optional[str] = None, author: Optional[str] = None, type: Optional[str] = None, outcome: Optional[causmos.event_group.EventGroup] = None, treatment: Optional[causmos.event_group.EventGroup] = None, opportunity: Optional[causmos.event_group.EventGroup] = None, intervention: Optional[causmos.event_group.EventGroup] = None, predictor: Optional[causmos.event_group.EventGroup] = None, baseline: Optional[causmos.event_group.EventGroup] = None)

Bases: object

Causmos Model Factory: Estimand & Sampling

  • Provide a configuration for an outcome.

  • (Optional) Provide a configuration for treatment, opportunity, intervention, and/or predictor/baseline.

  • Choose a specific sampling methodology for model training/estimation.

Causmos is the core modeling question:

  • my_causmos.pre_post() --> Modeling instance --> Query/Dataset --> Training --> Publish --> Metrics

  • my_causmos.panel() --> Modeling instance --> Query/Dataset…

  • my_causmos.uplift() --> Modeling instance…

Create a Causmos model configuration object.

Parameters
  • project_id (int) – Use timelines from this project_id.

  • name (str) – Name of the Causmos model that will be published.

  • author (str) – Author of the Causmos model.

  • type (str) – Causmos model type.

  • outcome (EventGroup) – An EventGroup object of type==’OUTCOME’ defining the outcome events in the model.

  • treatment (EventGroup) – An EventGroup object of type==’TREATMENT’ defining the treatment events in the model.

  • opportunity (EventGroup) – An EventGroup object of type==’OPPORTUNITY’ defining the opportunity events.

  • intervention (EventGroup) – An EventGroup object of type==’INTERVENTION’ defining the intervention events.

  • predictor (EventGroup) – An EventGroup object of type==’EVENT_GROUP’ defining additional predictor events.

  • baseline (EventGroup) – An EventGroup object of type==’EVENT_GROUP’ defining additional baseline events.

MODEL_TYPES = ['TREATED', 'UPLIFT', 'AB', 'AB_TOT', 'PRE_POST', 'DIFF_IN_DIFF', 'AB_DIFF', 'AB_TOT_DIFF', 'PRE_POST_PANEL', 'DIFF_IN_DIFF_PANEL', 'AB_DIFF_PANEL', 'AB_TOT_DIFF_PANEL', 'PRE_POST_CONTINUOUS', 'DIFF_IN_DIFF_CONTINUOUS', 'AB_DIFF_CONTINUOUS', 'AB_TOT_DIFF_CONTINUOUS']
PANEL_TYPES = ('SIMPLE', 'CROSS', 'PANEL2', 'PANEL', 'CONTINUOUS')
ab(duration)

Create a cross-sectional AB model. Given Outcome event group, compute the effect of an Intervention, assuming that the Intervention also defines Treatment.

ab_diff(duration)

Create a two-time-period AB difference model. Given Outcome and Intervention event groups, estimate the effect of Treatment = Intervention, subsetting the sample to timelines with at least one intervention.

ab_diff_continuous(continuous_rescale: Optional[float] = None)
ab_diff_panel(duration)
ab_tot(duration)

Create a cross-sectional AB Treatment-on-the-Treated (ToT) model. Given Outcome and Intervention event groups, estimate the effect of Treatment induced by the Intervention.

ab_tot_diff(duration)
ab_tot_diff_continuous(continuous_rescale: Optional[float] = None)
ab_tot_diff_panel(duration)
baseline(my_baseline)

Specify the baseline using a Baseline EventGroup object.

Parameters

my_baseline (EventGroup) – A Baseline EventGroup object specifying predictor/baseline events.

cluster_by(cluster_expression=None)

Define the cluster variable to return in order to compute clustered standard errors, either analytically or using the Bayesian bootstrap. The default is to use timeline.id.
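To make the role of the cluster variable concrete, the following sketch runs a Bayesian bootstrap of a simple mean in which whole clusters, rather than individual records, are reweighted. It illustrates the general technique under stated assumptions and is not the Causmos implementation; the function name, arguments, and data are hypothetical.

```python
import random
from collections import defaultdict

def cluster_bayesian_bootstrap(values, clusters, n_draws=200, seed=0):
    """Return n_draws bootstrap draws of the mean, reweighting whole clusters."""
    rng = random.Random(seed)
    # Aggregate each cluster to (sum of values, number of records).
    sums, counts = defaultdict(float), defaultdict(int)
    for v, c in zip(values, clusters):
        sums[c] += v
        counts[c] += 1
    keys = sorted(sums)
    draws = []
    for _ in range(n_draws):
        # Exp(1) draws are proportional to Dirichlet(1, ..., 1) weights;
        # the normalizing constant cancels in the ratio below.
        w = {k: rng.expovariate(1.0) for k in keys}
        num = sum(w[k] * sums[k] for k in keys)
        den = sum(w[k] * counts[k] for k in keys)
        draws.append(num / den)
    return draws

# Hypothetical data: three timelines (clusters) with two records each.
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
clusters = ["t1", "t1", "t2", "t2", "t3", "t3"]
draws = cluster_bayesian_bootstrap(values, clusters)
```

The spread of `draws` then serves as a cluster-aware measure of uncertainty for the mean.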

continuous()
cross(post, duration, cohorts)

Create a model using generic cross-sectional panel sampling.

diff_in_diff(duration)

Create a two-time-period Difference-in-Differences model. Given Outcome and Opportunity event groups, estimate the effect of Treatment, subsetting the sample to timelines with at least one opportunity.
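For intuition, the classic two-period difference-in-differences estimate subtracts the change in the comparison group from the change in the treated group. The sketch below shows only that arithmetic on hypothetical data; it does not reproduce the features or model that Causmos builds.

```python
from statistics import mean

def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Textbook two-period difference-in-differences (hypothetical helper)."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change

# Hypothetical outcomes: treated timelines rise by 5, comparison timelines by 1.
estimate = did_estimate([10, 12], [15, 17], [9, 11], [10, 12])
```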

diff_in_diff_continuous(continuous_rescale: Optional[float] = None)
diff_in_diff_panel(duration)
get_baseline()
get_cluster_by()
get_intervention()
get_name()
get_opportunity()
get_outcome()
get_partition_by()
get_predictor()
get_treatment()
get_type()
intervention(my_intervention)

Specify the intervention using an Intervention EventGroup object.

Parameters

my_intervention (EventGroup) – An Intervention EventGroup object specifying the intervention events.

json()
model(model_type)
name(name)

Set the name of the Causmos model.

observables(observables)

Set the model’s observables. This determines the groups of features that are generated for treatment models.

Parameters

observables (list of str) –

List of which SUM_OPPORTUNITIES() feature transformations you would like to apply to the ‘opportunity_features’. List elements should be drawn from ['KD','BKD','GKD','NKD']:

  • KD: Treatment (e.g., Impression/Dose); always included

  • BKD: Opportunity (e.g., Bids)

  • GKD: Ghost bids / Propensity Score / Expected win rate (requires ‘treatment_model’)

  • NKD: Control function / Endogeneity / Unexpected win rate (requires ‘treatment_model’)

opportunity(my_opportunity)

Specify the opportunity using an Opportunity EventGroup object.

Parameters

my_opportunity (EventGroup) – An Opportunity EventGroup object specifying the opportunity events.

outcome(my_outcome: causmos.event_group.EventGroup)

Specify the outcome using an Outcome EventGroup object.

Parameters

my_outcome (EventGroup) – An Outcome EventGroup object specifying the outcome events.

panel()
panel2(pre, post, duration, cohorts)

Create a model using generic two-time-period panel sampling.

Parameters
  • pre – An expression that generates a list of pre-period samples.

  • post – An expression that generates a list of post-period samples.

  • duration – Time, in milliseconds, over which to aggregate features.

  • cohorts – A filter expression that defines the cohorts to be included.
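Because duration is expressed in milliseconds, a small conversion helper can reduce arithmetic slips. The helper below is hypothetical and not part of causmos; whether a given method expects the value as a number or as a TQL expression string is an assumption to check against the method being called.

```python
MS_PER_SECOND = 1000

def days_to_ms(days):
    """Convert a number of days to milliseconds (hypothetical helper)."""
    return int(days * 24 * 3600 * MS_PER_SECOND)

seven_days_ms = days_to_ms(7)
# String form, in case an expression string is expected (assumption).
duration_expr = str(seven_days_ms)
```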

partition_by(partition_key_expression=None)

Define how to partition a model’s records. Default is to randomly partition using the random_partition() columnset’s defaults.

pre_post(duration)

Create a two-time-period Pre-Post model. Given an Outcome event group, estimate the effect of Treatment, subsetting the sample to timelines with at least one treatment.
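In its simplest form, a pre-post comparison averages the within-timeline change between the two periods. The following sketch shows that arithmetic on hypothetical paired outcomes; it is not the model that Causmos fits.

```python
from statistics import mean

def pre_post_effect(pre, post):
    """Mean within-timeline change for paired pre/post outcomes (hypothetical helper)."""
    if len(pre) != len(post):
        raise ValueError("pre and post must be paired per timeline")
    return mean([after - before for before, after in zip(pre, post)])

# Hypothetical outcomes for three treated timelines.
change = pre_post_effect([2, 3, 5], [4, 3, 8])
```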

pre_post_continuous(continuous_rescale: Optional[float] = None)
pre_post_panel(duration)
predictor(my_predictor)

Specify predictors using a Predictor EventGroup object.

Parameters

my_predictor (EventGroup) – A Predictor EventGroup object specifying predictor/baseline events.

simple(duration)

(Non-causal) Prediction model with time-bucket sampling of ‘duration’. Compatible with PREDICT(‘SIMPLE_MODEL_NAME’) but not with CAUSAL_EFFECT(‘SIMPLE_MODEL_NAME’) as there is no definition of treatment, only a definition of an Outcome event group.

For example, a daily sales-volume model where the level of sampling would be determined by the end of the day and outcomes.

simple_continuous()

(Non-causal) Prediction model with event-level sampling. Compatible with PREDICT(‘SIMPLE_MODEL_NAME’) but not with CAUSAL_EFFECT(‘SIMPLE_MODEL_NAME’) as there is no definition of treatment, only a definition of an Outcome event group.

For example, a click-through rate (CTR) model where sampling is at the impression level and the label is either a field on the impression event or a look-forward to a later event that records a click related to that impression.

tql()

Build and return the TQL query based on the choice of analysis method’s panel sampling.

treated(duration)

Create a cross-sectional treated model. This will collect all users who are treated and create a model to predict their time-aggregated Outcomes.

treatment(my_treatment)

Specify the treatment using a Treatment EventGroup object.

Parameters

my_treatment (EventGroup) – A Treatment EventGroup object specifying the treatment events.

type(type)

Set the model Analysis Method. See self.MODEL_TYPES.

uplift(duration)

Create a cross-sectional uplift model. Given Outcome and Opportunity event groups, compute the effect of Treatment events by comparing timelines.

class causmos.causmos_model.Continuous(start: Optional[str] = None, end: Optional[str] = None, interval: Optional[str] = None, sample: Optional[str] = None, cohorts: Optional[str] = None, heterogeneous: bool = False, truncate: bool = True, balance: bool = False, initialize: Optional[str] = None, trim: Optional[str] = None, continuous_rescale: Optional[float] = None)

Bases: causmos.causmos_model.Panel

Create a continuous-time panel object. This usually includes multiple samples for outcome events (e.g., “positives”) and samples for non-outcome events (e.g., “negatives”), which reflect the measure of time against which the positives are compared to compute conversion rates. For example, if the non-outcome samples cover 10 days and there are 5 outcome events, the average outcome rate is 0.5 outcomes per day. A model would seek to explain why certain times and contexts lead to higher or lower outcome rates; a causal model, in particular, would quantify the effects of treatment events on the outcomes.

See also help(Panel).
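The rate arithmetic in the example above (5 outcome events over 10 days of non-outcome exposure) can be sketched as follows. The helper name and the millisecond unit are assumptions made for illustration.

```python
MS_PER_DAY = 24 * 3600 * 1000

def outcome_rate_per_day(n_outcomes, exposure_ms):
    """Average outcomes per day over the sampled exposure time (hypothetical helper)."""
    if exposure_ms <= 0:
        raise ValueError("exposure must be positive")
    return n_outcomes / (exposure_ms / MS_PER_DAY)

rate = outcome_rate_per_day(5, 10 * MS_PER_DAY)  # 0.5 outcomes per day
```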

Create a Panel dataset object.

Parameters
  • start (str) – A TQL string expression defining the start of a timeline’s sampling.

  • end (str) – A TQL string expression defining the end of a timeline’s sampling.

  • interval (str) – A TQL string expression defining the uniform spacing of the timeline’s sampling.

  • sample (str) – A TQL string expression defining the sample generator expression (list of timestamps or events).

  • duration (str) – A TQL string expression defining the duration of each of the sample timestamps/events.

  • cohorts (str) – A way to include additional cohorts and define their sampling time (e.g., untreated timelines do not have “treatment”, so they cannot be sampled pre/post treatment).

  • heterogeneous (bool) – include the columns for a heterogeneous effects model.

  • truncate (bool) – Rescale the sample’s records to account for the observation window. For example, if a timeline’s max timestamp is 4 days prior to the sample timestamp but the duration is 7 days, adjust the timestamp and duration to correspond to 4 days prior and 3 days, respectively.

  • balance (bool) – If the sample’s timestamp is outside [min_ts, max_ts] for a timeline, drop record (e.g., daily samples, but not all timelines span all sampled days).

  • initialize (str) – A TQL string expression defining the duration to trim at the start of a timeline. For example, if the sample’s timestamp is outside [min_ts + initialize, max_ts] for a timeline, drop the record.

  • trim (str) – A TQL string expression defining the duration to trim at both the start and end of a timeline. For example, if the sample’s timestamp is outside [min_ts + trim, max_ts - trim] for a timeline, drop the record.

Returns

An object of class Panel for building longitudinal or panel datasets.

continuous_rescale(continuous_rescale)

Continuous models require a rescale parameter to remove finite-sample bias.

Parameters

continuous_rescale (float) – This rescale parameter should be set to a value such that the sum of the negative weights is much larger than the sum of the positive weights divided by continuous_rescale.
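One way to sanity-check a candidate value is to compare the two weight sums directly. In the sketch below, the function name and the `margin` argument are hypothetical: `margin` is an arbitrary stand-in for "much larger" and is not a threshold defined by Causmos.

```python
def rescale_is_adequate(negative_weights, positive_weights,
                        continuous_rescale, margin=100.0):
    """Check sum(negatives) > margin * sum(positives) / continuous_rescale.

    margin is an illustrative choice for 'much larger', not a library default.
    """
    if continuous_rescale <= 0:
        raise ValueError("continuous_rescale must be positive")
    rescaled_positives = sum(positive_weights) / continuous_rescale
    return sum(negative_weights) > margin * rescaled_positives

# Hypothetical weights: negatives sum to 1000, positives sum to 50.
ok_with_10 = rescale_is_adequate([500.0, 500.0], [25.0, 25.0], 10.0)
ok_with_1 = rescale_is_adequate([500.0, 500.0], [25.0, 25.0], 1.0)
```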

get_continuous_rescale()
class causmos.causmos_model.Cross(post, duration, cohorts: Optional[str] = None, heterogeneous: bool = False, truncate: bool = True, balance: bool = False, initialize: Optional[str] = None, trim: Optional[str] = None)

Bases: causmos.causmos_model.Panel

Create a Cross-sectional panel object. This usually implies creating one sample per timeline. See also help(Panel).

Create a Panel dataset object.

Parameters
  • start (str) – A TQL string expression defining the start of a timeline’s sampling.

  • end (str) – A TQL string expression defining the end of a timeline’s sampling.

  • interval (str) – A TQL string expression defining the uniform spacing of the timeline’s sampling.

  • sample (str) – A TQL string expression defining the sample generator expression (list of timestamps or events).

  • duration (str) – A TQL string expression defining the duration of each of the sample timestamps/events.

  • cohorts (str) – A way to include additional cohorts and define their sampling time (e.g., untreated timelines do not have “treatment”, so they cannot be sampled pre/post treatment).

  • heterogeneous (bool) – include the columns for a heterogeneous effects model.

  • truncate (bool) – Rescale the sample’s records to account for the observation window. For example, if a timeline’s max timestamp is 4 days prior to the sample timestamp but the duration is 7 days, adjust the timestamp and duration to correspond to 4 days prior and 3 days, respectively.

  • balance (bool) – If the sample’s timestamp is outside [min_ts, max_ts] for a timeline, drop record (e.g., daily samples, but not all timelines span all sampled days).

  • initialize (str) – A TQL string expression defining the duration to trim at the start of a timeline. For example, if the sample’s timestamp is outside [min_ts + initialize, max_ts] for a timeline, drop the record.

  • trim (str) – A TQL string expression defining the duration to trim at both the start and end of a timeline. For example, if the sample’s timestamp is outside [min_ts + trim, max_ts - trim] for a timeline, drop the record.

Returns

An object of class Panel for building longitudinal or panel datasets.

class causmos.causmos_model.DiscretePanel(start: Optional[str] = None, end: Optional[str] = None, interval: Optional[str] = None, sample: Optional[str] = None, duration: Optional[str] = None, cohorts: Optional[str] = None, heterogeneous: bool = False, truncate: bool = True, balance: bool = False, initialize: Optional[str] = None, trim: Optional[str] = None)

Bases: causmos.causmos_model.Panel

A DiscretePanel object defines a sampling configuration for a longitudinal/panel model. See also help(Panel).

Create a DiscretePanel object.

Parameters
  • start (str) –

  • end (str) –

  • interval (str) –

  • sample (str) –

  • duration (str) –

  • cohorts (str) –

  • heterogeneous (bool) –

  • truncate (bool) –

  • balance (bool) –

  • initialize (str) –

  • trim (str) –

Returns

A DiscretePanel class object.

class causmos.causmos_model.Panel(start: Optional[str] = None, end: Optional[str] = None, interval: Optional[str] = None, sample: Optional[str] = None, duration: Optional[str] = None, cohorts: Optional[str] = None, heterogeneous: bool = False, truncate: bool = True, balance: bool = False, initialize: Optional[str] = None, trim: Optional[str] = None)

Bases: object

Create a Panel dataset object.

Parameters
  • start (str) – A TQL string expression defining the start of a timeline’s sampling.

  • end (str) – A TQL string expression defining the end of a timeline’s sampling.

  • interval (str) – A TQL string expression defining the uniform spacing of the timeline’s sampling.

  • sample (str) – A TQL string expression defining the sample generator expression (list of timestamps or events).

  • duration (str) – A TQL string expression defining the duration of each of the sample timestamps/events.

  • cohorts (str) – A way to include additional cohorts and define their sampling time (e.g., untreated timelines do not have “treatment”, so they cannot be sampled pre/post treatment).

  • heterogeneous (bool) – include the columns for a heterogeneous effects model.

  • truncate (bool) – Rescale the sample’s records to account for the observation window. For example, if a timeline’s max timestamp is 4 days prior to the sample timestamp but the duration is 7 days, adjust the timestamp and duration to correspond to 4 days prior and 3 days, respectively.

  • balance (bool) – If the sample’s timestamp is outside [min_ts, max_ts] for a timeline, drop record (e.g., daily samples, but not all timelines span all sampled days).

  • initialize (str) – A TQL string expression defining the duration to trim at the start of a timeline. For example, if the sample’s timestamp is outside [min_ts + initialize, max_ts] for a timeline, drop the record.

  • trim (str) – A TQL string expression defining the duration to trim at both the start and end of a timeline. For example, if the sample’s timestamp is outside [min_ts + trim, max_ts - trim] for a timeline, drop the record.

Returns

An object of class Panel for building longitudinal or panel datasets.
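The truncate example in the parameter list (a 7-day duration, with the timeline ending 4 days before the sample timestamp) can be sketched in plain Python. The sketch assumes the aggregation window looks backward from the sample timestamp, which is an interpretation of the example rather than documented behavior; the function name is hypothetical.

```python
MS_PER_DAY = 24 * 3600 * 1000

def truncate_sample(sample_ts, duration, max_ts):
    """Clip a backward-looking window [sample_ts - duration, sample_ts] at max_ts.

    Returns the adjusted (timestamp, duration) pair; all values are in milliseconds.
    """
    if sample_ts <= max_ts:
        return sample_ts, duration  # window already lies inside the observed timeline
    overshoot = sample_ts - max_ts  # unobserved time between timeline end and sample
    return max_ts, max(duration - overshoot, 0)

# Sample at day 10, timeline ends at day 6 (4 days prior), duration is 7 days.
adjusted = truncate_sample(10 * MS_PER_DAY, 7 * MS_PER_DAY, 6 * MS_PER_DAY)
# adjusted is (day 6, 3 days), matching the example in the parameter description
```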

daily()

Set the panel’s time sampling to daily, given start, end, and interval.

get_heterogeneous()
get_homogenous()
get_options()
heterogeneous(heterogeneous=True)

Configure a Causmos model to use event group ‘attributes’ to build additional columns.

homogeneous(homogeneous=True)

Configure a Causmos model to ignore event group ‘attributes’ (only use the event group’s value or a constant) when building the model’s feature columns.

hourly()

Set the panel’s time sampling to hourly, given start, end, and interval.

options(truncate: Optional[bool] = None, balance: Optional[bool] = None, initialize: Optional[str] = None, trim: Optional[str] = None)

Update panel dataset options.

Parameters
  • truncate (bool) – Rescale the sample’s records to account for the observation window. For example, if a timeline’s max timestamp is 4 days prior to the sample timestamp but the duration is 7 days, adjust the timestamp and duration to correspond to 4 days prior and 3 days, respectively.

  • balance (bool) – If the sample’s timestamp is outside [min_ts, max_ts] for a timeline, drop record (e.g., daily samples, but not all timelines span all sampled days).

  • initialize (str) – A TQL string expression defining the duration to trim at the start of a timeline. For example, if the sample’s timestamp is outside [min_ts + initialize, max_ts] for a timeline, drop the record.

  • trim (str) – A TQL string expression defining the duration to trim at both the start and end of a timeline. For example, if the sample’s timestamp is outside [min_ts + trim, max_ts - trim] for a timeline, drop the record.
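The record-dropping options can be read as interval checks on the sample timestamp. The sketch below is one interpretation, assuming that balance, initialize, and trim each define a range of timestamps that is kept; it uses plain millisecond integers instead of TQL expressions, and the function name is hypothetical.

```python
def keep_sample(sample_ts, min_ts, max_ts, balance=False, initialize=0, trim=0):
    """Return True if the sample survives the chosen options (illustrative only)."""
    # balance: drop samples outside the timeline's observed range.
    if balance and not (min_ts <= sample_ts <= max_ts):
        return False
    # initialize: drop samples outside [min_ts + initialize, max_ts].
    if initialize and not (min_ts + initialize <= sample_ts <= max_ts):
        return False
    # trim: drop samples outside [min_ts + trim, max_ts - trim].
    if trim and not (min_ts + trim <= sample_ts <= max_ts - trim):
        return False
    return True

# Hypothetical timeline observed from t=0 to t=100.
results = [
    keep_sample(50, 0, 100),                  # inside the timeline, no options
    keep_sample(150, 0, 100, balance=True),   # outside the timeline with balance
    keep_sample(5, 0, 100, initialize=10),    # inside the initial burn-in
    keep_sample(95, 0, 100, trim=10),         # inside the trailing trim
]
```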

timestamp_range(start, end, interval)

Set the panel’s time sampling to evenly spaced intervals, given start, end, and interval.
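Evenly spaced sampling amounts to stepping from start to end by a fixed interval. The following sketch generates such timestamps in milliseconds; treating the end point as inclusive is an assumption, and the function is a hypothetical stand-in rather than the library method.

```python
MS_PER_DAY = 24 * 3600 * 1000

def evenly_spaced_timestamps(start_ms, end_ms, interval_ms):
    """List timestamps from start_ms to end_ms (inclusive) in steps of interval_ms."""
    if interval_ms <= 0:
        raise ValueError("interval must be positive")
    return list(range(start_ms, end_ms + 1, interval_ms))

# Daily samples over a three-day span.
daily_samples = evenly_spaced_timestamps(0, 3 * MS_PER_DAY, MS_PER_DAY)
```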

weekly()

Set the panel’s time sampling to weekly, given start, end, and interval.

class causmos.causmos_model.Panel2(pre, post, duration: Optional[str] = None, cohorts: Optional[str] = None, heterogeneous: bool = False, truncate: bool = True, balance: bool = False, initialize: Optional[str] = None, trim: Optional[str] = None)

Bases: causmos.causmos_model.Panel

Create a two-time-period panel object. This usually implies a “pre” period and a “post” period, relative to some event. See also help(Panel).

Create a Panel dataset object.

Parameters
  • start (str) – A TQL string expression defining the start of a timeline’s sampling.

  • end (str) – A TQL string expression defining the end of a timeline’s sampling.

  • interval (str) – A TQL string expression defining the uniform spacing of the timeline’s sampling.

  • sample (str) – A TQL string expression defining the sample generator expression (list of timestamps or events).

  • duration (str) – A TQL string expression defining the duration of each of the sample timestamps/events.

  • cohorts (str) – A way to include additional cohorts and define their sampling time (e.g., untreated timelines do not have “treatment”, so they cannot be sampled pre/post treatment).

  • heterogeneous (bool) – include the columns for a heterogeneous effects model.

  • truncate (bool) – Rescale the sample’s records to account for the observation window. For example, if a timeline’s max timestamp is 4 days prior to the sample timestamp but the duration is 7 days, adjust the timestamp and duration to correspond to 4 days prior and 3 days, respectively.

  • balance (bool) – If the sample’s timestamp is outside [min_ts, max_ts] for a timeline, drop record (e.g., daily samples, but not all timelines span all sampled days).

  • initialize (str) – A TQL string expression defining the duration to trim at the start of a timeline. For example, if the sample’s timestamp is outside [min_ts + initialize, max_ts] for a timeline, drop the record.

  • trim (str) – A TQL string expression defining the duration to trim at both the start and end of a timeline. For example, if the sample’s timestamp is outside [min_ts + trim, max_ts - trim] for a timeline, drop the record.

Returns

An object of class Panel for building longitudinal or panel datasets.

class causmos.causmos_model.Simple(sample, duration, cohorts: Optional[str] = None, heterogeneous: bool = False, truncate: bool = True, balance: bool = False, initialize: Optional[str] = None, trim: Optional[str] = None)

Bases: causmos.causmos_model.Panel

Create a Panel dataset object.

Parameters
  • start (str) – A TQL string expression defining the start of a timeline’s sampling.

  • end (str) – A TQL string expression defining the end of a timeline’s sampling.

  • interval (str) – A TQL string expression defining the uniform spacing of the timeline’s sampling.

  • sample (str) – A TQL string expression defining the sample generator expression (list of timestamps or events).

  • duration (str) – A TQL string expression defining the duration of each of the sample timestamps/events.

  • cohorts (str) – A way to include additional cohorts and define their sampling time (e.g., untreated timelines do not have “treatment”, so they cannot be sampled pre/post treatment).

  • heterogeneous (bool) – include the columns for a heterogeneous effects model.

  • truncate (bool) – Rescale the sample’s records to account for the observation window. For example, if a timeline’s max timestamp is 4 days prior to the sample timestamp but the duration is 7 days, adjust the timestamp and duration to correspond to 4 days prior and 3 days, respectively.

  • balance (bool) – If the sample’s timestamp is outside [min_ts, max_ts] for a timeline, drop record (e.g., daily samples, but not all timelines span all sampled days).

  • initialize (str) – A TQL string expression defining the duration to trim at the start of a timeline. For example, if the sample’s timestamp is outside [min_ts + initialize, max_ts] for a timeline, drop the record.

  • trim (str) – A TQL string expression defining the duration to trim at both the start and end of a timeline. For example, if the sample’s timestamp is outside [min_ts + trim, max_ts - trim] for a timeline, drop the record.

Returns

An object of class Panel for building longitudinal or panel datasets.

causmos.causmos_model.causmos(project_id: Optional[int] = None, name: Optional[str] = None, author: Optional[str] = None)

Initialize a Causmos model. See the Causmos class for more detailed documentation.

Parameters
  • project_id (int) – Use timelines from this project_id.

  • name (str) – Name of the Causmos model that will be published.

  • author (str) – Author of the Causmos model.

Returns

A Causmos model configuration object.

causmos.event_group module

class causmos.event_group.EventGroup(where, value='1', name=None, type='EVENT_GROUP', project_id=None, generate=None, dedup=None, dynamics=None, attributes=None, vars: Optional[dict] = None)

Bases: object

Creates an event group from TQL expressions. EventGroup objects also have a number of additional parameters that can be customized for modeling.

Create a new EventGroup object.

Parameters
  • where (str) – a TQL expression string which subsets the timeline to this event group.

  • value (str) – a TQL expression string with the return value of the event for modeling.

  • name (str) – name of the event group. If not provided a default name will be assigned.

  • type (str) – optionally provide a type for the event group. If not provided, the event group type will be EVENT_GROUP.

  • project_id (str) – project id from which to pull the timelines for modeling.

  • generate (str) – a SamplingSettings object or simple expression that can be coerced to generate events.

  • dedup (dict) – for now, a dictionary with several keys to configure deduplication. Future: a DedupEvents object.

    all (bool): uses all events or limits to the first event matching the “where” expression.

    zero_value (bool): skips any events whose ‘value’ expression is zero.

    where (str): skips any events which do not satisfy the expression for deduping.

    window (str): skips any events which have a non-skipped event within ‘window’ milliseconds prior.

    Default: {'all': False, 'zero_value': False, 'where': "TRUE", 'window': "0"}

    Example: {'all': True, 'zero_value': True, 'where': "TRUE", 'window': "7 * 24 * 3600 * 1000"}

  • dynamics (dict) – for now, a dictionary with ‘scale’, ‘shape’, ‘filters’, and ‘epsilon’. Future: a DynamicEvents object.

    scale (str): the time scale over which an event may have predictive power or causal effects (default ‘5m,1h,4h,1d,3d,7d’).

    shape (str): the decay function shape (default ‘exponential’).

    filters (dict): A default feature filters configuration dictionary.

    epsilon (float): The threshold value at which SUM_OPPORTUNITIES() rounds to zero.

    Example: {'scale': '5m,1h,4h,1d,3d,7d', 'shape': 'exponential', 'filters': {'min_total_count': 200}, 'epsilon': 1e-9}

  • attributes (list) – a list of expressions describing each event in the group for modeling heterogeneity. If columns with feature filter configurations are provided, the filter configurations are preserved when applying dynamic filters.

  • vars (dict) – a dictionary keyed with any additional ‘global_vars’, ‘timeline_vars’, or ‘event_vars’ required to evaluate any expression in the event group. See also Query.vars().

Returns

an EventGroup object

attributes(attributes)

Set the event group’s numerical() and categorical() attribute columns. These are used for heterogeneous effects and subgroup/slicing analyses of predictions and causal effects.

dedup(dedup)

Set the dedup configuration. Default is {‘all’: True, ‘zero_value’: True, ‘where’: “TRUE”, ‘window’: “0”}. See help(EventGroup).
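The ‘window’ key described in help(EventGroup) can be pictured as a pass over time-ordered events that skips any event with a kept event less than ‘window’ milliseconds before it. The sketch below is a plain-Python illustration of that rule only; the strict comparison and the function name are assumptions, and the ‘all’, ‘zero_value’, and ‘where’ keys are not modeled.

```python
def dedup_by_window(timestamps, window_ms):
    """Keep an event only if no kept event lies within window_ms before it."""
    kept = []
    for ts in sorted(timestamps):
        # kept[-1] is the most recent non-skipped event, so it is the closest one.
        if kept and ts - kept[-1] < window_ms:
            continue  # skipped: a kept event occurred within the window
        kept.append(ts)
    return kept

# Hypothetical event timestamps in milliseconds, deduplicated with a 2-second window.
events = [0, 1000, 5000, 5500, 12000]
deduped = dedup_by_window(events, 2000)  # [0, 5000, 12000]
```

With a window of 0, no event is skipped under this reading.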

describe()

Describe the event group using TQL’s query.describe().

dynamics(dynamics)

Set the dynamics configuration. Default is {‘scale’: ‘3h,1d,7d’, ‘shape’: ‘exp’, ‘filters’: {‘min_total_count’: 200}, ‘epsilon’: 1e-9}. See help(EventGroup).
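To give a feel for what the ‘scale’, ‘shape’, and ‘epsilon’ settings might control, the sketch below parses a scale string into milliseconds and computes an exponentially decayed sum of past events at each scale. The actual SUM_OPPORTUNITIES() formula is not given here, so the decay form exp(-Δt / scale), the unit suffixes, and both function names are assumptions.

```python
import math

# Assumed unit suffixes for the scale string: minutes, hours, days.
UNIT_MS = {"m": 60_000, "h": 3_600_000, "d": 86_400_000}

def parse_scales(scale):
    """Parse a string such as '5m,1h' into a list of millisecond values."""
    return [int(part[:-1]) * UNIT_MS[part[-1]] for part in scale.split(",")]

def decayed_sum(event_ts, sample_ts, scale_ms, epsilon=1e-9):
    """Sum exp(-age / scale) over events at or before sample_ts."""
    total = 0.0
    for ts in event_ts:
        if ts <= sample_ts:
            total += math.exp(-(sample_ts - ts) / scale_ms)
    # Values below epsilon are rounded to zero.
    return total if total >= epsilon else 0.0

scales = parse_scales("5m,1h,4h,1d,3d,7d")
# One feature per scale for two hypothetical events, sampled at the second event.
features = [decayed_sum([0, 3_600_000], 3_600_000, s) for s in scales]
```

Short scales emphasize recent events, while long scales let older events keep contributing.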

generate(generate)

Attach a list of generate_events() configurations.

Parameters

generate (list of SamplingConf) – List of generate_events sampling configurations.

get_attributes()
get_dedup()
get_dynamics()
get_generate()
get_name()
get_project_id()
get_timeline_vars()
get_type()
get_value()
get_vars()
get_where()
get_where_event()
json()

Format this object into JSON.

name(name)

Set the event group’s name.

project_id(project_id)

Set the event group’s project.

tql(apply_dedup=True)

Build a TQL query to inspect the event group object’s configuration.

type(type)

Set the event group’s type. Options: ‘OUTCOME’, ‘TREATMENT’, ‘OPPORTUNITY’, ‘INTERVENTION’, ‘EVENT_GROUP’.

validate()

Validate this EventGroup is configured correctly, checking that all the TQL expressions compile. Internally, this function utilizes self.tql().validate() and throws a TQLAnalysisException if not valid.

value(expression)

Set the event group’s value column.

vars(vars: dict)

Set additional TQL query.vars() for the expressions used in the event group. Useful for constructing complex expressions such as event groups that depend on other event groups.

where(expression)

Set the .where() expression for the event group.

causmos.event_group.intervention(where, value='1', name=None, project_id=None, generate=None, dedup=None, dynamics=None, attributes=None, vars: Optional[dict] = None)

Creates an intervention event group from a TQL expression. EventGroup objects also have a number of additional parameters that can be customized for modeling.

Parameters
  • where (str) – a TQL expression string which subsets the timeline to this event group.

  • value (str) – a TQL expression string with the return value of the event for modeling.

  • name (str) – name of the event group. If not provided a default name will be assigned.

  • project_id (str) – project id from which to pull the timelines for modeling.

  • generate (str) – a SamplingSettings object or simple expression that can be coerced to generate events.

  • dedup (dict) – for now, a dictionary with several keys to configure deduplication. Future: a DedupEvents object.

    all (bool): uses all events or limits to the first event matching the “where” expression.

    zero_value (bool): skips any events whose ‘value’ expression is zero.

    where (str): skips any events which do not satisfy the expression for deduping.

    window (str): skips any events which have a non-skipped event within ‘window’ milliseconds prior.

    Default: {'all': False, 'zero_value': False, 'where': "TRUE", 'window': "0"}

    Example: {'all': True, 'zero_value': True, 'where': "TRUE", 'window': "7 * 24 * 3600 * 1000"}

  • dynamics (dict) – for now, a dictionary with ‘scale’, ‘shape’, ‘filters’, and ‘epsilon’. Future: a DynamicEvents object.

    scale (str): the time scale over which an event may have predictive power or causal effects (default ‘5m,1h,4h,1d,3d,7d’).

    shape (str): the decay function shape (default ‘exponential’).

    filters (dict): A default feature filters configuration dictionary.

    epsilon (float): The threshold value at which SUM_OPPORTUNITIES() rounds to zero.

    Example: {'scale': '5m,1h,4h,1d,3d,7d', 'shape': 'exponential', 'filters': {'min_total_count': 200}, 'epsilon': 1e-9}

  • attributes (list) – a list of expressions describing each event in the group for modeling heterogeneity. If columns with feature filter configurations are provided, the filter configurations are preserved when applying dynamic filters.

  • vars (dict) – a dictionary keyed with any additional ‘global_vars’, ‘timeline_vars’, or ‘event_vars’ required to evaluate any expression in the event group. See also Query.vars().

Returns

an EventGroup object

causmos.event_group.opportunity(where: str, value: str = '1', name: Optional[str] = None, project_id: Optional[str] = None, generate=None, dedup: Optional[dict] = None, dynamics: Optional[dict] = None, attributes: Optional[list] = None, vars: Optional[dict] = None)

Creates an opportunity event group from a TQL expression. EventGroup objects also have a number of additional parameters that can be customized for modeling.

Parameters
  • where (str) – a TQL expression string which subsets the timeline to this event group.

  • value (str) – a TQL expression string with the return value of the event for modeling.

  • name (str) – name of the event group. If not provided a default name will be assigned.

  • project_id (str) – project id from which to pull the timelines for modeling.

  • generate (str) – a SamplingSettings object or simple expression that can be coerced to generate events.

  • dedup (dict) – for now, a dictionary with several keys to configure deduplication. Future: a DedupEvents object.

    all (bool): uses all events or limits to the first event matching the “where” expression.

    zero_value (bool): skips any events whose ‘value’ expression is zero.

    where (str): skips any events which do not satisfy the expression for deduping.

    window (str): skips any events which have a non-skipped event within ‘window’ milliseconds prior.

    Default: {'all': False, 'zero_value': False, 'where': "TRUE", 'window': "0"}

    Example: {'all': True, 'zero_value': True, 'where': "TRUE", 'window': "7 * 24 * 3600 * 1000"}

  • dynamics (dict) – for now, a dictionary with ‘scale’, ‘shape’, ‘filters’, and ‘epsilon’. Future: a DynamicEvents object.

    scale (str): the time scale over which an event may have predictive power or causal effects (default ‘5m,1h,4h,1d,3d,7d’).

    shape (str): the decay function shape (default ‘exponential’).

    filters (dict): A default feature filters configuration dictionary.

    epsilon (float): The threshold value at which SUM_OPPORTUNITIES() rounds to zero.

    Example: {'scale': '5m,1h,4h,1d,3d,7d', 'shape': 'exponential', 'filters': {'min_total_count': 200}, 'epsilon': 1e-9}

  • attributes (list) – a list of expressions describing each event in the group for modeling heterogeneity. If columns with feature filter configurations are provided, the filter configurations are preserved when applying dynamic filters.

  • vars (dict) – a dictionary keyed with any additional ‘global_vars’, ‘timeline_vars’, or ‘event_vars’ required to evaluate any expression in the event group. See also Query.vars().

Returns

an EventGroup object

causmos.event_group.outcome(where, value='1', name=None, project_id=None, generate=None, dedup=None, dynamics=None, attributes=None, vars: Optional[dict] = None)

Creates an outcome event group from a TQL expression. EventGroup objects also have a number of additional parameters that can be customized for modeling.

Parameters
  • where (str) – a TQL expression string which subsets the timeline to this event group.

  • value (str) – a TQL expression string with the return value of the event for modeling.

  • name (str) – name of the event group. If not provided a default name will be assigned.

  • project_id (str) – project id from which to pull the timelines for modeling.

  • generate (str) – a SamplingConf object or simple expression that can be coerced to generate events.

  • dedup (dict) – for now, a dictionary with several keys to configure deduplication. Future: a DedupEvents object.

    all (bool): uses all events or limits to the first event matching the “where” expression.

    zero_value (bool): skips any events whose ‘value’ expression is zero.

    where (str): skips any events which do not satisfy the expression for deduping.

    window (str): skips any events which have a non-skipped event within ‘window’ milliseconds prior.

    Default: {'all': False, 'zero_value': False, 'where': "TRUE", 'window': "0"}

    Example: {'all': True, 'zero_value': True, 'where': "TRUE", 'window': "7 * 24 * 3600 * 1000"}

  • dynamics (dict) – for now, a dictionary with ‘scale’, ‘shape’, ‘filters’, and ‘epsilon’. Future: a DynamicEvents object.

    scale (str): the time scale over which an event may have predictive power or causal effects (default ‘5m,1h,4h,1d,3d,7d’).

    shape (str): the decay function shape (default ‘exponential’).

    filters (dict): A default feature filters configuration dictionary.

    epsilon (float): The threshold value at which SUM_OPPORTUNITIES() rounds to zero.

    Example: {'scale': '5m,1h,4h,1d,3d,7d', 'shape': 'exponential', 'filters': {'min_total_count': 200}, 'epsilon': 1e-9}

  • attributes (list) – a list of expressions describing each event in the group for modeling heterogeneity. If columns with feature filter configurations are provided, the filter configurations are preserved when applying dynamic filters.

  • vars (dict) – a dictionary keyed with any additional ‘global_vars’, ‘timeline_vars’, or ‘event_vars’ required to evaluate any expression in the event group. See also Query.vars().

Returns

an EventGroup object

causmos.event_group.treatment(where, value='1', name=None, project_id=None, generate=None, dedup=None, dynamics=None, attributes=None, vars: Optional[dict] = None)

Creates a treatment event group from a TQL expression. EventGroup objects also have a number of additional parameters that can be customized for modeling.

Parameters
  • where (str) – a TQL expression string which subsets the timeline to this event group.

  • value (str) – a TQL expression string with the return value of the event for modeling.

  • name (str) – name of the event group. If not provided a default name will be assigned.

  • project_id (str) – project id from which to pull the timelines for modeling.

  • generate (str) – a SamplingSettings object or simple expression that can be coerced to generate events.

  • dedup (dict) –

    for now, a dictionary with several keys to configure deduplication. Future: a DedupEvents object.

    all (bool): if True, uses all events; if False, limits to the first event matching the “where” expression.

    zero_value (bool): skips any events whose ‘value’ expression is zero.

    where (str): skips any events which do not satisfy the expression for deduping.

    window (str): skips any events which have a non-skipped event within ‘window’ milliseconds prior.

    Default: {'all': False, 'zero_value': False, 'where': "TRUE", 'window': "0"}

    Example: {'all': True, 'zero_value': True, 'where': "TRUE", 'window': "7 * 24 * 3600 * 1000"}

  • dynamics (dict) –

    for now, a dictionary with ‘scale’, ‘shape’, ‘filters’, and ‘epsilon’. Future: a DynamicEvents object. Example: {'scale': '5m,1h,4h,1d,3d,7d', 'shape': 'exponential', 'filters': {'min_total_count': 200}, 'epsilon': 1e-9}.

    scale (str): the time scale over which an event may have predictive power or causal effects (default ‘5m,1h,4h,1d,3d,7d’).

    shape (str): the decay function shape (default ‘exponential’).

    filters (dict): a default feature filters configuration dictionary.

    epsilon (float): the threshold value at which SUM_OPPORTUNITIES() rounds to zero.

  • attributes (str) – a list of expressions describing each event in the group for modeling heterogeneity. If columns with feature filter configurations are provided, the filter configurations are preserved when applying dynamic filters.

  • vars (dict) – a dictionary keyed with any additional ‘global_vars’, ‘timeline_vars’, or ‘event_vars’ required to evaluate any expression in the event group. See also Query.vars().

Returns

an EventGroup object
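
The ‘window’ rule of the dedup dictionary can be illustrated with a small standalone simulation. This sketch is one reading of the description, not the library's implementation: it assumes timestamps in milliseconds, compares each event only against the most recent kept event, and treats the window as exclusive so that the default of "0" skips nothing. The function apply_window is hypothetical.

```python
from typing import List

DAY_MS = 24 * 3600 * 1000
WEEK_MS = 7 * DAY_MS  # same value as the expression "7 * 24 * 3600 * 1000"


def apply_window(timestamps_ms: List[int], window_ms: int) -> List[int]:
    """Keep an event unless a kept event lies less than window_ms before it."""
    kept: List[int] = []
    for ts in sorted(timestamps_ms):
        # kept is sorted, so kept[-1] is the most recent non-skipped event.
        if kept and ts - kept[-1] < window_ms:
            continue  # a non-skipped event is inside the window: skip this one
        kept.append(ts)
    return kept


# Treatment events at day 0, 1, 8 and 9 with a 7-day window.
events = [0, 1 * DAY_MS, 8 * DAY_MS, 9 * DAY_MS]
deduped = apply_window(events, WEEK_MS)
```

Here the events at day 1 and day 9 are skipped because a kept event precedes each of them by less than seven days, while the event at day 8 is kept because the day 1 event was itself skipped and does not count.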

causmos.globals module

causmos.globals.get_project()
causmos.globals.set_project(p_id)

causmos.validation module

causmos.validation.validate_attributes(attributes, group_name: Optional[str] = None)

Validate a list of attributes. Each attribute must be either NUMERICAL or CATEGORICAL.
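
As an illustration of the type rule, the sketch below checks attributes written as dictionaries with ‘name’, ‘expression’, and ‘type’ keys, one of the input formats accepted by col(...). The function check_attribute_types, the example expressions, and the error message are hypothetical; the actual behaviour of validate_attributes (for example, which exception it raises) is not documented here.

```python
from typing import List, Optional

ALLOWED_TYPES = {"NUMERICAL", "CATEGORICAL"}


def check_attribute_types(attributes: List[dict], group_name: Optional[str] = None) -> None:
    """Raise ValueError if any attribute is not NUMERICAL or CATEGORICAL."""
    bad = [a.get("name") for a in attributes if a.get("type") not in ALLOWED_TYPES]
    if bad:
        prefix = f"event group '{group_name}': " if group_name else ""
        raise ValueError(f"{prefix}attributes {bad} must be NUMERICAL or CATEGORICAL")


# Placeholder expressions, used only to make the dictionaries complete.
ok_attributes = [
    {"name": "channel", "expression": "channel_expr", "type": "CATEGORICAL"},
    {"name": "spend", "expression": "spend_expr", "type": "NUMERICAL"},
]
check_attribute_types(ok_attributes, group_name="ads")  # passes silently
```

Because columns created with col(...) default to the METADATA type, an attribute built without an explicit type would be rejected by a check like this one.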

causmos.validation.validate_dedup(dedup)

Validate a ‘dedup’ dictionary configuration.
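
A rough sketch of what validating a ‘dedup’ dictionary might involve, based only on the keys, types, and defaults listed for the dedup parameter of the event group functions. The function check_dedup is hypothetical; the checks actually performed by validate_dedup, and whether it fills in defaults, are assumptions.

```python
from typing import Optional

# Keys, types and defaults as listed in the 'dedup' parameter description.
DEDUP_TYPES = {"all": bool, "zero_value": bool, "where": str, "window": str}
DEDUP_DEFAULTS = {"all": False, "zero_value": False, "where": "TRUE", "window": "0"}


def check_dedup(dedup: Optional[dict]) -> dict:
    """Check keys and value types, then return the config merged with defaults."""
    dedup = {} if dedup is None else dedup
    if not isinstance(dedup, dict):
        raise TypeError("dedup must be a dictionary")
    unknown = sorted(set(dedup) - set(DEDUP_TYPES))
    if unknown:
        raise ValueError(f"unknown dedup keys: {unknown}")
    for key, value in dedup.items():
        if not isinstance(value, DEDUP_TYPES[key]):
            raise TypeError(f"dedup['{key}'] must be {DEDUP_TYPES[key].__name__}")
    # User-supplied values override the documented defaults.
    return {**DEDUP_DEFAULTS, **dedup}


config = check_dedup({"all": True, "window": "7 * 24 * 3600 * 1000"})
```

Note that ‘window’ is passed as a string expression rather than a number, which is why the sketch checks for str.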