causmos package
- causmos.categorical(expr, name: Optional[str] = None, filters: Optional[dict] = None) zeenk.tql.column.FeatureColumn
Creates a categorical feature column from a TQL expression, optionally providing name and filters
- Parameters
expr – A TQL expression string
name (str) – Optionally provide a name for the column. If not provided a default name will be assigned
filters (dict) – Optionally provide a dictionary of column filter conditions for categorical or numerical filters
- Returns
A FeatureColumn object to be provided to select(...)
- Return type
zeenk.tql.column.FeatureColumn
- causmos.causmos(project_id: Optional[int] = None, name: Optional[str] = None, author: Optional[str] = None)
Initialize a Causmos model. See the Causmos class for more detailed documentation.
- Parameters
project_id (int) – Use timelines from this project_id.
name (str) – Name of the Causmos model that will be published.
author (str) – Author of the Causmos model.
- Returns
A Causmos model configuration object.
- causmos.col(c: any, name: Optional[str] = None, type: Optional[str] = None, filters: Optional[dict] = None)
Creates a column from a TQL expression or given input, optionally providing name, type, and filters. The first argument can be a variety of formats, including:
classes or subclasses of type Column
dictionary containing keys for name, expression, and type
raw strings, which will be interpreted as TQL expressions.
The created column is by default unnamed, and will be assigned a name if used in a TQL select(...) statement. The default type of columns is ‘METADATA’.
- Parameters
c (any) – A TQL expression string, column object, dictionary, list, tuple, or FeatureColumn object
name (str) – Optionally provide a name for the column. If not provided a default name will be assigned
type (str) – Optionally provide a type for the column. If not provided, the column type will be METADATA
filters (dict) – Optionally provide a dictionary of column filter conditions for categorical or numerical filters
- Returns
A Column or FeatureColumn object to be provided to select(...)
- causmos.constant()
Used in select(…) statements to select the FeatureColumn “1.0” with name “constant”
- Returns
A tuple with the FeatureColumn “1.0” with name “constant”
- Return type
tuple
- causmos.create_project(name: str, or_update: bool = False) zeenk.tql.timelines.Project
Creates a new project. This project will require further configuration before timelines can be created. If the project already exists and you wish to update it, use load_project(name), or pass the or_update=True option here.
- Parameters
name (str) – The name of the Project
or_update (bool) – Whether to update the Project
- Returns
The Project that has just been created
- Return type
zeenk.tql.timelines.Project
- causmos.create_timeseries(name: str) zeenk.tql.timeseries.TimeSeries
Creates a TimeSeries with the given name
- Parameters
name (str) – The name of the TimeSeries
- Returns
A new TimeSeries with the given name
- Return type
zeenk.tql.timeseries.TimeSeries
- causmos.debugger(project, expression: str = '', theme: str = 'light')
Creates an expression debugger widget. A default row limit of 10 is applied automatically. Only works in Jupyter Notebook or JupyterLab.
- Parameters
project – The Project ID or name to run the expression against
expression (str) – The initial value of the expression, if any
theme (str) – The editor theme, ‘light’ or ‘dark’
- Returns
A Jupyter Notebook/Lab widget
- causmos.delete_udf(project_id: int, function_name: str)
Deletes a UDF for a specific Project by function name
- Parameters
project_id (int) – ID of the Project to delete udf from
function_name (str) – Name of the udf to be deleted
- causmos.describe_project(project_identifier, fail_if_not_found: bool = True) zeenk.tql.timelines.Project
Loads the specified project by name or ID, or throws TQLAnalysisException if it is not found and fail_if_not_found is True
- Parameters
project_identifier – The name (str) or ID (int) of the Project
fail_if_not_found (bool) – Whether to throw an exception if the Project is not found
- Returns
The Project with the specified name or ID
- Return type
zeenk.tql.timelines.Project
- causmos.drop_project(name_or_id, if_exists: bool = False)
Deletes the project by name or ID
- Parameters
name_or_id – The name (str) or ID (int) of the Project
if_exists (bool) – Only drop the Project if it exists
- causmos.event_metadata() tuple
To be used in select(…) statements for returning timeline.id, id, datetime, and type
- Returns
A tuple of Columns
- Return type
tuple
- causmos.event_time() tuple
To be used in select(…) statements for returning timestamp and duration
- Returns
A tuple of Columns
- Return type
tuple
- causmos.generate_events(sample_generator_expression: str, functions: Optional[list] = None, variables: Optional[dict] = None, attribute_expressions: Optional[dict] = None, inherit_attributes: bool = False) zeenk.tql.sampling.SamplingConf
Manual sampling is similar to importance sampling but allows the user to manually override parameters such as the list of sample events (e.g., timestamps at which sampling occurs) and other properties of each generated sample event. See generate_importance_events() for an advanced use case.
- causmos.generate_importance_events(num_samples: float = -1.0, min_ts: Optional[str] = None, max_ts: Optional[str] = None, time_shift_factor: float = 0.0, fraction_uniform_time: float = 0.0, sampling_distribution: str = 'exponential', sampling_events_expression: str = 'FILTER(timeline.events, (x) -> x.type=null)', sampling_kernels: str = '5m,15m,1h,4h,1d,3d,7d') zeenk.tql.sampling.SamplingConf
Importance sampling can be used to retain modeling unbiasedness (avoid introducing selection bias when sampling records) while still increasing the number of records where the modeling is most interesting. For example, when modeling the causal effect of a treatment on an outcome, we would like to ensure that most of our records (whether ‘positive’, an outcome event, or ‘negative’, a non-outcome sampling event) are in the vicinity of a treatment opportunity or an outcome. By so doing, we increase the model’s statistical power at deciphering the relationship between the two. In contrast, if most outcomes happen during only 10% of the sample time period, most of our observations under uniform sampling will fall in the “boring” portion of the timeline when no events of interest are occurring.
generate_importance_events helps you configure what time periods are “interesting.” You configure how many records to randomly sample for each timeline, which timestamps or events you want to increase your sampling around, the distribution (shape and scale) around each event from which you would like to randomly sample, and the probability you would like to draw from the background uniform distribution (e.g., a random point in the timeline).
In summary, one way to generate negative (non-outcome) records would be to simply draw uniformly between the start and end of the timeline’s observation window. However, we can improve upon that by instructing the extractor to generate these negatives by a configurable time-importance-weighted sampling methodology around times of timeline events.
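The sampling idea described above can be illustrated with a small standalone sketch. This is not the extractor's implementation: the helper name sample_importance_timestamps, its arguments, and the fallback to a uniform draw when a kernel draw leaves the observation window are all assumptions made for illustration. It mixes a uniform background draw (with probability fraction_uniform_time) with exponentially distributed offsets after known event timestamps.

```python
import random

MINUTE = 60_000          # milliseconds
HOUR = 60 * MINUTE
DAY = 24 * HOUR


def sample_importance_timestamps(event_ts, min_ts, max_ts, num_samples,
                                 kernel_scales_ms, fraction_uniform_time=0.0,
                                 seed=None):
    """Hypothetical helper: draw sampling timestamps clustered after events.

    With probability `fraction_uniform_time` (or when there are no events)
    a timestamp is drawn uniformly from [min_ts, max_ts]. Otherwise an event
    and a kernel scale are chosen at random and an exponential offset is
    added to the event timestamp.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(num_samples):
        ts = None
        if event_ts and rng.random() >= fraction_uniform_time:
            anchor = rng.choice(event_ts)
            scale = rng.choice(kernel_scales_ms)
            ts = anchor + rng.expovariate(1.0 / scale)  # mean offset == scale
        # Assumption: draws outside the window fall back to the uniform background.
        if ts is None or not (min_ts <= ts <= max_ts):
            ts = rng.uniform(min_ts, max_ts)
        samples.append(ts)
    return sorted(samples)


# Two events in a 7-day window; 10% of draws come from the uniform background.
events = [2 * DAY, 5 * DAY]
draws = sample_importance_timestamps(
    events, 0, 7 * DAY, num_samples=1000,
    kernel_scales_ms=[5 * MINUTE, HOUR, DAY],
    fraction_uniform_time=0.1, seed=42,
)
```

Most of the resulting timestamps land shortly after one of the two events, while the uniform share keeps some coverage of the rest of the timeline.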
- causmos.get_project()
- causmos.get_udf(project_id: int, function_name: str) zeenk.tql.udf.UDF
Retrieve a specific UDF from a Project
- Parameters
project_id (int) – ID of the Project to retrieve UDF on
function_name (str) – Name of the UDF to retrieve
- Returns
A UDF or throws TQLAnalysisNotFound exception
- Return type
zeenk.tql.udf.UDF
- causmos.intervention(where, value='1', name=None, project_id=None, generate=None, dedup=None, dynamics=None, attributes=None, vars: Optional[dict] = None)
Creates an intervention event group from a TQL expression. EventGroup objects also have a number of additional parameters that can be customized for modeling.
- Parameters
where (str) – a TQL expression string which subsets the timeline to this event group.
value (str) – a TQL expression string with the return value of the event for modeling.
name (str) – name of the event group. If not provided a default name will be assigned.
project_id (str) – project id from which to pull the timelines for modeling.
generate (str) – a SamplingSettings object or simple expression that can be coerced to generate events.
dedup (dict) – for now, a dictionary with several keys to configure deduplication. Future: a DedupEvents object.
all (bool): uses all events or limits to the first event matching the “where” expression.
zero_value (bool): skips any events whose ‘value’ expression is zero.
where (str): skips any events which do not satisfy the expression for deduping.
window (str): skips any events which have a non-skipped event within ‘window’ milliseconds prior.
Default: {'all': False, 'zero_value': False, 'where': "TRUE", 'window': "0"}
Example: {'all': True, 'zero_value': True, 'where': "TRUE", 'window': "7 * 24 * 3600 * 1000"}
dynamics (dict) – for now, a dictionary with ‘scale’, ‘shape’, ‘filters’, and ‘epsilon’. Future: a DynamicEvents object.
scale (str): the time scale over which an event may have predictive power or causal effects (default ‘5m,1h,4h,1d,3d,7d’).
shape (str): the decay function shape (default ‘exponential’).
filters (dict): a default feature filters configuration dictionary.
epsilon (float): the threshold value at which SUM_OPPORTUNITIES() rounds to zero.
Example: {'scale': '5m,1h,4h,1d,3d,7d', 'shape': 'exponential', 'filters': {'min_total_count': 200}, 'epsilon': 1e-9}
attributes (list) – a list of expressions describing each event in the group for modeling heterogeneity. If columns with feature filter configurations are provided, the filter configurations are preserved when applying dynamic filters.
vars (dict) – a dictionary keyed with any additional ‘global_vars’, ‘timeline_vars’, or ‘event_vars’ required to evaluate any expression in the event group. See also Query.vars().
- Returns
an EventGroup object
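To make the ‘all’ and ‘window’ keys of the dedup dictionary concrete, here is a minimal standalone sketch of the documented behavior. The function dedup_events is hypothetical and is not part of the package; it also assumes that an event is skipped when it falls strictly less than ‘window’ milliseconds after the most recent non-skipped event. The ‘zero_value’ and ‘where’ keys are not modeled.

```python
DAY = 24 * 3600 * 1000  # milliseconds


def dedup_events(timestamps_ms, window_ms=0, keep_all=False):
    """Hypothetical illustration of the 'all' and 'window' dedup keys.

    keep_all=False keeps only the first event. Otherwise an event is
    skipped when a kept (non-skipped) event occurred fewer than
    `window_ms` milliseconds earlier (strict comparison is an assumption).
    """
    kept = []
    for ts in sorted(timestamps_ms):
        if kept and not keep_all:
            break  # 'all': False limits the group to the first event
        if kept and ts - kept[-1] < window_ms:
            continue  # inside the window of the last non-skipped event
        kept.append(ts)
    return kept


events = [0, 1 * DAY, 3 * DAY, 8 * DAY, 9 * DAY, 20 * DAY]

# Mirrors the documented example window of "7 * 24 * 3600 * 1000" ms.
kept_all = dedup_events(events, window_ms=7 * DAY, keep_all=True)
# kept_all == [0, 8 * DAY, 20 * DAY]

# Mirrors the documented default: only the first matching event is used.
kept_first = dedup_events(events)
# kept_first == [0]
```

Because events are processed in time order, comparing against the last kept event is enough: it is always the closest non-skipped predecessor.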
- causmos.kernelize(features, name: Optional[str] = None, treatment_filter_event: Optional[str] = None, event_group_where_event: Optional[str] = None, intended_action: Optional[list] = None, actual_action: Optional[list] = None, treatment_model: Optional[str] = None, kernel_types: Optional[list] = None, opportunity_conf: Optional[zeenk.tql.opportunity_conf.OpportunityConf] = None, kernel_parameters: Optional[str] = None, kernel_distribution: Optional[str] = None, filters: Optional[dict] = None, parse_kv: bool = False)
Takes a list of opportunity-level features based on user/opportunity/treatment/proxy-outcome-level fields and creates a transformed feature with SUM_OPPORTUNITIES() accumulating the effects across each opportunity. These “kernelized” features are for use in a Causmos timeline-based causal model where each observation record in a dataset is an outcome or a potential outcome (e.g., a moment in time when an outcome could have occurred). For example, we can power dataset generation using a treatment-propensity model to reduce the potential bias if treatment and the outcome are correlated. Apply this function to each list of features to be kernelized. There are several types of transformations described as (KD = ‘treatment’, BKD = ‘baseline’, GKD = ‘ghost’, NKD = ‘nonrandom’).
- Parameters
features (list) – List of col() to be kernelized (KD,BKD,GKD,NKD). Examples: ["1.0", "IF(1.0,'L','R')"], [{"name":"constant", "expression":"1.0", "type":"NUMERICAL"}]. Columns can be “CATEGORICAL” or “NUMERICAL”. Each categorical expression creates an expansion of features; each numerical expression multiplies/reweights the opportunity’s contribution to the SUM_OPPORTUNITIES() sum.
name (str) – Base string for the feature sets. Will append type of kernel and prepend “w” to denote non-incrementality features for BKD, GKD, and NKD. If name='' (default), just return the list of features with the default prefix: AKD_..., wBKD_....
treatment_filter_event (str) – Column that defines whether a treatment opportunity resulted in treatment. For example, was the advertiser’s impression shown after bidding? In a sample dataset, request.impression is the relevant field.
event_group_where_event (str) – second where condition within passed EventGroup (currently this is set to be output from filtering like dedupe - but should be made more explicit).
intended_action – List of numerical/categorical expressions defining the intended/optimal decisions, e.g., bid_amount or eligible_treatment_groups.
actual_action – List of numerical/categorical expressions defining the actual action/decision taken, e.g., IF(ghost_bid, 0, bid_amount) or assigned_treatment_group.
treatment_model (str) – String for the deployed/published treatment prediction model (GKD, NKD-only). In practice, this will be the win-rate model based on the leaves.
kernel_types (list) – List of kernel types to include in the feature set. Subset of ['KD','BKD','GKD','NKD'].
opportunity_conf – An OpportunityConf() object that contains the defaults for opportunity_filter, kernel_parameters, and kernel_distribution.
kernel_parameters (str) – Calibration for kernel.days in the kernel parameters. Arrays of kernel features are created with suffixes using values such as seconds (s), minutes (m), hours (h), and days (d) in combination with a number, such as '15m,4h,3d': 'name_feature' becomes 'name_feature-15m', etc.
kernel_distribution (str) – Positive-support distributions (short-form abbreviations): exponential (e, exp), uniform (u, unif), triangular (t, tri), halfnormal (h, hnorm), and halflogistic (l, hlog). Positive- & negative-support distributions are the symmetric analogs of the positive-support distributions: laplace (a, lap), rectangular (r, rect), symmetrictriangular (s, stri), normal (n, norm), and logistic (o, log). The time-independent constant kernel (c, const) is also useful for various use cases such as static models in order to accumulate all opportunities and treatments for each outcome.
filters (dict) – column filters to apply to all of the features.
parse_kv (boolean) – Try to parse ‘NUMERICAL’ input features as key-value pairs ‘k:v’. This has extra overhead, but allows more complex mixed ‘categorical:numerical’ input features.
- Returns
A list of feature sets for each kernel_type, each containing a list of kernelized features.
- Return type
list
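The suffix convention described for kernel_parameters can be sketched in a few lines. The helper expand_kernel_features below is hypothetical and only mirrors the documented naming ('name_feature' becomes 'name_feature-15m'); the conversion of each token to milliseconds is an added assumption for illustration, not a statement about how kernelize() stores durations.

```python
import re

# Unit suffixes named in the kernel_parameters description.
UNIT_MS = {"s": 1_000, "m": 60_000, "h": 3_600_000, "d": 86_400_000}


def expand_kernel_features(feature_name, kernel_parameters):
    """Hypothetical helper: map 'feature-<token>' names to durations in ms."""
    expanded = {}
    for token in kernel_parameters.split(","):
        token = token.strip()
        match = re.fullmatch(r"(\d+(?:\.\d+)?)([smhd])", token)
        if match is None:
            raise ValueError(f"unrecognized kernel parameter: {token!r}")
        amount, unit = match.groups()
        expanded[f"{feature_name}-{token}"] = float(amount) * UNIT_MS[unit]
    return expanded


kernel_features = expand_kernel_features("name_feature", "15m,4h,3d")
# {'name_feature-15m': 900000.0,
#  'name_feature-4h': 14400000.0,
#  'name_feature-3d': 259200000.0}
```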
- causmos.label(expr, name: str = '_label') zeenk.tql.column.Column
Creates a label column from a TQL expression. The expression provided to label() is expected to return a numeric value for all rows. If a non-numeric or NaN/infinite value is returned for any row, it will be replaced with the default label value of 0.0. Label columns will automatically be named “_label”. It is expected that a dataset will have at most one label column.
- Parameters
expr – A TQL expression string
name (str) – Optionally provide a name for the column. If not provided a default name will be assigned
- Returns
A Column object to be provided to select(...)
- Return type
zeenk.tql.column.Column
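The replacement rule for label values (and, with a default of 1.0, weight values) can be pictured with the following standalone sketch. The function coerce_label is hypothetical and is not how the package evaluates TQL expressions; it only reproduces the documented outcome that non-numeric, NaN, and infinite values become the default.

```python
import math


def coerce_label(value, default=0.0):
    """Hypothetical illustration of the documented default-value rule."""
    if not isinstance(value, (int, float)):
        return default              # non-numeric values (None, strings, ...)
    try:
        number = float(value)
    except OverflowError:
        return default              # integers too large for a float
    return number if math.isfinite(number) else default


raw_values = [1, 2.5, None, "abc", float("nan"), float("inf")]
labels = [coerce_label(v) for v in raw_values]
# [1.0, 2.5, 0.0, 0.0, 0.0, 0.0]

# The same rule with the documented weight default of 1.0.
weights = [coerce_label(v, default=1.0) for v in raw_values]
# [1.0, 2.5, 1.0, 1.0, 1.0, 1.0]
```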
- causmos.list_udfs(project_id: int) zeenk.tql.udf.UDF
Retrieves all UDFs defined for a Project
- Parameters
project_id (int) – ID of the Project to retrieve UDFs on
- Returns
All existing UDFs for the Project
- Return type
zeenk.tql.udf.UDF
- causmos.load_project(project_identifier, fail_if_not_found: bool = True) zeenk.tql.timelines.Project
Loads the specified project by name or ID, or throws TQLAnalysisException if it is not found and fail_if_not_found is True
- Parameters
project_identifier – The name (str) or ID (int) of the Project
fail_if_not_found (bool) – Whether to throw an exception if the Project is not found
- Returns
The Project with the specified name or ID
- Return type
zeenk.tql.timelines.Project
- causmos.load_query(id: int) zeenk.tql.query.Query
Loads the query from the given ID
- Parameters
id (int) – The ID of the query
- Returns
A new Query instance
- Return type
zeenk.tql.query.Query
- causmos.load_resultset(id: int) zeenk.tql.resultset.ResultSet
Loads the ResultSet with the given ID
- Parameters
id (int) – The ID of the ResultSet to be loaded
- Returns
The ResultSet
- Return type
zeenk.tql.resultset.ResultSet
- causmos.metadata(expr, name: Optional[str] = None) zeenk.tql.column.Column
Creates a metadata column from a TQL expression. Metadata columns will return the expression values “as is”, meaning they will not be post-processed with charset filtering or expansion of numerical columns. Metadata columns are also not subject to column filters. Metadata columns are the default column type and are often the correct choice for arbitrary datasets that are not specifically intended to be consumed by a ML training package.
- Parameters
expr – A TQL expression string
name (str) – Optionally provide a name for the column. If not provided a default name will be assigned
- Returns
A Column object to be provided to select(...)
- Return type
zeenk.tql.column.Column
- causmos.numerical(expr, name: Optional[str] = None, filters: Optional[dict] = None) zeenk.tql.column.FeatureColumn
Creates a numerical feature column from a TQL expression, optionally providing name and filters
- Parameters
expr – A TQL expression string
name (str) – Optionally provide a name for the column. If not provided a default name will be assigned
filters (dict) – Optionally provide a dictionary of column filter conditions for categorical or numerical filters
- Returns
A FeatureColumn object to be provided to select(...)
- Return type
zeenk.tql.column.FeatureColumn
- causmos.opportunity(where: str, value: str = '1', name: Optional[str] = None, project_id: Optional[str] = None, generate=None, dedup: Optional[dict] = None, dynamics: Optional[dict] = None, attributes: Optional[list] = None, vars: Optional[dict] = None)
Creates an opportunity event group from a TQL expression. EventGroup objects also have a number of additional parameters that can be customized for modeling.
- Parameters
where (str) – a TQL expression string which subsets the timeline to this event group.
value (str) – a TQL expression string with the return value of the event for modeling.
name (str) – name of the event group. If not provided a default name will be assigned.
project_id (str) – project id from which to pull the timelines for modeling.
generate (str) – a SamplingSettings object or simple expression that can be coerced to generate events.
dedup (dict) – for now, a dictionary with several keys to configure deduplication. Future: a DedupEvents object.
all (bool): uses all events or limits to the first event matching the “where” expression.
zero_value (bool): skips any events whose ‘value’ expression is zero.
where (str): skips any events which do not satisfy the expression for deduping.
window (str): skips any events which have a non-skipped event within ‘window’ milliseconds prior.
Default: {'all': False, 'zero_value': False, 'where': "TRUE", 'window': "0"}
Example: {'all': True, 'zero_value': True, 'where': "TRUE", 'window': "7 * 24 * 3600 * 1000"}
dynamics (dict) – for now, a dictionary with ‘scale’, ‘shape’, ‘filters’, and ‘epsilon’. Future: a DynamicEvents object.
scale (str): the time scale over which an event may have predictive power or causal effects (default ‘5m,1h,4h,1d,3d,7d’).
shape (str): the decay function shape (default ‘exponential’).
filters (dict): a default feature filters configuration dictionary.
epsilon (float): the threshold value at which SUM_OPPORTUNITIES() rounds to zero.
Example: {'scale': '5m,1h,4h,1d,3d,7d', 'shape': 'exponential', 'filters': {'min_total_count': 200}, 'epsilon': 1e-9}
attributes (list) – a list of expressions describing each event in the group for modeling heterogeneity. If columns with feature filter configurations are provided, the filter configurations are preserved when applying dynamic filters.
vars (dict) – a dictionary keyed with any additional ‘global_vars’, ‘timeline_vars’, or ‘event_vars’ required to evaluate any expression in the event group. See also Query.vars().
- Returns
an EventGroup object
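The roles of ‘scale’, ‘shape’, and ‘epsilon’ in the dynamics dictionary can be pictured with a toy decay sum. This is only a sketch under stated assumptions: the function decayed_opportunity_sum is hypothetical, it uses an unnormalized exponential weight exp(-elapsed/scale), and it applies epsilon to the final total. The actual SUM_OPPORTUNITIES() computation is not specified in this reference.

```python
import math

HOUR = 3_600_000  # milliseconds


def decayed_opportunity_sum(outcome_ts, opportunity_ts, scale_ms, epsilon=1e-9):
    """Hypothetical illustration of an exponentially decayed opportunity sum."""
    total = 0.0
    for ts in opportunity_ts:
        elapsed = outcome_ts - ts
        if elapsed < 0:
            continue  # opportunities after the outcome contribute nothing
        total += math.exp(-elapsed / scale_ms)  # 'exponential' shape
    # Assumption: totals below epsilon are rounded to exactly zero.
    return total if total >= epsilon else 0.0


# Two opportunities, one hour before and exactly at the outcome time.
recent = decayed_opportunity_sum(HOUR, [0, HOUR], scale_ms=HOUR)

# The same opportunities seen 30 days later have decayed below epsilon.
stale = decayed_opportunity_sum(30 * 24 * HOUR, [0, HOUR], scale_ms=HOUR)
```

With a one-hour scale, `recent` is exp(-1) + 1, while `stale` is rounded down to 0.0. Supplying several scales (for example ‘5m,1h,4h,1d,3d,7d’) would correspond to repeating this sum once per scale.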
- causmos.outcome(where, value='1', name=None, project_id=None, generate=None, dedup=None, dynamics=None, attributes=None, vars: Optional[dict] = None)
Creates an outcome event group from a TQL expression. EventGroup objects also have a number of additional parameters that can be customized for modeling.
- Parameters
where (str) – a TQL expression string which subsets the timeline to this event group.
value (str) – a TQL expression string with the return value of the event for modeling.
name (str) – name of the event group. If not provided a default name will be assigned.
project_id (str) – project id from which to pull the timelines for modeling.
generate (str) – a SamplingConf object or simple expression that can be coerced to generate events.
dedup (dict) – for now, a dictionary with several keys to configure deduplication. Future: a DedupEvents object.
all (bool): uses all events or limits to the first event matching the “where” expression.
zero_value (bool): skips any events whose ‘value’ expression is zero.
where (str): skips any events which do not satisfy the expression for deduping.
window (str): skips any events which have a non-skipped event within ‘window’ milliseconds prior.
Default: {'all': False, 'zero_value': False, 'where': "TRUE", 'window': "0"}
Example: {'all': True, 'zero_value': True, 'where': "TRUE", 'window': "7 * 24 * 3600 * 1000"}
dynamics (dict) – for now, a dictionary with ‘scale’, ‘shape’, ‘filters’, and ‘epsilon’. Future: a DynamicEvents object.
scale (str): the time scale over which an event may have predictive power or causal effects (default ‘5m,1h,4h,1d,3d,7d’).
shape (str): the decay function shape (default ‘exponential’).
filters (dict): a default feature filters configuration dictionary.
epsilon (float): the threshold value at which SUM_OPPORTUNITIES() rounds to zero.
Example: {'scale': '5m,1h,4h,1d,3d,7d', 'shape': 'exponential', 'filters': {'min_total_count': 200}, 'epsilon': 1e-9}
attributes (list) – a list of expressions describing each event in the group for modeling heterogeneity. If columns with feature filter configurations are provided, the filter configurations are preserved when applying dynamic filters.
vars (dict) – a dictionary keyed with any additional ‘global_vars’, ‘timeline_vars’, or ‘event_vars’ required to evaluate any expression in the event group. See also Query.vars().
- Returns
an EventGroup object
- causmos.query(project='lethe4')
A simple example query with event_metadata() and a limit of 10
- Parameters
project – The Project or project ID to run the Query against
- Returns
An example Query object
- Return type
- causmos.random_partition(partition_var: str = 'timeline.id', seed: str = '', shares: list = [0.8, 0.2], names: list = ['train', 'test'])
Use the arguments to create a random partition TQL expression like col("MD5_PARTITIONS(timeline.id, 'my hashing seed', [0.008, 0.002], ['train','test'])"). This can be used as a column or with .partition_by().
- Parameters
partition_var (str) – Single-line TQL subexpression
seed (str) – String to serve as the seed for the hash partitioning
shares (list) – List of the relative shares of each partition. Relative shares do not need to sum to one
names (list) – List of the names for each partition
- Returns
A metadata column with the random_partition expression for use in .partition_by() or as a column
- Return type
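As an illustration of hash-based partitioning with relative shares, the sketch below assigns each timeline ID to a named partition deterministically. It is an assumption-laden stand-in, not the MD5_PARTITIONS implementation: the function md5_partition, the way the seed and value are concatenated, and the mapping of the digest to the unit interval are all hypothetical.

```python
import hashlib
from bisect import bisect_right
from itertools import accumulate


def md5_partition(value, seed="", shares=(0.8, 0.2), names=("train", "test")):
    """Hypothetical helper: deterministic, seed-dependent partition assignment."""
    if len(shares) != len(names):
        raise ValueError("shares and names must have the same length")
    digest = hashlib.md5(f"{seed}{value}".encode("utf-8")).hexdigest()
    u = int(digest, 16) / 16 ** 32           # map the 128-bit digest to [0, 1)
    total = sum(shares)                       # relative shares need not sum to one
    bounds = list(accumulate(s / total for s in shares))
    index = min(bisect_right(bounds, u), len(names) - 1)
    return names[index]


timeline_ids = [f"timeline-{i}" for i in range(10_000)]
assignments = [md5_partition(t, seed="my hashing seed") for t in timeline_ids]
train_share = assignments.count("train") / len(assignments)
```

The same ID and seed always produce the same partition, so the split is reproducible across queries; changing the seed reshuffles the assignment.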
- causmos.select(*cols) zeenk.tql.query.Query
Create a new query object from one or more columns. Columns can be defined as TQL expression strings, or wrapped using one of the provided TQL column functions label(), weight(), tag(), categorical(), numerical(), or metadata(). Typically select(...) will immediately be followed by .from_timelines(...) or .from_events(...) during query construction.
- Parameters
cols – One or more TQL column objects or expressions
- Returns
A new tql.query.Query instance for further chaining/modification
- Return type
zeenk.tql.query.Query
- causmos.set_project(p_id)
- causmos.show_projects()
Shows the available projects
- causmos.tag(expr, name: str = '_tag') zeenk.tql.column.Column
Creates a tag column from a TQL expression. The expression provided to tag() is expected to return a non-null value for all rows. Typically this expression will uniquely identify the row, which is useful for debugging and tracing datasets later. Uniqueness is not required for the return value, but highly encouraged. Tag columns will automatically be named “_tag”. It is expected that a dataset will have at most one tag column.
- Parameters
expr – A TQL expression string that returns a unique identifier for the row
name (str) – Optionally provide a name for the column. If not provided a default name will be assigned
- Returns
A Column object to be provided to select(...)
- Return type
zeenk.tql.column.Column
- causmos.treatment(where, value='1', name=None, project_id=None, generate=None, dedup=None, dynamics=None, attributes=None, vars: Optional[dict] = None)
Creates a treatment event group from a TQL expression. EventGroup objects also have a number of additional parameters that can be customized for modeling.
- Parameters
where (str) – a TQL expression string which subsets the timeline to this event group.
value (str) – a TQL expression string with the return value of the event for modeling.
name (str) – name of the event group. If not provided a default name will be assigned.
project_id (str) – project id from which to pull the timelines for modeling.
generate (str) – a SamplingSettings object or simple expression that can be coerced to generate events.
dedup (dict) – for now, a dictionary with several keys to configure deduplication. Future: a DedupEvents object.
all (bool): uses all events or limits to the first event matching the “where” expression.
zero_value (bool): skips any events whose ‘value’ expression is zero.
where (str): skips any events which do not satisfy the expression for deduping.
window (str): skips any events which have a non-skipped event within ‘window’ milliseconds prior.
Default: {'all': False, 'zero_value': False, 'where': "TRUE", 'window': "0"}
Example: {'all': True, 'zero_value': True, 'where': "TRUE", 'window': "7 * 24 * 3600 * 1000"}
dynamics (dict) – for now, a dictionary with ‘scale’, ‘shape’, ‘filters’, and ‘epsilon’. Future: a DynamicEvents object.
scale (str): the time scale over which an event may have predictive power or causal effects (default ‘5m,1h,4h,1d,3d,7d’).
shape (str): the decay function shape (default ‘exponential’).
filters (dict): a default feature filters configuration dictionary.
epsilon (float): the threshold value at which SUM_OPPORTUNITIES() rounds to zero.
Example: {'scale': '5m,1h,4h,1d,3d,7d', 'shape': 'exponential', 'filters': {'min_total_count': 200}, 'epsilon': 1e-9}
attributes (list) – a list of expressions describing each event in the group for modeling heterogeneity. If columns with feature filter configurations are provided, the filter configurations are preserved when applying dynamic filters.
vars (dict) – a dictionary keyed with any additional ‘global_vars’, ‘timeline_vars’, or ‘event_vars’ required to evaluate any expression in the event group. See also Query.vars().
- Returns
an EventGroup object
- causmos.update_udf(project_id: int, *function_src)
Updates UDF(s) for the specified Project
- Parameters
project_id (int) – ID of the Project to upload UDFs to
function_src – One or more function source strings
Throws TQLAnalysisException if a function does not yet exist in the Project
- causmos.upload_udf(project_id: int, *function_src)
Uploads udf(s) to the specified Project
- Parameters
project_id (int) – ID of the Project to upload UDFs to
function_src – One or more function source strings
Throws TQLAnalysisException if a function already exists in the Project
- causmos.validate_udf(project_id: int, *function_src)
Validates UDF(s)
- Parameters
project_id (int) – ID of the Project to validate UDFs on
function_src – One or more function source strings
Throws TQLAnalysisException and prints the error message and location if there is any compilation error in a UDF
- causmos.weight(expr, name: str = '_weight') zeenk.tql.column.Column
Creates a weight column from a TQL expression. The expression provided to weight() is expected to return a numeric value for all rows. If a non-numeric or NaN/infinite value is returned for any row, it will be replaced with the default weight value of 1.0. Weight columns will automatically be named “_weight”. It is expected that a dataset will have at most one weight column.
- Parameters
expr – A TQL expression string that returns a numeric value
name (str) – Optionally provide a name for the column. If not provided a default name will be assigned
- Returns
A Column object to be provided to select(...)
- Return type
zeenk.tql.column.Column
Subpackages
Submodules
causmos.causmos_model module
- class causmos.causmos_model.Causmos(project_id: Optional[int] = None, name: Optional[str] = None, author: Optional[str] = None, type: Optional[str] = None, outcome: Optional[causmos.event_group.EventGroup] = None, treatment: Optional[causmos.event_group.EventGroup] = None, opportunity: Optional[causmos.event_group.EventGroup] = None, intervention: Optional[causmos.event_group.EventGroup] = None, predictor: Optional[causmos.event_group.EventGroup] = None, baseline: Optional[causmos.event_group.EventGroup] = None)
Bases: object
Causmos Model Factory: Estimand & Sampling
Provide a configuration for an outcome.
(Optional) Provide a configuration for treatment, opportunity, intervention, and/or predictor/baseline.
Choose a specific sampling methodology for model training/estimation.
Causmos is the core modeling question:
my_causmos.pre_post() --> Modeling instance --> Query/Dataset --> Training --> Publish --> Metrics
my_causmos.panel() --> Modeling instance --> Query/Dataset…
my_causmos.uplift() --> Modeling instance…
Create a Causmos model configuration object.
- Parameters
project_id (int) – Use timelines from this project_id.
name (str) – Name of the Causmos model that will be published.
author (str) – Author of the Causmos model.
type (str) – Causmos model type.
outcome (EventGroup) – An EventGroup object of type=='OUTCOME' defining the outcome events in the model.
treatment (EventGroup) – An EventGroup object of type=='TREATMENT' defining the treatment events in the model.
opportunity (EventGroup) – An EventGroup object of type=='OPPORTUNITY' defining the opportunity events.
intervention (EventGroup) – An EventGroup object of type=='INTERVENTION' defining the intervention events.
predictor (EventGroup) – An EventGroup object of type=='EVENT_GROUP' defining additional predictor events.
baseline (EventGroup) – An EventGroup object of type=='EVENT_GROUP' defining additional baseline events.
- MODEL_TYPES = ['TREATED', 'UPLIFT', 'AB', 'AB_TOT', 'PRE_POST', 'DIFF_IN_DIFF', 'AB_DIFF', 'AB_TOT_DIFF', 'PRE_POST_PANEL', 'DIFF_IN_DIFF_PANEL', 'AB_DIFF_PANEL', 'AB_TOT_DIFF_PANEL', 'PRE_POST_CONTINUOUS', 'DIFF_IN_DIFF_CONTINUOUS', 'AB_DIFF_CONTINUOUS', 'AB_TOT_DIFF_CONTINUOUS']
- PANEL_TYPES = ('SIMPLE', 'CROSS', 'PANEL2', 'PANEL', 'CONTINUOUS')
- ab(duration)
Create a cross-sectional AB model. Given Outcome event group, compute the effect of an Intervention, assuming that the Intervention also defines Treatment.
- ab_diff(duration)
Create a two-time-period AB difference model. Given Outcome and Intervention event groups, estimate the effect of Treatment = Intervention, subsetting the sample to timelines with at least one intervention.
- ab_diff_continuous(continuous_rescale: Optional[float] = None)
- ab_diff_panel(duration)
- ab_tot(duration)
Create a cross-sectional AB Treatment-on-the-Treated (ToT) model. Given Outcome and Intervention event groups, estimate the effect of Treatment induced by the Intervention.
- ab_tot_diff(duration)
- ab_tot_diff_continuous(continuous_rescale: Optional[float] = None)
- ab_tot_diff_panel(duration)
- baseline(my_baseline)
Specify the baseline using a Baseline EventGroup object.
- Parameters
my_baseline (EventGroup) – A Baseline EventGroup object specifying predictor/baseline events.
- cluster_by(cluster_expression=None)
Define the cluster variable to return in order to compute clustered standard errors, either analytically or using the Bayesian bootstrap. The default is to use timeline.id.
- continuous()
- cross(post, duration, cohorts)
Create a model using generic cross-sectional panel sampling.
- diff_in_diff(duration)
Create a two-time-period Difference-in-Differences model. Given Outcome and Opportunity event groups, estimate the effect of Treatment, subsetting the sample to timelines with at least one opportunity.
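For orientation, the textbook two-period difference-in-differences estimate can be worked by hand as in the sketch below. This is the generic estimator, not necessarily what diff_in_diff(...) computes internally; the function name and the numbers are illustrative assumptions.

```python
def diff_in_diff_estimate(treated_pre, treated_post, control_pre, control_post):
    """Textbook two-period DiD: change in the treated group minus change in the control group."""
    return (treated_post - treated_pre) - (control_post - control_pre)


# Made-up mean outcomes per timeline in each period (illustrative only).
effect = diff_in_diff_estimate(
    treated_pre=2.0, treated_post=3.5,
    control_pre=2.0, control_post=2.5,
)
print(effect)  # 1.0
```

The control group's change (0.5) stands in for what would have happened to the treated group without treatment, so only the remaining 1.0 is attributed to treatment.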
- diff_in_diff_continuous(continuous_rescale: Optional[float] = None)
- diff_in_diff_panel(duration)
- get_baseline()
- get_cluster_by()
- get_intervention()
- get_name()
- get_opportunity()
- get_outcome()
- get_partition_by()
- get_predictor()
- get_treatment()
- get_type()
- intervention(my_intervention)
Specify the intervention using an Intervention EventGroup object.
- Parameters
my_intervention (EventGroup) – An Intervention EventGroup object specifying the intervention events.
- json()
- model(model_type)
- name(name)
Set the name of the Causmos model.
- observables(observables)
Set the model’s observables. This determines the groups of features that are generated for treatment models.
- Parameters
observables (list of str) – List of which SUM_OPPORTUNITIES() feature transformations you would like to apply to the ‘opportunity_features’. List elements should be drawn from ['KD', 'BKD', 'GKD', 'NKD']:
KD: Treatment (e.g., Impression/Dose); always included
BKD: Opportunity (e.g., Bids)
GKD: Ghost bids / Propensity Score / Expected win rate (requires ‘treatment_model’)
NKD: Control function / Endogeneity / Unexpected win rate (requires ‘treatment_model’)
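A minimal sketch of checking an observables list before passing it to observables(...). The helper normalize_observables is hypothetical and not part of causmos; it assumes only what is stated above, namely that elements come from KD, BKD, GKD, and NKD and that KD is always included.

```python
ALLOWED_OBSERVABLES = ("KD", "BKD", "GKD", "NKD")


def normalize_observables(observables):
    """Hypothetical helper (not part of causmos): validate, dedupe, and order the list."""
    unknown = [o for o in observables if o not in ALLOWED_OBSERVABLES]
    if unknown:
        raise ValueError(f"unknown observables: {unknown}")
    requested = set(observables) | {"KD"}  # KD is documented as always included
    return [o for o in ALLOWED_OBSERVABLES if o in requested]


print(normalize_observables(["BKD"]))  # ['KD', 'BKD']
```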
- opportunity(my_opportunity)
Specify the opportunity using an Opportunity EventGroup object.
- Parameters
my_opportunity (EventGroup) – An Opportunity EventGroup object specifying the opportunity events.
- outcome(my_outcome: causmos.event_group.EventGroup)
Specify the outcome using an Outcome EventGroup object.
- Parameters
my_outcome (EventGroup) – An Outcome EventGroup object specifying the outcome events.
- panel()
- panel2(pre, post, duration, cohorts)
Create a model using generic two-time-period panel sampling.
- Parameters
pre – An expression that generates a list of pre-period samples.
post – An expression that generates a list of post-period samples.
duration – Time, in milliseconds, over which to aggregate features.
cohorts – A filter expression that defines the cohorts to be included.
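Because duration is expressed in milliseconds, a small conversion helper can prevent arithmetic slips. days_to_ms is a hypothetical helper for illustration and is not part of causmos.

```python
MS_PER_DAY = 24 * 3600 * 1000  # milliseconds in one day


def days_to_ms(days):
    """Convert a number of days to whole milliseconds."""
    return int(days * MS_PER_DAY)


print(days_to_ms(7))  # 604800000
```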
- partition_by(partition_key_expression=None)
Define how to partition a model’s records. Default is to randomly partition using the random_partition() columnset’s defaults.
- pre_post(duration)
Create a two-time-period Pre-Post model. Given an Outcome event group, estimate the effect of Treatment, subsetting the sample to timelines with at least one treatment.
- pre_post_continuous(continuous_rescale: Optional[float] = None)
- pre_post_panel(duration)
- predictor(my_predictor)
Specify predictors using a Predictor EventGroup object.
- Parameters
my_predictor (EventGroup) – A Predictor EventGroup object specifying predictor/baseline events.
- simple(duration)
(Non-causal) Prediction model with time-bucket sampling of ‘duration’. Compatible with PREDICT(‘SIMPLE_MODEL_NAME’) but not with CAUSAL_EFFECT(‘SIMPLE_MODEL_NAME’) as there is no definition of treatment, only a definition of an Outcome event group.
For example, a daily sales-volume model where the level of sampling would be determined by the end of the day and outcomes.
- simple_continuous()
(Non-causal) Prediction model with event-level sampling. Compatible with PREDICT(‘SIMPLE_MODEL_NAME’) but not with CAUSAL_EFFECT(‘SIMPLE_MODEL_NAME’) as there is no definition of treatment, only a definition of an Outcome event group.
For example, a click-through rate (CTR) model where the level of sampling would be impression and the label would be a field or look-forward to an event recording a click event related to the impression event.
- tql()
Build and return the TQL query based on the choice of analysis method’s panel sampling.
- treated(duration)
Create a cross-sectional treated model. This will collect all users who are treated and create a model to predict their time-aggregated Outcomes.
- treatment(my_treatment)
Specify the treatment using a Treatment EventGroup object.
- Parameters
my_treatment (EventGroup) – A Treatment EventGroup object specifying the treatment events.
- type(type)
Set the model Analysis Method. See self.MODEL_TYPES.
- uplift(duration)
Create a cross-sectional uplift model. Given Outcome and Opportunity event groups, compute the effect of Treatment events by comparing timelines.
- class causmos.causmos_model.Continuous(start: Optional[str] = None, end: Optional[str] = None, interval: Optional[str] = None, sample: Optional[str] = None, cohorts: Optional[str] = None, heterogeneous: bool = False, truncate: bool = True, balance: bool = False, initialize: Optional[str] = None, trim: Optional[str] = None, continuous_rescale: Optional[float] = None)
Bases:
causmos.causmos_model.Panel
Create a continuous-time panel object. This usually includes multiple samples for outcome events (e.g., “positives”) and samples for non-outcome events (e.g., “negatives”), which reflect the measure of time against which the positives are compared to compute conversion rates. For example, if the measure of non-outcome samples is 10 days and there are 5 outcome events, then the average outcome rate is 0.5 outcomes per day. A model would seek to explain why certain times and contexts lead to higher or lower outcome rates; a causal model, in particular, would quantify the effects of treatment events on the outcomes.
See also help(Panel).
Create a Panel dataset object.
- Parameters
start (str) – A TQL string expression defining the start of a timeline’s sampling.
end (str) – A TQL string expression defining the end of a timeline’s sampling.
interval (str) – A TQL string expression defining the uniform spacing of the timeline’s sampling.
sample (str) – A TQL string expression defining the sample generator expression (list of timestamps or events).
duration (str) – A TQL string expression defining the duration of each of the sample timestamps/events.
cohorts (str) – A way to include additional cohorts and define their sampling time (e.g., untreated timelines do not have a “treatment” and so cannot be sampled pre/post treatment).
heterogeneous (bool) – Include the columns for a heterogeneous effects model.
truncate (bool) – Rescale the sample’s records to account for the observation window. For example, if a timeline’s max timestamp is 4 days prior to the sample timestamp but the duration is 7 days, adjust the timestamp and duration to correspond to 4 days prior and 3 days, respectively.
balance (bool) – If the sample’s timestamp is outside [min_ts, max_ts] for a timeline, drop record (e.g., daily samples, but not all timelines span all sampled days).
initialize (str) – A TQL string expression defining the duration to trim at the start of a timeline. For example, if the sample’s timestamp is outside [min_ts + initialize, max_ts] for a timeline, drop the record.
trim (str) – A TQL string expression defining the duration to trim at both the start and end of a timeline. For example, if the sample’s timestamp is outside [min_ts + trim, max_ts - trim] for a timeline, drop the record.
- Returns
An object of class Panel for building longitudinal or panel datasets.
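The outcome-rate arithmetic in the Continuous class description can be checked directly. The function below is an illustrative sketch, not part of causmos; its name and arguments are assumptions.

```python
def average_outcome_rate(outcome_count, exposure_days):
    """Outcomes per day: number of positives divided by the time measure of the negatives."""
    if exposure_days <= 0:
        raise ValueError("exposure_days must be positive")
    return outcome_count / exposure_days


# 5 outcome events measured against 10 days of non-outcome samples.
print(average_outcome_rate(5, 10))  # 0.5
```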
- continuous_rescale(continuous_rescale)
Continuous models require a rescale parameter to remove finite-sample bias.
- Parameters
continuous_rescale (float) – This rescale parameter should be set to a value such that the sum of the negative weights is much larger than the sum of the positive weights divided by continuous_rescale.
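One way to sanity-check a candidate value is sketched below. The helper and its margin argument are hypothetical: the documentation only says “much larger” without quantifying it, so the factor of 100 is an arbitrary assumption rather than a causmos default.

```python
def rescale_looks_adequate(negative_weights, positive_weights, continuous_rescale, margin=100.0):
    """Hypothetical check: sum(negatives) >= margin * sum(positives) / continuous_rescale."""
    if continuous_rescale <= 0:
        raise ValueError("continuous_rescale must be positive")
    scaled_positives = sum(positive_weights) / continuous_rescale
    return sum(negative_weights) >= margin * scaled_positives


negatives = [10.0] * 5  # made-up weights summing to 50
positives = [1.0] * 5   # made-up weights summing to 5
print(rescale_looks_adequate(negatives, positives, continuous_rescale=1.0))     # False
print(rescale_looks_adequate(negatives, positives, continuous_rescale=1000.0))  # True
```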
- get_continuous_rescale()
- class causmos.causmos_model.Cross(post, duration, cohorts: Optional[str] = None, heterogeneous: bool = False, truncate: bool = True, balance: bool = False, initialize: Optional[str] = None, trim: Optional[str] = None)
Bases:
causmos.causmos_model.Panel
Create a Cross-sectional panel object. This usually implies creating one sample per timeline. See also help(Panel).
Create a Panel dataset object.
- Parameters
start (str) – A TQL string expression defining the start of a timeline’s sampling.
end (str) – A TQL string expression defining the end of a timeline’s sampling.
interval (str) – A TQL string expression defining the uniform spacing of the timeline’s sampling.
sample (str) – A TQL string expression defining the sample generator expression (list of timestamps or events).
duration (str) – A TQL string expression defining the duration of each of the sample timestamps/events.
cohorts (str) – A way to include additional cohorts and define their sampling time (e.g., untreated timelines do not have a “treatment” and so cannot be sampled pre/post treatment).
heterogeneous (bool) – Include the columns for a heterogeneous effects model.
truncate (bool) – Rescale the sample’s records to account for the observation window. For example, if a timeline’s max timestamp is 4 days prior to the sample timestamp but the duration is 7 days, adjust the timestamp and duration to correspond to 4 days prior and 3 days, respectively.
balance (bool) – If the sample’s timestamp is outside [min_ts, max_ts] for a timeline, drop record (e.g., daily samples, but not all timelines span all sampled days).
initialize (str) – A TQL string expression defining the duration to trim at the start of a timeline. For example, if the sample’s timestamp is outside [min_ts + initialize, max_ts] for a timeline, drop the record.
trim (str) – A TQL string expression defining the duration to trim at both the start and end of a timeline. For example, if the sample’s timestamp is outside [min_ts + trim, max_ts - trim] for a timeline, drop the record.
- Returns
An object of class Panel for building longitudinal or panel datasets.
- class causmos.causmos_model.DiscretePanel(start: Optional[str] = None, end: Optional[str] = None, interval: Optional[str] = None, sample: Optional[str] = None, duration: Optional[str] = None, cohorts: Optional[str] = None, heterogeneous: bool = False, truncate: bool = True, balance: bool = False, initialize: Optional[str] = None, trim: Optional[str] = None)
Bases:
causmos.causmos_model.Panel
A DiscretePanel object defines a sampling configuration for a longitudinal/panel model. See also help(Panel).
Create a DiscretePanel object.
- Parameters
start (str) –
end (str) –
interval (str) –
sample (str) –
duration (str) –
cohorts (str) –
heterogeneous (bool) –
truncate (bool) –
balance (bool) –
initialize (str) –
trim (str) –
- Returns
A DiscretePanel class object.
- class causmos.causmos_model.Panel(start: Optional[str] = None, end: Optional[str] = None, interval: Optional[str] = None, sample: Optional[str] = None, duration: Optional[str] = None, cohorts: Optional[str] = None, heterogeneous: bool = False, truncate: bool = True, balance: bool = False, initialize: Optional[str] = None, trim: Optional[str] = None)
Bases:
object
Create a Panel dataset object.
- Parameters
start (str) – A TQL string expression defining the start of a timeline’s sampling.
end (str) – A TQL string expression defining the end of a timeline’s sampling.
interval (str) – A TQL string expression defining the uniform spacing of the timeline’s sampling.
sample (str) – A TQL string expression defining the sample generator expression (list of timestamps or events).
duration (str) – A TQL string expression defining the duration of each of the sample timestamps/events.
cohorts (str) – A way to include additional cohorts and define their sampling time (e.g., untreated timelines do not have a “treatment” and so cannot be sampled pre/post treatment).
heterogeneous (bool) – Include the columns for a heterogeneous effects model.
truncate (bool) – Rescale the sample’s records to account for the observation window. For example, if a timeline’s max timestamp is 4 days prior to the sample timestamp but the duration is 7 days, adjust the timestamp and duration to correspond to 4 days prior and 3 days, respectively.
balance (bool) – If the sample’s timestamp is outside [min_ts, max_ts] for a timeline, drop record (e.g., daily samples, but not all timelines span all sampled days).
initialize (str) – A TQL string expression defining the duration to trim at the start of a timeline. For example, if the sample’s timestamp is outside [min_ts + initialize, max_ts] for a timeline, drop the record.
trim (str) – A TQL string expression defining the duration to trim at both the start and end of a timeline. For example, if the sample’s timestamp is outside [min_ts + trim, max_ts - trim] for a timeline, drop the record.
- Returns
An object of class Panel for building longitudinal or panel datasets.
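The truncate example in the parameter list can be reproduced with plain arithmetic. The sketch below assumes that a sample's window looks back duration milliseconds from its timestamp, which is one reading of the description; truncate_sample is a hypothetical helper, not a causmos function.

```python
DAY_MS = 24 * 3600 * 1000  # milliseconds in one day


def truncate_sample(sample_ts, duration, timeline_max_ts):
    """Clip a look-back window so that it ends no later than the timeline's last timestamp."""
    if sample_ts <= timeline_max_ts:
        return sample_ts, duration  # window is fully observed, nothing to adjust
    unobserved = sample_ts - timeline_max_ts
    return timeline_max_ts, max(duration - unobserved, 0)


sample_ts = 10 * DAY_MS
ts, duration = truncate_sample(sample_ts, 7 * DAY_MS, sample_ts - 4 * DAY_MS)
print(ts // DAY_MS, duration // DAY_MS)  # 6 3
```

The adjusted timestamp lands 4 days before the original sample, and the 7-day duration shrinks to the 3 days that were actually observed.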
- daily()
Set the panel’s time sampling to daily, given start, end, and interval.
- get_heterogeneous()
- get_homogenous()
- get_options()
- heterogeneous(heterogeneous=True)
Configure a Causmos model to use event group ‘attributes’ to build additional columns.
- homogeneous(homogeneous=True)
Configure a Causmos model to ignore event group ‘attributes’ (only use the event group’s value or a constant) when building the model’s feature columns.
- hourly()
Set the panel’s time sampling to hourly, given start, end, and interval.
- options(truncate: Optional[bool] = None, balance: Optional[bool] = None, initialize: Optional[str] = None, trim: Optional[str] = None)
Update panel dataset options.
- Parameters
truncate (bool) – Rescale the sample’s records to account for the observation window. For example, if a timeline’s max timestamp is 4 days prior to the sample timestamp but the duration is 7 days, adjust the timestamp and duration to correspond to 4 days prior and 3 days, respectively.
balance (bool) – If the sample’s timestamp is outside [min_ts, max_ts] for a timeline, drop record (e.g., daily samples, but not all timelines span all sampled days).
initialize (str) – A TQL string expression defining the duration to trim at the start of a timeline. For example, if the sample’s timestamp is outside [min_ts + initialize, max_ts] for a timeline, drop the record.
trim (str) – A TQL string expression defining the duration to trim at both the start and end of a timeline. For example, if the sample’s timestamp is outside [min_ts + trim, max_ts - trim] for a timeline, drop the record.
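The balance, initialize, and trim options can be pictured as a keep-or-drop filter, sketched below. It reads each bracketed range as the window a sample must fall inside to be kept, and it uses plain millisecond integers where the real options take TQL expression strings. keep_sample is a hypothetical helper, not a causmos function.

```python
def keep_sample(sample_ts, min_ts, max_ts, balance=False, initialize=0, trim=0):
    """Return True if the record is kept under the assumed reading of the options."""
    if balance and not (min_ts <= sample_ts <= max_ts):
        return False  # sample lies outside the timeline's observed span
    if initialize and not (min_ts + initialize <= sample_ts <= max_ts):
        return False  # sample falls inside the start-up period
    if trim and not (min_ts + trim <= sample_ts <= max_ts - trim):
        return False  # sample falls too close to either end
    return True


# Timeline observed from t=0 to t=10 (arbitrary units for illustration).
print(keep_sample(11, 0, 10, balance=True))  # False
print(keep_sample(1, 0, 10, initialize=2))   # False
print(keep_sample(5, 0, 10, trim=2))         # True
```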
- timestamp_range(start, end, interval)
Set the panel’s time sampling to evenly spaced intervals, given start, end, and interval.
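As a rough picture of evenly spaced sampling, the sketch below generates timestamps from plain integers. Whether the end point is included is not stated in this reference, so the half-open range [start, end) is an assumption; evenly_spaced_timestamps is a hypothetical helper, and the real method takes TQL expressions.

```python
DAY_MS = 24 * 3600 * 1000  # milliseconds in one day


def evenly_spaced_timestamps(start, end, interval):
    """Timestamps start, start + interval, ... strictly below end (assumed half-open)."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    return list(range(start, end, interval))


print(evenly_spaced_timestamps(0, 3 * DAY_MS, DAY_MS))  # [0, 86400000, 172800000]
```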
- weekly()
Set the panel’s time sampling to weekly, given start, end, and interval.
- class causmos.causmos_model.Panel2(pre, post, duration: Optional[str] = None, cohorts: Optional[str] = None, heterogeneous: bool = False, truncate: bool = True, balance: bool = False, initialize: Optional[str] = None, trim: Optional[str] = None)
Bases:
causmos.causmos_model.Panel
Create a two-time-period panel object. This usually implies a “pre” period and a “post” period, relative to some event. See also help(Panel).
Create a Panel dataset object.
- Parameters
start (str) – A TQL string expression defining the start of a timeline’s sampling.
end (str) – A TQL string expression defining the end of a timeline’s sampling.
interval (str) – A TQL string expression defining the uniform spacing of the timeline’s sampling.
sample (str) – A TQL string expression defining the sample generator expression (list of timestamps or events).
duration (str) – A TQL string expression defining the duration of each of the sample timestamps/events.
cohorts (str) – A way to include additional cohorts and define their sampling time (e.g., untreated timelines do not have a “treatment” and so cannot be sampled pre/post treatment).
heterogeneous (bool) – Include the columns for a heterogeneous effects model.
truncate (bool) – Rescale the sample’s records to account for the observation window. For example, if a timeline’s max timestamp is 4 days prior to the sample timestamp but the duration is 7 days, adjust the timestamp and duration to correspond to 4 days prior and 3 days, respectively.
balance (bool) – If the sample’s timestamp is outside [min_ts, max_ts] for a timeline, drop record (e.g., daily samples, but not all timelines span all sampled days).
initialize (str) – A TQL string expression defining the duration to trim at the start of a timeline. For example, if the sample’s timestamp is outside [min_ts + initialize, max_ts] for a timeline, drop the record.
trim (str) – A TQL string expression defining the duration to trim at both the start and end of a timeline. For example, if the sample’s timestamp is outside [min_ts + trim, max_ts - trim] for a timeline, drop the record.
- Returns
An object of class Panel for building longitudinal or panel datasets.
- class causmos.causmos_model.Simple(sample, duration, cohorts: Optional[str] = None, heterogeneous: bool = False, truncate: bool = True, balance: bool = False, initialize: Optional[str] = None, trim: Optional[str] = None)
Bases:
causmos.causmos_model.Panel
Create a Panel dataset object.
- Parameters
start (str) – A TQL string expression defining the start of a timeline’s sampling.
end (str) – A TQL string expression defining the end of a timeline’s sampling.
interval (str) – A TQL string expression defining the uniform spacing of the timeline’s sampling.
sample (str) – A TQL string expression defining the sample generator expression (list of timestamps or events).
duration (str) – A TQL string expression defining the duration of each of the sample timestamps/events.
cohorts (str) – A way to include additional cohorts and define their sampling time (e.g., untreated timelines do not have a “treatment” and so cannot be sampled pre/post treatment).
heterogeneous (bool) – Include the columns for a heterogeneous effects model.
truncate (bool) – Rescale the sample’s records to account for the observation window. For example, if a timeline’s max timestamp is 4 days prior to the sample timestamp but the duration is 7 days, adjust the timestamp and duration to correspond to 4 days prior and 3 days, respectively.
balance (bool) – If the sample’s timestamp is outside [min_ts, max_ts] for a timeline, drop record (e.g., daily samples, but not all timelines span all sampled days).
initialize (str) – A TQL string expression defining the duration to trim at the start of a timeline. For example, if the sample’s timestamp is outside [min_ts + initialize, max_ts] for a timeline, drop the record.
trim (str) – A TQL string expression defining the duration to trim at both the start and end of a timeline. For example, if the sample’s timestamp is outside [min_ts + trim, max_ts - trim] for a timeline, drop the record.
- Returns
An object of class Panel for building longitudinal or panel datasets.
- causmos.causmos_model.causmos(project_id: Optional[int] = None, name: Optional[str] = None, author: Optional[str] = None)
Initialize a Causmos model. See the Causmos class for more detailed documentation.
- Parameters
project_id (int) – Use timelines from this project_id.
name (str) – Name of the Causmos model that will be published.
author (str) – Author of the Causmos model.
- Returns
A Causmos model configuration object.
causmos.event_group module
- class causmos.event_group.EventGroup(where, value='1', name=None, type='EVENT_GROUP', project_id=None, generate=None, dedup=None, dynamics=None, attributes=None, vars: Optional[dict] = None)
Bases:
object
Creates an event group from TQL expressions. EventGroup objects also have a number of additional parameters that can be customized for modeling.
Create a new EventGroup object.
- Parameters
where (str) – a TQL expression string which subsets the timeline to this event group.
value (str) – a TQL expression string with the return value of the event for modeling.
name (str) – name of the event group. If not provided a default name will be assigned.
type (str) – optionally provide a type for the event group. If not provided, the event group type will be EVENT_GROUP.
project_id (str) – project id from which to pull the timelines for modeling.
generate (str) – a SamplingSettings object or simple expression that can be coerced to generate events.
dedup (dict) – for now, a dictionary with several keys to configure deduplication. Future: a DedupEvents object.
all (bool): uses all events or limits to the first event matching the “where” expression.
zero_value (bool): skips any events whose ‘value’ expression is zero.
where (str): skips any events which do not satisfy the expression for deduping.
window (str): skips any events which have a non-skipped event within ‘window’ milliseconds prior.
Default: {'all': False, 'zero_value': False, 'where': "TRUE", 'window': "0"}
Example: {'all': True, 'zero_value': True, 'where': "TRUE", 'window': "7 * 24 * 3600 * 1000"}
dynamics (dict) – for now, a dictionary with ‘scale’, ‘shape’, ‘filters’, and ‘epsilon’. Future: a DynamicEvents object.
scale (str): the time scale over which an event may have predictive power or causal effects (default ‘5m,1h,4h,1d,3d,7d’).
shape (str): the decay function shape (default ‘exponential’).
filters (dict): A default feature filters configuration dictionary.
epsilon (float): The threshold value at which SUM_OPPORTUNITIES() rounds to zero.
Example: {'scale': '5m,1h,4h,1d,3d,7d', 'shape': 'exponential', 'filters': {'min_total_count': 200}, 'epsilon': 1e-9}
attributes (list) – a list of expressions describing each event in the group for modeling heterogeneity. If columns with feature filter configurations are provided, the filter configurations are preserved when applying dynamic filters.
vars (dict) – a dictionary keyed with any additional ‘global_vars’, ‘timeline_vars’, or ‘event_vars’ required to evaluate any expression in the event group. See also Query.vars().
- Returns
an EventGroup object
- attributes(attributes)
Set the event group’s numerical() and categorical() attribute columns. These are used for heterogeneous effects and subgroup/slicing analyses of predictions and causal effects.
- dedup(dedup)
Set the dedup configuration. Default is {'all': True, 'zero_value': True, 'where': "TRUE", 'window': "0"}. See help(EventGroup).
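To make the dedup keys concrete, the sketch below applies one reading of them to a list of (timestamp_ms, value) pairs in pure Python. It is not the causmos implementation: the ‘where’ key is omitted, ‘window’ is a plain number rather than a TQL expression string, zero_value=True is taken to mean that zero-valued events are skipped, and “within” is treated as strictly less than the window. dedup_events is a hypothetical name.

```python
DAY_MS = 24 * 3600 * 1000  # milliseconds in one day


def dedup_events(events, config):
    """events: (timestamp_ms, value) tuples sorted by timestamp; returns the kept events."""
    kept = []
    for ts, value in events:
        if config["zero_value"] and value == 0:
            continue  # assumed reading: zero-valued events are skipped
        if kept and ts - kept[-1][0] < config["window"]:
            continue  # a kept event lies within `window` ms before this one
        kept.append((ts, value))
        if not config["all"]:
            break  # only the first matching event is used
    return kept


events = [(0, 1), (3 * DAY_MS, 1), (8 * DAY_MS, 0), (9 * DAY_MS, 2)]
weekly = {"all": True, "zero_value": True, "window": 7 * DAY_MS}
print(dedup_events(events, weekly))  # [(0, 1), (777600000, 2)]
```

With a 7-day window, the event on day 3 is dropped because the kept event on day 0 is too recent, the zero-valued event on day 8 is skipped, and the event on day 9 is kept.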
- describe()
Describe the event group using TQL’s query.describe().
- dynamics(dynamics)
Set the dynamics configuration. Default is {'scale': '3h,1d,7d', 'shape': 'exp', 'filters': {'min_total_count': 200}, 'epsilon': 1e-9}. See help(EventGroup).
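The ‘scale’ and ‘epsilon’ keys can be pictured with the rough sketch below. Two things are assumptions rather than documented behavior: the unit letters m, h, and d are read as minutes, hours, and days, and the exponential shape is modeled as exp(-elapsed / scale). How SUM_OPPORTUNITIES() actually weights events is not specified in this reference, and both function names are hypothetical.

```python
import math

UNIT_MS = {"m": 60_000, "h": 3_600_000, "d": 86_400_000}  # assumed unit meanings


def parse_scales(scale):
    """Turn a string such as '5m,1h' into a list of millisecond values."""
    return [int(part[:-1]) * UNIT_MS[part[-1]] for part in scale.split(",")]


def decayed_weight(elapsed_ms, scale_ms, epsilon=1e-9):
    """Assumed exponential decay, rounded to zero once it falls below epsilon."""
    weight = math.exp(-elapsed_ms / scale_ms)
    return weight if weight >= epsilon else 0.0


scales = parse_scales("5m,1h,4h,1d,3d,7d")
print(scales[0], scales[-1])  # 300000 604800000
```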
- generate(generate)
Attach a list of generate_events() configurations.
- Parameters
generate (list of SamplingConf) – List of generate_events sampling configurations.
- get_attributes()
- get_dedup()
- get_dynamics()
- get_generate()
- get_name()
- get_project_id()
- get_timeline_vars()
- get_type()
- get_value()
- get_vars()
- get_where()
- get_where_event()
- json()
Format this object into JSON.
- name(name)
Set the event group’s name.
- project_id(project_id)
Set the event group’s project.
- tql(apply_dedup=True)
Build a TQL query to inspect the event group object’s configuration.
- type(type)
Set the event group’s type. Options: ‘OUTCOME’, ‘TREATMENT’, ‘OPPORTUNITY’, ‘INTERVENTION’, ‘EVENT_GROUP’.
- validate()
Validate this EventGroup is configured correctly, checking that all the TQL expressions compile. Internally, this function utilizes self.tql().validate() and throws a TQLAnalysisException if not valid.
- value(expression)
Set the event group’s value column.
- vars(vars: dict)
Set additional TQL query.vars() for the expressions used in the event group. Useful for constructing complex expressions such as event groups that depend on other event groups.
- where(expression)
Set the .where() expression for the event group.
- causmos.event_group.intervention(where, value='1', name=None, project_id=None, generate=None, dedup=None, dynamics=None, attributes=None, vars: Optional[dict] = None)
Creates an intervention event group from a TQL expression. EventGroup objects also have a number of additional parameters that can be customized for modeling.
- Parameters
where (str) – a TQL expression string which subsets the timeline to this event group.
value (str) – a TQL expression string with the return value of the event for modeling.
name (str) – name of the event group. If not provided a default name will be assigned.
project_id (str) – project id from which to pull the timelines for modeling.
generate (str) – a SamplingSettings object or simple expression that can be coerced to generate events.
dedup (dict) – for now, a dictionary with several keys to configure deduplication. Future: a DedupEvents object.
all (bool): uses all events or limits to the first event matching the “where” expression.
zero_value (bool): skips any events whose ‘value’ expression is zero.
where (str): skips any events which do not satisfy the expression for deduping.
window (str): skips any events which have a non-skipped event within ‘window’ milliseconds prior.
Default: {'all': False, 'zero_value': False, 'where': "TRUE", 'window': "0"}
Example: {'all': True, 'zero_value': True, 'where': "TRUE", 'window': "7 * 24 * 3600 * 1000"}
dynamics (dict) – for now, a dictionary with ‘scale’, ‘shape’, ‘filters’, and ‘epsilon’. Future: a DynamicEvents object.
scale (str): the time scale over which an event may have predictive power or causal effects (default ‘5m,1h,4h,1d,3d,7d’).
shape (str): the decay function shape (default ‘exponential’).
filters (dict): A default feature filters configuration dictionary.
epsilon (float): The threshold value at which SUM_OPPORTUNITIES() rounds to zero.
Example: {'scale': '5m,1h,4h,1d,3d,7d', 'shape': 'exponential', 'filters': {'min_total_count': 200}, 'epsilon': 1e-9}
attributes (list) – a list of expressions describing each event in the group for modeling heterogeneity. If columns with feature filter configurations are provided, the filter configurations are preserved when applying dynamic filters.
vars (dict) – a dictionary keyed with any additional ‘global_vars’, ‘timeline_vars’, or ‘event_vars’ required to evaluate any expression in the event group. See also Query.vars().
- Returns
an EventGroup object
- causmos.event_group.opportunity(where: str, value: str = '1', name: Optional[str] = None, project_id: Optional[str] = None, generate=None, dedup: Optional[dict] = None, dynamics: Optional[dict] = None, attributes: Optional[list] = None, vars: Optional[dict] = None)
Creates an opportunity event group from a TQL expression. EventGroup objects also have a number of additional parameters that can be customized for modeling.
- Parameters
where (str) – a TQL expression string which subsets the timeline to this event group.
value (str) – a TQL expression string with the return value of the event for modeling.
name (str) – name of the event group. If not provided a default name will be assigned.
project_id (str) – project id from which to pull the timelines for modeling.
generate (str) – a SamplingSettings object or simple expression that can be coerced to generate events.
dedup (dict) – for now, a dictionary with several keys to configure deduplication. Future: a DedupEvents object.
all (bool): uses all events or limits to the first event matching the “where” expression.
zero_value (bool): skips any events whose ‘value’ expression is zero.
where (str): skips any events which do not satisfy the expression for deduping.
window (str): skips any events which have a non-skipped event within ‘window’ milliseconds prior.
Default: {'all': False, 'zero_value': False, 'where': "TRUE", 'window': "0"}
Example: {'all': True, 'zero_value': True, 'where': "TRUE", 'window': "7 * 24 * 3600 * 1000"}
dynamics (dict) – for now, a dictionary with ‘scale’, ‘shape’, ‘filters’, and ‘epsilon’. Future: a DynamicEvents object.
scale (str): the time scale over which an event may have predictive power or causal effects (default ‘5m,1h,4h,1d,3d,7d’).
shape (str): the decay function shape (default ‘exponential’).
filters (dict): A default feature filters configuration dictionary.
epsilon (float): The threshold value at which SUM_OPPORTUNITIES() rounds to zero.
Example: {'scale': '5m,1h,4h,1d,3d,7d', 'shape': 'exponential', 'filters': {'min_total_count': 200}, 'epsilon': 1e-9}
attributes (list) – a list of expressions describing each event in the group for modeling heterogeneity. If columns with feature filter configurations are provided, the filter configurations are preserved when applying dynamic filters.
vars (dict) – a dictionary keyed with any additional ‘global_vars’, ‘timeline_vars’, or ‘event_vars’ required to evaluate any expression in the event group. See also Query.vars().
- Returns
an EventGroup object
- causmos.event_group.outcome(where, value='1', name=None, project_id=None, generate=None, dedup=None, dynamics=None, attributes=None, vars: Optional[dict] = None)
Creates an outcome event group from a TQL expression. EventGroup objects also have a number of additional parameters that can be customized for modeling.
- Parameters
where (str) – a TQL expression string which subsets the timeline to this event group.
value (str) – a TQL expression string with the return value of the event for modeling.
name (str) – name of the event group. If not provided a default name will be assigned.
project_id (str) – project id from which to pull the timelines for modeling.
generate (str) – a SamplingConf object or simple expression that can be coerced to generate events.
dedup (dict) – for now, a dictionary with several keys to configure deduplication. Future: a DedupEvents object.
all (bool): uses all events or limits to the first event matching the “where” expression.
zero_value (bool): skips any events whose ‘value’ expression is zero.
where (str): skips any events which do not satisfy the expression for deduping.
window (str): skips any events which have a non-skipped event within ‘window’ milliseconds prior.
Default: {'all': False, 'zero_value': False, 'where': "TRUE", 'window': "0"}
Example: {'all': True, 'zero_value': True, 'where': "TRUE", 'window': "7 * 24 * 3600 * 1000"}
dynamics (dict) – for now, a dictionary with ‘scale’, ‘shape’, ‘filters’, and ‘epsilon’. Future: a DynamicEvents object.
scale (str): the time scale over which an event may have predictive power or causal effects (default ‘5m,1h,4h,1d,3d,7d’).
shape (str): the decay function shape (default ‘exponential’).
filters (dict): A default feature filters configuration dictionary.
epsilon (float): The threshold value at which SUM_OPPORTUNITIES() rounds to zero.
Example: {'scale': '5m,1h,4h,1d,3d,7d', 'shape': 'exponential', 'filters': {'min_total_count': 200}, 'epsilon': 1e-9}
attributes (list) – a list of expressions describing each event in the group for modeling heterogeneity. If columns with feature filter configurations are provided, the filter configurations are preserved when applying dynamic filters.
vars (dict) – a dictionary keyed with any additional ‘global_vars’, ‘timeline_vars’, or ‘event_vars’ required to evaluate any expression in the event group. See also Query.vars().
- Returns
an EventGroup object
- causmos.event_group.treatment(where, value='1', name=None, project_id=None, generate=None, dedup=None, dynamics=None, attributes=None, vars: Optional[dict] = None)
Creates a treatment event group from a TQL expression. EventGroup objects also have a number of additional parameters that can be customized for modeling.
- Parameters
where (str) – a TQL expression string which subsets the timeline to this event group.
value (str) – a TQL expression string with the return value of the event for modeling.
name (str) – name of the event group. If not provided a default name will be assigned.
project_id (str) – project id from which to pull the timelines for modeling.
generate (str) – a SamplingSettings object or simple expression that can be coerced to generate events.
dedup (dict) – for now, a dictionary with several keys to configure deduplication. Future: a DedupEvents object.
all (bool): uses all events or limits to the first event matching the “where” expression.
zero_value (bool): skips any events whose ‘value’ expression is zero.
where (str): skips any events which do not satisfy the expression for deduping.
window (str): skips any events which have a non-skipped event within ‘window’ milliseconds prior.
Default: {'all': False, 'zero_value': False, 'where': "TRUE", 'window': "0"}
Example: {'all': True, 'zero_value': True, 'where': "TRUE", 'window': "7 * 24 * 3600 * 1000"}
dynamics (dict) – for now, a dictionary with ‘scale’, ‘shape’, ‘filters’, and ‘epsilon’. Future: a DynamicEvents object.
scale (str): the time scale over which an event may have predictive power or causal effects (default ‘5m,1h,4h,1d,3d,7d’).
shape (str): the decay function shape (default ‘exponential’).
filters (dict): A default feature filters configuration dictionary.
epsilon (float): The threshold value at which SUM_OPPORTUNITIES() rounds to zero.
Example: {'scale': '5m,1h,4h,1d,3d,7d', 'shape': 'exponential', 'filters': {'min_total_count': 200}, 'epsilon': 1e-9}
attributes (list) – a list of expressions describing each event in the group for modeling heterogeneity. If columns with feature filter configurations are provided, the filter configurations are preserved when applying dynamic filters.
vars (dict) – a dictionary keyed with any additional ‘global_vars’, ‘timeline_vars’, or ‘event_vars’ required to evaluate any expression in the event group. See also Query.vars().
- Returns
an EventGroup object
causmos.globals module
- causmos.globals.get_project()
- causmos.globals.set_project(p_id)
causmos.validation module
- causmos.validation.validate_attributes(attributes, group_name: Optional[str] = None)
Validate list of attributes. They must be either NUMERICAL or CATEGORICAL.
- causmos.validation.validate_dedup(dedup)
Validate a ‘dedup’ dictionary configuration.
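The kind of check this implies might look like the standalone sketch below. It is not the body of causmos.validation.validate_dedup, whose exact rules are not given in this reference; the key names and types are taken from the dedup parameter description for EventGroup, and allowing a partial dictionary is an assumption.

```python
EXPECTED_DEDUP_TYPES = {"all": bool, "zero_value": bool, "where": str, "window": str}


def check_dedup_config(dedup):
    """Hypothetical stand-in: reject unknown keys and values of the wrong type."""
    unknown = set(dedup) - set(EXPECTED_DEDUP_TYPES)
    if unknown:
        raise ValueError(f"unknown dedup keys: {sorted(unknown)}")
    for key, value in dedup.items():
        expected = EXPECTED_DEDUP_TYPES[key]
        if not isinstance(value, expected):
            raise TypeError(f"dedup[{key!r}] should be of type {expected.__name__}")
    return True


example = {"all": True, "zero_value": True, "where": "TRUE", "window": "7 * 24 * 3600 * 1000"}
print(check_dedup_config(example))  # True
```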