lethe package
Subpackages
Submodules
lethe.config module
- lethe.config.base_config(days=15, users=1000)
Lethe draws all simulation parameters from a config object. This function returns an object populated with all required fields, which can be examined to see the options and used as a base to extend in other functions.
- Parameters
days – Days for the simulation to cover
users – Number of users to simulate
- Returns
a nested SimpleNamespace containing configs for the stages of simulation
- Return type
config
Examples
>>> config = lethe.config.base_config(days=15, users=1000)
- Structure:
config.user_count: the number of users to consider in the simulation
config.dates: the start and end date for the simulation
The remaining top-level configs align with the stages of the Lethe simulation:
- config.user: user generation. Users are generated with the features defined in config.user.features. The features _activity_group and _incrementality_group link to config.user.activity and config.user.incrementality, which control the organic and advertising-driven behavior of users in other phases of the simulation. Individuality can add random correlated CDF features for use in FBVMs.
- config.request: request generation. Exchange user activity is generated across the simulation period, controlled by config.user.activity and the additional general activity patterns defined in config.request.shape. Features for the resulting requests are defined by config.request.features.
- config.activity: (organic) activity event generation. Site activity is generated similarly to exchange activity, with features from config.activity.features.
- config.incremental: incremental activity event generation. Won auctions result in impressions, which lead to user site activity controlled by config.user.incrementality. Features for incremental activity events are controlled by config.incremental.features, which should align with config.activity.features.
- config.auction: simulated auction on requests. The dynamics of the auction are controlled by these parameters.
config.seed: random generator seeds are available at each simulation stage
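As a rough illustration of the layout described above, a nested SimpleNamespace with the same top-level names might look like the sketch below. The function toy_config is hypothetical, and the field values (start date, empty feature lists, seed values) are placeholders chosen for this example; they are not taken from lethe.config.base_config.

```python
from datetime import date, timedelta
from types import SimpleNamespace


def toy_config(days=15, users=1000):
    """Hypothetical stand-in that mirrors the documented layout only."""
    start = date(2020, 1, 1)  # placeholder start date, not a Lethe default
    return SimpleNamespace(
        user_count=users,
        dates=(start, start + timedelta(days=days)),
        user=SimpleNamespace(features=[], activity={}, incrementality={}),
        request=SimpleNamespace(features=[], shape=None),
        activity=SimpleNamespace(features=[]),
        incremental=SimpleNamespace(features=[]),
        auction=SimpleNamespace(),
        seed=SimpleNamespace(user=1, request=2),  # placeholder seeds
    )


config = toy_config(days=10, users=500)
print(config.user_count)     # 500
print(sorted(vars(config)))  # top-level names of the sketch
```

Because every level is a plain namespace, vars(config) or vars(config.user) is enough to inspect which options exist at each stage.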
- Related objects, implemented as namedtuples:
- Feature: parameters for features on users, requests, activities. Names and values should avoid separator characters aside from underscores.
.name: column name in generated dataframe
.values: list of value names that will appear in rows
.occurrence: list of base probability of occurrence of each value
.interactions: list of Interactions which influence occurrence probabilities
.hidden: boolean flag for whether modeling may consider the feature
- Interaction: influences of other features on a feature
.having: FV(.type, .name, .value, .negate): definition of the feature/value that leads to this interaction.
.type: the data type of the other feature: user, request, activity, incremental
.name: the other feature name
.value: the other feature’s individual value
.negate: False - match the value, True - match anything but the value
.affects: list of values in the owning feature whose occurrence probability is impacted
.factor: list of scale factors to apply, matching values in affects
- Activity: user-level traffic generation parameters
.fraction: list [request, activity] of probability of being active on any given day
.frequency: list [request, activity] of average frequency on active days
.lifetime_distribution: list [request, activity] of distribution of lifetime activity variation across users. Called once per sim.
.daily_distribution: list [request, activity] of distribution of daily activity variation across users. Called each day. Sampling is weighted as lifetime * daily.
.weekday: list [0-5, 6-10, 11-13, 14-18, 19-23] of daypart sample scaling factors for generating activity event times on weekdays. If the list does not sum to 1, fraction and frequency will be reduced.
.weekend: list [0-5, 6-10, 11-13, 14-18, 19-23] of daypart sample scaling factors for generating activity event times on weekends. If the list does not sum to 1, fraction and frequency will be reduced.
- Incrementality: user-level response to impressions
.response: FBVM for the probability of an interaction following an impression. Adds special frequency_<period> features; a helper function frequency_vcos makes these easier to build.
.frequency_features: list [Cap] of capped frequency features to make available
.delay_distribution: scipy stats type distribution for delay from impression to interaction
- Shape: general traffic shaping parameters (adjust user-level specifics). Helper function unshaped() available.
fraction: list [weekday] of scales for fraction of users active each day relative to the week average
frequency: list [weekday] of scales for average user frequency each day relative to the week average
hours: list [24 hours] of scales for requests / activities each hour relative to the average for the day
fraction_fuzzer: callable to generate a specific number of users active from a nominal value
frequency_fuzzer: callable to generate a specific average frequency for a day given a nominal value, e.g. lambda x: np.random.normal(x, x * 0.05)
- FBVM: feature based value model - generate a value from a featured row
constant: constant term for the model
feature_coefficients: list [FCO] of terms that adjust away from the constant, applied in order
.type: data type of feature - user, request, activity, incremental
.name: name of feature
.value_coefficients: list [VCO] of terms for the feature, matched in order, one per FCO
.value: individual value of the feature
.negate: False - match the value, True - match anything but the value
.coefficient: coefficient to apply
.operator: callable operator to use as value = operator(value, coefficient)
- Cap: frequency cap/reaction specification
period: duration of cap in seconds from triggering event
setting: maximum value allowed in cap (in auction) or coefficient per value (in incrementality)
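To make the Feature and Interaction fields more concrete, the sketch below defines local namedtuples with the documented field names and shows one plausible way an interaction could adjust occurrence probabilities: scale the affected values by their factors, then renormalize. The helper adjusted_occurrence, the example feature names, and the renormalization step are assumptions made for illustration, not Lethe's implementation.

```python
from collections import namedtuple

# Local stand-ins with the documented field names (not imported from lethe).
Feature = namedtuple("Feature", "name values occurrence interactions hidden")
Interaction = namedtuple("Interaction", "having affects factor")
FV = namedtuple("FV", "type name value negate")


def adjusted_occurrence(feature, row):
    """Hypothetical helper: apply matching interactions, then renormalize.

    `row` maps (type, name) of other features to their values.
    """
    probs = dict(zip(feature.values, feature.occurrence))
    for inter in feature.interactions:
        fv = inter.having
        matches = row.get((fv.type, fv.name)) == fv.value
        if matches != fv.negate:  # negate=True flips the match
            for value, factor in zip(inter.affects, inter.factor):
                probs[value] *= factor
    total = sum(probs.values())
    return {value: p / total for value, p in probs.items()}


device = Feature(
    name="device",
    values=["mobile", "desktop"],
    occurrence=[0.5, 0.5],
    interactions=[
        Interaction(
            having=FV("user", "age_group", "young", False),
            affects=["mobile"],
            factor=[3.0],
        )
    ],
    hidden=False,
)

young = adjusted_occurrence(device, {("user", "age_group"): "young"})
older = adjusted_occurrence(device, {("user", "age_group"): "older"})
print(young)  # mobile weight tripled before renormalizing
print(older)  # base probabilities unchanged
```

In this sketch, affects and factor are paired positionally, which follows the description of factor as matching the values in affects.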
- lethe.config.extend_config(base, module, seed=1)
Generate a config object from a module of functions following a standard naming pattern. Starts with a base config object, then calls functions and assigns their results in place of the named part of the base config. Copies the base config object so the original does not get changed.
- Parameters
base – base config object to build on
module – a module containing build functions
seed – random seed (default 1); False for none
- Returns
a nested SimpleNamespace containing configs for the stages of simulation
- Return type
config
Examples
>>> config = lethe.config.base_config(days=10, users=1000)
>>> config = lethe.config.extend_config(config, lethe.reasonably_rich)
- functions called are, in order:
- returns any custom settings, will be added to config as settings:
settings(config)
- expected to treat config (updated along the way) as read-only and return results for assignment:
request_shape(config), {name}_activity_shape(config), user_features(config), {name}_request_features(config), {name}_activity_features(config),
{name}_incremental_features(config)
user_activity(config), {name}_user_incrementality(config), {name}_user_individuality(config), auction(config)
- expected to modify config in place
finalize(config)
Convention - make config parameter optional if you do not use it.
The idea is that you can write your own config generators to alter all or parts of an established (or base) config in structured notebooks rather than modifying an object ad-hoc in your notebook, e.g.:
>>> config = lethe.config.base_config(days=10, users=1000)
>>> config = lethe.config.extend_config(config, lethe.configs.reasonably_rich)
>>> config = lethe.config.extend_config(config, my_module)
>>> lethe.simulator.set_config(config)
>>> lethe.simulator.simulate()
>>> training_data = lethe.simulator.build_training_data()
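The naming pattern can be illustrated with a small stand-in. In the sketch below, my_module is a hypothetical builder module (modelled with a SimpleNamespace so the example is self-contained) and toy_extend is a simplified loop covering only three of the documented functions. It is an assumption about the general shape of the process, not the body of lethe.config.extend_config.

```python
import copy
from types import SimpleNamespace


# Hypothetical builder functions; a real module would be a .py file of these.
def settings(config):
    return {"label": "demo"}


def user_features(config):
    # Reads config but does not modify it; returns the value to assign.
    return ["feature_for_%d_users" % config.user_count]


def finalize(config):
    # Expected to modify config in place.
    config.finalized = True


my_module = SimpleNamespace(
    settings=settings, user_features=user_features, finalize=finalize
)


def toy_extend(base, module):
    """Simplified stand-in for the documented call order (assumption)."""
    config = copy.deepcopy(base)  # leave the caller's object untouched
    if hasattr(module, "settings"):
        config.settings = module.settings(config)
    if hasattr(module, "user_features"):
        config.user.features = module.user_features(config)
    if hasattr(module, "finalize"):
        module.finalize(config)
    return config


base = SimpleNamespace(user_count=1000, user=SimpleNamespace(features=[]))
extended = toy_extend(base, my_module)
print(extended.user.features)  # ['feature_for_1000_users']
print(base.user.features)      # [] (the base is unchanged)
```

Working on a deep copy is what lets several builder modules be layered on the same base without the earlier configs changing underneath.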
- lethe.config.modify_config(base, operation)
Make changes to config objects using this method so they get added to build instructions. “config” is the base object
- lethe.config.print_config(config, parts={'activity', 'auction', 'base', 'incremental', 'request', 'seed', 'user'}, wrap_at=100)
Pretty printer for config objects
lethe.config_objects module
- class lethe.config_objects.Activity(fraction, frequency, lifetime_distribution, daily_distribution, correlation, weekday, weekend)
Bases:
tuple
Create new instance of Activity(fraction, frequency, lifetime_distribution, daily_distribution, correlation, weekday, weekend)
- property correlation
Alias for field number 4
- property daily_distribution
Alias for field number 3
- property fraction
Alias for field number 0
- property frequency
Alias for field number 1
- property lifetime_distribution
Alias for field number 2
- property weekday
Alias for field number 5
- property weekend
Alias for field number 6
- class lethe.config_objects.Cap(period, setting)
Bases:
tuple
Create new instance of Cap(period, setting)
- property period
Alias for field number 0
- property setting
Alias for field number 1
- class lethe.config_objects.FBVM(constant, feature_coefficients, transform)
Bases:
tuple
Create new instance of FBVM(constant, feature_coefficients, transform)
- property constant
Alias for field number 0
- property feature_coefficients
Alias for field number 1
- property transform
Alias for field number 2
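The way an FBVM combines its terms can be sketched with local namedtuples that reuse the field names above. The evaluate function is hypothetical: it assumes the first matching VCO within each FCO is applied as value = operator(value, coefficient), and that transform, if callable, is applied once at the end. Neither assumption is confirmed by this reference, and the feature names are made up.

```python
import operator
from collections import namedtuple

# Local stand-ins with the documented field names (not imported from lethe).
FBVM = namedtuple("FBVM", "constant feature_coefficients transform")
FCO = namedtuple("FCO", "type name value_coefficients")
VCO = namedtuple("VCO", "value negate coefficient operator")


def evaluate(fbvm, row):
    """Hypothetical evaluator; `row` maps (type, name) to a feature value."""
    value = fbvm.constant
    for fco in fbvm.feature_coefficients:
        observed = row.get((fco.type, fco.name))
        for vco in fco.value_coefficients:
            if (observed == vco.value) != vco.negate:
                value = vco.operator(value, vco.coefficient)
                break  # assumption: first matching VCO wins for this FCO
    if callable(fbvm.transform):
        value = fbvm.transform(value)  # assumption: applied last
    return value


model = FBVM(
    constant=0.01,
    feature_coefficients=[
        FCO("user", "segment", [VCO("loyal", False, 2.0, operator.mul)]),
        FCO("request", "device", [VCO("mobile", True, 0.005, operator.add)]),
    ],
    transform=lambda v: min(v, 1.0),
)

row = {("user", "segment"): "loyal", ("request", "device"): "desktop"}
result = evaluate(model, row)
print(result)  # constant doubled for "loyal", then 0.005 added for non-mobile
```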
- class lethe.config_objects.FCO(type, name, value_coefficients)
Bases:
tuple
Create new instance of FCO(type, name, value_coefficients)
- property name
Alias for field number 1
- property type
Alias for field number 0
- property value_coefficients
Alias for field number 2
- class lethe.config_objects.FV(type, name, value, negate)
Bases:
tuple
Create new instance of FV(type, name, value, negate)
- property name
Alias for field number 1
- property negate
Alias for field number 3
- property type
Alias for field number 0
- property value
Alias for field number 2
- class lethe.config_objects.Feature(name, values, occurrence, interactions, hidden)
Bases:
tuple
Create new instance of Feature(name, values, occurrence, interactions, hidden)
- property hidden
Alias for field number 4
- property interactions
Alias for field number 3
- property name
Alias for field number 0
- property occurrence
Alias for field number 2
- property values
Alias for field number 1
- class lethe.config_objects.Incrementality(response, frequency_features, delay_distribution)
Bases:
tuple
Create new instance of Incrementality(response, frequency_features, delay_distribution)
- property delay_distribution
Alias for field number 2
- property frequency_features
Alias for field number 1
- property response
Alias for field number 0
- class lethe.config_objects.Interaction(having, affects, factor)
Bases:
tuple
Create new instance of Interaction(having, affects, factor)
- property affects
Alias for field number 1
- property factor
Alias for field number 2
- property having
Alias for field number 0
- class lethe.config_objects.Shape(fraction, frequency, hours, fraction_fuzzer, frequency_fuzzer)
Bases:
tuple
Create new instance of Shape(fraction, frequency, hours, fraction_fuzzer, frequency_fuzzer)
- property fraction
Alias for field number 0
- property fraction_fuzzer
Alias for field number 3
- property frequency
Alias for field number 1
- property frequency_fuzzer
Alias for field number 4
- property hours
Alias for field number 2
- class lethe.config_objects.VCO(value, negate, coefficient, operator)
Bases:
tuple
Create new instance of VCO(value, negate, coefficient, operator)
- property coefficient
Alias for field number 2
- property negate
Alias for field number 1
- property operator
Alias for field number 3
- property value
Alias for field number 0
- lethe.config_objects.config_from_build_py(build_py)
- lethe.config_objects.constant_draw(value)
- lethe.config_objects.frequency_fco_cap(period, coeffs, op=<built-in function mul>)
- lethe.config_objects.full_fco(feature_type, feature, coeffs, op)
- lethe.config_objects.unshaped()
lethe.core module
- lethe.core.auction(config, users, requests)
- lethe.core.generate_incrementals(config, wins, users, requests)
- lethe.core.generate_organic(config, all_users)
- lethe.core.generate_users(config)
lethe.lethe module
- lethe.lethe.build_training_data(re_id=False)
Generate public (training) data from a simulation. Training data strips out internal simulator state and truth signals to look like log-level data.
re_id: generate new indices so there is no leakage of hidden data through id fields. Recommend just not training on them.
- lethe.lethe.generate_lethe(sim, dates, users, randomize_users)
Generate lethe simulation data
- lethe.lethe.get_config()
Get the config of the current simulation
- lethe.lethe.get_simulation_data()
Get the previously-built simulation data
- lethe.lethe.get_training_data()
Get the previously-built training data
- lethe.lethe.list_saves()
Return a list of simulations in the bucket
- lethe.lethe.load(run_name)
Load the simulation config and state for run_name from the bucket
- lethe.lethe.load_training_data(name=None)
Load training data from this simulation, or clear the simulation and load just the training data for the provided name. Returns the training data
- lethe.lethe.purge(run_name, labels=None)
Purge a simulation from the bucket
- lethe.lethe.save(run_name, overwrite=None)
Save the simulation config and state for run_name to the bucket
- lethe.lethe.save_training_data(partition=None, compress=None, overwrite=None, name=None)
Save training data from the simulation, possibly under a different name
- lethe.lethe.set_config(config)
Set the configuration for the simulator. This clears internal state for a new simulation
- lethe.lethe.simulate(stages={'auction', 'incremental', 'organic', 'user'})
Run a full or partial simulation based on current config. Persists results to simulator state.
- Parameters
stages – the stages to run: {'user', 'organic', 'auction', 'incremental', 'public'}
- Returns
simulation_data
lethe.storage module
- class lethe.storage.TmpBackedFile(return_path=False)
Bases:
object
- close()
- open(file_path, rw='r')
- tmp_path = '/tmp'
- lethe.storage.list_saves()
- lethe.storage.load_state(state)
Build a new simulator from a saved dump
- lethe.storage.load_training(run_name)
- lethe.storage.purge(run_name, labels=['state', 'training'])
- lethe.storage.save_state(state, overwrite=False)
Save config and current state into bucket
- lethe.storage.save_training(run_name, training_data, partition=True, compress=False, overwrite=True)
- lethe.storage.set_path(path)
lethe.util module
- class lethe.util.CopulaCorrelator(length, covariance)
Bases:
object
Generates sets of random numbers related by a covariance matrix. The correlated sets are built on initialization, and can be mapped to different distributions when retrieved. Hat tip to @bscannell and https://twiecki.io/blog/2018/05/03/copulas/ for this.
methods:
CopulaCorrelator(length, covariance)
generate the correlated samples in cdf space. The number of sets is determined by the size of the covariance matrix, e.g. 10000, [[1.0, 0.5], [0.5, 1.0]] prepares two sets of 10000 random choices with 50% correlation
- .values(index, distribution)
returns the values mapped through distribution.ppf()
- static corr2d(value)
- values(index, distribution=False)
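The copula idea described for CopulaCorrelator can be sketched with the standard library alone. The example below is limited to two sets, uses a hand-written 2x2 Cholesky factor instead of NumPy or SciPy, and the function names correlated_uniforms and map_through are hypothetical. It illustrates the technique, not this class's code.

```python
import math
import random
from statistics import NormalDist


def correlated_uniforms(length, rho, seed=0):
    """Two sets of uniform(0, 1) samples linked by a Gaussian copula."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    first, second = [], []
    for _ in range(length):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        # 2x2 Cholesky factor of [[1, rho], [rho, 1]] applied to (z1, z2).
        x1 = z1
        x2 = rho * z1 + math.sqrt(1.0 - rho * rho) * z2
        # The normal cdf moves each correlated draw into cdf space.
        first.append(std_normal.cdf(x1))
        second.append(std_normal.cdf(x2))
    return first, second


def map_through(uniforms, ppf):
    """Map cdf-space samples through an inverse cdf (ppf)."""
    return [ppf(u) for u in uniforms]


u1, u2 = correlated_uniforms(10000, 0.5)
heights = map_through(u1, NormalDist(170, 10).inv_cdf)       # normal marginal
waits = map_through(u2, lambda u: -math.log(1.0 - u) / 2.0)  # exponential, rate 2
```

The two output sets keep their dependence even though their marginal distributions differ, which is the point of building the samples in cdf space first.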
- class lethe.util.FBVMEvaluator(fbvm, for_type)
Bases:
object
- class lethe.util.FrequencyCapper(cap)
Bases:
object
- capped(user_id, timestamp)
- capped_impressions(user_id)
- impress(user_id, timestamp)
- static required_features()
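A frequency cap built from a Cap(period, setting) pair might behave like the sketch below. ToyFrequencyCapper is a hypothetical class that borrows the impress and capped method names listed above. It assumes a sliding window of period seconds, a user being capped once setting impressions fall inside that window, and timestamps arriving in non-decreasing order. The actual FrequencyCapper semantics may differ.

```python
from collections import defaultdict, deque, namedtuple

Cap = namedtuple("Cap", "period setting")  # local stand-in, not lethe's class


class ToyFrequencyCapper:
    """Hypothetical capper: at most cap.setting impressions per cap.period seconds."""

    def __init__(self, cap):
        self.cap = cap
        self._seen = defaultdict(deque)  # user_id -> impression timestamps

    def _recent(self, user_id, timestamp):
        stamps = self._seen[user_id]
        # Drop impressions that fell out of the window ending at `timestamp`.
        while stamps and stamps[0] <= timestamp - self.cap.period:
            stamps.popleft()
        return stamps

    def impress(self, user_id, timestamp):
        self._recent(user_id, timestamp).append(timestamp)

    def capped(self, user_id, timestamp):
        return len(self._recent(user_id, timestamp)) >= self.cap.setting


capper = ToyFrequencyCapper(Cap(period=3600, setting=2))
capper.impress("u1", 0)
capper.impress("u1", 100)
print(capper.capped("u1", 200))   # True: two impressions in the last hour
print(capper.capped("u1", 3700))  # False: both impressions have expired
```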
- class lethe.util.ProgressPrinter(message, digits=1)
Bases:
object
- finish(message=False, indent=2)
- indent = 0
- print_progress = True
- reset(mod_group)
- set_auction_scale(requests)
- set_impression_scale(wins)
- set_traffic_scale(user_frame, config, index)
- spaces(indent=0)
- update(message, mod_group='none', indent=2)
- class lethe.util.TimeFeature
Bases:
object
- classmethod builder(name)
- static hour(ts)
- static timestamp(ts)
- lethe.util.daypart_index(second)
- lethe.util.empty_typed_frame(cols)
- lethe.util.frame_for_features(features, base_frame, frames)
- lethe.util.generate_feature(feature, feature_on, frames)
- lethe.util.generate_traffic(users, date, shape, settings, settings_index)
- lethe.util.passthrough_optional(opt_list, passed)
- lethe.util.second_interpolator(dayparts, hours)
- lethe.util.second_weights(dayparts, hours)
- lethe.util.sum_to(sumto, arraylike)
- lethe.util.sum_to_one(arraylike)
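The Activity weekday and weekend lists described under lethe.config.base_config use five dayparts covering hours 0-5, 6-10, 11-13, 14-18 and 19-23. A mapping from a second of the day to one of those dayparts could look like the sketch below. The function toy_daypart_index is hypothetical and is not the body of lethe.util.daypart_index.

```python
import bisect

# First hour of each daypart after the first: 0-5, 6-10, 11-13, 14-18, 19-23.
DAYPART_STARTS = [6, 11, 14, 19]


def toy_daypart_index(second):
    """Hypothetical mapping from second-of-day to a daypart index 0..4."""
    hour = (second // 3600) % 24
    return bisect.bisect_right(DAYPART_STARTS, hour)


print(toy_daypart_index(0))               # 0 (hour 0)
print(toy_daypart_index(6 * 3600))        # 1 (hour 6)
print(toy_daypart_index(13 * 3600 + 59))  # 2 (hour 13)
print(toy_daypart_index(23 * 3600))       # 4 (hour 23)
```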