lethe package

Subpackages

Submodules

lethe.config module

lethe.config.base_config(days=15, users=1000)

Lethe draws all simulation parameters from a config object. This function returns an object populated with all required fields, which can be examined to see the options and used as a base to extend in other functions.

Parameters
  • days – Days for the simulation to cover

  • users – Number of users to simulate

Returns

a nested SimpleNamespace containing configs for the stages of simulation

Return type

config

Examples

>>> config = lethe.config.base_config(days=15, users=1000)
Structure:

config.user_count: the number of users to consider in the simulation
config.dates: the start and end date for the simulation

The remaining top-level configs align with the stages of the Lethe simulation:

config.user: user generation. Users are generated with the features defined in config.user.features. The features _activity_group and _incrementality_group link to config.user.activity and config.user.incrementality, which control the organic and advertising-driven behavior of users in other phases of the simulation. Individuality can add random correlated CDF features for use in FBVMs.

config.request: request generation. Exchange user activity is generated across the simulation period, controlled by config.user.activity and additional general activity patterns defined in config.request.shape. Features for the resulting requests are defined by config.request.features.

config.activity: (organic) activity event generation. Site activity is generated similarly to exchange activity, with features from config.activity.features.

config.incremental: incremental activity event generation. Won auctions result in impressions, which lead to user site activity controlled by config.user.incrementality. Features for incremental activity events are controlled by config.incremental.features, which should align with config.activity.features.

config.auction: simulated auction on requests. The dynamics of the auction are controlled by these parameters.

config.seed: random generator seeds are available at each simulation stage
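
As a rough illustration of the layout described above, the following sketch builds a nested SimpleNamespace with the same top-level names. It is not the real base_config: the function name sketch_base_config, the start date, the inclusive end date, and the empty placeholder values are all assumptions made for the example.

```python
from datetime import date, timedelta
from types import SimpleNamespace


def sketch_base_config(days=15, users=1000, start=date(2024, 1, 1)):
    """Hypothetical stand-in that mimics the documented top-level layout."""
    return SimpleNamespace(
        user_count=users,
        # Assumption: the end date is inclusive, so 15 days spans start..start+14.
        dates=(start, start + timedelta(days=days - 1)),
        user=SimpleNamespace(features=[], activity={}, incrementality={}),
        request=SimpleNamespace(features=[], shape=None),
        activity=SimpleNamespace(features=[]),
        incremental=SimpleNamespace(features=[]),
        auction=SimpleNamespace(),
        seed=SimpleNamespace(),
    )


config = sketch_base_config(days=10, users=500)
# vars() exposes the attributes of a SimpleNamespace as a dict,
# which is a convenient way to list the available top-level parts.
top_level = sorted(vars(config))
print(top_level)
```

Because every level is a plain namespace, the options can be explored interactively with vars() or attribute access before deciding which parts to override.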

Related objects, implemented as namedtuples:
Feature: parameters for features on users, requests, activities. Name and values should avoid separator characters aside from underscores.

  .name: column name in generated dataframe
  .values: list of value names that will appear in rows
  .occurrence: list of base probability of occurrence of each value
  .interactions: list of Interactions which influence occurrence probabilities
  .hidden: boolean flag for whether modeling may consider the feature

Interaction: influences of other features on a feature

  .having: FV(.type, .name, .value, .negate): definition of the feature/value that leads to this interaction.
    .type: the data type of the other feature: user, request, activity, incremental
    .name: the other feature name
    .value: the other feature’s individual value
    .negate: False - match the value, True - match anything but the value
  .affects: list of values in the owning feature whose occurrence probability is impacted
  .factor: list of scale factors to apply, matching values in affects
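
One way to read the Feature and Interaction fields together is sketched below. The namedtuples are local stand-ins that reuse the documented field names, and the function adjusted_occurrence, the row layout keyed by (type, name), and the renormalization step are assumptions for illustration rather than Lethe's actual sampling logic.

```python
from collections import namedtuple

# Local stand-ins mirroring the documented field names.
FV = namedtuple("FV", "type name value negate")
Interaction = namedtuple("Interaction", "having affects factor")
Feature = namedtuple("Feature", "name values occurrence interactions hidden")


def adjusted_occurrence(feature, row):
    """Scale base probabilities by every matching interaction.

    row maps (type, name) -> value for the other features already generated.
    Renormalizing to sum to 1 afterwards is an assumption of this sketch.
    """
    probs = dict(zip(feature.values, feature.occurrence))
    for interaction in feature.interactions:
        fv = interaction.having
        matches = row.get((fv.type, fv.name)) == fv.value
        if fv.negate:
            matches = not matches
        if matches:
            for value, factor in zip(interaction.affects, interaction.factor):
                probs[value] *= factor
    total = sum(probs.values())
    return {value: p / total for value, p in probs.items()}


device = Feature(
    name="device",
    values=["mobile", "desktop"],
    occurrence=[0.5, 0.5],
    interactions=[
        Interaction(FV("user", "age_group", "young", False), ["mobile"], [3.0])
    ],
    hidden=False,
)

young = adjusted_occurrence(device, {("user", "age_group"): "young"})
other = adjusted_occurrence(device, {("user", "age_group"): "old"})
print(young, other)
```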

Activity: user-level traffic generation parameters

  .fraction: list [request, activity] of probability of being active on any given day
  .frequency: list [request, activity] of average frequency on active days
  .lifetime_distribution: list [request, activity] of distribution of lifetime activity variation across users. Called once per sim.
  .daily_distribution: list [request, activity] of distribution of daily activity variation across users. Called each day. Sampling is weighted as lifetime * daily.
  .weekday: list [0-5, 6-10, 11-13, 14-18, 19-23] of daypart sample scaling factors for generating activity event times on weekdays. If the list does not sum to 1, fraction and frequency will be reduced.
  .weekend: list [0-5, 6-10, 11-13, 14-18, 19-23] of daypart sample scaling factors for generating activity event times on weekends. If the list does not sum to 1, fraction and frequency will be reduced.
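
The five dayparts can be read as inclusive hour ranges. The sketch below maps a second of the day to a daypart index and spreads daypart factors over 24 hourly weights. The helper names daypart_for_second and hourly_weights are hypothetical, and spreading each factor evenly across its hours is an assumption, not necessarily what lethe.util.daypart_index or second_weights do.

```python
# Inclusive hour ranges of the documented dayparts: 0-5, 6-10, 11-13, 14-18, 19-23.
DAYPART_START_HOURS = [0, 6, 11, 14, 19]
DAYPART_END_HOURS = [5, 10, 13, 18, 23]


def daypart_for_second(second):
    """Return the index (0-4) of the daypart containing this second of the day."""
    hour = (second // 3600) % 24
    for index, end_hour in enumerate(DAYPART_END_HOURS):
        if hour <= end_hour:
            return index
    return len(DAYPART_END_HOURS) - 1  # unreachable, since hour is at most 23


def hourly_weights(daypart_factors):
    """Spread each daypart factor evenly over its hours (an assumption)."""
    weights = []
    for start, end, factor in zip(
        DAYPART_START_HOURS, DAYPART_END_HOURS, daypart_factors
    ):
        hours_in_part = end - start + 1
        weights.extend([factor / hours_in_part] * hours_in_part)
    return weights


weekday = [0.05, 0.25, 0.20, 0.30, 0.20]
weights = hourly_weights(weekday)
print(daypart_for_second(12 * 3600), len(weights))
```

With factors that sum to less than 1, the hourly weights also sum to less than 1, which is consistent with the note that fraction and frequency are effectively reduced in that case.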

Incrementality: user-level response to impressions

  .response: FBVM for probability of an interaction following an impression. Adds special frequency_<period> features; a helper function frequency_vcos makes these easier to build.
  .frequency_features: list [Cap] of capped frequency features to make available
  .delay_distribution: scipy stats type distribution for delay from impression to interaction

Shape: general traffic shaping parameters (adjust user-level specifics). A helper function unshaped() is available.

  .fraction: list [weekday] of scales for fraction of users active each day relative to the week average
  .frequency: list [weekday] of scales for average user frequency each day relative to the week average
  .hours: list [24 hours] of scales for requests / activities each hour relative to the average for the day
  .fraction_fuzzer: callable to generate a specific number of users active from a nominal value
  .frequency_fuzzer: callable to generate a specific average frequency for a day given a nominal value, e.g. lambda x: np.random.normal(x, x * 0.05)
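
A small sketch of how a Shape might be assembled and a fuzzer applied. The Shape namedtuple here is a local stand-in with the documented fields, flat_shape is a hypothetical helper (it is not claimed to match unshaped()), seven weekday entries are assumed, and the standard library random module is used in place of np.random so the example has no dependencies.

```python
import random
from collections import namedtuple

# Local stand-in with the documented field names.
Shape = namedtuple(
    "Shape", "fraction frequency hours fraction_fuzzer frequency_fuzzer"
)


def flat_shape():
    """Hypothetical neutral shape: all scales are 1, fuzzers pass values through."""
    return Shape(
        fraction=[1.0] * 7,   # assumption: one scale per day of the week
        frequency=[1.0] * 7,
        hours=[1.0] * 24,
        fraction_fuzzer=lambda x: x,
        frequency_fuzzer=lambda x: x,
    )


rng = random.Random(7)  # seeded so the example is repeatable

# Same idea as the documented lambda, with 5% relative noise around the nominal value.
noisy = flat_shape()._replace(frequency_fuzzer=lambda x: rng.gauss(x, x * 0.05))

nominal_frequency = 4.0
monday = 0
fuzzed = noisy.frequency_fuzzer(nominal_frequency * noisy.frequency[monday])
print(fuzzed)
```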

FBVM: feature based value model - generate a value from a featured row

  .constant: constant term for the model
  .feature_coefficients: list [FCO] of terms that adjust away from the constant, applied in order
    .type: data type of feature - user, request, activity, incremental
    .name: name of feature
    .value_coefficients: list [VCO] of terms for the feature. Matched in order, one per FCO
      .value: individual value of the feature
      .negate: False - match the value, True - match anything but the value
      .coefficient: coefficient to apply
      .operator: callable operator to use as value = operator(value, coefficient)
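
The evaluation order described above can be sketched as follows. The namedtuples are local stand-ins using the documented field names. The function evaluate, the row layout keyed by (type, name), the reading of "one per FCO" as "the first matching VCO wins", and the treatment of transform as an optional final callable are all assumptions of this sketch and may differ from lethe.util.FBVMEvaluator.

```python
import operator
from collections import namedtuple

# Local stand-ins mirroring the documented field names.
VCO = namedtuple("VCO", "value negate coefficient operator")
FCO = namedtuple("FCO", "type name value_coefficients")
FBVM = namedtuple("FBVM", "constant feature_coefficients transform")


def evaluate(fbvm, row):
    """Start from the constant and apply at most one VCO per FCO, in order."""
    value = fbvm.constant
    for fco in fbvm.feature_coefficients:
        observed = row.get((fco.type, fco.name))
        for vco in fco.value_coefficients:
            # negate flips the match: True means "anything but this value".
            matched = (observed == vco.value) != vco.negate
            if matched:
                value = vco.operator(value, vco.coefficient)
                break  # assumption: only the first matching VCO applies
    return fbvm.transform(value) if fbvm.transform else value


model = FBVM(
    constant=0.01,
    feature_coefficients=[
        FCO("user", "segment", [VCO("loyal", False, 2.0, operator.mul)]),
        FCO("request", "device", [VCO("mobile", True, 0.005, operator.add)]),
    ],
    transform=None,
)

loyal_desktop = evaluate(
    model, {("user", "segment"): "loyal", ("request", "device"): "desktop"}
)
new_mobile = evaluate(
    model, {("user", "segment"): "new", ("request", "device"): "mobile"}
)
print(loyal_desktop, new_mobile)
```

In the first row both terms apply (multiply by 2.0, then add 0.005 because the device is not mobile). In the second row neither matches, so the constant is returned unchanged.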

Cap: frequency cap/reaction specification

  .period: duration of cap in seconds from triggering event
  .setting: maximum value allowed in cap (in auction) or coefficient per value (in incrementality)
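
For the auction-side reading of Cap (a maximum number of impressions within a period), a sliding-window counter is one plausible interpretation. The class SimpleCapper below is hypothetical and only borrows the method names capped and impress from the lethe.util.FrequencyCapper listing. It assumes timestamps are in seconds and arrive in non-decreasing order per user.

```python
from collections import defaultdict, deque, namedtuple

# Local stand-in with the documented field names.
Cap = namedtuple("Cap", "period setting")


class SimpleCapper:
    """Hypothetical sliding-window frequency capper."""

    def __init__(self, cap):
        self.cap = cap
        self._seen = defaultdict(deque)  # user_id -> impression timestamps

    def _recent(self, user_id, timestamp):
        """Drop impressions older than the cap period and return the rest."""
        window = self._seen[user_id]
        while window and timestamp - window[0] >= self.cap.period:
            window.popleft()
        return window

    def capped(self, user_id, timestamp):
        """True if the user already has `setting` impressions in the period."""
        return len(self._recent(user_id, timestamp)) >= self.cap.setting

    def impress(self, user_id, timestamp):
        """Record an impression for the user."""
        self._recent(user_id, timestamp).append(timestamp)


capper = SimpleCapper(Cap(period=3600, setting=2))
capper.impress("u1", 0)
capper.impress("u1", 10)
print(capper.capped("u1", 20), capper.capped("u1", 3605))
```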

lethe.config.extend_config(base, module, seed=1)

Generate a config object from a module of functions following a standard naming pattern. Starts with a base config object, then calls the functions and assigns their results in place of the named parts of the base config. The base config object is copied first, so the original is not changed.

Parameters
  • base – base config object to build on

  • module – a module containing build functions

  • seed – random seed (default 1), False for none

Returns

a nested SimpleNamespace containing configs for the stages of simulation

Return type

config

Examples

>>> config = lethe.config.base_config(days=10, users=1000)
>>> config = lethe.config.extend_config(config, lethe.reasonably_rich)

Functions called are, in order:

Returns any custom settings, which will be added to config as settings:

settings(config)

Expected to treat config (updated along the way) as read-only and return results for assignment:

request_shape(config), {name}_activity_shape(config),
user_features(config), {name}_request_features(config), {name}_activity_features(config), {name}_incremental_features(config),
user_activity(config), {name}_user_incrementality(config), {name}_user_individuality(config),
auction(config)

Expected to modify config in place:

finalize(config)

Convention - make config parameter optional if you do not use it.

The idea is that you can write your own config generators to alter all or parts of an established (or base) config in structured notebooks rather than modifying an object ad-hoc in your notebook, e.g.

>>> config = lethe.config.base_config(days=10, users=1000)
>>> config = lethe.config.extend_config(config, lethe.configs.reasonably_rich)
>>> config = lethe.config.extend_config(config, my_module)
>>> lethe.simulator.set_config(config)
>>> lethe.simulator.simulate()
>>> training_data = lethe.simulator.build_training_data()
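
To make the naming pattern concrete, here is a minimal sketch of the copy-then-call flow. The function sketch_extend is hypothetical and covers only a few of the named functions. The mapping from function name to config attribute (for example user_features to config.user.features) is an assumption, and a SimpleNamespace of functions stands in for a real module.

```python
import copy
from types import SimpleNamespace


def sketch_extend(base, module):
    """Hypothetical, reduced version of the extend pattern."""
    config = copy.deepcopy(base)  # leave the caller's base config untouched
    if hasattr(module, "settings"):
        config.settings = module.settings(config)
    # Functions that read config and return a value to assign.
    if hasattr(module, "user_features"):
        config.user.features = module.user_features(config)
    if hasattr(module, "auction"):
        config.auction = module.auction(config)
    # finalize is expected to modify config in place.
    if hasattr(module, "finalize"):
        module.finalize(config)
    return config


base = SimpleNamespace(
    user_count=1000,
    user=SimpleNamespace(features=[]),
    auction=SimpleNamespace(),
)

# Stand-in for a user-written module that defines only some of the functions.
my_module = SimpleNamespace(
    user_features=lambda config: ["age_group"],
    finalize=lambda config: setattr(config, "finalized", True),
)

extended = sketch_extend(base, my_module)
print(extended.user.features, base.user.features)
```

Skipping functions the module does not define is what lets a small module override a single part of an established config.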

lethe.config.modify_config(base, operation)

Make changes to config objects using this function so that they are added to the build instructions. “config” is the base object.

lethe.config.print_config(config, parts={'activity', 'auction', 'base', 'incremental', 'request', 'seed', 'user'}, wrap_at=100)

Pretty printer for config objects

lethe.config_objects module

class lethe.config_objects.Activity(fraction, frequency, lifetime_distribution, daily_distribution, correlation, weekday, weekend)

Bases: tuple

Create new instance of Activity(fraction, frequency, lifetime_distribution, daily_distribution, correlation, weekday, weekend)

property correlation

Alias for field number 4

property daily_distribution

Alias for field number 3

property fraction

Alias for field number 0

property frequency

Alias for field number 1

property lifetime_distribution

Alias for field number 2

property weekday

Alias for field number 5

property weekend

Alias for field number 6
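
The "Alias for field number N" entries reflect standard namedtuple behavior: each named property is a view on a positional slot. The sketch below defines a local Activity namedtuple with the same field order. The values are illustrative placeholders, not recommended settings.

```python
from collections import namedtuple

# Local stand-in with the same field order as the documented class.
Activity = namedtuple(
    "Activity",
    "fraction frequency lifetime_distribution daily_distribution "
    "correlation weekday weekend",
)

example = Activity(
    fraction=[0.6, 0.3],
    frequency=[8.0, 2.0],
    lifetime_distribution=None,  # placeholder; a real config would supply distributions
    daily_distribution=None,
    correlation=None,
    weekday=[0.1, 0.2, 0.2, 0.3, 0.2],
    weekend=[0.1, 0.2, 0.3, 0.2, 0.2],
)

# Field 5 and .weekday are the same object.
same_object = example.weekday is example[5]

# namedtuples are immutable; _replace returns a modified copy.
busier = example._replace(frequency=[10.0, 2.0])
print(same_object, busier.frequency, example.frequency)
```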

class lethe.config_objects.Cap(period, setting)

Bases: tuple

Create new instance of Cap(period, setting)

property period

Alias for field number 0

property setting

Alias for field number 1

class lethe.config_objects.FBVM(constant, feature_coefficients, transform)

Bases: tuple

Create new instance of FBVM(constant, feature_coefficients, transform)

property constant

Alias for field number 0

property feature_coefficients

Alias for field number 1

property transform

Alias for field number 2

class lethe.config_objects.FCO(type, name, value_coefficients)

Bases: tuple

Create new instance of FCO(type, name, value_coefficients)

property name

Alias for field number 1

property type

Alias for field number 0

property value_coefficients

Alias for field number 2

class lethe.config_objects.FV(type, name, value, negate)

Bases: tuple

Create new instance of FV(type, name, value, negate)

property name

Alias for field number 1

property negate

Alias for field number 3

property type

Alias for field number 0

property value

Alias for field number 2

class lethe.config_objects.Feature(name, values, occurrence, interactions, hidden)

Bases: tuple

Create new instance of Feature(name, values, occurrence, interactions, hidden)

property hidden

Alias for field number 4

property interactions

Alias for field number 3

property name

Alias for field number 0

property occurrence

Alias for field number 2

property values

Alias for field number 1

class lethe.config_objects.Incrementality(response, frequency_features, delay_distribution)

Bases: tuple

Create new instance of Incrementality(response, frequency_features, delay_distribution)

property delay_distribution

Alias for field number 2

property frequency_features

Alias for field number 1

property response

Alias for field number 0

class lethe.config_objects.Interaction(having, affects, factor)

Bases: tuple

Create new instance of Interaction(having, affects, factor)

property affects

Alias for field number 1

property factor

Alias for field number 2

property having

Alias for field number 0

class lethe.config_objects.Shape(fraction, frequency, hours, fraction_fuzzer, frequency_fuzzer)

Bases: tuple

Create new instance of Shape(fraction, frequency, hours, fraction_fuzzer, frequency_fuzzer)

property fraction

Alias for field number 0

property fraction_fuzzer

Alias for field number 3

property frequency

Alias for field number 1

property frequency_fuzzer

Alias for field number 4

property hours

Alias for field number 2

class lethe.config_objects.VCO(value, negate, coefficient, operator)

Bases: tuple

Create new instance of VCO(value, negate, coefficient, operator)

property coefficient

Alias for field number 2

property negate

Alias for field number 1

property operator

Alias for field number 3

property value

Alias for field number 0

lethe.config_objects.config_from_build_py(build_py)
lethe.config_objects.constant_draw(value)
lethe.config_objects.frequency_fco_cap(period, coeffs, op=<built-in function mul>)
lethe.config_objects.full_fco(feature_type, feature, coeffs, op)
lethe.config_objects.unshaped()

lethe.core module

lethe.core.auction(config, users, requests)
lethe.core.generate_incrementals(config, wins, users, requests)
lethe.core.generate_organic(config, all_users)
lethe.core.generate_users(config)

lethe.lethe module

lethe.lethe.build_training_data(re_id=False)

Generate public (training) data from a simulation. Training data strips out internal simulator state and truth signals to look like log-level data.

re_id: generate new indices so there is no leakage of hidden data through id fields. The simpler recommendation is not to train on id fields at all.
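
The re-indexing idea can be illustrated with plain dictionaries. The helper reindex_ids below is hypothetical and does not reflect how build_training_data implements re_id. It only shows why replacing ids with a shuffled mapping removes any information carried by the original id order.

```python
import random


def reindex_ids(rows, id_field, seed=0):
    """Replace ids with shuffled integers, keeping equal ids equal."""
    original_ids = sorted({row[id_field] for row in rows})
    new_ids = list(range(len(original_ids)))
    random.Random(seed).shuffle(new_ids)
    mapping = dict(zip(original_ids, new_ids))
    # Build new rows so the input data is not modified.
    return [{**row, id_field: mapping[row[id_field]]} for row in rows]


rows = [
    {"user_id": 101, "event": "request"},
    {"user_id": 205, "event": "request"},
    {"user_id": 101, "event": "activity"},
]
public_rows = reindex_ids(rows, "user_id")
print(public_rows)
```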

lethe.lethe.generate_lethe(sim, dates, users, randomize_users)

Generate Lethe simulation data.

lethe.lethe.get_config()

Get the config of the current simulation

lethe.lethe.get_simulation_data()

Get the previously-built simulation data

lethe.lethe.get_training_data()

Get the previously-built training data

lethe.lethe.list_saves()

Return a list of simulations in the bucket

lethe.lethe.load(run_name)

Load the simulation config and state for run_name from the bucket

lethe.lethe.load_training_data(name=None)

Load training data from this simulation, or clear the simulation and load just the training data for the provided name. Returns the training data

lethe.lethe.purge(run_name, labels=None)

Purge a simulation from the bucket

lethe.lethe.save(run_name, overwrite=None)

Save the simulation config and state for run_name to the bucket

lethe.lethe.save_training_data(partition=None, compress=None, overwrite=None, name=None)

Save training data from the simulation, possibly under a different name

lethe.lethe.set_config(config)

Set the configuration for the simulator. This clears internal state for a new simulation

lethe.lethe.simulate(stages={'auction', 'incremental', 'organic', 'user'})

Run a full or partial simulation based on current config. Persists results to simulator state.

Parameters
  • stages – stages to run, from {'user', 'organic', 'auction', 'incremental', 'public'}

Returns

simulation_data

lethe.storage module

class lethe.storage.TmpBackedFile(return_path=False)

Bases: object

close()
open(file_path, rw='r')
tmp_path = '/tmp'
lethe.storage.list_saves()
lethe.storage.load_state(state)

Build a new simulator from a saved dump

lethe.storage.load_training(run_name)
lethe.storage.purge(run_name, labels=['state', 'training'])
lethe.storage.save_state(state, overwrite=False)

Save config and current state into bucket

lethe.storage.save_training(run_name, training_data, partition=True, compress=False, overwrite=True)
lethe.storage.set_path(path)

lethe.util module

class lethe.util.CopulaCorrelator(length, covariance)

Bases: object

Generates sets of random numbers related by a covariance matrix. The correlated sets are built on initialization and can be mapped to different distributions when retrieved. Hat tip to @bscannell and https://twiecki.io/blog/2018/05/03/copulas/ for this.

Methods:

CopulaCorrelator(length, covariance)

generate the correlated samples in cdf space. The number of sets is determined by the size of the covariance matrix, e.g. 10000, [[1.0, 0.5], [0.5, 1.0]] prepares two sets of 10000 random choices with 50% correlation.

.values(index, distribution)

returns the values mapped through distribution.ppf()
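
The copula approach can be sketched with the standard library alone: draw correlated standard normals, push them through the normal CDF to get correlated uniforms, then map those through the inverse CDF of any target distribution. The functions below are hypothetical and are not the CopulaCorrelator implementation. They assume a covariance matrix with unit diagonal, and statistics.NormalDist.inv_cdf stands in for a scipy distribution's ppf().

```python
import math
import random
from statistics import NormalDist


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a small positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - partial)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower


def correlated_uniforms(length, covariance, seed=0):
    """Return one list of `length` uniforms per row of the covariance matrix."""
    rng = random.Random(seed)
    lower = cholesky(covariance)
    n = len(covariance)
    standard_normal = NormalDist()
    sets = [[] for _ in range(n)]
    for _ in range(length):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for i in range(n):
            x = sum(lower[i][k] * z[k] for k in range(i + 1))
            sets[i].append(standard_normal.cdf(x))  # move to cdf space
    return sets


def values(uniforms, distribution):
    """Map uniforms through the distribution's inverse CDF."""
    return [distribution.inv_cdf(u) for u in uniforms]


sets = correlated_uniforms(2000, [[1.0, 0.5], [0.5, 1.0]])
spend = values(sets[0], NormalDist(mu=10.0, sigma=2.0))
print(len(sets), len(spend))
```

Keeping the samples in cdf space until retrieval is what allows the same correlated draws to be reused with different target distributions.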

static corr2d(value)
values(index, distribution=False)
class lethe.util.FBVMEvaluator(fbvm, for_type)

Bases: object

class lethe.util.FrequencyCapper(cap)

Bases: object

capped(user_id, timestamp)
capped_impressions(user_id)
impress(user_id, timestamp)
static required_features()
class lethe.util.GhostBidder(ghost_pct, ghost_key)

Bases: object

evaluate(row)
class lethe.util.ProgressPrinter(message, digits=1)

Bases: object

finish(message=False, indent=2)
indent = 0
print_progress = True
reset(mod_group)
set_auction_scale(requests)
set_impression_scale(wins)
set_traffic_scale(user_frame, config, index)
spaces(indent=0)
update(message, mod_group='none', indent=2)
class lethe.util.TimeFeature

Bases: object

classmethod builder(name)
static hour(ts)
static timestamp(ts)
lethe.util.daypart_index(second)
lethe.util.empty_typed_frame(cols)
lethe.util.frame_for_features(features, base_frame, frames)
lethe.util.generate_feature(feature, feature_on, frames)
lethe.util.generate_traffic(users, date, shape, settings, settings_index)
lethe.util.passthrough_optional(opt_list, passed)
lethe.util.second_interpolator(dayparts, hours)
lethe.util.second_weights(dayparts, hours)
lethe.util.sum_to(sumto, arraylike)
lethe.util.sum_to_one(arraylike)