synthesized3.model package#
- class synthesized3.model.ModelCollection#
Bases: Mapping[Tuple[str, ...], Model]
Container of Models. Mapping between a tuple of column names and a Model.
- Args
models: Sequence of Models that are fit on either a single column or multiple columns.
- __init__(models: Sequence[Model])#
- property models: List[Model]#
- fit(data: DatasetV2, epochs: int, steps_per_epoch: int, callbacks: List[Callback] | None = None, verbose: int = 1) → ModelCollection#
- sample(num_rows: int, seed: int | None = None) → Dict[str, Tensor]#
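A minimal usage sketch (not taken from the package docs): `models` is assumed to be a prepared Sequence[Model] and `dataset` a tf.data Dataset matching the fit signature above; the column names and values are placeholders.
>>> from synthesized3.model import ModelCollection
>>> collection = ModelCollection(models=models)
>>> collection = collection.fit(dataset, epochs=5, steps_per_epoch=100)
>>> samples = collection.sample(num_rows=1000, seed=42)  # Dict[str, Tensor], keyed by column name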
Subpackages#
- synthesized3.model.callbacks package
- synthesized3.model.models package
DeepTableModel
SamplingModel
EnumerationModel
- Submodules
- synthesized3.model.models.deep_table_model module
- synthesized3.model.models.deep_table_model_test module
- synthesized3.model.models.enumeration_model module
- synthesized3.model.models.enumeration_model_test module
- synthesized3.model.models.sampling_model module
- synthesized3.model.models.sampling_model_test module
Submodules#
synthesized3.model.model module#
synthesized3.model.model_collection module#
- class synthesized3.model.model_collection.ModelCollection#
Bases: Mapping[Tuple[str, ...], Model]
Container of Models. Mapping between a tuple of column names and a Model.
- Args
models: Sequence of Models that are fit on either a single column or multiple columns.
- __init__(models: Sequence[Model])#
- property models: List[Model]#
- fit(data: DatasetV2, epochs: int, steps_per_epoch: int, callbacks: List[Callback] | None = None, verbose: int = 1) → ModelCollection#
- sample(num_rows: int, seed: int | None = None) → Dict[str, Tensor]#
synthesized3.model.model_collection_test module#
synthesized3.model.model_factory module#
synthesized3.model.model_factory_test module#
synthesized3.model.model_override module#
- class synthesized3.model.model_override.ModelOverride#
Bases: BaseModel
Structure for model override.
- model_type: str | Type[Model]#
- model_kwargs: Dict[str, Any] | None#
- classmethod validate_model(value)#
Validates that the model type is a valid model.
- class synthesized3.model.model_override.ModelOverrides#
Bases: BaseModel
Dictionary mapping meta name(s) to the model that overrides the default model for those meta(s).
- overrides: Dict[str | Tuple[str, ...], ModelOverride]#
- classmethod validate_override_keys_are_unique(value)#
- static get_all_keys(overrides)#
- synthesized3.model.model_override.verify_and_extract_models(model_overrides: Mapping[Tuple[str] | str, Dict[str, Type[Model] | Dict[str, Any]]])#
Verify the model overrides and replace string model types with actual model types.
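A hedged sketch of the override mapping passed to verify_and_extract_models: the "model_type"/"model_kwargs" keys mirror the ModelOverride fields above, while the import paths and model choices are assumptions based on the package listing, not verified examples.
>>> from synthesized3.model.model_override import verify_and_extract_models
>>> from synthesized3.model.models import SamplingModel
>>> overrides = {
...     "age": {"model_type": SamplingModel},
...     ("height", "weight"): {"model_type": "DeepTableModel", "model_kwargs": {}},
... }
>>> verify_and_extract_models(overrides)  # string model types are replaced with the actual model classes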
synthesized3.model.model_override_test module#
synthesized3.model.model_test module#
synthesized3.model.spark_training_wrapper module#
- class synthesized3.model.spark_training_wrapper.SparkTrainingWrapper#
Bases: object
Wraps a Keras model training function and deploys it on a Spark cluster.
- __init__(train_fn, rdd, num_workers)#
- class synthesized3.model.spark_training_wrapper.SparkTrainFunction#
Bases: object
A callable class that acts as a training function for a Keras model.
- class closing#
Bases: AbstractContextManager
Context to automatically close something at the end of a block.
Code like this:
with closing(<module>.open(<arguments>)) as f:
    <block>
is equivalent to this:
f = <module>.open(<arguments>)
try:
    <block>
finally:
    f.close()
- __init__(thing)#
- class partial#
Bases: object
partial(func, *args, **keywords) - new function with partial application of the given arguments and keywords.
- __new__(**kwargs)#
- args#
tuple of arguments to future partial calls
- func#
function object to use in future partial calls
- keywords#
dictionary of keyword arguments to future partial calls
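For illustration only, a standard-library example of partial application (unrelated to the synthesized3 API):
>>> from functools import partial
>>> parse_binary = partial(int, base=2)
>>> parse_binary('1010')
10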
- class BarrierTaskContext#
Bases: TaskContext
A TaskContext with extra contextual info and tooling for tasks in a barrier stage. Use BarrierTaskContext.get() to obtain the barrier context for a running barrier task.
New in version 2.4.0.
Notes
This API is experimental
Examples
Set a barrier, and execute it with RDD.
>>> from pyspark import BarrierTaskContext
>>> def block_and_do_something(itr):
...     taskcontext = BarrierTaskContext.get()
...     # Do something.
...
...     # Wait until all tasks finished.
...     taskcontext.barrier()
...
...     return itr
...
>>> rdd = spark.sparkContext.parallelize([1])
>>> rdd.barrier().mapPartitions(block_and_do_something).collect()
[1]
- allGather(message: str = '') → List[str]#
This function blocks until all tasks in the same stage have reached this routine. Each task passes in a message and returns with a list of all the messages passed in by each of those tasks.
New in version 3.0.0.
Notes
This API is experimental
In a barrier stage, each task must have the same number of barrier() calls, in all possible code branches. Otherwise, you may get the job hanging or a SparkException after timeout.
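Examples
A hedged sketch in the style of the class-level example above; it assumes a running SparkSession named spark, and the function and variable names are illustrative.
>>> from pyspark import BarrierTaskContext
>>> def gather_partition_ids(itr):
...     context = BarrierTaskContext.get()
...     # Each task contributes its partition id and receives every task's message.
...     messages = context.allGather(message=str(context.partitionId()))
...     return [messages]
...
>>> rdd = spark.sparkContext.parallelize(range(4), 2)
>>> sorted(rdd.barrier().mapPartitions(gather_partition_ids).collect()[0])
['0', '1']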
- barrier() → None#
Sets a global barrier and waits until all tasks in this stage hit this barrier. Similar to MPI_Barrier function in MPI, this function blocks until all tasks in the same stage have reached this routine.
New in version 2.4.0.
Notes
This API is experimental
In a barrier stage, each task must have the same number of barrier() calls, in all possible code branches. Otherwise, you may get the job hanging or a SparkException after timeout.
- classmethod get() → BarrierTaskContext#
Return the currently active BarrierTaskContext. This can be called inside of user functions to access contextual information about running tasks.
Notes
Must be called on the worker, not the driver. Returns None if not initialized. An Exception will be raised if this is not called in a barrier stage.
This API is experimental
- getTaskInfos() → List[BarrierTaskInfo]#
Returns BarrierTaskInfo for all tasks in this barrier stage, ordered by partition ID.
New in version 2.4.0.
Notes
This API is experimental
Examples
>>> from pyspark import BarrierTaskContext
>>> rdd = spark.sparkContext.parallelize([1])
>>> barrier_info = rdd.barrier().mapPartitions(
...     lambda _: [BarrierTaskContext.get().getTaskInfos()]).collect()[0][0]
>>> barrier_info.address
'...:...'
- __init__(model, signature, dataset_length, process_data_row, schema, transformer_collection, meta_collection, batch_size, steps_per_epoch, epochs, callbacks, verbose)#
- set_tf_config(context)#
- static generate_data(dataset, process_data_row, schema, meta_collection)#
- synthesized3.model.spark_training_wrapper.fit_model_on_spark(model, data_interface, transformer_collection, meta_collection, batch_size: int, epochs: int, steps_per_epoch: int, num_workers: int, callbacks: List[Callback] | None = None, verbose: int = 0)#
Train a model using a Spark cluster.
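A hedged call sketch: only the signature above is taken from this module, and model, data_interface, transformer_collection, and meta_collection are assumed to be objects prepared elsewhere in the SDK.
>>> from synthesized3.model.spark_training_wrapper import fit_model_on_spark
>>> fit_model_on_spark(
...     model,
...     data_interface,
...     transformer_collection,
...     meta_collection,
...     batch_size=1024,
...     epochs=10,
...     steps_per_epoch=50,
...     num_workers=4,
...     verbose=1,
... )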