synthesized3.model package#
- class synthesized3.model.ModelCollection#
Bases: Mapping[Tuple[str, ...], Model]
Container of Models. Mapping between a tuple of column names and a Model.
- Args
models: Sequence of Models that are fit on either a single column or multiple columns.
- __init__(models: Sequence[Model])#
- property models: List[Model]#
- fit(data: DatasetV2, epochs: int, steps_per_epoch: int, callbacks: List[Callback] | None = None, verbose: int = 1) → ModelCollection#
- sample(num_rows: int, seed: int | None = None) → Dict[str, Tensor]#
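A minimal usage sketch (not taken from the package docs): `models` is assumed to be a prepared Sequence[Model] and `dataset` a tf.data Dataset matching the fit signature above; the column names and values are placeholders.
>>> from synthesized3.model import ModelCollection
>>> collection = ModelCollection(models=models)
>>> collection = collection.fit(dataset, epochs=5, steps_per_epoch=100)
>>> samples = collection.sample(num_rows=1000, seed=42)  # Dict[str, Tensor], keyed by column name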
Subpackages#
- synthesized3.model.callbacks package
- synthesized3.model.models package
DeepTableModel
SamplingModel
EnumerationModel
- Submodules
- synthesized3.model.models.deep_table_model module
- synthesized3.model.models.deep_table_model_test module
- synthesized3.model.models.enumeration_model module
- synthesized3.model.models.enumeration_model_test module
- synthesized3.model.models.sampling_model module
- synthesized3.model.models.sampling_model_test module
Submodules#
synthesized3.model.model module#
synthesized3.model.model_collection module#
- class synthesized3.model.model_collection.ModelCollection#
Bases: Mapping[Tuple[str, ...], Model]
Container of Models. Mapping between a tuple of column names and a Model.
- Args
models: Sequence of Models that are fit on either a single column or multiple columns.
- __init__(models: Sequence[Model])#
- property models: List[Model]#
- fit(data: DatasetV2, epochs: int, steps_per_epoch: int, callbacks: List[Callback] | None = None, verbose: int = 1) → ModelCollection#
- sample(num_rows: int, seed: int | None = None) → Dict[str, Tensor]#
synthesized3.model.model_collection_test module#
synthesized3.model.model_factory module#
synthesized3.model.model_factory_test module#
synthesized3.model.model_override module#
- class synthesized3.model.model_override.ModelOverride#
Bases: BaseModel
Structure for model override.
- model_type: str | Type[Model]#
- model_kwargs: Dict[str, Any] | None#
- classmethod validate_model(value)#
Validates that the model type is a valid model.
- class synthesized3.model.model_override.ModelOverrides#
Bases: BaseModel
Dictionary mapping meta name(s) to the model that overrides the default model for those meta(s).
- overrides: Dict[str | Tuple[str, ...], ModelOverride]#
- classmethod validate_override_keys_are_unique(value)#
- static get_all_keys(overrides)#
- synthesized3.model.model_override.verify_and_extract_models(model_overrides: Mapping[Tuple[str] | str, Dict[str, Type[Model] | Dict[str, Any]]])#
Verify the model overrides and replace string model types with actual model types.
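A hedged sketch of the override mapping passed to verify_and_extract_models: the "model_type"/"model_kwargs" keys mirror the ModelOverride fields above, while the import paths and model choices are assumptions based on the package listing, not verified examples.
>>> from synthesized3.model.model_override import verify_and_extract_models
>>> from synthesized3.model.models import SamplingModel
>>> overrides = {
...     "age": {"model_type": SamplingModel},
...     ("height", "weight"): {"model_type": "DeepTableModel", "model_kwargs": {}},
... }
>>> verify_and_extract_models(overrides)  # string model types are replaced with the actual model classes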
synthesized3.model.model_override_test module#
synthesized3.model.model_test module#
synthesized3.model.spark_training_wrapper module#
- class synthesized3.model.spark_training_wrapper.SparkTrainingWrapper#
Bases: object
Wraps a Keras model training function and deploys it on a Spark cluster.
- __init__(train_fn, rdd, num_workers)#
- class synthesized3.model.spark_training_wrapper.SparkTrainFunction#
Bases: object
A callable class that acts as a training function for a Keras model.
- class closing#
Bases: AbstractContextManager
Context to automatically close something at the end of a block.
Code like this:
with closing(<module>.open(<arguments>)) as f:
    <block>
is equivalent to this:
f = <module>.open(<arguments>)
try:
    <block>
finally:
    f.close()
- __init__(thing)#
- class partial#
Bases: object
partial(func, *args, **keywords) - new function with partial application of the given arguments and keywords.
- __new__(**kwargs)#
- args#
tuple of arguments to future partial calls
- func#
function object to use in future partial calls
- keywords#
dictionary of keyword arguments to future partial calls
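For illustration only, a standard-library example of partial application (unrelated to the synthesized3 API):
>>> from functools import partial
>>> parse_binary = partial(int, base=2)
>>> parse_binary('1010')
10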
- class BarrierTaskContext#
Bases: TaskContext
A TaskContext with extra contextual info and tooling for tasks in a barrier stage. Use BarrierTaskContext.get() to obtain the barrier context for a running barrier task.
New in version 2.4.0.
Notes
This API is experimental
Examples
Set a barrier, and execute it with RDD.
>>> from pyspark import BarrierTaskContext
>>> def block_and_do_something(itr):
...     taskcontext = BarrierTaskContext.get()
...     # Do something.
...
...     # Wait until all tasks finished.
...     taskcontext.barrier()
...
...     return itr
...
>>> rdd = spark.sparkContext.parallelize([1])
>>> rdd.barrier().mapPartitions(block_and_do_something).collect()
[1]
- allGather(message: str = '') → List[str]#
This function blocks until all tasks in the same stage have reached this routine. Each task passes in a message and returns with a list of all the messages passed in by each of those tasks.
New in version 3.0.0.
Notes
This API is experimental
In a barrier stage, each task must have the same number of barrier() calls, in all possible code branches. Otherwise, you may get the job hanging or a SparkException after timeout.
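Examples
A hedged sketch in the style of the class-level example above; it assumes a running SparkSession named spark, and the function and variable names are illustrative.
>>> from pyspark import BarrierTaskContext
>>> def gather_partition_ids(itr):
...     context = BarrierTaskContext.get()
...     # Each task contributes its partition id and receives every task's message.
...     messages = context.allGather(message=str(context.partitionId()))
...     return [messages]
...
>>> rdd = spark.sparkContext.parallelize(range(4), 2)
>>> sorted(rdd.barrier().mapPartitions(gather_partition_ids).collect()[0])
['0', '1']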
- barrier() → None#
Sets a global barrier and waits until all tasks in this stage hit this barrier. Similar to MPI_Barrier function in MPI, this function blocks until all tasks in the same stage have reached this routine.
New in version 2.4.0.
Notes
This API is experimental
In a barrier stage, each task must have the same number of barrier() calls, in all possible code branches. Otherwise, you may get the job hanging or a SparkException after timeout.
- classmethod get() → BarrierTaskContext#
Return the currently active BarrierTaskContext. This can be called inside of user functions to access contextual information about running tasks.
Notes
Must be called on the worker, not the driver. Returns None if not initialized. An Exception will be raised if this is not called in a barrier stage.
This API is experimental
- getTaskInfos() → List[BarrierTaskInfo]#
Returns BarrierTaskInfo for all tasks in this barrier stage, ordered by partition ID.
New in version 2.4.0.
Notes
This API is experimental
Examples
>>> from pyspark import BarrierTaskContext
>>> rdd = spark.sparkContext.parallelize([1])
>>> barrier_info = rdd.barrier().mapPartitions(
...     lambda _: [BarrierTaskContext.get().getTaskInfos()]).collect()[0][0]
>>> barrier_info.address
'...:...'
- __init__(model, signature, dataset_length, process_data_row, schema, transformer_collection, meta_collection, batch_size, steps_per_epoch, epochs, callbacks, verbose)#
- set_tf_config(context)#
- static generate_data(dataset, process_data_row, schema, meta_collection)#
- synthesized3.model.spark_training_wrapper.fit_model_on_spark(model, data_interface, transformer_collection, meta_collection, batch_size: int, epochs: int, steps_per_epoch: int, num_workers: int, callbacks: List[Callback] | None = None, verbose: int = 0)#
Train a model using a Spark cluster.
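A hedged call sketch: only the signature above is taken from this module, and model, data_interface, transformer_collection, and meta_collection are assumed to be objects prepared elsewhere in the SDK.
>>> from synthesized3.model.spark_training_wrapper import fit_model_on_spark
>>> fit_model_on_spark(
...     model,
...     data_interface,
...     transformer_collection,
...     meta_collection,
...     batch_size=1024,
...     epochs=10,
...     steps_per_epoch=50,
...     num_workers=4,
...     verbose=1,
... )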