scalarstop.datablob

Group together and name your training, validation, and test sets.

The classes in this module group your data into the training, validation, and test sets used for training machine learning models. They also record the hyperparameters used to process each dataset.

The DataBlob subclass name and hyperparameters are used to create a unique content-addressable name that makes it easy to keep track of many datasets at once.

Module Contents

Classes

DataBlob

Subclass this to group your training, validation, and test sets for training machine learning models.

DataFrameDataBlob

Subclass this to transform a pandas.DataFrame into your training, validation, and test sets.

AppendDataBlob

Subclass this to create a new DataBlob that extends an existing DataBlob.

class DataBlob(*, hyperparams: Optional[Union[Mapping[str, Any], HyperparamsType]] = None, **kwargs)

Subclass this to group your training, validation, and test sets for training machine learning models.

Here is how to use DataBlob to group your training, validation, and test sets:

  1. Subclass DataBlob with a class name that describes your dataset in general. In this example, we’ll use MyDataBlob as the class name.

  2. Define a dataclass decorated with @sp.dataclass at MyDataBlob.Hyperparams that describes the hyperparameters involved in processing your dataset. An instance of this dataclass will be available at MyDataBlob.hyperparams.

  3. Override the methods DataBlob.set_training(), DataBlob.set_validation(), and DataBlob.set_test() to generate tf.data.Dataset pipelines representing your training, validation, and test sets.

Those three steps roughly look like:

>>> import tensorflow as tf
>>> import scalarstop as sp
>>>
>>> class MyDataBlob(sp.DataBlob):
...
...     @sp.dataclass
...     class Hyperparams(sp.HyperparamsType):
...         cols: int
...
...     def _data(self):
...         x = tf.random.uniform(shape=(10, self.hyperparams.cols))
...         y = tf.round(tf.random.uniform(shape=(10, 1)))
...         return tf.data.Dataset.zip((
...             tf.data.Dataset.from_tensor_slices(x),
...             tf.data.Dataset.from_tensor_slices(y),
...         ))
...
...     def set_training(self):
...         return self._data()
...
...     def set_validation(self):
...         return self._data()
...
...     def set_test(self):
...         return self._data()
>>>

In the example above, the training, validation, and test sets are created with exactly the same code. In practice, you will create them from different inputs.

Now we create an instance of our subclass so we can start using it.

>>> datablob = MyDataBlob(hyperparams=dict(cols=3))
>>> datablob
<sp.DataBlob MyDataBlob-bn5hpc7ueo2uz7as1747tetn>

DataBlob instances are given a unique name by hashing together the class name with the instance’s hyperparameters.

>>> datablob.name
'MyDataBlob-bn5hpc7ueo2uz7as1747tetn'
>>>
>>> datablob.group_name
'MyDataBlob'
>>>
>>> datablob.hyperparams
MyDataBlob.Hyperparams(cols=3)
>>>
>>> sp.enforce_dict(datablob.hyperparams)
{'cols': 3}

We save exactly one instance of each tf.data.Dataset pipeline in the properties DataBlob.training, DataBlob.validation, and DataBlob.test.

>>> datablob.training
<ZipDataset shapes: ((3,), (1,)), types: (tf.float32, tf.float32)>
>>>
>>> datablob.validation
<ZipDataset shapes: ((3,), (1,)), types: (tf.float32, tf.float32)>
>>>
>>> datablob.test
<ZipDataset shapes: ((3,), (1,)), types: (tf.float32, tf.float32)>

DataBlob objects have some methods for applying tf.data transformations to the training, validation, and test sets at the same time:

  • Batching. DataBlob.batch() will batch the training, validation, and test sets at the same time. If you call DataBlob.batch() with the keyword argument with_tf_distribute=True, your input batch size will be multiplied by the number of replicas in your tf.distribute strategy.

  • Caching. DataBlob.cache() will cache the training, validation, and test sets in memory once you iterate over them. This is useful if your tf.data.Dataset pipelines do something computationally expensive each time you iterate over them. (Batching and caching are sketched together after this list.)

  • Saving/loading to/from the filesystem. DataBlob.save() saves the training, validation, and test sets to a path on the filesystem. A saved DataBlob can be loaded back with the classmethod DataBlob.from_exact_path().
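
For example, here is a minimal sketch of the batching and caching transformations, continuing with the datablob defined above. Each call returns a new DataBlob, so the calls can be chained:

>>> batched_datablob = datablob.batch(2)
>>> cached_datablob = datablob.cache()
>>> batched_and_cached_datablob = datablob.batch(2).cache()

Saving and loading are demonstrated below.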

>>> import os
>>> import tempfile
>>> tempdir = tempfile.TemporaryDirectory()
>>>
>>> datablob = datablob.save(tempdir.name)
>>>
>>> os.listdir(tempdir.name)
['MyDataBlob-bn5hpc7ueo2uz7as1747tetn']
>>> path = os.path.join(tempdir.name, datablob.name)
>>> loaded_datablob = MyDataBlob.from_exact_path(path)
>>> loaded_datablob
<sp.DataBlob MyDataBlob-bn5hpc7ueo2uz7as1747tetn>

Alternatively, if you have the hyperparameters of the DataBlob but not the name, you can use the classmethod DataBlob.from_filesystem().

>>> loaded_datablob_2 = MyDataBlob.from_filesystem(
...    hyperparams=dict(cols=3),
...    datablobs_directory=tempdir.name,
... )
>>> loaded_datablob_2
<sp.DataBlob MyDataBlob-bn5hpc7ueo2uz7as1747tetn>
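
If you are not sure whether a DataBlob with these hyperparameters has already been saved, the classmethod DataBlob.from_filesystem_or_new() behaves like DataBlob.from_filesystem(), but constructs a new instance when no saved copy is found. A minimal sketch, reusing the hyperparameters and temporary directory from above:

>>> loaded_or_new_datablob = MyDataBlob.from_filesystem_or_new(
...    hyperparams=dict(cols=3),
...    datablobs_directory=tempdir.name,
... )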

(And now let's clean up the temporary directory from above.)

>>> tempdir.cleanup()

Hyperparams: Type[HyperparamsType]

hyperparams: HyperparamsType

classmethod from_filesystem(cls, *, hyperparams: Optional[Union[Mapping[str, Any], HyperparamsType]] = None, datablobs_directory: str)

Load a DataBlob from the filesystem, calculating the filename from the hyperparameters.

Parameters
  • hyperparams – The hyperparameters of the DataBlob that we want to load.

  • datablobs_directory – The parent directory of all of your saved DataBlobs. The exact filename is calculated from the class name and hyperparams.

classmethod from_filesystem_or_new(cls, *, hyperparams: Optional[Union[Mapping[str, Any], HyperparamsType]] = None, datablobs_directory: str, **kwargs)

Load a DataBlob from the filesystem, calculating the filename from the hyperparameters. Create a new DataBlob if we cannot find a saved one on the filesystem.

Parameters
  • hyperparams – The hyperparameters of the DataBlob that we want to load.

  • datablobs_directory – The parent directory of all of your saved DataBlobs. The exact filename is calculated from the class name and hyperparams.

  • **kwargs – Other keyword arguments that you need to pass to your __init__().

static from_exact_path(path: str) → scalarstop.datablob.DataBlob

Load a DataBlob from a directory on the filesystem.

property name(self) → str

The name of this specific dataset.

property group_name(self) → str

The group name of this dataset.

This is typically the DataBlob subclass’s class name.

Conceptually, the group name is the name for all DataBlobs that share the same code but have different hyperparameters.
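
For example, two instances of the MyDataBlob class defined above that differ only in their hyperparameters share a group_name but not a name. A quick sketch:

>>> blob_a = MyDataBlob(hyperparams=dict(cols=3))
>>> blob_b = MyDataBlob(hyperparams=dict(cols=4))
>>> blob_a.group_name == blob_b.group_name
True
>>> blob_a.name == blob_b.name
False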

set_training(self) → tf.data.Dataset

Create a tf.data.Dataset for the training set.

property training(self) → tf.data.Dataset

A tf.data.Dataset instance representing the training set.

set_validation(self) → tf.data.Dataset

Create a tf.data.Dataset for the validation set.

property validation(self) → tf.data.Dataset

A tf.data.Dataset instance representing the validation set.

set_test(self) → tf.data.Dataset

Create a tf.data.Dataset for the test set.

property test(self) → tf.data.Dataset

A tf.data.Dataset instance representing the test set.

batch(self, batch_size: int, *, with_tf_distribute: bool = False) → scalarstop.datablob.DataBlob

Batch this DataBlob.

cache(self) → scalarstop.datablob.DataBlob

Cache this DataBlob into memory before iterating over it.

save_hook(self, *, subtype: str, path: str) → None

Override this method to run additional code when saving this DataBlob to disk.
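
For instance, here is a hypothetical sketch of a subclass that writes an extra note file whenever it is saved. The exact subtype and path values passed to this hook depend on how save() invokes it, so treat this purely as an illustration:

>>> class MyDataBlobWithSaveHook(MyDataBlob):
...     def save_hook(self, *, subtype: str, path: str) -> None:
...         super().save_hook(subtype=subtype, path=path)
...         # Hypothetical extra step: record which subtype was saved and where.
...         with open(os.path.join(path, "save_hook_note.txt"), "w") as handle:
...             handle.write(f"saved {subtype} to {path}")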

save(self, datablobs_directory: str, *, ignore_existing: bool = False) → scalarstop.datablob.DataBlob

Save this DataBlob to disk.

Parameters
  • datablobs_directory – The directory where you plan on storing all of your DataBlobs. This method will save this DataBlob in a subdirectory of datablobs_directory with the same name as DataBlob.name.

  • ignore_existing – Set this to True to ignore if there is already a DataBlob at the given path.

Returns

Returns self, enabling you to place this call in a chain.

class DataFrameDataBlob(*, hyperparams: Optional[Union[Mapping[str, Any], HyperparamsType]] = None, **kwargs)

Bases: scalarstop.datablob.DataBlob

Subclass this to transform a pandas.DataFrame into your training, validation, and test sets.

DataBlob is useful when you want to manually define your tf.data pipelines and their input tensors.

However, if your input tensors are in a fixed-size list or DataFrame that you want to slice into a training, validation, and test set, then you might find DataFrameDataBlob handy.

Here is how to use it:

  1. Subclass DataFrameDataBlob with a class name that describes your dataset.

  2. Override DataFrameDataBlob.set_dataframe() and have it return a single DataFrame that contains all of the inputs for your training, validation, and test sets. The DataFrame should have one column representing training samples and another column representing training labels.

  3. Override DataFrameDataBlob.transform() and define a method that transforms an arbitrary DataFrame of inputs into a tf.data.Dataset pipeline that represents the actual dataset needed for training and evaluation.

We control how the DataFrame is split with the class attributes DataFrameDataBlob.training_fraction and DataFrameDataBlob.validation_fraction. By default, 60 percent of the DataFrame is marked for the training set, 20 percent for the validation set, and the remainder for the test set.

Roughly, this looks like:

>>> import pandas as pd
>>> import tensorflow as tf
>>> import scalarstop as sp
>>>
>>> class MyDataFrameDataBlob(sp.DataFrameDataBlob):
...    samples_column: str = "samples"
...    labels_column: str = "labels"
...    training_fraction: float = 0.6
...    validation_fraction: float = 0.2
...
...    @sp.dataclass
...    class Hyperparams(sp.HyperparamsType):
...        length: int = 0
...
...    def set_dataframe(self):
...        samples = list(range(self.hyperparams.length))
...        labels = list(range(self.hyperparams.length))
...        return pd.DataFrame({self.samples_column: samples, self.labels_column: labels})
...
...    def transform(self, dataframe: pd.DataFrame):
...        return tf.data.Dataset.zip((
...                tf.data.Dataset.from_tensor_slices(dataframe[self.samples_column]),
...                tf.data.Dataset.from_tensor_slices(dataframe[self.labels_column]),
...        ))
>>> datablob2 = MyDataFrameDataBlob(hyperparams=dict(length=10))

And you can use the resulting object in all of the same ways as we’ve demonstrated with DataBlob subclass instances above.
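
For example, the naming and hyperparams properties behave just as they did for DataBlob. A quick sketch:

>>> datablob2.group_name
'MyDataFrameDataBlob'
>>> sp.enforce_dict(datablob2.hyperparams)
{'length': 10}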

samples_column: str = 'samples'

labels_column: str = 'labels'

training_fraction: float = 0.6

validation_fraction: float = 0.2

Hyperparams: Type[HyperparamsType]

hyperparams: HyperparamsType

static from_exact_path(path: str) → Union[DataBlob, DataFrameDataBlob]

Load a DataFrameDataBlob from a directory on the filesystem.

set_dataframe(self) → pandas.DataFrame

Create a new pandas.DataFrame that contains all of the data for the training, validation, and test sets.

property dataframe(self) → pandas.DataFrame

A pandas.DataFrame that represents the entire training, validation, and test set.
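
Continuing the example above, a quick sketch inspecting the combined DataFrame of ten samples and labels:

>>> datablob2.dataframe.shape
(10, 2)
>>> list(datablob2.dataframe.columns)
['samples', 'labels']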

set_training_dataframe(self) → pandas.DataFrame

Sets the pandas.DataFrame for the training set.

By default, this method slices the pandas.DataFrame you have supplied to set_dataframe().

Alternatively, you can override set_training_dataframe(), set_validation_dataframe(), and set_test_dataframe() directly, as sketched below.

Returns

Returns a pandas.DataFrame.
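
As a sketch of that alternative, here is a hypothetical subclass of the MyDataFrameDataBlob example above that overrides all three split methods with explicit row ranges (the exact slices are arbitrary and chosen only for illustration):

>>> class MyExplicitSplitDataBlob(MyDataFrameDataBlob):
...     def set_training_dataframe(self):
...         return self.dataframe.iloc[:6]
...
...     def set_validation_dataframe(self):
...         return self.dataframe.iloc[6:8]
...
...     def set_test_dataframe(self):
...         return self.dataframe.iloc[8:]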

property training_dataframe(self) → pandas.DataFrame

A pandas.DataFrame representing training set input tensors.

set_validation_dataframe(self) → pandas.DataFrame

Sets the pandas.DataFrame for the validation set.

By default, this method slices the pandas.DataFrame you have supplied to set_dataframe().

Alternatively, you can override set_training_dataframe(), set_validation_dataframe(), and set_test_dataframe() directly.

Returns

Returns a pandas.DataFrame.

property validation_dataframe(self) → pandas.DataFrame

A pandas.DataFrame representing validation set input tensors.

set_test_dataframe(self) → pandas.DataFrame

Sets the pandas.DataFrame for the test set.

By default, this method slices the DataFrame you have supplied to set_dataframe().

Alternatively, you can override set_training_dataframe(), set_validation_dataframe(), and set_test_dataframe() directly.

Returns

Returns a pandas.DataFrame.

property test_dataframe(self) → pandas.DataFrame

A pandas.DataFrame representing test set input tensors.

transform(self, dataframe: pandas.DataFrame) → tf.data.Dataset

Transforms any input tensors into an output tf.data.Dataset.

set_training(self) → tf.data.Dataset

Create a tf.data.Dataset for the training set.

set_validation(self) → tf.data.Dataset

Create a tf.data.Dataset for the validation set.

set_test(self) → tf.data.Dataset

Create a tf.data.Dataset for the test set.

save_hook(self, *, subtype: str, path: str) → None

Override this method to run additional code when saving this DataBlob to disk.

classmethod from_filesystem(cls, *, hyperparams: Optional[Union[Mapping[str, Any], HyperparamsType]] = None, datablobs_directory: str)

Load a DataBlob from the filesystem, calculating the filename from the hyperparameters.

Parameters
  • hyperparams – The hyperparameters of the DataBlob that we want to load.

  • datablobs_directory – The parent directory of all of your saved DataBlobs. The exact filename is calculated from the class name and hyperparams.

classmethod from_filesystem_or_new(cls, *, hyperparams: Optional[Union[Mapping[str, Any], HyperparamsType]] = None, datablobs_directory: str, **kwargs)

Load a DataBlob from the filesystem, calculating the filename from the hyperparameters. Create a new DataBlob if we cannot find a saved one on the filesystem.

Parameters
  • hyperparams – The hyperparameters of the DataBlob that we want to load.

  • datablobs_directory – The parent directory of all of your saved DataBlobs. The exact filename is calculated from the class name and hyperparams.

  • **kwargs – Other keyword arguments that you need to pass to your __init__().

property name(self) → str

The name of this specific dataset.

property group_name(self) → str

The group name of this dataset.

This is typically the DataBlob subclass’s class name.

Conceptually, the group name is the name for all DataBlobs that share the same code but have different hyperparameters.

property training(self) → tf.data.Dataset

A tf.data.Dataset instance representing the training set.

property validation(self) → tf.data.Dataset

A tf.data.Dataset instance representing the validation set.

property test(self) → tf.data.Dataset

A tf.data.Dataset instance representing the test set.

batch(self, batch_size: int, *, with_tf_distribute: bool = False) → scalarstop.datablob.DataBlob

Batch this DataBlob.

cache(self) → scalarstop.datablob.DataBlob

Cache this DataBlob into memory before iterating over it.

save(self, datablobs_directory: str, *, ignore_existing: bool = False) → scalarstop.datablob.DataBlob

Save this DataBlob to disk.

Parameters
  • datablobs_directory – The directory where you plan on storing all of your DataBlobs. This method will save this DataBlob in a subdirectory of datablobs_directory with the same name as DataBlob.name.

  • ignore_existing – Set this to True to ignore if there is already a DataBlob at the given path.

Returns

Returns self, enabling you to place this call in a chain.

class AppendDataBlob(*, parent: scalarstop.datablob.DataBlob, hyperparams: Optional[Union[Mapping[str, Any], HyperparamsType]] = None)

Bases: scalarstop.datablob.DataBlob

Subclass this to create a new DataBlob that extends an existing DataBlob.

The AppendDataBlob class is useful when you have an existing DataBlob or DataFrameDataBlob with most, but not all, of the functionality you need. If you are implementing multiple data pipelines that share a common, compute-intensive first step, you can implement that first step as a DataBlob, save it to the filesystem, and then implement each pipeline as an AppendDataBlob subclass that loads and extends it.

Let’s begin by creating a DataBlob that we will use as a parent for an AppendDataBlob.

>>> import tensorflow as tf
>>> import scalarstop as sp
>>>
>>> class MyDataBlob(sp.DataBlob):
...
...     @sp.dataclass
...     class Hyperparams(sp.HyperparamsType):
...         length: int
...
...     def _data(self):
...         length = self.hyperparams.length
...         x = tf.data.Dataset.from_tensor_slices(list(range(0, length)))
...         y = tf.data.Dataset.from_tensor_slices(list(range(length, length * 2)))
...         return tf.data.Dataset.zip((x, y))
...
...     def set_training(self):
...         return self._data()
...
...     def set_validation(self):
...         return self._data()
...
...     def set_test(self):
...         return self._data()
>>>

And then we create an instance of the datablob and save it to the filesystem.

>>> import os
>>> import tempfile
>>> tempdir = tempfile.TemporaryDirectory()
>>>
>>> datablob = MyDataBlob(hyperparams=dict(length=5))
>>> datablob
<sp.DataBlob MyDataBlob-dac936v7mb1ue9phjp6tc3sb>
>>>
>>> list(datablob.training.as_numpy_iterator())
[(0, 5), (1, 6), (2, 7), (3, 8), (4, 9)]
>>>
>>> datablob = datablob.save(tempdir.name)
>>>
>>> os.listdir(tempdir.name)
['MyDataBlob-dac936v7mb1ue9phjp6tc3sb']

Now, let’s say that we want to create an AppendDataBlob that takes in any input DataBlob or DataFrameDataBlob and multiplies every number in every tensor by a constant.

>>> class MyAppendDataBlob(sp.AppendDataBlob):
...
...     @sp.dataclass
...     class Hyperparams(sp.AppendHyperparamsType):
...         coefficient: int
...
...     hyperparams: "MyAppendDataBlob.Hyperparams"
...
...     def __init__(self, *, parent: sp.DataBlob, hyperparams):
...         hyperparams_dict = sp.enforce_dict(hyperparams)
...         if hyperparams_dict["coefficient"] < 1:
...             raise ValueError("Coefficient is too low.")
...         super().__init__(parent=parent, hyperparams=hyperparams_dict)
...
...     def _wrap_tfdata(self, tfdata: tf.data.Dataset) -> tf.data.Dataset:
...         return tfdata.map(
...             lambda x, y: (
...                 x * self.hyperparams.coefficient,
...                 y * self.hyperparams.coefficient,
...             )
...         )
>>>
>>> append = MyAppendDataBlob(parent=datablob, hyperparams=dict(coefficient=3))
>>> list(append.training.as_numpy_iterator())
[(0, 15), (3, 18), (6, 21), (9, 24), (12, 27)]

(And now let’s clean up the temporary directory that we created earlier.)

>>> tempdir.cleanup()

Parameters
  • parent – The DataBlob to extend.

  • hyperparams – Additional hyperparameters to add on top of the existing hyperparameters from the parent DataBlob.

Hyperparams: Type[AppendHyperparamsType]

hyperparams: HyperparamsType

property parent(self) → scalarstop.datablob.DataBlob

The parent DataBlob.
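
Continuing the example from the class docstring above, a quick sketch (assuming the parent is stored exactly as it was passed in):

>>> append.parent.name == datablob.name
True
>>> append.parent.group_name
'MyDataBlob'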

set_training(self) → tf.data.Dataset

Create a tf.data.Dataset for the training set.

property training(self) → tf.data.Dataset

A tf.data.Dataset instance representing the training set.

set_validation(self) → tf.data.Dataset

Create a tf.data.Dataset for the validation set.

property validation(self) → tf.data.Dataset

A tf.data.Dataset instance representing the validation set.

set_test(self) → tf.data.Dataset

Create a tf.data.Dataset for the test set.

property test(self) → tf.data.Dataset

A tf.data.Dataset instance representing the test set.

classmethod from_filesystem(cls, *, hyperparams: Optional[Union[Mapping[str, Any], HyperparamsType]] = None, datablobs_directory: str)

Load a DataBlob from the filesystem, calculating the filename from the hyperparameters.

Parameters
  • hyperparams – The hyperparameters of the DataBlob that we want to load.

  • datablobs_directory – The parent directory of all of your saved DataBlobs. The exact filename is calculated from the class name and hyperparams.

classmethod from_filesystem_or_new(cls, *, hyperparams: Optional[Union[Mapping[str, Any], HyperparamsType]] = None, datablobs_directory: str, **kwargs)

Load a DataBlob from the filesystem, calculating the filename from the hyperparameters. Create a new DataBlob if we cannot find a saved one on the filesystem.

Parameters
  • hyperparams – The hyperparameters of the DataBlob that we want to load.

  • datablobs_directory – The parent directory of all of your saved DataBlobs. The exact filename is calculated from the class name and hyperparams.

  • **kwargs – Other keyword arguments that you need to pass to your __init__().

static from_exact_path(path: str) → scalarstop.datablob.DataBlob

Load a DataBlob from a directory on the filesystem.

property name(self) → str

The name of this specific dataset.

property group_name(self) → str

The group name of this dataset.

This is typically the DataBlob subclass’s class name.

Conceptually, the group name is the name for all DataBlobs that share the same code but have different hyperparameters.

batch(self, batch_size: int, *, with_tf_distribute: bool = False) → scalarstop.datablob.DataBlob

Batch this DataBlob.

cache(self) → scalarstop.datablob.DataBlob

Cache this DataBlob into memory before iterating over it.

save_hook(self, *, subtype: str, path: str) → None

Override this method to run additional code when saving this DataBlob to disk.

save(self, datablobs_directory: str, *, ignore_existing: bool = False) → scalarstop.datablob.DataBlob

Save this DataBlob to disk.

Parameters
  • datablobs_directory – The directory where you plan on storing all of your DataBlobs. This method will save this DataBlob in a subdirectory of datablobs_directory with the same name as DataBlob.name.

  • ignore_existing – Set this to True to ignore if there is already a DataBlob at the given path.

Returns

Returns self, enabling you to place this call in a chain.