scalarstop.train_store
Persists DataBlob, ModelTemplate, and Model metadata to a database.
What database should I use?
Currently the TrainStore supports saving metadata and metrics to either a SQLite or a PostgreSQL database.
If you are doing all of your work on a single machine, a SQLite database is easier to set up. But if you are training machine learning models on multiple machines, you should use a PostgreSQL database instead of SQLite, because SQLite is not designed to handle many concurrent writes.
How can I extend the TrainStore?
The TrainStore does not implement every type of query that you might want to perform on your training metrics. However, we directly expose our SQLAlchemy engine, connection, and tables through the TrainStore attributes TrainStore.engine, TrainStore.connection, and TrainStore.table, so you can write your own queries on top of them.
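The sketch below shows the kind of custom query you might write yourself. It is only an illustration. In practice you would build the query with SQLAlchemy against TrainStore.engine and TrainStore.table. To keep the example self-contained, it uses the standard-library sqlite3 module and a made-up, simplified model_epoch table. The column names model_name, epoch_num, and val_loss and the helper models_with_at_least are hypothetical and are not ScalarStop's real schema.

```python
import sqlite3

# Hypothetical, simplified stand-in for the model_epoch table; the real
# ScalarStop schema has more columns and may name them differently.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE model_epoch (model_name TEXT, epoch_num INTEGER, val_loss REAL)"
)
conn.executemany(
    "INSERT INTO model_epoch VALUES (?, ?, ?)",
    [("m1", 1, 0.9), ("m1", 2, 0.7), ("m2", 1, 0.8)],
)


def models_with_at_least(conn: sqlite3.Connection, min_epochs: int) -> list:
    """A query TrainStore does not ship: models with at least min_epochs logged epochs."""
    rows = conn.execute(
        "SELECT model_name, COUNT(*) FROM model_epoch "
        "GROUP BY model_name HAVING COUNT(*) >= ? ORDER BY model_name",
        (min_epochs,),
    )
    return [name for name, _count in rows]


print(models_with_at_least(conn, 2))  # ['m1']
```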
Module Contents
Classes
- class TrainStore(connection_string: str, *, table_name_prefix: Optional[str] = None, echo: bool = False)
Loads and saves names, hyperparameters, and training metrics from DataBlob, ModelTemplate, and Model objects.
Create a TrainStore instance connected to an external database.
Use this constructor if you want to connect to a PostgreSQL database. If you want to use a SQLite file as the database, you should instead use the TrainStore.from_filesystem() classmethod.
- Parameters
connection_string – A SQLAlchemy database connection string for connecting to a database. A typical PostgreSQL connection string looks like "postgresql://username:password@hostname:port/database", with the port defaulting to 5432.
table_name_prefix – A string prefix to add to all of the table names we generate. This allows multiple installations of ScalarStop to share the same database.
echo – Set to True to print out the SQL statements that the TrainStore executes.
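SQLAlchemy expects special characters such as "@" or ":" in a password to be percent-encoded in the connection URL. The sketch below builds a PostgreSQL connection string in the format described above and falls back to the default port 5432. The helper name postgres_connection_string is hypothetical and is not part of ScalarStop. The string it returns is what you would pass as the connection_string argument.

```python
from typing import Optional
from urllib.parse import quote


def postgres_connection_string(
    username: str,
    password: str,
    hostname: str,
    database: str,
    port: Optional[int] = None,
) -> str:
    """Build "postgresql://username:password@hostname:port/database" (hypothetical helper)."""
    # Percent-encode the credentials so "@", ":" or "/" are not read as URL delimiters.
    user = quote(username, safe="")
    pw = quote(password, safe="")
    # PostgreSQL listens on port 5432 unless configured otherwise.
    port = 5432 if port is None else port
    return f"postgresql://{user}:{pw}@{hostname}:{port}/{database}"


print(postgres_connection_string("alice", "p@ss:word", "db.example.com", "scalarstop"))
# postgresql://alice:p%40ss%3Aword@db.example.com:5432/scalarstop
```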
- classmethod from_filesystem(cls, *, filename: str, table_name_prefix: Optional[str] = None, echo: bool = False) → TrainStore
Use a SQLite3 database file on the local filesystem as the train store.
- Parameters
filename – The filename of the SQLite3 file.
table_name_prefix – A string prefix to add to all of the table names we generate. This allows multiple installations of ScalarStop to share the same database.
echo – Set to True to print out the SQL statements that the TrainStore executes.
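To see why table_name_prefix lets several installations share one database file, consider two prefixes that each produce their own copy of the four tables, so neither installation touches the other's rows. The sketch below uses an in-memory sqlite3 database in place of a shared file. It assumes the prefix is simply prepended to each table name, which is only an illustration of ScalarStop's actual naming rule. The helper table_names is hypothetical.

```python
import sqlite3
from typing import List, Optional

# The four table names listed under TrainStore.table.
BASE_TABLES = ["datablob", "model_template", "model", "model_epoch"]


def table_names(prefix: Optional[str] = None) -> List[str]:
    # Assumption: the prefix is prepended verbatim; ScalarStop's real rule may differ.
    return [f"{prefix or ''}{name}" for name in BASE_TABLES]


conn = sqlite3.connect(":memory:")  # stands in for one shared SQLite file
for prefix in ("team_a_", "team_b_"):
    for name in table_names(prefix):
        conn.execute(f"CREATE TABLE {name} (name TEXT)")

created = sorted(
    row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
print(len(created))  # 8: two independent sets of four tables
```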
- property table(self) → _TrainStoreTables
References to the sqlalchemy.schema.Table objects representing our database tables.
Currently, there are four tables available as attributes of this property: datablob, model_template, model, and model_epoch.
- property engine(self) → sqlalchemy.engine.Engine
The currently active sqlalchemy.engine.Engine.
This is useful if you want to write custom SQLAlchemy code on top of TrainStore.
- property connection(self) → sqlalchemy.engine.Connection
The currently active sqlalchemy.engine.Connection.
This is useful if you want to write custom SQLAlchemy code on top of TrainStore.
- insert_datablob(self, datablob: scalarstop.datablob.DataBlobBase, *, ignore_existing: bool = False) → None
Logs the DataBlob name, group name, and hyperparams to the TrainStore.
This also supports inserting other subclasses of DataBlobBase, such as DistributedDataBlob.
- Parameters
datablob – A DataBlob instance whose name and hyperparameters we want to record in the database.
ignore_existing – Set this to True to do nothing if a DataBlob with the same name is already in the database. Note that DataBlob instances are supposed to be immutable, so TrainStore does not implement updating them.
- insert_datablob_by_str(self, *, name: str, group_name: str, hyperparams: Any, ignore_existing: bool = False)
Logs the DataBlob name, group name, and hyperparams to the TrainStore.
- Parameters
name – Your DataBlob name.
group_name – Your DataBlob group name.
hyperparams – Your DataBlob hyperparameters.
ignore_existing – Set this to True to do nothing if a DataBlob with the same name is already in the database. Note that DataBlob instances are supposed to be immutable, so TrainStore does not implement updating them.
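The ignore_existing behavior amounts to "insert, or skip silently if the name is taken". Without the flag, a duplicate name is an error. The sketch below reproduces that contract with stdlib sqlite3 and a simplified datablob table keyed on the name. The table layout, the JSON encoding of hyperparams, and the helper insert_datablob_row are all assumptions made for illustration, not ScalarStop's implementation.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified datablob table keyed on the DataBlob name.
conn.execute(
    "CREATE TABLE datablob (name TEXT PRIMARY KEY, group_name TEXT, hyperparams TEXT)"
)


def insert_datablob_row(conn, name, group_name, hyperparams, ignore_existing=False):
    # "INSERT OR IGNORE" turns a duplicate-name insert into a no-op instead of an error.
    verb = "INSERT OR IGNORE" if ignore_existing else "INSERT"
    with conn:  # commit on success, roll back if the insert raises
        conn.execute(
            f"{verb} INTO datablob (name, group_name, hyperparams) VALUES (?, ?, ?)",
            (name, group_name, json.dumps(hyperparams)),
        )


insert_datablob_row(conn, "blob_a", "group_x", {"rows": 10})
# Same name again with ignore_existing=True: silently skipped, nothing is updated.
insert_datablob_row(conn, "blob_a", "group_x", {"rows": 99}, ignore_existing=True)
try:
    insert_datablob_row(conn, "blob_a", "group_x", {"rows": 99})
except sqlite3.IntegrityError:
    print("duplicate rejected")
```

Because nothing is ever updated in place, the first hyperparameters recorded for a name are the ones that remain. This matches the note that DataBlob instances are meant to be immutable.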
- list_datablobs(self, *, datablob_name: Optional[Union[str, Sequence[str]]] = None, datablob_group_name: Optional[Union[str, Sequence[str]]] = None) → pandas.DataFrame
Returns a pandas.DataFrame listing the DataBlob names in the database.
If you call this method without any arguments, it will list ALL of the DataBlobs in the database. You can narrow down your results by providing ONE (but not both) of the arguments below.
- Parameters
datablob_name – Either a single DataBlob name or a list of names to select.
datablob_group_name – Either a single DataBlob group name or a list of group names to select.
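The filter arguments accept either one string or a sequence of strings. A bare string has to be treated as a single name, because iterating over it would yield individual characters. The sketch below shows these filtering rules on plain in-memory rows: no filter returns everything, and supplying both filters is rejected. The rows, the helper names, and the choice of ValueError are hypothetical. ScalarStop performs the real filtering in SQL and may report the conflict differently.

```python
from typing import Dict, List, Optional, Sequence, Union

NameFilter = Optional[Union[str, Sequence[str]]]


def as_list(value: NameFilter) -> Optional[List[str]]:
    # A bare string means one name; iterating over it would split it into characters.
    if value is None:
        return None
    return [value] if isinstance(value, str) else list(value)


def filter_datablobs(
    rows: List[Dict[str, str]],
    datablob_name: NameFilter = None,
    datablob_group_name: NameFilter = None,
) -> List[Dict[str, str]]:
    if datablob_name is not None and datablob_group_name is not None:
        raise ValueError("Provide datablob_name or datablob_group_name, not both.")
    names, groups = as_list(datablob_name), as_list(datablob_group_name)
    if names is not None:
        return [r for r in rows if r["name"] in names]
    if groups is not None:
        return [r for r in rows if r["group_name"] in groups]
    return list(rows)  # no filter: every DataBlob


rows = [
    {"name": "blob_a", "group_name": "g1"},
    {"name": "blob_b", "group_name": "g1"},
    {"name": "blob_c", "group_name": "g2"},
]
print([r["name"] for r in filter_datablobs(rows, datablob_group_name="g1")])
# ['blob_a', 'blob_b']
```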
- insert_model_template(self, model_template, *, ignore_existing: bool = False)
Logs the ModelTemplate name, group name, and hyperparams to the TrainStore.
- Parameters
model_template – A ModelTemplate instance whose name and hyperparameters we want to record in the database.
ignore_existing – Set this to True to do nothing if a ModelTemplate with the same name is already in the database. Note that ModelTemplate instances are supposed to be immutable, so TrainStore does not implement updating them.
- insert_model_template_by_str(self, *, name: str, group_name: str, hyperparams, ignore_existing: bool = False)
Logs the ModelTemplate name, group name, and hyperparams to the TrainStore.
- Parameters
name – Your ModelTemplate name.
group_name – Your ModelTemplate group name.
hyperparams – Your ModelTemplate hyperparameters.
ignore_existing – Set this to True to do nothing if a ModelTemplate with the same name is already in the database. Note that ModelTemplate instances are supposed to be immutable, so TrainStore does not implement updating them.
- list_model_templates(self, *, model_template_name: Optional[Union[str, Sequence[str]]] = None, model_template_group_name: Optional[Union[str, Sequence[str]]] = None)
Returns a pandas.DataFrame listing rows from the ModelTemplate table.
If you call this method without any arguments, it will list ALL of the ModelTemplates in the database. You can narrow down your results by providing ONE (but not both) of the arguments below.
- Parameters
model_template_name – Either a single ModelTemplate name or a list of names to select.
model_template_group_name – Either a single ModelTemplate group name or a list of group names to select.
- insert_model(self, model, *, ignore_existing: bool = False)
Logs the Model name, along with the names of its DataBlob and ModelTemplate, to the TrainStore.
- Parameters
model – A Model instance whose name and hyperparameters we want to record in the database.
ignore_existing – Set this to True to do nothing if a Model with the same name is already in the database. The TrainStore does not support updating a Model's name or hyperparameters. The only way to change a Model is to log more epochs.
- insert_model_by_str(self, *, name: str, model_class_name: str, datablob_name: str, model_template_name: str, ignore_existing: bool = False) → None
Logs the Model name, along with the names of its DataBlob and ModelTemplate, to the TrainStore.
- Parameters
name – The Model name.
model_class_name – The name of the Model subclass used. If you are using KerasModel, then this value is the string "KerasModel".
datablob_name – The name of the DataBlob used to create the Model instance.
model_template_name – The name of the ModelTemplate used to create the Model instance.
ignore_existing – Set this to True to do nothing if a Model with the same name is already in the database. The TrainStore does not support updating a Model's name or hyperparameters. The only way to change a Model is to log more epochs.
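insert_model and insert_model_by_str record the same four facts. The first takes an object and the second takes plain strings. The sketch below shows one plausible way to derive the string arguments from a model object, with model_class_name taken from the object's class name. The stand-in classes, their datablob and model_template attributes, and the helper insert_model_kwargs are hypothetical and are not ScalarStop's real Model internals.

```python
from dataclasses import dataclass


# Hypothetical stand-ins; real ScalarStop objects carry much more state.
@dataclass
class Named:
    name: str


@dataclass
class KerasModel:
    name: str
    datablob: Named
    model_template: Named


def insert_model_kwargs(model) -> dict:
    """Collect keyword arguments shaped like insert_model_by_str()'s parameters."""
    return dict(
        name=model.name,
        model_class_name=type(model).__name__,  # "KerasModel" for this stand-in class
        datablob_name=model.datablob.name,
        model_template_name=model.model_template.name,
    )


model = KerasModel(name="my_model", datablob=Named("db_y"), model_template=Named("mt_x"))
print(insert_model_kwargs(model))
```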
- list_models(self, *, datablob_name: Optional[Union[str, Sequence[str]]] = None, datablob_group_name: Optional[Union[str, Sequence[str]]] = None, model_template_name: Optional[Union[str, Sequence[str]]] = None, model_template_group_name: Optional[Union[str, Sequence[str]]] = None) → pandas.DataFrame
Returns a pandas.DataFrame listing rows from the Model table.
If you call this method without any arguments, it will list ALL of the Models in the database. Optionally, you can narrow down the results with the following values.
Note that you can provide either datablob_name or datablob_group_name, but not both.
Similarly, you can provide either model_template_name or model_template_group_name, but not both.
- Parameters
datablob_name – Either a single DataBlob name or a list of names to select.
datablob_group_name – Either a single DataBlob group name or a list of group names to select.
model_template_name – Either a single ModelTemplate name or a list of names to select.
model_template_group_name – Either a single ModelTemplate group name or a list of group names to select.
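The two "either/or" rules above apply to two independent pairs of filters, and each filter can expand into a list of names. The sketch below turns such filters into a parameterized SQL WHERE clause with one "?" placeholder per name and rejects a pair supplied together. The column names, the helper build_where, and the error types are assumptions for illustration. ScalarStop builds its real queries with SQLAlchemy.

```python
from typing import List, Optional, Sequence, Tuple, Union

NameFilter = Optional[Union[str, Sequence[str]]]

# Hypothetical column names; each filter keyword is assumed to match one column.
COLUMNS = ("datablob_name", "datablob_group_name", "model_template_name", "model_template_group_name")
EXCLUSIVE_PAIRS = (
    ("datablob_name", "datablob_group_name"),
    ("model_template_name", "model_template_group_name"),
)


def build_where(**filters: NameFilter) -> Tuple[str, List[str]]:
    """Turn list_models()-style filters into a parameterized SQL WHERE clause."""
    unknown = set(filters) - set(COLUMNS)
    if unknown:
        raise TypeError(f"unknown filters: {sorted(unknown)}")
    for a, b in EXCLUSIVE_PAIRS:
        if filters.get(a) is not None and filters.get(b) is not None:
            raise ValueError(f"Provide {a} or {b}, but not both.")
    clauses: List[str] = []
    params: List[str] = []
    for column in COLUMNS:  # fixed order keeps the generated SQL deterministic
        value = filters.get(column)
        if value is None:
            continue
        values = [value] if isinstance(value, str) else list(value)
        clauses.append(f"{column} IN ({', '.join('?' for _ in values)})")
        params.extend(values)
    where = " WHERE " + " AND ".join(clauses) if clauses else ""
    return where, params


print(build_where(datablob_group_name="g1", model_template_name=["mt_a", "mt_b"]))
# (' WHERE datablob_group_name IN (?) AND model_template_name IN (?, ?)', ['g1', 'mt_a', 'mt_b'])
```

Keeping the names in a separate parameter list, rather than pasting them into the SQL text, lets the database driver handle quoting.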
- list_models_grouped_by_epoch_metric(self, *, metric_name: str, metric_direction: str, datablob_name: Optional[Union[str, Sequence[str]]] = None, datablob_group_name: Optional[Union[str, Sequence[str]]] = None, model_template_name: Optional[Union[str, Sequence[str]]] = None, model_template_group_name: Optional[Union[str, Sequence[str]]] = None) → pandas.DataFrame
Returns a pandas.DataFrame listing rows from the Model table AND a metric from each model's best-performing epoch.
You provide this method with the name of a model epoch metric and whether to maximize or minimize it, and it returns all of the matching models along with each model's best value for that metric.
Note that you can provide either datablob_name or datablob_group_name, but not both.
Similarly, you can provide either model_template_name or model_template_group_name, but not both.
- Parameters
metric_name – The name of one of the metrics tracked when training a model. This might be a value like "loss" or "val_accuracy".
metric_direction – Set this to "min" if lower values of the metric you picked in metric_name are better, such as "loss". Set this to "max" if higher values of your metric are better, such as "accuracy".
datablob_name – Either a single DataBlob name or a list of names to select.
datablob_group_name – Either a single DataBlob group name or a list of group names to select.
model_template_name – Either a single ModelTemplate name or a list of names to select.
model_template_group_name – Either a single ModelTemplate group name or a list of group names to select.
- Returns a pandas.DataFrame with the following columns: model_name, model_class_name, model_last_modified, datablob_name, datablob_group_name, model_template_name, model_template_group_name, sort_metric_value, the ModelTemplate hyperparameter names prefixed with mth__, and the DataBlob hyperparameter names prefixed with dbh__.
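The sketch below approximates the two ideas behind this method in plain Python. First, it reduces each model's epochs to a single best value using min or max according to metric_direction. Second, it spreads hyperparameters into prefixed columns so that ModelTemplate and DataBlob keys cannot collide. The epoch records, the helper names, and the top-level-only prefixing (nested hyperparameters would need flattening first) are assumptions. The real method does this work in the database and returns a pandas.DataFrame.

```python
from typing import Dict, List


def best_per_model(epochs: List[dict], metric_name: str, metric_direction: str) -> Dict[str, float]:
    if metric_direction not in ("min", "max"):
        raise ValueError('metric_direction must be "min" or "max"')
    pick = min if metric_direction == "min" else max
    best: Dict[str, float] = {}
    for row in epochs:
        value = row["metrics"].get(metric_name)
        if value is None:
            continue  # this epoch did not record the metric
        name = row["model_name"]
        best[name] = value if name not in best else pick(best[name], value)
    return best


def prefixed(hyperparams: dict, prefix: str) -> dict:
    # Top-level keys only; nested hyperparameters would need flattening first.
    return {f"{prefix}{key}": value for key, value in hyperparams.items()}


epochs = [
    {"model_name": "m1", "metrics": {"loss": 0.9}},
    {"model_name": "m1", "metrics": {"loss": 0.4}},
    {"model_name": "m2", "metrics": {"loss": 0.6}},
]
best = best_per_model(epochs, "loss", "min")
row = {
    "model_name": "m1",
    "sort_metric_value": best["m1"],
    **prefixed({"layers": 3}, "mth__"),
    **prefixed({"rows": 100}, "dbh__"),
}
print(row)
# {'model_name': 'm1', 'sort_metric_value': 0.4, 'mth__layers': 3, 'dbh__rows': 100}
```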
- insert_model_epoch(self, *, epoch_num: int, model_name: str, metrics, steps_per_epoch: Optional[int] = None, validation_steps_per_epoch: Optional[int] = None, ignore_existing: bool = False) → None
Logs a new epoch for a Model to the TrainStore.
- Parameters
epoch_num – The epoch number that we are adding.
model_name – The name of the Model that we are training.
metrics – A dictionary of metric names and values to save.
steps_per_epoch – The number of training steps that count as one epoch. Defaults to None, which means that an epoch lasts until the model's DataBlob training dataset is exhausted.
validation_steps_per_epoch – The number of validation steps that count as one epoch. Defaults to None, which means that an epoch lasts until the model's DataBlob validation dataset is exhausted.
ignore_existing – Set this to True to do nothing if the database already has a row with the same (model_name, epoch_num) pair.
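Keras reports training results as History.history, a dict that maps each metric name to a list of per-epoch values. The metrics parameter, however, takes one dictionary per epoch. The sketch below regroups the former into (epoch_num, metrics) pairs. It assumes epochs are numbered from 1, which is why first_epoch is a parameter. The helper epochs_from_history is hypothetical, and the commented-out insert_model_epoch call only indicates where each pair would go.

```python
from typing import Dict, Iterator, List, Tuple


def epochs_from_history(
    history: Dict[str, List[float]], first_epoch: int = 1
) -> Iterator[Tuple[int, Dict[str, float]]]:
    """Turn {"loss": [...], "val_loss": [...]} into (epoch_num, metrics) pairs."""
    # Every metric list has one entry per epoch, so any list gives the epoch count.
    num_epochs = len(next(iter(history.values()), []))
    for i in range(num_epochs):
        yield first_epoch + i, {name: values[i] for name, values in history.items()}


history = {"loss": [0.9, 0.5], "val_loss": [1.0, 0.7]}
for epoch_num, metrics in epochs_from_history(history):
    print(epoch_num, metrics)
    # Each pair could then be logged, for example:
    # train_store.insert_model_epoch(epoch_num=epoch_num, model_name="my_model", metrics=metrics)
```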
- bulk_insert_model_epochs(self, model) → None
Inserts a list of Model epochs at once.
This method silently skips any epoch whose model name and epoch number are already in the database.
Currently this method only works if you are using either SQLite or PostgreSQL as the backing database.
- Parameters
model – The Model with the epochs that we want to save.
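Skipping rows that already exist during a bulk insert maps naturally onto an INSERT ... ON CONFLICT DO NOTHING statement keyed on (model_name, epoch_num). Both SQLite (3.24 and later) and PostgreSQL support this clause. The sketch below shows that pattern with stdlib sqlite3 and executemany. The simplified table, the JSON-encoded metrics column, and the helper bulk_insert_epochs are assumptions, not ScalarStop's actual statement.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified model_epoch table; (model_name, epoch_num) is the key.
conn.execute(
    "CREATE TABLE model_epoch ("
    " model_name TEXT NOT NULL, epoch_num INTEGER NOT NULL, metrics TEXT,"
    " PRIMARY KEY (model_name, epoch_num))"
)


def bulk_insert_epochs(conn, model_name, epoch_metrics):
    """Insert many epochs in one statement, skipping pairs that already exist."""
    rows = [(model_name, num, json.dumps(metrics)) for num, metrics in epoch_metrics]
    with conn:  # one transaction for the whole batch
        conn.executemany(
            "INSERT INTO model_epoch (model_name, epoch_num, metrics) VALUES (?, ?, ?) "
            "ON CONFLICT (model_name, epoch_num) DO NOTHING",
            rows,
        )


bulk_insert_epochs(conn, "m1", [(1, {"loss": 0.9}), (2, {"loss": 0.5})])
bulk_insert_epochs(conn, "m1", [(2, {"loss": 0.1}), (3, {"loss": 0.4})])  # epoch 2 is skipped
print(conn.execute("SELECT epoch_num, metrics FROM model_epoch ORDER BY epoch_num").fetchall())
# [(1, '{"loss": 0.9}'), (2, '{"loss": 0.5}'), (3, '{"loss": 0.4}')]
```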
- list_model_epochs(self, model_name: Optional[Union[str, Sequence[str]]] = None) → pandas.DataFrame
Returns a pandas.DataFrame listing Model epochs.
By default, this lists ALL epochs in the database for ALL models. You can narrow down the search with the following arguments.
- Parameters
model_name – Specify a single model name or a list of model names whose epochs we are interested in.
- get_current_epoch(self, model_name: str) → int
Returns how many epochs a given Model has been trained for.
Returns 0 if the given model is not registered in the TrainStore.
This information is also saved in the directory created when a Model instance is saved to the filesystem, and it is available in the attribute current_epoch.
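The "0 for unknown models" rule maps directly onto SQL. MAX() over zero rows yields NULL, and COALESCE replaces that NULL with 0. The sketch below assumes epochs are numbered from 1 without gaps, so the largest epoch_num equals the number of epochs trained. It uses stdlib sqlite3 with a simplified, hypothetical table and a hypothetical helper current_epoch, not ScalarStop's real query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified model_epoch table.
conn.execute("CREATE TABLE model_epoch (model_name TEXT, epoch_num INTEGER)")
conn.executemany(
    "INSERT INTO model_epoch VALUES (?, ?)", [("m1", 1), ("m1", 2), ("m1", 3)]
)


def current_epoch(conn: sqlite3.Connection, model_name: str) -> int:
    # MAX() over zero rows is NULL; COALESCE turns that into 0 for unknown models.
    (value,) = conn.execute(
        "SELECT COALESCE(MAX(epoch_num), 0) FROM model_epoch WHERE model_name = ?",
        (model_name,),
    ).fetchone()
    return value


print(current_epoch(conn, "m1"), current_epoch(conn, "unknown"))  # 3 0
```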
- get_best_model(self, *, metric_name: str, metric_direction: str, datablob_name: Optional[Union[str, Sequence[str]]] = None, datablob_group_name: Optional[Union[str, Sequence[str]]] = None, model_template_name: Optional[Union[str, Sequence[str]]] = None, model_template_group_name: Optional[Union[str, Sequence[str]]] = None) → _ModelMetadata
Returns metadata about the model with the best performance on a metric.
This method queries the database for the Model with the best performance on the metric you specified in the parameter metric_name. By default, it considers ALL models in the database, ranked by that metric. Most likely, you will want to narrow down your search using the arguments below.
Note that you can provide either datablob_name or datablob_group_name, but not both.
Similarly, you can provide either model_template_name or model_template_group_name, but not both.
- Parameters
metric_name – The name of one of the metrics tracked when training a model. This might be a value like "loss" or "val_accuracy".
metric_direction – Set this to "min" if lower values of the metric you picked in metric_name are better, such as "loss". Set this to "max" if higher values of your metric are better, such as "accuracy".
datablob_name – Either a single DataBlob name or a list of names to select.
datablob_group_name – Either a single DataBlob group name or a list of group names to select.
model_template_name – Either a single ModelTemplate name or a list of names to select.
model_template_group_name – Either a single ModelTemplate group name or a list of group names to select.
- Returns a dataclass with the following attributes: model_name, model_class_name, model_epoch_metrics, model_last_modified, datablob_name, datablob_group_name, datablob_hyperparams, datablob_hyperparams_flat, model_template_name, model_template_group_name, model_template_hyperparams, sort_metric_name, sort_metric_value.
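Choosing the single best model amounts to ranking candidates by their best metric value in the requested direction and taking the first one. The sketch below does this over a hypothetical mapping from model name to best metric value. It returns a small dataclass holding three of the attributes listed above. The BestModel class, the helper pick_best_model, and the choice to raise LookupError when nothing matches are assumptions, not ScalarStop's _ModelMetadata or its error behavior.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class BestModel:
    # Hypothetical subset of the attributes listed above.
    model_name: str
    sort_metric_name: str
    sort_metric_value: float


def pick_best_model(best_values: Dict[str, float], metric_name: str, metric_direction: str) -> BestModel:
    if metric_direction not in ("min", "max"):
        raise ValueError('metric_direction must be "min" or "max"')
    if not best_values:
        raise LookupError("no models matched the filters")
    # Sort by each model's best value; "max" reverses the order so the best comes first.
    ranked = sorted(best_values.items(), key=lambda item: item[1], reverse=metric_direction == "max")
    name, value = ranked[0]
    return BestModel(model_name=name, sort_metric_name=metric_name, sort_metric_value=value)


best = pick_best_model({"m1": 0.91, "m2": 0.95, "m3": 0.88}, "val_accuracy", "max")
print(best)
# BestModel(model_name='m2', sort_metric_name='val_accuracy', sort_metric_value=0.95)
```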