cox.store module
-
class cox.store.Store(storage_folder, exp_id=None, new=False, mode='a')
Bases: object
Serializes and saves data from experiment runs. Automatically makes a tensorboard, available through the tensorboard field. It is a tensorboardX object, so refer to the tensorboardX documentation for how to manipulate it.
Directly saves: int, float, torch scalar, string.
Saves and links: np.array, torch tensor, python object (via pickle or pytorch serialization).
Note on python object serialization: you can choose one of three options to serialize with: OBJECT (store as python serialization inline), PICKLE (store as python serialization on disk), or PYTORCH_STATE (save as pytorch serialization on disk). All these types are represented as properties, e.g. store_instance.PYTORCH_STATE. You will need to manually decode the objects using the static methods found in the Table class (get_state_dict, get_object, get_pickle), or use a cox.readers.CollectionReader, which will handle this for you.
Make a new experiment store in storage_folder, within its subdirectory exp_id (if not None). If an experiment already exists in the corresponding directory, open it for reading.
Parameters:
- storage_folder (str) – parent folder in which we will put a folder with all our experiment data (this store).
- exp_id (str) – directory name in storage_folder under which we will store experimental data.
- new (bool) – enforce that this store has never been created before.
- mode (str) – mode for accessing tables. a is append only, r is read only, w is write.
-
OBJECT = '__object__'
Python serialized datatype (saved as a string in the h5 table; not recommended for large objects, as these objects must be loaded along with the table).
-
PICKLE = '__pickle__'
Pickle datatype (saved on disk and referenced from the table; recommended for larger objects).
-
PYTORCH_STATE = '__pytorch_state__'
PyTorch state, e.g. from model.state_dict() (saved on disk and linked).
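The difference between storing an object inline (OBJECT) and storing it on disk with a reference in the table (PICKLE) can be illustrated with a minimal standard-library sketch. This is not cox's implementation: the helper names store_inline, store_on_disk and load_from_disk, and the use of a UUID as the on-disk identifier, are assumptions made for illustration only.

```python
import os
import pickle
import tempfile
import uuid


def store_inline(obj):
    """OBJECT-style (hypothetical helper): the serialized bytes are the cell value."""
    return pickle.dumps(obj)


def store_on_disk(obj, obj_dir):
    """PICKLE-style (hypothetical helper): write to disk, keep only an id in the cell."""
    uid = str(uuid.uuid4())  # assumed identifier scheme, for illustration
    with open(os.path.join(obj_dir, uid), "wb") as f:
        pickle.dump(obj, f)
    return uid


def load_from_disk(uid, obj_dir):
    """Resolve an identifier back into the object it refers to."""
    with open(os.path.join(obj_dir, uid), "rb") as f:
        return pickle.load(f)


config = {"lr": 0.1, "layers": [64, 64]}

with tempfile.TemporaryDirectory() as obj_dir:
    inline_cell = store_inline(config)           # grows with the object
    linked_cell = store_on_disk(config, obj_dir)  # short string, object lives on disk

    restored_inline = pickle.loads(inline_cell)
    restored_linked = load_from_disk(linked_cell, obj_dir)
```

The inline cell has to be read whenever the table is read, which is why the on-disk variant is the better fit for large objects.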
-
add_table(table_name, schema)
Add a new table to the experiment.
Parameters:
- table_name (str) – a name for the table.
- schema (dict) – a dict for the schema of the table. The entries should be of the form name:type. For example, to add a float column named acc to the table, we would have an entry 'acc': float.
Returns: The table object of the new table.
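As a rough illustration of the name:type form described above, the following sketch builds a schema dict and checks a candidate row against it. The check_row helper is hypothetical and is not part of cox; it only shows what "a value of the right type for each column" means for a plain dict.

```python
# A schema maps column names to Python types, in the name:type form.
schema = {"epoch": int, "acc": float, "note": str}


def check_row(row, schema):
    """Hypothetical helper: verify a row matches the schema exactly."""
    missing = set(schema) - set(row)
    extra = set(row) - set(schema)
    if missing or extra:
        raise ValueError(f"missing={sorted(missing)} extra={sorted(extra)}")
    for name, col_type in schema.items():
        if not isinstance(row[name], col_type):
            raise TypeError(f"column {name!r} expects {col_type.__name__}")
    return True


ok = check_row({"epoch": 1, "acc": 0.87, "note": "warmup"}, schema)
```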
-
add_table_like_example(table_name, example, alternative='__object__')
Add a new table to the experiment, using an example dictionary as the basis for the types of the columns.
Parameters:
- table_name (str) – a name for the table.
- example (dict) – example for the schema of the table. The table is created with columns whose types correspond to the types of the objects in the dictionary.
- alternative (self.OBJECT|self.PICKLE|self.PYTORCH_STATE) – how to store columns that are python objects.
-
close()
Closes the underlying HDFStore of this store.
-
get_table(table_id)
Gets the table with key table_id.
Parameters: table_id (str) – id of the table to get from this store.
Returns: The corresponding table (Table object).
-
log_table_and_tb(table_name, update_dict, summary_type='scalar')
Log to a table and also to a tensorboard.
Parameters:
- table_name (str) – which table to log to.
- update_dict (dict) – values to log and store, as a dictionary mapping column to value.
- summary_type (str) – what type of summary to log to tensorboard as.
-
class cox.store.Table(name, schema, table_obj_dir, store, has_initialized=False)
Bases: object
A class representing a single storer table, to be written to by the experiment. This is essentially a single HDFStore table.
Create a new Table object.
Parameters:
- name (str) – name of table.
- schema (dict) – schema of table (as described in the cox.store.Store class).
- table_obj_dir (str) – where to store serialized objects on disk.
- store (Store) – parent store.
- has_initialized (bool) – has this table been created yet.
-
append_row(data)
Write a dictionary with format column name:value as a row to the table. Must have a value for each column. See update_row() for more mechanics.
Parameters: data (dict) – dictionary with format column name:value.
-
df
Access the underlying pandas dataframe for this table.
-
flush_row()
Writes the current row we have staged (using update_row()) to the table. Another row is immediately staged for update_row() to act on.
-
get_object(s)
Unserialize an object of store.OBJECT type (a pickled object stored as a string in the table).
Parameters: s (str) – pickle string to unpickle into a python object.
-
get_pickle(uid)
Unserialize an object of store.PICKLE type (a pickled object stored on disk and referenced from the table).
Parameters: uid (str) – identifier corresponding to the stored object in the table.
-
get_state_dict(uid, **kwargs)
Unserialize an object of store.PYTORCH_STATE type (an object stored using pytorch's serialization system).
Parameters: uid (str) – identifier corresponding to the stored object in the table.
-
nrows
How many rows this table has.
-
schema
Access the underlying schema for this table.
-
update_row(data)
Update the currently considered row in the data store. Our database is append only using the cox.store.Table API. We can update this single row as much as we desire, using column:value mappings in data. Eventually, the currently considered row must be written to the database using cox.store.Table.flush_row(). This model allows for writing rows easily when not all the values are known in a single context. Each data object does not need to contain every column, but by the time the row is flushed every column must have obtained a value. This update model is stateful.
Python primitives (int, float, str, bool) and their numpy equivalents are written automatically to the row. All other objects are serialized (see Store).
Parameters: data (dict) – a dictionary with format column name:value.
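The stateful update model described above can be sketched in plain Python. The StagedTable class below is a hypothetical stand-in, not the cox.store.Table implementation: it keeps rows in a list instead of an HDFStore and skips serialization, but it follows the same contract, where update_row() stages partial values, flush_row() requires every column to be filled, and append_row() does both at once.

```python
class StagedTable:
    """Hypothetical in-memory illustration of the staged-row model."""

    def __init__(self, schema):
        self.schema = dict(schema)
        self.rows = []      # rows already written (append only)
        self._staged = {}   # the currently considered row

    def update_row(self, data):
        unknown = set(data) - set(self.schema)
        if unknown:
            raise KeyError(f"unknown columns: {sorted(unknown)}")
        self._staged.update(data)  # later values overwrite earlier ones

    def flush_row(self):
        missing = set(self.schema) - set(self._staged)
        if missing:
            raise ValueError(f"missing columns: {sorted(missing)}")
        self.rows.append(self._staged)
        self._staged = {}  # a fresh row is staged immediately

    def append_row(self, data):
        self.update_row(data)
        self.flush_row()


table = StagedTable({"epoch": int, "loss": float, "acc": float})

# Values arrive from different places in a training loop.
table.update_row({"epoch": 1})
table.update_row({"loss": 0.42})
table.update_row({"acc": 0.90})
table.flush_row()

# When everything is known at once, a single call suffices.
table.append_row({"epoch": 2, "loss": 0.30, "acc": 0.93})
```

Flushing before every column has a value raises an error in this sketch, mirroring the requirement that a row be complete by the time it is written.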
-
cox.store.schema_from_dict(d, alternative='__object__')
Given a dictionary mapping column names to values, make a corresponding schema.
Parameters:
- d (dict) – dict of values we are going to infer the schema from.
- alternative (self.OBJECT|self.PICKLE|self.PYTORCH_STATE) – how to store columns that are python objects.
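One plausible way to infer a schema from an example dictionary is sketched below. The infer_schema function is hypothetical and only approximates the behaviour described above: primitive values map to their own type, and anything else falls back to the alternative marker. How cox treats numpy or torch values is not covered here.

```python
# Marker strings as listed for Store.OBJECT and Store.PICKLE.
OBJECT = "__object__"
PICKLE = "__pickle__"

PRIMITIVES = (bool, int, float, str)


def infer_schema(d, alternative=OBJECT):
    """Hypothetical sketch: map each column name to a type or a marker."""
    schema = {}
    for name, value in d.items():
        if isinstance(value, PRIMITIVES):
            schema[name] = type(value)   # stored directly in the table
        else:
            schema[name] = alternative   # serialized as a python object
    return schema


example = {"epoch": 3, "acc": 0.91, "tag": "baseline", "history": [0.5, 0.7]}
schema = infer_schema(example, alternative=PICKLE)
```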