cox.store module

class cox.store.Store(storage_folder, exp_id=None, new=False, mode='a')

Bases: object

Serializes and saves data from experiment runs. Automatically makes a tensorboard. Access the tensorboard field, and refer to the TensorboardX documentation for more information about how to manipulate it (it is a tensorboardX object).

Directly saves: int, float, torch scalar, string

Saves and links: np.array, torch tensor, python object (via pickle or pytorch serialization)

Note on python object serialization: you can choose one of three serialization options: OBJECT (store as python serialization inline), PICKLE (store as python serialization on disk), or PYTORCH_STATE (save as pytorch serialization on disk). All of these types are exposed as properties of the store, e.g. store_instance.PYTORCH_STATE. You will need to decode the objects manually using the static methods found in the Table class (get_state_dict, get_object, get_pickle), or use a cox.readers.CollectionReader, which will handle this for you.
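The difference between inline and linked storage can be sketched in plain Python. This is only an illustration of the idea, not the library's implementation: the helper names, the base64 text encoding, and the uuid-based file names are all assumptions made for the example.

```python
import base64
import pickle
import tempfile
import uuid
from pathlib import Path


def encode_inline(obj):
    """Inline style: pickle the object and keep it as text in the row itself."""
    return base64.b64encode(pickle.dumps(obj)).decode("ascii")


def decode_inline(cell):
    """Reverse of encode_inline: turn the stored text back into an object."""
    return pickle.loads(base64.b64decode(cell.encode("ascii")))


def save_linked(obj, obj_dir):
    """Linked style: pickle to a file and return a short identifier for the row."""
    uid = uuid.uuid4().hex  # hypothetical naming scheme
    with open(Path(obj_dir) / uid, "wb") as f:
        pickle.dump(obj, f)
    return uid


def load_linked(uid, obj_dir):
    """Reverse of save_linked: follow the identifier to the file on disk."""
    with open(Path(obj_dir) / uid, "rb") as f:
        return pickle.load(f)


def roundtrip_demo():
    """Store one object both ways and read it back."""
    config = {"lr": 0.1, "layers": [64, 64]}
    with tempfile.TemporaryDirectory() as obj_dir:
        inline_cell = encode_inline(config)
        linked_cell = save_linked(config, obj_dir)
        return (
            decode_inline(inline_cell),
            load_linked(linked_cell, obj_dir),
            len(linked_cell),
        )
```

In the inline style the table cell grows with the object, which is why it suits small values only. In the linked style the cell stays a fixed-size identifier no matter how large the object is.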

Make a new experiment store in storage_folder, within its subdirectory exp_id (if not None). If an experiment already exists in the corresponding directory, open it for reading.

Parameters:
  • storage_folder (str) – parent folder in which we will put a folder with all our experiment data (this store).
  • exp_id (str) – dir name in storage_folder under which we will store experimental data.
  • new (bool) – enforce that this store has never been created before.
  • mode (str) – mode for accessing tables. a is append only, r is read only, w is write.
OBJECT = '__object__'

Python serialized datatype (saved as a string in the h5 table; not recommended for large objects, as these objects must be loaded along with the table)

PICKLE = '__pickle__'

Pickle datatype (saved on disk and referenced from the table; recommended for larger objects)

PYTORCH_STATE = '__pytorch_state__'

PyTorch state, e.g. from model.state_dict() (saved on disk and linked)

add_table(table_name, schema)

Add a new table to the experiment.

Parameters:
  • table_name (str) – a name for the table
  • schema (dict) – a dict for the schema of the table. The entries should be of the form name:type. For example, if we wanted to add a float column in the table named acc, we would have an entry 'acc':float.
Returns:

The table object of the new table.
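As a rough illustration, a schema is an ordinary dict from column name to type. The sketch below assumes that the serialization markers listed in this section can be used as the "type" of an object column; the column names and the serialized_columns helper are hypothetical.

```python
# Marker strings copied from the Store constants documented in this section.
OBJECT = "__object__"
PICKLE = "__pickle__"
PYTORCH_STATE = "__pytorch_state__"

# Hypothetical schema: the column names are illustrative only.
schema = {
    "epoch": int,
    "acc": float,
    "tag": str,
    "confusion": PICKLE,       # assumed: larger object, stored on disk
    "weights": PYTORCH_STATE,  # assumed: e.g. a model.state_dict()
}


def serialized_columns(schema):
    """Return the columns whose values would be serialized rather than stored directly."""
    markers = {OBJECT, PICKLE, PYTORCH_STATE}
    return sorted(
        name
        for name, kind in schema.items()
        if isinstance(kind, str) and kind in markers
    )
```

With a real store, a dict of this shape would be the schema argument of add_table.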

add_table_like_example(table_name, example, alternative='__object__')

Add a new table to the experiment, using an example dictionary as the basis for the types of the columns.

Parameters:
  • table_name (str) – a name for the table
  • example (dict) – example for the schema of the table. Make a table with columns with types corresponding to the types of the objects in the dictionary.
  • alternative (self.OBJECT|self.PICKLE|self.PYTORCH_STATE) – how to store columns that are python objects.
close()

Closes underlying HDFStore of this store.

get_table(table_id)

Gets table with key table_id.

Parameters: table_id (str) – id of table to get from this store.
Returns: The corresponding table (Table object).
log_table_and_tb(table_name, update_dict, summary_type='scalar')

Log to a table and also a tensorboard.

Parameters:
  • table_name (str) – which table to log to
  • update_dict (dict) – values to log and store as a dictionary of column mapping to value.
  • summary_type (str) – what type of summary to log to tensorboard as
class cox.store.Table(name, schema, table_obj_dir, store, has_initialized=False)

Bases: object

A class representing a single store table, to be written to by the experiment. This is essentially a single HDFStore table.

Create a new Table object.

Parameters:
  • name (str) – name of table
  • schema (dict) – schema of table (as described in cox.store.Store class)
  • table_obj_dir (str) – where to store serialized objects on disk
  • store (Store) – parent store.
  • has_initialized (bool) – has this table been created yet.
append_row(data)

Write a dictionary with format column name:value as a row to the table. Must have a value for each column. See update_row() for more mechanics.

Parameters: data (dict) – dictionary with format column name:value.
df

Access the underlying pandas dataframe for this table.

flush_row()

Writes the current row we have staged (using update_row()) to the table. Another row is immediately staged for update_row() to act on.

get_object(s)

Unserialize object of store.OBJECT type (a pickled object stored as a string in the table).

Parameters: s (str) – pickle string to unpickle into a python object.
get_pickle(uid)

Unserialize object of store.PICKLE type (a pickled object stored on disk and referenced from the table).

Parameters: uid (str) – identifier corresponding to stored object in the table.
get_state_dict(uid, **kwargs)

Unserialize object of store.PYTORCH_STATE type (object stored using pytorch’s serialization system).

Parameters: uid (str) – identifier corresponding to stored object in the table.
nrows

How many rows this table has.

schema

Access the underlying schema for this table.

update_row(data)

Update the currently considered row in the data store. Our database is append only using the cox.store.Table API. We can update this single row as much as we desire, using column:value mappings in data. Eventually, the currently considered row must be written to the database using cox.store.Table.flush_row(). This model makes it easy to write rows when not all the values are known in a single context. Each data object does not need to contain every column, but by the time the row is flushed every column must have obtained a value. This update model is stateful.

Python primitives (int, float, str, bool) and their numpy equivalents are written automatically to the row. All other objects are serialized (see Store).

Parameters: data (dict) – a dictionary with format column name:value.
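The stateful staging model described for update_row() and flush_row() can be sketched with a small in-memory stand-in. StagedTable is a hypothetical toy class, not the library's Table: it keeps rows in a list instead of an HDFStore, and rejecting unknown columns is an assumption of this sketch.

```python
class StagedTable:
    """Toy in-memory model of an append-only table with one staged row."""

    def __init__(self, schema):
        self.schema = dict(schema)
        self.rows = []     # rows already written; never modified afterwards
        self._staged = {}  # the row currently being filled in

    def update_row(self, data):
        """Merge column:value pairs into the staged row."""
        unknown = set(data) - set(self.schema)
        if unknown:  # assumption: columns outside the schema are an error
            raise KeyError(f"unknown columns: {sorted(unknown)}")
        self._staged.update(data)

    def flush_row(self):
        """Write the staged row; every column must have a value by now."""
        missing = set(self.schema) - set(self._staged)
        if missing:
            raise ValueError(f"missing columns: {sorted(missing)}")
        self.rows.append(self._staged)
        self._staged = {}  # a fresh row is staged immediately

    def append_row(self, data):
        """Write a complete row in one call (update, then flush)."""
        self.update_row(data)
        self.flush_row()

    @property
    def nrows(self):
        return len(self.rows)


# Values for one row arrive from different places, then the row is flushed.
table = StagedTable({"epoch": int, "loss": float, "acc": float})
table.update_row({"epoch": 1})
table.update_row({"loss": 0.42})
table.update_row({"acc": 0.88})
table.flush_row()
```

The point of the design is that code computing the loss and code computing the accuracy do not need to know about each other; each contributes its columns, and a single flush commits the row.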
cox.store.schema_from_dict(d, alternative='__object__')

Given a dictionary mapping column names to values, make a corresponding schema.

Parameters:
  • d (dict) – dict of values we are going to infer the schema from
  • alternative (self.OBJECT|self.PICKLE|self.PYTORCH_STATE) – how to store columns that are python objects.
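The kind of inference described here can be approximated in a few lines. The function below is a simplified, hypothetical stand-in rather than the library's code: it recognizes only Python primitives and assigns the alternative marker to everything else, whereas the real function presumably also handles numpy and torch values.

```python
def infer_schema(d, alternative="__object__"):
    """Map each column name to a primitive type, or to the fallback marker."""
    schema = {}
    for name, value in d.items():
        if isinstance(value, (bool, int, float, str)):
            schema[name] = type(value)   # stored directly
        else:
            schema[name] = alternative   # stored via the chosen serialization
    return schema


# Hypothetical example row used to derive a schema.
example = {"epoch": 3, "acc": 0.91, "tag": "baseline", "config": {"lr": 0.1}}
inferred = infer_schema(example, alternative="__pickle__")
```

Here the three primitive columns keep their Python types, while the config column falls back to the marker passed as alternative.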