Experiments management

Layouts

class dpipe.layout.base.Flat(split: Iterable[Sequence], prefixes: Sequence[str] = ('train', 'val', 'test'))

Bases: Layout

Generates an experiment with a ‘flat’ structure: a subdirectory of experiment_path is created for each entry of split. Each subdirectory contains one file per group of identifiers in that entry.

Also, the config file from config_path is copied to experiment_path/resources.config.

Parameters
  • split (Iterable[Sequence]) – an iterable of entries, one per experiment; each entry is a sequence of groups of ids.

  • prefixes (Sequence[str]) – the prefix for each group of ids within an entry of split, used to name the corresponding file (e.g. train_ids.json). Default is ('train', 'val', 'test').

Examples

>>> from dpipe.layout.base import Flat
>>> ids = [
...     [[1, 2, 3], [4, 5, 6], [7, 8]],
...     [[1, 4, 8], [7, 5, 2], [6, 3]],
... ]
>>> Flat(ids).build('some_path.config', 'experiments/base')
# resulting folder structure:
# experiments/base:
#   - resources.config
#   - experiment_0:
#       - train_ids.json # 1, 2, 3
#       - val_ids.json # 4, 5, 6
#       - test_ids.json # 7, 8
#   - experiment_1:
#       - train_ids.json # 1, 4, 8
#       - val_ids.json # 7, 5, 2
#       - test_ids.json # 6, 3
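To make the resulting layout concrete, here is a minimal, hypothetical sketch that writes the same folder structure using only the standard library. The function build_flat_layout is an illustrative name, not part of dpipe, and the assumption that each group is stored as a JSON list (and that every entry must have exactly one group per prefix) is this sketch's own choice rather than a description of Flat's internals.

```python
import json
import shutil
import tempfile
from pathlib import Path
from typing import Iterable, Sequence


def build_flat_layout(split: Iterable[Sequence[Sequence]], config_path, experiment_path,
                      prefixes: Sequence[str] = ('train', 'val', 'test')) -> Path:
    """Hypothetical sketch of the folder structure Flat creates (not dpipe's code)."""
    root = Path(experiment_path)
    root.mkdir(parents=True, exist_ok=True)
    # copy the config to experiment_path/resources.config
    shutil.copyfile(config_path, root / 'resources.config')
    for i, groups in enumerate(split):
        # assumption: this sketch requires exactly one prefix per group of ids
        if len(groups) != len(prefixes):
            raise ValueError(f'entry {i} has {len(groups)} groups, expected {len(prefixes)}')
        exp_dir = root / f'experiment_{i}'
        exp_dir.mkdir(exist_ok=True)
        # one <prefix>_ids.json file per group, e.g. train_ids.json
        for prefix, ids in zip(prefixes, groups):
            (exp_dir / f'{prefix}_ids.json').write_text(json.dumps(list(ids)))
    return root


# Usage: reproduce the example above inside a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    config = Path(tmp) / 'some_path.config'
    config.write_text('# experiment config\n')
    ids = [
        [[1, 2, 3], [4, 5, 6], [7, 8]],
        [[1, 4, 8], [7, 5, 2], [6, 3]],
    ]
    root = build_flat_layout(ids, config, Path(tmp) / 'experiments' / 'base')
    created = sorted(p.relative_to(root).as_posix() for p in root.rglob('*') if p.is_file())
    exp1_test = json.loads((root / 'experiment_1' / 'test_ids.json').read_text())
    print(exp1_test)  # [6, 3]
```

Storing each group in its own file keeps every experiment directory self-describing: a training script only needs the directory path to find its train, validation and test ids.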

Splitters

Train-val-test

dpipe.split.cv.split(ids, *, n_splits, random_state=42)

dpipe.split.cv.leave_group_out(ids, groups, *, val_size=None, random_state=42)

Leave-one-group-out cross-validation: each group is held out as the test set in turn. The validation subset is selected randomly from the remaining ids.
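The scheme can be illustrated with a small, hypothetical pure-Python sketch; leave_group_out_sketch is not dpipe's implementation. Two behaviours are assumptions of this sketch only: val_size is treated as an absolute number of ids drawn at random from the non-held-out ids, and when val_size is None the sketch returns (train, test) pairs.

```python
import random
from typing import Hashable, List, Optional, Sequence


def leave_group_out_sketch(ids: Sequence, groups: Sequence[Hashable], *,
                           val_size: Optional[int] = None, random_state: int = 42) -> List[list]:
    """Hypothetical leave-one-group-out splitter (not dpipe's implementation)."""
    if len(ids) != len(groups):
        raise ValueError('ids and groups must have the same length')
    splits = []
    # dict.fromkeys keeps the unique groups in order of first appearance
    for held_out in dict.fromkeys(groups):
        test = [i for i, g in zip(ids, groups) if g == held_out]
        rest = [i for i, g in zip(ids, groups) if g != held_out]
        if val_size is None:
            # assumption: without val_size this sketch returns (train, test) pairs
            splits.append([rest, test])
            continue
        # assumption: val_size is an absolute count sampled at random from the rest
        val = random.Random(random_state).sample(rest, val_size)
        val_set = set(val)
        train = [i for i in rest if i not in val_set]
        splits.append([train, val, test])
    return splits


ids = ['a1', 'a2', 'b1', 'b2', 'c1', 'c2']
groups = ['A', 'A', 'B', 'B', 'C', 'C']
logo = leave_group_out_sketch(ids, groups, val_size=1)
for train, val, test in logo:
    print(test)  # ['a1', 'a2'], then ['b1', 'b2'], then ['c1', 'c2']
```

The number of splits equals the number of distinct groups, so unlike K-fold there is no n_splits parameter.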

dpipe.split.cv.train_val_test_split(ids, *, val_size, n_splits, random_state=42)

Splits the dataset’s ids into triplets (train, validation, test). The test ids are determined as in the standard K-fold cross-validation setting, with K = n_splits: each fold keeps a different 1/K portion of the ids for testing, so every id is tested exactly once. The remaining (K - 1) / K ids are split into train and validation sets according to val_size.

Parameters
  • ids – the identifiers to split.

  • val_size (float, int) – If float, should be between 0.0 and 1.0 and represents the proportion of the train set to include in the validation set. If int, represents the absolute number of validation samples.

  • n_splits (int) – the number of cross-validation folds.

Returns

splits

Return type

Sequence of triplets
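The K-fold logic described above can be sketched in pure Python as follows. This is a hypothetical illustration (train_val_test_split_sketch is not dpipe's code): it shuffles once with a seeded generator, deals the ids into K disjoint test folds, and takes the validation subset from the head of the remaining shuffled ids. Rounding a float val_size up is an assumption of this sketch.

```python
import math
import random
from typing import List, Sequence, Union


def train_val_test_split_sketch(ids: Sequence, *, val_size: Union[float, int],
                                n_splits: int, random_state: int = 42) -> List[list]:
    """Hypothetical K-fold (train, val, test) splitter, not dpipe's own code."""
    shuffled = list(ids)
    random.Random(random_state).shuffle(shuffled)
    # K disjoint test folds of (almost) equal size that together cover every id once
    folds = [shuffled[k::n_splits] for k in range(n_splits)]
    splits = []
    for k, test in enumerate(folds):
        # the remaining (K - 1) / K ids, still in shuffled order
        rest = [i for j, fold in enumerate(folds) if j != k for i in fold]
        if isinstance(val_size, float):
            if not 0.0 < val_size < 1.0:
                raise ValueError('a float val_size must lie in (0, 1)')
            # proportion of the remaining ids, rounded up in this sketch
            n_val = math.ceil(val_size * len(rest))
        else:
            # absolute number of validation samples
            n_val = val_size
        # rest is already shuffled, so its head is a random validation subset
        splits.append([rest[n_val:], rest[:n_val], test])
    return splits


tvt = train_val_test_split_sketch(list(range(10)), val_size=0.25, n_splits=5)
for train, val, test in tvt:
    print(len(train), len(val), len(test))  # 6 2 2 for every fold
```

With 10 ids and 5 folds, each test fold holds 2 ids; a quarter of the remaining 8 gives 2 validation ids and leaves 6 for training.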

dpipe.split.cv.group_train_val_test_split(ids: Sequence, groups: Union[Callable, Sequence], *, val_size, n_splits, random_state=42)

Splits the dataset’s ids into triplets (train, validation, test), keeping all the objects from a group in the same set (train, validation or test). The test ids are determined as in the standard K-fold cross-validation setting, with K = n_splits: each fold keeps a different portion of roughly 1/K of the ids for testing. The remaining (K - 1) / K ids are split into train and validation sets according to val_size.

The splitter guarantees that no two objects belonging to the same group end up in different sets.

Parameters
  • ids – the identifiers to split.

  • groups (np.ndarray[int]) – the group label of each id; all ids sharing a label are placed in the same set.

  • val_size (float, int) – If float, should be between 0.0 and 1.0 and represents the proportion of the train set to include in the validation set. If int, represents the absolute number of validation samples.

  • n_splits (int) – the number of cross-validation folds.
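The group constraint can be illustrated with a hypothetical sketch: whole groups, rather than individual ids, are dealt into the K test folds, and a small checker confirms that no group is split across subsets. Neither group_train_val_test_split_sketch nor groups_are_intact is part of dpipe; the sketch accepts groups only as a sequence of labels (not a callable) and assumes that val_size counts groups, so the validation set also consists of whole groups.

```python
import math
import random
from collections import defaultdict
from typing import Hashable, List, Sequence, Union


def group_train_val_test_split_sketch(ids: Sequence, groups: Sequence[Hashable], *,
                                      val_size: Union[float, int], n_splits: int,
                                      random_state: int = 42) -> List[list]:
    """Hypothetical group-aware K-fold splitter, not dpipe's own code."""
    members = defaultdict(list)  # group label -> ids belonging to that group
    for i, g in zip(ids, groups):
        members[g].append(i)
    labels = list(members)
    random.Random(random_state).shuffle(labels)
    # whole groups, not individual ids, are dealt into the K test folds
    group_folds = [labels[k::n_splits] for k in range(n_splits)]
    splits = []
    for k, test_groups in enumerate(group_folds):
        rest = [g for j, fold in enumerate(group_folds) if j != k for g in fold]
        # assumption: val_size is measured in groups, so validation also gets whole groups
        n_val = math.ceil(val_size * len(rest)) if isinstance(val_size, float) else val_size
        parts = (rest[n_val:], rest[:n_val], test_groups)  # train, val, test groups
        splits.append([[i for g in part for i in members[g]] for part in parts])
    return splits


def groups_are_intact(split, group_of) -> bool:
    """Return True if no group has members in more than one subset of the split."""
    owner = {}  # group label -> index of the first subset it was seen in
    for subset_index, subset in enumerate(split):
        for i in subset:
            if owner.setdefault(group_of[i], subset_index) != subset_index:
                return False
    return True


# 6 hypothetical patients with 2 scans each; the patient is the group
ids = [f'p{p}_s{s}' for p in range(6) for s in range(2)]
groups = [i.split('_')[0] for i in ids]
group_of = dict(zip(ids, groups))
gsplits = group_train_val_test_split_sketch(ids, groups, val_size=1, n_splits=3)
print(all(groups_are_intact(split, group_of) for split in gsplits))  # True
```

Because assignment happens at the group level, fold sizes in ids are only as even as the group sizes allow, which is why the test portion is roughly, not exactly, 1/K.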

dpipe.split.cv.stratified_train_val_test_split(ids: Sequence, labels: Union[Callable, Sequence], *, val_size, n_splits, random_state=42)