Experiments management
Layouts
- class dpipe.layout.base.Flat(split: Iterable[Sequence], prefixes: Sequence[str] = ('train', 'val', 'test'))
Bases: Layout
Generates an experiment with a ‘flat’ structure. Creates a subdirectory of experiment_path for each entry of split. Each subdirectory contains the corresponding structure of identifiers.
Also, the config file from config_path is copied to experiment_path/resources.config.
- Parameters
split – an iterable with groups of ids.
prefixes (Sequence[str]) – the prefixes corresponding to each identifier group of split, used to generate the appropriate filenames. Default is ('train', 'val', 'test').
Examples
>>> ids = [
...     [[1, 2, 3], [4, 5, 6], [7, 8]],
...     [[1, 4, 8], [7, 5, 2], [6, 3]],
... ]
>>> Flat(ids).build('some_path.config', 'experiments/base')
# resulting folder structure:
# experiments/base:
#   - resources.config
#   - experiment_0:
#       - train_ids.json  # 1, 2, 3
#       - val_ids.json    # 4, 5, 6
#       - test_ids.json   # 7, 8
#   - experiment_1:
#       - train_ids.json  # 1, 4, 8
#       - val_ids.json    # 7, 5, 2
#       - test_ids.json   # 6, 3
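To make the resulting folder structure concrete, the following is a minimal standalone sketch that writes the same files using only the standard library. It is not the implementation of Flat: the helper name build_flat_layout and its argument order are hypothetical, and only the file and folder names are taken from the example above.

```python
import json
import shutil
from pathlib import Path


def build_flat_layout(split, config_path, experiment_path,
                      prefixes=('train', 'val', 'test')):
    """Hypothetical helper that mimics the 'flat' folder structure."""
    root = Path(experiment_path)
    # Fail loudly if the experiment folder already exists.
    root.mkdir(parents=True)
    shutil.copyfile(config_path, root / 'resources.config')

    for i, groups in enumerate(split):
        folder = root / f'experiment_{i}'
        folder.mkdir()
        # Pair each group of ids with its prefix: train, val, test.
        for prefix, ids in zip(prefixes, groups):
            with open(folder / f'{prefix}_ids.json', 'w') as f:
                json.dump(list(ids), f)
    return root
```

Each entry of split becomes one experiment_<i> folder, and each group of ids inside that entry becomes one <prefix>_ids.json file.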
Splitters
Train-val-test
- dpipe.split.cv.leave_group_out(ids, groups, *, val_size=None, random_state=42)
Leave-one-group-out cross-validation. The validation subset is selected randomly.
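The idea of leave-one-group-out cross-validation can be sketched in plain Python as follows. This is an illustration, not the library's code: the function name is hypothetical, and the way val_size is interpreted here (an int is an absolute count, a float is a share of the remaining ids, None means no validation set) is an assumption borrowed from the other splitters on this page.

```python
import random


def leave_one_group_out_sketch(ids, groups, val_size=None, random_state=42):
    """Illustrative only: one fold per distinct group, used as the test set."""
    rng = random.Random(random_state)
    folds = []
    for held_out in sorted(set(groups)):
        test = [i for i, g in zip(ids, groups) if g == held_out]
        train = [i for i, g in zip(ids, groups) if g != held_out]
        if val_size is None:
            folds.append((train, test))
            continue
        # Assumption: int -> absolute count, float -> share of the train ids.
        if isinstance(val_size, int):
            n_val = val_size
        else:
            n_val = round(val_size * len(train))
        # The validation ids are drawn at random from the remaining ids.
        val = rng.sample(train, n_val)
        chosen = set(val)
        train = [i for i in train if i not in chosen]
        folds.append((train, val, test))
    return folds
```

With three distinct groups the sketch yields three folds, and every id is tested exactly once.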
- dpipe.split.cv.train_val_test_split(ids, *, val_size, n_splits, random_state=42)
Splits the dataset’s ids into triplets (train, validation, test). The test ids are determined as in the standard K-fold cross-validation setting: for each fold a different portion of 1 / K ids is kept for testing. The remaining (K - 1) / K ids are split into train and validation sets according to val_size.
- Parameters
ids –
val_size (float, int) – If float, should be between 0.0 and 1.0 and represents the proportion of the train set to include in the validation set. If int, represents the absolute number of validation samples.
n_splits (int) – the number of cross-validation folds.
- Returns
splits
- Return type
Sequence of triplets
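The K-fold scheme described above can be sketched with the standard library alone. The function below is a hypothetical illustration, not the library's implementation: it shuffles the ids, gives each fold a distinct 1 / K portion as the test set, and carves the validation set out of the remaining ids according to val_size.

```python
import random


def train_val_test_sketch(ids, val_size, n_splits, random_state=42):
    """Illustrative K-fold split into (train, validation, test) triplets."""
    rng = random.Random(random_state)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    # Fold k takes every n_splits-th id, so the test sets partition the ids.
    test_folds = [shuffled[k::n_splits] for k in range(n_splits)]

    splits = []
    for test in test_folds:
        held_out = set(test)
        rest = [i for i in shuffled if i not in held_out]
        # int -> absolute count, float -> proportion of the remaining ids.
        if isinstance(val_size, int):
            n_val = val_size
        else:
            n_val = round(val_size * len(rest))
        rng.shuffle(rest)
        splits.append((rest[n_val:], rest[:n_val], test))
    return splits
```

For 10 ids, n_splits=5 and val_size=2, every triplet holds 6 train, 2 validation and 2 test ids, and the five test sets together cover all ids.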
- dpipe.split.cv.group_train_val_test_split(ids: Sequence, groups: Union[Callable, Sequence], *, val_size, n_splits, random_state=42)
Splits the dataset’s ids into triplets (train, validation, test), keeping all the objects from a group in the same set (either train, validation or test). The test ids are determined as in the standard K-fold cross-validation setting: for each fold a different portion of 1 / K ids is kept for testing. The remaining (K - 1) / K ids are split into train and validation sets according to val_size.
The splitter guarantees that no objects belonging to the same group will end up in different sets.
- Parameters
ids –
groups (np.ndarray[int]) –
val_size (float, int) – If float, should be between 0.0 and 1.0 and represents the proportion of the train set to include in the validation set. If int, represents the absolute number of validation samples.
n_splits (int) – the number of cross-validation folds.
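One way to obtain the no-leakage guarantee is to split the group labels rather than the individual ids, and then map each group back to its members. The sketch below illustrates that idea under stated assumptions: the function name is hypothetical, groups is accepted either as a sequence aligned with ids or as a callable mapping an id to its group (as the signature suggests), and val_size is applied to groups rather than to individual ids, which may differ from what the library does.

```python
import random


def group_split_sketch(ids, groups, val_size, n_splits, random_state=42):
    """Illustrative group-aware K-fold split into triplets."""
    # groups: a sequence aligned with ids, or a callable id -> group label.
    labels = [groups(i) for i in ids] if callable(groups) else list(groups)
    rng = random.Random(random_state)
    unique = sorted(set(labels))
    rng.shuffle(unique)
    # Each fold keeps a distinct portion of the groups for testing.
    test_folds = [unique[k::n_splits] for k in range(n_splits)]

    def members(selected):
        selected = set(selected)
        return [i for i, g in zip(ids, labels) if g in selected]

    splits = []
    for test_groups in test_folds:
        held_out = set(test_groups)
        rest = [g for g in unique if g not in held_out]
        rng.shuffle(rest)
        # Assumption of this sketch: val_size counts groups, not ids.
        if isinstance(val_size, int):
            n_val = val_size
        else:
            n_val = round(val_size * len(rest))
        splits.append((members(rest[n_val:]),
                       members(rest[:n_val]),
                       members(test_groups)))
    return splits
```

Because whole groups are assigned to train, validation or test, two ids that share a group label can never land in different sets of the same triplet.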