Datasets are used for data and metadata loading.


class dpipe.dataset.base.Dataset(*args, **kwargs)[source]

Bases: object

Interface for datasets.

Its subclasses must define the ids attribute - a tuple of identifiers, one for each dataset entry, as well as methods for loading an entry by its identifier.


ids – a tuple of identifiers, one for each dataset entry.
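As an illustration of the interface, here is a minimal sketch of a dataset that keeps its entries in memory. The class name InMemoryDataset, the method names load_image and load_label, and the sample data are all hypothetical. Real code would inherit from dpipe.dataset.base.Dataset; the base class is left out here only so that the sketch runs without the library installed.

```python
class InMemoryDataset:
    """Hypothetical dataset; real code would inherit from dpipe.dataset.base.Dataset."""

    def __init__(self, entries):
        # entries: mapping from identifier to an (image, label) pair
        self._entries = dict(entries)
        # the interface requires `ids`: a tuple with one identifier per entry
        self.ids = tuple(sorted(self._entries))

    def load_image(self, identifier):
        # load the image of a single entry by its identifier
        return self._entries[identifier][0]

    def load_label(self, identifier):
        # load the label of a single entry by its identifier
        return self._entries[identifier][1]


dataset = InMemoryDataset({'a': ([0, 1], 0), 'b': ([1, 1], 1)})
images = [dataset.load_image(i) for i in dataset.ids]
```

Every loading method takes an identifier from ids, so code that uses the dataset can iterate over ids without knowing how the entries are stored.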


class dpipe.dataset.csv.CSV(path: Union[Path, str], filename: str = 'meta.csv', index_col: str = 'id', loader: Callable = <function load>)[source]

Bases: Dataset

A small wrapper for dataframes that contain paths to data.

  • path (PathLike) – the path to the data.

  • filename (str) – the relative path to the csv dataframe. Default is meta.csv.

  • index_col (str, None, optional) – the column that will be used as index. Must contain unique values. Default is id.

  • loader (Callable) – the function used to load an object from the path stored in the corresponding dataset entry. Default is load.

get(index, col)[source]

Returns the dataframe element located at index and col.

get_global_path(index: str, col: str) -> str[source]

Get the global path stored at index and col. Dataframes often contain paths to data; this is a convenient way to obtain the global path.

load(index: str, col: str, loader=None)[source]

Loads the object from the path stored at the index and col position in the dataframe.


Wrappers change the dataset’s behaviour. See the Wrappers tutorial for more details.

class dpipe.dataset.wrappers.Proxy(shadowed)[source]

Bases: object

Base class for all wrappers.

dpipe.dataset.wrappers.cache_methods(instance, methods: Optional[Iterable[str]] = None, maxsize: Optional[int] = None)[source]

Cache the instance’s methods. If methods is None, all public methods will be cached.
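The effect of in-memory method caching can be sketched with functools.lru_cache. This shows the idea only and is not dpipe's implementation: the names SlowDataset and cache_methods_sketch are hypothetical, and the sketch modifies the instance in place, which may differ from how the library returns its result.

```python
from functools import lru_cache


class SlowDataset:
    """Hypothetical dataset that counts how often load_image really runs."""
    ids = ('a', 'b')

    def __init__(self):
        self.calls = 0

    def load_image(self, identifier):
        self.calls += 1
        return identifier.upper()


def cache_methods_sketch(instance, methods, maxsize=None):
    # Replace each listed bound method with an lru_cache-wrapped version.
    # Assumption: arguments are hashable, as string identifiers are.
    for name in methods:
        setattr(instance, name, lru_cache(maxsize)(getattr(instance, name)))
    return instance


dataset = cache_methods_sketch(SlowDataset(), ['load_image'])
first = dataset.load_image('a')   # computed
second = dataset.load_image('a')  # served from the cache
```

With maxsize=None the cache grows without bound, which suits datasets whose loaded entries fit in memory.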

dpipe.dataset.wrappers.cache_methods_to_disk(instance, base_path: Union[Path, str], loader: Callable = <function load_numpy>, saver: Callable = <function save_numpy>, **methods: str)[source]

Cache the instance’s methods to disk.

  • instance – arbitrary object

  • base_path (str) – the path that the cache paths of all the methods are relative to.

  • methods (str) – each keyword argument has the form method_name=path_to_cache. The methods are assumed to take a single argument of type str.

  • loader – loads a single object given its path.

  • saver (Callable(value, path)) – saves a single object to the given path.
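The load-or-compute pattern behind disk caching can be sketched with the standard library alone. This is an illustration under stated assumptions, not dpipe's implementation: cache_method_to_disk_sketch and Source are hypothetical names, JSON stands in for the load_numpy and save_numpy defaults, and the one-file-per-identifier layout is an assumption.

```python
import json
import tempfile
from pathlib import Path


def cache_method_to_disk_sketch(instance, name, folder):
    # Wrap instance.<name> so each result is stored in folder/<identifier>.json.
    method = getattr(instance, name)
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)

    def cached(identifier):
        path = folder / f'{identifier}.json'
        if path.exists():
            return json.loads(path.read_text())  # plays the role of `loader`
        value = method(identifier)
        path.write_text(json.dumps(value))  # plays the role of `saver`
        return value

    setattr(instance, name, cached)
    return instance


class Source:
    """Hypothetical object with one method that takes a single str argument."""
    calls = 0

    def load_image(self, identifier):
        self.calls += 1
        return [len(identifier)] * 3


with tempfile.TemporaryDirectory() as base_path:
    source = cache_method_to_disk_sketch(Source(), 'load_image', Path(base_path) / 'images')
    first = source.load_image('abc')   # computed, then written to disk
    second = source.load_image('abc')  # read back from disk
```

Here the folder passed for load_image corresponds to base_path joined with the path_to_cache value of a method_name=path_to_cache keyword argument.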

dpipe.dataset.wrappers.apply(instance, **methods: Callable)[source]

Applies a given function to the output of a given method.

  • instance – arbitrary object

  • methods (Callable) – each keyword argument has the form method_name=func_to_apply. func_to_apply is applied to the output of the method_name method.


>>> # normalize will be applied to the output of load_image
>>> dataset = apply(base_dataset, load_image=normalize)

dpipe.dataset.wrappers.set_attributes(instance, **attributes)[source]

Sets or overwrites attributes with those provided as keyword arguments.

  • instance – arbitrary object

  • attributes – each keyword argument has the form attr_name=attr_value.

dpipe.dataset.wrappers.change_ids(dataset: Dataset, change_id: Callable, methods: Optional[Iterable[str]] = None) -> Dataset[source]

Change the dataset’s ids according to the change_id function and adapt the provided methods to work with the new ids.

  • dataset (Dataset) – the dataset to perform ids changing on.

  • change_id (Callable(str) -> str) – the function that maps an old id to a new one. The new ids must be unique, just like the old ones.

  • methods (Iterable[str]) – the list of methods to be adapted. Each method takes a single argument - the identifier.
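The id translation described above can be sketched as follows. This is a simplified illustration, not dpipe's implementation: Base and change_ids_sketch are hypothetical names, and the result is a plain namespace object rather than a Dataset.

```python
from types import SimpleNamespace


class Base:
    """Hypothetical dataset with a single loading method."""
    ids = ('1', '2')

    def load_image(self, identifier):
        return f'image-{identifier}'


def change_ids_sketch(dataset, change_id, methods):
    # Map every new id back to the old one; a collision would shrink the dict.
    new_to_old = {change_id(i): i for i in dataset.ids}
    assert len(new_to_old) == len(dataset.ids), 'changed ids must be unique'

    changed = SimpleNamespace(ids=tuple(new_to_old))
    for name in methods:
        old_method = getattr(dataset, name)
        # Bind old_method as a default argument to avoid late binding in the loop.
        setattr(changed, name, lambda i, m=old_method: m(new_to_old[i]))
    return changed


dataset = change_ids_sketch(Base(), lambda i: f'site_a_{i}', ['load_image'])
image = dataset.load_image('site_a_2')  # delegates to Base.load_image('2')
```

Prefixing ids this way is one plausible use: it keeps identifiers distinct when several datasets are later combined.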

dpipe.dataset.wrappers.merge(*datasets: Dataset, methods: Optional[Sequence[str]] = None, attributes: Sequence[str] = ()) -> Dataset[source]

Merge several datasets into one by preserving the provided methods and attributes.

  • datasets (Dataset) – sequence of datasets.

  • methods (Sequence[str], None, optional) – the list of methods to be preserved. Each method should take an identifier as its first argument. If None, all the common methods will be preserved.

  • attributes (Sequence[str]) – the list of attributes to be preserved. For each dataset their values should be the same. Default is the empty sequence ().
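A rough sketch of the merging logic, dispatching each call to the dataset that owns the identifier. This is not dpipe's implementation: merge_sketch and the toy datasets are hypothetical, ids are assumed to be unique across the merged datasets, and methods must be listed explicitly here.

```python
from types import SimpleNamespace


def merge_sketch(*datasets, methods, attributes=()):
    # Remember which dataset owns each id (assumption: no id appears twice).
    owner = {}
    for dataset in datasets:
        for identifier in dataset.ids:
            assert identifier not in owner, f'duplicate id: {identifier}'
            owner[identifier] = dataset

    merged = SimpleNamespace(ids=tuple(owner))
    for name in attributes:
        # Preserved attributes must have the same value in every dataset.
        values = [getattr(d, name) for d in datasets]
        assert all(v == values[0] for v in values), f'{name} differs between datasets'
        setattr(merged, name, values[0])
    for name in methods:
        # Dispatch on the identifier, which is the first argument.
        def method(identifier, *args, _name=name, **kwargs):
            return getattr(owner[identifier], _name)(identifier, *args, **kwargs)
        setattr(merged, name, method)
    return merged


first = SimpleNamespace(ids=('a',), n_chans=1, load_image=lambda i: f'first-{i}')
second = SimpleNamespace(ids=('b',), n_chans=1, load_image=lambda i: f'second-{i}')
dataset = merge_sketch(first, second, methods=['load_image'], attributes=['n_chans'])
```

The merged ids keep the order in which the datasets were passed, and each preserved method routes a call to whichever dataset contains the given identifier.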

dpipe.dataset.wrappers.apply_mask(dataset: Dataset, mask_modality_id: int = -1, mask_value: Optional[int] = None) -> Dataset[source]

Applies the mask_modality_id modality as a binary mask to the other modalities and removes the mask from the sequence of modalities.

  • dataset (Dataset) – dataset which is used in the current task.

  • mask_modality_id (int) – the index of mask in the sequence of modalities. Default is -1, which means the last modality will be used as the mask.

  • mask_value (int, None, optional) – the value in the mask used to filter the other modalities. If None, all mask values greater than zero are used. Default is None.


>>> modalities = ['flair', 't1', 'brain_mask']  # the brain mask will be applied to the other modalities
>>> target = 'target'
>>> dataset = apply_mask(
...     dataset=Wmh2017(
...         data_path=data_path,
...         modalities=modalities,
...         target=target
...     ),
...     mask_modality_id=-1,
...     mask_value=1
... )