Dataset
Datasets are used for data and metadata loading.
Interfaces
- class dpipe.dataset.base.Dataset(*args, **kwargs)
Bases: object
Interface for datasets.
Its subclasses must define the ids attribute (a tuple of identifiers, one for each dataset entry), as well as methods for loading an entry by its identifier.
- ids
Type: a tuple of identifiers, one for each dataset entry.
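A minimal sketch of a concrete dataset, assuming dpipe is installed; the fallback stub exists only so the snippet also runs without it. The class name ToyDataset, its in-memory storage, and the load_image/load_label methods are hypothetical illustrations, not part of dpipe. ids is defined as a property, which works whether or not the base class declares it abstract.

```python
try:
    from dpipe.dataset.base import Dataset
except ImportError:
    # Hypothetical stand-in so the sketch runs without dpipe installed.
    class Dataset:
        pass


class ToyDataset(Dataset):
    """Hypothetical dataset that keeps every entry in memory."""

    def __init__(self, images: dict, labels: dict):
        self._images = images
        self._labels = labels

    @property
    def ids(self):
        # A tuple of identifiers, one for each dataset entry.
        return tuple(sorted(self._images))

    def load_image(self, identifier: str):
        # A loading method: takes an identifier, returns that entry's data.
        return self._images[identifier]

    def load_label(self, identifier: str):
        return self._labels[identifier]


dataset = ToyDataset({'b': [3, 4], 'a': [1, 2]}, {'a': 0, 'b': 1})
for i in dataset.ids:
    print(i, dataset.load_image(i), dataset.load_label(i))
```

Every entry is reachable through the same pattern: iterate over ids and pass each identifier to a loading method.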
Helpers
- class dpipe.dataset.csv.CSV(path: Union[Path, str], filename: str = 'meta.csv', index_col: str = 'id', loader: Callable = load)
Bases: Dataset
A small wrapper for dataframes that contain paths to data.
- Parameters
path (PathLike) – the path to the data.
filename (str) – the relative path to the csv dataframe. Default is meta.csv.
index_col (str, None, optional) – the column that will be used as index. Must contain unique values. Default is id.
loader (Callable) – the function to load an object by the path located in a corresponding dataset entry. Default is load_by_ext.
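In essence, CSV indexes the rows of a table by id, and the cells of those rows hold paths relative to path. The standard-library sketch below imitates that idea; it is not dpipe's implementation, and the class name TinyCSV, its load(identifier, column) method, and the plain-text loader read_text are hypothetical.

```python
import csv
import tempfile
from pathlib import Path
from typing import Callable, Union


def read_text(path: Path) -> str:
    # Hypothetical loader; dpipe's default picks a loader from the file extension.
    return path.read_text()


class TinyCSV:
    """Hypothetical stand-in for the CSV helper: maps ids to table rows."""

    def __init__(self, path: Union[Path, str], filename: str = 'meta.csv',
                 index_col: str = 'id', loader: Callable = read_text):
        self.path = Path(path)
        self.loader = loader
        with open(self.path / filename, newline='') as f:
            rows = list(csv.DictReader(f))
        self.rows = {row[index_col]: row for row in rows}
        if len(self.rows) != len(rows):
            raise ValueError(f'Column {index_col!r} must contain unique values.')
        self.ids = tuple(self.rows)

    def load(self, identifier: str, column: str):
        # Resolve the relative path stored in the cell, then load the file.
        return self.loader(self.path / self.rows[identifier][column])


# Build a tiny on-disk dataset: meta.csv plus one data file per entry.
root = Path(tempfile.mkdtemp())
for name, text in [('a.txt', 'first'), ('b.txt', 'second')]:
    (root / name).write_text(text)
(root / 'meta.csv').write_text('id,image\np1,a.txt\np2,b.txt\n')

dataset = TinyCSV(root)
print(dataset.ids, dataset.load('p2', 'image'))  # ('p1', 'p2') second
```

Keeping relative paths in the table means the whole data folder can be moved without editing meta.csv.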
Wrappers
Wrappers change the dataset’s behaviour. See the Wrappers tutorial for more details.
- dpipe.dataset.wrappers.cache_methods(instance, methods: Optional[Iterable[str]] = None, maxsize: Optional[int] = None)
Cache the instance's methods. If methods is None, all public methods will be cached.
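One way to picture cache_methods is a proxy that memoizes the selected methods with functools.lru_cache. The sketch below is an assumption about the mechanism, not dpipe's code; cache_methods_sketch, _CachedProxy, and SlowDataset are hypothetical names, and maxsize is simply forwarded to lru_cache.

```python
import functools
from typing import Iterable, Optional


class _CachedProxy:
    """Delegates to `instance`, but serves the chosen methods from an LRU cache."""

    def __init__(self, instance, methods: Iterable[str], maxsize: Optional[int]):
        self._instance = instance
        for name in methods:
            # Instance attributes take priority over __getattr__ below.
            setattr(self, name, functools.lru_cache(maxsize=maxsize)(getattr(instance, name)))

    def __getattr__(self, name):
        # Anything that is not cached is looked up on the wrapped object.
        return getattr(self._instance, name)


def cache_methods_sketch(instance, methods: Optional[Iterable[str]] = None,
                         maxsize: Optional[int] = None):
    if methods is None:
        # All public methods, mirroring the documented default.
        methods = [name for name in dir(instance)
                   if not name.startswith('_') and callable(getattr(instance, name))]
    return _CachedProxy(instance, methods, maxsize)


class SlowDataset:
    ids = ('a', 'b')

    def __init__(self):
        self.calls = 0

    def load_image(self, identifier):
        self.calls += 1  # count real loads
        return identifier.upper()


base = SlowDataset()
cached = cache_methods_sketch(base)
cached.load_image('a')
cached.load_image('a')
print(base.calls)  # 1: the second call was served from the cache
```

Because arguments become cache keys, this only works for methods whose arguments are hashable, such as string identifiers.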
- dpipe.dataset.wrappers.cache_methods_to_disk(instance, base_path: Union[Path, str], loader: Callable = load_numpy, saver: Callable = save_numpy, **methods: str)
Cache the instance's methods to disk.
- Parameters
instance – arbitrary object
base_path (str) – the base path; all paths given in methods are relative to it.
methods (str) – each keyword argument has the form method_name=path_to_cache. The methods are assumed to take a single argument of type str.
loader – loads a single object given its path.
saver (Callable(value, path)) – saves a single object to the given path.
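A rough sketch of the disk-caching idea: on the first call, a method's result is saved under base_path/path_to_cache/<identifier>, and later calls load it back instead of recomputing it. To stay dependency-free it uses pickle rather than the numpy-based defaults; the file layout and the names cache_to_disk_sketch, _DiskCachedProxy, and Source are assumptions, not dpipe's actual behaviour.

```python
import pickle
import tempfile
from pathlib import Path


def load_pickle(path):
    with open(path, 'rb') as f:
        return pickle.load(f)


def save_pickle(value, path):
    with open(path, 'wb') as f:
        pickle.dump(value, f)


class _DiskCachedProxy:
    def __init__(self, instance, base_path, loader, saver, methods):
        self._instance = instance
        for name, cache_dir in methods.items():
            # Each cached method gets its own folder relative to base_path.
            folder = Path(base_path) / cache_dir
            folder.mkdir(parents=True, exist_ok=True)
            setattr(self, name, self._wrap(getattr(instance, name), folder, loader, saver))

    @staticmethod
    def _wrap(method, folder, loader, saver):
        def cached(identifier: str):
            # The method takes a single str argument, reused here as the file name.
            path = folder / identifier
            if not path.exists():
                saver(method(identifier), path)
            return loader(path)
        return cached

    def __getattr__(self, name):
        return getattr(self._instance, name)


def cache_to_disk_sketch(instance, base_path, loader=load_pickle, saver=save_pickle, **methods):
    return _DiskCachedProxy(instance, base_path, loader, saver, methods)


class Source:
    def __init__(self):
        self.calls = 0

    def load_image(self, identifier):
        self.calls += 1
        return [identifier] * 3


source = Source()
cached = cache_to_disk_sketch(source, tempfile.mkdtemp(), load_image='images')
print(cached.load_image('x'), cached.load_image('x'), source.calls)  # ['x', 'x', 'x'] ['x', 'x', 'x'] 1
```

Unlike an in-memory cache, the files survive between runs, so an expensive preprocessing step is paid only once per identifier.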
- dpipe.dataset.wrappers.apply(instance, **methods: Callable)
Applies a given function to the output of a given method.
- Parameters
instance – arbitrary object
methods (Callable) – each keyword argument has the form method_name=func_to_apply. func_to_apply is applied to the output of the method_name method.
Examples
>>> # normalize will be applied to the output of load_image
>>> dataset = apply(base_dataset, load_image=normalize)
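To make the example above runnable without dpipe, the snippet below approximates the same behaviour with a small proxy. apply_sketch, _AppliedProxy, and the toy BaseDataset and normalize are hypothetical stand-ins, not dpipe's implementation.

```python
from typing import Callable


class _AppliedProxy:
    def __init__(self, instance, funcs: dict):
        self._instance = instance
        self._funcs = funcs

    def __getattr__(self, name):
        attr = getattr(self._instance, name)
        if name not in self._funcs:
            return attr  # untouched methods and attributes pass straight through
        func = self._funcs[name]

        def wrapped(*args, **kwargs):
            # Post-process the original method's output.
            return func(attr(*args, **kwargs))
        return wrapped


def apply_sketch(instance, **methods: Callable):
    return _AppliedProxy(instance, methods)


class BaseDataset:
    ids = ('a',)

    def load_image(self, identifier):
        return [0.0, 5.0, 10.0]


def normalize(values):
    # Scale values to [0, 1].
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


dataset = apply_sketch(BaseDataset(), load_image=normalize)
print(dataset.load_image('a'))  # [0.0, 0.5, 1.0]
```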
- dpipe.dataset.wrappers.set_attributes(instance, **attributes)
Sets or overwrites attributes with those provided as keyword arguments.
- Parameters
instance – arbitrary object
attributes – each keyword argument has the form attr_name=attr_value.
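A hypothetical sketch of the same effect using a proxy, so the original object is left untouched while the new values shadow the old ones. Whether dpipe itself mutates the instance or wraps it is not stated here; set_attributes_sketch, Base, n_classes, and spacing are made-up names.

```python
class _AttributesProxy:
    def __init__(self, instance, attributes: dict):
        self._instance = instance
        # Provided attributes live on the proxy, so they shadow the originals.
        self.__dict__.update(attributes)

    def __getattr__(self, name):
        # Only called when the proxy itself has no such attribute.
        return getattr(self._instance, name)


def set_attributes_sketch(instance, **attributes):
    return _AttributesProxy(instance, attributes)


class Base:
    n_classes = 2
    ids = ('a', 'b')


dataset = set_attributes_sketch(Base(), n_classes=3, spacing=(1.0, 1.0))
print(dataset.n_classes, dataset.spacing, dataset.ids)  # 3 (1.0, 1.0) ('a', 'b')
```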
- dpipe.dataset.wrappers.change_ids(dataset: Dataset, change_id: Callable, methods: Optional[Iterable[str]] = None) -> Dataset
Change the dataset's ids according to the change_id function and adapt the provided methods to work with the new ids.
- Parameters
dataset (Dataset) – the dataset whose ids will be changed.
change_id (Callable(str) -> str) – the function that maps each old id to a new one. The new ids must be unique, just like the old ones.
methods (Iterable[str]) – the list of methods to be adapted. Each method takes a single argument: the identifier.
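Adapting the methods amounts to keeping a reverse mapping from new ids to old ones. The sketch below illustrates this under that assumption; change_ids_sketch, _ChangedIds, and Patients are hypothetical, and methods is required here because the sketch does not model the documented None default.

```python
from typing import Callable, Iterable


class _ChangedIds:
    def __init__(self, dataset, change_id: Callable[[str], str], methods: Iterable[str]):
        self._dataset = dataset
        # Map every new id back to the old id it was derived from.
        self._old_ids = {change_id(i): i for i in dataset.ids}
        if len(self._old_ids) != len(dataset.ids):
            raise ValueError('change_id must map distinct ids to distinct ids.')
        self.ids = tuple(self._old_ids)
        for name in methods:
            setattr(self, name, self._adapt(getattr(dataset, name)))

    def _adapt(self, method):
        def adapted(new_id):
            # Translate the new id back before calling the original method.
            return method(self._old_ids[new_id])
        return adapted

    def __getattr__(self, name):
        return getattr(self._dataset, name)


def change_ids_sketch(dataset, change_id, methods):
    return _ChangedIds(dataset, change_id, methods)


class Patients:
    ids = ('1', '2')

    def load_image(self, identifier):
        return f'image of patient {identifier}'


dataset = change_ids_sketch(Patients(), lambda i: 'patient_' + i, ['load_image'])
print(dataset.ids)                      # ('patient_1', 'patient_2')
print(dataset.load_image('patient_2'))  # image of patient 2
```

The uniqueness check matters: if two old ids collapsed onto one new id, the reverse mapping could no longer tell which entry to load.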
- dpipe.dataset.wrappers.merge(*datasets: Dataset, methods: Optional[Sequence[str]] = None, attributes: Sequence[str] = ()) -> Dataset
Merge several datasets into one, preserving the provided methods and attributes.
- Parameters
datasets (Dataset) – sequence of datasets.
methods (Sequence[str], None, optional) – the list of methods to be preserved. Each method should take an identifier as its first argument. If None, all the common methods will be preserved.
attributes (Sequence[str]) – the list of attributes to be preserved. Their values must be the same in every dataset. Default is the empty sequence ().
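A sketch of how such a merge could work, assuming ids do not repeat across datasets: each preserved method dispatches on its first argument (the identifier) to the dataset that owns it, and each preserved attribute is checked for equality. merge_sketch, _Merged, and Part are hypothetical names, not dpipe's implementation.

```python
from typing import Optional, Sequence


class _Merged:
    def __init__(self, datasets, methods: Optional[Sequence[str]], attributes: Sequence[str]):
        # Remember which dataset owns each id; ids must not repeat across datasets.
        self._owner = {}
        for dataset in datasets:
            for i in dataset.ids:
                if i in self._owner:
                    raise ValueError(f'Duplicate id: {i!r}')
                self._owner[i] = dataset
        self.ids = tuple(self._owner)

        if methods is None:
            # Public callables shared by every dataset.
            methods = set.intersection(*(
                {n for n in dir(d) if not n.startswith('_') and callable(getattr(d, n))}
                for d in datasets))
        for name in methods:
            setattr(self, name, self._dispatch(name))

        for name in attributes:
            values = [getattr(d, name) for d in datasets]
            if any(v != values[0] for v in values):
                raise ValueError(f'Attribute {name!r} differs between datasets.')
            setattr(self, name, values[0])

    def _dispatch(self, name):
        def method(identifier, *args, **kwargs):
            # The identifier is the first argument and selects the owning dataset.
            return getattr(self._owner[identifier], name)(identifier, *args, **kwargs)
        return method


def merge_sketch(*datasets, methods=None, attributes=()):
    return _Merged(datasets, methods, attributes)


class Part:
    def __init__(self, ids, spacing):
        self.ids = ids
        self.spacing = spacing

    def load_image(self, identifier):
        return f'{identifier}: spacing {self.spacing}'


merged = merge_sketch(Part(('a', 'b'), 1.0), Part(('c',), 1.0), attributes=['spacing'])
print(merged.ids)              # ('a', 'b', 'c')
print(merged.load_image('c'))  # c: spacing 1.0
print(merged.spacing)          # 1.0
```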
- dpipe.dataset.wrappers.apply_mask(dataset: Dataset, mask_modality_id: int = -1, mask_value: Optional[int] = None) -> Dataset
Applies the mask_modality_id modality as a binary mask to the other modalities and removes the mask from the sequence of modalities.
- Parameters
dataset (Dataset) – the dataset used in the current task.
mask_modality_id (int) – the index of the mask in the sequence of modalities. Default is -1, which means the last modality will be used as the mask.
mask_value (int, None, optional) – the mask value used to filter the other modalities. If None, the mask is binarized by keeping values greater than zero. Default is None.
Examples
>>> modalities = ['flair', 't1', 'brain_mask']  # the brain mask will be applied to the other modalities
>>> target = 'target'
>>> dataset = apply_mask(
...     dataset=Wmh2017(
...         data_path=data_path,
...         modalities=modalities,
...         target=target
...     ),
...     mask_modality_id=-1,
...     mask_value=1
... )
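The array-level masking step described above can be sketched with NumPy. This assumes an entry's modalities are stacked along the first axis and that "applying the mask" means zeroing every voxel outside it; apply_binary_mask is a hypothetical helper that works on one stacked array, not the wrapper itself, which operates on a whole Dataset.

```python
import numpy as np


def apply_binary_mask(images: np.ndarray, mask_modality_id: int = -1, mask_value=None) -> np.ndarray:
    """Hypothetical helper: mask every modality with one of them, then drop the mask."""
    mask = images[mask_modality_id]
    # Binarize: exact match with mask_value, or "greater than zero" when it is None.
    binary = mask == mask_value if mask_value is not None else mask > 0
    # Remove the mask modality from the stack.
    others = np.delete(images, mask_modality_id, axis=0)
    # The 2D mask broadcasts over the remaining modalities.
    return others * binary


flair = np.array([[1, 2], [3, 4]])
t1 = np.array([[5, 6], [7, 8]])
brain_mask = np.array([[0, 1], [1, 0]])
images = np.stack([flair, t1, brain_mask])  # shape (3, 2, 2)

masked = apply_binary_mask(images, mask_modality_id=-1, mask_value=1)
print(masked.shape)         # (2, 2, 2)
print(masked[0].tolist())   # [[0, 2], [3, 0]]
```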