DeepPipe documentation¶
DeepPipe is a collection of utils for deep learning experiments mainly aimed at medical imaging applications.
Contents¶
Experiments management¶
Layouts¶
- class dpipe.layout.base.Flat(split: Iterable[Sequence], prefixes: Sequence[str] = ('train', 'val', 'test'))[source]¶
Bases: Layout
Generates an experiment with a ‘flat’ structure. Creates a subdirectory of experiment_path for each entry of split. Each subdirectory contains the corresponding structure of identifiers.
Also, the config file from config_path is copied to experiment_path/resources.config.
- Parameters
split – an iterable with groups of ids.
prefixes (Sequence[str]) – the corresponding prefixes for each identifier group of split, which will be used to generate appropriate filenames. Default is ('train', 'val', 'test').
Examples
>>> ids = [
...     [[1, 2, 3], [4, 5, 6], [7, 8]],
...     [[1, 4, 8], [7, 5, 2], [6, 3]],
... ]
>>> Flat(ids).build('some_path.config', 'experiments/base')
# resulting folder structure:
# experiments/base:
#   - resources.config
#   - experiment_0:
#       - train_ids.json  # 1, 2, 3
#       - val_ids.json    # 4, 5, 6
#       - test_ids.json   # 7, 8
#   - experiment_1:
#       - train_ids.json  # 1, 4, 8
#       - val_ids.json    # 7, 5, 2
#       - test_ids.json   # 6, 3
Splitters¶
Train-val-test¶
- dpipe.split.cv.leave_group_out(ids, groups, *, val_size=None, random_state=42)[source]¶
Leave one group out CV. Validation subset will be selected randomly.
- dpipe.split.cv.train_val_test_split(ids, *, val_size, n_splits, random_state=42)[source]¶
Splits the dataset’s ids into triplets (train, validation, test). The test ids are determined as in the standard K-fold cross-validation setting: for each fold a different portion of 1 / K ids is kept for testing. The remaining (K - 1) / K ids are split into train and validation sets according to val_size.
- Parameters
ids –
val_size (float, int) – If float, should be between 0.0 and 1.0 and represents the proportion of the train set to include in the validation set. If int, represents the absolute number of validation samples.
n_splits (int) – the number of cross-validation folds.
- Returns
splits
- Return type
Sequence of triplets
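The splitting scheme described above can be sketched in plain Python. This is only an illustration of the idea, not dpipe's implementation: the function name kfold_train_val_test and the way ids are shuffled and assigned to folds are assumptions.

```python
import random


def kfold_train_val_test(ids, val_size, n_splits, seed=42):
    """Hypothetical sketch: K-fold test folds plus a random validation subset."""
    ids = list(ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    # fold i takes every n_splits-th id starting at i: the folds are disjoint and cover all ids
    folds = [ids[i::n_splits] for i in range(n_splits)]
    splits = []
    for i, test in enumerate(folds):
        rest = [x for j, fold in enumerate(folds) if j != i for x in fold]
        # a float val_size is a fraction of the non-test ids, an int is an absolute count
        n_val = round(val_size * len(rest)) if isinstance(val_size, float) else val_size
        rng.shuffle(rest)
        splits.append((rest[n_val:], rest[:n_val], test))
    return splits


splits = kfold_train_val_test(range(10), val_size=2, n_splits=5)
```

Every id ends up in the test set of exactly one fold, while the train and validation subsets are redrawn for each fold.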
- dpipe.split.cv.group_train_val_test_split(ids: Sequence, groups: Union[Callable, Sequence], *, val_size, n_splits, random_state=42)[source]¶
Splits the dataset’s ids into triplets (train, validation, test) keeping all the objects from a group in the same set (either train, validation or test). The test ids are determined as in the standard K-fold cross-validation setting: for each fold a different portion of 1 / K ids is kept for testing. The remaining (K - 1) / K ids are split into train and validation sets according to val_size.
The splitter guarantees that no objects belonging to the same group will end up in different sets.
- Parameters
ids –
groups (np.ndarray[int]) –
val_size (float, int) – If float, should be between 0.0 and 1.0 and represents the proportion of the train set to include in the validation set. If int, represents the absolute number of validation samples.
n_splits (int) – the number of cross-validation folds.
Imaging utils¶
Preprocessing¶
- dpipe.im.preprocessing.normalize(x: ndarray, mean: bool = True, std: bool = True, percentiles: Optional[Union[float, Sequence[float]]] = None, axis: Optional[Union[int, Sequence[int]]] = None, dtype=None) ndarray [source]¶
Normalize x’s values to make mean and std independently along axes equal to 0 and 1 respectively (if specified).
- Parameters
x –
mean – whether to make mean == zero
std – whether to make std == 1
percentiles – if pair (a, b) - the percentiles between which mean and/or std will be estimated; if scalar (s) - same as (s, 100 - s); if None - same as (0, 100).
axis – axes along which mean and/or std will be estimated independently. If None - the statistics will be estimated globally.
dtype – the dtype of the output.
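A minimal NumPy sketch of this kind of normalization is shown below. It is not the library's code: normalize_sketch is a hypothetical name, percentiles are ignored, and it assumes that "independently along axes" means one mean/std pair per index along the given axes.

```python
import numpy as np


def normalize_sketch(x, axis=None):
    """Hypothetical sketch: zero mean and unit std, estimated per index along `axis`."""
    x = np.asarray(x, dtype=float)
    if axis is None:
        reduce_axes = None  # global statistics
    else:
        keep = {int(a) % x.ndim for a in np.atleast_1d(axis)}
        # statistics are computed over all the remaining axes (assumed interpretation)
        reduce_axes = tuple(a for a in range(x.ndim) if a not in keep)
    mean = x.mean(axis=reduce_axes, keepdims=True)
    std = x.std(axis=reduce_axes, keepdims=True)
    return (x - mean) / std  # assumes a non-zero std


x = np.arange(12.0).reshape(3, 4)
per_row = normalize_sketch(x, axis=0)   # each x[i] is normalized on its own
global_norm = normalize_sketch(x)       # a single mean and std for the whole array
```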
- dpipe.im.preprocessing.min_max_scale(x: ndarray, axis: Optional[Union[int, Sequence[int]]] = None) ndarray [source]¶
Scale x’s values so that its minimum and maximum become 0 and 1 respectively independently along axes.
- dpipe.im.preprocessing.bytescale(x: ndarray) ndarray [source]¶
Scales x’s values so that its minimum and maximum become 0 and 255 respectively. Afterwards converts it to uint8.
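The two scaling operations can be illustrated with a few lines of NumPy. The helper names are hypothetical, only global (axis=None) scaling is covered, and rounding before the uint8 conversion is an assumption rather than documented behaviour.

```python
import numpy as np


def min_max_scale_sketch(x):
    """Hypothetical sketch: map the global minimum to 0 and the maximum to 1."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    return (x - lo) / (hi - lo)  # assumes hi > lo


def bytescale_sketch(x):
    """Hypothetical sketch: scale to [0, 255], then convert to uint8 (rounding is assumed)."""
    return np.round(255 * min_max_scale_sketch(x)).astype(np.uint8)


image = np.array([[-100.0, 0.0], [50.0, 300.0]])
scaled = min_max_scale_sketch(image)
as_bytes = bytescale_sketch(image)
```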
- dpipe.im.preprocessing.describe_connected_components(mask: ndarray, background: int = 0, drop_background: bool = True)[source]¶
Get the connected components of mask as well as their labels and volumes.
- Parameters
mask –
background – the label of the background. The pixels with this label will be marked as the background component (even if it is not connected).
drop_background – whether to exclude the background from the returned components’ descriptions.
- Returns
labeled_mask – array of the same shape as mask.
labels – a list of labels from the labeled_mask. The background label is always 0. The labels are sorted according to their corresponding volumes.
volumes – a list of corresponding labels’ volumes.
- dpipe.im.preprocessing.get_greatest_component(mask: ndarray, background: int = 0, drop_background: bool = True) ndarray [source]¶
Get the greatest connected component from mask. See describe_connected_components for details.
Shape operations¶
- dpipe.im.shape_ops.zoom(x: ndarray, scale_factor: Union[float, Sequence[float]], axis: Optional[Union[int, Sequence[int]]] = None, order: int = 1, fill_value: Union[float, Callable] = 0, num_threads: int = -1, backend: Optional[Union[str, Backend, Type[Backend]]] = None) ndarray [source]¶
Rescale x according to scale_factor along the axis.
Uses a fast parallelizable implementation for fp32 / fp64 (and bool-int16-32-64 if order == 0) inputs, ndim <= 4 and order = 0 or 1.
- Parameters
x (np.ndarray) – n-dimensional array
scale_factor (AxesParams) – float or sequence of floats describing how to scale along axes
axis (AxesLike) – axis along which array will be scaled
order (int) – order of interpolation
fill_value (float | Callable) – value to fill past edges. If Callable (e.g. numpy.min) - fill_value(x) will be used
num_threads (int) – the number of threads to use for computation. Default = the cpu count. If negative value passed cpu count + num_threads + 1 threads will be used
backend (BackendLike) – which backend to use. numba, cython and scipy are available, cython is used by default
- Returns
zoomed – zoomed array
- Return type
np.ndarray
Examples
>>> zoomed = zoom(x, 2, axis=[0, 1])  # 3d array
>>> zoomed = zoom(x, [1, 2, 3])  # different scales along each axis
>>> zoomed = zoom(x.astype(int), 2)  # will fall back to scipy's implementation because of int dtype
- dpipe.im.shape_ops.zoom_to_shape(x: ndarray, shape: Union[int, Sequence[int]], axis: Optional[Union[int, Sequence[int]]] = None, order: int = 1, fill_value: Union[float, Callable] = 0, num_threads: int = -1, backend: Optional[Union[str, Backend, Type[Backend]]] = None) ndarray [source]¶
Rescale x to match shape along the axis.
Uses a fast parallelizable implementation for fp32 / fp64 (and bool-int16-32-64 if order == 0) inputs, ndim <= 4 and order = 0 or 1.
- Parameters
x (np.ndarray) – n-dimensional array
shape (AxesLike) – int or sequence of ints describing desired lengths along axes
axis (AxesLike) – axis along which array will be scaled
order (int) – order of interpolation
fill_value (float | Callable) – value to fill past edges. If Callable (e.g. numpy.min) - fill_value(x) will be used
num_threads (int) – the number of threads to use for computation. Default = the cpu count. If negative value passed cpu count + num_threads + 1 threads will be used
backend (BackendLike) – which backend to use. numba, cython and scipy are available, cython is used by default
- Returns
zoomed – zoomed array
- Return type
np.ndarray
Examples
>>> zoomed = zoom_to_shape(x, [3, 4, 5])  # 3d array
>>> zoomed = zoom_to_shape(x, [6, 7], axis=[1, 2])  # zoom to shape along specified axes
>>> zoomed = zoom_to_shape(x.astype(int), [3, 4, 5])  # will fall back to scipy's implementation because of int dtype
- dpipe.im.shape_ops.proportional_zoom_to_shape(x: ndarray, shape: Union[int, Sequence[int]], axis: Optional[Union[int, Sequence[int]]] = None, padding_values: Union[float, Sequence[float], Callable] = 0, order: int = 1) ndarray [source]¶
Proportionally rescale x to fit shape along axes then pad it to that shape.
- Parameters
x –
shape – final shape.
axis – axes along which x will be padded. If None - the last len(shape) axes are used.
padding_values – values to pad with.
order – order of interpolation.
- dpipe.im.shape_ops.crop_to_shape(x: ndarray, shape: Union[int, Sequence[int]], axis: Optional[Union[int, Sequence[int]]] = None, ratio: Union[float, Sequence[float]] = 0.5) ndarray [source]¶
Crop x to match shape along axes.
- Parameters
x –
shape – final shape.
axis – axes along which x will be cropped. If None - the last len(shape) axes are used.
ratio – the fraction of the crop that will be applied to the left, 1 - ratio will be applied to the right.
- dpipe.im.shape_ops.crop_to_box(x: ndarray, box: ndarray, axis: Optional[Union[int, Sequence[int]]] = None, padding_values: Optional[Union[float, Sequence[float]]] = None) ndarray [source]¶
Crop x according to box along axis.
- Parameters
x (np.ndarray) – n-dimensional array
box (np.ndarray) – array of shape (2, x.ndim or len(axis) if axis is passed) describing crop boundaries
axis (AxesLike) – axis along which x will be cropped
padding_values (AxesParams) – values to pad with if box exceeds the input’s limits
- Returns
cropped – cropped array
- Return type
np.ndarray
Examples
>>> x  # array of shape [2, 3, 4]
>>> cropped = crop_to_box(x, np.array([[0, 0, 0], [1, 1, 1]]))  # crop to shape [1, 1, 1]
>>> cropped = crop_to_box(x, np.array([[0, 0, 0], [5, 5, 5]]))  # fail, box exceeds the input's limits
>>> cropped = crop_to_box(x, np.array([[0], [5]]), axis=0, padding_values=0)  # pad with 0-s to shape [5, 3, 4]
- dpipe.im.shape_ops.restore_crop(x: ndarray, box: ndarray, shape: Union[int, Sequence[int]], padding_values: Union[float, Sequence[float], Callable] = 0) ndarray [source]¶
Pad x to match shape. The left padding is taken equal to box’s start.
- Parameters
x (np.ndarray) – n-dimensional array to pad
box (np.ndarray) – array of shape (2, x.ndim) describing crop boundaries
shape (AxesLike) – shape to restore crop to
padding_values (Union[AxesParams, Callable]) – values to pad with. If Callable (e.g. numpy.min) - padding_values(x) will be used
- Returns
padded – padded array
- Return type
np.ndarray
Examples
>>> x  # array of shape [2, 3, 4]
>>> padded = restore_crop(x, np.array([[0, 0, 0], [2, 3, 4]]), [4, 4, 4])  # pad to shape [4, 4, 4]
>>> padded = restore_crop(x, np.array([[0, 0, 0], [1, 1, 1]]), [4, 4, 4])  # fail, box is inconsistent with an array
>>> padded = restore_crop(x, np.array([[1, 2, 3], [3, 5, 7]]), [3, 5, 7])  # pad to shape [3, 5, 7]
- dpipe.im.shape_ops.pad(x: ndarray, padding: Union[int, Sequence[int], Sequence[Sequence[int]]], axis: Optional[Union[int, Sequence[int]]] = None, padding_values: Union[float, Sequence[float], Callable] = 0) ndarray [source]¶
Pad x according to padding along the axis.
- Parameters
x (np.ndarray) – n-dimensional array to pad
padding (Union[AxesLike, Sequence[Sequence[int]]]) – if 2D array [[start_1, stop_1], …, [start_n, stop_n]] - specifies individual padding for each axis from axis. The length of the array must either be equal to 1 or match the length of axis. If 1D array [val_1, …, val_n] - same as [[val_1, val_1], …, [val_n, val_n]]. If scalar (val) - same as [[val, val]]
axis (AxesLike) – axis along which x will be padded
padding_values (Union[AxesParams, Callable]) – values to pad with, must be broadcastable to the resulting array. If Callable (e.g. numpy.min) - padding_values(x) will be used
- Returns
padded – padded array
- Return type
np.ndarray
Examples
>>> padded = pad(x, 2)  # pad 2 zeros on each side of each axis
>>> padded = pad(x, [1, 1], axis=(-1, -2))  # pad 1 zero on each side of last 2 axes
- dpipe.im.shape_ops.pad_to_shape(x: ndarray, shape: Union[int, Sequence[int]], axis: Optional[Union[int, Sequence[int]]] = None, padding_values: Union[float, Sequence[float], Callable] = 0, ratio: Union[float, Sequence[float]] = 0.5) ndarray [source]¶
Pad x to match shape along the axis.
- Parameters
x (np.ndarray) – n-dimensional array to pad
shape (AxesLike) – final shape
axis (AxesLike) – axis along which x will be padded
padding_values (Union[AxesParams, Callable]) – values to pad with, must be broadcastable to the resulting array. If Callable (e.g. numpy.min) - padding_values(x) will be used
ratio (AxesParams) – float or sequence of floats describing what proportion of padding to apply on the left sides of padding axes. Remaining ratio of padding will be applied on the right sides
- Returns
padded – padded array
- Return type
np.ndarray
Examples
>>> padded = pad_to_shape(x, [4, 5, 6])  # pad 3d array
>>> padded = pad_to_shape(x, [4, 5], axis=[0, 1], ratio=0)  # pad first 2 axes on the right
- dpipe.im.shape_ops.pad_to_divisible(x: ndarray, divisor: Union[int, Sequence[int]], axis: Optional[Union[int, Sequence[int]]] = None, padding_values: Union[float, Sequence[float], Callable] = 0, ratio: Union[float, Sequence[float]] = 0.5, remainder: Union[int, Sequence[int]] = 0) ndarray [source]¶
Pad x to be divisible by divisor along the axis.
- Parameters
x (np.ndarray) – n-dimensional array to pad
divisor (AxesLike) – int or sequence of ints by which the shape of the padded array will be divisible
axis (AxesLike) – axis along which the array will be padded. If None - the last len(divisor) axes are used
padding_values (Union[AxesParams, Callable]) – values to pad with. If Callable (e.g. numpy.min) - padding_values(x) will be used
ratio (AxesParams) – float or sequence of floats describing what proportion of padding to apply on the left sides of padding axes. Remaining ratio of padding will be applied on the right sides
remainder (AxesLike) – x will be padded such that its shape gives the remainder remainder when divided by divisor
- Returns
padded – padded array
- Return type
np.ndarray
Examples
>>> x  # array of shape [2, 3, 4]
>>> padded = pad_to_divisible(x, 6)  # pad to shape [6, 6, 6]
>>> padded = pad_to_divisible(x, [4, 3], axis=[0, 1], ratio=1)  # pad first 2 axes on the left, shape - [4, 3, 4]
>>> padded = pad_to_divisible(x, 3, remainder=1)  # pad to shape [4, 4, 4]
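The target shapes in the examples above follow from simple modular arithmetic. The helper below is a hypothetical sketch that only computes the padded length of one axis; it does not pad anything and is not part of dpipe.

```python
def padded_size(size, divisor, remainder=0):
    """Smallest length >= size that gives `remainder` when divided by `divisor`."""
    # Python's % always returns a non-negative result for a positive divisor
    return size + (remainder - size) % divisor


shape = (2, 3, 4)
divisible_by_6 = tuple(padded_size(s, 6) for s in shape)
remainder_1_mod_3 = tuple(padded_size(s, 3, remainder=1) for s in shape)
```

With these inputs the helper reproduces the shapes [6, 6, 6] and [4, 4, 4] quoted in the examples.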
Data augmentation¶
Metrics¶
- dpipe.im.metrics.cross_entropy_with_logits(target: ~numpy.ndarray, logits: ~numpy.ndarray, axis: int = 1, reduce: ~typing.Optional[~typing.Callable] = <function mean>)[source]¶
A numerically stable cross entropy for numpy arrays. target and logits must have the same shape except for axis.
- Parameters
target – integer array of shape (d1, …, di, dj, …, dn)
logits – array of shape (d1, …, di, k, dj, …, dn)
axis – the axis containing the logits for each class: logits.shape[axis] == k
reduce – the reduction operation to be applied to the final loss. If None - no reduction will be performed.
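One common way to obtain numerical stability is to subtract the per-position maximum before exponentiating (the log-sum-exp trick). The sketch below illustrates that technique with NumPy; whether dpipe uses exactly this formulation is an assumption, and cross_entropy_sketch is a hypothetical name.

```python
import numpy as np


def cross_entropy_sketch(target, logits, axis=1):
    """Hypothetical sketch: mean cross entropy computed via a stable log-softmax."""
    target = np.asarray(target)
    logits = np.asarray(logits, dtype=float)
    # subtracting the maximum keeps exp() from overflowing and does not change the result
    shifted = logits - logits.max(axis=axis, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))
    # pick the log-probability of the target class at every position
    picked = np.take_along_axis(log_probs, np.expand_dims(target, axis), axis=axis)
    return -picked.mean()


logits = np.array([[1000.0, 0.0], [0.0, 0.0]])  # a naive softmax would overflow on 1000
target = np.array([0, 1])
loss = cross_entropy_sketch(target, logits)
```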
- dpipe.im.metrics.convert_to_aggregated(metrics: ~typing.Dict[str, ~typing.Callable], aggregate_fn: ~typing.Callable = <function mean>, key_prefix: str = '', key_suffix: str = '', *args, **kwargs)[source]¶
- dpipe.im.metrics.to_aggregated(metric: ~typing.Callable, aggregate: ~typing.Callable = <function mean>, *args, **kwargs)[source]¶
Converts a metric that receives two values to a metric that receives two sequences and returns an aggregated value.
args and kwargs are passed as additional arguments to aggregate.
Examples
>>> mean_dice = to_aggregated(dice_score)
>>> worst_dice = to_aggregated(dice_score, aggregate=np.min)
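The wrapping pattern can be sketched with the standard library alone. The names to_aggregated_sketch and abs_error are hypothetical and serve only to illustrate the idea.

```python
from statistics import mean


def to_aggregated_sketch(metric, aggregate=mean, *args, **kwargs):
    """Hypothetical sketch: lift a pairwise metric to a metric over two sequences."""
    def wrapper(xs, ys):
        # apply the metric pair by pair, then reduce the results to a single value
        return aggregate([metric(x, y) for x, y in zip(xs, ys)], *args, **kwargs)
    return wrapper


def abs_error(x, y):
    return abs(x - y)


mean_error = to_aggregated_sketch(abs_error)
worst_error = to_aggregated_sketch(abs_error, aggregate=max)
```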
Box¶
Functions to work with boxes: immutable numpy arrays of shape (2, n) which represent the coordinates of the upper left and lower right corners of an n-dimensional rectangle.
In slicing operations, as everywhere in Python, the left corner is inclusive, and the right one is non-inclusive.
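As an illustration of this convention, the NumPy snippet below builds a box by hand and uses it to slice an array. It does not call any dpipe function; the variable names are arbitrary.

```python
import numpy as np

x = np.arange(20).reshape(4, 5)

# a 2-dimensional box: the first row is the start corner, the second row is the stop corner
box = np.array([[1, 2], [3, 5]])
box.setflags(write=False)  # boxes are described as immutable arrays

start, stop = box
# the start is inclusive and the stop is exclusive, exactly as in ordinary slicing
patch = x[tuple(slice(int(a), int(b)) for a, b in zip(start, stop))]
```

Here patch is the same as x[1:3, 2:5].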
- dpipe.im.box.make_box_(iterable) ndarray [source]¶
Returns a box, generated inplace from the iterable. If iterable was a numpy array, will make it immutable and return.
- dpipe.im.box.returns_box(func: Callable) Callable [source]¶
Returns function, decorated so that it returns a box.
- dpipe.im.box.get_containing_box(shape: tuple) ndarray [source]¶
Returns box that contains complete array of shape shape.
- dpipe.im.box.broadcast_box(box: ndarray, shape: tuple, dims: tuple) ndarray [source]¶
Returns box, such that it contains box across dims and whole array with shape shape across other dimensions.
- dpipe.im.box.limit_box(box, limit) ndarray [source]¶
Returns a box, maximum subset of the input box so that start would be non-negative and stop would be limited by the limit.
- dpipe.im.box.get_box_padding(box: ndarray, limit)[source]¶
Returns padding that is necessary to get box from array of shape limit. Returns padding in numpy form, so it can be given to numpy.pad.
- dpipe.im.box.add_margin(box: ndarray, margin) ndarray [source]¶
Returns a box with size increased by the margin (which must be broadcastable to the box) compared to the input box.
Grid splitters¶
Functions for working with patches from tensors. See the Working with patches tutorial for more details.
- dpipe.im.grid.get_boxes(shape: Union[int, Sequence[int]], box_size: Union[int, Sequence[int]], stride: Union[int, Sequence[int]], axis: Optional[Union[int, Sequence[int]]] = None, valid: bool = True) Iterable[ndarray] [source]¶
Yield boxes appropriate for a tensor of shape shape in a convolution-like fashion.
- Parameters
shape – the input tensor’s shape.
box_size –
axis – axes along which the slices will be taken.
stride – the stride (step-size) of the slice.
valid – whether boxes of size smaller than box_size should be left out.
References
See the Working with patches tutorial for more details.
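The "convolution-like" layout is easiest to see in one dimension. The function below is a hypothetical sketch of that layout in plain Python; the exact treatment of the last, incomplete box when valid is False is an assumption and may differ from dpipe.

```python
def boxes_1d(length, box_size, stride, valid=True):
    """Hypothetical sketch: (start, stop) pairs of 1D boxes placed every `stride` steps."""
    boxes = []
    for start in range(0, length, stride):
        stop = min(start + box_size, length)
        if valid and stop - start < box_size:
            break  # leave out boxes that are smaller than box_size
        boxes.append((start, stop))
        if stop == length:
            break  # the end of the tensor is reached
    return boxes


full_only = boxes_1d(11, box_size=4, stride=3, valid=True)
with_tail = boxes_1d(11, box_size=4, stride=3, valid=False)
```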
- dpipe.im.grid.divide(x: ~numpy.ndarray, patch_size: ~typing.Union[int, ~typing.Sequence[int]], stride: ~typing.Union[int, ~typing.Sequence[int]], axis: ~typing.Optional[~typing.Union[int, ~typing.Sequence[int]]] = None, valid: bool = False, get_boxes: ~typing.Callable = <function get_boxes>) Iterable[ndarray] [source]¶
A convolution-like approach to generating patches from a tensor.
- Parameters
x –
patch_size –
axis – dimensions along which the slices will be taken.
stride – the stride (step-size) of the slice.
valid – whether patches of size smaller than patch_size should be left out.
get_boxes – function that yields boxes, for signature see get_boxes
References
See the Working with patches tutorial for more details.
- dpipe.im.grid.combine(patches: ~typing.Iterable[~numpy.ndarray], output_shape: ~typing.Union[int, ~typing.Sequence[int]], stride: ~typing.Union[int, ~typing.Sequence[int]], axis: ~typing.Optional[~typing.Union[int, ~typing.Sequence[int]]] = None, valid: bool = False, combiner: ~typing.Type[~dpipe.im.grid.PatchCombiner] = <class 'dpipe.im.grid.Average'>, get_boxes: ~typing.Callable = <function get_boxes>) ndarray [source]¶
Build a tensor of shape output_shape from patches obtained in a convolution-like approach with corresponding parameters. The overlapping parts are aggregated using the strategy from combiner - Average by default.
References
See the Working with patches tutorial for more details.
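Averaging of overlapping parts can be sketched in one dimension with NumPy. This only illustrates the aggregation idea; combine_average_1d and its arguments are hypothetical and are not the PatchCombiner interface.

```python
import numpy as np


def combine_average_1d(patches, starts, length):
    """Hypothetical sketch: place 1D patches at `starts` and average where they overlap."""
    total = np.zeros(length)
    counts = np.zeros(length)
    for patch, start in zip(patches, starts):
        total[start:start + len(patch)] += patch
        counts[start:start + len(patch)] += 1
    # assumes that every position is covered by at least one patch
    return total / counts


patches = [np.array([1.0, 1.0, 1.0]), np.array([3.0, 3.0, 3.0])]
combined = combine_average_1d(patches, starts=[0, 2], length=5)
```

The two patches overlap at position 2, where the result is the mean of 1 and 3.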
Patch¶
Tools for patch extraction and generation.
- dpipe.im.patch.sample_box_center_uniformly(shape, box_size: array, random_state: Optional[RandomState] = None)[source]¶
Returns the center of a uniformly sampled box of size box_size, contained in the array of shape shape.
- dpipe.im.patch.get_random_patch(*arrays: ~numpy.ndarray, patch_size: ~typing.Union[int, ~typing.Sequence[int]], axis: ~typing.Optional[~typing.Union[int, ~typing.Sequence[int]]] = None, distribution: ~typing.Callable = <function uniform>)[source]¶
Get a random patch of size patch_size along the axes for each of the arrays. The patch position is equal for all the arrays.
- Parameters
arrays –
patch_size –
axis –
distribution (Callable(shape)) – function that samples a random number in the range [0, n) for each axis. Defaults to a uniform distribution.
- dpipe.im.patch.get_random_box(shape: ~typing.Union[int, ~typing.Sequence[int]], box_shape: ~typing.Union[int, ~typing.Sequence[int]], axis: ~typing.Union[int, ~typing.Sequence[int]] = None, distribution: ~typing.Callable = <function uniform>) ndarray [source]¶
Get a random box of shape box_shape that fits in the shape along the given axes.
Distributions¶
Module for calculation of various statistics given a discrete or piecewise-linear distribution.
- dpipe.im.dist.weighted_sum(weights: Union[ndarray, torch.Tensor], axis: Union[int, Sequence[int]], values_range: Callable) Union[ndarray, torch.Tensor] [source]¶
Calculates a weighted sum of values returned by values_range with the corresponding weights along a given axis.
- Parameters
weights –
axis –
values_range – takes n as input and returns an array of n values where n = weights.shape[axis].
- dpipe.im.dist.expectation(distribution: ~typing.Union[~numpy.ndarray, torch.Tensor], axis: int, integral: ~typing.Callable = <function polynomial>, *args, **kwargs) Union[ndarray, torch.Tensor] [source]¶
Calculates the expectation of a function h given its integral and a distribution.
args and kwargs are passed to integral as additional arguments.
- Parameters
distribution – the distribution by which the expectation will be calculated. Must sum to 1 along the axis.
axis – the axis along which the expectation is calculated.
integral – the definite integral of the function h. See polynomial for an example.
Notes
This function calculates the expectation by a piecewise-linear distribution in the range \([0, N]\) where N = distribution.shape[axis]:
\[\mathbb{E}_F[h] = \int\limits_0^N h(x) dF(x) = \sum\limits_{i=0}^{N-1} \int\limits_i^{i+1} h(x) dF(x) = \sum\limits_{i=0}^{N-1} distribution_i \int\limits_i^{i+1} h(x) dx = \sum\limits_{i=0}^{N-1} distribution_i \cdot (H(i+1) - H(i)),\]
where \(distribution_i\) are taken along axis, \(H(i) = \int\limits_0^{i} h(x) dx\) are returned by integral.
- dpipe.im.dist.marginal_expectation(distribution: ~typing.Union[~numpy.ndarray, torch.Tensor], axis: ~typing.Union[int, ~typing.Sequence[int]], integrals: ~typing.Union[~typing.Callable, ~typing.Sequence[~typing.Callable]] = <function polynomial>, *args, **kwargs) list [source]¶
Computes expectations along the axis according to integrals independently.
args and kwargs are passed to integral as additional arguments.
Slicing¶
Images visualization¶
- dpipe.im.visualize.slice3d(*data: ndarray, axis: int = -1, scale: int = 5, max_columns: Optional[int] = None, colorbar: bool = False, show_axes: bool = False, cmap: Union[Colormap, str] = 'gray', vlim: Optional[Union[float, Sequence[float]]] = None, titles: Optional[Sequence[Optional[str]]] = None)[source]¶
Creates an interactive plot, simultaneously showing slices along a given axis for all the passed images.
- Parameters
data –
axis –
scale – the figure scale.
max_columns – the maximal number of figures in a row. If None - all figures will be in the same row.
colorbar – Whether to display a colorbar.
show_axes – Whether to display the grid on the image.
cmap –
vlim – used to normalize luminance data. If None - the limits are determined automatically. Must be broadcastable to (len(data), 2). See matplotlib.pyplot.imshow (vmin and vmax) for details.
- dpipe.im.visualize.animate3d(*data: ndarray, output_path: Union[Path, str], axis: int = -1, scale: int = 5, max_columns: Optional[int] = None, colorbar: bool = False, show_axes: bool = False, cmap: str = 'gray', vlim=(None, None), fps: int = 30, writer: str = 'imagemagick', repeat: bool = True)[source]¶
Saves an animation to output_path, simultaneously showing slices along a given axis for all the passed images.
- Parameters
data (np.ndarray) –
output_path (str) –
axis (int) –
scale (int) – the figure scale.
max_columns (int) – the maximal number of figures in a row. If None - all figures will be in the same row.
colorbar (bool) – Whether to display a colorbar. Works only if the vlim values are not None.
show_axes (bool) – Whether to display the grid on the image.
cmap – parameters passed to matplotlib.pyplot.imshow
vlim – parameters passed to matplotlib.pyplot.imshow
fps (int) –
writer (str) –
repeat (bool) – whether the animation should repeat when the sequence of frames is completed.
Color space conversion¶
- dpipe.im.hsv.rgb_from_hsv_data(hue, saturation, value)[source]¶
Creates image in RGB format from HSV data.
- dpipe.im.hsv.gray_image_colored_mask(gray_image, mask, hue)[source]¶
Creates gray image with colored mask. Keeps intensities intact, so dark areas on gray image will be hard to see even after colorization.
Various utils¶
- dpipe.im.utils.apply_along_axes(func: Callable, x: ndarray, axis: Union[int, Sequence[int]], *args, **kwargs)[source]¶
Apply func to slices from x taken along axes. args and kwargs are passed as additional arguments.
Notes
func must return an array of the same shape as it received.
- dpipe.im.utils.build_slices(start: Sequence[int], stop: Optional[Sequence[int]] = None) Tuple[slice, ...] [source]¶
Returns a tuple of slices built from start and stop.
Examples
>>> build_slices([1, 2, 3], [4, 5, 6])
(slice(1, 4), slice(2, 5), slice(3, 6))
>>> build_slices([10, 11])
(slice(10), slice(11))
- dpipe.im.utils.composition(func: Callable, *args, **kwargs)[source]¶
Applies func to the output of the decorated function. args and kwargs are passed as additional positional and keyword arguments respectively.
- dpipe.im.utils.get_mask_volume(mask: ndarray, *spacing: Union[float, Sequence[float]], location: bool = False) float [source]¶
Calculates the mask volume given its spatial spacing.
- Parameters
mask –
spacing – each value represents the spacing for the corresponding axis. If float - the values are uniformly spaced along this axis. If Sequence[float] - the values are non-uniformly spaced.
location – whether to interpret the Sequence[float] in spacing as values’ locations or spacings. If True - the deltas are used as spacings.
Shape utils¶
- dpipe.im.shape_utils.extract_dims(array, ndim=1)[source]¶
Decrease the dimensionality of array by extracting ndim leading singleton dimensions.
- dpipe.im.shape_utils.prepend_dims(array, ndim=1)[source]¶
Increase the dimensionality of array by adding ndim leading singleton dimensions.
- dpipe.im.shape_utils.append_dims(array, ndim=1)[source]¶
Increase the dimensionality of array by adding ndim singleton dimensions to the end of its shape.
- dpipe.im.shape_utils.insert_dims(array, index=0, ndim=1)[source]¶
Increase the dimensionality of array by adding ndim singleton dimensions before the specified index of its shape.
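For reference, the same shape manipulations can be written with plain NumPy indexing. The snippet shows equivalent NumPy expressions under the descriptions above; it does not call the dpipe helpers.

```python
import numpy as np

x = np.zeros((3, 4))

prepended = x[(None,) * 2]             # two leading singleton dimensions
appended = x[(...,) + (None,) * 2]     # two trailing singleton dimensions
inserted = np.expand_dims(x, (1, 2))   # two singleton dimensions before index 1
extracted = prepended[(0,) * 2]        # drop the two leading singleton dimensions
```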
- dpipe.im.shape_utils.shape_after_convolution(shape: Union[int, Sequence[int]], kernel_size: Union[int, Sequence[int]], stride: Union[int, Sequence[int]] = 1, padding: Union[int, Sequence[int]] = 0, dilation: Union[int, Sequence[int]] = 1, valid: bool = True) tuple [source]¶
Get the shape of a tensor after applying a convolution with corresponding parameters.
- dpipe.im.shape_utils.shape_after_full_convolution(shape: Union[int, Sequence[int]], kernel_size: Union[int, Sequence[int]], axis: Optional[Union[int, Sequence[int]]] = None, stride: Union[int, Sequence[int]] = 1, padding: Union[int, Sequence[int]] = 0, dilation: Union[int, Sequence[int]] = 1, valid: bool = True) tuple [source]¶
Get the shape of a tensor after applying a convolution with corresponding parameters along the given axes. The dimensions along the remaining axes will become singleton.
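For a single axis, the usual output-size formula for a convolution without partial windows is sketched below. It is the standard textbook formula, offered as an assumption about what these functions compute; how valid=False changes the rounding is not covered, and conv_output_size is a hypothetical name.

```python
def conv_output_size(size, kernel_size, stride=1, padding=0, dilation=1):
    """Output length along one axis for a convolution that uses only full windows."""
    # a dilated kernel spans dilation * (kernel_size - 1) + 1 input positions
    effective_kernel = dilation * (kernel_size - 1) + 1
    return (size + 2 * padding - effective_kernel) // stride + 1


plain = conv_output_size(32, kernel_size=3)
strided = conv_output_size(32, kernel_size=3, stride=2, padding=1)
dilated = conv_output_size(32, kernel_size=3, dilation=2)
```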
Input/Output¶
Input/Output operations.
All the loading functions have the interface load(path, **kwargs) where kwargs are loader-specific keyword arguments.
Similarly, all the saving functions have the interface save(value, path, **kwargs).
- class dpipe.io.ConsoleArguments[source]¶
Bases: object
A class that simplifies access to console arguments.
- dpipe.io.load_or_create(path: ~typing.Union[~pathlib.Path, str], create: ~typing.Callable, *args, save: ~typing.Callable = <function save>, load: ~typing.Callable = <function load>, **kwargs)[source]¶
Load a file from path if it exists. Otherwise create the value, save it to path, and return it.
args and kwargs are passed to create as additional arguments.
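The load-or-create pattern is a simple form of on-disk caching. The sketch below reproduces it with the standard library and JSON files only; load_or_create_sketch and expensive are hypothetical names, and the real function accepts custom save and load callables instead.

```python
import json
import os
import tempfile


def load_or_create_sketch(path, create, *args, **kwargs):
    """Hypothetical sketch: return the cached value from `path` or create and cache it."""
    if os.path.exists(path):
        with open(path) as file:
            return json.load(file)
    value = create(*args, **kwargs)
    with open(path, "w") as file:
        json.dump(value, file)
    return value


calls = []


def expensive(n):
    calls.append(n)  # record every real computation
    return [i * i for i in range(n)]


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "squares.json")
    first = load_or_create_sketch(path, expensive, 4)   # computed and saved
    second = load_or_create_sketch(path, expensive, 4)  # loaded from the file
```

The second call does not invoke expensive again, because the file already exists.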
- dpipe.io.choose_existing(*paths: Union[Path, str]) Path [source]¶
Returns the first existing path from a list of paths.
- dpipe.io.load(path: Union[Path, str], ext: Optional[str] = None, **kwargs)[source]¶
Load a file located at path. kwargs are format-specific keyword arguments.
The following extensions are supported:
npy, tif, png, jpg, bmp, hdr, img, csv, dcm, nii, nii.gz, json, mhd, txt, pickle, pkl, config
- dpipe.io.save(value, path: Union[Path, str], **kwargs)[source]¶
Save value to a file located at path. kwargs are format-specific keyword arguments.
The following extensions are supported:
npy, npy.gz, tif, png, jpg, bmp, hdr, img, csv, nii, nii.gz, json, mhd, txt, pickle, pkl
- dpipe.io.save_json(value, path: Union[Path, str], *, indent: Optional[int] = None)[source]¶
Dump a json-serializable object to a json file.
- dpipe.io.load_numpy(path: Union[Path, str], *, allow_pickle: bool = True, fix_imports: bool = True, decompress: bool = False)[source]¶
A wrapper around np.load with allow_pickle set to True by default.
- dpipe.io.save_numpy(value, path: Union[Path, str], *, allow_pickle: bool = True, fix_imports: bool = True, compression: Optional[int] = None, timestamp: Optional[int] = None)[source]¶
A wrapper around np.save that matches the interface save(what, where).
Training¶
Checkpoints¶
- class dpipe.train.checkpoint.Checkpoints(base_path: Union[Path, str], objects: Iterable, frequency: Optional[int] = None)[source]¶
Bases: object
Saves the most recent iteration to base_path and removes the previous one.
- Parameters
base_path (str) – path to save/restore checkpoint object in/from.
objects (Dict[PathLike, Any]) – objects to save. Each key-value pair represents the path relative to base_path and the corresponding object.
frequency (int) – the frequency with which the objects are stored. By default only the latest checkpoint is saved.
- dpipe.train.checkpoint.CheckpointManager¶
alias of Checkpoints
Policies¶
- class dpipe.train.policy.Policy[source]¶
Bases: object
Interface for various policies.
- epoch_started(epoch: int)[source]¶
Update the policy before an epoch starts. Epoch numbering starts at zero.
- train_step_started(epoch: int, iteration: int)[source]¶
Update the policy before a new train step. iteration denotes the iteration index inside the current epoch. Epoch and iteration numbering starts at zero.
- train_step_finished(epoch: int, iteration: int, loss: Any)[source]¶
Update the policy after a train step. iteration denotes the iteration index inside the current epoch. loss is the value returned by the last train step. Epoch and iteration numbering starts at zero.
- validation_started(epoch: int, train_losses: Sequence)[source]¶
Update the policy after the batch iterator was depleted. Epoch numbering starts at zero.
The history of train_losses from the entire epoch is provided as additional information.
- epoch_finished(epoch: int, train_losses: Sequence, metrics: Optional[dict] = None, policies: Optional[dict] = None)[source]¶
Update the policy after an epoch is finished. Epoch numbering starts at zero.
The history of train_losses, metrics and policies from the entire epoch is provided as additional information.
- class dpipe.train.policy.ValuePolicy(initial)[source]¶
Bases: Policy
Interface for policies that have a value which changes over time.
- value¶
The current value carried by the policy.
- dpipe.train.policy.Constant¶
alias of ValuePolicy
- class dpipe.train.policy.DecreasingOnPlateau(*, initial: float, multiplier: float, patience: int, rtol, atol)[source]¶
Bases:
ValuePolicy
Policy that tracks the average train loss and, if it has not decreased according to atol or rtol for patience epochs, multiplies value by multiplier.
atol – absolute tolerance for detecting a change in the training loss value.
rtol – relative tolerance for detecting a change in the training loss value.
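The plateau rule described above can be illustrated with a minimal sketch. The class name PlateauSketch, the improvement criterion (the loss must drop by more than atol + rtol * |best|) and the reset of the counter after each decrease are assumptions made for illustration; they are not taken from the dpipe source.

```python
class PlateauSketch:
    """Hypothetical stand-in for a decrease-on-plateau policy (not the dpipe class)."""

    def __init__(self, initial, multiplier, patience, rtol, atol):
        self.value = initial
        self.multiplier = multiplier
        self.patience = patience
        self.rtol, self.atol = rtol, atol
        self.best = None        # best average loss seen so far
        self.stale_epochs = 0   # epochs without a sufficient decrease

    def epoch_finished(self, epoch, train_losses):
        loss = sum(train_losses) / len(train_losses)
        # assumed criterion: the loss must beat the best one by a tolerance margin
        margin = 0.0 if self.best is None else self.atol + self.rtol * abs(self.best)
        if self.best is None or loss < self.best - margin:
            self.best = loss
            self.stale_epochs = 0
            return
        self.stale_epochs += 1
        if self.stale_epochs >= self.patience:
            self.value *= self.multiplier
            self.stale_epochs = 0


policy = PlateauSketch(initial=1.0, multiplier=0.5, patience=2, rtol=0.0, atol=0.01)
for epoch, losses in enumerate([[1.0], [1.0], [1.0]]):
    policy.epoch_finished(epoch, losses)
```

In this run the loss stays flat for two epochs after the first one, so the value is halved once.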
- class dpipe.train.policy.Exponential(initial: float, multiplier: float, step_length: int = 1, floordiv: bool = True, min_value: float = -inf, max_value: float = inf)[source]¶
Bases:
ValuePolicy
Exponentially change the value by a factor of multiplier every step_length epochs. If floordiv is False, the value will be changed continuously.
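One plausible reading of this rule is sketched below. The function exponential_value is hypothetical, and the assumptions are that the exponent is epoch / step_length (rounded down when floordiv is True) and that the result is clipped to [min_value, max_value].

```python
def exponential_value(epoch, initial, multiplier, step_length=1, floordiv=True,
                      min_value=float("-inf"), max_value=float("inf")):
    """Value after `epoch` epochs under an exponential schedule (illustrative sketch)."""
    # floordiv=True gives a staircase, floordiv=False a smooth curve
    power = epoch // step_length if floordiv else epoch / step_length
    value = initial * multiplier ** power
    # assumption: the bounds simply clip the result
    return min(max(value, min_value), max_value)


staircase = [exponential_value(e, 1.0, 0.5, step_length=2) for e in range(4)]
smooth = exponential_value(1, 1.0, 0.5, step_length=2, floordiv=False)
```

With step_length=2 the staircase variant keeps the value constant for two epochs at a time, while the smooth variant already decays after the first epoch.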
- class dpipe.train.policy.Schedule(initial: float, epoch2value_multiplier: Dict[int, float])[source]¶
Bases:
ValuePolicy
Multiply value by multipliers given by epoch2value_multiplier at corresponding epochs.
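As a rough illustration, the helper below (a hypothetical schedule_values, not part of dpipe) replays such a schedule. It assumes that each multiplier is applied once, at the start of its epoch.

```python
def schedule_values(initial, epoch2value_multiplier, n_epochs):
    """Return the value used in each epoch of a multiplicative schedule (sketch)."""
    value, history = initial, []
    for epoch in range(n_epochs):
        # epochs without an entry leave the value unchanged
        value *= epoch2value_multiplier.get(epoch, 1)
        history.append(value)
    return history


values = schedule_values(1.0, {2: 0.5, 4: 0.5}, n_epochs=5)
```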
- class dpipe.train.policy.Switch(initial: float, epoch_to_value: Dict[int, Any])[source]¶
Bases:
ValuePolicy
Changes the value at specific epochs to the values given in epoch_to_value.
- class dpipe.train.policy.LambdaEpoch(func: Callable, *args, **kwargs)[source]¶
Bases:
ValuePolicy
Use the passed function to calculate the value for the current epoch (starting with 0).
Logging¶
Validation¶
Batch iterators¶
Tools for creating batch iterators. See the Batch iterators tutorial for more details.
Pipeline¶
- class dpipe.batch_iter.pipeline.Infinite(source: ~typing.Iterable, *transformers: ~typing.Union[~typing.Callable, ~dpipe.batch_iter.pipeline.Transform], batch_size: ~typing.Union[int, ~typing.Callable], batches_per_epoch: int, buffer_size: int = 1, combiner: ~typing.Callable = <function combine_to_arrays>, **kwargs)[source]¶
Bases:
object
Combine source and transformers into a batch iterator that yields batches of size batch_size.
- Parameters
source (Iterable) – an infinite iterable.
transformers (Callable) – the callable that transforms the objects generated by the previous element of the pipeline.
batch_size (int, Callable) – the size of batch.
batches_per_epoch (int) – the number of batches to yield each epoch.
buffer_size (int) – the number of objects to keep buffered in each pipeline element. Default is 1.
combiner (Callable) – combines chunks of single batches in multiple batches, e.g. combiner([(x, y), (x, y)]) -> ([x, x], [y, y]). Default is combine_to_arrays.
kwargs – additional keyword arguments passed to the combiner.
References
See the Batch iterators tutorial for more details.
- property closing_callback¶
A callback to make this interface compatible with Lightning, which allows for a safe release of resources.
Examples
>>> batch_iter = Infinite(...)
>>> trainer = Trainer(callbacks=[batch_iter.closing_callback, ...])
- class dpipe.batch_iter.pipeline.Threads(func: Callable, *args, n_workers: int = 1, buffer_size: int = 1, **kwargs)[source]¶
Bases:
Iterator
Apply func concurrently to each object in the batch iterator by moving it to n_workers threads.
- Parameters
func (Callable) – the function to apply to each object.
n_workers (int) – the number of threads to which func will be moved.
buffer_size (int) – the number of objects to keep buffered.
args – additional positional arguments passed to func.
kwargs – additional keyword arguments passed to func.
References
See the Batch iterators tutorial for more details.
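The idea of applying a function in worker threads while keeping only a bounded number of objects in flight can be sketched with the standard library. The helper threaded_map_sketch is hypothetical and does not reproduce the dpipe implementation. It assumes that results are yielded in input order and that at most n_workers + buffer_size items are pending at any time.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def threaded_map_sketch(func, iterable, n_workers=1, buffer_size=1):
    """Lazily apply func to the items of iterable in worker threads (sketch)."""
    iterator = iter(iterable)
    with ThreadPoolExecutor(max_workers=n_workers) as executor:
        # pre-fill a bounded queue of futures instead of submitting everything at once
        pending = deque(
            executor.submit(func, item)
            for item in islice(iterator, n_workers + buffer_size)
        )
        while pending:
            result = pending.popleft().result()
            # refill the queue with at most one new item per consumed result
            for item in islice(iterator, 1):
                pending.append(executor.submit(func, item))
            yield result


squares = list(threaded_map_sketch(lambda x: x * x, range(5), n_workers=2, buffer_size=2))
```

Bounding the queue matters because the source of a batch iterator is infinite: submitting every item eagerly would never finish.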
- class dpipe.batch_iter.pipeline.Loky(func: Callable, *args, n_workers: int = 1, buffer_size: int = 1, **kwargs)[source]¶
Bases:
Transform
Apply func concurrently to each object in the batch iterator by moving it to n_workers processes.
- Parameters
func (Callable) – the function to apply to each object.
n_workers (int) – the number of processes to which func will be moved.
buffer_size (int) – the number of objects to keep buffered.
args – additional positional arguments passed to func.
kwargs – additional keyword arguments passed to func.
Notes
Process-based parallelism is implemented with the loky backend.
References
See the Batch iterators tutorial for more details.
- class dpipe.batch_iter.pipeline.Iterator(transform: Callable, *args, n_workers: int = 1, buffer_size: int = 1, **kwargs)[source]¶
Bases:
Transform
Apply transform to the iterator of values that flow through the batch iterator.
- Parameters
transform (Callable(Iterable) -> Iterable) – a function that takes an iterable and yields transformed values.
n_workers (int) – the number of threads to which transform will be moved.
buffer_size (int) – the number of objects to keep buffered.
args – additional positional arguments passed to transform.
kwargs – additional keyword arguments passed to transform.
References
See the Batch iterators tutorial for more details.
- dpipe.batch_iter.pipeline.combine_batches(inputs)[source]¶
Combines tuples from inputs into batches: [(x, y), (x, y)] -> [(x, x), (y, y)]
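The transposition shown in the arrow notation above can be reproduced with the built-in zip. The function name combine_batches_sketch is hypothetical, and returning a list rather than a tuple is an assumption based on the notation in the description.

```python
def combine_batches_sketch(inputs):
    """Transpose a list of (x, y, ...) tuples into per-field batches (sketch)."""
    return list(zip(*inputs))


images_and_labels = [("img0", 0), ("img1", 1), ("img2", 0)]
batches = combine_batches_sketch(images_and_labels)
```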
- dpipe.batch_iter.pipeline.combine_to_arrays(inputs)[source]¶
Combines tuples from inputs into batches of numpy arrays.
- dpipe.batch_iter.pipeline.combine_pad(inputs, padding_values: Union[float, Sequence[float]] = 0, ratio: Union[float, Sequence[float]] = 0.5)[source]¶
Combines tuples from inputs into batches and pads each batch in order to obtain a correctly shaped numpy array.
- Parameters
inputs –
padding_values – values to pad with. If Callable (e.g. numpy.min) - padding_values(x) will be used.
ratio – the fraction of the padding that will be applied to the left, 1.0 - ratio will be applied to the right. By default 0.5, i.e. the padding is applied uniformly to the left and right.
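The role of ratio can be shown on one-dimensional sequences. The helpers pad_to_length and pad_batch below are hypothetical and work on plain lists instead of numpy arrays. Rounding the left share down is an assumption; the library may round differently.

```python
def pad_to_length(sequence, length, padding_value=0, ratio=0.5):
    """Pad a 1D sequence to `length`, putting a `ratio` share of the padding on the left."""
    total = length - len(sequence)
    left = int(total * ratio)  # assumption: the left share is rounded down
    right = total - left
    return [padding_value] * left + list(sequence) + [padding_value] * right


def pad_batch(batch, padding_value=0, ratio=0.5):
    """Pad every sequence in the batch to the length of the longest one."""
    length = max(map(len, batch))
    return [pad_to_length(x, length, padding_value, ratio) for x in batch]


centered = pad_batch([[1, 2, 3, 4], [5, 6]])
right_only = pad_batch([[1, 2, 3, 4], [5, 6]], ratio=0)
```

Setting ratio to 0 moves all padding to the right, and setting it to 1 moves it all to the left.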
Sources¶
- dpipe.batch_iter.sources.sample(sequence: Sequence, weights: Optional[Sequence[float]] = None, random_state: Optional[Union[RandomState, int]] = None)[source]¶
Infinitely yield samples from sequence according to weights.
- Parameters
sequence (Sequence) – the sequence of elements to sample from.
weights (Sequence[float], None, optional) – the weights associated with each element. If None, the weights are assumed to be equal. Should be the same size as sequence.
random_state (int, np.random.RandomState, None, optional) – if not None, used to set the random seed for reproducibility reasons.
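A generator with this behaviour can be sketched with the standard random module. The name sample_sketch is hypothetical, and it uses random.Random instead of the numpy RandomState mentioned in the signature, so the sampled sequences will differ from the library's for the same seed.

```python
import random
from itertools import islice


def sample_sketch(sequence, weights=None, random_state=None):
    """Infinitely yield weighted random elements of `sequence` (sketch)."""
    rng = random.Random(random_state)  # a private generator keeps runs reproducible
    while True:
        yield rng.choices(sequence, weights=weights, k=1)[0]


# the generator never stops, so take a finite slice
first_run = list(islice(sample_sketch("abc", random_state=42), 10))
second_run = list(islice(sample_sketch("abc", random_state=42), 10))
only_b = list(islice(sample_sketch("ab", weights=[0, 1], random_state=0), 5))
```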
- dpipe.batch_iter.sources.load_by_random_id(*loaders: Callable, ids: Sequence, weights: Optional[Sequence[float]] = None, random_state: Optional[Union[RandomState, int]] = None)[source]¶
Infinitely yield objects loaded by loaders according to the identifier from ids. The identifiers are randomly sampled from ids according to the weights.
- Parameters
loaders (Callable) – function, which loads object by its id.
ids (Sequence) – the sequence of identifiers to sample from.
weights (Sequence[float], None, optional) – The weights associated with each id. If None, the weights are assumed to be equal. Should be the same size as ids.
random_state (int, np.random.RandomState, None, optional) – if not None, used to set the random seed for reproducibility reasons.
Blocks¶
- class dpipe.batch_iter.expiration_pool.ExpirationPool(pool_size: int, repetitions: int, iterations: int = 1)[source]¶
Bases:
Iterator
A simple expiration pool for time consuming operations that don’t fit into RAM. See expiration_pool for details.
Examples
>>> batch_iter = Infinite(
...     # ... some expensive operations, e.g. loading from disk, or preprocessing
...     ExpirationPool(pool_size, repetitions),
...     # ... here are the values from pool
...     # ... other lightweight operations
...     # ...
... )
- dpipe.batch_iter.expiration_pool.expiration_pool(iterable: Iterable, pool_size: int, repetitions: int, iterations: int = 1)[source]¶
Caches pool_size items from iterable. The item is removed from cache after it was generated repetitions times. After an item is removed, a new one is extracted from the iterable. Finally, iterations controls how many values are generated after a new value is added, thus speeding up the pipeline at early stages.
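The caching scheme described above can be sketched as a generator. The function expiration_pool_sketch is hypothetical: the uniform random choice among cached items, the seed argument and the final draining of the pool are assumptions made for illustration.

```python
import random
from collections import Counter


def expiration_pool_sketch(iterable, pool_size, repetitions, iterations=1, seed=None):
    """Yield cached items at random, dropping each after `repetitions` uses (sketch)."""
    rng = random.Random(seed)
    values, counts = {}, {}

    def draw():
        key = rng.choice(list(values))
        value = values[key]
        counts[key] += 1
        if counts[key] == repetitions:  # the item has expired
            del values[key], counts[key]
        return value

    for key, value in enumerate(iterable):
        values[key], counts[key] = value, 0
        # generate a few values right after a new item arrives
        for _ in range(iterations):
            if not values:
                break
            yield draw()
        # keep generating until there is room for a new item
        while len(values) >= pool_size:
            yield draw()

    # assumption: the remaining items are drained when the source ends
    while values:
        yield draw()


generated = list(expiration_pool_sketch(range(5), pool_size=2, repetitions=3, seed=0))
usage = Counter(generated)
```

Whatever the random order, every item is produced exactly repetitions times, so an expensive load is amortized over several batches.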
Utils¶
- dpipe.batch_iter.utils.pad_batch_equal(batch, padding_values: Union[float, Sequence[float]] = 0, ratio: Union[float, Sequence[float]] = 0.5)[source]¶
Pad each element of batch to obtain a correctly shaped array.
- dpipe.batch_iter.utils.unpack_args(func: Callable, *args, **kwargs)[source]¶
Returns a function that takes an iterable and unpacks it while calling func. args and kwargs are passed to func as additional arguments.
Examples
>>> def add(x, y):
...     return x + y
>>>
>>> add_ = unpack_args(add)
>>> add(1, 2) == add_([1, 2])
True
- dpipe.batch_iter.utils.multiply(func: Callable, *args, **kwargs)[source]¶
Returns a function that takes an iterable and maps func over it. Useful when multiple batches require the same function. args and kwargs are passed to func as additional arguments.
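A minimal sketch of such a wrapper is given below. The name multiply_sketch is hypothetical, and returning a tuple is an assumption; the library may return a different container.

```python
def multiply_sketch(func, *args, **kwargs):
    """Return a function that applies func to every element of an iterable (sketch)."""
    def wrapped(xs):
        return tuple(func(x, *args, **kwargs) for x in xs)

    return wrapped


# the same operation for every entry of a batch, e.g. (image, mask)
square_all = multiply_sketch(pow, 2)
squared = square_all([1, 2, 3])
```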
- dpipe.batch_iter.utils.apply_at(index: Union[int, Sequence[int]], func: Callable, *args, **kwargs)[source]¶
Returns a function that takes an iterable and applies func to the values at the corresponding index. args and kwargs are passed to func as additional arguments.
Examples
>>> first_sqr = apply_at(0, np.square)
>>> first_sqr([3, 2, 1])
(9, 2, 1)
- dpipe.batch_iter.utils.zip_apply(*functions: Callable, **kwargs)[source]¶
Returns a function that takes an iterable and zips functions over it. kwargs are passed to each function as additional arguments.
Examples
>>> zipper = zip_apply(np.square, np.sqrt)
>>> zipper([4, 9])
(16, 3)
- dpipe.batch_iter.utils.random_apply(p: float, func: Callable, *args, **kwargs)[source]¶
Returns a function that applies func with a given probability p. args and kwargs are passed to func as additional arguments.
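The behaviour can be sketched for a single input value. The name random_apply_sketch and the rng keyword are hypothetical, and returning the input unchanged when func is not applied is an assumption.

```python
import random


def random_apply_sketch(p, func, *args, rng=random, **kwargs):
    """Return a function that applies func with probability p (sketch)."""
    def wrapped(x):
        # rng.random() lies in [0, 1), so p=1 always applies and p=0 never does
        if rng.random() < p:
            return func(x, *args, **kwargs)
        return x

    return wrapped


always = random_apply_sketch(1.0, abs)
never = random_apply_sketch(0.0, abs)
```

This is a typical building block for augmentation, where a transform such as a flip should only affect a fraction of the samples.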
- dpipe.batch_iter.utils.sample_args(func: Callable, *args: Callable, **kwargs: Callable)[source]¶
Returns a function that samples arguments for func from args and kwargs.
Each argument in args and kwargs must be a callable that samples a random value.
Examples
>>> from scipy.ndimage import rotate
>>>
>>> random_rotate = sample_args(rotate, angle=np.random.normal)
>>> random_rotate(x)
>>> # same as
>>> rotate(x, angle=np.random.normal())
Prediction¶
Various functions for prediction with neural networks. See the Predict tutorial for more details.
Predictors¶
Ready-to-use predictors.
- dpipe.predict.shape.add_extract_dims(n_add: int = 1, n_extract: Optional[int] = None, sequence: bool = False)[source]¶
Adds n_add dimensions before a prediction and extracts n_extract dimensions after this prediction.
- Parameters
n_add (int) – number of dimensions to add.
n_extract (int, None, optional) – number of dimensions to extract. If None, extracts the same number of dimensions as were added (n_add).
sequence – if True - the output is expected to be a sequence, and the dims are extracted for each element of the sequence.
- dpipe.predict.shape.divisible_shape(divisor: Union[int, Sequence[int]], axis: Optional[Union[int, Sequence[int]]] = None, padding_values: Union[float, Sequence[float], Callable] = 0, ratio: Union[float, Sequence[float]] = 0.5)[source]¶
Pads an incoming array to be divisible by divisor along the axes. Afterwards the padding is removed.
- Parameters
divisor – a value an incoming array should be divisible by.
axis – axes along which the array will be padded. If None - the last len(divisor) axes are used.
padding_values – values to pad with. If Callable (e.g. numpy.min) - padding_values(x) will be used.
ratio – the fraction of the padding that will be applied to the left, 1 - ratio will be applied to the right.
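The amount of padding implied by divisor and ratio can be computed as in the sketch below. The helper divisible_padding is hypothetical; it takes a single integer divisor for all axes and rounds the left share down, both of which are simplifying assumptions.

```python
def divisible_padding(shape, divisor, ratio=0.5):
    """Return (left, right) padding per axis so that each size becomes a multiple of divisor."""
    padding = []
    for size in shape:
        total = -size % divisor    # distance to the next multiple of divisor
        left = int(total * ratio)  # assumption: the left share is rounded down
        padding.append((left, total - left))
    return padding


# e.g. a network with four 2x poolings needs sizes divisible by 16
pads = divisible_padding((30, 64), divisor=16)
```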
- dpipe.predict.shape.patches_grid(patch_size: ~typing.Union[int, ~typing.Sequence[int]], stride: ~typing.Union[int, ~typing.Sequence[int]], axis: ~typing.Optional[~typing.Union[int, ~typing.Sequence[int]]] = None, padding_values: ~typing.Union[float, ~typing.Sequence[float], ~typing.Callable] = 0, ratio: ~typing.Union[float, ~typing.Sequence[float]] = 0.5, combiner: ~typing.Type[~dpipe.im.grid.PatchCombiner] = <class 'dpipe.im.grid.Average'>, get_boxes: ~typing.Callable = <function get_boxes>)[source]¶
Divide an incoming array into patches of corresponding patch_size and stride and then combine the predicted patches by aggregating the overlapping regions using the combiner - Average by default.
If padding_values is not None, the array will be padded to an appropriate shape to make a valid division. Afterwards the padding is removed. Otherwise, if the input cannot be patched without remainder, ValueError is raised.
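The patch-and-average scheme can be illustrated in one dimension with plain lists. The function predict_by_patches is a hypothetical sketch: it covers only the case without padding and averages overlapping regions, and it does not use the dpipe PatchCombiner classes.

```python
def predict_by_patches(values, predict, patch_size, stride):
    """Apply predict to overlapping 1D patches and average the overlaps (sketch)."""
    n = len(values)
    if stride > patch_size or n < patch_size or (n - patch_size) % stride != 0:
        raise ValueError("the input cannot be patched without remainder")

    sums, counts = [0.0] * n, [0] * n
    for start in range(0, n - patch_size + 1, stride):
        patch = predict(values[start:start + patch_size])
        for offset, value in enumerate(patch):
            sums[start + offset] += value
            counts[start + offset] += 1
    # every position is covered at least once, so counts are never zero
    return [s / c for s, c in zip(sums, counts)]


identity = predict_by_patches([1, 2, 3, 4], lambda p: p, patch_size=2, stride=1)
patch_sums = predict_by_patches([1, 2, 3, 4], lambda p: [sum(p)] * len(p), patch_size=2, stride=2)
```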
Functions¶
Various functions that can be used to build predictors.
- dpipe.predict.functional.chain_decorators(*decorators: Callable, predict: Callable, **kwargs)[source]¶
Wraps predict into a series of decorators. kwargs are passed as additional arguments to predict.
Examples
>>> @decorator1
... @decorator2
... def f(x):
...     return x + 1
>>> # same as:
>>> def f(x):
...     return x + 1
>>>
>>> f = chain_decorators(decorator1, decorator2, predict=f)
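The equivalence in the example can be checked with a small stand-in. The function chain_decorators_sketch and the two toy decorators are hypothetical. The sketch assumes that the first decorator ends up outermost, as with stacked @ syntax, and that kwargs are bound to predict before wrapping.

```python
def chain_decorators_sketch(*decorators, predict, **kwargs):
    """Wrap predict so that the first decorator is the outermost one (sketch)."""
    def wrapped(*args):
        return predict(*args, **kwargs)

    # apply the decorators from the innermost (last) to the outermost (first)
    for decorator in reversed(decorators):
        wrapped = decorator(wrapped)
    return wrapped


def double_result(func):
    return lambda x: 2 * func(x)


def add_one_to_result(func):
    return lambda x: func(x) + 1


def predict(x, scale=1):
    return scale * (x + 1)


chained = chain_decorators_sketch(double_result, add_one_to_result, predict=predict)


@double_result
@add_one_to_result
def stacked(x):
    return x + 1
```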
- dpipe.predict.functional.preprocess(func, *args, **kwargs)[source]¶
Applies function func with given parameters before making a prediction.
Examples
>>> from dpipe.im.shape_ops import pad
>>> from dpipe.predict.functional import preprocess
>>>
>>> @preprocess(pad, padding=[10, 10, 10], padding_values=np.min)
... def predict(x):
...     return model.do_inf_step(x)
performs spatial padding before prediction.
NN Layers¶
Residual Blocks¶
- class dpipe.layers.resblock.ResBlock(*args: Any, **kwargs: Any)[source]¶
Bases:
Module
Performs a sequence of two convolutions with residual connection (Residual Block).
- Parameters
in_channels (int) – the number of incoming channels.
out_channels (int) – the number of the ResBlock output channels. Note, if in_channels != out_channels, then linear transform will be applied to the shortcut.
kernel_size (int, tuple) – size of the convolving kernel.
stride (int, tuple, optional) – stride of the convolution. Default is 1. Note, if stride is greater than 1, then linear transform will be applied to the shortcut.
padding (int, tuple, optional) – zero-padding added to all spatial sides of the input. Default is 0.
dilation (int, tuple, optional) – spacing between kernel elements. Default is 1.
bias (bool) – if True, adds a learnable bias to the output. Default is False.
activation_module (None, nn.Module, optional) – module to build up activation layer. Default is torch.nn.ReLU.
conv_module (nn.Module) – module to build up convolution layer with given parameters, e.g. torch.nn.Conv3d.
batch_norm_module (nn.Module) – module to build up batch normalization layer, e.g. torch.nn.BatchNorm3d.
kwargs – additional arguments passed to conv_module.
FPN¶
- class dpipe.layers.fpn.FPN(*args: Any, **kwargs: Any)[source]¶
Bases:
Module
Feature Pyramid Network - a generalization of UNet.
- Parameters
layer (Callable) – the structural block of each level, e.g. torch.nn.Conv2d.
downsample (nn.Module) – the downsampling layer, e.g. torch.nn.MaxPool2d.
upsample (nn.Module) – the upsampling layer, e.g. torch.nn.Upsample.
merge (Callable(left, down)) – a function that merges the upsampled features map with the one coming from the left branch, e.g. torch.add.
structure (Sequence[Union[Sequence[int], nn.Module]]) – a collection of channels sequences, see Examples section for details.
last_level (bool) – If True only the result of the last level is returned (as in UNet), otherwise the results from all levels are returned (as in FPN).
kwargs – additional arguments passed to layer.
Examples
>>> from dpipe.layers import ResBlock2d
>>>
>>> structure = [
...     [[16, 16, 16], [16, 16, 16]],  # level 1, left and right
...     [[16, 32, 32], [32, 32, 16]],  # level 2, left and right
...     [32, 64, 32]  # final level
... ]
>>>
>>> upsample = nn.Upsample(scale_factor=2, mode='bilinear')
>>> downsample = nn.MaxPool2d(kernel_size=2)
>>>
>>> ResUNet = FPN(
...     ResBlock2d, downsample, upsample, torch.add,
...     structure, kernel_size=3, dilation=1, padding=1, last_level=True
... )
Structure¶
- dpipe.layers.structure.make_consistent_seq(layer: Callable, channels: Sequence[int], *args, **kwargs)[source]¶
Builds a sequence of layers that have consistent input and output channels/features. args and kwargs are passed as additional parameters.
Examples
>>> make_consistent_seq(nn.Conv2d, [16, 32, 64, 128], kernel_size=3, padding=1)
>>> # same as
>>> nn.Sequential(
...     nn.Conv2d(16, 32, kernel_size=3, padding=1),
...     nn.Conv2d(32, 64, kernel_size=3, padding=1),
...     nn.Conv2d(64, 128, kernel_size=3, padding=1),
... )
- class dpipe.layers.structure.ConsistentSequential(*args: Any, **kwargs: Any)[source]¶
Bases:
Sequential
A sequence of layers that have consistent input and output channels/features. args and kwargs are passed as additional parameters.
Examples
>>> ConsistentSequential(nn.Conv2d, [16, 32, 64, 128], kernel_size=3, padding=1)
>>> # same as
>>> nn.Sequential(
...     nn.Conv2d(16, 32, kernel_size=3, padding=1),
...     nn.Conv2d(32, 64, kernel_size=3, padding=1),
...     nn.Conv2d(64, 128, kernel_size=3, padding=1),
... )
- class dpipe.layers.structure.PreActivation(*args: Any, **kwargs: Any)[source]¶
Bases:
Module
Runs a sequence of batch_norm, activation, and layer:
in -> (BN -> activation -> layer) -> out
- Parameters
in_features (int) – the number of incoming features/channels.
out_features (int) – the number of the output features/channels.
batch_norm_module – module to build up batch normalization layer, e.g. torch.nn.BatchNorm3d.
activation_module – module to build up activation layer. Default is torch.nn.ReLU.
layer_module (Callable(in_features, out_features, **kwargs)) – module to build up the main layer, e.g. torch.nn.Conv3d or torch.nn.Linear.
kwargs – additional arguments passed to layer_module.
- class dpipe.layers.structure.PostActivation(*args: Any, **kwargs: Any)[source]¶
Bases:
Module
Performs a sequence of layer, batch_norm and activation:
in -> (layer -> BN -> activation) -> out
- Parameters
in_features (int) – the number of incoming features/channels.
out_features (int) – the number of the output features/channels.
batch_norm_module – module to build up batch normalization layer, e.g. torch.nn.BatchNorm3d.
activation_module – module to build up activation layer. Default is torch.nn.ReLU.
layer_module (Callable(in_features, out_features, **kwargs)) – module to build up the main layer, e.g. torch.nn.Conv3d or torch.nn.Linear.
kwargs – additional arguments passed to layer_module.
Notes
If layer supports a bias term, make sure to pass bias=False.
- class dpipe.layers.structure.Lambda(*args: Any, **kwargs: Any)[source]¶
Bases:
Module
Applies func to the incoming tensor. kwargs are passed as additional arguments.
- class dpipe.layers.conv.PreActivationND(*args: Any, **kwargs: Any)[source]¶
Bases:
PreActivation
Performs a sequence of batch_norm, activation, and convolution
in -> (BN -> activation -> Conv) -> out
- Parameters
in_channels (int) – the number of incoming channels.
out_channels (int) – the number of the PreActivation output channels.
kernel_size (int, tuple) – size of the convolving kernel.
stride (int, tuple, optional) – stride of the convolution. Default is 1.
padding (int, tuple, optional) – zero-padding added to all spatial sides of the input. Default is 0.
dilation (int, tuple, optional) – spacing between kernel elements. Default is 1.
groups (int, optional) – number of blocked connections from input channels to output channels. Default is 1.
bias (bool) – if True, adds a learnable bias to the output. Default is False.
batch_norm_module (nn.Module) – module to build up batch normalization layer, e.g. torch.nn.BatchNorm3d.
activation_module (nn.Module) – module to build up activation layer. Default is torch.nn.ReLU.
conv_module (nn.Module) – module to build up convolution layer with given parameters, e.g. torch.nn.Conv3d.
kwargs – additional arguments passed to layer_module.
- class dpipe.layers.conv.PostActivationND(*args: Any, **kwargs: Any)[source]¶
Bases:
PostActivation
Performs a sequence of convolution, batch_norm and activation:
in -> (Conv -> BN -> activation) -> out
- Parameters
in_channels (int) – the number of incoming channels.
out_channels (int) – the number of the PostActivation output channels.
kernel_size (int, tuple) – size of the convolving kernel.
stride (int, tuple, optional) – stride of the convolution. Default is 1.
padding (int, tuple, optional) – zero-padding added to all spatial sides of the input. Default is 0.
dilation (int, tuple, optional) – spacing between kernel elements. Default is 1.
groups (int, optional) – number of blocked connections from input channels to output channels. Default is 1.
batch_norm_module (nn.Module) – module to build up batch normalization layer, e.g. torch.nn.BatchNorm3d.
activation_module (nn.Module) – module to build up activation layer. Default is torch.nn.ReLU.
conv_module (nn.Module) – module to build up convolution layer with given parameters, e.g. torch.nn.Conv3d.
kwargs – additional arguments passed to layer_module.
Shape Operations¶
- class dpipe.layers.shape.InterpolateToInput(*args: Any, **kwargs: Any)[source]¶
Bases:
Module
Interpolates the result of path to the original shape along the spatial axis.
- Parameters
path (nn.Module) – arbitrary neural network module to calculate the result.
mode (str) – algorithm used for upsampling. Should be one of ‘nearest’ | ‘linear’ | ‘bilinear’ | ‘trilinear’ | ‘area’. Default is ‘nearest’.
axis (AxesLike, None, optional) – spatial axes to interpolate result along. If axis is None, the result is interpolated along all the spatial axes.
- class dpipe.layers.shape.Reshape(*args: Any, **kwargs: Any)[source]¶
Bases:
Module
Reshape the incoming tensor to the given shape.
- Parameters
shape (Union[int, str]) – the resulting shape. String values denote indices in the input tensor’s shape.
Examples
>>> layer = Reshape('0', '1', 500, 500)
>>> layer(x)
>>> # same as
>>> x.reshape(x.shape[0], x.shape[1], 500, 500)
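How string entries are resolved against the input shape can be sketched without torch. The helper resolve_shape is hypothetical and only mirrors the rule stated above: a string is an index into the input shape, and an integer is used as is.

```python
def resolve_shape(input_shape, *shape):
    """Translate a Reshape-style specification into a concrete shape (sketch)."""
    return tuple(
        input_shape[int(entry)] if isinstance(entry, str) else entry
        for entry in shape
    )


# keep the batch and channel sizes, fix the spatial ones
target = resolve_shape((8, 3, 250, 1000), '0', '1', 500, 500)
```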
- class dpipe.layers.shape.Softmax(*args: Any, **kwargs: Any)[source]¶
Bases:
Module
A multidimensional version of softmax.
- class dpipe.layers.shape.PyramidPooling(*args: Any, **kwargs: Any)[source]¶
Bases:
Module
Implements the pyramid pooling operation.
- Parameters
pooling (Callable) – the pooling to be applied, e.g. torch.nn.functional.max_pool2d.
levels (int) – the number of pyramid levels, default is 1 which is the global pooling operation.
PyTorch Wrappers¶
Training and inference¶
- dpipe.torch.model.optimizer_step(optimizer: torch.optim.Optimizer, loss: torch.Tensor, scaler: Optional[torch.cuda.amp.GradScaler] = None, clip_grad: Optional[float] = None, accumulate: bool = False, **params) torch.Tensor [source]¶
Performs the backward pass with respect to loss, as well as a gradient step or gradient accumulation.
If a scaler is passed - it is used to perform the gradient step (automatic mixed precision support). If a clip_grad is passed - gradient will be clipped by this value considered as maximum l2 norm. accumulate indicates whether to perform gradient step or just accumulate gradients. params is used to change the optimizer’s parameters.
Examples
>>> optimizer = Adam(model.parameters(), lr=1)
>>> optimizer_step(optimizer, loss)  # perform a gradient step
>>> optimizer_step(optimizer, loss, lr=1e-3)  # set lr to 1e-3 and perform a gradient step
>>> optimizer_step(optimizer, loss, betas=(0, 0))  # set betas to 0 and perform a gradient step
>>> optimizer_step(optimizer, loss, accumulate=True)  # perform a gradient accumulation
Notes
The incoming optimizer’s parameters are not restored to their original values.
- dpipe.torch.model.train_step(*inputs: ndarray, architecture: torch.nn.Module, criterion: Callable, optimizer: torch.optim.Optimizer, n_targets: int = 1, loss_key: Optional[str] = None, scaler: Optional[torch.cuda.amp.GradScaler] = None, clip_grad: Optional[float] = None, accumulate: bool = False, gradient_accumulation_steps: int = 1, **optimizer_params) ndarray [source]¶
Performs a forward-backward pass and makes a gradient step or accumulation, according to the given inputs.
- Parameters
inputs – inputs batches. The last n_targets batches are passed to criterion. The remaining batches are fed into the architecture.
architecture – the neural network architecture.
criterion – the loss function. Returns either a scalar or a dictionary of scalars. In the latter case loss_key must be provided.
optimizer –
n_targets – how many values from inputs to be considered as targets.
loss_key – in case criterion returns a dictionary of scalars, indicates which key should be used for gradient computation.
scaler – a gradient scaler used to operate in automatic mixed precision mode.
clip_grad – maximum l2 norm of the gradient to clip it by.
accumulate – whether to accumulate gradients or perform optimizer step.
gradient_accumulation_steps –
optimizer_params – additional parameters that will override the optimizer’s current parameters (e.g. lr).
Notes
Note that both input and output are not of type torch.Tensor - the conversion to and from torch.Tensor is made inside this function.
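The way inputs are divided between the architecture and the criterion can be sketched with plain callables. The helpers split_inputs and forward_loss are hypothetical; they leave out tensors, the optimizer and mixed precision, and only illustrate the role of n_targets.

```python
def split_inputs(inputs, n_targets=1):
    """Split inputs into (network inputs, targets); the last n_targets are targets."""
    n_inputs = len(inputs) - n_targets
    return inputs[:n_inputs], inputs[n_inputs:]


def forward_loss(inputs, architecture, criterion, n_targets=1):
    """Feed the network inputs to architecture and compare with the targets (sketch)."""
    net_inputs, targets = split_inputs(inputs, n_targets)
    return criterion(architecture(*net_inputs), *targets)


# toy stand-ins: the "network" doubles its input, the "loss" is an absolute error
loss = forward_loss((2, 5), architecture=lambda x: 2 * x, criterion=lambda p, t: abs(p - t))
```

Computing the split from the number of network inputs rather than slicing with a negative index keeps the n_targets=0 case correct.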
- dpipe.torch.model.inference_step(*inputs: ~numpy.ndarray, architecture: torch.nn.Module, activation: ~typing.Callable = <function identity>, amp: bool = False) ndarray [source]¶
Returns the prediction for the given inputs.
Notes
Note that both input and output are not of type torch.Tensor - the conversion to and from torch.Tensor is made inside this function. Inputs will be converted to fp16 if amp is True.
- dpipe.torch.model.multi_inference_step(*inputs: ~numpy.ndarray, architecture: torch.nn.Module, activations: ~typing.Union[~typing.Callable, ~typing.Sequence[~typing.Optional[~typing.Callable]]] = <function identity>, amp: bool = False) list [source]¶
Returns the prediction for the given inputs.
The architecture is expected to return a sequence of torch.Tensor objects.
Notes
Note that both input and output are not of type torch.Tensor - the conversion to and from torch.Tensor is made inside this function. Inputs will be converted to fp16 if amp is True.
Loss functions¶
- dpipe.torch.functional.focal_loss_with_logits(logits: torch.Tensor, target: torch.Tensor, weight: Optional[torch.Tensor] = None, gamma: float = 2, alpha: float = 0.25, reduce: Optional[Callable] = torch.mean)[source]¶
Function that measures Focal Loss between target and output logits.
- Parameters
logits (torch.Tensor) – tensor of an arbitrary shape.
target (torch.Tensor) – tensor of the same shape as logits.
weight (torch.Tensor, None, optional) – a manual rescaling weight. Must be broadcastable to logits.
gamma (float) – the power of focal loss factor. Defaults to 2.
alpha (float, None, optional) – weighting factor of the focal loss. If None, no weighting will be performed. Defaults to 0.25.
reduce (Callable, None, optional) – the reduction operation to be applied to the final loss. Defaults to torch.mean. If None, no reduction will be performed.
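For intuition, the commonly used formulation of the binary focal loss can be written for a single scalar logit. The function focal_loss_scalar is a hypothetical sketch: it has no weight or reduce arguments, and whether the library applies gamma and alpha in exactly this way is an assumption.

```python
import math


def focal_loss_scalar(logit, target, gamma=2.0, alpha=0.25):
    """Binary focal loss for one logit and a target in {0, 1} (sketch)."""
    # numerically stable binary cross-entropy computed from the logit
    bce = max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))
    p_t = math.exp(-bce)  # probability assigned to the true class
    loss = (1.0 - p_t) ** gamma * bce
    if alpha is not None:
        # alpha weights the positives, 1 - alpha the negatives
        loss *= alpha * target + (1.0 - alpha) * (1.0 - target)
    return loss


uncertain = focal_loss_scalar(0.0, 1.0)
confident = focal_loss_scalar(5.0, 1.0)
```

With gamma set to 0 and alpha set to None the expression reduces to the plain binary cross-entropy, and larger gamma values shrink the contribution of well classified examples.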
- dpipe.torch.functional.linear_focal_loss_with_logits(logits: torch.Tensor, target: torch.Tensor, gamma: float, beta: float, weight: Optional[torch.Tensor] = None, reduce: Optional[Callable] = torch.mean)[source]¶
Function that measures Linear Focal Loss between target and output logits. Equals to BinaryCrossEntropy(gamma * logits + beta, target, weight).
- Parameters
logits (torch.Tensor) – tensor of an arbitrary shape.
target (torch.Tensor) – tensor of the same shape as logits.
gamma (float) – multiplication coefficient for logits tensor.
beta (float) – coefficient to be added to all the elements in logits tensor.
weight (torch.Tensor) – a manual rescaling weight. Must be broadcastable to logits.
reduce (Callable, None, optional) – the reduction operation to be applied to the final loss. Defaults to torch.mean. If None - no reduction will be performed.
- dpipe.torch.functional.weighted_cross_entropy_with_logits(logit: torch.Tensor, target: torch.Tensor, weight: Optional[torch.Tensor] = None, alpha: float = 1, adaptive: bool = False, reduce: Optional[Callable] = torch.mean)[source]¶
Function that measures Binary Cross Entropy between target and output logits. This version of BCE has additional options of constant or adaptive weighting of positive examples.
- Parameters
logit (torch.Tensor) – tensor of an arbitrary shape.
target (torch.Tensor) – tensor of the same shape as logit.
weight (torch.Tensor) – a manual rescaling weight. Must be broadcastable to logit.
alpha (float, optional) – a weight for the positive class examples.
adaptive (bool, optional) – If True, uses adaptive weight [N - sum(p_i)] / sum(p_i) for the positive class examples.
reduce (Callable, None, optional) – the reduction operation to be applied to the final loss. Defaults to torch.mean. If None - no reduction will be performed.
- dpipe.torch.functional.tversky_loss(pred: torch.Tensor, target: torch.Tensor, alpha=0.5, epsilon=1e-07, reduce: Optional[Callable] = torch.mean)[source]¶
- dpipe.torch.functional.focal_tversky_loss(pred: torch.Tensor, target: torch.Tensor, gamma=1.3333333333333333, alpha=0.5, epsilon=1e-07)[source]¶
- dpipe.torch.functional.dice_loss(pred: torch.Tensor, target: torch.Tensor, epsilon=1e-07)[source]¶
- dpipe.torch.functional.masked_loss(mask: torch.Tensor, criterion: Callable, prediction: torch.Tensor, target: torch.Tensor, **kwargs)[source]¶
Calculates the criterion between the masked prediction and target. args and kwargs are passed to criterion as additional arguments.
If the mask is empty - returns 0 wrapped in a torch tensor.
Utils¶
- dpipe.torch.utils.load_model_state(module: torch.nn.Module, path: Union[Path, str], modify_state_fn: Optional[Callable] = None, strict: bool = True)[source]¶
Updates the module’s state dict by the one located at path.
- Parameters
module (nn.Module) –
path (PathLike) –
modify_state_fn (Callable(current_state, state_to_load)) – if not None, two arguments will be passed to the function: current state of the model and the state loaded from the path. This function should modify states as needed and return the final state to load. For example, it could help you to transfer weights from similar but not completely equal architecture.
strict (bool) –
- dpipe.torch.utils.save_model_state(module: torch.nn.Module, path: Union[Path, str])[source]¶
Saves the module’s state dict to path.
- dpipe.torch.utils.get_device(x: Optional[Union[torch.device, torch.nn.Module, torch.Tensor, str]] = None) torch.device [source]¶
Determines the correct device based on the input.
- Parameters
x (torch.device, torch.nn.Module, torch.Tensor, str, None) –
if torch.Tensor - returns the device on which it is located
if torch.nn.Module - returns the device on which its parameters are located
if str or torch.device - returns torch.device(x)
if None - same as ‘cuda’ if CUDA is available, ‘cpu’ otherwise.
- dpipe.torch.utils.to_device(x: Union[torch.nn.Module, torch.Tensor], device: Optional[Union[torch.device, torch.nn.Module, torch.Tensor, str]] = 'cpu')[source]¶
Move x to device.
- Parameters
x –
device – the device on which to move x. See get_device for details.
- dpipe.torch.utils.to_cuda(x, cuda: Optional[Union[torch.nn.Module, torch.Tensor, bool]] = None)[source]¶
Move x to cuda if specified.
- Parameters
x –
cuda – whether to move to cuda. If None, torch.cuda.is_available() is used to determine that.
- dpipe.torch.utils.to_var(*arrays: Union[Iterable, int, float], device: Union[torch.device, torch.nn.Module, torch.Tensor, str] = 'cpu', requires_grad: bool = False)[source]¶
Convert numpy arrays to torch Tensors.
- Parameters
arrays (array-like) – objects, that will be converted to torch Tensors.
device – the device on which to move x. See get_device for details.
requires_grad – whether the tensors require grad.
Notes
If arrays contains a single argument the result will not be contained in a tuple:
>>> x = to_var(x)
>>> x, y = to_var(x, y)
If this is not the desired behaviour, use sequence_to_var, which always returns a tuple of tensors.
- dpipe.torch.utils.to_np(*tensors: torch.Tensor)[source]¶
Convert torch Tensors to numpy arrays.
Notes
If tensors contains a single argument the result will not be contained in a tuple:
>>> x = to_np(x)
>>> x, y = to_np(x, y)
If this is not the desired behaviour, use sequence_to_np, which always returns a tuple of arrays.
- dpipe.torch.utils.set_params(optimizer: torch.optim.Optimizer, **params) torch.optim.Optimizer [source]¶
Change an optimizer’s parameters to the ones passed in params.
- dpipe.torch.utils.set_lr(optimizer: torch.optim.Optimizer, lr: float) torch.optim.Optimizer [source]¶
Change an optimizer’s learning rate to lr.
- dpipe.torch.utils.get_parameters(optimizer: torch.optim.Optimizer) Iterator[torch.nn.parameter.Parameter] [source]¶
Returns an iterator over model parameters stored in optimizer.
Iterator utils¶
- dpipe.itertools.pam(functions: Iterable[Callable], *args, **kwargs)[source]¶
Inverse of map. Apply a sequence of callables to fixed arguments.
Examples
>>> list(pam([np.sqrt, np.square, np.cbrt], 64))
[8, 4096, 4]
- dpipe.itertools.zip_equal(*args: Union[Sized, Iterable]) Iterable[Tuple] [source]¶
zip over the given iterables, but enforce that all of them exhaust simultaneously.
Examples
>>> zip_equal([1, 2, 3], [4, 5, 6]) # ok
>>> zip_equal([1, 2, 3], [4, 5, 6, 7]) # raises ValueError
# ValueError is raised even if the lengths are not known
>>> zip_equal([1, 2, 3], map(np.sqrt, [4, 5, 6])) # ok
>>> zip_equal([1, 2, 3], map(np.sqrt, [4, 5, 6, 7])) # raises ValueError
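The behaviour described for zip_equal can be reproduced with the standard library alone. The following is a minimal sketch, not dpipe's actual implementation: zip_equal_sketch and _MISSING are hypothetical names, and the error is raised lazily, when the shorter iterable runs out.

```python
from itertools import zip_longest

_MISSING = object()  # hypothetical sentinel that cannot appear in user data


def zip_equal_sketch(*iterables):
    """Like zip, but raise ValueError if the iterables differ in length."""
    for values in zip_longest(*iterables, fillvalue=_MISSING):
        # a sentinel in the tuple means that one iterable ended early
        if any(value is _MISSING for value in values):
            raise ValueError('The iterables have different lengths')
        yield values


pairs = list(zip_equal_sketch([1, 2, 3], iter([4, 5, 6])))
```

Because the check relies on a sentinel rather than on len, it also works for iterators whose length is not known in advance.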
- dpipe.itertools.head_tail(iterable: Iterable) Tuple[Any, Iterable] [source]¶
Split the iterable into the first and the rest of the elements.
Examples
>>> head, tail = head_tail(map(np.square, [1, 2, 3]))
>>> head, list(tail)
1, [4, 9]
- dpipe.itertools.peek(iterable: Iterable) Tuple[Any, Iterable] [source]¶
Return the first element from iterable and the whole iterable.
Notes
The incoming iterable might be mutated, use the returned iterable instead.
Examples
>>> original_iterable = map(np.square, [1, 2, 3])
>>> head, iterable = peek(original_iterable)
>>> head, list(iterable)
1, [1, 4, 9]
# list(original_iterable) would return [4, 9]
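The note about mutation is easier to see in code. This is a hedged sketch of how head_tail and peek could be written with itertools.chain; the names head_tail_sketch and peek_sketch are illustrative and the real dpipe code may differ.

```python
from itertools import chain


def head_tail_sketch(iterable):
    """Return the first element and an iterator over the remaining ones."""
    iterator = iter(iterable)
    return next(iterator), iterator


def peek_sketch(iterable):
    """Return the first element and an iterable equivalent to the original."""
    head, tail = head_tail_sketch(iterable)
    # the head was consumed from the source, so it is put back in front
    return head, chain([head], tail)


squares = (i ** 2 for i in [1, 2, 3])
head, restored = peek_sketch(squares)
result = list(restored)
# the generator itself has been consumed by now
leftover = list(squares)
```

The original generator is advanced by peek_sketch, which is why only the returned iterable should be used afterwards.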
- dpipe.itertools.lmap(func: Callable, *iterables: Iterable) list [source]¶
Composition of list and map.
- dpipe.itertools.pmap(func: Callable, iterable: Iterable, *args, **kwargs) Iterable [source]¶
Partial map. Maps func over iterable using args and kwargs as additional arguments.
- dpipe.itertools.dmap(func: Callable, dictionary: dict, *args, **kwargs)[source]¶
Transform the dictionary by mapping func over its values. args and kwargs are passed as additional arguments.
Examples
>>> dmap(np.square, {'a': 1, 'b': 2})
{'a': 1, 'b': 4}
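To make the difference between pam, pmap and dmap concrete, here is a small pure-Python sketch of the three. The *_sketch names are hypothetical and only mirror the documented behaviour; they are not the library's code.

```python
def pam_sketch(functions, *args, **kwargs):
    """Apply each callable to the same fixed arguments."""
    for func in functions:
        yield func(*args, **kwargs)


def pmap_sketch(func, iterable, *args, **kwargs):
    """Map func over iterable, passing args and kwargs as extra arguments."""
    for value in iterable:
        yield func(value, *args, **kwargs)


def dmap_sketch(func, dictionary, *args, **kwargs):
    """Map func over the values of a dictionary, keeping the keys."""
    return {key: func(value, *args, **kwargs) for key, value in dictionary.items()}


applied = list(pam_sketch([abs, float], -2))
squared = list(pmap_sketch(pow, [1, 2, 3], 2))
squared_dict = dmap_sketch(pow, {'a': 1, 'b': 2}, 2)
```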
- dpipe.itertools.zdict(keys: Iterable, values: Iterable) dict [source]¶
Create a dictionary from keys and values.
- dpipe.itertools.flatten(iterable: Iterable, iterable_types: Optional[Union[tuple, type]] = None) list [source]¶
Recursively flattens an iterable as long as it is an instance of iterable_types.
Examples
>>> flatten([1, [2, 3], [[4]]])
[1, 2, 3, 4]
>>> flatten([1, (2, 3), [[4]]])
[1, (2, 3), 4]
>>> flatten([1, (2, 3), [[4]]], iterable_types=(list, tuple))
[1, 2, 3, 4]
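The examples suggest that, when iterable_types is None, only containers of the same type as the outermost one are flattened. The sketch below encodes that reading as an explicit assumption; flatten_sketch is a hypothetical name, not the library function.

```python
def flatten_sketch(iterable, iterable_types=None):
    """Recursively flatten nested containers of the given types into a list."""
    if iterable_types is None:
        # assumption: default to the type of the outermost container
        iterable_types = type(iterable)
    if not isinstance(iterable, iterable_types):
        # a leaf value: wrap it so the caller can extend its result
        return [iterable]
    result = []
    for value in iterable:
        result.extend(flatten_sketch(value, iterable_types))
    return result


only_lists = flatten_sketch([1, (2, 3), [[4]]])
lists_and_tuples = flatten_sketch([1, (2, 3), [[4]]], iterable_types=(list, tuple))
```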
- dpipe.itertools.filter_mask(iterable: Iterable, mask: Iterable[bool]) Iterable [source]¶
Filter values from iterable according to mask.
- dpipe.itertools.extract(sequence: Sequence, indices: Iterable)[source]¶
Extract indices from sequence.
- dpipe.itertools.negate_indices(indices: Iterable, length: int)[source]¶
Return valid indices for a sequence of len length that are not present in indices.
- dpipe.itertools.make_chunks(iterable: Iterable, chunk_size: int, incomplete: bool = True)[source]¶
Group iterable into chunks of size chunk_size.
- Parameters
iterable –
chunk_size –
incomplete – whether to yield the last chunk in case it has a smaller size.
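The effect of the incomplete flag can be illustrated with a short sketch based on itertools.islice. make_chunks_sketch is a hypothetical stand-in, and returning tuples is an assumption made here for illustration.

```python
from itertools import islice


def make_chunks_sketch(iterable, chunk_size, incomplete=True):
    """Yield consecutive chunks of chunk_size elements as tuples."""
    iterator = iter(iterable)
    while True:
        chunk = tuple(islice(iterator, chunk_size))
        if not chunk:
            return
        if len(chunk) < chunk_size and not incomplete:
            # drop the shorter last chunk if it was not requested
            return
        yield chunk


with_last = list(make_chunks_sketch(range(5), 2))
without_last = list(make_chunks_sketch(range(5), 2, incomplete=False))
```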
- dpipe.itertools.collect(func: Callable)[source]¶
Make a function that returns a list from a function that returns an iterator.
Examples
>>> @collect
>>> def squares(n):
>>>     for i in range(n):
>>>         yield i ** 2
>>>
>>> squares(3)
[0, 1, 4]
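A decorator like collect takes only a few lines. The following is a sketch under the assumption that it simply wraps the generator's output in list; collect_sketch is an illustrative name.

```python
from functools import wraps


def collect_sketch(func):
    """Turn a generator function into a function that returns a list."""
    @wraps(func)  # keep the name and docstring of the wrapped function
    def wrapper(*args, **kwargs):
        return list(func(*args, **kwargs))
    return wrapper


@collect_sketch
def squares(n):
    for i in range(n):
        yield i ** 2


values = squares(3)
```

Note that range(3) starts at zero, so the first collected value is 0.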
- dpipe.itertools.stack(axis: int = 0, dtype: Optional[dtype] = None)[source]¶
Stack the values yielded by a generator function along a given axis. dtype (if any) determines the data type of the resulting array.
Examples
>>> @stack(1)
>>> def consecutive(n):
>>>     for i in range(n):
>>>         yield i, i+1
>>>
>>> consecutive(3)
array([[0, 1, 2],
       [1, 2, 3]])
Commands¶
Contains a few more sophisticated commands that are usually accessed directly inside configs.
- dpipe.commands.populate(path: Union[Path, str], func: Callable, *args, **kwargs)[source]¶
Call func with args and kwargs if path doesn’t exist.
Examples
>>> populate('metrics.json', save_metrics, targets, predictions)
# if `metrics.json` doesn't exist, the following call will be performed:
>>> save_metrics(targets, predictions)
- Raises
FileNotFoundError – if after calling func the path still doesn’t exist.
- dpipe.commands.lock_dir(folder: Union[Path, str] = '.', lock: str = '.lock')[source]¶
Lock the given folder by generating a special lock file - lock.
- Raises
FileExistsError – if lock already exists, i.e. the folder is already locked.
- dpipe.commands.load_from_folder(path: ~typing.Union[~pathlib.Path, str], loader=<function load>, ext='.npy')[source]¶
Yields (id, object) pairs loaded from path.
- dpipe.commands.map_ids_to_disk(func: ~typing.Callable[str, object], ids: ~typing.Iterable[str], output_path: str, exist_ok: bool = False, save: ~typing.Callable = <function save>, ext: str = '.npy')[source]¶
Apply func to each id from ids and save each output to output_path using save. If exist_ok is True the existing files will be ignored, otherwise an exception is raised.
- dpipe.commands.predict(ids, output_path, load_x, predict_fn, exist_ok=False, save: ~typing.Callable = <function save>, ext='.npy')[source]¶
Dataset¶
Datasets are used for data and metadata loading.
Interfaces¶
- class dpipe.dataset.base.Dataset(*args, **kwargs)[source]¶
Bases:
object
Interface for datasets.
Its subclasses must define the ids attribute - a tuple of identifiers, one for each dataset entry, as well as methods for loading an entry by its identifier.
- ids¶
- Type
a tuple of identifiers, one for each dataset entry.
Helpers¶
- class dpipe.dataset.csv.CSV(path: ~typing.Union[~pathlib.Path, str], filename: str = 'meta.csv', index_col: str = 'id', loader: ~typing.Callable = <function load>)[source]¶
Bases:
Dataset
A small wrapper for dataframes that contain paths to data.
- Parameters
path (PathLike) – the path to the data.
filename (str) – the relative path to the csv dataframe. Default is meta.csv.
index_col (str, None, optional) – the column that will be used as index. Must contain unique values. Default is id.
loader (Callable) – the function to load an object by the path located in a corresponding dataset entry. Default is load_by_ext.
Wrappers¶
Wrappers change the dataset’s behaviour. See the Wrappers tutorial for more details.
- dpipe.dataset.wrappers.cache_methods(instance, methods: Optional[Iterable[str]] = None, maxsize: Optional[int] = None)[source]¶
Cache the instance’s methods. If methods is None, all public methods will be cached.
- dpipe.dataset.wrappers.cache_methods_to_disk(instance, base_path: ~typing.Union[~pathlib.Path, str], loader: ~typing.Callable = <function load_numpy>, saver: ~typing.Callable = <function save_numpy>, **methods: str)[source]¶
Cache the instance’s methods to disk.
- Parameters
instance – arbitrary object
base_path (str) – the base path that all the paths in methods are relative to.
methods (str) – each keyword argument has the form method_name=path_to_cache. The methods are assumed to take a single argument of type str.
loader – loads a single object given its path.
saver (Callable(value, path)) – saves a single object to the given path.
- dpipe.dataset.wrappers.apply(instance, **methods: Callable)[source]¶
Applies a given function to the output of a given method.
- Parameters
instance – arbitrary object
methods (Callable) – each keyword argument has the form method_name=func_to_apply. func_to_apply is applied to the output of the method_name method.
Examples
>>> # normalize will be applied to the output of load_image
>>> dataset = apply(base_dataset, load_image=normalize)
- dpipe.dataset.wrappers.set_attributes(instance, **attributes)[source]¶
Sets or overwrites attributes with those provided as keyword arguments.
- Parameters
instance – arbitrary object
attributes – each keyword argument has the form attr_name=attr_value.
- dpipe.dataset.wrappers.change_ids(dataset: Dataset, change_id: Callable, methods: Optional[Iterable[str]] = None) Dataset [source]¶
Change the dataset’s ids according to the change_id function and adapt the provided methods to work with the new ids.
- Parameters
dataset (Dataset) – the dataset to perform ids changing on.
change_id (Callable(str) -> str) – the function that maps an old id to a new one. The new ids must be unique, just like the old ones.
methods (Iterable[str]) – the list of methods to be adapted. Each method takes a single argument - the identifier.
- dpipe.dataset.wrappers.merge(*datasets: Dataset, methods: Optional[Sequence[str]] = None, attributes: Sequence[str] = ()) Dataset [source]¶
Merge several datasets into one by preserving the provided methods and attributes.
- Parameters
datasets (Dataset) – sequence of datasets.
methods (Sequence[str], None, optional) – the list of methods to be preserved. Each method should take an identifier as its first argument. If None, all the common methods will be preserved.
attributes (Sequence[str]) – the list of attributes to be preserved. For each dataset their values should be the same. Default is the empty sequence ().
- dpipe.dataset.wrappers.apply_mask(dataset: Dataset, mask_modality_id: int = -1, mask_value: Optional[int] = None) Dataset [source]¶
Applies the mask_modality_id modality as the binary mask to the other modalities and removes the mask from the sequence of modalities.
- Parameters
dataset (Dataset) – dataset which is used in the current task.
mask_modality_id (int) – the index of mask in the sequence of modalities. Default is -1, which means the last modality will be used as the mask.
mask_value (int, None, optional) – the value in the mask to filter other modalities with. If None, greater than zero filtering will be applied. Default is None.
Examples
>>> modalities = ['flair', 't1', 'brain_mask'] # we are to apply brain mask to other modalities
>>> target = 'target'
>>>
>>> dataset = apply_mask(
>>>     dataset=Wmh2017(
>>>         data_path=data_path,
>>>         modalities=modalities,
>>>         target=target
>>>     ),
>>>     mask_modality_id=-1,
>>>     mask_value=1
>>> )
Tutorials¶
This section contains various tutorials generated from jupyter notebooks located here.
Batch iterators¶
Batch iterators are built using the following constructor:
from dpipe.batch_iter import Infinite
Its only required argument is source - an infinite iterable that yields entries from your data.
We’ll build an example batch iterator that yields batches from the MNIST dataset:
from torchvision.datasets import MNIST
from pathlib import Path
import numpy as np
# download to ~/tests/MNIST, if necessary
dataset = MNIST(Path('~/tests/MNIST').expanduser(), transform=np.array, download=True)
Sampling¶
from dpipe.batch_iter import sample
# yield 10 batches of size 30 each epoch:
batch_iter = Infinite(
    sample(dataset), # randomly sample from the dataset
    batch_size=30, batches_per_epoch=10,
)
sample infinitely yields data randomly sampled from the dataset:
for x, y in sample(dataset):
    print(x.shape, y)
    break
(28, 28) 7
We use infinite sources because our batch iterators are executed in a background thread, which allows us to use the resources more efficiently. For example, a new batch can be prepared while the network’s forward and backward passes are performed in the main thread.
Now we can simply iterate over batch_iter:
# give 10 batches of size 30
for xs, ys in batch_iter():
    print(xs.shape, ys.shape)
(30, 28, 28) (30,)
(30, 28, 28) (30,)
(30, 28, 28) (30,)
(30, 28, 28) (30,)
(30, 28, 28) (30,)
(30, 28, 28) (30,)
(30, 28, 28) (30,)
(30, 28, 28) (30,)
(30, 28, 28) (30,)
(30, 28, 28) (30,)
… and reuse it again:
# give another 10 batches of size 30
for xs, ys in batch_iter():
    print(xs.shape, ys.shape)
(30, 28, 28) (30,)
(30, 28, 28) (30,)
(30, 28, 28) (30,)
(30, 28, 28) (30,)
(30, 28, 28) (30,)
(30, 28, 28) (30,)
(30, 28, 28) (30,)
(30, 28, 28) (30,)
(30, 28, 28) (30,)
(30, 28, 28) (30,)
After the training is over you must close the batch iterator in order to stop all the background processes:
batch_iter.close()
Or you can use it as a context manager:
batch_iter = Infinite(
    sample(dataset),
    batch_size=30, batches_per_epoch=10,
)
with batch_iter:
    for xs, ys in batch_iter():
        print(xs.shape, ys.shape)
(30, 28, 28) (30,)
(30, 28, 28) (30,)
(30, 28, 28) (30,)
(30, 28, 28) (30,)
(30, 28, 28) (30,)
(30, 28, 28) (30,)
(30, 28, 28) (30,)
(30, 28, 28) (30,)
(30, 28, 28) (30,)
(30, 28, 28) (30,)
Transformations¶
Let’s add more transformations to the data.
from dpipe.im import zoom
def zoom_image(pair):
    image, label = pair
    return zoom(image, scale_factor=[2, 2]), label
batch_iter = Infinite(
    sample(dataset), # yields pairs
    zoom_image, # zoom the images by a factor of 2
    batch_size=30, batches_per_epoch=3,
)
You can think of Infinite as a pipe through which the data flows. Each function takes the data as input (an [image, label] pair in this case), applies a transformation, and the result is propagated further.
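The pipe analogy can be made concrete with a simplified, single-threaded sketch. It is an illustration only: pipe_sketch, double_first and the toy source are hypothetical, and the real Infinite additionally runs in a background thread and stacks entries into arrays.

```python
from itertools import count, islice


def pipe_sketch(source, *transforms, batch_size, batches_per_epoch):
    """Return a function that yields batches_per_epoch batches on each call."""
    iterator = iter(source)

    def epoch():
        for _ in range(batches_per_epoch):
            batch = []
            for entry in islice(iterator, batch_size):
                # each entry flows through all the transforms in order
                for transform in transforms:
                    entry = transform(entry)
                batch.append(entry)
            # turn a list of (x, y) pairs into a pair (xs, ys)
            yield tuple(map(list, zip(*batch)))

    return epoch


def double_first(pair):
    x, y = pair
    return x * 2, y


source = ((i, i % 2) for i in count())  # an infinite toy source of pairs
epoch = pipe_sketch(source, double_first, batch_size=3, batches_per_epoch=2)
first_epoch = list(epoch())
second_epoch = list(epoch())
```

Calling epoch() again continues from where the source stopped, which mirrors how the same batch iterator is reused between epochs.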
with batch_iter:
    for xs, ys in batch_iter():
        print(xs.shape, ys.shape)
(30, 56, 56) (30,)
(30, 56, 56) (30,)
(30, 56, 56) (30,)
Note that because sample yields pairs, pair is the input of zoom_image. This is not very user-friendly, which is why there are a number of wrappers for transformers:
from dpipe.batch_iter import unpack_args
# a better version of zoom
def zoom_image(image, label):
    return zoom(image, scale_factor=[2, 2]), label
batch_iter = Infinite(
    sample(dataset),
    unpack_args(zoom_image), # unpack the arguments before calling the function
    batch_size=30, batches_per_epoch=3)
# or use a lambda directly
batch_iter = Infinite(
    sample(dataset),
    unpack_args(lambda image, label: [zoom(image, scale_factor=[2, 2]), label]),
    batch_size=30, batches_per_epoch=3)
However, there is still redundancy: the label argument is simply passed through, only the image is transformed. Let’s fix that:
from dpipe.batch_iter import apply_at
batch_iter = Infinite(
    sample(dataset),
    # apply zoom at index 0 of the pair with scale_factor=[2, 2] as an additional argument
    apply_at(0, zoom, scale_factor=[2, 2]),
    batch_size=30, batches_per_epoch=3)
with batch_iter:
    for xs, ys in batch_iter():
        print(xs.shape, ys.shape)
(30, 56, 56) (30,)
(30, 56, 56) (30,)
(30, 56, 56) (30,)
Now we don’t even have to create another function!
Check dpipe.batch_iter.utils for other helper functions.
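To show what such helpers do under the hood, here is a simplified sketch of unpack_args and apply_at. The *_sketch functions are hypothetical; in particular, this version of apply_at only supports a single integer index and returns a tuple, which is an assumption.

```python
def unpack_args_sketch(func, *args, **kwargs):
    """Make func accept a single sequence that is unpacked into arguments."""
    def wrapper(xs):
        return func(*xs, *args, **kwargs)
    return wrapper


def apply_at_sketch(index, func, *args, **kwargs):
    """Apply func to the entry at the given index, leave the rest untouched."""
    def wrapper(xs):
        xs = list(xs)
        xs[index] = func(xs[index], *args, **kwargs)
        return tuple(xs)
    return wrapper


def add(a, b):
    return a + b


total = unpack_args_sketch(add)((1, 2))
squared_first = apply_at_sketch(0, pow, 2)((3, 'label'))
```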
Parallel execution¶
The batch iterator supports both thread-based and process-based execution.
Threads¶
Wrap the function in Threads in order to enable thread-based parallelism:
%%time
import time
import itertools
from dpipe.batch_iter import Threads
def do_stuff(x):
    time.sleep(1)
    return x ** 2,
batch_iter = Infinite(
    range(10),
    do_stuff, # sleep for 10 seconds
    batch_size=10, batches_per_epoch=1
)
for value in batch_iter():
    pass
CPU times: user 33.3 ms, sys: 9.17 ms, total: 42.5 ms
Wall time: 10 s
%%time
batch_iter = Infinite(
    range(10),
    Threads(do_stuff, n_workers=2), # sleep for 5 seconds
    batch_size=10, batches_per_epoch=1
)
for value in batch_iter():
    pass
CPU times: user 21.4 ms, sys: 7.75 ms, total: 29.1 ms
Wall time: 5.01 s
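The speedup comes from overlapping the waiting time of several workers. The same effect can be observed with the standard library; this sketch uses concurrent.futures instead of dpipe's Threads, with a shorter sleep so it runs quickly.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def do_stuff(x):
    time.sleep(0.05)  # stands in for slow I/O or preprocessing
    return x ** 2


start = time.perf_counter()
sequential = [do_stuff(x) for x in range(4)]
sequential_time = time.perf_counter() - start

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=2) as pool:
    # pool.map preserves the order of the inputs
    threaded = list(pool.map(do_stuff, range(4)))
threaded_time = time.perf_counter() - start
```

With two workers threaded_time should be roughly half of sequential_time, although the exact ratio depends on the machine.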
Processes¶
Similarly, wrap the function in Loky in order to enable process-based parallelism:
from dpipe.batch_iter import Loky
%%time
batch_iter = Infinite(
    range(10),
    Loky(do_stuff, n_workers=2), # sleep for 5 seconds
    batch_size=10, batches_per_epoch=1
)
for value in batch_iter():
    pass
CPU times: user 43.6 ms, sys: 27.6 ms, total: 71.2 ms
Wall time: 5.56 s
Combining objects into batches¶
If your dataset contains items of various shapes, you can’t just stack them into batches. For example you may want to pad them to a common shape. To do this, pass a custom combiner to Infinite:
# random 3D images of random shapes:
images = [np.random.randn(10, 10, np.random.randint(2, 40)) for _ in range(100)]
labels = np.random.randint(0, 2, size=30)
images[0].shape, images[1].shape
((10, 10, 34), (10, 10, 34))
from dpipe.batch_iter import combine_pad
batch_iter = Infinite(
    sample(list(zip(images, labels))),
    batch_size=5, batches_per_epoch=3,
    # pad and combine
    combiner=combine_pad
)
with batch_iter:
    for xs, ys in batch_iter():
        print(xs.shape, ys.shape)
(5, 10, 10, 39) (5,)
(5, 10, 10, 34) (5,)
(5, 10, 10, 39) (5,)
Adaptive batch size¶
If samples in your pipeline have various sizes, a constant batch size can be too wasteful.
You can pass a function to batch_size instead of an integer.
Let’s say we are classifying 3D images of different shapes along the last axis. We want a batch to contain at most 100 slices along the last axis.
def should_add(seq, item):
    # seq - sequence of already added objects to the batch
    # item - the next item
    count = 0
    for image, label in seq + [item]:
        count += image.shape[-1]
    return count <= 100
from dpipe.batch_iter import combine_pad
batch_iter = Infinite(
    sample(list(zip(images, labels))),
    batch_size=should_add, batches_per_epoch=3,
    combiner=combine_pad
)
with batch_iter:
    for xs, ys in batch_iter():
        print(xs.shape, ys.shape)
(5, 10, 10, 34) (5,)
(4, 10, 10, 25) (4,)
(4, 10, 10, 32) (4,)
Note that the batch sizes are different: 5, 4, 4
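The way a predicate like should_add drives batching can be sketched independently of dpipe. adaptive_batches_sketch and fits_budget are hypothetical names, and plain integers stand in for the sizes of the images along the last axis.

```python
def adaptive_batches_sketch(items, should_add):
    """Group items into batches, closing a batch when should_add refuses."""
    batch = []
    for item in items:
        # an empty batch always accepts the item, so no item is ever lost
        if batch and not should_add(batch, item):
            yield batch
            batch = []
        batch.append(item)
    if batch:
        yield batch


def fits_budget(seq, item, budget=100):
    # seq - the sizes already in the batch, item - the next size
    return sum(seq) + item <= budget


lengths = [34, 25, 32, 40, 30, 20, 60]
batches = list(adaptive_batches_sketch(lengths, fits_budget))
```

Each resulting batch has a total size of at most 100, so the number of items per batch varies with their sizes.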
Training¶
deep_pipe has a unified interface for training models. We will show an example for a model written in PyTorch.
from dpipe.train import train
This is the main function. It requires a batch iterator and a train_step function that performs a forward-backward pass for a given batch.
Let’s build all the required components.
Batch iterator¶
The batch iterators are covered in a separate tutorial (Batch iterators), we’ll reuse the code from it:
from torchvision.datasets import MNIST
from dpipe.batch_iter import Infinite, sample, apply_at
from pathlib import Path
import numpy as np
# download to ~/tests/MNIST, if necessary
dataset = MNIST(Path('~/tests/MNIST').expanduser(), transform=np.array, download=True)
# yield 10 batches of size 30 each epoch:
batch_iter = Infinite(
    sample(dataset),
    apply_at(0, lambda x: x[None].astype('float32')), # add channels dim
    batch_size=30, batches_per_epoch=10,
)
Train Step¶
Next, we will implement the function that performs a train_step. But first we need an architecture:
import torch
from torch import nn
from dpipe import layers
architecture = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=3),
    nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=3),
    nn.ReLU(),
    nn.Conv2d(64, 128, kernel_size=3),
    nn.AdaptiveMaxPool2d((1, 1)),
    nn.Flatten(),
    nn.ReLU(),
    nn.Linear(128, 10),
)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(architecture.parameters(), lr=1e-3)
from dpipe.torch import to_var, to_np
def cls_train_step(images, labels):
    # move images and labels to same device as architecture
    images, labels = to_var(images, labels, device=architecture)
    architecture.train()
    logits = architecture(images)
    loss = criterion(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # `train_step` must return the loss which will be later used for logging
    return to_np(loss)
Training the model¶
Next, we just run the train function:
train(cls_train_step, batch_iter, n_epochs=10)
A more general version of the function cls_train_step is already available in dpipe:
from dpipe.torch import train_step
Apart from the input batches it requires the following arguments: architecture, optimizer, criterion. We can pass these arguments directly to train, so the previous call is equivalent to:
train(
    train_step, batch_iter, n_epochs=10,
    architecture=architecture, optimizer=optimizer, criterion=criterion
)
Logging¶
After calling train the interpreter just “hangs” until the training is over. In order to log various information about the training process, you can pass a logger:
from dpipe.train import ConsoleLogger
train(
    train_step, batch_iter, n_epochs=3, logger=ConsoleLogger(),
    architecture=architecture, optimizer=optimizer, criterion=criterion
)
00000: train loss: 0.29427966475486755
00001: train loss: 0.26119616627693176
00002: train loss: 0.2186189591884613
There are various logger implementations, e.g. one that writes in a format readable by tensorboard - TBLogger.
Checkpoints¶
It is often useful to keep checkpoints (or snapshots) of your model and optimizer in case you want to restore them. To do that, pass the checkpoints argument:
from dpipe.train import Checkpoints
checkpoints = Checkpoints(
    'PATH/TO/CHECKPOINTS/FOLDER',
    [architecture, optimizer],
)
train(
    train_step, batch_iter, n_epochs=3, checkpoints=checkpoints,
    architecture=architecture, optimizer=optimizer, criterion=criterion
)
The cool part is that if the training is prematurely stopped, e.g. by an exception, you can resume the training from the same point instead of starting over:
train(
    train_step, batch_iter, n_epochs=3, checkpoints=checkpoints,
    architecture=architecture, optimizer=optimizer, criterion=criterion
)
# ... something bad happened, e.g. KeyboardInterrupt
# start from where you left off
train(
    train_step, batch_iter, n_epochs=3, checkpoints=checkpoints,
    architecture=architecture, optimizer=optimizer, criterion=criterion
)
Value Policies¶
You can further customize the training process by passing additional values to train_step that change over time.
For example, train_step takes an optional argument lr - used to update the optimizer’s learning rate.
We can change this value after each training epoch using the ValuePolicy interface. Let’s use an exponential learning rate:
from dpipe.train import Exponential
train(
    train_step, batch_iter, n_epochs=10,
    architecture=architecture, optimizer=optimizer, criterion=criterion,
    lr=Exponential(initial=1e-3, multiplier=0.5, step_length=3) # decrease by a factor of 2 every 3 epochs
)
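To see which learning rates such a policy produces, the schedule can be computed by hand. The sketch below assumes that the value changes in discrete steps, i.e. that it is halved once every step_length epochs, as the comment in the call suggests; exponential_lr_sketch is a hypothetical helper, not part of dpipe.

```python
def exponential_lr_sketch(epoch, initial=1e-3, multiplier=0.5, step_length=3):
    """Learning rate for the given epoch under a stepwise exponential decay."""
    # assumption: integer division, so the value is constant within a step
    return initial * multiplier ** (epoch // step_length)


schedule = [exponential_lr_sketch(epoch) for epoch in range(7)]
```

Under this assumption epochs 0 to 2 use 1e-3, epochs 3 to 5 use 5e-4, and epoch 6 uses 2.5e-4.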
Validation¶
Finally, you may want to evaluate your network on a separate validation set after each epoch. This is done by the validate argument. It expects a function that simply returns a dictionary with the calculated metrics, e.g.:
# assuming the accuracy is computed with scikit-learn
from sklearn.metrics import accuracy_score
def validate():
    architecture.eval()
    # ... predict on validation set
    pred = ...
    ys = ...
    acc = accuracy_score(ys, pred)
    return {
        'accuracy': acc
    }
train(
    train_step, batch_iter, n_epochs=10, validate=validate,
    architecture=architecture, optimizer=optimizer, criterion=criterion,
)
Predict¶
Usually when dealing with neural networks, at inference time the input data may require some preprocessing before being fed into the network. Also, the network’s output might need postprocessing in order to obtain a final prediction.
Padding and cropping¶
Let’s suppose that we have a network for segmentation that can only work with images of at least 256x256 pixels.
Before feeding a given image into the network you may want to pad it:
from dpipe.medim.shape_ops import pad_to_shape
padded = pad_to_shape(image, np.maximum(image.shape, (256, 256)))
mask = network(padded)
Now you need to remove the padding in order to make the mask of the same shape as image:
from dpipe.medim.shape_ops import crop_to_shape
mask = crop_to_shape(mask, image.shape)
Let’s make a function that implements the whole pipeline:
import numpy as np
from dpipe.medim.shape_ops import pad_to_shape, crop_to_shape
def predict_pad(image, network, min_shape):
    # pad
    padded = pad_to_shape(image, np.maximum(image.shape, min_shape))
    # predict
    mask = network(padded)
    # restore
    mask = crop_to_shape(mask, image.shape)
    return mask
Now we have a perfectly reusable function.
Scale¶
Now let’s write a function that downsamples the input by a factor of 2 and then zooms the output by 2.
import numpy as np
from dpipe.medim.shape_ops import zoom, zoom_to_shape
def predict_zoom(image, network, scale_factor=0.5):
    # zoom
    zoomed = zoom(image, scale_factor)
    # predict
    mask = network(zoomed)
    # restore
    mask = zoom_to_shape(mask, image.shape)
    return mask
Combining¶
Now suppose we want to combine zooming and padding. We could do something like:
import numpy as np
from dpipe.medim.shape_ops import pad_to_shape, crop_to_shape, zoom, zoom_to_shape
def predict(image, network, min_shape, scale_factor):
    # zoom
    zoomed = zoom(image, scale_factor)
    # ---
    # pad the zoomed image, not the original one
    padded = pad_to_shape(zoomed, np.maximum(zoomed.shape, min_shape))
    # predict
    mask = network(padded)
    # restore
    mask = crop_to_shape(mask, np.minimum(mask.shape, zoomed.shape))
    # ---
    mask = zoom_to_shape(mask, image.shape)
    return mask
Note how the content of predict is divided in two regions: basically it looks like the function predict_zoom but with the line mask = network(zoomed) replaced by the body of predict_pad.
Basically, it means that we can pass predict_pad as the network argument and reuse the functions we defined above:
def predict(image, network, min_shape, scale_factor):
    def network_(x):
        return predict_pad(x, network, min_shape)
    return predict_zoom(image, network_, scale_factor)
predict_pad “wraps” the original network - it behaves like network, and predict_zoom doesn’t really care whether it received the original network or a wrapped one.
This sounds just like a decorator (a very good explanation can be found here).
If we implement predict_pad and predict_zoom as decorators we can more easily reuse them:
def predict_pad(min_shape):
    def decorator(network):
        def predict(image):
            # pad
            padded = pad_to_shape(image, np.maximum(image.shape, min_shape))
            # predict
            mask = network(padded)
            # restore
            mask = crop_to_shape(mask, np.minimum(mask.shape, image.shape))
            return mask
        return predict
    return decorator
def predict_zoom(scale_factor):
    def decorator(network):
        def predict(image):
            # zoom
            zoomed = zoom(image, scale_factor)
            # predict on the zoomed image
            mask = network(zoomed)
            # restore
            mask = zoom_to_shape(mask, image.shape)
            return mask
        return predict
    return decorator
Then the same predict can be defined like so:
@predict_zoom(0.5)
@predict_pad((256, 256))
def predict(image):
    # here the image is already zoomed and padded
    return network(image)
Now predict is just a function that receives a single argument - the image.
If you don’t like the decorator approach you can use a handy function for that:
from dpipe.predict.functional import chain_decorators
predict = chain_decorators(
    predict_zoom(0.5),
    predict_pad((256, 256)),
    predict=network,
)
which gives the same function.
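The equivalence between stacked decorators and a chaining helper can be checked on a toy example. This is a sketch of the idea rather than dpipe's implementation: chain_decorators_sketch, add_prefix and the string-based network are all hypothetical.

```python
def chain_decorators_sketch(*decorators, predict):
    """Wrap predict so that the first decorator ends up outermost."""
    # apply the last decorator first, as stacked @-decorators do
    for decorator in reversed(decorators):
        predict = decorator(predict)
    return predict


def add_prefix(prefix):
    """A toy preprocessing decorator: prepend a prefix to the input."""
    def decorator(func):
        def wrapper(text):
            return func(prefix + text)
        return wrapper
    return decorator


def network(text):
    return text.upper()


chained = chain_decorators_sketch(add_prefix('a'), add_prefix('b'), predict=network)


@add_prefix('a')
@add_prefix('b')
def stacked(text):
    return network(text)


chained_result = chained('x')
stacked_result = stacked('x')
```

The outermost decorator sees the input first, so 'a' is prepended before 'b' in both versions.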
Working with patches¶
If your pipeline requires images of a given shape, you may want to split larger images into patches, perform some operations and then combine the results.
!wget https://www.bluecross.org.uk/sites/default/files/d8/assets/images/118809lprLR.jpg
import numpy as np
from imageio import imread
import matplotlib.pyplot as plt
%matplotlib inline
image = imread('118809lprLR.jpg')
plt.imshow(image)
Probability maps¶
from torchvision.models import resnet50
from torchvision.transforms import Normalize
model = resnet50(pretrained=True)
# resnet requires normalization
normalize = Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
We’ll classify this image by averaging the logits on each patch. We’ll be taking patches in a convolution-like fashion, i.e. with a fixed stride.
from dpipe.medim import grid
from dpipe.torch import to_var, to_np
from scipy.special import softmax
from dpipe.medim.shape_utils import shape_after_convolution
x = np.moveaxis(image.astype('float32'), -1, 0) # move channels forward
x = x / 256
probas = []
for patch in grid.divide(x, patch_size=(256, 256), stride=32, valid=True):
    # move the patch to the same device as the model
    patch = to_var(patch, device=model)
    patch = normalize(patch)
    pred = to_np(model(patch[None])[0])
    pred = softmax(pred)
    # according to https://gist.github.com/yrevar/942d3a0ac09ec9e5eb3a
    # 281 is "tabby, tabby cat"
    probas.append(pred[281][None, None])
output_shape = shape_after_convolution(x.shape[1:], kernel_size=256, stride=32)
# combine "patches" of shape (1, 1) into an image of `output_shape` with stride 1
heatmap = grid.combine(probas, output_shape, stride=(1, 1))
plt.figure(figsize=(20, 10))
plt.subplot(1, 2, 1)
plt.imshow(heatmap)
plt.subplot(1, 2, 2)
plt.imshow(image)
Patches segmentation¶
from torchvision.models.segmentation import fcn_resnet101
model = fcn_resnet101(pretrained=True)
pred.shape
x = np.moveaxis(image.astype('float32'), -1, 0) # move channels forward
x = x / 256
probas = []
for patch in grid.divide(x, patch_size=(256, 256), stride=32):
    # move the patch to the same device as the model
    patch = to_var(patch, device=model)
    patch = normalize(patch)
    pred = model(patch[None])['out'][0]
    pred = to_np(pred)
    # 'cat' is 8
    pred = pred[8]
    probas.append(pred)
segmentation = grid.combine(probas, x.shape[1:], stride=(32, 32))
plt.figure(figsize=(20, 10))
plt.subplot(1, 2, 1)
plt.imshow(segmentation)
plt.subplot(1, 2, 2)
plt.imshow(image)
Using predictors¶
The previous approach is quite a common pattern: split -> segment -> combine, which is why there is a predictor that reduces boilerplate code:
from dpipe.predict import patches_grid
@patches_grid(patch_size=(256, 256), stride=(32, 32), padding_values=None)
def segment(patch):
    patch = to_var(patch, device=model)
    patch = normalize(patch)
    pred = model(patch[None])['out'][0]
    # 'cat' is 8
    return to_np(pred[8])
You can then reuse this function:
segmentation = segment(image)
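The split -> predict -> combine pattern is easier to follow in one dimension. The sketch below is a pure-Python illustration, assuming that overlapping regions are averaged when the patches are combined; divide_sketch and combine_sketch are hypothetical names and operate on lists rather than arrays.

```python
def divide_sketch(signal, patch_size, stride):
    """Yield overlapping patches of a 1D signal taken with a fixed stride."""
    for start in range(0, len(signal) - patch_size + 1, stride):
        yield signal[start:start + patch_size]


def combine_sketch(patches, length, stride):
    """Put the patches back in place, averaging the overlapping regions."""
    total = [0.0] * length
    counts = [0] * length
    for index, patch in enumerate(patches):
        start = index * stride
        for offset, value in enumerate(patch):
            total[start + offset] += value
            counts[start + offset] += 1
    return [t / c if c else 0.0 for t, c in zip(total, counts)]


signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
patches = list(divide_sketch(signal, patch_size=4, stride=2))
# an identity "network": combining unchanged patches restores the signal
restored = combine_sketch(patches, len(signal), stride=2)
```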
Wrappers¶
Consider the following dataset, which is a simple loader for MNIST:
class MNIST:
    # ...
    def load_image(self, identifier: str):
        return self.xs[int(identifier)]
    def load_label(self, identifier: str):
        return self.ys[int(identifier)]
# The full implementation can be found at `dpipe.tests.mnist.resources`:
# from dpipe.tests.mnist.resources import MNIST
dataset = MNIST('PATH TO DATA')
dataset.load_image(0).shape, dataset.load_label(0)
((1, 28, 28), 5)
Next, suppose you want to upsample the images by a factor of 2.
There are several solutions:
- Rewrite the dataset - breaks compatibility, not reusable
- Write a new dataset - not reusable, generates a lot of repetitive code
- Subclass the dataset - not reusable
- Wrap the dataset
Wrappers are handy when you need to change the dataset’s behaviour in a reusable way.
You can think of a wrapper as an additional layer around the original dataset. In case of upsampling it could look something like this:
from dpipe.dataset.wrappers import Proxy
from dpipe.medim.shape_ops import zoom
class UpsampleWrapper(Proxy):
    def load_image(self, identifier):
        # self._shadowed is the original dataset
        image = self._shadowed.load_image(identifier)
        image = zoom(image, [2, 2])
        return image
upsampled = UpsampleWrapper(dataset)
upsampled.load_image(0).shape, upsampled.load_label(0)
((1, 56, 56), 5)
Now this wrapper can be reused with other datasets that have the load_image method. Note that load_label is also working, even though it wasn’t defined in the wrapper.
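The reason load_label keeps working is attribute delegation. A minimal sketch of such a proxy, assuming it is built on __getattr__ (the actual Proxy class may be implemented differently), with hypothetical ProxySketch, ToyDataset and RepeatWrapper classes:

```python
class ProxySketch:
    """Forward every attribute that is not overridden to the wrapped object."""

    def __init__(self, shadowed):
        self._shadowed = shadowed

    def __getattr__(self, name):
        # called only when the normal lookup fails, so overridden methods win
        return getattr(self._shadowed, name)


class ToyDataset:
    def load_image(self, identifier):
        return [int(identifier)] * 2

    def load_label(self, identifier):
        return int(identifier) % 2


class RepeatWrapper(ProxySketch):
    def load_image(self, identifier):
        # change the behaviour of a single method
        image = self._shadowed.load_image(identifier)
        return image * 2


wrapped = RepeatWrapper(ToyDataset())
image = wrapped.load_image('3')
label = wrapped.load_label('3')  # delegated to ToyDataset
```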
dpipe already has a collection of predefined wrappers, for example, you can apply upsampling as follows:
from dpipe.dataset.wrappers import apply
upsampled = apply(dataset, load_image=lambda image: zoom(image, [2, 2]))
or in a more functional fashion:
from functools import partial
upsampled = apply(dataset, load_image=partial(zoom, scale_factor=[2, 2]))