Batch iterators

Tools for creating batch iterators. See the Batch iterators tutorial for more details.

Pipeline

class dpipe.batch_iter.pipeline.Infinite(source: Iterable, *transformers: Union[Callable, Transform], batch_size: Union[int, Callable], batches_per_epoch: int, buffer_size: int = 1, combiner: Callable = combine_to_arrays, **kwargs)[source]

Bases: object

Combine source and transformers into a batch iterator that yields batches of size batch_size.

Parameters
  • source (Iterable) – an infinite iterable.

  • transformers (Callable) – callables, each of which transforms the objects generated by the previous element of the pipeline.

  • batch_size (int, Callable) – the size of each batch.

  • batches_per_epoch (int) – the number of batches to yield each epoch.

  • buffer_size (int) – the number of objects to keep buffered in each pipeline element. Default is 1.

  • combiner (Callable) – combines the objects of a single batch into batched components, e.g. combiner([(x, y), (x, y)]) -> ([x, x], [y, y]). Default is combine_to_arrays.

  • kwargs – additional keyword arguments passed to the combiner.

References

See the Batch iterators tutorial for more details.
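For intuition, the core behavior of Infinite can be sketched in a few lines of plain Python. This is an illustrative simplification, not dpipe's actual implementation: the real class supports Transform elements, background buffering, and array-producing combiners.

```python
from itertools import count, islice

def infinite(source, *transformers, batch_size, batches_per_epoch, combiner=tuple):
    # chain per-object transformers over the (infinite) source
    stream = source
    for transform in transformers:
        stream = map(transform, stream)

    def epoch():
        # cut the stream into batches_per_epoch chunks of batch_size objects
        for _ in range(batches_per_epoch):
            yield combiner(list(islice(stream, batch_size)))

    return epoch

make_epoch = infinite(count(), lambda x: x * 2, batch_size=3, batches_per_epoch=2)
print(list(make_epoch()))  # [(0, 2, 4), (6, 8, 10)]
```

Calling the epoch function again continues consuming the same infinite stream, which mirrors how Infinite yields a fresh set of batches_per_epoch batches each epoch.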

close()[source]

Stop all background processes.

property closing_callback

A callback that makes this interface compatible with Lightning, allowing for a safe release of resources.

Examples

>>> batch_iter = Infinite(...)
>>> trainer = Trainer(callbacks=[batch_iter.closing_callback, ...])
class dpipe.batch_iter.pipeline.Threads(func: Callable, *args, n_workers: int = 1, buffer_size: int = 1, **kwargs)[source]

Bases: Iterator

Apply func concurrently to each object in the batch iterator by moving it to n_workers threads.

Parameters
  • func (Callable) – a function to apply to each object in the batch iterator.

  • n_workers (int) – the number of threads across which func will be distributed.

  • buffer_size (int) – the number of objects to keep buffered.

  • args – additional positional arguments passed to func.

  • kwargs – additional keyword arguments passed to func.

References

See the Batch iterators tutorial for more details.
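A rough picture of what Threads does, sketched with the standard library's thread pool (the helper name is mine; unlike the real class, this version is eager and not bounded by buffer_size):

```python
from concurrent.futures import ThreadPoolExecutor

def threads(func, iterable, n_workers=1):
    # order-preserving concurrent map over a pool of n_workers threads
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        yield from pool.map(func, iterable)

print(list(threads(lambda x: x * x, [1, 2, 3], n_workers=2)))  # [1, 4, 9]
```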

class dpipe.batch_iter.pipeline.Loky(func: Callable, *args, n_workers: int = 1, buffer_size: int = 1, **kwargs)[source]

Bases: Transform

Apply func concurrently to each object in the batch iterator by moving it to n_workers processes.

Parameters
  • func (Callable) – a function to apply to each object in the batch iterator.

  • n_workers (int) – the number of processes across which func will be distributed.

  • buffer_size (int) – the number of objects to keep buffered.

  • args – additional positional arguments passed to func.

  • kwargs – additional keyword arguments passed to func.

Notes

Process-based parallelism is implemented with the loky backend.

References

See the Batch iterators tutorial for more details.

class dpipe.batch_iter.pipeline.Iterator(transform: Callable, *args, n_workers: int = 1, buffer_size: int = 1, **kwargs)[source]

Bases: Transform

Apply transform to the iterator of values that flow through the batch iterator.

Parameters
  • transform (Callable(Iterable) -> Iterable) – a function that takes an iterable and yields transformed values.

  • n_workers (int) – the number of threads to which transform will be moved.

  • buffer_size (int) – the number of objects to keep buffered.

  • args – additional positional arguments passed to transform.

  • kwargs – additional keyword arguments passed to transform.

References

See the Batch iterators tutorial for more details.
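The difference from per-object wrappers like Threads is that an Iterator transform receives the whole stream. A hypothetical transform of that shape, Callable(Iterable) -> Iterable:

```python
def drop_none(values):
    # a stream-level transform: because it sees the entire iterable,
    # it can filter, repeat or buffer values, not just map over them
    for value in values:
        if value is not None:
            yield value

# hypothetical usage inside a pipeline: Iterator(drop_none)
print(list(drop_none([1, None, 2, None, 3])))  # [1, 2, 3]
```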

dpipe.batch_iter.pipeline.combine_batches(inputs)[source]

Combines tuples from inputs into batches: [(x, y), (x, y)] -> [(x, x), (y, y)]
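The transposition above is essentially a zip; a minimal sketch (the helper name is mine, and the real function may differ in output types):

```python
def combine_batches_sketch(inputs):
    # transpose a sequence of (x, y, ...) tuples into per-component tuples
    return tuple(zip(*inputs))

print(combine_batches_sketch([(1, 'a'), (2, 'b')]))  # ((1, 2), ('a', 'b'))
```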

dpipe.batch_iter.pipeline.combine_to_arrays(inputs)[source]

Combines tuples from inputs into batches of numpy arrays.

dpipe.batch_iter.pipeline.combine_pad(inputs, padding_values: Union[float, Sequence[float]] = 0, ratio: Union[float, Sequence[float]] = 0.5)[source]

Combines tuples from inputs into batches and pads each batch in order to obtain a correctly shaped numpy array.

Parameters
  • inputs

  • padding_values – values to pad with. If Callable (e.g. numpy.min) - padding_values(x) will be used.

  • ratio – the fraction of the padding applied to the left; 1.0 - ratio is applied to the right. The default of 0.5 splits the padding evenly between left and right.

References

pad_to_shape
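To illustrate the padding step in one dimension (this is a pure-Python sketch under my own naming; the real combine_pad works on numpy arrays of arbitrary dimensionality via pad_to_shape):

```python
def pad_batch_sketch(batch, padding_value=0, ratio=0.5):
    # pad every 1-D element to the length of the longest one, putting
    # round(ratio * missing) values on the left and the rest on the right
    target = max(len(x) for x in batch)
    padded = []
    for x in batch:
        missing = target - len(x)
        left = round(ratio * missing)
        padded.append([padding_value] * left + list(x) + [padding_value] * (missing - left))
    return padded

print(pad_batch_sketch([[1, 2, 3], [4]]))  # [[1, 2, 3], [0, 4, 0]]
```

With ratio=0 all the padding goes to the right; with ratio=1 it all goes to the left.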

Sources

dpipe.batch_iter.sources.sample(sequence: Sequence, weights: Optional[Sequence[float]] = None, random_state: Optional[Union[RandomState, int]] = None)[source]

Infinitely yield samples from sequence according to weights.

Parameters
  • sequence (Sequence) – the sequence of elements to sample from.

  • weights (Sequence[float], None, optional) – the weights associated with each element. If None, the weights are assumed to be equal. Should be the same size as sequence.

  • random_state (int, np.random.RandomState, None, optional) – if not None, used to seed the random number generator for reproducibility.
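The behavior can be sketched with the standard library's weighted sampler (the helper name and the use of random.Random instead of numpy's RandomState are my simplifications):

```python
import random
from itertools import islice

def sample_sketch(sequence, weights=None, seed=None):
    # endlessly draw (optionally weighted) samples from the sequence
    rng = random.Random(seed)
    while True:
        yield rng.choices(sequence, weights=weights, k=1)[0]

sampler = sample_sketch(['a', 'b'], weights=[0, 1], seed=0)
print(list(islice(sampler, 3)))  # ['b', 'b', 'b']
```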

dpipe.batch_iter.sources.load_by_random_id(*loaders: Callable, ids: Sequence, weights: Optional[Sequence[float]] = None, random_state: Optional[Union[RandomState, int]] = None)[source]

Infinitely yield objects loaded by loaders according to the identifier from ids. The identifiers are randomly sampled from ids according to the weights.

Parameters
  • loaders (Callable) – functions that load an object by its identifier.

  • ids (Sequence) – the sequence of identifiers to sample from.

  • weights (Sequence[float], None, optional) – The weights associated with each id. If None, the weights are assumed to be equal. Should be the same size as ids.

  • random_state (int, np.random.RandomState, None, optional) – if not None, used to seed the random number generator for reproducibility.

Blocks

class dpipe.batch_iter.expiration_pool.ExpirationPool(pool_size: int, repetitions: int, iterations: int = 1)[source]

Bases: Iterator

A simple expiration pool for time-consuming operations that don’t fit into RAM. See expiration_pool for details.

Examples

>>> batch_iter = Infinite(
...     # ... some expensive operations, e.g. loading from disk, or preprocessing
...     ExpirationPool(pool_size, repetitions),
...     # ... here are the values from the pool
...     # ... other lightweight operations
...     # ...
... )
dpipe.batch_iter.expiration_pool.expiration_pool(iterable: Iterable, pool_size: int, repetitions: int, iterations: int = 1)[source]

Caches pool_size items from iterable. An item is removed from the cache after it has been generated repetitions times. After an item is removed, a new one is extracted from the iterable. Finally, iterations controls how many values are generated each time a new value is added, thus speeding up the pipeline at early stages.
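The caching-and-eviction logic can be sketched as follows (an illustrative simplification under my own naming; it omits the iterations speed-up parameter):

```python
import random

def expiration_pool_sketch(iterable, pool_size, repetitions, seed=None):
    # keep up to pool_size cached items; yield random ones and evict an
    # item once it has been yielded `repetitions` times, refilling from
    # the source until it is exhausted
    rng = random.Random(seed)
    source = iter(iterable)
    pool = {}  # key -> [value, remaining_repetitions]
    next_key = 0
    while True:
        while len(pool) < pool_size:  # refill the pool from the source
            try:
                pool[next_key] = [next(source), repetitions]
                next_key += 1
            except StopIteration:
                break
        if not pool:
            return
        key = rng.choice(list(pool))
        value, remaining = pool[key]
        yield value
        if remaining == 1:
            del pool[key]
        else:
            pool[key][1] = remaining - 1

# every item is yielded exactly `repetitions` times, in a pool-local random order
print(sorted(expiration_pool_sketch(range(3), pool_size=2, repetitions=2, seed=0)))  # [0, 0, 1, 1, 2, 2]
```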

Utils

dpipe.batch_iter.utils.pad_batch_equal(batch, padding_values: Union[float, Sequence[float]] = 0, ratio: Union[float, Sequence[float]] = 0.5)[source]

Pad each element of batch to obtain a correctly shaped array.

References

pad_to_shape

dpipe.batch_iter.utils.unpack_args(func: Callable, *args, **kwargs)[source]

Returns a function that takes an iterable and unpacks it while calling func.

args and kwargs are passed to func as additional arguments.

Examples

>>> def add(x, y):
...     return x + y
>>> add_ = unpack_args(add)
>>> add(1, 2) == add_([1, 2])
True
dpipe.batch_iter.utils.multiply(func: Callable, *args, **kwargs)[source]

Returns a function that takes an iterable and maps func over it. Useful when multiple batches require the same function.

args and kwargs are passed to func as additional arguments.
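Roughly, multiply wraps func in a mapping closure; a sketch under my own naming (the real function may differ in the exact return type):

```python
def multiply_sketch(func, *args, **kwargs):
    # return a function that maps func over every element of an iterable
    def wrapper(values):
        return tuple(func(value, *args, **kwargs) for value in values)
    return wrapper

double_all = multiply_sketch(lambda v: v * 2)
print(double_all([1, 2, 3]))  # (2, 4, 6)
```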

dpipe.batch_iter.utils.apply_at(index: Union[int, Sequence[int]], func: Callable, *args, **kwargs)[source]

Returns a function that takes an iterable and applies func to the values at the corresponding index.

args and kwargs are passed to func as additional arguments.

Examples

>>> first_sqr = apply_at(0, np.square)
>>> first_sqr([3, 2, 1])
(9, 2, 1)
dpipe.batch_iter.utils.zip_apply(*functions: Callable, **kwargs)[source]

Returns a function that takes an iterable and zips functions over it.

kwargs are passed to each function as additional arguments.

Examples

>>> zipper = zip_apply(np.square, np.sqrt)
>>> zipper([4, 9])
(16, 3)
dpipe.batch_iter.utils.random_apply(p: float, func: Callable, *args, **kwargs)[source]

Returns a function that applies func with a given probability p.

args and kwargs are passed to func as additional arguments.
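The idea is a coin flip around func; a sketch with my own naming, using the standard library's random module (the real function likely relies on numpy's random state):

```python
import random

def random_apply_sketch(p, func, seed=None):
    # apply func with probability p, otherwise return the input unchanged
    rng = random.Random(seed)
    def wrapper(x):
        return func(x) if rng.random() < p else x
    return wrapper

always = random_apply_sketch(1.0, lambda x: -x)
never = random_apply_sketch(0.0, lambda x: -x)
print(always(5), never(5))  # -5 5
```

This pattern is typical for stochastic augmentations, e.g. flipping an image only half of the time.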

dpipe.batch_iter.utils.sample_args(func: Callable, *args: Callable, **kwargs: Callable)[source]

Returns a function that samples arguments for func from args and kwargs.

Each argument in args and kwargs must be a callable that samples a random value.

Examples

>>> from scipy.ndimage import rotate
>>>
>>> random_rotate = sample_args(rotate, angle=np.random.normal)
>>> random_rotate(x)
>>> # same as
>>> rotate(x, angle=np.random.normal())