gluonts.dataset.parallelized_loader module

class gluonts.dataset.parallelized_loader.ParallelDataLoader(dataset: Iterable[Dict[str, Any]], transformation: gluonts.transform._base.Transformation, cyclic: bool, is_train: bool, batch_size: int, ctx: mxnet.context.Context, dtype: gluonts.core.component.DType = <class 'numpy.float32'>, batchify_fn: Callable = <function batchify>, num_prefetch: Optional[int] = None, num_workers: Optional[int] = None, shuffle_buffer_length: Optional[int] = None)[source]

Bases: object

Loads data from a dataset and returns mini-batches of data.

  • dataset – The dataset from which to load data.

  • transformation – A transformation to apply to each entry in the dataset.

  • cyclic – Whether the dataset in question should be cycled.

  • is_train – Whether the dataset in question is used for training.

  • batch_size – Size of mini-batch.

  • ctx – MXNet context to use to store data.

  • dtype – Floating point type to use.

  • num_workers – The number of multiprocessing workers to use for data preprocessing. By default 0, in which case no multiprocessing will be utilized.

  • num_prefetch – The number of prefetching batches only works if num_workers > 0. If prefetch > 0, it allow worker process to prefetch certain batches before acquiring data from iterators. Note that using large prefetching batch will provide smoother bootstrapping performance, but will consume more shared_memory. Using smaller number may forfeit the purpose of using multiple worker processes, try reduce num_workers in this case. By default it defaults to num_workers * 2.

  • shuffle_buffer_length – The length of the buffer used to do pseudo shuffle. If not None, the loader will perform pseudo shuffle when generating batches. Note that using a larger buffer will provide more randomized batches, but will make the job require a bit more time to be done.

class gluonts.dataset.parallelized_loader.ShuffleIter(base_iterator: Iterator[Dict[str, Any]], shuffle_buffer_length: int)[source]

Bases: typing.Iterator

A wrapper class which takes a serialized iterator as an input and generates a pseudo randomized iterator using the same elements from the input iterator.

gluonts.dataset.parallelized_loader.batchify(data: List[dict], dtype: gluonts.core.component.DType, multi_processing: bool, single_process_ctx: Optional[mxnet.context.Context] = None, variable_length: bool = False) → Dict[str, Any][source]

reduce the list of dictionaries to a single dictionary, where values referenced by identical key are reduced using the stack function

gluonts.dataset.parallelized_loader.rebuild_ndarray(pid, fd, shape, dtype)[source]

Rebuild ndarray from pickled shared memory


Reduce ndarray to shared memory handle

gluonts.dataset.parallelized_loader.stack(data, multi_processing: bool, dtype: gluonts.core.component.DType, single_process_ctx: Optional[mxnet.context.Context] = None, variable_length: bool = False)[source]

Stack a list of data. Used when creating a single batch from list of dicts depending on whether multiprocessing is turned on, the batches will be constructed using different memory allocation techniques. If variable_length is specified, the data will be ‘padded’ with zeros along the first axis.

  • data (List) – Lists of array-like, stacked into data batches and loaded to appropriate memory (according to whether multi_processing is specified).

  • multi_processing (bool) – If True, data will be loaded to mxnet ndarrays on shared CPU memory.

  • dtype (DType) –

  • single_process_ctx (Optional[mx.Context]) –

  • variable_length (bool) – If True, the function will check if the list of data are “stackable”, i.e., they have matching axes. If not, it will assume that the first dimension of each array is heterogeneous (i.e., ragged) and will pad this axis before stacking.