Table Of Contents

gluonts.model.transformer.layers module

class gluonts.model.transformer.layers.InputLayer(model_size: int = 64, **kwargs)[source]

Bases: mxnet.gluon.block.HybridBlock

Transforms the input vector to model_size with a one-layer MLP, i.e., (batch_size, time_length, input_dim) -> (batch_size, time_length, model_size)

hybrid_forward(F, data: Union[mxnet.ndarray.ndarray.NDArray, mxnet.symbol.symbol.Symbol], *args)[source]

Overrides to construct symbolic graph for this Block.

Parameters:
  • data (Symbol or NDArray) – The first input tensor.
  • *args (list of Symbol or list of NDArray) – Additional input tensors.
class gluonts.model.transformer.layers.LayerNormalization(scale_init: str = 'ones', shift_init: str = 'zeros', eps: float = 1e-06, **kwargs)[source]

Bases: mxnet.gluon.block.HybridBlock

Implements layer normalization as proposed in [BKH16].

hybrid_forward(F, data: Union[mxnet.ndarray.ndarray.NDArray, mxnet.symbol.symbol.Symbol]) → Union[mxnet.ndarray.ndarray.NDArray, mxnet.symbol.symbol.Symbol][source]

Normalizes hidden units of data as follows:

data = scale * (data - mean) / sqrt(var + eps) + shift

Normalization is performed over the last dimension of the input data.

Parameters:data – Data to normalize, of shape (d0, …, dn, num_hidden)
Returns:Normalized data
Return type:Tensor of shape (d0, …, dn, num_hidden)
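The normalization formula above can be sketched in plain NumPy (an illustrative re-implementation of the math, not the MXNet HybridBlock code):

```python
import numpy as np

def layer_norm(data, scale, shift, eps=1e-6):
    # Normalize over the last dimension, then apply learned scale and shift:
    # scale * (data - mean) / sqrt(var + eps) + shift
    mean = data.mean(axis=-1, keepdims=True)
    var = data.var(axis=-1, keepdims=True)
    return scale * (data - mean) / np.sqrt(var + eps) + shift

x = np.random.randn(2, 5, 8)                      # (batch, time, num_hidden)
y = layer_norm(x, np.ones(8), np.zeros(8))        # same shape as x
```

With scale initialized to ones and shift to zeros (the defaults above), each position's hidden units come out with approximately zero mean and unit variance.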
class gluonts.model.transformer.layers.MultiHeadAttention(att_dim_in: int = 32, heads: int = 8, att_dim_out: int = 32, dropout: float = 0.0, **kwargs)[source]

Bases: gluonts.model.transformer.layers.MultiHeadAttentionBase

Multi-head attention layer for queries independent from keys/values.

Parameters:
  • att_dim_in – Attention dimension (number of hidden units)
  • heads – Number of attention heads
  • att_dim_out – Output dimension (number of output units)
  • dropout – Dropout rate on attention scores
hybrid_forward(F, queries: Union[mxnet.ndarray.ndarray.NDArray, mxnet.symbol.symbol.Symbol], memory: Union[mxnet.ndarray.ndarray.NDArray, mxnet.symbol.symbol.Symbol], mask: Union[mxnet.ndarray.ndarray.NDArray, mxnet.symbol.symbol.Symbol, None] = None) → Union[mxnet.ndarray.ndarray.NDArray, mxnet.symbol.symbol.Symbol][source]

Computes multi-head attention for queries given a memory tensor. An optional mask tensor may be used to mask the attention scores. Returns a tensor of shape (batch_size, query_max_length, att_dim_out).

Parameters:
  • queries – Queries tensor of shape (batch_size, query_max_length, att_dim_in)
  • memory – Memory tensor to attend to of shape (batch_size, memory_max_length, att_dim_in)
  • mask – Optional tensor to mask attention scores
Returns:

Return type:

Tensor of shape (batch_size, query_max_length, att_dim_out)

class gluonts.model.transformer.layers.MultiHeadAttentionBase(att_dim_in: int = 32, heads: int = 8, att_dim_out: int = 32, dropout: float = 0.0, **kwargs)[source]

Bases: mxnet.gluon.block.HybridBlock

Base class for Multi-head attention.

Parameters:
  • att_dim_in – Attention dimension (number of hidden units)
  • heads – Number of attention heads
  • att_dim_out – Output dimension (number of output units)
  • dropout – Dropout rate on attention scores
hybrid_forward(F, *args, **kwargs)[source]

Overrides to construct symbolic graph for this Block.

Parameters:
  • x (Symbol or NDArray) – The first input tensor.
  • *args (list of Symbol or list of NDArray) – Additional input tensors.
class gluonts.model.transformer.layers.MultiHeadSelfAttention(att_dim_in: int = 32, heads: int = 8, att_dim_out: int = 32, dropout: float = 0.0, **kwargs)[source]

Bases: gluonts.model.transformer.layers.MultiHeadAttentionBase

Multi-head self-attention. Independent linear projections of inputs serve as queries, keys, and values for the attention.

Parameters:
  • att_dim_in – Attention dimension (number of hidden units)
  • heads – Number of attention heads
  • att_dim_out – Output dimension (number of output units)
  • dropout – Dropout rate on attention scores
hybrid_forward(F, inputs: Union[mxnet.ndarray.ndarray.NDArray, mxnet.symbol.symbol.Symbol], mask: Union[mxnet.ndarray.ndarray.NDArray, mxnet.symbol.symbol.Symbol, None] = None, cache: Optional[Dict[str, Union[mxnet.ndarray.ndarray.NDArray, mxnet.symbol.symbol.Symbol, None]]] = None) → Tuple[Union[mxnet.ndarray.ndarray.NDArray, mxnet.symbol.symbol.Symbol], Optional[Dict]][source]

Computes multi-head attention on a set of inputs that serve as queries, keys, and values. An optional mask tensor may be used to mask the attention scores, and an optional cache of previously computed keys and values may be used to avoid recomputation during decoding.

Parameters:
  • inputs – Input data of shape (batch_size, max_length, att_dim_in)
  • mask – Optional tensor to mask attention scores
  • cache – Optional dictionary of previously computed keys and values
Returns:

A tensor of shape (batch_size, max_length, att_dim_out)

Return type:

Tensor
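The cache mechanics can be sketched as follows (a NumPy illustration; the "k"/"v" entry names are assumptions for this sketch, not necessarily the library's actual cache keys):

```python
import numpy as np

def update_cache(cache, new_keys, new_values):
    # Append this step's keys/values to those from earlier decoding steps,
    # so attention can cover the full history without recomputing it.
    # ("k"/"v" entry names are illustrative, not the library's actual keys.)
    if cache.get("k") is not None:
        new_keys = np.concatenate([cache["k"], new_keys], axis=1)
        new_values = np.concatenate([cache["v"], new_values], axis=1)
    cache["k"], cache["v"] = new_keys, new_values
    return new_keys, new_values

cache = {}
k1, _ = update_cache(cache, np.zeros((2, 1, 8)), np.zeros((2, 1, 8)))
k2, _ = update_cache(cache, np.ones((2, 1, 8)), np.ones((2, 1, 8)))
# k2 now covers both decoding steps along the time axis: shape (2, 2, 8)
```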

class gluonts.model.transformer.layers.TransformerFeedForward(inner_dim: int = 32, out_dim: int = 32, act_type: str = 'softrelu', dropout: float = 0.0, **kwargs)[source]

Bases: mxnet.gluon.block.HybridBlock

Position-wise feed-forward network with activation.

\[activation(XW_1 + b_1)W_2 + b_2\]

\(W_1\): (in_dim, inner_dim) \(W_2\): (inner_dim, out_dim), shared across all positions

hybrid_forward(F, x: Union[mxnet.ndarray.ndarray.NDArray, mxnet.symbol.symbol.Symbol], *args) → Union[mxnet.ndarray.ndarray.NDArray, mxnet.symbol.symbol.Symbol][source]

Position-wise feed-forward network with activation.

Parameters:x – Tensor of shape (batch_size, d, in_dim)
Returns:
Return type:Tensor of shape (batch_size, d, out_dim)
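A minimal NumPy sketch of the position-wise computation (illustrative only; "softrelu" is MXNet's name for the softplus activation):

```python
import numpy as np

def softrelu(x):
    # MXNet's 'softrelu' activation is softplus: log(1 + exp(x))
    return np.log1p(np.exp(x))

def feed_forward(x, W1, b1, W2, b2):
    # activation(x W1 + b1) W2 + b2, applied independently at each position
    return softrelu(x @ W1 + b1) @ W2 + b2

d, in_dim, inner_dim, out_dim = 4, 8, 32, 8
x = np.random.randn(2, d, in_dim)
W1, b1 = np.random.randn(in_dim, inner_dim) * 0.1, np.zeros(inner_dim)
W2, b2 = np.random.randn(inner_dim, out_dim) * 0.1, np.zeros(out_dim)
out = feed_forward(x, W1, b1, W2, b2)   # shape (2, 4, 8)
```

Because the same weights are applied at every position, the network never mixes information across the time dimension; that mixing is left entirely to the attention layers.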
class gluonts.model.transformer.layers.TransformerProcessBlock(sequence: str, dropout: float, **kwargs)[source]

Bases: mxnet.gluon.block.HybridBlock

Block to perform pre/post processing on layer inputs. The processing steps are determined by the sequence argument, which may contain any of three operations:
  • n: layer normalization
  • r: residual connection
  • d: dropout

hybrid_forward(F, data: Union[mxnet.ndarray.ndarray.NDArray, mxnet.symbol.symbol.Symbol], prev: Union[mxnet.ndarray.ndarray.NDArray, mxnet.symbol.symbol.Symbol, None] = None) → Union[mxnet.ndarray.ndarray.NDArray, mxnet.symbol.symbol.Symbol][source]

Apply processing sequence to data with optional previous input.

Parameters:
  • data – Input data of shape: (batch_size, length, num_hidden)
  • prev – Previous data of shape (batch_size, length, num_hidden)
Returns:

Return type:

Processed data of shape (batch_size, length, num_hidden)
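A NumPy sketch of how such a processing sequence might be interpreted (illustrative, not the actual HybridBlock implementation):

```python
import numpy as np

def process(sequence, data, prev=None, dropout=0.0):
    # Apply each step of the sequence string in order:
    # n = layer normalization, r = residual connection, d = dropout.
    for step in sequence:
        if step == "n":
            mean = data.mean(axis=-1, keepdims=True)
            var = data.var(axis=-1, keepdims=True)
            data = (data - mean) / np.sqrt(var + 1e-6)
        elif step == "r":
            data = data + prev
        elif step == "d" and dropout > 0.0:
            keep = np.random.rand(*data.shape) >= dropout
            data = data * keep / (1.0 - dropout)   # inverted dropout scaling
    return data

x = np.random.randn(2, 5, 8)
y = process("drn", x, prev=x)   # dropout (off here), residual, then normalize
```

A typical post-processing sequence such as "drn" adds dropout and a residual connection before normalizing, which matches the post-norm transformer layout.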

gluonts.model.transformer.layers.combine_heads(F, x: Union[mxnet.ndarray.ndarray.NDArray, mxnet.symbol.symbol.Symbol], dim_per_head: int, heads: int) → Union[mxnet.ndarray.ndarray.NDArray, mxnet.symbol.symbol.Symbol][source]
Parameters:
  • x – Tensor of shape (batch_size * heads, time_length, dim_per_head)
  • dim_per_head – Dimension per head
  • heads – Number of heads
Returns:

Return type:

Tensor of shape (batch_size, time_length, dim)

gluonts.model.transformer.layers.dot_attention(F, queries: Union[mxnet.ndarray.ndarray.NDArray, mxnet.symbol.symbol.Symbol], keys: Union[mxnet.ndarray.ndarray.NDArray, mxnet.symbol.symbol.Symbol], values: Union[mxnet.ndarray.ndarray.NDArray, mxnet.symbol.symbol.Symbol], mask: Union[mxnet.ndarray.ndarray.NDArray, mxnet.symbol.symbol.Symbol, None] = None, dropout: float = 0.0) → Union[mxnet.ndarray.ndarray.NDArray, mxnet.symbol.symbol.Symbol][source]
Parameters:
  • queries – Attention queries of shape (n, lq, d)
  • keys – Attention keys of shape (n, lk, d)
  • values – Attention values of shape (n, lk, dv)
  • mask – Optional mask tensor
  • dropout – Dropout rate
Returns:

Return type:

‘Context’ vectors for each query of shape (n, lq, dv)
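The computation can be sketched in NumPy (an illustrative version of scaled dot-product attention, not the symbolic MXNet implementation; the boolean-mask convention here, True = keep, is an assumption of the sketch):

```python
import numpy as np

def dot_attention(queries, keys, values, mask=None):
    # Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V
    d = queries.shape[-1]
    scores = queries @ keys.transpose(0, 2, 1) / np.sqrt(d)   # (n, lq, lk)
    if mask is not None:
        scores = np.where(mask, scores, -1e9)   # masked entries get ~0 weight
    # Numerically stable softmax over the key dimension
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ values                                    # (n, lq, dv)

q = np.random.randn(3, 4, 8)    # (n, lq, d)
k = np.random.randn(3, 6, 8)    # (n, lk, d)
v = np.random.randn(3, 6, 5)    # (n, lk, dv)
ctx = dot_attention(q, k, v)    # shape (3, 4, 5)
```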

gluonts.model.transformer.layers.split_heads(F, x: Union[mxnet.ndarray.ndarray.NDArray, mxnet.symbol.symbol.Symbol], dim_per_head: int, heads: int) → Union[mxnet.ndarray.ndarray.NDArray, mxnet.symbol.symbol.Symbol][source]

Returns a tensor with head dimension folded into batch and last dimension divided by the number of heads.

Parameters:
  • x – Tensor of shape (batch_size, time_length, dim).
  • dim_per_head – Dimension per head
  • heads – Number of heads
Returns:

Return type:

Tensor of shape (batch_size * heads, time_length, dim_per_head)
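Both helpers amount to reshape/transpose operations; a NumPy sketch (illustrative) shows that combine_heads inverts split_heads:

```python
import numpy as np

def split_heads(x, dim_per_head, heads):
    # (batch, time, dim) -> (batch * heads, time, dim_per_head)
    b, t, _ = x.shape
    x = x.reshape(b, t, heads, dim_per_head)   # unfold last dim into heads
    x = x.transpose(0, 2, 1, 3)                # (b, heads, t, dim_per_head)
    return x.reshape(b * heads, t, dim_per_head)

def combine_heads(x, dim_per_head, heads):
    # Inverse of split_heads: fold the heads back into the last dimension.
    bh, t, _ = x.shape
    b = bh // heads
    x = x.reshape(b, heads, t, dim_per_head).transpose(0, 2, 1, 3)
    return x.reshape(b, t, heads * dim_per_head)

x = np.random.randn(2, 5, 32)                  # dim = 8 heads * 4 per head
h = split_heads(x, 4, 8)                       # shape (16, 5, 4)
y = combine_heads(h, 4, 8)                     # round-trips back to x
```

Folding heads into the batch dimension lets a single batched matrix multiply compute attention for all heads at once.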