LayerNorm
---------

class torch.nn.LayerNorm(normalized_shape, eps=1e-05, elementwise_affine=True)
[source]
Applies Layer Normalization over a mini-batch of inputs as described in the paper Layer Normalization (https://arxiv.org/abs/1607.06450).

    y = (x − E[x]) / √(Var[x] + ε) * γ + β

The mean and standard-deviation are calculated over the last D dimensions, where D is the dimensionality of normalized_shape. γ and β are learnable affine transform parameters of shape normalized_shape if elementwise_affine is True. The standard-deviation is calculated via the biased estimator, equivalent to torch.var(input, unbiased=False).
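To make the formula concrete, here is a minimal sketch (an illustrative example, not from the original page; the tensor x and its shape are arbitrary) that reproduces the module's output by hand using the biased variance mentioned above:

>>> # Illustrative sketch: reproduce LayerNorm by hand, without the
>>> # affine parameters, using the biased variance estimator.
>>> import torch
>>> import torch.nn as nn
>>> x = torch.randn(4, 8)
>>> ln = nn.LayerNorm(8, elementwise_affine=False)
>>> mean = x.mean(dim=-1, keepdim=True)
>>> var = x.var(dim=-1, unbiased=False, keepdim=True)  # biased estimator
>>> manual = (x - mean) / torch.sqrt(var + ln.eps)
>>> torch.allclose(ln(x), manual, atol=1e-6)
True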
Note

Unlike Batch Normalization and Instance Normalization, which apply a scalar scale and bias to each entire channel/plane with the affine option, Layer Normalization applies per-element scale and bias with elementwise_affine.

This layer uses statistics computed from input data in both training and evaluation modes.
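As a quick check of that last point (an illustrative snippet, not from the original page), switching to eval mode does not change the result, since LayerNorm keeps no running statistics:

>>> # Illustrative sketch: identical output in train and eval modes.
>>> import torch
>>> import torch.nn as nn
>>> m = nn.LayerNorm(10)
>>> x = torch.randn(2, 10)
>>> y_train = m(x)        # module is in training mode by default
>>> y_eval = m.eval()(x)
>>> torch.equal(y_train, y_eval)
True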
Parameters

- normalized_shape (int or list or torch.Size) – input shape from an expected input of size

      (*, normalized_shape[0], normalized_shape[1], ..., normalized_shape[-1])

  If a single integer is used, it is treated as a singleton list, and this module will normalize over the last dimension, which is expected to be of that specific size.
- eps (float) – a value added to the denominator for numerical stability. Default: 1e-5
- elementwise_affine (bool) – if True, this module has learnable per-element affine parameters initialized to ones (for weights) and zeros (for biases), as illustrated in the sketch after this list. Default: True
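A minimal sketch of the affine parameters (an illustrative example, not from the original page; the shape [10, 10] is arbitrary), showing that weight and bias take the shape of normalized_shape:

>>> # Illustrative sketch: with elementwise_affine=True (the default),
>>> # weight and bias are parameters of shape normalized_shape.
>>> import torch.nn as nn
>>> m = nn.LayerNorm([10, 10])
>>> m.weight.shape, m.bias.shape
(torch.Size([10, 10]), torch.Size([10, 10]))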
Shape:

- Input: (N, *)
- Output: (N, *) (same shape as input)
Examples:
>>> input = torch.randn(20, 5, 10, 10)
>>> # With Learnable Parameters
>>> m = nn.LayerNorm(input.size()[1:])
>>> # Without Learnable Parameters
>>> m = nn.LayerNorm(input.size()[1:], elementwise_affine=False)
>>> # Normalize over last two dimensions
>>> m = nn.LayerNorm([10, 10])
>>> # Normalize over last dimension of size 10
>>> m = nn.LayerNorm(10)
>>> # Activating the module
>>> output = m(input)