From: @lihongkang1
Reviewed-by: @liangchenghui, @wuxuejian
Signed-off-by: @liangchenghui
tags/v1.2.0-rc1
@@ -96,7 +96,11 @@ class LogSoftmax(Cell):
     The input is transformed by the Softmax function and then by the log function to lie in range [-inf, 0).

     LogSoftmax is defined as:

-    :math:`\text{logsoftmax}(x_i) = \log \left(\frac{\exp(x_i)}{\sum_{j=0}^{n-1} \exp(x_j)}\right)`,
+    .. math::
+
+        \text{logsoftmax}(x_i) = \log \left(\frac{\exp(x_i)}{\sum_{j=0}^{n-1} \exp(x_j)}\right),

     where :math:`x_{i}` is the :math:`i`-th slice in the given dimension of the input Tensor.

     Args:
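As a quick companion to the formula in this hunk, here is a minimal NumPy sketch of log-softmax (an illustrative helper of my own, not the MindSpore kernel), using the usual max-subtraction trick for numerical stability:

    import numpy as np

    def log_softmax(x, axis=-1):
        # log(exp(x_i) / sum_j exp(x_j)) = (x_i - m) - log(sum_j exp(x_j - m)),
        # where m = max(x); subtracting the max keeps exp() from overflowing.
        shifted = x - np.max(x, axis=axis, keepdims=True)
        return shifted - np.log(np.sum(np.exp(shifted), axis=axis, keepdims=True))

    x = np.array([1.0, 2.0, 3.0])
    print(log_softmax(x))                # every value lies in [-inf, 0)
    print(np.exp(log_softmax(x)).sum())  # recovered probabilities sum to 1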
@@ -177,8 +181,13 @@ class ReLU(Cell):
     r"""
     Rectified Linear Unit activation function.

-    Applies the rectified linear unit function element-wise. It returns
-    element-wise :math:`\max(0, x)`, specially, the neurons with the negative output
+    Applies the rectified linear unit function element-wise.
+
+    .. math::
+
+        \text{ReLU}(x) = (x)^+ = \max(0, x),
+
+    It returns :math:`\max(0, x)` element-wise; that is, the neurons with negative output
     will be suppressed and the active neurons will stay the same.

     An illustration of ReLU can be found here: `ReLU <https://en.wikipedia.org/wiki/
@@ -215,7 +224,13 @@ class ReLU6(Cell):
     ReLU6 is similar to ReLU, with an upper limit of 6: if an input is greater than 6, the output
     will be clipped to 6.

-    It computes element-wise as :math:`\min(\max(0, x), 6)`. The input is a Tensor of any valid shape.
+    It computes element-wise as
+
+    .. math::
+
+        \min(\max(0, x), 6).
+
+    The input is a Tensor of any valid shape.

     Inputs:
         - **input_data** (Tensor) - The input of ReLU6.
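Both definitions in the two hunks above reduce to one-line element-wise clips; a minimal NumPy sketch (illustrative only):

    import numpy as np

    def relu(x):
        # max(0, x): negative activations are suppressed, the rest pass through.
        return np.maximum(0.0, x)

    def relu6(x):
        # min(max(0, x), 6): like ReLU, but additionally capped at 6.
        return np.minimum(np.maximum(0.0, x), 6.0)

    x = np.array([-2.0, 3.0, 9.0])
    print(relu(x))   # [0. 3. 9.]
    print(relu6(x))  # [0. 3. 6.]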
@@ -338,7 +353,12 @@ class GELU(Cell):
     Applies the GELU function to each element of the input. The input is a Tensor with any valid shape.

     GELU is defined as:

-    :math:`GELU(x_i) = x_i*P(X < x_i)`, where :math:`P` is the cumulative distribution function
+    .. math::
+
+        GELU(x_i) = x_i*P(X < x_i),
+
+    where :math:`P` is the cumulative distribution function
     of the standard Gaussian distribution and :math:`x_i` is an element of the input.

     An illustration of GELU can be found here: `GELU <https://en.wikipedia.org/wiki/
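Since :math:`P(X < x_i)` is the standard normal CDF, the exact definition can be written with the error function. A small sketch of the formula (the MindSpore kernel may instead use a tanh approximation; this is illustrative):

    import math

    def gelu(x):
        # GELU(x) = x * P(X < x), X ~ N(0, 1); the CDF is 0.5 * (1 + erf(x / sqrt(2))).
        return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

    print([round(gelu(v), 4) for v in (-1.0, 0.0, 1.0)])  # [-0.1587, 0.0, 0.8413]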
@@ -417,7 +437,12 @@ class Sigmoid(Cell):
     Applies sigmoid-type activation element-wise.

     The sigmoid function is defined as:

-    :math:`\text{sigmoid}(x_i) = \frac{1}{1 + \exp(-x_i)}`, where :math:`x_i` is the element of the input.
+    .. math::
+
+        \text{sigmoid}(x_i) = \frac{1}{1 + \exp(-x_i)},
+
+    where :math:`x_i` is an element of the input.

     An illustration of Sigmoid can be found here: `Sigmoid <https://en.wikipedia.org/wiki/
     Sigmoid_function#/media/File:Logistic-curve.svg>`_.
@@ -453,8 +478,13 @@ class PReLU(Cell):
     Applies the PReLU function element-wise.

-    PReLU is defined as: :math:`prelu(x_i)= \max(0, x_i) + w * \min(0, x_i)`, where :math:`x_i`
-    is an element of a channel of the input.
+    PReLU is defined as:
+
+    .. math::
+
+        prelu(x_i) = \max(0, x_i) + w * \min(0, x_i),
+
+    where :math:`x_i` is an element of a channel of the input.

     Here :math:`w` is a learnable parameter with a default initial value 0.25.
     Parameter :math:`w` has the dimensionality of the argument channel. If called without argument
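A compact NumPy sketch covering both of the preceding definitions (illustrative; in the real op :math:`w` is a learnable per-channel parameter, here it is just a scalar):

    import numpy as np

    def sigmoid(x):
        # 1 / (1 + exp(-x)), element-wise.
        return 1.0 / (1.0 + np.exp(-x))

    def prelu(x, w=0.25):
        # max(0, x) + w * min(0, x); w is the learned slope for negative inputs.
        return np.maximum(0.0, x) + w * np.minimum(0.0, x)

    x = np.array([-2.0, 0.0, 2.0])
    print(sigmoid(x))  # [0.1192 0.5    0.8808]
    print(prelu(x))    # [-0.5  0.   2. ]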
@@ -98,18 +98,18 @@ class Dropout(Cell):
     Randomly set some elements of the input tensor to zero with probability :math:`1 - keep\_prob` during training
     using samples from a Bernoulli distribution.

-    Note:
-        Each channel will be zeroed out independently on every construct call.
-
-        The outputs are scaled by a factor of :math:`\frac{1}{keep\_prob}` during training so
-        that the output layer remains at a similar scale. During inference, this
-        layer returns the same tensor as the input.
+    The outputs are scaled by a factor of :math:`\frac{1}{keep\_prob}` during training so
+    that the output layer remains at a similar scale. During inference, this
+    layer returns the same tensor as the input.

-        This technique is proposed in the paper `Dropout: A Simple Way to Prevent Neural Networks from Overfitting
-        <http://www.cs.toronto.edu/~rsalakhu/papers/srivastava14a.pdf>`_ and has proved effective at reducing
-        over-fitting and preventing neurons from co-adaptation. See more details in `Improving neural networks by
-        preventing co-adaptation of feature detectors
-        <https://arxiv.org/pdf/1207.0580.pdf>`_.
+    This technique is proposed in the paper `Dropout: A Simple Way to Prevent Neural Networks from Overfitting
+    <http://www.cs.toronto.edu/~rsalakhu/papers/srivastava14a.pdf>`_ and has proved effective at reducing
+    over-fitting and preventing neurons from co-adaptation. See more details in `Improving neural networks by
+    preventing co-adaptation of feature detectors
+    <https://arxiv.org/pdf/1207.0580.pdf>`_.
+
+    Note:
+        Each channel will be zeroed out independently on every construct call.

     Args:
         keep_prob (float): The keep rate, greater than 0 and less than or equal to 1. E.g. rate=0.9,
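The scaling behaviour described above fits in a few lines of NumPy (an inverted-dropout sketch with my own function signature, not MindSpore's code):

    import numpy as np

    def dropout(x, keep_prob=0.9, training=True, seed=0):
        if not training:
            return x  # inference: the input passes through unchanged
        rng = np.random.default_rng(seed)
        # Bernoulli mask: each element is kept with probability keep_prob;
        # kept elements are scaled by 1 / keep_prob to preserve the expectation.
        mask = rng.random(x.shape) < keep_prob
        return x * mask / keep_prob

    x = np.ones((2, 4))
    print(dropout(x))                  # zeros mixed with 1 / 0.9 = 1.111...
    print(dropout(x, training=False))  # identical to x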
@@ -512,23 +512,23 @@ class BCELoss(_Loss):
     r"""
     BCELoss creates a criterion to measure the binary cross entropy between the true labels and predicted labels.

-    Note:
-        Set the predicted labels as :math:`x`, true labels as :math:`y`, the output loss as :math:`\ell(x, y)`.
-        Let,
+    Set the predicted labels as :math:`x`, the true labels as :math:`y`, and the output loss as :math:`\ell(x, y)`.
+    Let

-        .. math::
-            L = \{l_1,\dots,l_N\}^\top, \quad
-            l_n = - w_n \left[ y_n \cdot \log x_n + (1 - y_n) \cdot \log (1 - x_n) \right]
+    .. math::
+        L = \{l_1,\dots,l_N\}^\top, \quad
+        l_n = - w_n \left[ y_n \cdot \log x_n + (1 - y_n) \cdot \log (1 - x_n) \right]

-        Then,
+    Then,

-        .. math::
-            \ell(x, y) = \begin{cases}
-            L, & \text{if reduction} = \text{`none';}\\
-            \operatorname{mean}(L), & \text{if reduction} = \text{`mean';}\\
-            \operatorname{sum}(L), & \text{if reduction} = \text{`sum'.}
-            \end{cases}
+    .. math::
+        \ell(x, y) = \begin{cases}
+        L, & \text{if reduction} = \text{`none';}\\
+        \operatorname{mean}(L), & \text{if reduction} = \text{`mean';}\\
+        \operatorname{sum}(L), & \text{if reduction} = \text{`sum'.}
+        \end{cases}

+    Note:
         The predicted labels should always be the output of sigmoid and the true labels should be numbers
         between 0 and 1.
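The same formulas in NumPy, including the three reduction modes (a sketch of the math; it assumes the inputs already went through sigmoid, as the note requires):

    import numpy as np

    def bce_loss(x, y, weight=None, reduction='mean'):
        # x: predicted probabilities in (0, 1); y: true labels in [0, 1].
        w = np.ones_like(x) if weight is None else weight
        l = -w * (y * np.log(x) + (1 - y) * np.log(1 - x))
        if reduction == 'mean':
            return l.mean()
        if reduction == 'sum':
            return l.sum()
        return l  # reduction == 'none'

    x = np.array([0.9, 0.2, 0.7])
    y = np.array([1.0, 0.0, 1.0])
    print(bce_loss(x, y))                    # mean loss, ~0.2284
    print(bce_loss(x, y, reduction='none'))  # per-element losses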
@@ -45,13 +45,6 @@ class Momentum(Optimizer):
     Refer to the paper on the importance of initialization and momentum in deep learning for more details.

-    Note:
-        When separating parameter groups, the weight decay in each group will be applied on the parameters if the
-        weight decay is positive. When not separating parameter groups, the `weight_decay` in the API will be applied
-        on the parameters without 'beta' or 'gamma' in their names if `weight_decay` is positive.
-
-        To improve parameter groups performance, the customized order of parameters can be supported.
-
     .. math::
         v_{t} = v_{t-1} \ast u + gradients

@@ -67,6 +60,13 @@ class Momentum(Optimizer):
     Here, grad, lr, p, v and u denote the gradients, learning_rate, params, moments, and momentum, respectively.

+    Note:
+        When separating parameter groups, the weight decay in each group will be applied on the parameters if the
+        weight decay is positive. When not separating parameter groups, the `weight_decay` in the API will be applied
+        on the parameters without 'beta' or 'gamma' in their names if `weight_decay` is positive.
+
+        To improve parameter groups performance, the customized order of parameters can be supported.
+
     Args:
         params (Union[list[Parameter], list[dict]]): When the `params` is a list of `Parameter` which will be updated,
             the element in `params` must be class `Parameter`. When the `params` is a list of `dict`, the "params",
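One optimizer step following the accumulator equation above, in NumPy (my own variable names; the parameter update p = p - lr * v follows the usual momentum convention and should be read as illustrative):

    import numpy as np

    def momentum_step(p, v, grad, lr=0.1, u=0.9):
        # v_t = v_{t-1} * u + gradients; then step against the moment.
        v = v * u + grad
        p = p - lr * v
        return p, v

    p, v = np.array([1.0, -2.0]), np.zeros(2)
    for _ in range(5):
        p, v = momentum_step(p, v, grad=2.0 * p)  # gradient of sum(p**2)
    print(p)  # shrinking toward the minimum at 0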
@@ -46,13 +46,13 @@ class RMSProp(Optimizer):
     The equation is as follows:

-    ..  math::
+    .. math::
         s_{t} = \\rho s_{t-1} + (1 - \\rho)(\\nabla Q_{i}(w))^2

-    ..  math::
+    .. math::
         m_{t} = \\beta m_{t-1} + \\frac{\\eta} {\\sqrt{s_{t} + \\epsilon}} \\nabla Q_{i}(w)

-    ..  math::
+    .. math::
         w = w - m_{t}

     The first equation calculates the moving average of the squared gradient for
@@ -60,16 +60,16 @@ class RMSProp(Optimizer):
     if centered is True:

-    ..  math::
+    .. math::
         g_{t} = \\rho g_{t-1} + (1 - \\rho)\\nabla Q_{i}(w)

-    ..  math::
+    .. math::
         s_{t} = \\rho s_{t-1} + (1 - \\rho)(\\nabla Q_{i}(w))^2

-    ..  math::
+    .. math::
         m_{t} = \\beta m_{t-1} + \\frac{\\eta} {\\sqrt{s_{t} - g_{t}^2 + \\epsilon}} \\nabla Q_{i}(w)

-    ..  math::
+    .. math::
         w = w - m_{t}

     where :math:`w` represents `params`, which will be updated.
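Both variants of the update, transcribed into NumPy (a sketch of the equations above, not the optimizer's implementation):

    import numpy as np

    def rmsprop_step(w, grad, s, m, g=None, rho=0.9, beta=0.9, eta=0.01, eps=1e-8):
        s = rho * s + (1 - rho) * grad**2  # moving average of squared gradient
        if g is not None:                  # centered variant tracks g_t too
            g = rho * g + (1 - rho) * grad
            m = beta * m + eta / np.sqrt(s - g**2 + eps) * grad
        else:
            m = beta * m + eta / np.sqrt(s + eps) * grad
        return w - m, s, m, g

    w, s, m = np.array([1.0, -1.0]), np.zeros(2), np.zeros(2)
    w, s, m, _ = rmsprop_step(w, grad=2.0 * w, s=s, m=m)
    print(w)  # one plain (non-centered) RMSProp step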
@@ -39,13 +39,6 @@ class SGD(Optimizer):
     Nesterov momentum is based on the formula from the paper `On the importance of initialization and
     momentum in deep learning <http://proceedings.mlr.press/v28/sutskever13.html>`_.

-    Note:
-        When separating parameter groups, the weight decay in each group will be applied on the parameters if the
-        weight decay is positive. When not separating parameter groups, the `weight_decay` in the API will be applied
-        on the parameters without 'beta' or 'gamma' in their names if `weight_decay` is positive.
-
-        To improve parameter groups performance, the customized order of parameters can be supported.
-
     .. math::
         v_{t+1} = u \ast v_{t} + gradient \ast (1-dampening)

@@ -63,6 +56,13 @@ class SGD(Optimizer):
     Here, p, v and u denote the parameters, accum, and momentum, respectively.

+    Note:
+        When separating parameter groups, the weight decay in each group will be applied on the parameters if the
+        weight decay is positive. When not separating parameter groups, the `weight_decay` in the API will be applied
+        on the parameters without 'beta' or 'gamma' in their names if `weight_decay` is positive.
+
+        To improve parameter groups performance, the customized order of parameters can be supported.
+
     Args:
         params (Union[list[Parameter], list[dict]]): When the `params` is a list of `Parameter` which will be updated,
             the element in `params` must be class `Parameter`. When the `params` is a list of `dict`, the "params",
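The accumulator equation with dampening, as a NumPy sketch (the Nesterov branch follows the cited paper's convention; treat the whole step as illustrative rather than MindSpore's exact rule):

    import numpy as np

    def sgd_step(p, v, grad, lr=0.1, u=0.9, dampening=0.0, nesterov=False):
        # v_{t+1} = u * v_t + gradient * (1 - dampening)
        v = u * v + grad * (1.0 - dampening)
        # Nesterov looks ahead along the moment; plain momentum steps along v.
        p = p - lr * (grad + u * v) if nesterov else p - lr * v
        return p, v

    p, v = np.array([1.0]), np.zeros(1)
    p, v = sgd_step(p, v, grad=2.0 * p)
    print(p, v)  # [0.8] [2.]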
@@ -211,9 +211,13 @@ def gamma(shape, alpha, beta, seed=None):
 def poisson(shape, mean, seed=None):
-    """
+    r"""
     Generates random numbers according to the Poisson random number distribution.

+    .. math::
+
+        \text{P}(i|\mu) = \frac{\exp(-\mu)\mu^{i}}{i!}
+
     Args:
         shape (tuple): The shape of random tensor to be generated.
         mean (Tensor): The mean μ distribution parameter. It should be greater than 0, with float32 data type.
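A quick sanity check of the pmf above against empirical draws (plain Python/NumPy, independent of the MindSpore op):

    import math
    import numpy as np

    def poisson_pmf(i, mu):
        # P(i | mu) = exp(-mu) * mu**i / i!
        return math.exp(-mu) * mu**i / math.factorial(i)

    mu = 3.0
    samples = np.random.default_rng(0).poisson(mu, size=100_000)
    print(poisson_pmf(2, mu))     # ~0.2240
    print(np.mean(samples == 2))  # empirical frequency, close to the pmf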
@@ -4104,21 +4104,22 @@ class BroadcastTo(PrimitiveWithInfer):
     When input shape is broadcast to target shape, it starts with the trailing dimensions.

-    Raises:
-        ValueError: Given a shape tuple, if it contains more than one -1; or if the -1 is in an invalid position,
-            such as one that does not have an opposing dimension in the input tensor; or if the target and
-            input shapes are incompatible.
-
     Args:
         shape (tuple): The target shape to broadcast. Can be fully specified, or have -1 in one position
             where it will be substituted by the input tensor's shape in that position, see example.

     Inputs:
-        - **input_x** (Tensor) - The input tensor.
+        - **input_x** (Tensor) - The input tensor. The data type should be one of the following types: float16,
+          float32, int32, int8, uint8.

     Outputs:
         Tensor, with the given `shape` and the same data type as `input_x`.

+    Raises:
+        ValueError: Given a shape tuple, if it contains more than one -1; or if the -1 is in an invalid position,
+            such as one that does not have an opposing dimension in the input tensor; or if the target and
+            input shapes are incompatible.
+
     Supported Platforms:
         ``Ascend`` ``GPU``
@@ -4402,7 +4403,9 @@ class ReverseSequence(PrimitiveWithInfer):
 class EditDistance(PrimitiveWithInfer):
     """
-    Computes the Levenshtein Edit Distance. It is used to measure the similarity of two sequences.
+    Computes the Levenshtein Edit Distance. It is used to measure the similarity of two sequences. The inputs are
+    variable-length sequences provided by SparseTensors (hypothesis_indices, hypothesis_values, hypothesis_shape)
+    and (truth_indices, truth_values, truth_shape).

     Args:
         normalize (bool): If true, edit distances are normalized by length of truth. Default: True.
@@ -840,6 +840,10 @@ class CumSum(PrimitiveWithInfer):
     """
     Computes the cumulative sum of the input tensor along the given axis.

+    .. math::
+
+        y_i = x_1 + x_2 + x_3 + ... + x_i
+
     Args:
         exclusive (bool): If true, perform exclusive mode. Default: False.
         reverse (bool): If true, perform inverse cumulative sum. Default: False.
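The `exclusive` and `reverse` flags are easy to mimic in NumPy; a behavioural sketch under my reading of the two flags, not the operator itself:

    import numpy as np

    def cumsum(x, axis=0, exclusive=False, reverse=False):
        if reverse:  # accumulate from the end instead of the start
            x = np.flip(x, axis=axis)
        y = np.cumsum(x, axis=axis)
        if exclusive:  # exclusive sum omits the current element: inclusive - x
            y = y - x
        if reverse:
            y = np.flip(y, axis=axis)
        return y

    x = np.array([1, 2, 3, 4])
    print(cumsum(x))                  # [ 1  3  6 10]
    print(cumsum(x, exclusive=True))  # [0 1 3 6]
    print(cumsum(x, reverse=True))    # [10  9  7  4]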
@@ -2248,6 +2252,10 @@ class Ceil(PrimitiveWithInfer):
     """
     Rounds a tensor up to the closest integer element-wise.

+    .. math::
+
+        out_i = \lceil input_i \rceil
+
     Inputs:
         - **input_x** (Tensor) - The input tensor. Its element data type must be float16 or float32.
@@ -2357,6 +2365,10 @@ class Acosh(PrimitiveWithInfer):
     """
     Computes inverse hyperbolic cosine of the input element-wise.

+    .. math::
+
+        out_i = \cosh^{-1}(input_i)
+
     Inputs:
         - **input_x** (Tensor) - The shape of tensor is :math:`(x_1, x_2, ..., x_R)`.
@@ -2423,6 +2435,10 @@ class Asinh(PrimitiveWithInfer):
     """
     Computes inverse hyperbolic sine of the input element-wise.

+    .. math::
+
+        out_i = \sinh^{-1}(input_i)
+
     Inputs:
         - **input_x** (Tensor) - The shape of tensor is :math:`(x_1, x_2, ..., x_R)`.
@@ -3241,6 +3257,10 @@ class ACos(PrimitiveWithInfer):
     """
     Computes arccosine of input tensors element-wise.

+    .. math::
+
+        out_i = \cos^{-1}(input_i)
+
     Inputs:
         - **input_x** (Tensor) - The shape of tensor is :math:`(x_1, x_2, ..., x_R)`.
@@ -3405,6 +3425,10 @@ class Abs(PrimitiveWithInfer):
     """
     Returns absolute value of a tensor element-wise.

+    .. math::
+
+        out_i = |input_i|
+
     Inputs:
         - **input_x** (Tensor) - The input tensor. The shape of tensor is :math:`(x_1, x_2, ..., x_R)`.
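The element-wise formulas added in the last few hunks (Ceil, Acosh, Asinh, ACos, Abs) all have direct NumPy counterparts, which makes a quick cross-check easy (illustrative):

    import numpy as np

    x = np.array([1.5, 2.0, 4.0])
    print(np.ceil(x))     # round up to the closest integer
    print(np.arccosh(x))  # cosh^{-1}, defined for x >= 1
    print(np.arcsinh(x))  # sinh^{-1}, defined on all reals
    print(np.arccos(np.array([-1.0, 0.0, 1.0])))  # cos^{-1}, inputs in [-1, 1]
    print(np.abs(np.array([-3.0, 3.0])))          # |x|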
@@ -260,7 +260,8 @@ class Softsign(PrimitiveWithInfer):
     The function is shown as follows:

     .. math::
-        \text{output} = \frac{\text{input_x}}{1 + \left| \text{input_x} \right|},
+
+        \text{SoftSign}(x) = \frac{x}{1 + |x|}

     Inputs:
         - **input_x** (Tensor) - The input tensor whose data type must be float16 or float32.
@@ -332,6 +333,10 @@ class ReLU6(PrimitiveWithInfer):
     r"""
     Computes ReLU (Rectified Linear Unit) upper bounded by 6 of input tensors element-wise.

+    .. math::
+
+        \text{ReLU6}(x) = \min(\max(0, x), 6)
+
     It returns :math:`\min(\max(0,x), 6)` element-wise.

     Inputs:
@@ -437,15 +442,12 @@ class Elu(PrimitiveWithInfer):
     r"""
     Computes exponential linear:

-    if x < 0:
-
-    .. math::
-        \text{x} = \alpha * (\exp(\text{x}) - 1)
-
-    if x >= 0:
-
-    .. math::
-        \text{x} = \text{x}
+    .. math::
+
+        \text{ELU}(x) = \begin{cases}
+        \alpha * (\exp(x) - 1), & \text{if } x < 0;\\
+        x, & \text{if } x \geq 0.
+        \end{cases}

     The data type of the input tensor must be float.
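The two-branch definition collapses to a single `np.where` (a sketch, using the common default :math:`\alpha = 1.0`):

    import numpy as np

    def elu(x, alpha=1.0):
        # alpha * (exp(x) - 1) where x < 0, x itself where x >= 0.
        return np.where(x < 0, alpha * (np.exp(x) - 1.0), x)

    print(elu(np.array([-1.0, 0.0, 2.0])))  # [-0.6321  0.      2.    ]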
@@ -1569,8 +1571,11 @@ class MaxPoolWithArgmax(_Pool):
       It has the same data type as `input`.
     - **mask** (Tensor) - Max values' index represented by the mask. Data type is int32.

+    Raises:
+        TypeError: If the input data type is not float16 or float32.
+
     Supported Platforms:
-        ``Ascend``
+        ``Ascend`` ``GPU``

     Examples:
         >>> input_tensor = Tensor(np.arange(1 * 3 * 3 * 4).reshape((1, 3, 3, 4)), mindspore.float32)
@@ -2357,8 +2362,8 @@ class SGD(PrimitiveWithCheck):
     """
     Computes the stochastic gradient descent. Momentum is optional.

-    Nesterov momentum is based on the formula from On the importance of
-    initialization and momentum in deep learning.
+    Nesterov momentum is based on the formula from the paper `On the importance of
+    initialization and momentum in deep learning <http://proceedings.mlr.press/v28/sutskever13.html>`_.

     Note:
         For details, please refer to `nn.SGD` source code.
@@ -3005,7 +3010,7 @@ class Gelu(PrimitiveWithInfer):
 class FastGelu(PrimitiveWithInfer):
     r"""
-    fast Gaussian Error Linear Units activation function.
+    Fast Gaussian Error Linear Units activation function.

     FastGelu is defined as follows:
@@ -3181,7 +3186,8 @@ class LSTM(PrimitiveWithInfer):
     """
     Performs the Long Short-Term Memory (LSTM) on the input.

-    For detailed information, please refer to `nn.LSTM`.
+    For detailed information, please refer to `nn.LSTM
+    <https://www.mindspore.cn/doc/api_python/zh-CN/master/mindspore/nn/mindspore.nn.LSTM.html>`_.

     Supported Platforms:
         ``GPU`` ``CPU``
@@ -3289,14 +3295,13 @@ class SigmoidCrossEntropyWithLogits(PrimitiveWithInfer):
     r"""
     Uses the given logits to compute sigmoid cross entropy.

-    Note:
-        Sets input logits as `X`, input label as `Y`, output as `loss`. Then,
+    Sets the input logits as `X`, the input label as `Y`, and the output as `loss`. Then,

-        .. math::
-            p_{ij} = sigmoid(X_{ij}) = \frac{1}{1 + e^{-X_{ij}}}
+    .. math::
+        p_{ij} = sigmoid(X_{ij}) = \frac{1}{1 + e^{-X_{ij}}}

-        .. math::
-            loss_{ij} = -[Y_{ij} * ln(p_{ij}) + (1 - Y_{ij})ln(1 - p_{ij})]
+    .. math::
+        loss_{ij} = -[Y_{ij} * ln(p_{ij}) + (1 - Y_{ij})ln(1 - p_{ij})]

     Inputs:
         - **logits** (Tensor) - Input logits.
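Composing the two formulas naively overflows for large-magnitude logits; the standard numerically stable rearrangement is shown below (a NumPy sketch of the math, not the kernel):

    import numpy as np

    def sigmoid_ce_with_logits(logits, labels):
        # Algebraically equal to -[Y*ln(p) + (1 - Y)*ln(1 - p)] with p = sigmoid(X),
        # but exp() only ever sees non-positive values, so it cannot overflow.
        return np.maximum(logits, 0) - logits * labels + np.log1p(np.exp(-np.abs(logits)))

    X = np.array([-100.0, 0.0, 100.0])
    Y = np.array([1.0, 1.0, 1.0])
    print(sigmoid_ce_with_logits(X, Y))  # [100.  0.6931  0.], no overflow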
@@ -4376,22 +4381,21 @@ class BinaryCrossEntropy(PrimitiveWithInfer):
     r"""
     Computes the binary cross entropy between the target and the output.

-    Note:
-        Sets input as :math:`x`, input label as :math:`y`, output as :math:`\ell(x, y)`.
-        Let,
+    Sets the input as :math:`x`, the input label as :math:`y`, and the output as :math:`\ell(x, y)`.
+    Let

-        .. math::
-            L = \{l_1,\dots,l_N\}^\top, \quad
-            l_n = - w_n \left[ y_n \cdot \log x_n + (1 - y_n) \cdot \log (1 - x_n) \right]
+    .. math::
+        L = \{l_1,\dots,l_N\}^\top, \quad
+        l_n = - w_n \left[ y_n \cdot \log x_n + (1 - y_n) \cdot \log (1 - x_n) \right]

-        Then,
+    Then,

-        .. math::
-            \ell(x, y) = \begin{cases}
-            L, & \text{if reduction} = \text{'none';}\\
-            \operatorname{mean}(L), & \text{if reduction} = \text{'mean';}\\
-            \operatorname{sum}(L), & \text{if reduction} = \text{'sum'.}
-            \end{cases}
+    .. math::
+        \ell(x, y) = \begin{cases}
+        L, & \text{if reduction} = \text{'none';}\\
+        \operatorname{mean}(L), & \text{if reduction} = \text{'mean';}\\
+        \operatorname{sum}(L), & \text{if reduction} = \text{'sum'.}
+        \end{cases}

     Args:
         reduction (str): Specifies the reduction to be applied to the output.
@@ -6568,6 +6572,21 @@ class DynamicGRUV2(PrimitiveWithInfer):
     r"""
     Applies a single-layer gated recurrent unit (GRU) to an input sequence.

+    .. math::
+
+        \begin{array}{ll}
+            r_t = \sigma(W_{ir} x_t + b_{ir} + W_{hr} h_{(t-1)} + b_{hr}) \\
+            z_t = \sigma(W_{iz} x_t + b_{iz} + W_{hz} h_{(t-1)} + b_{hz}) \\
+            n_t = \tanh(W_{in} x_t + b_{in} + r_t * (W_{hn} h_{(t-1)} + b_{hn})) \\
+            h_t = (1 - z_t) * n_t + z_t * h_{(t-1)}
+        \end{array}
+
+    where :math:`h_t` is the hidden state at time `t`, :math:`x_t` is the input
+    at time `t`, :math:`h_{(t-1)}` is the hidden state of the layer
+    at time `t-1` or the initial hidden state at time `0`, and :math:`r_t`,
+    :math:`z_t`, :math:`n_t` are the reset, update, and new gates, respectively.
+    :math:`\sigma` is the sigmoid function, and :math:`*` is the Hadamard product.
+
     Args:
         direction (str): A string identifying the direction in the op. Default: 'UNIDIRECTIONAL'.
             Only 'UNIDIRECTIONAL' is currently supported.
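One step of the gate algebra above in NumPy, with random toy weights (purely to make the equations concrete; DynamicGRUV2 itself takes packed weights and whole sequences):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def gru_step(x_t, h_prev, Wi, Wh, bi, bh):
        # Wi: (3H, I), Wh: (3H, H); the row blocks are the r, z, n gates.
        H = h_prev.shape[0]
        gi = Wi @ x_t + bi     # input contributions, all three gates
        gh = Wh @ h_prev + bh  # hidden contributions, all three gates
        r = sigmoid(gi[:H] + gh[:H])          # reset gate
        z = sigmoid(gi[H:2*H] + gh[H:2*H])    # update gate
        n = np.tanh(gi[2*H:] + r * gh[2*H:])  # new gate, reset applied to hidden part
        return (1 - z) * n + z * h_prev       # h_t

    rng = np.random.default_rng(0)
    I, H = 4, 3
    h_t = gru_step(rng.standard_normal(I), np.zeros(H),
                   rng.standard_normal((3 * H, I)), rng.standard_normal((3 * H, H)),
                   rng.standard_normal(3 * H), rng.standard_normal(3 * H))
    print(h_t.shape)  # (3,)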
@@ -6619,6 +6638,8 @@ class DynamicGRUV2(PrimitiveWithInfer):
     - **hidden_new** (Tensor) - A Tensor of shape :math:`(\text{num_step}, \text{batch_size}, \text{hidden_size})`.
       Has the same data type as the input `bias_type`.

+    A note about `bias_type`:
+
     - If `bias_input` and `bias_hidden` are both `None`, `bias_type` is the data type of `init_h`.
     - If `bias_input` is not `None`, `bias_type` is the data type of `bias_input`.
     - If `bias_input` is `None` and `bias_hidden` is not `None`, `bias_type` is the data type of `bias_hidden`.
@@ -6772,6 +6793,11 @@ class LRN(PrimitiveWithInfer):
     r"""
     Local Response Normalization.

+    .. math::
+
+        b_{c} = a_{c}\left(k + \frac{\alpha}{n}
+        \sum_{c'=\max(0, c-n/2)}^{\min(N-1, c+n/2)} a_{c'}^2\right)^{-\beta}
+
     Args:
         depth_radius (int): Half-width of the 1-D normalization window with the shape of 0-D.
         bias (float): An offset (usually positive to avoid dividing by 0).
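A direct, slow transcription of the formula for an NCHW tensor (a reference sketch with hypothetical parameter names; note the formula above divides :math:`\alpha` by the window size :math:`n`, and conventions differ between frameworks):

    import numpy as np

    def lrn(a, depth_radius=2, bias=1.0, alpha=1e-4, beta=0.75):
        # a: (N, C, H, W). Channel c is scaled by
        # (k + alpha/n * sum of a^2 over channels [c - n/2, c + n/2]) ** (-beta).
        C = a.shape[1]
        n = 2 * depth_radius + 1
        out = np.empty_like(a)
        for c in range(C):
            lo, hi = max(0, c - depth_radius), min(C - 1, c + depth_radius)
            sq = np.sum(a[:, lo:hi + 1] ** 2, axis=1)
            out[:, c] = a[:, c] * (bias + (alpha / n) * sq) ** (-beta)
        return out

    x = np.random.default_rng(0).standard_normal((1, 8, 4, 4))
    print(lrn(x).shape)  # (1, 8, 4, 4)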