
!14375 Add optimizer formula to comments - Adagrad

From: @gerayking
Reviewed-by: @zh_qh, @zhunaipan
Signed-off-by: @zhunaipan
pull/14375/MERGE
Committed by mindspore-ci-bot (Gitee), 4 years ago
commit 5f5ad74346
1 changed file with 11 additions and 1 deletion:
    mindspore/nn/optim/ada_grad.py  (+11, -1)

@@ -36,12 +36,22 @@ def _check_param_value(accum, update_slots, prim_name=None):
 
 
 class Adagrad(Optimizer):
-    """
+    r"""
     Implements the Adagrad algorithm with ApplyAdagrad Operator.
 
     Adagrad is an online Learning and Stochastic Optimization.
     Refer to paper `Efficient Learning using Forward-Backward Splitting
     <https://proceedings.neurips.cc/paper/2009/file/621bf66ddb7c962aa0d22ac97d69b793-Paper.pdf>`_.
+    The updating formulas are as follows,
+
+    .. math::
+        \begin{array}{ll} \\
+            h_{t} = h_{t-1} + g * g \\
+            w_{t} = w_{t-1} - lr * \frac{1}{\sqrt{h_{t}}} * g
+        \end{array}
+
+    :math:`h` represents the cumulative sum of gradient squared, :math:`g` represents `gradients`.
+    :math:`lr` represents `learning_rate`, :math:`w` represents `params`.
 
     Note:
         When separating parameter groups, the weight decay in each group will be applied on the parameters if the
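As a sanity check on the two formulas added above, here is a minimal NumPy sketch of the same update rule outside MindSpore. The `eps` stability constant and the toy quadratic objective are illustrative assumptions for this sketch only; they are not part of the ApplyAdagrad operator or of this commit.

```python
import numpy as np

def adagrad_step(w, h, g, lr=0.001, eps=1e-6):
    # One Adagrad update following the docstring formulas:
    #   h_t = h_{t-1} + g * g
    #   w_t = w_{t-1} - lr * g / sqrt(h_t)
    # `eps` is a small constant for numerical stability (an assumption of
    # this sketch, not part of the formulas in the docstring).
    h = h + g * g
    w = w - lr * g / np.sqrt(h + eps)
    return w, h

# Toy usage: minimize f(w) = 0.5 * ||w||^2, whose gradient is w itself.
w = np.array([1.0, -2.0])
h = np.zeros_like(w)
for _ in range(100):
    g = w                          # gradient of the toy objective at w
    w, h = adagrad_step(w, h, g, lr=0.1)
print(w)                           # entries shrink toward zero
```

Because `h` accumulates squared gradients, the effective step size per coordinate decreases over time, which is the behavior the docstring formulas describe.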

