
!5063 modify sgd and momentum and WithGradCell comments

Merge pull request !5063 from lijiaqi/momentum_and_sgd
tags/v1.0.0
mindspore-ci-bot committed 5 years ago
commit 15ae3702f9
3 changed files with 9 additions and 9 deletions
  1. mindspore/nn/optim/momentum.py    +4 -4
  2. mindspore/nn/optim/sgd.py         +4 -4
  3. mindspore/nn/wrap/cell_wrapper.py +1 -1

mindspore/nn/optim/momentum.py  +4 -4

@@ -56,12 +56,12 @@ class Momentum(Optimizer):
.. math::
v_{t} = v_{t-1} \ast u + gradients

If use_nesterov is True:
.. math::
p_{t} = p_{t-1} - (grad \ast lr + v_{t} \ast u \ast lr)

If use_nesterov is False:
.. math::
p_{t} = p_{t-1} - lr \ast v_{t}

Here: grad, lr, p, v and u denote the gradients, learning_rate, params, moments and momentum respectively.
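For reference, a minimal scalar sketch of the update rule documented above (an illustration only, not the MindSpore kernel; the names p, v, grad, lr and u mirror the docstring symbols):

    def momentum_step(p, v, grad, lr, u, use_nesterov=False):
        """Standalone illustration of the documented momentum update."""
        v = v * u + grad                          # v_{t} = v_{t-1} * u + gradients
        if use_nesterov:
            p = p - (grad * lr + v * u * lr)      # p_{t} = p_{t-1} - (grad*lr + v_{t}*u*lr)
        else:
            p = p - lr * v                        # p_{t} = p_{t-1} - lr * v_{t}
        return p, v

    # e.g. one step with lr=0.1 and momentum u=0.9, starting from p=1.0, v=0.0, grad=0.5
    p, v = momentum_step(1.0, 0.0, 0.5, 0.1, 0.9)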


mindspore/nn/optim/sgd.py  +4 -4

@@ -49,12 +49,12 @@ class SGD(Optimizer):
.. math::
v_{t+1} = u \ast v_{t} + gradient \ast (1-dampening)

If nesterov is True:
.. math::
p_{t+1} = p_{t} - lr \ast (gradient + u \ast v_{t+1})

If nesterov is False:
.. math::
p_{t+1} = p_{t} - lr \ast v_{t+1}

Note that for the first step, v_{t+1} = gradient.
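Similarly, a hedged scalar sketch of the SGD rule above (illustrative only; the first_step flag reflects the note that v_{t+1} = gradient on the first step):

    def sgd_step(p, v, grad, lr, u, dampening=0.0, nesterov=False, first_step=False):
        """Standalone illustration of the documented SGD-with-momentum update."""
        if first_step:
            v = grad                               # first step: v_{t+1} = gradient
        else:
            v = u * v + grad * (1 - dampening)     # v_{t+1} = u*v_{t} + gradient*(1 - dampening)
        if nesterov:
            p = p - lr * (grad + u * v)            # p_{t+1} = p_{t} - lr*(gradient + u*v_{t+1})
        else:
            p = p - lr * v                         # p_{t+1} = p_{t} - lr*v_{t+1}
        return p, v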


mindspore/nn/wrap/cell_wrapper.py  +1 -1

@@ -82,7 +82,7 @@ class WithGradCell(Cell):

Wraps the network with a backward cell to compute gradients. A network with a loss function is required
as an argument. If the loss function is None, the network must be a wrapper of the network and the loss function. This
Cell accepts '*inputs' as inputs and returns gradients for each trainable parameter.

Note:
Run in PyNative mode.
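A hedged usage sketch of the wrapper described above; the network, loss cell and random data are placeholders chosen for illustration, not part of this commit:

    import numpy as np
    import mindspore.nn as nn
    from mindspore import Tensor

    net = nn.Dense(4, 3)                               # stand-in for a user-defined Cell
    loss_fn = nn.SoftmaxCrossEntropyWithLogits()       # one possible loss cell
    grad_net = nn.WithGradCell(net, loss_fn)           # wrap network + loss to compute gradients
    data = Tensor(np.random.rand(2, 4).astype(np.float32))
    label = Tensor(np.random.rand(2, 3).astype(np.float32))
    grads = grad_net(data, label)                      # gradients, one per trainable parameter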

