Merge pull request !5063 from lijiaqi/momentum_and_sgd
@@ -56,12 +56,12 @@ class Momentum(Optimizer):
.. math::
        v_{t} = v_{t-1} \ast u + gradients

If use_nesterov is True:

.. math::
        p_{t} = p_{t-1} - (grad \ast lr + v_{t} \ast u \ast lr)

If use_nesterov is False:

.. math::
        p_{t} = p_{t-1} - lr \ast v_{t}

Here: grad, lr, p, v and u denote the gradients, learning_rate, params, moments, and momentum respectively.
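As a quick sanity check of these formulas, here is a minimal NumPy sketch of one Momentum update step; the function `momentum_step` and its argument names are illustrative choices mirroring the docstring symbols, not the MindSpore API.

```python
import numpy as np

def momentum_step(p, v, grad, lr, u, use_nesterov=False):
    """One Momentum update, following the docstring formulas."""
    v = v * u + grad                          # v_t = v_{t-1} * u + gradients
    if use_nesterov:
        p = p - (grad * lr + v * u * lr)      # p_t = p_{t-1} - (grad*lr + v_t*u*lr)
    else:
        p = p - lr * v                        # p_t = p_{t-1} - lr * v_t
    return p, v

p = np.array([1.0, 2.0])
v = np.zeros_like(p)
p, v = momentum_step(p, v, grad=np.array([0.1, -0.2]), lr=0.01, u=0.9)
```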
@@ -49,12 +49,12 @@ class SGD(Optimizer):
.. math::
        v_{t+1} = u \ast v_{t} + gradient \ast (1-dampening)

If nesterov is True:

.. math::
        p_{t+1} = p_{t} - lr \ast (gradient + u \ast v_{t+1})

If nesterov is False:

.. math::
        p_{t+1} = p_{t} - lr \ast v_{t+1}

Note that for the first step, v_{t+1} = gradient.
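The same update rule, including the dampening term and the first-step special case, can be sketched in NumPy as below; `sgd_step` and its keyword arguments are hypothetical names chosen to match the docstring symbols.

```python
import numpy as np

def sgd_step(p, v, grad, lr, u, dampening=0.0, nesterov=False, first_step=False):
    """One SGD-with-momentum update, following the docstring formulas."""
    if first_step:
        v = grad                              # v_{t+1} = gradient on the first step
    else:
        v = u * v + grad * (1.0 - dampening)  # v_{t+1} = u*v_t + gradient*(1 - dampening)
    if nesterov:
        p = p - lr * (grad + u * v)           # p_{t+1} = p_t - lr*(gradient + u*v_{t+1})
    else:
        p = p - lr * v                        # p_{t+1} = p_t - lr*v_{t+1}
    return p, v
```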
@@ -82,7 +82,7 @@ class WithGradCell(Cell):
Wraps the network with a backward cell to compute gradients. A network with a loss function is necessary
as an argument. If the loss function is None, the network must be a wrapper of a network and a loss function. This
Cell accepts '*inputs' as inputs and returns gradients for each trainable parameter.

Note:
    Run in PyNative mode.
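A minimal usage sketch of WithGradCell, assuming MindSpore's nn API at this version; the toy `nn.Dense` network, the one-hot labels, and the tensor shapes are assumptions for illustration, not part of the documented class.

```python
import numpy as np
import mindspore.nn as nn
from mindspore import Tensor

net = nn.Dense(10, 3)                                        # toy network standing in for a real model
loss_fn = nn.SoftmaxCrossEntropyWithLogits(reduction='mean') # scalar loss for gradient computation
grad_net = nn.WithGradCell(net, loss_fn)                     # loss_fn given, so net needs no loss wrapper

data = Tensor(np.random.randn(4, 10).astype(np.float32))
label = Tensor(np.eye(3, dtype=np.float32)[[0, 1, 2, 0]])    # one-hot labels, shape (4, 3)
grads = grad_net(data, label)                                # one gradient per trainable parameter
```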