- Class mindspore.nn.Adam(*args, **kwargs)
-
- Updates gradients by the Adaptive Moment Estimation (Adam) algorithm.
-
- The Adam algorithm is proposed in `Adam: A Method for Stochastic Optimization <https://arxiv.org/abs/1412.6980>`_.
-
- The updating formula is as follows:
-
- .. math::
- \begin{array}{ll} \\
- m_{t+1} = \beta_1 * m_{t} + (1 - \beta_1) * g \\
- v_{t+1} = \beta_2 * v_{t} + (1 - \beta_2) * g * g \\
- l = \alpha * \frac{\sqrt{1-\beta_2^t}}{1-\beta_1^t} \\
- w_{t+1} = w_{t} - l * \frac{m_{t+1}}{\sqrt{v_{t+1}} + \epsilon}
- \end{array}
-
- :math:`m` represents the first moment vector `moment1`, :math:`v` represents the second moment vector `moment2`, :math:`g` represents `gradients`, :math:`l` represents the scaling factor, :math:`\beta_1, \beta_2` represent `beta1` and `beta2`, :math:`t` represents the current step, :math:`\beta_1^t` and :math:`\beta_2^t` represent `beta1_power` and `beta2_power`, :math:`\alpha` represents `learning_rate`, :math:`w` represents `params`, and :math:`\epsilon` represents `eps`.
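-
- The following is a minimal sketch, not part of the MindSpore API, that reproduces one update step of the formula above with NumPy; all numbers are hypothetical:
-
- >>> import numpy as np
- >>> beta1, beta2, eps, lr, t = 0.9, 0.999, 1e-8, 1e-3, 1
- >>> w = np.array([0.5]); m = np.zeros(1); v = np.zeros(1)
- >>> g = np.array([0.2])                                     # gradient of w
- >>> m = beta1 * m + (1 - beta1) * g                         # moment1
- >>> v = beta2 * v + (1 - beta2) * g * g                     # moment2
- >>> l = lr * np.sqrt(1 - beta2 ** t) / (1 - beta1 ** t)     # scaling factor
- >>> w = w - l * m / (np.sqrt(v) + eps)                      # updated parameter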
-
- Note:
- Currently, if the SparseGatherV2 operator is used in the network, the optimizer executes the sparse operation. The sparse operation can be run on the host by setting `target` to CPU.
- The sparse feature is still under continuous development.
-
- If parameters are not grouped, the `weight_decay` configured in the optimizer is applied to the network parameters whose names do not contain "beta" or "gamma". Users can group parameters to change the weight-decay strategy. When parameters are grouped, each group can set its own `weight_decay`; if it is not set, the `weight_decay` configured in the optimizer is used for that group.
-
-
- Parameters:
- params (Union[list[Parameter], list[dict]]): Must be a list of `Parameter` or a list of dict. When the list elements are dicts, the keys of the dict can be "params", "lr", "weight_decay", "grad_centralization" and "order_params":
-
- - params: Required. The parameters of the current group. The value must be a list of `Parameter`.
-
- - lr: Optional. If "lr" is in the keys, the corresponding value is used as the learning rate for this group.
- If not, the `learning_rate` configured in the optimizer is used as the learning rate.
-
- - weight_decay: Optional. If "weight_decay" is in the keys, the corresponding value is used as the weight decay for this group. If not, the `weight_decay` configured in the optimizer is used as the weight decay.
-
- - grad_centralization: Optional. If "grad_centralization" is in the keys, the corresponding value is used and must be of type bool. If not, `grad_centralization` is considered to be False.
- This configuration only takes effect on convolution layers.
-
- - order_params: Optional. The order of the values is the order in which the parameters are updated. When parameter grouping is used, this is usually used to keep the order consistent with the network's `parameters` to improve performance.
- If "order_params" is in the keys, the other keys in this group configuration are ignored. The parameters listed in "order_params" must be included in one of the `params` groups.
-
- learning_rate (Union[float, int, Tensor, Iterable, LearningRateSchedule]): Default: 1e-3. The accepted forms are listed below; a short usage sketch follows the parameter list.
-
- - float: A fixed learning rate. Must be equal to or greater than 0.
-
- - int: A fixed learning rate. Must be equal to or greater than 0. The integer is converted to float.
-
- - Tensor: Can be a scalar or a 1-D vector. A scalar is a fixed learning rate. A 1-D vector is a dynamic learning rate: the i-th step takes the i-th value in the vector as the learning rate.
-
- - Iterable: A dynamic learning rate. The i-th step takes the i-th value in the iterable as the learning rate.
-
- - LearningRateSchedule: A dynamic learning rate. During training, the optimizer calls the instance of `LearningRateSchedule` with the current step as input to compute the learning rate for that step.
-
- beta1 (float): The exponential decay rate for `moment1`. Should be in the range (0.0, 1.0).
- Default: 0.9.
-
- beta2 (float): The exponential decay rate for `moment2`. Should be in the range (0.0, 1.0).
- Default: 0.999.
-
- eps (float): Added to the denominator to improve numerical stability. Must be greater than 0. Default: 1e-8.
-
- use_locking (bool): Whether to enable a lock to protect the variable updates.
- If True, the updates of the tensors `w`, `m` and `v` are protected by a lock.
- If False, the result is unpredictable. Default: False.
-
- use_nesterov (bool): Whether to use the Nesterov Accelerated Gradient (NAG) algorithm to update the gradients.
- If True, the gradients are updated using NAG.
- If False, the gradients are updated without NAG. Default: False.
-
- weight_decay (float): Weight decay (L2 penalty). Must be equal to or greater than 0. Default: 0.0.
-
- loss_scale (float): The scaling factor for the gradients. Must be greater than 0. If `loss_scale` is an integer, it is converted to float. In general, use the default value. The value needs to be changed only when `FixedLossScaleManager` is used for training and the `drop_overflow_update` attribute of `FixedLossScaleManager` is set to False; in that case, this value should be the same as the `loss_scale` in `FixedLossScaleManager`. Refer to the class `mindspore.FixedLossScaleManager` for more details.
- Default: 1.0.
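-
- As a sketch only (the scheduler choice and the numbers below are illustrative assumptions, not fixed by this API), the accepted `learning_rate` forms can be passed as follows:
-
- >>> import mindspore
- >>> from mindspore import Tensor, nn
- >>> net = Net()
- >>> # float: fixed learning rate
- >>> optim = nn.Adam(net.trainable_params(), learning_rate=1e-3)
- >>> # list (Iterable) or 1-D Tensor: the i-th step uses the i-th value
- >>> optim = nn.Adam(net.trainable_params(), learning_rate=[0.1, 0.05, 0.01])
- >>> optim = nn.Adam(net.trainable_params(), learning_rate=Tensor([0.1, 0.05, 0.01], mindspore.float32))
- >>> # LearningRateSchedule: the learning rate is computed from the current step
- >>> optim = nn.Adam(net.trainable_params(), learning_rate=nn.ExponentialDecayLR(0.1, 0.9, decay_steps=100))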
-
- Inputs:
- - **gradients** (tuple[Tensor]) - The gradients of `params`, with the same shape as `params`.
-
- Outputs:
- Tensor[bool], the value is True.
-
- Raises:
- TypeError: If `learning_rate` is not one of int, float, Tensor, Iterable or LearningRateSchedule.
- TypeError: If an element of `parameters` is neither a Parameter nor a dict.
- TypeError: If `beta1`, `beta2`, `eps` or `loss_scale` is not a float.
- TypeError: If `weight_decay` is neither a float nor an int.
- TypeError: If `use_locking` or `use_nesterov` is not a bool.
- ValueError: If `loss_scale` or `eps` is less than or equal to 0.
- ValueError: If `beta1` or `beta2` is not in the range (0.0, 1.0).
- ValueError: If `weight_decay` is less than 0.
-
- Supported Platforms:
- ``Ascend`` ``GPU`` ``CPU``
-
- Examples:
- >>> net = Net()
- >>> #1) All parameters use the same learning rate and weight decay
- >>> optim = nn.Adam(params=net.trainable_params())
- >>>
- >>> #2) Use parameter groups and set different values
- >>> conv_params = list(filter(lambda x: 'conv' in x.name, net.trainable_params()))
- >>> no_conv_params = list(filter(lambda x: 'conv' not in x.name, net.trainable_params()))
- >>> group_params = [{'params': conv_params, 'weight_decay': 0.01, 'grad_centralization':True},
- ... {'params': no_conv_params, 'lr': 0.01},
- ... {'order_params': net.trainable_params()}]
- >>> optim = nn.Adam(group_params, learning_rate=0.1, weight_decay=0.0)
- >>> # The conv_params group uses the learning rate 0.1 configured in the optimizer, the weight decay 0.01 of this group, and grad centralization True of this group.
- >>> # The no_conv_params group uses the learning rate 0.01 of this group, the weight decay 0.0 configured in the optimizer, and the default grad centralization False.
- >>> # The optimizer updates parameters in the order configured by 'order_params'.
- >>>
- >>> loss = nn.SoftmaxCrossEntropyWithLogits()
- >>> model = Model(net, loss_fn=loss, optimizer=optim)
-
-
- target
-
- This property is used to specify whether the parameters are updated on the host or on the device. The input type is str and can only be 'CPU', 'Ascend' or 'GPU'.
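-
- For illustration, a minimal sketch of running the sparse update on the host (assuming `net` contains a sparse operator such as `nn.EmbeddingLookup` with `sparse=True`; the attribute assignment below relies on this property):
-
- >>> optim = nn.Adam(net.trainable_params())
- >>> optim.target = 'CPU'   # perform the sparse operation on the host (CPU)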