@@ -398,8 +398,6 @@ class AdamWeightDecay(Optimizer):
    eps (float): Term added to the denominator to improve numerical stability. Default: 1e-6.
        Should be greater than 0.
    weight_decay (float): Weight decay (L2 penalty). It should be in range [0.0, 1.0]. Default: 0.0.
    decay_filter (Function): A function to determine whether to apply weight decay on parameters. Default:
        lambda x: 'LayerNorm' not in x.name and 'bias' not in x.name.

Inputs:
    - **gradients** (tuple[Tensor]) - The gradients of `params`, with the same shape as `params`.
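
For context, here is a minimal usage sketch of the documented parameters, assuming the constructor signature implied by the docstring above (`params`, `learning_rate`, `eps`, `weight_decay`, `decay_filter`). The network, learning rate, and weight-decay value are illustrative, not taken from this change:

```python
import mindspore.nn as nn

# Illustrative network; any Cell with trainable parameters works.
net = nn.Dense(16, 10)

# Mirrors the documented default: skip weight decay for LayerNorm
# parameters and biases, apply it everywhere else.
decay_filter = lambda x: 'LayerNorm' not in x.name and 'bias' not in x.name

# Hypothetical call to the AdamWeightDecay class shown in the diff above;
# eps and weight_decay respect the documented ranges
# (eps > 0, weight_decay in [0.0, 1.0]).
optimizer = AdamWeightDecay(net.trainable_params(),
                            learning_rate=1e-3,
                            eps=1e-6,
                            weight_decay=0.01,
                            decay_filter=decay_filter)
```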