@@ -31,7 +31,6 @@ from mindspore.nn.wrap import DistributedGradReducer
from mindspore.train.train_thor.convert_utils import ConvertNetUtils
from mindspore.parallel._auto_parallel_context import auto_parallel_context

# Enumerates types of Layer
Other = -1
Conv = 1
@@ -60,6 +59,7 @@ def _tensor_run_opt_ext(opt, momentum, learning_rate, gradient, weight, moment):
success = F.depend(success, opt(weight, moment, learning_rate, gradient, momentum))
return success


IS_ENABLE_GLOBAL_NORM = False
GRADIENT_CLIP_TYPE = 1
GRADIENT_CLIP_VALUE = 1.0
@@ -100,6 +100,7 @@ def clip_gradient(enable_clip_grad, gradients):
gradients = hyper_map_op(F.partial(clip_grad, GRADIENT_CLIP_TYPE, GRADIENT_CLIP_VALUE), gradients)
return gradients


C0 = 16
@@ -272,6 +273,15 @@ def thor(net, learning_rate, damping, momentum, weight_decay=0.0, loss_scale=1.0
:math:`\lambda` represents :math:`damping`, :math:`g_i` represents gradients of the i-th layer,
:math:`\otimes` represents Kronecker product, :math:`\gamma` represents 'learning rate'.
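For context, a minimal sketch of the layer-wise update these symbols belong to, written in the
standard THOR/K-FAC form; the factor definitions and the way the damping is split between the two
factors are assumptions of this sketch (and depend on the vectorization convention), not a verbatim
copy of the surrounding docstring.

.. math::
    A_{i-1} = a_{i-1}{a_{i-1}}^T, \qquad G_i = D_{s_i}{D_{s_i}}^T

.. math::
    w_i \leftarrow w_i - \gamma \left[(A_{i-1} + \lambda I)^{-1} \otimes (G_i + \lambda I)^{-1}\right] g_i
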
Note:
When the parameters are grouped, the 'weight_decay' of each group is applied to the parameters in
that group. When the parameters are not grouped, the 'weight_decay' in the optimizer is applied to
the parameters whose names do not contain 'beta' or 'gamma'.
When grouping parameters, set grad_centralization to True to apply gradient centralization. Gradient
centralization can only be applied to the parameters of convolution layers; if it is set to True for
the parameters of a non-convolution layer, an error is reported.
To improve the performance of parameter groups, you can customize the order of the parameters.
A minimal grouping sketch follows this excerpt.

Args:
net (Cell): The training network.
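A minimal sketch of the grouping behaviour described in the Note, assuming the common MindSpore
optimizer group-list convention with the keys 'params', 'weight_decay', 'grad_centralization' and
'order_params'; SimpleNet, the 'conv' name filter, and the commented-out thor() call (which only
mirrors the signature shown in this hunk, assuming thor is exported as mindspore.nn.thor, with
lr_schedule and damping_schedule as placeholders) are illustrative assumptions, not a verified recipe.

import mindspore.nn as nn

class SimpleNet(nn.Cell):
    """Tiny illustrative network with one convolution layer and one dense layer."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 16, 3)
        self.fc = nn.Dense(16, 10)

    def construct(self, x):
        # average over the spatial dimensions so the dense layer sees a (N, 16) tensor
        return self.fc(self.conv(x).mean(axis=(2, 3)))

net = SimpleNet()

# Gradient centralization may only be enabled for convolution-layer parameters.
conv_params = [p for p in net.trainable_params() if 'conv' in p.name]
other_params = [p for p in net.trainable_params() if 'conv' not in p.name]
group_params = [
    {'params': conv_params, 'weight_decay': 1e-4, 'grad_centralization': True},
    {'params': other_params},                   # falls back to the optimizer-level weight_decay
    {'order_params': net.trainable_params()},   # optional: pin the parameter order
]

# The THOR optimizer itself is then built from the network, per the signature shown above, e.g.
# opt = nn.thor(net, learning_rate=lr_schedule, damping=damping_schedule, momentum=0.9)
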
@@ -361,6 +371,7 @@ class ThorGpu(Optimizer):
"""
ThorGpu
"""

def __init__(self, net, learning_rate, damping, momentum, weight_decay=0.0, loss_scale=1.0, batch_size=32,
use_nesterov=False, decay_filter=lambda x: x.name not in [], split_indices=None,
enable_clip_grad=False, frequency=100):
@@ -432,7 +443,6 @@ class ThorGpu(Optimizer):
self.square = P.Square()
self.expand = P.ExpandDims()

def _define_gpu_reducer(self, split_indices):
"""define gpu reducer"""
self.parallel_mode = context.get_auto_parallel_context("parallel_mode")
@@ -449,7 +459,6 @@ class ThorGpu(Optimizer):
self.grad_reducer_a = DistributedGradReducer(self.matrix_a_cov, mean, degree, fusion_type=6)
self.grad_reducer_g = DistributedGradReducer(self.matrix_a_cov, mean, degree, fusion_type=8)

def _process_matrix_init_and_weight_idx_map(self, net):
"""for GPU, process matrix init shape, and get weight idx map"""
layer_type_map = get_net_layertype_mask(net)
@@ -703,7 +712,6 @@ class ThorAscend(Optimizer):
self.frequency = frequency
self._define_ascend_reducer(split_indices)

def get_frequency(self):
"""get thor frequency"""
return self.frequency
@@ -890,7 +898,6 @@ class ThorAscend(Optimizer):
input_matrix = self.concat((input_matrix, matrix_sup))
return input_matrix

def _get_abs_max(self, matrix_inv, origin_dim):
"""get matrix abs max"""
cholesky_shape = self.shape(matrix_inv)
@@ -904,7 +911,6 @@ class ThorAscend(Optimizer):
matrix_max = P.ReduceMax(keep_dims=False)(matrix_abs)
return matrix_max, matrix_inv

def _get_fc_ainv_ginv(self, index, damping_step, gradients, matrix_a_allreduce, matrix_g_allreduce,
matrix_a_max_allreduce, matrix_g_max_allreduce):
"""get fc layer ainv and ginv"""
@@ -983,7 +989,6 @@ class ThorAscend(Optimizer):
(0, self.C0 - in_channels)))(matrix_a_inv)
return matrix_a_inv

def _get_ainv_ginv_amax_gmax_list(self, gradients, damping_step, matrix_a_allreduce, matrix_g_allreduce,
matrix_a_max_allreduce, matrix_g_max_allreduce):
"""get matrixA inverse list, matrixG inverse list, matrixA_max list, matrixG_max list"""