Fix some issues in the Chinese API docs

4 years ago · 87e13c5458
--- a/docs/api/api_python/mindspore.parallel.rst
+++ b/docs/api/api_python/mindspore.parallel.rst
@@ -13,7 +13,7 @@
    **参数：**

    - **fully_use_devices** (bool) - 表示是否仅搜索充分利用所有可用设备的策略。默认值：True。例如，如果有8个可用设备，当该参数设为true时，策略(4, 1)将不包括在ReLU的候选策略中，因为策略(4, 1)仅使用4个设备。
    - **elementwise_op_strategy_follow** (bool) - 表示elementwise算子是否具有与后续算子一样的策略。默认值：False例如，Add跟随的ReLU，其中ReLU是elementwise算子。如果该参数设置为true，则算法搜索的策略可以保证这两个算子的策略是一致的，例如，ReLU的策略(8, 1)和Add的策略((8, 1), (8, 1))。
    - **elementwise_op_strategy_follow** (bool) - 表示elementwise算子是否具有与后续算子一样的策略。默认值：False。例如，Add的输出给了ReLU，其中ReLU是elementwise算子。如果该参数设置为true，则算法搜索的策略可以保证这两个算子的策略是一致的，例如，ReLU的策略(8, 1)和Add的策略((8, 1), (8, 1))。
    - **enable_algo_approxi** (bool) - 表示是否在算法中启用近似。默认值：False。由于大型DNN模型的并行搜索策略有较大的解空间，该算法在这种情况下耗时较长。为了缓解这种情况，如果该参数设置为true，则会进行近似丢弃一些候选策略，以便缩小解空间。
    - **algo_approxi_epsilon** (float) - 表示近似算法中使用的epsilon值。默认值：0.1 此值描述了近似程度。例如，一个算子的候选策略数量为S，如果 `enable_algo_approxi` 为true，则剩余策略的大小为min{S, 1/epsilon}。
    - **tensor_slice_align_enable** (bool) - 表示是否检查MatMul的tensor切片的shape。默认值：False 受某些硬件的属性限制，只有shape较大的MatMul内核才能显示出优势。如果该参数为true，则检查MatMul的切片shape以阻断不规则的shape。
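
The two numeric rules described in the parameter list above (a strategy "fully uses" the devices only when its slice counts multiply to the device count, and approximation keeps at most min{S, 1/epsilon} candidates) can be illustrated with a small sketch. The helper names below are hypothetical and are not part of `mindspore.parallel`; rounding 1/epsilon down to an integer is also an assumption, since the documentation only states min{S, 1/epsilon}.

```python
import math


def uses_all_devices(strategy, device_num):
    """Return True if a single-input strategy such as (8, 1) occupies every device.

    Hypothetical helper: the number of devices used is taken to be the product
    of the per-dimension slice counts.
    """
    return math.prod(strategy) == device_num


def remaining_strategies(num_candidates, epsilon):
    """Number of candidates kept under approximation: min{S, 1/epsilon}.

    Assumption: 1/epsilon is rounded down to an integer (with a tiny tolerance
    for floating-point error).
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return min(num_candidates, math.floor(1.0 / epsilon + 1e-9))


# With 8 devices, (4, 1) uses only 4 of them, while (8, 1) uses all 8.
print(uses_all_devices((4, 1), 8), uses_all_devices((8, 1), 8))
# With the default epsilon of 0.1, at most 10 candidates remain per operator.
print(remaining_strategies(64, 0.1), remaining_strategies(4, 0.1))
```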
@@ -58,4 +58,5 @@

    **异常：**

    ValueError：无法识别传入的关键字。
    - **ValueError** - 无法识别传入的关键字。
 
--- a/docs/api/api_python/nn/mindspore.nn.AdaSumByDeltaWeightWrapCell.rst
+++ b/docs/api/api_python/nn/mindspore.nn.AdaSumByDeltaWeightWrapCell.rst
@@ -23,7 +23,7 @@ mindspore.nn.AdaSumByDeltaWeightWrapCell

    **参数：**

    - **optimizer** (nn.optimizer) - 必须是单输入的优化器：
    - **optimizer** (nn.optimizer) - 必须是单输入的优化器。

    **输入：**

--- a/docs/api/api_python/nn/mindspore.nn.AdaSumByGradWrapCell.rst
+++ b/docs/api/api_python/nn/mindspore.nn.AdaSumByGradWrapCell.rst
@@ -23,7 +23,7 @@ mindspore.nn.AdaSumByGradWrapCell

    **参数：**

    - **optimizer** (nn.optimizer) - 必须是单输入的优化器：
    - **optimizer** (nn.optimizer) - 必须是单输入的优化器。

    **输入：**

--- a/docs/api/api_python/transformer/mindspore.nn.transformer.AttentionMask.rst
+++ b/docs/api/api_python/transformer/mindspore.nn.transformer.AttentionMask.rst
@@ -1,6 +1,6 @@
 .. py:class:: mindspore.nn.transformer.AttentionMask(seq_length, parallel_config=default_dpmp_config)

    从输入掩码中获取下三角矩阵。输入掩码是值为1或0的二维Tensor (batch_size, seq_length)。1表示当前位置是一个有效的标记，其他值则表示当前位置不是一个有效的标记。
    从输入掩码中获取下三角矩阵。输入掩码是值为1或0的二维Tensor (batch_size, seq_length)。1表示当前位置是一个有效的标记，0则表示当前位置不是一个有效的标记。
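
As a rough illustration of the description above, the following NumPy sketch builds a lower-triangular attention mask from a 0/1 input mask of shape (batch_size, seq_length). It is not the MindSpore implementation; the function name is hypothetical, and combining the pairwise validity mask with a lower-triangular matrix by elementwise multiplication is an assumption.

```python
import numpy as np


def lower_triangular_attention_mask(input_mask):
    """Build a (batch_size, seq_length, seq_length) mask from a 0/1 input mask."""
    input_mask = np.asarray(input_mask, dtype=np.float32)
    seq_length = input_mask.shape[1]
    # Position (i, j) is valid only if both token i and token j are valid.
    pairwise = input_mask[:, :, None] * input_mask[:, None, :]
    # Keep only positions j <= i, so a token never attends to later tokens.
    causal = np.tril(np.ones((seq_length, seq_length), dtype=np.float32))
    return pairwise * causal


# One sequence of length 4 whose last position is padding (0).
mask = lower_triangular_attention_mask([[1, 1, 1, 0]])
print(mask.shape)
print(mask[0])
```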

    **参数：**

--- a/docs/api/api_python/transformer/mindspore.nn.transformer.FeedForward.rst
+++ b/docs/api/api_python/transformer/mindspore.nn.transformer.FeedForward.rst
@@ -1,7 +1,7 @@
 .. py:class:: mindspore.nn.transformer.FeedForward(hidden_size, ffn_hidden_size, dropout_rate, hidden_act="gelu", expert_num=1, param_init_type=mstype.float32, parallel_config=default_dpmp_config)

    具有两层线性层的多层感知器，并行在最终输出上使用Dropout。第一层前馈层将输入维度从hidden_size投影到ffn_hidden_size，并在中间应用激活层。第二个线性将该维度从ffn_hidden_size投影到hidden_size。配置parallel_config之后，
    第一个前馈层的权重将在输入维度上被分片，第二个线性在输出维度上进行切分。总体过程如下
    具有两层线性层的多层感知器，并在最终输出上使用Dropout。第一个线性层将输入维度从hidden_size投影到ffn_hidden_size，并在中间应用激活层。第二个线性层将该维度从ffn_hidden_size投影到hidden_size。配置parallel_config之后，
    第一个线性层的权重将在输入维度上被分片，第二个线性层在输出维度上进行切分。总体过程如下

    .. math:
        Dropout((xW_1+b_1)W_2 + b_2))
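
The computation described above can be sketched in NumPy as follows. This is an illustrative sketch, not the MindSpore code: the function names are hypothetical, the GELU uses the common tanh approximation, and the activation is applied between the two linear layers as the prose states (the printed formula does not show it).

```python
import numpy as np


def gelu(x):
    """GELU activation (tanh approximation)."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))


def feed_forward(x, w1, b1, w2, b2, dropout_rate=0.0, rng=None):
    """Compute Dropout(act(x W1 + b1) W2 + b2) for x of shape (..., hidden_size)."""
    hidden = gelu(x @ w1 + b1)   # project hidden_size -> ffn_hidden_size
    out = hidden @ w2 + b2       # project ffn_hidden_size -> hidden_size
    if dropout_rate > 0.0:
        if rng is None:
            rng = np.random.default_rng()
        # Inverted dropout: zero each element with probability dropout_rate.
        keep = rng.random(out.shape) >= dropout_rate
        out = out * keep / (1.0 - dropout_rate)
    return out


rng = np.random.default_rng(0)
hidden_size, ffn_hidden_size = 4, 16
x = rng.normal(size=(2, 3, hidden_size))
w1 = rng.normal(size=(hidden_size, ffn_hidden_size))
b1 = np.zeros(ffn_hidden_size)
w2 = rng.normal(size=(ffn_hidden_size, hidden_size))
b2 = np.zeros(hidden_size)
y = feed_forward(x, w1, b1, w2, b2)
print(y.shape)
```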
@@ -12,9 +12,9 @@

    - **hidden_size** (int) - 表示输入的维度。
    - **ffn_hidden_size** (int) - 表示中间隐藏大小。
    - **dropout_rate** (float) - 表示第二个线性输出的丢弃率。
    - **hidden_act** (str) - 表示第一层前馈层的激活。其值可为'relu'、'relu6'、'tanh'、'gelu'、'fast_gelu'、'elu'、'sigmoid'、'prelu'、'leakyrelu'、'hswish'、'hsigmoid'、'logsigmoid'等等。默认值：gelu。
    - **expert_num** (int) - 表示线性中使用的专家数量。对于expert_num > 1用例，使用BatchMatMul。BatchMatMul中的第一个维度表示expert_num。默认值：1
    - **dropout_rate** (float) - 表示第二个线性层输出的丢弃率。
    - **hidden_act** (str) - 表示第一个线性层的激活。其值可为'relu'、'relu6'、'tanh'、'gelu'、'fast_gelu'、'elu'、'sigmoid'、'prelu'、'leakyrelu'、'hswish'、'hsigmoid'、'logsigmoid'等等。默认值：'gelu'。
    - **expert_num** (int) - 表示线性层中使用的专家数量。对于expert_num > 1用例，使用BatchMatMul。BatchMatMul中的第一个维度表示expert_num。默认值：1
    - **param_init_type** (dtype.Number) - 表示参数初始化类型。其值应为dtype.float32或dtype.float16。默认值：dtype.float32
    - **parallel_config** (OpParallelConfig) - 表示并行配置。更多详情，请参见 `OpParallelConfig` 。默认值为 `default_dpmp_config` ，表示一个带有默认参数的 `OpParallelConfig` 实例。

--- a/docs/api/api_python/transformer/mindspore.nn.transformer.MultiHeadAttention.rst
+++ b/docs/api/api_python/transformer/mindspore.nn.transformer.MultiHeadAttention.rst
@@ -2,7 +2,7 @@

    论文 `Attention Is All You Need <https://arxiv.org/pdf/1706.03762v5.pdf>`_ 中所述的多头注意力的实现。给定src_seq_length长度的query向量，tgt_seq_length长度的key向量和value，注意力计算流程如下：

    .. math:
    .. math::
           MultiHeadAttention(query, key, vector) = Dropout(Concat(head_1, \dots, head_h)W^O)

    其中， `head_i = Attention(QW_i^Q, KW_i^K, VW_i^V)` 。注意：输出层的投影计算中带有偏置参数。
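
The formula above can be illustrated with a minimal NumPy sketch. It is a simplification and not the MindSpore implementation: dropout and masking are omitted, the per-head matrices W_i^Q, W_i^K, W_i^V are stacked into single matrices `wq`, `wk`, `wv`, and all names are hypothetical. Only the output projection carries a bias, following the note above.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_attention(query, key, value, wq, wk, wv, wo, bo, num_heads):
    """Concat(head_1, ..., head_h) W^O + b^O, without dropout or masking."""
    batch, q_len, hidden = query.shape
    kv_len = key.shape[1]
    size_per_head = hidden // num_heads

    def split_heads(x, length):
        # (batch, length, hidden) -> (batch, num_heads, length, size_per_head)
        return x.reshape(batch, length, num_heads, size_per_head).transpose(0, 2, 1, 3)

    q = split_heads(query @ wq, q_len)
    k = split_heads(key @ wk, kv_len)
    v = split_heads(value @ wv, kv_len)
    # Scaled dot-product attention per head: (batch, heads, q_len, kv_len).
    scores = q @ k.transpose(0, 1, 3, 2) / np.sqrt(size_per_head)
    heads = softmax(scores) @ v
    # Concatenate the heads back to (batch, q_len, hidden), then project.
    concat = heads.transpose(0, 2, 1, 3).reshape(batch, q_len, hidden)
    return concat @ wo + bo


rng = np.random.default_rng(0)
hidden, num_heads = 8, 2
wq, wk, wv, wo = (rng.normal(size=(hidden, hidden)) for _ in range(4))
bo = rng.normal(size=hidden)
query = rng.normal(size=(2, 3, hidden))
key = rng.normal(size=(2, 5, hidden))
value = np.ones((2, 5, hidden))
out = multi_head_attention(query, key, value, wq, wk, wv, wo, bo, num_heads)
print(out.shape)
```

Because every row of `value` is identical in this usage example, each attention-weighted average returns that same row, which gives a simple sanity check on the output.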
--- a/docs/api/api_python/transformer/mindspore.nn.transformer.Transformer.rst
+++ b/docs/api/api_python/transformer/mindspore.nn.transformer.Transformer.rst
@@ -43,5 +43,5 @@
    Tuple，表示包含(`output`, `encoder_layer_present`, `encoder_layer_present`)的元组。

    - **output** (Tensor) - 如果只有编码器，则表示编码器层的输出logit。shape为[batch, src_seq_length, hidden_size] or [batch * src_seq_length, hidden_size]。如果有编码器和解码器，则输出来自于解码器层。shape为[batch, tgt_seq_length, hidden_size]或[batch * tgt_seq_length, hidden_size]。
    - **encoder_layer_present** (Tuple) - 大小为num_layers的元组，其中每个元组都是shape为((batch_size, num_heads, size_per_head, src_seq_length)或(batch_size, num_heads, src_seq_length, size_per_head))的自注意力中的投影key向量和value向量的tensor。
    - **decoder_layer_present** (Tuple) - 大小为num_layers的元组，其中每个元组都是shape为((batch_size, num_heads, size_per_head, tgt_seq_length)或(batch_size, num_heads, tgt_seq_length, size_per_head))的self attention中的投影key向量和value向量的tensor，或者是shape为(batch_size, num_heads, size_per_head, src_seq_length)或(batch_size, num_heads, src_seq_length, size_per_head))的交叉注意力中的投影key向量和value向量的tensor。如果未设置解码器，返回值将为None。
    - **encoder_layer_present** (Tuple) - 大小为num_layers的元组，其中每个元组都是shape为((batch_size, num_heads, size_per_head, src_seq_length)或(batch_size, num_heads, src_seq_length, size_per_head))的自注意力中的投影key向量和value向量的tensor的元组。
    - **decoder_layer_present** (Tuple) - 大小为num_layers的元组，其中每个元组都是shape为((batch_size, num_heads, size_per_head, tgt_seq_length)或(batch_size, num_heads, tgt_seq_length, size_per_head))的self attention中的投影key向量和value向量的tensor的元组，或者是shape为(batch_size, num_heads, size_per_head, src_seq_length)或(batch_size, num_heads, src_seq_length, size_per_head))的交叉注意力中的投影key向量和value向量的tensor的元组。如果未设置解码器，返回值将为None。
--- a/docs/api/api_python/transformer/mindspore.nn.transformer.TransformerDecoder.rst
+++ b/docs/api/api_python/transformer/mindspore.nn.transformer.TransformerDecoder.rst
@@ -41,4 +41,4 @@
    Tuple，表示一个包含(`output`, `layer_present`)的元组。

    - **output** (Tensor) - 输出的logit。shape为[batch, tgt_seq_length, hidden_size]或[batch * tgt_seq_length, hidden_size]。
    - **layer_present** (Tuple) - 大小为层数的元组，其中每个元组都是shape为((batch_size, num_heads, size_per_head, tgt_seq_length)或(batch_size, num_heads, tgt_seq_length, size_per_head)的自注意力中的投影key向量和value向量的tensor，或者是shape为(batch_size, num_heads, size_per_head, src_seq_length)或(batch_size, num_heads, src_seq_length, size_per_head))的交叉注意力中的投影key向量和value向量的tensor。
    - **layer_present** (Tuple) - 大小为层数的元组，其中每个元组都是shape为((batch_size, num_heads, size_per_head, tgt_seq_length)或(batch_size, num_heads, tgt_seq_length, size_per_head)的自注意力中的投影key向量和value向量的tensor的元组，或者是shape为(batch_size, num_heads, size_per_head, src_seq_length)或(batch_size, num_heads, src_seq_length, size_per_head))的交叉注意力中的投影key向量和value向量的tensor的元组。
--- a/docs/api/api_python/transformer/mindspore.nn.transformer.TransformerDecoderLayer.rst
+++ b/docs/api/api_python/transformer/mindspore.nn.transformer.TransformerDecoderLayer.rst
@@ -35,4 +35,4 @@
    Tuple，表示一个包含(`output`, `layer_present`)的元组。

    - **output** (Tensor) - 此层的输出logit。shape为[batch, seq_length, hidden_size]或[batch * seq_length, hidden_size]。
    - **layer_present** (Tensor) - 元组，其中每个元组都是shape为((batch_size, num_heads, size_per_head, tgt_seq_length)或(batch_size, num_heads, tgt_seq_length, size_per_head)的自注意力中的投影key向量和value向量的tensor，或者是shape为(batch_size, num_heads, size_per_head, src_seq_length)或(batch_size, num_heads, src_seq_length, size_per_head))的交叉注意力中的投影key向量和value向量的tensor。
    - **layer_present** (Tuple) - 元组，其中每个元组都是shape为((batch_size, num_heads, size_per_head, tgt_seq_length)或(batch_size, num_heads, tgt_seq_length, size_per_head)的自注意力中的投影key向量和value向量的tensor的元组，或者是shape为(batch_size, num_heads, size_per_head, src_seq_length)或(batch_size, num_heads, src_seq_length, size_per_head))的交叉注意力中的投影key向量和value向量的tensor的元组。
--- a/docs/api/api_python/transformer/mindspore.nn.transformer.TransformerEncoder.rst
+++ b/docs/api/api_python/transformer/mindspore.nn.transformer.TransformerEncoder.rst
@@ -35,4 +35,4 @@
    Tuple，表示一个包含(`output`, `layer_present`)的元组。

    - **output** (Tensor) - use_past为False或is_first_iteration为True时，表示shape为(batch_size, seq_length, hidden_size)或(batch_size * seq_length, hidden_size)的层输出的float tensor。否则，shape将为(batch_size, 1, hidden_size)。
    - **layer_present** (Tuple) - 大小为num_layers的元组，其中每个元组都包含shape为((batch_size, num_heads, size_per_head, seq_length)或(batch_size, num_heads, seq_length, size_per_head))的投影key向量和value向量的Tensor。
    - **layer_present** (Tuple) - 大小为num_layers的元组，其中每个元组都包含shape为((batch_size, num_heads, size_per_head, seq_length)或(batch_size, num_heads, seq_length, size_per_head))的投影key向量和value向量的Tensor的元组。
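
The nested return structure described above (a tuple with one entry per layer, each entry itself a tuple of the projected key and value tensors) can be pictured with placeholder arrays. This sketch only mimics the shapes; the helper name is hypothetical, and assigning the first layout to the key and the second to the value is an assumption.

```python
import numpy as np


def make_layer_present(num_layers, batch_size, num_heads, size_per_head, seq_length):
    """Build a placeholder with the documented nesting: tuple of (key, value) tuples."""
    layers = []
    for _ in range(num_layers):
        # Assumed layouts: key is (..., size_per_head, seq_length),
        # value is (..., seq_length, size_per_head).
        key = np.zeros((batch_size, num_heads, size_per_head, seq_length), dtype=np.float32)
        value = np.zeros((batch_size, num_heads, seq_length, size_per_head), dtype=np.float32)
        layers.append((key, value))
    return tuple(layers)


layer_present = make_layer_present(
    num_layers=2, batch_size=1, num_heads=4, size_per_head=16, seq_length=8
)
print(len(layer_present), len(layer_present[0]))
print(layer_present[0][0].shape, layer_present[0][1].shape)
```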
--- a/mindspore/python/mindspore/nn/transformer/transformer.py
+++ b/mindspore/python/mindspore/nn/transformer/transformer.py
@@ -1504,7 +1504,7 @@ class TransformerDecoderLayer(Cell):

            - **output** (Tensor) - The output logit of this layer. The shape is [batch, seq_length, hidden_size] or
              [batch * seq_length, hidden_size].
            - **layer_present** (Tensor) - A tuple, where each tuple is the tensor of the projected key and value
            - **layer_present** (Tuple) - A tuple, where each tuple is the tensor of the projected key and value
              vector in self attention with shape ((batch_size, num_heads, size_per_head, tgt_seq_length),
              (batch_size, num_heads, tgt_seq_length, size_per_head), and of the projected key and value vector
              in cross attention with shape  (batch_size, num_heads, size_per_head, src_seq_length),