fix tutorial_5

5 years ago · dd29021a59
--- a/docs/source/tutorials/tutorial_5_loss_optimizer.rst
+++ b/docs/source/tutorials/tutorial_5_loss_optimizer.rst
@@ -1,9 +1,8 @@
 ==============================================================================
 Build a Text Classifier I: Fast Training and Testing with Trainer and Tester
 Fast Training and Testing with Trainer and Tester
 ==============================================================================

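 We use the same task as in :doc:`/user/quickstart` for a detailed walkthrough. Given a piece of review text, predict whether its sentiment is positive (label=0)
 or negative (label=1), using :class:`~fastNLP.Trainer` and :class:`~fastNLP.Tester` for fast training and testing.
 We use the :doc:`/tutorials/文本分类` task introduced earlier for a detailed walkthrough. Here we switch the dataset to SST2 and use :class:`~fastNLP.Trainer` and :class:`~fastNLP.Tester` for fast training and testing.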

 Loading and Processing Data
 ---------------------------
@@ -50,21 +49,22 @@
         
        Vocabulary(['hide', 'new', 'secretions', 'from', 'the']...)

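    Besides the Pipe classes, which read in data, fastNLP also provides Loader classes for reading and downloading data. For the Pipe and Loader of each dataset and their usage, see :doc:` </tutorials/tutorial_4_load_dataset>` .
    Besides the Pipe classes, which read in data, fastNLP also provides Loader classes for reading and downloading data. For the Pipe and Loader of each dataset and their usage, see :doc:`/tutorials/tutorial_4_load_dataset` .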
    
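 Splitting the dataset
    Because the SST2 test set does not come with labels, we split off part of the training set to serve as the test set. The following code shows how to use :meth:`~fastNLP.DataSet.split`
    Because the SST2 test set does not come with labels, we split off part of the training set to serve as the test set. The following code shows how to use :meth:`~fastNLP.DataSet.split`.
    So that readers can run through the whole tutorial quickly, we take only the first 5000 examples of the training set.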

    .. code-block:: python

        train_data = databundle.get_dataset('train')
        train_data = databundle.get_dataset('train')[:5000]
        train_data, test_data = train_data.split(0.015)
        dev_data = databundle.get_dataset('dev')
        print(len(train_data),len(dev_data),len(test_data))

    The output is::
 	
        66339 872 1010
        4925 872 75

 The dataset's :meth:`~fastNLP.DataSet.set_input` and :meth:`~fastNLP.DataSet.set_target` functions
    During preprocessing, the :meth:`~fastNLP.io.SST2Pipe.process_from_file` method of the :class:`~fastNLP.io.SST2Pipe` class also, for the training, test, and validation
@@ -92,7 +92,7 @@
    only has a reason to exist when the :mod:`~fastNLP.core.field` is set as input or target.

    When is_input is true, a :mod:`~fastNLP.core.field` appears in the batch_x yielded by :class:`~fastNLP.DataSetIter`; when is_target is true, the
    :mod:`~fastNLP.core.field` appears in the batch_y yielded by:class:`~fastNLP.DataSetIter`.
    :mod:`~fastNLP.core.field` appears in the batch_y yielded by :class:`~fastNLP.DataSetIter`.
    For a detailed analysis, see :doc:`Implementing a Custom Training Loop with DataSetIter </tutorials/tutorial_6_datasetiter>`.

 Training with a built-in model
@@ -111,7 +111,7 @@
        # custom values for kernel_nums, kernel_sizes, padding, and dropout can also be passed in
        model_cnn = CNNText((len(vocab),EMBED_DIM), num_classes=2, dropout=0.1)

    To quickly build your own model with fastNLP, see :doc:`</tutorials/tutorial_8_modules_models>`  .
    To quickly build your own model with fastNLP, see :doc:`/tutorials/tutorial_8_modules_models`  .

 Evaluation metric
    Training a model requires an evaluation metric. Here we use accuracy as the metric.
@@ -199,25 +199,25 @@
    The output of the training process is as follows::

        input fields after batch(if batch size is 2):
            words: (1)type:torch.Tensor (2)dtype:torch.int64, (3)shape:torch.Size([2, 16])
            words: (1)type:torch.Tensor (2)dtype:torch.int64, (3)shape:torch.Size([2, 13])
            seq_len: (1)type:torch.Tensor (2)dtype:torch.int64, (3)shape:torch.Size([2])
        target fields after batch(if batch size is 2):
            target: (1)type:torch.Tensor (2)dtype:torch.int64, (3)shape:torch.Size([2])

        training epochs started 2019-09-17-14-29-00
        training epochs started 2020-02-26-16-45-40
        Evaluate data in 0.5 seconds!
        Evaluation on dev at Epoch 1/10. Step:308/3080:
        AccuracyMetric: acc=0.677752

        Evaluate data in 0.11 seconds!
        Evaluation on dev at Epoch 1/10. Step:4147/41470: 
        AccuracyMetric: acc=0.762615
        ......

        ...
        Evaluate data in 0.44 seconds!
        Evaluation on dev at Epoch 10/10. Step:3080/3080:
        AccuracyMetric: acc=0.725917

        Evaluate data in 0.2 seconds!
        Evaluation on dev at Epoch 10/10. Step:41470/41470: 
        AccuracyMetric: acc=0.769495

        In Epoch:2/Step:8294, got best dev performance:
        AccuracyMetric: acc=0.800459
        In Epoch:5/Step:1540, got best dev performance:
        AccuracyMetric: acc=0.740826
        Reloaded the best model.

 Quick testing