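Text Classification
=============================

Text classification assigns a sentence or passage to one of a fixed set of categories; spam detection and sentiment classification are typical examples. This tutorial walks you through using fastNLP from scratch.

.. note::

    A GPU is recommended for the experiments in this tutorial.

.. code-block:: text

    1, 商务大床房,房间很大,床有2M宽,整体感觉经济实惠不错!

The leading 1 is the label of this review and marks it as positive. The data can be downloaded from `this link <http://download.fastnlp.top/dataset/chn_senti_corp.zip>`_
and unpacked manually, or fetched automatically by fastNLP.
The figure below shows what the data looks like. In the rest of this tutorial we train a classification network on it with fastNLP.

.. figure:: ./cn_cls_example.png
    :alt: jupyter
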
Steps
-----

The tutorial consists of the following steps:

1. `Read the data <#id4>`_
2. `Preprocess the data <#id5>`_
3. `Choose pretrained word embeddings <#id6>`_
4. `Build the model <#id7>`_
5. `Train the model <#id8>`_

(1) Read the data
~~~~~~~~~~~~~~~~~~~~

fastNLP can download and load many datasets automatically. For the data used here, a :class:`~fastNLP.io.Loader` downloads and loads it for us.
See :mod:`~fastNLP.io.loader` for more on how to use Loaders.

.. code-block:: python

    from fastNLP.io import ChnSentiCorpLoader

    loader = ChnSentiCorpLoader()        # create a loader for the Chinese sentiment classification corpus
    data_dir = loader.download()         # download the data to the default cache directory and return that path
    data_bundle = loader.load(data_dir)  # read the data under {data_dir} into a DataBundle

See :class:`~fastNLP.io.DataBundle` for an introduction to DataBundle. We can print basic information about data\_bundle.

.. code-block:: python

    print(data_bundle)

.. code-block:: text

    In total 3 datasets:
        dev has 1200 instances.
        train has 9600 instances.
        test has 1200 instances.
    In total 0 vocabs:

The data\_bundle holds three :class:`~fastNLP.DataSet` objects in total. The code below shows what a DataSet looks like.

.. code-block:: python

    print(data_bundle.get_dataset('train')[:2])  # show the first two samples of the training set

.. code-block:: text

    +-----------------------------+--------+
    | raw_chars                   | target |
    +-----------------------------+--------+
    | 选择珠江花园的原因就是方... | 1      |
    | 15.4寸笔记本的键盘确实爽... | 1      |
    +-----------------------------+--------+

(2) Preprocess the data
~~~~~~~~~~~~~~~~~~~~~~~

In NLP tasks, preprocessing usually includes:

(a) splitting a sentence into characters or words;

(b) converting the text into indices.

fastNLP also provides processing classes for many datasets; here we use its ChnSentiCorpPipe directly. See :mod:`~fastNLP.io.pipe` for more on Pipes.

.. code-block:: python

    from fastNLP.io import ChnSentiCorpPipe

    pipe = ChnSentiCorpPipe()
    data_bundle = pipe.process(data_bundle)  # every Pipe implements process(), which takes and returns a DataBundle

    print(data_bundle)  # print data_bundle to see what changed

.. code-block:: text

    In total 3 datasets:
        dev has 1200 instances.
        train has 9600 instances.
        test has 1200 instances.
    In total 2 vocabs:
        chars has 4409 entries.
        target has 2 entries.

Besides the three :class:`~fastNLP.DataSet` objects that were already there, the bundle now contains two :class:`~fastNLP.Vocabulary` objects. We can print the contents of a DataSet.

.. code-block:: python

    print(data_bundle.get_dataset('train')[:2])

.. code-block:: text

    +-----------------+--------+-----------------+---------+
    | raw_chars       | target | chars           | seq_len |
    +-----------------+--------+-----------------+---------+
    | 选择珠江花园... | 0      | [338, 464, 1... | 106     |
    | 15.4寸笔记本... | 0      | [50, 133, 20... | 56      |
    +-----------------+--------+-----------------+---------+

There is a new chars column holding lists of integers, and the target column has been converted to integers as well. The names of these two columns match the names of the two Vocabulary objects in data\_bundle. We can print a Vocabulary to look at its contents.

.. code-block:: python

    char_vocab = data_bundle.get_vocab('chars')
    print(char_vocab)

.. code-block:: text

    Vocabulary(['选', '择', '珠', '江', '花']...)

Vocabulary is a class that records the mapping between tokens and indices, for example:

.. code-block:: python

    index = char_vocab.to_index('选')
    print("'选'的index是{}".format(index))  # matches the first index in chars of the first instance printed above
    print("index:{}对应的汉字是{}".format(index, char_vocab.to_word(index)))

.. code-block:: text

    '选'的index是338
    index:338对应的汉字是选

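
The two preprocessing steps and the token/index mapping can be tried without fastNLP. The sketch below is only an illustration: ``TinyVocab`` is a hypothetical stand-in written for this example, it assigns indices in insertion order, and it reserves index 0 for padding and index 1 for unknown tokens as an assumption. The real :class:`~fastNLP.Vocabulary` offers more options and will generally produce different indices.

.. code-block:: python

    class TinyVocab:
        """Hypothetical stand-in: a two-way mapping between tokens and indices."""

        def __init__(self, padding='<pad>', unknown='<unk>'):
            self.unknown = unknown
            self.idx2word = [padding, unknown]        # assumed: 0 = padding, 1 = unknown
            self.word2idx = {padding: 0, unknown: 1}

        def add(self, word):
            # Give each new token the next free index
            if word not in self.word2idx:
                self.word2idx[word] = len(self.idx2word)
                self.idx2word.append(word)

        def to_index(self, word):
            # Tokens that were never added fall back to the unknown index
            return self.word2idx.get(word, self.word2idx[self.unknown])

        def to_word(self, index):
            return self.idx2word[index]


    sentence = '房间很大,床很大'
    chars = list(sentence)                 # step (a): split the sentence into characters

    tiny_vocab = TinyVocab()
    for ch in chars:
        tiny_vocab.add(ch)

    indices = [tiny_vocab.to_index(ch) for ch in chars]          # step (b): characters -> indices
    restored = ''.join(tiny_vocab.to_word(i) for i in indices)   # indices -> characters

    print(indices)
    print(restored == sentence)

Repeated characters share one index, and a character that was never added maps to the unknown index.
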
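(3) Choose pretrained word embeddings
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Pretrained models such as Word2vec, GloVe, ELMo and BERT can improve model performance, so choosing suitable pretrained embeddings before training on a specific task matters.
fastNLP provides several Embedding classes that make loading these pretrained models more convenient.
We first give an example that uses word2vec-style pretrained Chinese character embeddings, and later one that uses BERT for text classification.
The pretrained embedding used here is 'cn-char-fastnlp-100d'; fastNLP downloads it to the local cache automatically.
The embeddings that can be selected by name, together with related notes, are described in :mod:`fastNLP.embeddings`.

.. code-block:: python

    from fastNLP.embeddings import StaticEmbedding

    word2vec_embed = StaticEmbedding(char_vocab, model_dir_or_name='cn-char-fastnlp-100d')

.. code-block:: text

    Found 4321 out of 4409 compound in the pre-training embedding.
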
(4) Build the model
~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    from torch import nn
    from fastNLP.modules import LSTM
    import torch

    # Define the model
    class BiLSTMMaxPoolCls(nn.Module):
        def __init__(self, embed, num_classes, hidden_size=400, num_layers=1, dropout=0.3):
            super().__init__()
            self.embed = embed

            # Each direction gets hidden_size//2 units, so the concatenated output has hidden_size features
            self.lstm = LSTM(self.embed.embedding_dim, hidden_size=hidden_size//2, num_layers=num_layers,
                             batch_first=True, bidirectional=True)
            self.dropout_layer = nn.Dropout(dropout)
            self.fc = nn.Linear(hidden_size, num_classes)

        def forward(self, chars, seq_len):  # argument names must match field names in the DataSet; the field is called chars, so the argument must be chars
            # chars: [batch_size, max_len]
            # seq_len: [batch_size, ]
            chars = self.embed(chars)
            outputs, _ = self.lstm(chars, seq_len)
            outputs = self.dropout_layer(outputs)
            outputs, _ = torch.max(outputs, dim=1)  # max over the time dimension
            outputs = self.fc(outputs)

            return {'pred':outputs}  # [batch_size, num_classes]; the return value must be a dict, and 'pred' is the recommended key for predictions

    # Instantiate the model
    model = BiLSTMMaxPoolCls(word2vec_embed, len(data_bundle.get_vocab('target')))

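
The ``torch.max(outputs, dim=1)`` call in ``forward()`` is a max-over-time pooling: for every feature it keeps the largest value across all time steps, which turns a ``[batch_size, max_len, hidden_size]`` tensor into ``[batch_size, hidden_size]``. The plain-Python sketch below shows the same reduction for one sentence. The helper name ``max_pool_over_time`` and the numbers are made up for illustration.

.. code-block:: python

    def max_pool_over_time(sequence):
        """sequence: list of time steps, each a list of features -> one maximum per feature."""
        num_features = len(sequence[0])
        return [max(step[f] for step in sequence) for f in range(num_features)]


    # One sentence with 3 time steps and 4 features per step (made-up values)
    lstm_outputs = [
        [0.1, -0.5,  0.3, 0.0],
        [0.7, -0.2,  0.1, 0.4],
        [0.2,  0.6, -0.9, 0.3],
    ]

    pooled = max_pool_over_time(lstm_outputs)
    print(pooled)  # one value per feature, regardless of sentence length

Because the result has a fixed size whatever the sentence length, it can be fed straight into the final linear layer.
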
(5) Train the model
~~~~~~~~~~~~~~~~~~~

fastNLP provides a Trainer object that organizes the training process: it computes the loss (so a loss type must be given when the Trainer is created), updates the gradients (so an optimizer must be given), and evaluates on the dev set (so a Metric must be given).

.. code-block:: python

    from fastNLP import Trainer
    from fastNLP import CrossEntropyLoss
    from torch.optim import Adam
    from fastNLP import AccuracyMetric

    loss = CrossEntropyLoss()
    optimizer = Adam(model.parameters(), lr=0.001)
    metric = AccuracyMetric()
    device = 0 if torch.cuda.is_available() else 'cpu'  # run on the GPU if one is available; training is faster there

    trainer = Trainer(train_data=data_bundle.get_dataset('train'), model=model, loss=loss,
                      optimizer=optimizer, batch_size=32, dev_data=data_bundle.get_dataset('dev'),
                      metrics=metric, device=device)
    trainer.train()  # start training; by default the model that performed best on dev is reloaded afterwards

    # Evaluate the model on the test set
    from fastNLP import Tester
    print("Performance on test is:")
    tester = Tester(data=data_bundle.get_dataset('test'), model=model, metrics=metric, batch_size=64, device=device)
    tester.test()

.. code-block:: text

    input fields after batch(if batch size is 2):
        target: (1)type:torch.Tensor (2)dtype:torch.int64, (3)shape:torch.Size([2])
        chars: (1)type:torch.Tensor (2)dtype:torch.int64, (3)shape:torch.Size([2, 106])
        seq_len: (1)type:torch.Tensor (2)dtype:torch.int64, (3)shape:torch.Size([2])
    target fields after batch(if batch size is 2):
        target: (1)type:torch.Tensor (2)dtype:torch.int64, (3)shape:torch.Size([2])
        seq_len: (1)type:torch.Tensor (2)dtype:torch.int64, (3)shape:torch.Size([2])

    Evaluate data in 0.01 seconds!
    training epochs started 2019-09-03-23-57-10

    Evaluate data in 0.43 seconds!
    Evaluation on dev at Epoch 1/10. Step:300/3000:
    AccuracyMetric: acc=0.81

    Evaluate data in 0.44 seconds!
    Evaluation on dev at Epoch 2/10. Step:600/3000:
    AccuracyMetric: acc=0.8675

    Evaluate data in 0.44 seconds!
    Evaluation on dev at Epoch 3/10. Step:900/3000:
    AccuracyMetric: acc=0.878333

    ....

    Evaluate data in 0.48 seconds!
    Evaluation on dev at Epoch 9/10. Step:2700/3000:
    AccuracyMetric: acc=0.8875

    Evaluate data in 0.43 seconds!
    Evaluation on dev at Epoch 10/10. Step:3000/3000:
    AccuracyMetric: acc=0.895833

    In Epoch:7/Step:2100, got best dev performance:
    AccuracyMetric: acc=0.8975
    Reloaded the best model.

    Evaluate data in 0.34 seconds!
    [tester]
    AccuracyMetric: acc=0.8975

    {'AccuracyMetric': {'acc': 0.8975}}

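
The step counts in this log can be checked by hand: 9600 training instances with ``batch_size=32`` give 300 steps per epoch, and the log reports 10 epochs, hence ``Step:300/3000`` after the first epoch. The sketch below does that arithmetic and also illustrates the idea behind an accuracy metric (take the arg-max of each score row and compare it with the gold label). The helper names and the toy scores are made up; this is not fastNLP's ``AccuracyMetric`` implementation.

.. code-block:: python

    import math


    def steps_per_epoch(num_instances, batch_size):
        # The last batch may be smaller, hence the ceiling
        return math.ceil(num_instances / batch_size)


    def accuracy(scores, targets):
        """scores: one list of per-class scores per sample; targets: gold class indices."""
        preds = [max(range(len(row)), key=row.__getitem__) for row in scores]
        correct = sum(p == t for p, t in zip(preds, targets))
        return correct / len(targets)


    per_epoch = steps_per_epoch(9600, 32)
    total_steps = per_epoch * 10
    print(per_epoch, total_steps)

    toy_scores = [[2.0, 0.1], [0.3, 0.9], [1.5, 1.4], [0.2, 0.8]]  # made-up scores for 2 classes
    toy_targets = [0, 1, 1, 1]
    print(accuracy(toy_scores, toy_targets))
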
PS: Text classification with BERT
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    # Only the Embedding needs to be swapped
    from fastNLP.embeddings import BertEmbedding

    # For this demonstration the BERT weights are frozen (requires_grad=False)
    bert_embed = BertEmbedding(char_vocab, model_dir_or_name='cn', auto_truncate=True, requires_grad=False)
    model = BiLSTMMaxPoolCls(bert_embed, len(data_bundle.get_vocab('target')))

    import torch
    from fastNLP import Trainer
    from fastNLP import CrossEntropyLoss
    from torch.optim import Adam
    from fastNLP import AccuracyMetric

    loss = CrossEntropyLoss()
    optimizer = Adam(model.parameters(), lr=2e-5)
    metric = AccuracyMetric()
    device = 0 if torch.cuda.is_available() else 'cpu'  # run on the GPU if one is available; training is faster there

    # Validate on the dev set so that the test set stays unseen until the final evaluation
    trainer = Trainer(train_data=data_bundle.get_dataset('train'), model=model, loss=loss,
                      optimizer=optimizer, batch_size=16, dev_data=data_bundle.get_dataset('dev'),
                      metrics=metric, device=device, n_epochs=3)
    trainer.train()  # start training; by default the model that performed best on dev is reloaded afterwards

    # Evaluate the model on the test set
    from fastNLP import Tester
    print("Performance on test is:")
    tester = Tester(data=data_bundle.get_dataset('test'), model=model, metrics=metric, batch_size=64, device=device)
    tester.test()

.. code-block:: text

    loading vocabulary file ~/.fastNLP/embedding/bert-chinese-wwm/vocab.txt
    Load pre-trained BERT parameters from file ~/.fastNLP/embedding/bert-chinese-wwm/chinese_wwm_pytorch.bin.
    Start to generating word pieces for word.
    Found(Or segment into word pieces) 4286 words out of 4409.
    input fields after batch(if batch size is 2):
        target: (1)type:torch.Tensor (2)dtype:torch.int64, (3)shape:torch.Size([2])
        chars: (1)type:torch.Tensor (2)dtype:torch.int64, (3)shape:torch.Size([2, 106])
        seq_len: (1)type:torch.Tensor (2)dtype:torch.int64, (3)shape:torch.Size([2])
    target fields after batch(if batch size is 2):
        target: (1)type:torch.Tensor (2)dtype:torch.int64, (3)shape:torch.Size([2])
        seq_len: (1)type:torch.Tensor (2)dtype:torch.int64, (3)shape:torch.Size([2])

    Evaluate data in 0.05 seconds!
    training epochs started 2019-09-04-00-02-37

    Evaluate data in 15.89 seconds!
    Evaluation on dev at Epoch 1/3. Step:1200/3600:
    AccuracyMetric: acc=0.9

    Evaluate data in 15.92 seconds!
    Evaluation on dev at Epoch 2/3. Step:2400/3600:
    AccuracyMetric: acc=0.904167

    Evaluate data in 15.91 seconds!
    Evaluation on dev at Epoch 3/3. Step:3600/3600:
    AccuracyMetric: acc=0.918333

    In Epoch:3/Step:3600, got best dev performance:
    AccuracyMetric: acc=0.918333
    Reloaded the best model.

    Performance on test is:

    Evaluate data in 29.24 seconds!
    [tester]
    AccuracyMetric: acc=0.919167

    {'AccuracyMetric': {'acc': 0.919167}}

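PS: Word-based text classification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Chinese text has no explicit boundaries between words, so a sentence usually has to be split into words by a word segmenter first.
The example below shows how to do text classification without relying on fastNLP's existing data reading and preprocessing code.

(1) Read the data
~~~~~~~~~~~~~~~~~~~~

We keep using the same data as before, but this time we do not use fastNLP's built-in reading code; the loader is used only to download the files.

.. code-block:: python

    from fastNLP.io import ChnSentiCorpLoader

    loader = ChnSentiCorpLoader()  # create a loader for the Chinese sentiment classification corpus
    data_dir = loader.download()   # download the data to the default cache directory and return that path

The returned data_dir should contain files similar to the following:

.. code-block:: text

    - chn_senti_corp
        - train.tsv
        - dev.tsv
        - test.tsv

Opening any of these files shows the same format:

.. code-block:: text

    target  raw_chars
    1       这个宾馆比较陈旧了,特价的房间也很一般。总体来说一般
    0       怀着十分激动的心情放映...
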
We first define a function read_file_to_dataset that takes a file path, reads its contents, and returns a DataSet. We then put all the DataSets into a DataBundle object, which makes the following preprocessing easier.

.. code-block:: python

    import os
    from fastNLP import DataSet, Instance
    from fastNLP.io import DataBundle


    def read_file_to_dataset(fp):
        ds = DataSet()
        # Read as UTF-8 explicitly so the result does not depend on the platform default encoding
        with open(fp, 'r', encoding='utf-8') as f:
            f.readline()  # the first line holds the column names; skip it
            for line in f:
                line = line.strip()
                target, chars = line.split('\t')
                ins = Instance(target=target, raw_chars=chars)
                ds.append(ins)
        return ds

    data_bundle = DataBundle()
    for name in ['train.tsv', 'dev.tsv', 'test.tsv']:
        fp = os.path.join(data_dir, name)
        ds = read_file_to_dataset(fp)
        data_bundle.set_dataset(name=name.split('.')[0], dataset=ds)

    print(data_bundle)  # inspect the datasets
    # In total 3 datasets:
    #    train has 9600 instances.
    #    dev has 1200 instances.
    #    test has 1200 instances.

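
The parsing logic itself can be tried on a few in-memory lines without fastNLP or the downloaded files. The sketch below is a standard-library illustration; ``read_tsv`` is a hypothetical helper that returns plain dicts instead of a DataSet, and the sample reuses the two rows shown earlier.

.. code-block:: python

    import io


    def read_tsv(lines):
        """Parse 'target<TAB>raw_chars' lines that follow a header row into a list of dicts."""
        rows = []
        iterator = iter(lines)
        next(iterator, None)                 # skip the header row with the column names
        for line in iterator:
            line = line.strip()
            if not line:                     # ignore blank lines
                continue
            target, raw_chars = line.split('\t', 1)   # split on the first tab only
            rows.append({'target': target, 'raw_chars': raw_chars})
        return rows


    sample = io.StringIO(
        'target\traw_chars\n'
        '1\t这个宾馆比较陈旧了,特价的房间也很一般。总体来说一般\n'
        '0\t怀着十分激动的心情放映...\n'
    )
    rows = read_tsv(sample)
    print(len(rows), rows[0]['target'])

The labels are still strings at this point, which is why the preprocessing step builds a separate vocabulary for the target field.
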
(2) Preprocess the data
~~~~~~~~~~~~~~~~~~~~~~~

Here we first segment the sentences into words with fastHan_, then build a vocabulary and convert the words into indices.

.. _fastHan: https://gitee.com/fastnlp/fastHan

.. code-block:: python

    from fastHan import FastHan
    from fastNLP import Vocabulary

    model = FastHan()
    # model.set_device('cuda')  # uncomment this line to run on the GPU and speed things up

    # Define the word segmentation step
    def word_seg(ins):
        raw_chars = ins['raw_chars']
        # Some sentences are long, so only the first 128 characters are used
        raw_words = model(raw_chars[:128], target='CWS')[0]
        return raw_words

    for name, ds in data_bundle.iter_datasets():
        # apply() runs word_seg on every instance and stores the return value in the raw_words field
        ds.apply(word_seg, new_field_name='raw_words')
        # Besides apply(), fastNLP also supports apply_field() and apply_more() (which can create several fields at once)
        # Also add a seq_len field
        ds.add_seq_len('raw_words')

    vocab = Vocabulary()

    # Build the vocabulary from the raw_words column; datasets other than the training set should go into no_create_entry_dataset
    # The vocabulary can also be built with add_word(), add_word_lst(), etc.; see http://www.fastnlp.top/docs/fastNLP/tutorials/tutorial_2_vocabulary.html
    vocab.from_dataset(data_bundle.get_dataset('train'), field_name='raw_words',
                       no_create_entry_dataset=[data_bundle.get_dataset('dev'),
                                                data_bundle.get_dataset('test')])

    # Use the finished Vocabulary to index the raw_words column and store the indices in a new words column
    vocab.index_dataset(data_bundle.get_dataset('train'), data_bundle.get_dataset('dev'),
                        data_bundle.get_dataset('test'), field_name='raw_words', new_field_name='words')

    # Build the vocabulary for target; a target vocabulary usually needs neither padding nor unknown
    target_vocab = Vocabulary(padding=None, unknown=None)
    # Normally the training set alone is enough to build the target vocabulary
    target_vocab.from_dataset(data_bundle.get_dataset('train'), field_name='target')
    # Without new_field_name, the original field is overwritten by default
    target_vocab.index_dataset(data_bundle.get_dataset('train'), data_bundle.get_dataset('dev'),
                               data_bundle.get_dataset('test'), field_name='target')

    # Store the vocabularies in data_bundle for later use
    data_bundle.set_vocab(field_name='words', vocab=vocab)
    data_bundle.set_vocab(field_name='target', vocab=target_vocab)

    # Mark the model inputs and the target so that they are fetched in the training loop and padded automatically; see
    # http://www.fastnlp.top/docs/fastNLP/tutorials/tutorial_6_datasetiter.html
    data_bundle.set_target('target')
    # seq_len is marked as input too, because BiLSTMMaxPoolCls.forward() takes it as an argument
    data_bundle.set_input('words', 'seq_len')  # DataSet has these two methods as well
    # If a field should be input or target but must not be padded automatically, or needs a special padding scheme, see
    # http://www.fastnlp.top/docs/fastNLP/fastNLP.core.dataset.html

    print(data_bundle.get_dataset('train')[:2])  # look at the current contents of the dataset
    # +--------+-----------------------+-----------------------+----------------------+
    # | target |       raw_chars       |       raw_words       |        words         |
    # +--------+-----------------------+-----------------------+----------------------+
    # |   0    | 选择珠江花园的原因... | ['选择', '珠江', ...  | [2, 3, 4, 5, 6, 7... |
    # |   0    | 15.4寸笔记本的键盘... | ['15.4', '寸', '笔... | [71, 72, 73, 74, ... |
    # +--------+-----------------------+-----------------------+----------------------+

    # The BiLSTMMaxPoolCls model defined earlier expects a chars argument, so rename the words field to chars
    data_bundle.rename_field('words', 'chars')

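
Automatic padding means that the index sequences in a batch, which have different lengths, are filled up to a common length so they can be stacked into one tensor, while the true lengths are kept separately. The sketch below illustrates that idea in plain Python. ``pad_batch`` is a hypothetical helper, and using 0 as the padding index is an assumption made for this example.

.. code-block:: python

    def pad_batch(sequences, pad_index=0):
        """Pad every sequence to the length of the longest one and return the original lengths."""
        max_len = max(len(seq) for seq in sequences)
        padded = [seq + [pad_index] * (max_len - len(seq)) for seq in sequences]
        seq_len = [len(seq) for seq in sequences]
        return padded, seq_len


    batch = [[2, 3, 4, 5, 6], [71, 72, 73]]   # two sentences as word indices (made-up)
    padded, seq_len = pad_batch(batch)
    print(padded)
    print(seq_len)

The model receives both values: the padded indices as the token input and the lengths so that the LSTM can tell real tokens from padding.
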
We can print the vocab to look at the current vocabulary.

.. code-block:: python

    print(data_bundle.get_vocab('chars'))
    # Vocabulary([选择, 珠江, 花园, 的, 原因]...)

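(3) Choose pretrained word embeddings
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Here we use Tencent's pretrained Chinese word embeddings, which can be downloaded from 腾讯词向量_ and unpacked. BERT cannot be used directly in this setting, because it is pretrained on Chinese characters rather than words.

.. _腾讯词向量: https://ai.tencent.com/ailab/nlp/en/embedding.html

Below we load these embeddings with :mod:`fastNLP.embeddings`. fastNLP extracts the vectors of the words contained in the vocabulary and randomly initializes the vectors of words that are not found in the file.

.. code-block:: python

    from fastNLP.embeddings import StaticEmbedding

    word2vec_embed = StaticEmbedding(data_bundle.get_vocab('chars'), model_dir_or_name='/path/to/Tencent_AILab_ChineseEmbedding.txt')
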
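
The lookup behaviour described above (copy the pretrained vector when a word is found, initialize randomly otherwise) can be sketched in plain Python. Everything here is illustrative: ``build_embedding_table`` is a hypothetical helper, the 3-dimensional vectors are made up, and the uniform range used for random initialization is an assumption, not what StaticEmbedding actually uses.

.. code-block:: python

    import random


    def build_embedding_table(vocab_words, pretrained, dim, seed=0):
        """Return one vector per vocabulary word plus the number of words found in `pretrained`."""
        rng = random.Random(seed)
        table, found = [], 0
        for word in vocab_words:
            if word in pretrained:
                table.append(list(pretrained[word]))   # copy the pretrained vector
                found += 1
            else:
                # Assumed initialization range, for illustration only
                table.append([rng.uniform(-0.1, 0.1) for _ in range(dim)])
        return table, found


    pretrained_vectors = {'选择': [0.1, 0.2, 0.3], '花园': [0.4, 0.5, 0.6]}  # made-up vectors
    vocab_words = ['选择', '珠江', '花园']

    table, found = build_embedding_table(vocab_words, pretrained_vectors, dim=3)
    print('Found {} out of {} words'.format(found, len(vocab_words)))
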
The model definition and training procedure after this point are the same as above, so they are not repeated here.

----------------------------------
Download the code
----------------------------------

.. raw:: html

    <a href="../_static/notebooks/%E6%96%87%E6%9C%AC%E5%88%86%E7%B1%BB.ipynb" download="文本分类.ipynb">Click to download the IPython Notebook file</a><hr>