# Reproducing models for the text_classification task

The following models are reproduced here with fastNLP:

- char_cnn: paper [Character-level Convolutional Networks for Text Classification](https://arxiv.org/pdf/1509.01626v3.pdf)
- dpcnn: paper [Deep Pyramid Convolutional Neural Networks for Text Categorization](https://ai.tencent.com/ailab/media/publications/ACL3-Brady.pdf)
- HAN: paper [Hierarchical Attention Networks for Document Classification](https://www.cs.cmu.edu/~diyiy/docs/naacl16.pdf)
- LSTM+self_attention: paper [A Structured Self-attentive Sentence Embedding](https://arxiv.org/pdf/1703.03130.pdf)
- AWD-LSTM: paper [Regularizing and Optimizing LSTM Language Models](https://arxiv.org/pdf/1708.02182.pdf)

# Dataset sources

- IMDB: http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz
- SST-2: https://firebasestorage.googleapis.com/v0/b/mtl-sentence-representations.appspot.com/o/data%2FSST-2.zip?alt=media&token=aabc5f6b-e466-44a2-b9b4-cf6337f84ac8
- SST: https://nlp.stanford.edu/sentiment/trainDevTestTrees_PTB.zip
- yelp_full: https://drive.google.com/drive/folders/0Bz8a_Dbh9Qhbfll6bVpmNUtUcFdjYmF2SEpmZUZUcVNiMUw1TWN6RDV3a0JHT3kxLVhVR2M
- yelp_polarity: https://drive.google.com/drive/folders/0Bz8a_Dbh9Qhbfll6bVpmNUtUcFdjYmF2SEpmZUZUcVNiMUw1TWN6RDV3a0JHT3kxLVhVR2M
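
The direct-download archives above (IMDB, SST-2, SST) can be fetched with a few lines of standard-library Python. The following is a minimal sketch, not part of this repository: the helper names `archive_name` and `fetch` and the `data` target directory are hypothetical. The two yelp links point to Google Drive folders, so this helper does not apply to them.

```python
import os
import urllib.request
from urllib.parse import urlparse, unquote

# URL taken from the dataset list above.
IMDB_URL = "http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz"


def archive_name(url: str) -> str:
    """Derive a local file name from the URL path (hypothetical helper).

    The query string is ignored and percent-escapes such as %2F are decoded,
    so '.../o/data%2FSST-2.zip?alt=media' maps to 'SST-2.zip'.
    """
    path = unquote(urlparse(url).path)
    return os.path.basename(path)


def fetch(url: str, dest_dir: str = "data") -> str:
    """Download url into dest_dir unless the file already exists.

    'data' is an assumed directory name, not one prescribed by this README.
    Returns the local path of the archive.
    """
    os.makedirs(dest_dir, exist_ok=True)
    target = os.path.join(dest_dir, archive_name(url))
    if not os.path.exists(target):
        urllib.request.urlretrieve(url, target)
    return target


# Example (performs a network download when uncommented):
# local_path = fetch(IMDB_URL)
```

Skipping the download when the file already exists keeps repeated runs cheap; unpacking the archive and converting it to the format the training scripts expect is left out on purpose.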

| dataset | classes | train samples | dev samples | test samples | reference |
| :---: | :---: | :---: | :---: | :---: | :---: |
| yelp_polarity | 2 | 560k | - | 38k | [char_cnn](https://arxiv.org/pdf/1509.01626v3.pdf) |
| yelp_full | 5 | 650k | - | 50k | [char_cnn](https://arxiv.org/pdf/1509.01626v3.pdf) |
| IMDB | 2 | 25k | - | 25k | [IMDB](https://ai.stanford.edu/~ang/papers/acl11-WordVectorsSentimentAnalysis.pdf) |
| sst-2 | 2 | 67k | 872 | 1.8k | [GLUE](https://arxiv.org/pdf/1804.07461.pdf) |
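
# Summary of datasets and reproduction results

Results reproduced with fastNLP vs. results reported in the papers (the value before `/` is the fastNLP implementation, the value after it is the result reported in the paper; `-` means the paper does not list a result on that dataset).

| model name | yelp_p | yelp_f | sst-2 | IMDB |
| :---: | :---: | :---: | :---: | :---: |
| char_cnn | 93.80/95.12 | - | - | - |
| dpcnn | 95.50/97.36 | - | - | - |
| HAN | - | - | - | - |
| LSTM | 95.74/- | 64.16/- | - | 88.52/- |
| AWD-LSTM | 95.96/- | 64.74/- | - | 88.91/- |
| LSTM+self_attention | 96.34/- | 65.78/- | - | 89.53/- |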
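
To compare a reproduced score with the reported one, each cell of the table can be split on `/`. The sketch below assumes the cell convention described above and treats a lone `-` as "no result on either side"; `parse_cell` and `gap` are hypothetical helper names, not part of fastNLP.

```python
from typing import Optional, Tuple


def _to_float(text: str) -> Optional[float]:
    """Convert one side of a cell to a float; '-' or empty means missing."""
    text = text.strip()
    return None if text in ("-", "") else float(text)


def parse_cell(cell: str) -> Tuple[Optional[float], Optional[float]]:
    """Split a 'fastNLP/paper' cell into (reproduced, reported)."""
    parts = cell.split("/")
    if len(parts) == 1:
        # A lone '-' carries no slash: treat both sides as missing.
        parts.append("-")
    return _to_float(parts[0]), _to_float(parts[1])


def gap(cell: str) -> Optional[float]:
    """Reproduced minus reported score, or None if either side is missing."""
    reproduced, reported = parse_cell(cell)
    if reproduced is None or reported is None:
        return None
    return round(reproduced - reported, 2)


print(gap("93.80/95.12"))  # char_cnn on yelp_p
print(gap("95.50/97.36"))  # dpcnn on yelp_p
print(gap("95.74/-"))      # LSTM on yelp_p: no reported score to compare
```

With the values in the table, only the char_cnn and dpcnn cells on yelp_p have both sides filled in, so those are the only rows where a gap can be computed.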