
[to #43726282] fix bugs and refine docs

1. remove pai-easynlp temporarily due to its hard dependency on scipy==1.5.4
2. fix sentiment classification output
3. update quickstart and trainer doc

Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/9646399
master
wenmeng.zwm 3 years ago
parent
commit
49192f94be
5 changed files with 12 additions and 22 deletions
  1. docs/source/quick_start.md (+2, -2)
  2. docs/source/tutorials/trainer.md (+2, -14)
  3. modelscope/metainfo.py (+1, -1)
  4. modelscope/outputs.py (+4, -4)
  5. requirements/nlp.txt (+3, -1)

docs/source/quick_start.md (+2, -2)

@@ -1,7 +1,7 @@
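# Quick Start
ModelScope Library currently supports model training and inference with the TensorFlow and PyTorch deep learning frameworks, and has been tested on Python 3.7+, PyTorch 1.8+, TensorFlow 1.13-1.15, and TensorFlow 2.x.
ModelScope Library currently supports model training and inference with the TensorFlow and PyTorch deep learning frameworks, and has been tested on Python 3.7+, PyTorch 1.8+, TensorFlow 1.15, and TensorFlow 2.x.

Note: In the current (630) release, `speech-related` features are supported only in a `linux` environment with Python 3.7 and TensorFlow 1.13-1.15. Other features can be installed and used on Windows and macOS.
Note: `speech-related` features are supported only in a `linux` environment with Python 3.7 and TensorFlow 1.15. Other features can be installed and used on Windows and macOS.

## Python environment setup
First, install and configure an Anaconda environment by following the [documentation](https://docs.anaconda.com/anaconda/install/).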
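
As a rough illustration of the version constraints stated in the quick start text, the sketch below checks the running interpreter against them using only the standard library. The helper names `python_is_supported` and `speech_is_supported` are hypothetical and are not part of ModelScope; the TensorFlow and PyTorch constraints are not checked here.

```python
import sys

# Minimum interpreter version taken from the quick start text (Python 3.7+).
MIN_PYTHON = (3, 7)


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True when (major, minor) meets the documented minimum."""
    return tuple(version_info[:2]) >= minimum


def speech_is_supported(version_info=sys.version_info, platform=sys.platform):
    """Speech features are documented as Python 3.7 on Linux only.

    Hypothetical helper: it mirrors the note in the docs and does not
    inspect the installed TensorFlow version.
    """
    return tuple(version_info[:2]) == (3, 7) and platform.startswith("linux")


if __name__ == "__main__":
    print("Python supported:", python_is_supported())
    print("Speech features supported:", speech_is_supported())
```

Passing the version and platform as parameters keeps the checks easy to exercise without changing the actual environment.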


docs/source/tutorials/trainer.md (+2, -14)

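@@ -8,22 +8,10 @@ ModelScope provides a large number of pretrained models, and you can use any one of them,

Before starting fine-tuning, prepare a dataset for training and evaluation; see the dataset tutorial for details.

`Temporary approach`: we create a fake dataset through the dataset interface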
```python
from datasets import Dataset
dataset_dict = {
'sentence1': [
'This is test sentence1-1', 'This is test sentence2-1',
'This is test sentence3-1'
],
'sentence2': [
'This is test sentence1-2', 'This is test sentence2-2',
'This is test sentence3-2'
],
'label': [0, 1, 1]
}
train_dataset = MsDataset.from_hf_dataset(Dataset.from_dict(dataset_dict))
eval_dataset = MsDataset.from_hf_dataset(Dataset.from_dict(dataset_dict))
train_dataset = MsDataset.load('afqmc_small', namespace='modelscope', split='train')
eval_dataset = MsDataset.load('afqmc_small', namespace='modelscope', split='validation')
```
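### Training
ModelScope puts all training-related configuration in `configuration.json` in the model repository, so we only need to create a Trainer, load the configuration file, and pass in the datasets to complete training.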


modelscope/metainfo.py (+1, -1)

@@ -141,7 +141,7 @@ class Trainers(object):
Holds the standard trainer name to use for identifying different trainer.
This should be used to register trainers.

For a general Trainer, you can use easynlp-trainer/ofa-trainer.
For a general Trainer, you can use EpochBasedTrainer.
For a model specific Trainer, you can use ${ModelName}-${Task}-trainer.
"""
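
The naming rule in the docstring above can be sketched as follows. This is a minimal illustration, not the ModelScope registry: `model_trainer_name`, `TRAINER_REGISTRY`, `register_trainer`, and `BertSentimentTrainer` are all hypothetical names introduced here.

```python
def model_trainer_name(model_name: str, task: str) -> str:
    """Build a name following the ${ModelName}-${Task}-trainer pattern."""
    return f"{model_name}-{task}-trainer"


# Hypothetical in-memory registry mapping trainer names to classes.
TRAINER_REGISTRY = {}


def register_trainer(name: str):
    """Class decorator that records a trainer class under the given name."""
    def decorator(cls):
        if name in TRAINER_REGISTRY:
            raise KeyError(f"trainer name already registered: {name}")
        TRAINER_REGISTRY[name] = cls
        return cls
    return decorator


@register_trainer(model_trainer_name("bert", "sentiment-analysis"))
class BertSentimentTrainer:
    """Placeholder for a model-specific trainer (illustration only)."""


if __name__ == "__main__":
    print(sorted(TRAINER_REGISTRY))
```

Keeping the name construction in one helper means that a lookup by name cannot drift from the name used at registration time.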



modelscope/outputs.py (+4, -4)

@@ -214,10 +214,10 @@ TASK_OUTPUTS = {
Tasks.nli: [OutputKeys.SCORES, OutputKeys.LABELS],

# sentiment classification result for single sample
# {
# "labels": ["happy", "sad", "calm", "angry"],
# "scores": [0.9, 0.1, 0.05, 0.05]
# }
# {
# 'scores': [0.07183828949928284, 0.9281617403030396],
# 'labels': ['1', '0']
# }
Tasks.sentiment_classification: [OutputKeys.SCORES, OutputKeys.LABELS],

# zero-shot classification result for single sample
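
To show how a caller might consume the sentiment classification output documented in this hunk, here is a small sketch that picks the highest-scoring label. It assumes that `scores` and `labels` are index-aligned lists, as the sample in the comment suggests; `top_prediction` is a hypothetical helper, not a ModelScope function.

```python
def top_prediction(result):
    """Return the (label, score) pair with the highest score.

    Assumes result['scores'][i] belongs to result['labels'][i].
    """
    scores = result["scores"]
    labels = result["labels"]
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    best = max(range(len(scores)), key=scores.__getitem__)
    return labels[best], scores[best]


# Sample copied from the updated comment in modelscope/outputs.py.
sample = {
    "scores": [0.07183828949928284, 0.9281617403030396],
    "labels": ["1", "0"],
}

if __name__ == "__main__":
    label, score = top_prediction(sample)
    print(label, score)
```

Selecting by index rather than by position avoids relying on the output being sorted, which the sample (lower score first) suggests is not guaranteed.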


requirements/nlp.txt (+3, -1)

@@ -1,6 +1,8 @@
en_core_web_sm>=2.3.5
fairseq>=0.10.2
pai-easynlp
# temporarily remove pai-easynlp due to its hard dependency on scipy==1.5.4
# will be added back
# pai-easynlp
# rouge_score was just recently updated from 0.0.4 to 0.0.7
# which introduced compatibility issues that are being investigated
rouge_score<=0.0.4
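
The effect of commenting out a requirement and capping `rouge_score` can be illustrated with a short standard-library sketch. It is deliberately simplified: `parse_version` only handles dotted numeric versions and does not implement the full Python version specifier rules, for which the `packaging` library would be the more robust choice. All helper names are hypothetical.

```python
def parse_version(text):
    """Turn a dotted numeric version such as '0.0.4' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def active_requirements(lines):
    """Yield requirement lines, skipping blank lines and '#' comments."""
    for line in lines:
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            yield stripped


def satisfies_upper_bound(installed, bound):
    """Return True when `installed` is less than or equal to `bound`."""
    return parse_version(installed) <= parse_version(bound)


# Abbreviated copy of requirements/nlp.txt after this change.
REQUIREMENTS = [
    "en_core_web_sm>=2.3.5",
    "fairseq>=0.10.2",
    "# pai-easynlp",
    "rouge_score<=0.0.4",
]

if __name__ == "__main__":
    print(list(active_requirements(REQUIREMENTS)))
    print(satisfies_upper_bound("0.0.7", "0.0.4"))
```

With the `pai-easynlp` line commented out, it no longer appears among the active requirements, so its `scipy==1.5.4` pin is not pulled in through this file.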

