FastText is a fast text classification algorithm that is simple and efficient. It was proposed by Armand Joulin, Tomas Mikolov et al. in the 2016 paper "Bag of Tricks for Efficient Text Classification". Its model architecture is similar to CBOW, with the middle word replaced by a label. FastText adopts n-gram features as additional features to capture partial information about local word order. It speeds up training and testing while maintaining high accuracy, and is widely used in various text classification tasks.
Paper: "Bag of Tricks for Efficient Text Classification", 2016, A. Joulin, E. Grave, P. Bojanowski, and T. Mikolov
The FastText model mainly consists of an input layer, a hidden layer, and an output layer. The input is a sequence of words (a text or sentence), and the output layer gives the probability that the word sequence belongs to each category. The hidden layer is formed by averaging the word vectors: the features are mapped to the hidden layer through a linear transformation and then mapped from the hidden layer to the labels.
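The whole forward pass can be summarized in a few lines. The snippet below is a minimal NumPy sketch of the computation described above, not the repository's implementation (which lives in src/fasttext_model.py); the vocabulary size, embedding size, class count, and token ids are illustrative assumptions.

```python
# Minimal sketch of the FastText forward pass (NumPy only, illustrative values).
import numpy as np

vocab_size, embedding_dims, num_class = 20000, 16, 4  # placeholder sizes

rng = np.random.default_rng(0)
embedding = rng.normal(size=(vocab_size, embedding_dims))  # input layer: word/n-gram embeddings
fc_weight = rng.normal(size=(embedding_dims, num_class))   # hidden -> output linear map
fc_bias = np.zeros(num_class)

def fasttext_forward(token_ids):
    """token_ids: indices of the words and n-grams of one sentence."""
    hidden = embedding[token_ids].mean(axis=0)             # hidden layer: average of word vectors
    logits = hidden @ fc_weight + fc_bias                  # linear transformation to the labels
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()                             # output layer: class probabilities

print(fasttext_forward(np.array([3, 57, 1024, 7, 88])))    # toy sentence of 5 token ids
```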
Note that you can run the scripts with the datasets mentioned in the original paper or with datasets widely used in the relevant domain/network architecture. In the following sections, we introduce how to run the scripts with the datasets listed below.
After dataset preparation, you can start training and evaluation as follows:
```bash
# run training example
cd ./scripts
sh run_standalone_train.sh [TRAIN_DATASET] [DEVICEID]

# run distributed training example
sh run_distributed_train.sh [TRAIN_DATASET] [RANK_TABLE_PATH]

# run evaluation example
sh run_eval.sh [EVAL_DATASET_PATH] [DATASET_NAME] [MODEL_CKPT] [DEVICEID]
```
The FastText network scripts and code structure are as follows:
```text
├── fasttext
    ├── README.md                        // Introduction of FastText model.
    ├── src
    │   ├── config.py                    // Configuration instance definition.
    │   ├── create_dataset.py            // Dataset preparation.
    │   ├── fasttext_model.py            // FastText model architecture.
    │   ├── fasttext_train.py            // Training wrapper using the FastText model architecture.
    │   ├── load_dataset.py              // Dataset loader to feed into the model.
    │   ├── lr_scheduler.py              // Learning rate scheduler.
    ├── scripts
    │   ├── run_distributed_train.sh     // Shell script for distributed training on Ascend.
    │   ├── run_eval.sh                  // Shell script for standalone evaluation on Ascend.
    │   ├── run_standalone_train.sh      // Shell script for standalone training on Ascend.
    │   ├── run_distributed_train_gpu.sh // Shell script for distributed training on GPU.
    │   ├── run_eval_gpu.sh              // Shell script for standalone evaluation on GPU.
    │   ├── run_standalone_train_gpu.sh  // Shell script for standalone training on GPU.
    ├── eval.py                          // Infer API entry.
    ├── requirements.txt                 // Requirements of third-party packages.
    ├── train.py                         // Train API entry.
```
Download the AG's News Topic Classification Dataset, the DBPedia Ontology Classification Dataset, and the Yelp Review Polarity Dataset, and unzip them to any path you want.
Run the following script to preprocess the data and convert the original datasets to MindRecord format for training and evaluation:

```bash
cd scripts
sh creat_dataset.sh [SOURCE_DATASET_PATH] [DATASET_NAME]
```
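The conversion itself is implemented in src/create_dataset.py. For orientation only, the sketch below shows the general shape of writing tokenized samples to a MindRecord file with mindspore.mindrecord.FileWriter; the field names, the toy hashing tokenizer, the label value, and the output file name are assumptions for illustration, not the repository's exact preprocessing.

```python
# Schematic MindRecord conversion (illustrative field names and tokenizer).
import numpy as np
from mindspore.mindrecord import FileWriter

VOCAB_SIZE = 20000  # placeholder value, not the repository's setting

def to_token_ids(text):
    """Toy tokenizer: hash words and bigrams into a fixed vocabulary."""
    words = text.lower().split()
    feats = words + [" ".join(pair) for pair in zip(words, words[1:])]
    return np.array([hash(f) % VOCAB_SIZE for f in feats], dtype=np.int32)

writer = FileWriter(file_name="ag_news.train.mindrecord", shard_num=1)
writer.add_schema({"src_tokens": {"type": "int32", "shape": [-1]},
                   "label_idx": {"type": "int32"}}, "assumed fasttext schema")
writer.write_raw_data([{"src_tokens": to_token_ids("wall st. bears claw back into the black"),
                        "label_idx": 2}])
writer.commit()
```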
Parameters for both training and evaluation can be set in config.py. All datasets use the same parameter names; the parameter values can be changed according to your needs. An illustrative configuration sketch follows the list below.
Network Parameters

```text
vocab_size                # vocabulary size.
buckets                   # bucket sequence lengths.
test_buckets              # test dataset bucket sequence lengths.
batch_size                # batch size of the input dataset.
embedding_dims            # size of each embedding vector.
num_class                 # number of labels.
epoch                     # total training epochs.
lr                        # initial learning rate.
min_lr                    # minimum learning rate.
warmup_steps              # warm-up steps.
poly_lr_scheduler_power   # power used to compute the polynomially decayed learning rate.
pretrain_ckpt_dir         # pretrained checkpoint directory.
keep_ckpt_max             # maximum number of checkpoint files to keep.
```
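For orientation, the sketch below groups these parameters into a single configuration object. The actual definitions live in src/config.py and differ per dataset; every value here is a placeholder assumption, not a default from this repository.

```python
# Placeholder configuration mirroring the parameter names listed above.
from dataclasses import dataclass, field
from typing import List

@dataclass
class FastTextConfig:
    vocab_size: int = 20000                                              # vocabulary size
    buckets: List[int] = field(default_factory=lambda: [64, 128, 256])   # bucket sequence lengths
    test_buckets: List[int] = field(default_factory=lambda: [256])       # test bucket sequence lengths
    batch_size: int = 512                                                # batch size of the input dataset
    embedding_dims: int = 16                                             # size of each embedding vector
    num_class: int = 4                                                   # number of labels
    epoch: int = 5                                                       # total training epochs
    lr: float = 0.2                                                      # initial learning rate
    min_lr: float = 1e-6                                                 # minimum learning rate
    warmup_steps: int = 400                                              # warm-up steps
    poly_lr_scheduler_power: float = 0.5                                 # polynomial decay power
    pretrain_ckpt_dir: str = ""                                          # pretrained checkpoint directory
    keep_ckpt_max: int = 10                                              # max checkpoint files to keep

config = FastTextConfig()
```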
Running on Ascend

To start training on a single device, run the shell script:

```bash
cd ./scripts
sh run_standalone_train.sh [DATASET_PATH]
```

To run distributed training of FastText on multiple devices, execute the following command in scripts/:

```bash
cd ./scripts
sh run_distributed_train.sh [DATASET_PATH] [RANK_TABLE_PATH]
```
Running on GPU

To start training on a single device, run the shell script:

```bash
cd ./scripts
sh run_standalone_train_gpu.sh [DATASET_PATH]
```

To run distributed training of FastText on multiple devices, execute the following command in scripts/:

```bash
cd ./scripts
sh run_distributed_train_gpu.sh [DATASET_PATH] [NUM_OF_DEVICES]
```
Running on Ascend

To run evaluation of FastText, use the command below:

```bash
cd ./scripts
sh run_eval.sh [DATASET_PATH] [DATASET_NAME] [MODEL_CKPT]
```

Note: DATASET_PATH is the path to the mindrecord files, e.g. /dataset_path/*.mindrecord
Running on GPU

To run evaluation of FastText, use the command below:

```bash
cd ./scripts
sh run_eval_gpu.sh [DATASET_PATH] [DATASET_NAME] [MODEL_CKPT]
```

Note: DATASET_PATH is the path to the mindrecord files, e.g. /dataset_path/*.mindrecord
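Before running evaluation it can be useful to sanity-check a converted MindRecord file. The snippet below is a hedged example using mindspore.dataset.MindDataset; the file path is a placeholder, and the column names depend on how the dataset was created.

```python
# Inspect a converted MindRecord file (placeholder path) before evaluation.
import mindspore.dataset as ds

data_set = ds.MindDataset("/dataset_path/ag_news.test.mindrecord")
print("number of samples:", data_set.get_dataset_size())
for item in data_set.create_dict_iterator(output_numpy=True):
    # Print the column names and shapes of the first record, then stop.
    print({name: value.shape for name, value in item.items()})
    break
```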
| Parameters | Ascend | GPU |
|---|---|---|
| Resource | Ascend 910; OS Euler2.8 | NV SMX3 V100-32G |
| Uploaded Date | 12/21/2020 (month/day/year) | 1/29/2021 (month/day/year) |
| MindSpore Version | 1.1.0 | 1.1.0 |
| Dataset | AG's News Topic Classification Dataset | AG's News Topic Classification Dataset |
| Training Parameters | epoch=5, batch_size=512 | epoch=5, batch_size=512 |
| Optimizer | Adam | Adam |
| Loss Function | Softmax Cross Entropy | Softmax Cross Entropy |
| outputs | probability | probability |
| Speed | 10ms/step (1pcs) | 11.91ms/step(1pcs) |
| Epoch Time | 2.36s (1pcs) | 2.815s(1pcs) |
| Loss | 0.0067 | 0.0085 |
| Params (M) | 22 | 22 |
| Checkpoint for inference | 254M (.ckpt file) | 254M (.ckpt file) |
| Scripts | fasttext | fasttext |
| Parameters | Ascend | GPU |
|---|---|---|
| Resource | Ascend 910; OS Euler2.8 | NV SMX3 V100-32G |
| Uploaded Date | 11/21/2020 (month/day/year) | 1/29/2020 (month/day/year) |
| MindSpore Version | 1.1.0 | 1.1.0 |
| Dataset | DBPedia Ontology Classification Dataset | DBPedia Ontology Classification Dataset |
| Training Parameters | epoch=5, batch_size=4096 | epoch=5, batch_size=4096 |
| Optimizer | Adam | Adam |
| Loss Function | Softmax Cross Entropy | Softmax Cross Entropy |
| outputs | probability | probability |
| Speed | 58ms/step (1pcs) | 34.82ms/step(1pcs) |
| Epoch Time | 8.15s (1pcs) | 4.87s(1pcs) |
| Loss | 2.6e-4 | 0.0004 |
| Params (M) | 106 | 106 |
| Checkpoint for inference | 1.2G (.ckpt file) | 1.2G (.ckpt file) |
| Scripts | fasttext | fasttext |
| Parameters | Ascend | GPU |
|---|---|---|
| Resource | Ascend 910; OS Euler2.8 | NV SMX3 V100-32G |
| Uploaded Date | 11/21/2020 (month/day/year) | 1/29/2020 (month/day/year) |
| MindSpore Version | 1.1.0 | 1.1.0 |
| Dataset | Yelp Review Polarity Dataset | Yelp Review Polarity Dataset |
| Training Parameters | epoch=5, batch_size=2048 | epoch=5, batch_size=2048 |
| Optimizer | Adam | Adam |
| Loss Function | Softmax Cross Entropy | Softmax Cross Entropy |
| outputs | probability | probability |
| Speed | 101ms/step (1pcs) | 30.54ms/step(1pcs) |
| Epoch Time | 28s (1pcs) | 8.46s(1pcs) |
| Loss | 0.062 | 0.002 |
| Params (M) | 103 | 103 |
| Checkpoint for inference | 1.2G (.ckpt file) | 1.2G (.ckpt file) |
| Scripts | fasttext | fasttext |
| Parameters | Ascend | GPU |
|---|---|---|
| Resource | Ascend 910; OS Euler2.8 | NV SMX3 V100-32G |
| Uploaded Date | 12/21/2020 (month/day/year) | 1/29/2020 (month/day/year) |
| MindSpore Version | 1.1.0 | 1.1.0 |
| Dataset | AG's News Topic Classification Dataset | AG's News Topic Classification Dataset |
| batch_size | 512 | 128 |
| Epoch Time | 2.36s | 2.815s(1pcs) |
| outputs | label index | label index |
| Accuracy | 92.53 | 92.58 |
| Model for inference | 254M (.ckpt file) | 254M (.ckpt file) |
| Parameters | Ascend | GPU |
|---|---|---|
| Resource | Ascend 910; OS Euler2.8 | NV SMX3 V100-32G |
| Uploaded Date | 12/21/2020 (month/day/year) | 1/29/2020 (month/day/year) |
| MindSpore Version | 1.1.0 | 1.1.0 |
| Dataset | DBPedia Ontology Classification Dataset | DBPedia Ontology Classification Dataset |
| batch_size | 4096 | 4096 |
| Epoch Time | 8.15s | 4.87s |
| outputs | label index | label index |
| Accuracy | 98.6 | 98.49 |
| Model for inference | 1.2G (.ckpt file) | 1.2G (.ckpt file) |
| Parameters | Ascend | GPU |
|---|---|---|
| Resource | Ascend 910; OS Euler2.8 | NV SMX3 V100-32G |
| Uploaded Date | 12/21/2020 (month/day/year) | 12/29/2020 (month/day/year) |
| MindSpore Version | 1.1.0 | 1.1.0 |
| Dataset | Yelp Review Polarity Dataset | Yelp Review Polarity Dataset |
| batch_size | 2048 | 2048 |
| Epoch Time | 28s | 8.46s |
| outputs | label index | label index |
| Accuracy | 95.7 | 95.7 |
| Model for inference | 1.2G (.ckpt file) | 1.2G (.ckpt file) |
There is only one source of randomness.
Seeds have already been set in train.py to avoid randomness in weight initialization.
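For reference, seeding typically involves calls like the following; this is only an illustration of the idea, not a copy of what train.py does, and the seed value is arbitrary.

```python
# Fix the Python, NumPy, and MindSpore global seeds so that weight
# initialization is reproducible across runs (illustrative seed value).
import random
import numpy as np
import mindspore

random.seed(1)
np.random.seed(1)
mindspore.set_seed(1)
```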
This model has been validated in the Ascend environment and has not been validated on CPU.
Please check the official homepage.