# It is still under development.

# Contents

- [Contents](#contents)
- [GPT Description](#gpt-description)
- [Model Architecture](#model-architecture)
- [Dataset](#dataset)
- [Environment Requirements](#environment-requirements)
- [Quick Start](#quick-start)
- [Script Description](#script-description)
    - [Script and Sample Code](#script-and-sample-code)
- [ModelZoo Homepage](#modelzoo-homepage)

# [GPT Description](#contents)

The GPT network was proposed by OpenAI and has three versions, i.e., GPT, GPT2 and GPT3. The newest version, GPT3, was proposed in May 2020 and is quite a large language model with 175 billion parameters. By stacking many Transformer decoder layers and feeding in a massive amount of training data, GPT3 becomes such a powerful language model that no fine-tuning process is needed. As the paper title says, language models are few-shot learners: GPT3 shows that with a large and well-trained model, we can achieve performance similar to that of fine-tuning methods.

[Paper](https://arxiv.org/abs/2005.14165): Tom B. Brown, Benjamin Mann, Nick Ryder et al. [Language Models are Few-Shot Learners](https://arxiv.org/abs/2005.14165). arXiv preprint arXiv:2005.14165.

# [Model Architecture](#contents)

GPT3 stacks many Transformer decoder layers. Depending on the number of layers and the embedding size, GPT3 comes in several sizes. The largest model contains 96 layers with an embedding size of 12288, resulting in a total of about 175 billion parameters.
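
As a rough sanity check (an approximation, not a figure taken from this repository or the paper's parameter table), the attention and feed-forward weight matrices of a decoder-only Transformer contribute roughly 12 * n_layers * d_model^2 parameters, which for 96 layers and an embedding size of 12288 already lands near the quoted 175 billion:

```bash
# approximate decoder weight count: 12 * n_layers * d_model^2
# (ignores embeddings, biases and layer norms)
echo $((12 * 96 * 12288 * 12288))   # prints 173946175488, on the order of 175 billion
```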

# [Dataset](#contents)

- OpenWebText is utilized as the training data and the training objective is to predict the next token at each position.
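
Concretely, this is the standard left-to-right language-modelling objective (a generic formulation, not taken from this repository's code): minimize the negative log-likelihood of each token given all preceding tokens,

$$\mathcal{L}(\theta) = -\sum_{t} \log p_\theta(x_t \mid x_{<t})$$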

# [Environment Requirements](#contents)

- Hardware (Ascend)
    - Prepare hardware environment with Ascend processor. If you want to try Ascend, please send the [application form](https://obs-9be7.obs.cn-east-2.myhuaweicloud.com/file/other/Ascend%20Model%20Zoo%E4%BD%93%E9%AA%8C%E8%B5%84%E6%BA%90%E7%94%B3%E8%AF%B7%E8%A1%A8.docx) to ascend@huawei.com. Once approved, you can get access to the resources.
- Framework
    - [MindSpore](https://gitee.com/mindspore/mindspore)
- For more information, please check the resources below:
    - [MindSpore Tutorials](https://www.mindspore.cn/tutorial/training/en/master/index.html)
    - [MindSpore Python API](https://www.mindspore.cn/doc/api_python/en/master/index.html)
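
Once MindSpore is installed, a quick way to confirm the package is importable (a simple sanity check, not an official verification step from this repository) is:

```bash
# print the installed MindSpore version
python -c "import mindspore; print(mindspore.__version__)"
```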

# [Quick Start](#contents)

After installing MindSpore via the official website, you can start training and evaluation as follows:

```bash
# run standalone training example
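# positional arguments below are assumed to be: DEVICE_ID EPOCH_SIZE DATASET_PATH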
bash scripts/run_standalone_train.sh 0 10 /path/dataset

# run distributed training example
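# positional arguments below are assumed to be: DATASET_PATH HCCL_CONFIG_PATH RANK_SIZE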
bash scripts/run_distribute_training.sh /path/dataset /path/hccl.json 8

# run evaluation example; currently only accuracy and perplexity on lambada and wikitext103 are supported
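# positional arguments below are assumed to be: TASK CKPT_PATH DATA_PATH METRIC (acc or ppl)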
bash scripts/run_evaluation.sh lambada /your/ckpt /your/data acc
```

For distributed training, an HCCL configuration file in JSON format needs to be created in advance.
Please follow the instructions in the link below:
<https://gitee.com/mindspore/mindspore/tree/master/model_zoo/utils/hccl_tools>
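
For example, the hccl_tools.py script from the page above is typically run on the host that holds the Ascend devices to generate the JSON file; the exact flag below is an assumption based on that tool, so check its own instructions first:

```bash
# generate an HCCL config describing devices 0-7 (assumed usage of hccl_tools.py)
python hccl_tools.py --device_num "[0,8)"
```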

# [Script Description](#contents)

## [Script and Sample Code](#contents)

```shell
.
└─gpt
  ├─README.md
  ├─scripts
  │ ├─run_standalone_train.sh  # shell script for standalone training on ascend
  │ ├─run_distribut_train.sh   # shell script for distributed training on ascend
  │ └─run_evaluation.sh        # shell script for evaluation on ascend
  ├─src
  │ ├─gpt_wrapper.py           # backbone code of network
  │ ├─gpt.py                   # backbone code of network
  │ ├─dataset.py               # data preprocessing
  │ ├─inference.py             # evaluation function
  │ └─utils.py                 # util function
  ├─train.py                   # train net for training phase
  └─eval.py                    # eval net for evaluation
```

# [ModelZoo Homepage](#contents)

Please check the official [homepage](https://gitee.com/mindspore/mindspore/tree/master/model_zoo).