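mindspore.dataset.text
======================

This module is used for text data augmentation and includes two submodules, `transforms` and `utils`.

`transforms` is a high-performance text data augmentation module that supports common text data augmentation operations.

`utils` provides utility methods for text processing.

The API examples commonly use the following module imports:
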
.. code-block::

    import mindspore.dataset as ds
    from mindspore.dataset import text

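As a rough sketch of how these two imports are typically used together, the helper below builds a small in-memory dataset and applies a tokenizer from this module to it. The function name ``tokenize_lines`` and the idea of wrapping the pipeline in a helper are illustrative assumptions, not part of the MindSpore API. It assumes MindSpore is installed on a platform where ``WhitespaceTokenizer`` is available. The imports sit inside the function only so that the snippet can be loaded without MindSpore present.

.. code-block:: python

    def tokenize_lines(lines):
        """Split each input string on whitespace using a dataset pipeline.

        `tokenize_lines` is a hypothetical helper written for this example.
        """
        import mindspore.dataset as ds
        from mindspore.dataset import text

        # Build a one-column dataset from the in-memory list of strings.
        dataset = ds.NumpySlicesDataset(lines, column_names=["text"], shuffle=False)

        # Apply the tokenizer to the "text" column of every row.
        dataset = dataset.map(operations=text.WhitespaceTokenizer(),
                              input_columns=["text"])

        # Collect the tokens of each row as a plain Python list.
        rows = dataset.create_dict_iterator(num_epochs=1, output_numpy=True)
        return [row["text"].tolist() for row in rows]

A call such as ``tokenize_lines(["Welcome to Beijing"])`` should return one list of tokens per input string. Depending on the MindSpore version, the tokens may come back as ``str`` or ``bytes``.
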
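Common data processing terms are described as follows:

- TensorOperation: the base class of all data processing operations implemented in C++.
- TextTensorOperation: the base class of all text data processing operations, derived from TensorOperation.
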
mindspore.dataset.text.transforms
---------------------------------

.. mscnnoteautosummary::
    :toctree: dataset_text
    :nosignatures:
    :template: classtemplate.rst

    mindspore.dataset.text.transforms.BasicTokenizer
    mindspore.dataset.text.transforms.BertTokenizer
    mindspore.dataset.text.transforms.CaseFold
    mindspore.dataset.text.transforms.JiebaTokenizer
    mindspore.dataset.text.transforms.Lookup
    mindspore.dataset.text.transforms.Ngram
    mindspore.dataset.text.transforms.NormalizeUTF8
    mindspore.dataset.text.transforms.PythonTokenizer
    mindspore.dataset.text.transforms.RegexReplace
    mindspore.dataset.text.transforms.RegexTokenizer
    mindspore.dataset.text.transforms.SentencePieceTokenizer
    mindspore.dataset.text.transforms.SlidingWindow
    mindspore.dataset.text.transforms.ToNumber
    mindspore.dataset.text.transforms.TruncateSequencePair
    mindspore.dataset.text.transforms.UnicodeCharTokenizer
    mindspore.dataset.text.transforms.UnicodeScriptTokenizer
    mindspore.dataset.text.transforms.WhitespaceTokenizer
    mindspore.dataset.text.transforms.WordpieceTokenizer

mindspore.dataset.text.utils
----------------------------

.. mscnnoteautosummary::
    :toctree: dataset_text
    :nosignatures:
    :template: classtemplate.rst

    mindspore.dataset.text.JiebaMode
    mindspore.dataset.text.NormalizeForm
    mindspore.dataset.text.SentencePieceModel
    mindspore.dataset.text.SentencePieceVocab
    mindspore.dataset.text.SPieceTokenizerLoadType
    mindspore.dataset.text.SPieceTokenizerOutType
    mindspore.dataset.text.to_str
    mindspore.dataset.text.to_bytes
    mindspore.dataset.text.Vocab
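
The ``Vocab`` utility and the ``Lookup`` transform listed above are commonly used together to turn tokens into integer ids. The following is a minimal sketch under stated assumptions: the helper name ``lookup_ids``, the word list, and the ``<unk>`` token are all made up for illustration, and MindSpore is assumed to be installed. The imports sit inside the function only so that the snippet can be loaded without MindSpore present.

.. code-block:: python

    def lookup_ids(tokens):
        """Map each token to its id in a small, hard-coded vocabulary.

        `lookup_ids` is a hypothetical helper written for this example.
        """
        import mindspore.dataset as ds
        from mindspore.dataset import text

        # Build a vocabulary from a word list, placing the special token first.
        vocab = text.Vocab.from_list(["hello", "world"],
                                     special_tokens=["<unk>"],
                                     special_first=True)

        # One token per row in a single "text" column.
        dataset = ds.NumpySlicesDataset(tokens, column_names=["text"], shuffle=False)

        # Tokens missing from the vocabulary fall back to the id of "<unk>".
        dataset = dataset.map(operations=text.Lookup(vocab, unknown_token="<unk>"),
                              input_columns=["text"])

        rows = dataset.create_dict_iterator(num_epochs=1, output_numpy=True)
        return [int(row["text"]) for row in rows]

Passing an ``unknown_token`` is a design choice made here so that out-of-vocabulary tokens map to a fallback id instead of causing the lookup to fail.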