mindspore.dataset.text
======================

This module supports text processing for NLP. It has two parts: ``transforms`` and ``utils``.
``transforms`` is a high-performance NLP text processing module built on ICU4C and cppjieba.
``utils`` provides general-purpose helper methods for NLP text processing.

The API examples for this module assume the following imports:

.. code-block::

    import mindspore.dataset as ds
    from mindspore.dataset import text

mindspore.dataset.text.transforms
---------------------------------

.. msnoteautosummary::
    :toctree: dataset_text
    :nosignatures:
    :template: classtemplate.rst

    mindspore.dataset.text.transforms.BasicTokenizer
    mindspore.dataset.text.transforms.BertTokenizer
    mindspore.dataset.text.transforms.CaseFold
    mindspore.dataset.text.transforms.JiebaTokenizer
    mindspore.dataset.text.transforms.Lookup
    mindspore.dataset.text.transforms.Ngram
    mindspore.dataset.text.transforms.NormalizeUTF8
    mindspore.dataset.text.transforms.PythonTokenizer
    mindspore.dataset.text.transforms.RegexReplace
    mindspore.dataset.text.transforms.RegexTokenizer
    mindspore.dataset.text.transforms.SentencePieceTokenizer
    mindspore.dataset.text.transforms.SlidingWindow
    mindspore.dataset.text.transforms.ToNumber
    mindspore.dataset.text.transforms.TruncateSequencePair
    mindspore.dataset.text.transforms.UnicodeCharTokenizer
    mindspore.dataset.text.transforms.UnicodeScriptTokenizer
    mindspore.dataset.text.transforms.WhitespaceTokenizer
    mindspore.dataset.text.transforms.WordpieceTokenizer

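Several of the class names above refer to common text-processing techniques. As a rough orientation, the sketch below shows in plain Python what whitespace tokenization, n-gram generation, and a sliding window generally mean. It does not call MindSpore and is not the library's implementation. The function names ``whitespace_tokenize``, ``ngrams``, and ``sliding_window`` are hypothetical and exist only for this illustration.

```python
from typing import List


def whitespace_tokenize(sentence: str) -> List[str]:
    """Split a sentence on runs of whitespace (hypothetical helper)."""
    return sentence.split()


def ngrams(tokens: List[str], n: int, separator: str = " ") -> List[str]:
    """Join every run of n consecutive tokens into a single string."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return [separator.join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def sliding_window(tokens: List[str], width: int) -> List[List[str]]:
    """Return every contiguous window of `width` tokens, advancing one token at a time."""
    if width < 1:
        raise ValueError("width must be a positive integer")
    return [tokens[i:i + width] for i in range(len(tokens) - width + 1)]


# Illustrative input only; the double space shows that runs of whitespace collapse.
tokens = whitespace_tokenize("Welcome to  Beijing today")
bigrams = ngrams(tokens, 2)
windows = sliding_window(tokens, 3)

print(tokens)
print(bigrams)
print(windows)
```

For the actual parameters and behavior of each MindSpore transform, refer to the API page of the corresponding class.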
mindspore.dataset.text.utils
----------------------------

.. msnoteautosummary::
    :toctree: dataset_text
    :nosignatures:
    :template: classtemplate.rst

    mindspore.dataset.text.JiebaMode
    mindspore.dataset.text.NormalizeForm
    mindspore.dataset.text.SentencePieceModel
    mindspore.dataset.text.SentencePieceVocab
    mindspore.dataset.text.SPieceTokenizerLoadType
    mindspore.dataset.text.SPieceTokenizerOutType
    mindspore.dataset.text.to_str
    mindspore.dataset.text.to_bytes
    mindspore.dataset.text.Vocab
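
Two recurring tasks behind utilities of this kind are mapping tokens to integer ids (the idea of a vocabulary) and converting between byte strings and text. The plain-Python sketch below illustrates both ideas under simple assumptions. It is not MindSpore's implementation, and ``build_vocab``, ``lookup_ids``, and the ``<unk>`` token are hypothetical names chosen for this example.

```python
from typing import Dict, List


def build_vocab(words: List[str], special_tokens: List[str]) -> Dict[str, int]:
    """Assign consecutive ids, special tokens first, skipping duplicates."""
    vocab: Dict[str, int] = {}
    for word in special_tokens + words:
        if word not in vocab:
            vocab[word] = len(vocab)
    return vocab


def lookup_ids(tokens: List[str], vocab: Dict[str, int],
               unknown_token: str = "<unk>") -> List[int]:
    """Map tokens to ids; out-of-vocabulary tokens get the unknown token's id.

    Assumes `unknown_token` is itself in the vocabulary (KeyError otherwise).
    """
    unknown_id = vocab[unknown_token]
    return [vocab.get(token, unknown_id) for token in tokens]


# Illustrative word list; "<unk>" is a hypothetical placeholder for unseen words.
vocab = build_vocab(["home", "behind", "the", "world"], special_tokens=["<unk>"])
ids = lookup_ids(["the", "world", "ahead"], vocab)

# Converting between UTF-8 encoded bytes and text, element by element.
raw = [b"\xe5\x8c\x97\xe4\xba\xac", b"text"]
decoded = [item.decode("utf-8") for item in raw]
encoded = [item.encode("utf-8") for item in decoded]

print(ids)
print(decoded)
```

Placing the special tokens first is one common convention because it keeps their ids stable when the word list changes. The real classes and functions listed above have their own signatures and options, documented on their API pages.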