
!8513 Add API info about Cases not supported by Windows

From: @shenwei41
Reviewed-by: @heleiwang
Signed-off-by:
tags/v1.1.0
mindspore-ci-bot Gitee 5 years ago
parent
commit
98531ebcda
1 changed file with 24 additions and 0 deletions
  mindspore/dataset/text/transforms.py (+24, -0)

mindspore/dataset/text/transforms.py
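Every hunk in this diff sits inside the module-level guard `if platform.system().lower() != 'windows':`, so on Windows the documented classes are never defined in `transforms.py`. A minimal caller-side sketch, assuming you prefer feature detection with `getattr` over platform sniffing; `icu_text_ops_available`, `get_op` and the `windows_like_text` stand-in are hypothetical names, not MindSpore APIs:

```python
import platform
import types


def icu_text_ops_available() -> bool:
    """Hypothetical helper mirroring the guard shown in the hunk headers."""
    return platform.system().lower() != 'windows'


def get_op(module, name):
    """Return module.<name>, or None if the op was never defined.

    getattr with a default avoids an AttributeError on builds where the
    guarded class does not exist.
    """
    return getattr(module, name, None)


# Hypothetical stand-in for the text module on Windows:
# the guarded tokenizer classes are simply absent.
windows_like_text = types.SimpleNamespace()

tokenizer_cls = get_op(windows_like_text, "WhitespaceTokenizer")
if tokenizer_cls is None:
    # Fall back to a pure-Python split when the ICU-backed op is missing.
    tokens = "Welcome to Beijing".split()
```

Checking for the attribute keeps the calling code correct even if the set of Windows-supported ops changes in a later release.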

@@ -417,6 +417,9 @@ if platform.system().lower() != 'windows':
"""
Tokenize a scalar tensor of UTF-8 string on ICU4C-defined whitespace characters, such as: ' ', '\\\\t', '\\\\r', '\\\\n'.

Note:
WhitespaceTokenizer is not yet supported on the Windows platform.

Args:
with_offsets (bool, optional): Whether or not to output offsets of tokens (default=False).
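The `with_offsets` flag asks for each token's position in addition to the token itself. A pure-Python sketch of that output shape, assuming Python's Unicode-aware whitespace class is an acceptable approximation of ICU4C's and reporting character indices; the real op's whitespace set and offset units may differ. `whitespace_tokenize` is a hypothetical name:

```python
import re


def whitespace_tokenize(text: str, with_offsets: bool = False):
    """Split text on runs of whitespace, optionally returning offsets.

    Approximation: Python's whitespace definition, not ICU4C's.
    Offsets are character indices; each limit is exclusive.
    """
    tokens, starts, limits = [], [], []
    for match in re.finditer(r"\S+", text):
        tokens.append(match.group())
        starts.append(match.start())
        limits.append(match.end())
    if with_offsets:
        return tokens, starts, limits
    return tokens


result = whitespace_tokenize("Welcome  to\tBeijing!", with_offsets=True)
```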

@@ -445,6 +448,9 @@ if platform.system().lower() != 'windows':
"""
Tokenize a scalar tensor of UTF-8 string on Unicode script boundaries.

Note:
UnicodeScriptTokenizer is not yet supported on the Windows platform.

Args:
keep_whitespace (bool, optional): Whether or not to emit whitespace tokens (default=False).
with_offsets (bool, optional): Whether or not to output offsets of tokens (default=False).
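Splitting on script boundaries means starting a new token whenever the writing system changes, with `keep_whitespace` deciding whether whitespace runs survive as tokens. The standard library has no script property, so this sketch uses a crude proxy, the first word of each character's Unicode name (`LATIN`, `CJK`, ...); ICU's real script data is more precise. `script_tokenize` is a hypothetical name:

```python
import unicodedata
from typing import List, Optional


def _script_key(ch: str) -> str:
    """Crude script proxy: first word of the Unicode character name."""
    if ch.isspace():
        return "WHITESPACE"
    name = unicodedata.name(ch, "")
    return name.split(" ")[0] if name else "UNKNOWN"


def script_tokenize(text: str, keep_whitespace: bool = False) -> List[str]:
    tokens: List[str] = []
    current = ""
    current_key: Optional[str] = None
    for ch in text:
        key = _script_key(ch)
        # A change of (proxy) script closes the current token.
        if current and key != current_key:
            tokens.append(current)
            current = ""
        current += ch
        current_key = key
    if current:
        tokens.append(current)
    if not keep_whitespace:
        tokens = [t for t in tokens if not t.isspace()]
    return tokens
```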
@@ -475,6 +481,9 @@ if platform.system().lower() != 'windows':
"""
Apply a case fold operation to a UTF-8 string tensor.

Note:
CaseFold is not yet supported on the Windows platform.

Examples:
>>> import mindspore.dataset.text as text
>>>
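Case folding differs from plain lowercasing: it maps case variants to a common form so that caseless comparison works. Python's built-in `str.casefold()` applies full Unicode case folding and serves as a stand-in here; the ICU-backed CaseFold op may differ in details. `case_fold` is a hypothetical name:

```python
def case_fold(text: str) -> str:
    """Stand-in for CaseFold using Python's full Unicode case folding."""
    return text.casefold()


# lower() leaves the German sharp s unchanged, while full case folding
# maps it to "ss", so both spellings fold to the same string.
lowered = "Straße".lower()
folded = case_fold("Straße")
```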
@@ -495,6 +504,9 @@ if platform.system().lower() != 'windows':
"""
Apply a Unicode normalization operation to a UTF-8 string tensor.

Note:
NormalizeUTF8 is not yet supported on the Windows platform.

Args:
normalize_form (NormalizeForm, optional): Valid values can be any of [NormalizeForm.NONE,
NormalizeForm.NFC, NormalizeForm.NFKC, NormalizeForm.NFD,
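The normalization forms listed above are the standard Unicode ones, which Python exposes through `unicodedata.normalize`. A sketch covering only the forms visible in this hunk, assuming `NONE` means "return the text unchanged"; the `_FORMS` mapping and `normalize_utf8` are hypothetical:

```python
import unicodedata

# Hypothetical mapping from the NormalizeForm member names shown above
# to unicodedata form names; NONE leaves the text untouched.
_FORMS = {"NONE": None, "NFC": "NFC", "NFKC": "NFKC", "NFD": "NFD"}


def normalize_utf8(text: str, normalize_form: str) -> str:
    form = _FORMS[normalize_form]
    return text if form is None else unicodedata.normalize(form, text)


# NFD splits "é" into "e" + combining acute accent; NFKC also
# replaces compatibility characters such as the "ﬁ" ligature.
decomposed = normalize_utf8("\u00e9", "NFD")
ligature = normalize_utf8("\ufb01", "NFKC")
```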
@@ -528,6 +540,9 @@ if platform.system().lower() != 'windows':

See http://userguide.icu-project.org/strings/regexp for the supported regex patterns.

Note:
RegexReplace is not yet supported on the Windows platform.

Args:
pattern (str): The regex expression pattern.
replace (str): The string to replace matched elements with.
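In plain terms, `pattern` selects substrings and `replace` is written in their place. A sketch using Python's `re.sub` as a stand-in; ICU regex syntax differs from Python's `re` in places (for instance, ICU replacement strings refer to groups as `$1` where Python uses `\1`), so patterns may need adjusting. `regex_replace` is a hypothetical name:

```python
import re


def regex_replace(text: str, pattern: str, replace: str) -> str:
    """Replace every match of pattern with replace (Python re syntax)."""
    return re.sub(pattern, replace, text)


collapsed = regex_replace("hello   world", r"\s+", " ")
masked = regex_replace("a1b22c", r"\d+", "#")
```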
@@ -556,6 +571,9 @@ if platform.system().lower() != 'windows':

See http://userguide.icu-project.org/strings/regexp for the supported regex patterns.

Note:
RegexTokenizer is not yet supported on the Windows platform.

Args:
delim_pattern (str): The pattern of regex delimiters.
The original string will be split by matched elements.
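Splitting by `delim_pattern` keeps the text between delimiter matches. A sketch with Python's `re` as a stand-in, assuming empty pieces (for example between two adjacent delimiters) are dropped; the MindSpore op's handling of such cases is not shown here. `regex_tokenize` is a hypothetical name:

```python
import re
from typing import List


def regex_tokenize(text: str, delim_pattern: str) -> List[str]:
    """Return the non-empty pieces of text between delimiter matches."""
    tokens: List[str] = []
    last = 0
    for match in re.finditer(delim_pattern, text):
        if match.start() > last:
            tokens.append(text[last:match.start()])
        last = match.end()
    if last < len(text):
        tokens.append(text[last:])
    return tokens
```

Walking the matches with `re.finditer` avoids a quirk of `re.split`, which also returns any capturing groups in the delimiter pattern.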
@@ -591,6 +609,9 @@ if platform.system().lower() != 'windows':
"""
Tokenize a scalar tensor of UTF-8 string by specific rules.

Note:
BasicTokenizer is not yet supported on the Windows platform.

Args:
lower_case (bool, optional): If True, apply the CaseFold, NormalizeUTF8 (NFD mode) and RegexReplace operations
on input text to fold the text to lower case and strip accents. If False, only apply
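The `lower_case=True` path chains three operations. A sketch with Python stand-ins, assuming that "strip accents" amounts to deleting combining marks (Unicode category `Mn`) after NFD decomposition, and ending with a plain whitespace split; the real op's regex and splitting rules may differ. `basic_tokenize_lower` is a hypothetical name:

```python
import unicodedata
from typing import List


def basic_tokenize_lower(text: str) -> List[str]:
    folded = text.casefold()                           # CaseFold
    decomposed = unicodedata.normalize("NFD", folded)  # NormalizeUTF8, NFD mode
    # Stand-in for the RegexReplace step: drop combining marks.
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return stripped.split()
```

NFD must run before the marks are removed: in composed form "é" is a single code point and there is no separate accent to delete.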
@@ -644,6 +665,9 @@ if platform.system().lower() != 'windows':
"""
Tokenizer used for BERT text processing.

Note:
BertTokenizer is not yet supported on the Windows platform.

Args:
vocab (Vocab): A vocabulary object.
suffix_indicator (str, optional): Prefix used to mark a subword that continues a word, that is, any piece after the first (default='##').
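The `suffix_indicator` comes from WordPiece-style subword splitting. The sketch below shows the standard greedy longest-match-first WordPiece algorithm on a single word. The vocabulary is a plain Python set rather than a MindSpore `Vocab`, and `wordpiece`, `unknown_token` and `VOCAB` are hypothetical names used only for illustration:

```python
from typing import List, Set

# Hypothetical toy vocabulary; continuation pieces carry the "##" prefix.
VOCAB: Set[str] = {"un", "##aff", "##able", "[UNK]"}


def wordpiece(word: str, vocab: Set[str], suffix_indicator: str = "##",
              unknown_token: str = "[UNK]") -> List[str]:
    """Greedy longest-match-first split of one word into vocab pieces."""
    pieces: List[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = suffix_indicator + piece  # mark a continuation
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            # No vocab entry covers this position: the whole word is unknown.
            return [unknown_token]
        pieces.append(match)
        start = end
    return pieces
```

Because matching is greedy, the longest vocabulary entry wins at each position, which is why the marker appears on every piece after the first rather than only on the last one.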

