
!8513 Add API info about Cases not supported by Windows

From: @shenwei41
Reviewed-by: @heleiwang
Signed-off-by:
tags/v1.1.0
mindspore-ci-bot Gitee 5 years ago
parent
commit
98531ebcda
1 changed files with 24 additions and 0 deletions
  1. +24 -0 mindspore/dataset/text/transforms.py

mindspore/dataset/text/transforms.py

@@ -417,6 +417,9 @@ if platform.system().lower() != 'windows':
"""
Tokenize a scalar tensor of UTF-8 string on ICU4C defined whitespaces, such as: ' ', '\\\\t', '\\\\r', '\\\\n'.

Note:
The WhitespaceTokenizer is not supported on windows platform yet.

Args:
with_offsets (bool, optional): If or not output offsets of tokens (default=False).


@@ -445,6 +448,9 @@ if platform.system().lower() != 'windows':
"""
Tokenize a scalar tensor of UTF-8 string on Unicode script boundaries.

Note:
The UnicodeScriptTokenizer is not supported on windows platform yet.

Args:
keep_whitespace (bool, optional): If or not emit whitespace tokens (default=False).
with_offsets (bool, optional): If or not output offsets of tokens (default=False).
@@ -475,6 +481,9 @@ if platform.system().lower() != 'windows':
"""
Apply case fold operation on utf-8 string tensor.

Note:
The CaseFold is not supported on windows platform yet.

Examples:
>>> import mindspore.dataset.text as text
>>>
@@ -495,6 +504,9 @@ if platform.system().lower() != 'windows':
"""
Apply normalize operation on utf-8 string tensor.

Note:
The NormalizeUTF8 is not supported on windows platform yet.

Args:
normalize_form (NormalizeForm, optional): Valid values can be any of [NormalizeForm.NONE,
NormalizeForm.NFC, NormalizeForm.NFKC, NormalizeForm.NFD,
@@ -528,6 +540,9 @@ if platform.system().lower() != 'windows':


See http://userguide.icu-project.org/strings/regexp for support regex pattern.

Note:
The RegexReplace is not supported on windows platform yet.

Args:
pattern (str): the regex expression patterns.
replace (str): the string to replace matched element.
@@ -556,6 +571,9 @@ if platform.system().lower() != 'windows':


See http://userguide.icu-project.org/strings/regexp for support regex pattern.

Note:
The RegexTokenizer is not supported on windows platform yet.

Args:
delim_pattern (str): The pattern of regex delimiters.
The original string will be split by matched elements.
@@ -591,6 +609,9 @@ if platform.system().lower() != 'windows':
"""
Tokenize a scalar tensor of UTF-8 string by specific rules.

Note:
The BasicTokenizer is not supported on windows platform yet.

Args:
lower_case (bool, optional): If True, apply CaseFold, NormalizeUTF8(NFD mode), RegexReplace operation
on input text to fold the text to lower case and strip accents characters. If False, only apply
@@ -644,6 +665,9 @@ if platform.system().lower() != 'windows':
"""
Tokenizer used for Bert text process.

Note:
The BertTokenizer is not supported on windows platform yet.

Args:
vocab (Vocab): A vocabulary object.
suffix_indicator (str, optional): Used to show that the subword is the last part of a word (default='##').

