77 Commits (a64920c46802a3030bfdf8754870ffb1e5c6ba78)

Author            SHA1        Date         Message
Zirui Wu          bde9f18f5a  5 years ago  update lookup api to take in a type
nhussain          3bac9d3713  5 years ago  switch input columns and operation
Cathy Wong        4d4c11b133  5 years ago  dataset API docstring cleanup: Standard product terms NumPy, Python
nhussain          eb9a611041  5 years ago  remove old defaults
ms_yan            501f549bc9  5 years ago  modify comment for api
guansongsong      0bf6ae913a  5 years ago  fix python api doc for mindspore.dataset
xulei2020         18b519ae0f  5 years ago  add sentence piece
YangLuo           4136892a3e  5 years ago  add SlidingWindow Op
mindspore-ci-bot  6284c42a76  5 years ago  !2941 MD tokenizer support output offsets
xiefangqi         47060631e5  5 years ago  add offsets feature to tokenizer
Zirui Wu          7b15e5a742  5 years ago  rework on lookup
nhussain          6c37ea3be0  5 years ago  fix validators
qianlong          94581f1c43  5 years ago  del JiebaMode and NormalizeForm from python api doc
YangLuo           36d1613f9a  5 years ago  !2464 [Dataset] code review & add citation
qianlong          d9f4549d13  5 years ago  add comment for dataset.text
mindspore-ci-bot  1ea38eb60c  5 years ago  !2375 Add Python Tokenizer
mindspore-ci-bot  886dfe6fd7  5 years ago  !2419 Rectification and modification of dataset api documentation comments
qianlong          cb01a99b08  5 years ago  fix dataset.text api doc
qianlong          980ddd32a2  5 years ago  change output of WordpieceTokenizer and BertTokenizer to 1-D string tensors
hesham            e981c67acd  5 years ago  Python Tokenizer
peilinwang        1e36b0649f  5 years ago  remove graphengine changes
Zirui Wu          b6e9504b31  5 years ago  phase I of Vocab rework
Zirui Wu          8f2674850b  5 years ago  address API doc style and content
hesham            b9495a9ccc  5 years ago  Truncate Pair
qianlong          4f16f036be  6 years ago  Add WhitespaceTokenizer and UnicodeScriptTokenizer for nlp
Zirui Wu          dbf9936ec4  6 years ago  Implemented n-gram for dataset TensorOp
hesham            6c21e556c4  6 years ago  Clean up work for text python package