@@ -4421,23 +4421,7 @@ class CelebADataset(MappableDataset):
     The generated dataset has two columns ['image', 'attr'].
     The type of the image tensor is uint8. The attr tensor is uint32 and one hot type.
 
-    Args:
-        dataset_dir (str): Path to the root directory that contains the dataset.
-        num_parallel_workers (int, optional): Number of workers to read the data (default=value set in the config).
-        shuffle (bool, optional): Whether to perform shuffle on the dataset (default=None).
-        dataset_type (str): one of 'all', 'train', 'valid' or 'test'.
-        sampler (Sampler, optional): Object used to choose samples from the dataset (default=None).
-        decode (bool, optional): decode the images after reading (default=False).
-        extensions (list[str], optional): List of file extensions to be
-            included in the dataset (default=None).
-        num_samples (int, optional): The number of images to be included in the dataset.
-            (default=None, all images).
-        num_shards (int, optional): Number of shards that the dataset should be divided
-            into (default=None).
-        shard_id (int, optional): The shard ID within num_shards (default=None). This
-            argument should be specified only when num_shards is also specified.
-
     Citation of CelebA dataset.
 
     .. code-block::
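As a reference point for the two output columns described above, here is a minimal consumption sketch. It assumes the dataset exposes the create_dict_iterator() common to this API, and the dataset path is a placeholder:

.. code-block::

    import mindspore.dataset as ds

    # Placeholder path to a CelebA root directory.
    data = ds.CelebADataset("/path/to/celeba", decode=True)

    # Each row holds an 'image' tensor (uint8) and an 'attr' tensor
    # (uint32, one-hot over the 40 binary attribute annotations).
    for row in data.create_dict_iterator():
        image, attr = row["image"], row["attr"]
        break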
@@ -4455,9 +4439,9 @@ class CelebADataset(MappableDataset):
         bibsource = {dblp computer science bibliography, https://dblp.org},
         howpublished = {http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html},
         description = {CelebFaces Attributes Dataset (CelebA) is a large-scale face attributes dataset
-        with more than 200K celebrity images, each with 40 attribute annotations. The
-        images in this dataset cover large pose variations and background clutter. CelebA
-        has large diversities, large quantities, and rich annotations, including
+        with more than 200K celebrity images, each with 40 attribute annotations.
+        The images in this dataset cover large pose variations and background clutter.
+        CelebA has large diversities, large quantities, and rich annotations, including
         * 10,177 number of identities,
         * 202,599 number of face images, and
         * 5 landmark locations, 40 binary attributes annotations per image.
@@ -4465,6 +4449,22 @@ class CelebADataset(MappableDataset):
         vision tasks: face attribute recognition, face detection, landmark (or facial part)
         localization, and face editing & synthesis.}
         }
+
+    Args:
+        dataset_dir (str): Path to the root directory that contains the dataset.
+        num_parallel_workers (int, optional): Number of workers to read the data (default=value set in the config).
+        shuffle (bool, optional): Whether to perform shuffle on the dataset (default=None).
+        dataset_type (str): one of 'all', 'train', 'valid' or 'test'.
+        sampler (Sampler, optional): Object used to choose samples from the dataset (default=None).
+        decode (bool, optional): decode the images after reading (default=False).
+        extensions (list[str], optional): List of file extensions to be
+            included in the dataset (default=None).
+        num_samples (int, optional): The number of images to be included in the dataset.
+            (default=None, all images).
+        num_shards (int, optional): Number of shards that the dataset should be divided
+            into (default=None).
+        shard_id (int, optional): The shard ID within num_shards (default=None). This
+            argument should be specified only when num_shards is also specified.
     """
 
     @check_celebadataset
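To make the relocated Args block concrete, a hedged construction sketch using only the parameters it documents (the path and shard values are hypothetical, and the constructor is assumed to accept them exactly as the docstring states):

.. code-block::

    import mindspore.dataset as ds

    # Read only the 'train' split with 4 reader workers and decode images
    # while reading. num_shards/shard_id split the data across 2 workers;
    # per the docstring, shard_id is only valid when num_shards is given.
    data = ds.CelebADataset("/path/to/celeba",
                            dataset_type="train",
                            decode=True,
                            num_parallel_workers=4,
                            num_shards=2,
                            shard_id=0)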
@@ -4542,6 +4542,24 @@ class CLUEDataset(SourceDataset):
     models, corpus and leaderboard. Here we bring in classification task of CLUE, which are AFQMC, TNEWS, IFLYTEK,
     CMNLI, WSC and CSL.
 
+    Citation of CLUE dataset.
+
+    .. code-block::
+
+        @article{CLUEbenchmark,
+        title = {CLUE: A Chinese Language Understanding Evaluation Benchmark},
+        author = {Liang Xu, Xuanwei Zhang, Lu Li, Hai Hu, Chenjie Cao, Weitang Liu, Junyi Li, Yudong Li,
+        Kai Sun, Yechen Xu, Yiming Cui, Cong Yu, Qianqian Dong, Yin Tian, Dian Yu, Bo Shi, Jun Zeng,
+        Rongzhao Wang, Weijian Xie, Yanting Li, Yina Patterson, Zuoyu Tian, Yiwen Zhang, He Zhou,
+        Shaoweihua Liu, Qipeng Zhao, Cong Yue, Xinrui Zhang, Zhengliang Yang, Zhenzhong Lan},
+        journal = {arXiv preprint arXiv:2004.05986},
+        year = {2020},
+        howpublished = {https://github.com/CLUEbenchmark/CLUE},
+        description = {CLUE, a Chinese Language Understanding Evaluation benchmark. It contains eight different
+        tasks, including single-sentence classification, sentence pair classification, and machine
+        reading comprehension.}
+        }
+
     Args:
         dataset_files (str or list[str]): String or list of files to be read or glob strings to search for a pattern of
             files. The list will be sorted in a lexicographical order.
@@ -4564,24 +4582,6 @@ class CLUEDataset(SourceDataset):
         shard_id (int, optional): The shard ID within num_shards (default=None). This
             argument should be specified only when num_shards is also specified.
 
-    Citation of CLUE dataset.
-
-    .. code-block::
-
-        @article{CLUEbenchmark,
-        title = {CLUE: A Chinese Language Understanding Evaluation Benchmark},
-        author = {Liang Xu, Xuanwei Zhang, Lu Li, Hai Hu, Chenjie Cao, Weitang Liu, Junyi Li, Yudong Li,
-        Kai Sun, Yechen Xu, Yiming Cui, Cong Yu, Qianqian Dong, Yin Tian, Dian Yu, Bo Shi, Jun Zeng,
-        Rongzhao Wang, Weijian Xie, Yanting Li, Yina Patterson, Zuoyu Tian, Yiwen Zhang, He Zhou,
-        Shaoweihua Liu, Qipeng Zhao, Cong Yue, Xinrui Zhang, Zhengliang Yang, Zhenzhong Lan},
-        journal = {arXiv preprint arXiv:2004.05986},
-        year = {2020},
-        howpublished = {https://github.com/CLUEbenchmark/CLUE},
-        description = {CLUE, a Chinese Language Understanding Evaluation benchmark. It contains eight different
-        tasks, including single-sentence classification, sentence pair classification, and machine
-        reading comprehension.}
-        }
-
     Examples:
         >>> import mindspore.dataset as ds
         >>> dataset_files = ["/path/to/1", "/path/to/2"]  # contains 1 or multiple text files
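And a matching sketch of sharded reading with CLUEDataset, again restricted to the parameters quoted in the docstring above (the file names are placeholders, and the constructor is assumed to behave as documented):

.. code-block::

    import mindspore.dataset as ds

    # Placeholder text files; the list is sorted lexicographically
    # per the docstring before reading.
    dataset_files = ["/path/to/1", "/path/to/2"]

    # Divide the samples into 4 shards and read shard 0 on this worker;
    # shard_id must be paired with num_shards.
    data = ds.CLUEDataset(dataset_files, num_shards=4, shard_id=0)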