|
|
|
@@ -560,9 +560,9 @@ class Dataset: |
|
|
|
|
|
|
|
Note: |
|
|
|
1. If count is greater than the number of elements in the dataset, or is equal to -1,



all the elements in the dataset will be taken.
|
|
|
2. The order of take and batch matters. If take is applied before batch,



then the given number of rows is taken; otherwise, the given number of batches.
|
|
|
|
|
|
|
Args: |
|
|
|
count (int, optional): Number of elements to be taken from the dataset (default=-1). |
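The ordering note above can be illustrated with a plain-Python sketch (list operations standing in for the dataset pipeline; not the actual dataset API):

```python
# Illustrative sketch: take-before-batch limits rows,
# batch-before-take limits whole batches.
def take(rows, count):
    # count == -1 means "take everything"
    return rows if count == -1 else rows[:count]

def batch(rows, size):
    # group rows into consecutive batches of the given size
    return [rows[i:i + size] for i in range(0, len(rows), size)]

rows = list(range(10))

# take(4) then batch(2): 4 rows -> 2 batches
a = batch(take(rows, 4), 2)   # [[0, 1], [2, 3]]

# batch(2) then take(4): 5 batches -> the first 4 batches (8 rows)
b = take(batch(rows, 2), 4)   # [[0, 1], [2, 3], [4, 5], [6, 7]]
```

The same count therefore yields 4 rows in the first ordering but 8 rows in the second.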
|
|
|
@@ -590,7 +590,7 @@ class Dataset: |
|
|
|
# split requires a known dataset size
|
|
|
dataset_size = self.get_dataset_size() |
|
|
|
|
|
|
|
if dataset_size is None or dataset_size <= 0:
|
|
|
raise RuntimeError("dataset size unknown, unable to split.") |
|
|
|
|
|
|
|
all_int = all(isinstance(item, int) for item in sizes) |
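The size check in this hunk can be sketched as a standalone helper. This is a hypothetical reconstruction (the helper name and the float-fraction branch are assumptions, not the library's actual code): sizes are either all ints summing to the dataset size, or fractions summing to 1.

```python
def check_split_sizes(sizes, dataset_size):
    # Hypothetical sketch of the validation around this hunk.
    if dataset_size is None or dataset_size <= 0:
        raise RuntimeError("dataset size unknown, unable to split.")
    all_int = all(isinstance(item, int) for item in sizes)
    if all_int:
        if sum(sizes) != dataset_size:
            raise ValueError("sizes do not sum to dataset size.")
        return list(sizes)
    # otherwise treat entries as fractions of the dataset (assumed behavior)
    if abs(sum(sizes) - 1.0) > 1e-6:
        raise ValueError("float sizes must sum to 1.")
    return [int(round(f * dataset_size)) for f in sizes]
```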
|
|
|
@@ -640,8 +640,8 @@ class Dataset: |
|
|
|
Note: |
|
|
|
1. Dataset cannot be sharded if split is going to be called. |
|
|
|
2. It is strongly recommended not to shuffle the dataset; use randomize=True instead.
|
|
|
Shuffling the dataset may not be deterministic, which means the data in each split |
|
|
|
will be different in each epoch.
|
|
|
|
|
|
|
Raises: |
|
|
|
RuntimeError: If get_dataset_size returns None or is not supported for this dataset. |
|
|
|
@@ -1173,6 +1173,7 @@ class SourceDataset(Dataset): |
|
|
|
def is_sharded(self): |
|
|
|
raise NotImplementedError("SourceDataset must implement is_sharded.") |
|
|
|
|
|
|
|
|
|
|
|
class MappableDataset(SourceDataset): |
|
|
|
""" |
|
|
|
Abstract class to represent a source dataset that supports the use of samplers.
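The class hierarchy in this hunk can be sketched minimally. The `num_shards` attribute and the `is_sharded` body below are illustrative assumptions, not the real implementation; only the `NotImplementedError` contract comes from the hunk above:

```python
class SourceDataset:
    # Subclasses must say whether their source is sharded.
    def is_sharded(self):
        raise NotImplementedError("SourceDataset must implement is_sharded.")

class MappableDataset(SourceDataset):
    """A source dataset that supports samplers (sketch only)."""
    def __init__(self, sampler=None):
        self.sampler = sampler

    def is_sharded(self):
        # Assumed behavior: sharded when a sampler with more than one shard is set.
        return self.sampler is not None and getattr(self.sampler, "num_shards", 1) > 1
```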
|
|
|
@@ -1253,13 +1254,13 @@ class MappableDataset(SourceDataset): |
|
|
|
|
|
|
|
Note: |
|
|
|
1. Dataset should not be sharded if split is going to be called. Instead, create a |
|
|
|
DistributedSampler and specify a split to shard after splitting. If dataset is |
|
|
|
sharded after a split, it is strongly recommended to set the same seed in each instance |
|
|
|
of execution, otherwise each shard may not be part of the same split (see Examples)
|
|
|
2. It is strongly recommended not to shuffle the dataset; use randomize=True instead.
|
|
|
Shuffling the dataset may not be deterministic, which means the data in each split |
|
|
|
will be different in each epoch. Furthermore, if sharding occurs after split, each |
|
|
|
shard may not be part of the same split.
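The seed caveat above can be demonstrated with plain Python (standing in for the dataset's randomized split; the function and split shape are illustrative assumptions):

```python
import random

def split_indices(n, seed):
    # Deterministically shuffle indices for a given seed, then cut in half.
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return idx[: n // 2], idx[n // 2:]  # two halves as "train" / "valid"

# With the same seed in every instance, every process derives identical
# splits, so a shard taken afterwards belongs to the same split everywhere.
train_a, _ = split_indices(10, seed=42)
train_b, _ = split_indices(10, seed=42)
assert train_a == train_b
```

With different seeds the shuffles diverge, and a shard computed in one process may draw rows that another process assigned to a different split.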
|
|
|
|
|
|
|
Raises: |
|
|
|
RuntimeError: If get_dataset_size returns None or is not supported for this dataset. |
|
|
|
|