@@ -613,7 +613,7 @@ class Dataset:
         # if we still need more rows, give them to the first split.
         # if we have too many rows, remove the extras from the first split that has
         # enough rows.
-        size_difference = dataset_size - absolute_sizes_sum
+        size_difference = int(dataset_size - absolute_sizes_sum)
         if size_difference > 0:
             absolute_sizes[0] += size_difference
         else:
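The hunk stops at `else:`, but the comments above it describe both branches. The sketch below is a minimal, self-contained reconstruction of the rounding-and-adjustment rules stated in the `split` docstring, assuming fractional `sizes` and Python's built-in `round`. `absolute_split_sizes` is a hypothetical helper, not MindSpore's method. The body of the `else:` branch is inferred from the comments rather than copied from the source.

```python
from typing import List, Sequence


def absolute_split_sizes(fractions: Sequence[float], dataset_size: int) -> List[int]:
    """Hypothetical helper: turn fractional split sizes into row counts.

    Follows the rules described in the split docstring. This is an
    illustration, not MindSpore's implementation.
    """
    # round(f * K) for every fraction; Python's round() rounds halves to even.
    sizes = [round(f * dataset_size) for f in fractions]
    if any(s == 0 for s in sizes):
        raise ValueError(f"a split rounded to 0 rows: {sizes}")

    # int() keeps the difference a plain integer row count.
    size_difference = int(dataset_size - sum(sizes))
    if size_difference > 0:
        # Too few rows assigned: give the remainder to the first split.
        sizes[0] += size_difference
    else:
        # Too many (or exactly enough) rows: take the surplus from the first
        # split that still keeps at least 1 row afterwards.
        for i, size in enumerate(sizes):
            if size + size_difference > 0:
                sizes[i] += size_difference
                break
    return sizes
```

Consider a call with `[0.5, 0.5]` and 5 rows. Both products are 2.5, and Python rounds each down to 2. The leftover row therefore goes to the first split, giving `[3, 2]`. With `[0.2, 0.4, 0.4]` and 4 rows, the rounded sizes `[1, 2, 2]` overshoot by one. The first split cannot give up its only row, so the surplus is removed from the second split, giving `[1, 1, 2]`.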
@@ -647,10 +647,14 @@ class Dataset:
 Datasets of size round(f1*K), round(f2*K), …, round(fn*K) where K is the size of the
 original dataset.
 If after rounding:
-Any size equals 0, an error will occur.
-The sum of split sizes < K, the difference will be added to the first split.
-The sum of split sizes > K, the difference will be removed from the first large
-enough split such that it will have atleast 1 row after removing the difference.
+
+- Any size equals 0, an error will occur.
+
+- The sum of split sizes < K, the difference will be added to the first split.
+
+- The sum of split sizes > K, the difference will be removed from the first large
+enough split such that it will have at least 1 row after removing the difference.
+
 randomize (bool, optional): determines whether or not to split the data randomly (default=True).
 If true, the data will be randomly split. Otherwise, each split will be created with
 consecutive rows from the dataset.
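The `randomize` flag changes which rows land in each split, not how many. One simple way to realize the two behaviours is to hand out row indices after a seeded shuffle or in their original order. This assumes the absolute sizes have already been computed and sum to the number of rows. `split_indices` and its `seed` parameter are hypothetical, and MindSpore does not necessarily implement the flag this way.

```python
import random
from typing import List, Optional, Sequence


def split_indices(sizes: Sequence[int], randomize: bool = True,
                  seed: Optional[int] = None) -> List[List[int]]:
    """Hypothetical helper: assign row indices to splits of the given sizes.

    Assumes sum(sizes) equals the number of rows in the dataset.
    """
    order = list(range(sum(sizes)))
    if randomize:
        # Random split: shuffle the row order before cutting it into pieces.
        random.Random(seed).shuffle(order)
    # Either way, each split takes the next `size` entries of `order`, so the
    # splits never overlap and together cover every row exactly once.
    splits, start = [], 0
    for size in sizes:
        splits.append(order[start:start + size])
        start += size
    return splits
```

With `randomize=False`, `split_indices([3, 2], randomize=False)` yields the consecutive ranges `[[0, 1, 2], [3, 4]]`. With `randomize=True`, the split sizes stay the same but the membership depends on the shuffle.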
@@ -1282,10 +1286,14 @@ class MappableDataset(SourceDataset):
 Datasets of size round(f1*K), round(f2*K), …, round(fn*K) where K is the size of the
 original dataset.
 If after rounding:
-Any size equals 0, an error will occur.
-The sum of split sizes < K, the difference will be added to the first split.
-The sum of split sizes > K, the difference will be removed from the first large
-enough split such that it will have atleast 1 row after removing the difference.
+
+- Any size equals 0, an error will occur.
+
+- The sum of split sizes < K, the difference will be added to the first split.
+
+- The sum of split sizes > K, the difference will be removed from the first large
+enough split such that it will have at least 1 row after removing the difference.
+
 randomize (bool, optional): determines whether or not to split the data randomly (default=True).
 If true, the data will be randomly split. Otherwise, each split will be created with
 consecutive rows from the dataset.
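The same docstring is repeated on `MappableDataset`, which, as the name suggests, supports index-based access to its rows. For random-access data like this, each split can be kept as a list of row indices into the parent rather than a copy of the rows. The sketch below uses a plain Python `Sequence` to stand in for the dataset and covers only the consecutive (`randomize=False`) case. `IndexSubset` and `split_mappable` are hypothetical names and are not MindSpore APIs.

```python
from typing import Generic, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


class IndexSubset(Generic[T]):
    """Hypothetical read-only view over selected rows of a random-access dataset.

    Only the row indices are stored; rows are fetched from the parent on
    access, so creating a split does not copy any data.
    """

    def __init__(self, parent: Sequence[T], indices: Sequence[int]) -> None:
        self._parent = parent
        self._indices = list(indices)

    def __len__(self) -> int:
        return len(self._indices)

    def __getitem__(self, i: int) -> T:
        # Translate the split-local index into a row index of the parent.
        return self._parent[self._indices[i]]

    def __iter__(self) -> Iterator[T]:
        return (self._parent[j] for j in self._indices)


def split_mappable(parent: Sequence[T], sizes: Sequence[int]) -> List[IndexSubset[T]]:
    """Consecutive split of `parent` into views with the given absolute sizes."""
    if sum(sizes) != len(parent):
        raise ValueError("split sizes must sum to the dataset size")
    views: List[IndexSubset[T]] = []
    start = 0
    for size in sizes:
        views.append(IndexSubset(parent, range(start, start + size)))
        start += size
    return views
```

For example, splitting `["a", "b", "c", "d", "e"]` with sizes `[3, 2]` produces views that iterate as `["a", "b", "c"]` and `["d", "e"]`. Passing shuffled index lists instead of ranges would give the randomized variant.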