| @@ -613,7 +613,7 @@ class Dataset: | |||
| # if we still need more rows, give them to the first split. | |||
| # if we have too many rows, remove the extras from the first split that has | |||
| # enough rows. | |||
| size_difference = dataset_size - absolute_sizes_sum | |||
| size_difference = int(dataset_size - absolute_sizes_sum) | |||
| if size_difference > 0: | |||
| absolute_sizes[0] += size_difference | |||
| else: | |||
| @@ -647,10 +647,14 @@ class Dataset: | |||
| Datasets of size round(f1*K), round(f2*K), …, round(fn*K) where K is the size of the | |||
| original dataset. | |||
| If after rounding: | |||
| -Any size equals 0, an error will occur. | |||
| -The sum of split sizes < K, the difference will be added to the first split. | |||
| -The sum of split sizes > K, the difference will be removed from the first large | |||
| enough split such that it will have atleast 1 row after removing the difference. | |||
| - Any size equals 0, an error will occur. | |||
| - The sum of split sizes < K, the difference will be added to the first split. | |||
| - The sum of split sizes > K, the difference will be removed from the first large | |||
| enough split such that it will have at least 1 row after removing the difference. | |||
| randomize (bool, optional): determines whether or not to split the data randomly (default=True). | |||
| If true, the data will be randomly split. Otherwise, each split will be created with | |||
| consecutive rows from the dataset. | |||
| @@ -1282,10 +1286,14 @@ class MappableDataset(SourceDataset): | |||
| Datasets of size round(f1*K), round(f2*K), …, round(fn*K) where K is the size of the | |||
| original dataset. | |||
| If after rounding: | |||
| -Any size equals 0, an error will occur. | |||
| -The sum of split sizes < K, the difference will be added to the first split. | |||
| -The sum of split sizes > K, the difference will be removed from the first large | |||
| enough split such that it will have atleast 1 row after removing the difference. | |||
| - Any size equals 0, an error will occur. | |||
| - The sum of split sizes < K, the difference will be added to the first split. | |||
| - The sum of split sizes > K, the difference will be removed from the first large | |||
| enough split such that it will have at least 1 row after removing the difference. | |||
| randomize (bool, optional): determines whether or not to split the data randomly (default=True). | |||
| If true, the data will be randomly split. Otherwise, each split will be created with | |||
| consecutive rows from the dataset. | |||
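
For reviewers, the rounding rules described in the docstrings above can be sketched as a small standalone function. This is only an illustration under stated assumptions: `compute_split_sizes` is a hypothetical name, the choice of `RuntimeError` is assumed, and the code is not the library's actual implementation. It uses Python's built-in `round`, which rounds halves to the nearest even integer.

```python
def compute_split_sizes(fractions, dataset_size):
    """Hypothetical helper: turn split fractions into absolute row counts."""
    # round(f1*K), round(f2*K), ..., round(fn*K), where K is the dataset size.
    sizes = [round(f * dataset_size) for f in fractions]

    # Rule 1: a split that rounds to 0 rows is an error.
    if any(s == 0 for s in sizes):
        raise RuntimeError("a split size rounded to 0 rows")

    # Cast to int, mirroring the int(...) cast added in the first hunk.
    size_difference = int(dataset_size - sum(sizes))

    if size_difference > 0:
        # Rule 2: too few rows assigned, so give the remainder to the first split.
        sizes[0] += size_difference
    elif size_difference < 0:
        # Rule 3: too many rows assigned, so trim the first split that still
        # keeps at least 1 row after the excess is removed.
        excess = -size_difference
        for i, s in enumerate(sizes):
            if s - excess >= 1:
                sizes[i] -= excess
                break
        else:
            raise RuntimeError("no split is large enough to absorb the excess rows")
    return sizes


if __name__ == "__main__":
    print(compute_split_sizes([0.5, 0.5], 5))
    print(compute_split_sizes([0.25, 0.375, 0.375], 4))
```

The fractions in the usage lines are exact in binary floating point, so the results do not depend on floating-point noise. In the second call the first split has only 1 row, so the extra row is taken from the second split instead.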