Renamed all_labels metadata field to all_distinct_values, which can contain all values (labels).
… ContinueFitMixin in a primitive which might require knowledge …
… get_params/set_params to retain and reuse only primitive's parameters.
Added MONTHS to column's time_granularity metadata.
Added https://metadata.datadrivendiscovery.org/types/Rank semantic type
and rank_for metadata field. PerformanceMetric classes now have a
requires_rank method.
Added NESTED task keyword.
Added file_columns_count metadata field and updated file_columns metadata field:
renamed name to column_name and added column_index sub-fields to file_columns metadata.
Added --logging-level argument to configure which logging …
Added PerformanceMetric.register_metric class method.
Added register_value class method. This allows one to add values …
(python3 -m d3m index validate).
… targets.csv file does …
… fit and produce …
… fit-score CLI command when …
time_granularity metadata is saved when saving a D3M dataset.
Removed type annotations from docstrings. Python type annotations are now
used instead when rendering documentation.
#239
!371
Renamed blacklist in d3m.index.load_all and primitives_blacklist in d3m.metadata.pipeline.Resolver
to blocklist and primitives_blocklist, respectively.
Backwards incompatible.
Removed https://metadata.datadrivendiscovery.org/types/GPUResourcesUseParameter
semantic type. Added can_use_gpus primitive metadata field to signal that
the primitive can use GPUs if available.
#448
!369
Backwards incompatible.
Clarified that hyper-parameters using https://metadata.datadrivendiscovery.org/types/CPUResourcesUseParameter
should have 1 as default value.
!369
Clarified that it is not necessary to call fit before calling
continue_fit.
index CLI command has been renamed to primitive CLI command.
#437
!363
numpy.matrix has been removed as an allowed container type, as it
was deprecated by NumPy.
#230
!362
Backwards incompatible.
CLI now has a --version command which returns the version of the d3m
core package itself.
#378
!359
Upgraded schemas to JSON Schema draft 7, and upgraded Python jsonschema
dependency to version 3.
#392
!342
Added a Primitive Good Citizen Checklist to documentation, documenting
some best practices when writing a primitive.
#127
!347
!355
Updated upper bounds of core dependencies to latest available versions.
!337
Added to algorithm_types:
SAMPLE_SELECTION
SAMPLE_MERGING
MOMENTUM_CONTRAST
CAUSAL_ANALYSIS
… .gz for decompression to happen …
… validate CLI commands to work on YAML files.
… tests/data git submodule.
When saving a Dataset object to D3M dataset format, metadata is now preserved.
Primitive family REMOTE_SENSING has been added.
!310
Added support for version 4.0.0 of D3M dataset schema:
NODE and EDGE references (used in graph datasets), NODE_ATTRIBUTE and EDGE_ATTRIBUTE.
time_granularity can now be present on a column.
forecasting_horizon can now be present in a problem description.
task_type and task_subtype have been merged into task_keywords.
TaskType and TaskSubtype were replaced with TaskKeyword.
Added OpenML dataset loader. You can now pass a URL to an OpenML dataset
and it will be downloaded and converted to a Dataset-compatible object,
including many of the available meta-features. Combined with support
for saving datasets, this allows easy conversion between OpenML
datasets and D3M datasets, e.g., python3 -m d3m dataset convert -i https://www.openml.org/d/61 -o out/datasetDoc.json.
#252
!309
When saving and loading D3M datasets, metadata is now preserved.
#227
!265
Metadata can now be converted to a JSON compatible structure in a
reversible manner.
#373
!308
Pipeline run now records if a pipeline was run as a standard pipeline
under run.is_standard_pipeline field.
#396
!249
"meta" files have been replaced with support for rerunning pipeline runs.
Instead of configuring a "meta" file describing how to run a pipeline,
simply provide an example pipeline run which demonstrates how the
pipeline was run. Runtime does not have the --meta argument anymore,
but now has an --input-run argument instead.
#202
!249
Backwards incompatible.
Changed LossFunctionMixin to support multiple loss functions.
#386
!305
Backwards incompatible.
Pipeline equality and hashing functions now have only_control_hyperparams
argument which can be set to use only control hyper-parameters when doing
comparisons.
!289
Pipelines and other YAML files are now recognized with both .yml and
.yaml file extensions.
#375
!302
F1Metric, F1MicroMetric, and F1MacroMetric can now operate on
multiple target columns and average scores for all of them.
#400
!298
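The per-column averaging described in this entry can be sketched in plain Python (a simplified illustration, not the d3m implementation; f1_binary and f1_multi_column are hypothetical helper names):

```python
# Hedged sketch (not the d3m code): compute F1 per target column, then
# average the per-column scores, as the entry above describes.

def f1_binary(y_true, y_pred, pos_label=1):
    tp = sum(t == pos_label and p == pos_label for t, p in zip(y_true, y_pred))
    fp = sum(t != pos_label and p == pos_label for t, p in zip(y_true, y_pred))
    fn = sum(t == pos_label and p != pos_label for t, p in zip(y_true, y_pred))
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0

def f1_multi_column(true_columns, pred_columns, pos_label=1):
    # Average the per-column F1 scores.
    scores = [f1_binary(t, p, pos_label) for t, p in zip(true_columns, pred_columns)]
    return sum(scores) / len(scores)
```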
Pipelines and pipeline runs can now be serialized with Arrow.
#381
!290
describe CLI commands now accept --output argument to control where
their output is saved to.
!279
… source.from metadata in datasets and problem descriptions …
… (pip3 install -e ...) when being installed from the …
A few top-level runtime functions had some of their arguments moved
to keyword-only arguments:
fit: problem_description
score: scoring_pipeline, problem_description, metrics, predictions_random_seed
prepare_data: data_pipeline, problem_description, data_params
evaluate: data_pipeline, scoring_pipeline, problem_description, data_params, metrics
can_accept method has been removed from primitive interfaces.
#334
!300
Backwards incompatible.
NetworkX objects are no longer container types and can no longer be
passed as values between primitives. The Dataset loader no longer
converts a GML file to a NetworkX object but represents it
as a files collection resource. A primitive should then convert that
resource into a normalized edge-list graph representation.
#349
!299
Backwards incompatible.
JACCARD_SIMILARITY_SCORE metric is now a binary metric and requires
pos_label parameter.
!299
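A minimal sketch of what a binary Jaccard score with a pos_label parameter computes (illustrative only, not the d3m implementation; jaccard_binary is a hypothetical helper):

```python
# Hedged sketch: binary Jaccard score where "pos_label" selects which
# label counts as the positive class.

def jaccard_binary(y_true, y_pred, pos_label):
    # Jaccard = TP / (TP + FP + FN) for the positive class.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == pos_label and p == pos_label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != pos_label and p == pos_label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == pos_label and p != pos_label)
    denominator = tp + fp + fn
    return tp / denominator if denominator else 0.0
```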
Backwards incompatible.
Updated core dependencies. Some important packages are now at versions:
tensorflow: 2.0.0
keras: 2.3.1
torch: 1.3.0.post2
theano: 1.0.4
scikit-learn: 0.21.3
numpy: 1.17.3
pandas: 0.25.2
networkx: 2.4
pyarrow: 0.15.1
scipy: 1.3.1
Primitive family DIMENSIONALITY_REDUCTION has been added.
!284
Added to algorithm_types:
POLYNOMIAL_REGRESSION
IMAGENET
RETINANET
--process-dependency-links is not anymore suggested to be used when
installing primitives.
sample_rate metadata field inside dimension has been renamed to
sampling_rate to make it consistent across metadata. This field
should contain a sampling rate used for the described dimension,
when values in the dimension are sampled.
Backwards incompatible.
-E to the d3m runtime CLI now exposes really all outputs.
Pipeline's get_exposable_outputs method has been renamed to get_producing_outputs.
!270
Updating columns from DataFrame returned from DataFrame.select_columns
does not raise a warning anymore.
!268
Added scipy==1.2.1 as core dependency.
!266
Added code style guide to the repository.
!260
Added to algorithm_types:
ITERATIVE_LABELING
Added D3MDatasetSaver so Dataset objects can be saved with the save method
into D3M dataset format.
Added List hyper-parameter which allows … (Set).
… https://metadata.datadrivendiscovery.org/types/ConstructedAttribute …
… python3 -m d3m. … python3 -m d3m.
Added --expose-produced-outputs argument to runtime CLI to allow saving …
… d3mIndex column if one does not exist already.
Added --not-standard-pipeline argument to fit, produce, and fit-produce …
… Bounded and base Hyperparameter hyper-parameter now issues …
… Bounded hyper-parameter with both bounds now samples from uniform …
… SortedSet, List, and SortedList.
Dataset objects can now be saved to D3M dataset format by calling the
save method on them.
… NormalizeMutualInformationMetric implementation.
… Union hyper-parameter values and other …
… NaN and infinity floating-point values …
… --strict-digest/strict_digest …
… evaluate runtime function, if error …
… metadata attribute when … metadata.
… .meta file resolving when --datasets runtime argument …
… get_relations_graph resolving of column names (used in Denormalize …
Other validation functions for metalearning documents. This also includes
a CLI to validate them.
#220
!233
Pipeline run schema now requires scoring dataset inputs to be recorded
if a data preparation pipeline has not been used.
!243
Backwards incompatible.
Core package now provides standard scoring primitive and scoring pipeline
which are used by runtime by default.
#307
!231
Pipeline run can now be generated also for a subset of non-standard
pipelines: those which have all inputs of Dataset type.
!232
Pipeline run now also records a normalized score, if available.
!230
Pipeline context field has been removed from schema and implementation.
!229
Added pure_primitive field to primitive's metadata so that primitives
can mark themselves as not pure (by default all primitives are seen as pure).
#331
!226
Metadata methods to_json_structure and to_simple_structure have been
modified to no longer return the internal metadata representation but
a metadata representation equivalent to what you get from a query call.
To obtain the internal representation, use to_internal_json_structure
and to_internal_simple_structure.
!225
Backwards incompatible.
NeuralNetworkModuleMixin and NeuralNetworkObjectMixin have been
added to primitive interfaces to support representing neural networks
as pipelines.
#174
!87
get_loss_function has been renamed to get_loss_metric in
LossFunctionMixin.
!87
Backwards incompatible.
UniformInt, Uniform, and LogUniform hyper-parameter classes now
subclass Bounded class.
!216
Metrics do not have default parameter values anymore; legacy parts of
code assuming so have been cleaned up.
!212
Added new semantic types:
https://metadata.datadrivendiscovery.org/types/EdgeSource
https://metadata.datadrivendiscovery.org/types/DirectedEdgeSource
https://metadata.datadrivendiscovery.org/types/UndirectedEdgeSource
https://metadata.datadrivendiscovery.org/types/SimpleEdgeSource
https://metadata.datadrivendiscovery.org/types/MultiEdgeSource
https://metadata.datadrivendiscovery.org/types/EdgeTarget
https://metadata.datadrivendiscovery.org/types/DirectedEdgeTarget
https://metadata.datadrivendiscovery.org/types/UndirectedEdgeTarget
https://metadata.datadrivendiscovery.org/types/SimpleEdgeTarget
https://metadata.datadrivendiscovery.org/types/MultiEdgeTarget
https://metadata.datadrivendiscovery.org/types/ConstructedAttribute
https://metadata.datadrivendiscovery.org/types/SuggestedGroupingKey
https://metadata.datadrivendiscovery.org/types/GroupingKey
Updated core dependencies. Some important packages are now at versions:
scikit-learn: 0.20.3
pyarrow: 0.13.0
Clarified in primitive interface documentation that if a primitive should
have been fitted before calling its produce method, but it has not been,
the primitive should raise a PrimitiveNotFittedError exception.
!204
Added to algorithm_types:
EQUI_JOIN
DATA_RETRIEVAL
DATA_MAPPING
MAP
INFORMATION_THEORETIC_METAFEATURE_EXTRACTION
LANDMARKING_METAFEATURE_EXTRACTION
MODEL_BASED_METAFEATURE_EXTRACTION
STATISTICAL_METAFEATURE_EXTRACTION
VECTORIZATION
BERT
Primitive family METAFEATURE_EXTRACTION has been renamed to METALEARNING.
!160
Backwards incompatible.
… generate_metadata constructor argument to True or call generate method on metadata.
set_for_value method has been deprecated: generally it can be replaced with generate:
value.metadata = value.metadata.set_for_value(value, generate_metadata=False): remove.
value.metadata = new_metadata.set_for_value(value, generate_metadata=False): replace with value.metadata = new_metadata.
value.metadata = new_metadata.set_for_value(value, generate_metadata=True): replace with value.metadata = new_metadata.generate(value).
clear method has been deprecated: generally you can now instead simply create
a new DataMetadata, potentially calling generate as well:
outputs_metadata = inputs_metadata.clear(new_metadata, for_value=outputs, generate_metadata=True): replace with outputs_metadata = metadata_base.DataMetadata(new_metadata).generate(outputs).
outputs_metadata = inputs_metadata.clear(for_value=outputs, generate_metadata=False): replace with outputs_metadata = metadata_base.DataMetadata().
… container.List, container.ndarray, container.Dataset, container.DataFrame
container types and explicitly set generate_metadata to True. Alternatively,
you can also manually update …
… temporary_directory constructor argument pointing …
… Dataset object your primitive might create and followup primitives in …
… --scratch command line argument … scratch_dir constructor argument.
steps and method_calls made optional in pipeline run schema to allow easier
recording of failed pipelines.
Metadata has two new methods to query metadata, query_field and
query_field_with_exceptions. DataMetadata has a new method query_column_field.
DataMetadata's generate method now has a compact argument to control whether it
automatically compacts metadata (ALL_ELEMENTS selector segment) or not (default).
compact method available on Metadata to compact metadata on demand
(ALL_ELEMENTS selector segment).
generate_metadata argument of container types' constructors has been switched
from default True to default False to prevent unnecessary and unexpected …
DataMetadata now has a method generate which can be used to explicitly …
set_for_value and clear have been deprecated and can be replaced with a
generate call, or creating a new metadata object.
… DataFrame.to_csv method to use by default … DataFrame itself.
… d3m.metadata.problem.Problem class has …
… --problem command line argument to reference runtime …
… check method. This improves metadata performance.
Top-level runtime functions now also return Result (or new MultiResult)
objects instead of raising special PipelineRunError exception (which has been
removed) and instead of returning just pipeline run (which is available
inside Result).
#297
!192
Backwards incompatible.
Metrics have been reimplemented to operate on whole predictions DataFrame.
#304
#311
!171
Backwards incompatible.
Pipeline run implementation has been refactored to be in a single class to
facilitate easier subclassing.
#255
#305
!164
Added new semantic types:
https://metadata.datadrivendiscovery.org/types/PrimaryMultiKey
https://metadata.datadrivendiscovery.org/types/BoundingPolygon
https://metadata.datadrivendiscovery.org/types/UnknownType
Removed semantic types:
https://metadata.datadrivendiscovery.org/types/BoundingBox
https://metadata.datadrivendiscovery.org/types/BoundingBoxXMin
https://metadata.datadrivendiscovery.org/types/BoundingBoxYMin
https://metadata.datadrivendiscovery.org/types/BoundingBoxXMax
https://metadata.datadrivendiscovery.org/types/BoundingBoxYMax
Backwards incompatible.
Added to primitive_family:
SEMISUPERVISED_CLASSIFICATION
SEMISUPERVISED_REGRESSION
VERTEX_CLASSIFICATION
Added to task_type:
SEMISUPERVISED_CLASSIFICATION
SEMISUPERVISED_REGRESSION
VERTEX_CLASSIFICATION
Added to performance_metric:
HAMMING_LOSS
Removed from performance_metric:
ROOT_MEAN_SQUARED_ERROR_AVG
Backwards incompatible.
Added https://metadata.datadrivendiscovery.org/types/GPUResourcesUseParameter and
https://metadata.datadrivendiscovery.org/types/CPUResourcesUseParameter semantic types for
primitive hyper-parameters which control the use of GPUs and CPUs (cores), respectively.
You can use these semantic types to mark which hyper-parameter defines a range of how many
GPUs or CPUs (cores), respectively, a primitive can and should use.
#39
!177
Added get_hyperparams and get_volumes helper methods to PrimitiveMetadata
so that it is easier to obtain hyper-parameters definitions class of a primitive.
#163
!175
Pipeline run schema now records the global seed used by the runtime to run the pipeline.
!187
Core package scores output now includes also a random seed column.
#299
!185
Metrics in the core package now take as input whole predictions DataFrame
objects and compute scores over them. So the applicability_to_targets metric
method has been removed, along with the code which handled the list of target
columns a metric used to compute the score. This is not needed anymore
because now all columns are always used by all metrics. Moreover,
corresponding dataset_id and targets fields have been removed from
pipeline run schema.
Core package now requires pip 19 or later to be installed.
--process-dependency-links argument when installing the package is not needed
nor supported anymore.
Primitives should not require use of --process-dependency-links to install
them either. Instead use link dependencies as described in
PEP 508.
#285
!176
Backwards incompatible.
outputs field in parsed problem description has been removed.
#290
!174
Backwards incompatible.
Hyperparameter's value_to_json and value_from_json methods have been
renamed to value_to_json_structure and value_from_json_structure, respectively.
#122
#173
Moved utility functions from common primitives package to core package:
copy_metadata to Metadata.copy_to method
select_columns to DataFrame.select_columns method
select_columns_metadata to DataMetadata.select_columns method
list_columns_with_semantic_types to DataMetadata.list_columns_with_semantic_types method
list_columns_with_structural_types to DataMetadata.list_columns_with_structural_types method
remove_columns to DataFrame.remove_columns method
remove_columns_metadata to DataMetadata.remove_columns method
append_columns to DataFrame.append_columns method
append_columns_metadata to DataMetadata.append_columns method
insert_columns to DataFrame.insert_columns method
insert_columns_metadata to DataMetadata.insert_columns method
replace_columns to DataFrame.replace_columns method
replace_columns_metadata to DataMetadata.replace_columns method
get_index_columns to DataMetadata.get_index_columns method
horizontal_concat to DataFrame.horizontal_concat method
horizontal_concat_metadata to DataMetadata.horizontal_concat method
get_columns_to_use to d3m.base.utils.get_columns_to_use function
combine_columns to d3m.base.utils.combine_columns function
combine_columns_metadata to d3m.base.utils.combine_columns_metadata function
set_table_metadata to DataMetadata.set_table_metadata method
get_column_index_from_column_name to DataMetadata.get_column_index_from_column_name method
build_relation_graph to Dataset.get_relations_graph method
get_tabular_resource to d3m.base.utils.get_tabular_resource function
get_tabular_resource_metadata to d3m.base.utils.get_tabular_resource_metadata function
cut_dataset to Dataset.select_rows method
Updated core dependencies. Some important packages are now at versions:
pyarrow: 0.12.1
… utils registers them.
Runtime now makes sure that target columns are never marked as attributes.
#265
!131
When using runtime CLI, pipeline run output is produced even in the case of an
exception. Moreover, the exception thrown from Result.check_success contains
associated pipeline runs in its pipeline_runs attribute.
#245
!120
Made additional relaxations when reading D3M datasets and problem descriptions
to not require required fields which have defaults.
!128
When loading D3M datasets and problem descriptions, package now just warns
if they have an unsupported schema version and continues to load them.
#247
!119
Added to primitive_family:
NATURAL_LANGUAGE_PROCESSING
… volumes primitive constructor argument.
Some enumeration classes were moved and renamed:
d3m.metadata.pipeline.ArgumentType to d3m.metadata.base.ArgumentType
d3m.metadata.pipeline.PipelineContext to d3m.metadata.base.Context
d3m.metadata.pipeline.PipelineStep to d3m.metadata.base.PipelineStepType
Backwards incompatible.
Added pipeline_run.json JSON schema which describes the results of running a
pipeline as described by the pipeline.json JSON schema. Also implemented
a reference pipeline run output for reference runtime.
#165
!59
When computing primitive digests, primitive's ID is included in the
hash so that digest is not the same for all primitives from the same
package.
#154
When datasets are loaded, digest of their metadata and data can be
computed. To control when this is done, compute_digest argument
to Dataset.load can now take the following ComputeDigest
enumeration values: ALWAYS, ONLY_IF_MISSING (default), and NEVER.
!75
Added digest field to pipeline descriptions. Digest is computed based
on the pipeline document and it helps differentiate between pipelines
with the same id. When loading a pipeline, if there
is a digest mismatch, a warning is issued. You can use
the strict_digest argument to request an exception instead.
#190
!75
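The digest idea can be sketched as follows (assumptions: the exact canonical serialization and hash used by d3m may differ; compute_digest and check_digest are hypothetical helpers, not the package's API):

```python
# Hedged sketch: hash the document without its own "digest" field, then
# warn on mismatch, or raise when strict checking is requested.
import hashlib
import json
import warnings

def compute_digest(document):
    stripped = {key: value for key, value in document.items() if key != 'digest'}
    canonical = json.dumps(stripped, sort_keys=True, separators=(',', ':'))
    return hashlib.sha256(canonical.encode('utf8')).hexdigest()

def check_digest(document, strict_digest=False):
    expected = document.get('digest')
    actual = compute_digest(document)
    if expected is not None and expected != actual:
        if strict_digest:
            raise ValueError("digest mismatch")
        warnings.warn("digest mismatch")
    return actual
```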
Added digest field to problem description metadata.
This digest field is computed based on the problem description document
and it helps differentiate between problem descriptions with the same id.
#190
!75
Moved id, version, name, other_names, and description fields
in problem schema to top-level of the problem description. Moreover, made
id required. This aligns it more with the structure of other descriptions we have.
!75
Backwards incompatible.
Pipelines can now provide multiple inputs to the same primitive argument.
In such a case the runtime wraps those inputs into a List container type,
and then passes the list to the primitive.
#200
!112
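The wrapping behavior can be sketched like this (illustrative; resolve_argument is a hypothetical helper, not the runtime's actual code):

```python
# Hedged sketch: when several pipeline outputs feed one primitive
# argument, pass them as a single list; a lone input passes through.
def resolve_argument(data_references, available_values):
    values = [available_values[reference] for reference in data_references]
    return values[0] if len(values) == 1 else values
```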
Primitives now have a method fit_multi_produce which a primitive author can
override to implement an optimized version of both fitting and producing a primitive on the same data.
The default implementation just calls set_training_data, fit, and produce methods.
If your primitive has non-standard additional arguments in its produce method(s), then you
will have to implement the fit_multi_produce method to accept those additional arguments
as well, similarly to what you had to do for multi_produce.
#117
!110
Could be backwards incompatible.
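The default behavior described above can be sketched with a toy primitive (simplified; real d3m primitives use CallResult/MultiCallResult types and more arguments; DummyPrimitive is hypothetical):

```python
# Hedged sketch of the default fit_multi_produce flow: set training
# data, fit, then call each requested produce method on the same inputs.
class DummyPrimitive:
    def set_training_data(self, *, inputs, outputs):
        self._inputs, self._outputs = inputs, outputs

    def fit(self):
        self._fitted = True

    def produce(self, *, inputs):
        return [value * 2 for value in inputs]

    def fit_multi_produce(self, *, produce_methods, inputs, outputs):
        self.set_training_data(inputs=inputs, outputs=outputs)
        self.fit()
        return {name: getattr(self, name)(inputs=inputs) for name in produce_methods}
```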
source, timestamp, and check arguments to all metadata functions and container types'
constructors have been deprecated. You do not have to and should not be providing them anymore.
#171
#172
#173
!108
!109
Primitive's constructor is not run anymore during importing of the primitive's
class, which allows one to use the constructor to load things and do any
resource allocation/reservation. The constructor is now the preferred place to do so.
#158
!107
foreign_key metadata has been extended with RESOURCE type which allows
referencing another resource in the same dataset.
#221
!105
Updated supported D3M dataset and problem schema both to version 3.2.0.
Problem description parsing supports data augmentation metadata.
A new approach for LUPI datasets and problems is now supported,
including runtime support.
Moreover, if dataset's resource name is learningData, it is marked as a
dataset entry point.
#229
#225
#226
!97
A warning is issued if a primitive does not provide a description through
its docstring.
#167
!101
A warning is now issued if an installable primitive is lacking contact or bug
tracker URI metadata.
#178
!81
Pipeline class now also has equals and hash methods which can help
determine whether two pipelines are equal in the sense of isomorphism.
!53
Pipeline and pipeline steps classes now have a get_all_hyperparams
method to return all hyper-parameters defined for a pipeline and its steps.
#222
!104
Implemented a check for primitive Python paths to assure that they adhere
to the new standard of all of them having to be in the form d3m.primitives.primitive_family.primitive_name.kind
(e.g., d3m.primitives.classification.random_forest.SKLearn).
Currently there is a warning if a primitive has a different Python path,
and after January 2019 it will be an error.
For primitive_name segment there is a primitive_names.py
file containing a list of all allowed primitive names.
Everyone is encouraged to help curate this list and suggest improvements (merging, removals, additions)
of values in that list. The initial version was mostly automatically made from an existing list of
values used by current primitives.
#3
!67
Added to semantic types:
https://metadata.datadrivendiscovery.org/types/TokenizableIntoNumericAndAlphaTokens
https://metadata.datadrivendiscovery.org/types/TokenizableByPunctuation
https://metadata.datadrivendiscovery.org/types/AmericanPhoneNumber
https://metadata.datadrivendiscovery.org/types/UnspecifiedStructure
http://schema.org/email
http://schema.org/URL
http://schema.org/address
http://schema.org/State
http://schema.org/City
http://schema.org/Country
http://schema.org/addressCountry
http://schema.org/postalCode
http://schema.org/latitude
http://schema.org/longitude
Updated core dependencies. Some important packages are now at versions:
scikit-learn: 0.20.2
numpy: 1.15.4
pandas: 0.23.4
networkx: 2.2
pyarrow: 0.11.1
Added to algorithm_types:
IDENTITY_FUNCTION
DATA_SPLITTING
BREADTH_FIRST_SEARCH
Moved a major part of README to Sphinx documentation which is built
and available at http://docs.datadrivendiscovery.org/.
Added a produce_methods argument to Primitive hyper-parameter class
which allows one to limit matching primitives only to those providing all
of the listed produce methods.
#124
!56
Fixed sample_multiple method of the Hyperparameter class.
#157
!50
Added count to aggregate values in metafeatures.
!52
Clarified and generalized some metafeatures, mostly renaming them so that
they can be used on attributes as well:
number_of_classes to number_distinct_values
class_entropy to entropy_of_values
majority_class_ratio to value_probabilities_aggregate.max
minority_class_ratio to value_probabilities_aggregate.min
majority_class_size to value_counts_aggregate.max
minority_class_size to value_counts_aggregate.min
class_probabilities to value_probabilities_aggregate
target_values to values_aggregate
means_of_attributes to mean_of_attributes
standard_deviations_of_attributes to standard_deviation_of_attributes
categorical_joint_entropy to joint_entropy_of_categorical_attributes
numeric_joint_entropy to joint_entropy_of_numeric_attributes
pearson_correlation_of_attributes to pearson_correlation_of_numeric_attributes
spearman_correlation_of_attributes to spearman_correlation_of_numeric_attributes
canonical_correlation to canonical_correlation_of_numeric_attributes
Added metafeatures:
default_accuracy
oner
jrip
naive_bayes_tree
number_of_string_attributes
ratio_of_string_attributes
number_of_other_attributes
ratio_of_other_attributes
attribute_counts_by_structural_type
attribute_ratios_by_structural_type
attribute_counts_by_semantic_type
attribute_ratios_by_semantic_type
value_counts_aggregate
number_distinct_values_of_discrete_attributes
entropy_of_discrete_attributes
joint_entropy_of_discrete_attributes
joint_entropy_of_attributes
mutual_information_of_discrete_attributes
equivalent_number_of_discrete_attributes
discrete_noise_to_signal_ratio
Added special handling when reading scoring D3M datasets (those with true targets in a separate
file targets.csv). When such dataset is detected, the values from the separate file are now
merged into the dataset, and its ID is changed to finish with SCORE suffix. Similarly, an
ID of a scoring problem description gets its suffix changed to SCORE.
#176
Organized semantic types and added parent semantic types to some of them to
organize/structure them better. New parent semantic types added: https://metadata.datadrivendiscovery.org/types/ColumnRole,
https://metadata.datadrivendiscovery.org/types/DimensionType, https://metadata.datadrivendiscovery.org/types/HyperParameter.
Fixed mapping so that the dateTime column type maps to the http://schema.org/DateTime
semantic type and not https://metadata.datadrivendiscovery.org/types/Time.
Backwards incompatible.
Updated generated site for metadata and
generate sites describing semantic types.
#33
#114
!37
Optimized resolving of primitives in Resolver to not require loading of
all primitives when loading a pipeline, in the common case.
#162
!38
Added NotFoundError, AlreadyExistsError, and PermissionDeniedError
exceptions to d3m.exceptions.
Pipeline's to_json_structure, to_json, and to_yaml now have nest_subpipelines
argument which allows conversion with nested sub-pipelines instead of them
being only referenced.
Made sure that Arrow serialization of metadata does not also pickle linked
values (for_value).
Made sure enumerations are picklable.
PerformanceMetric class now has best_value and worst_value which
return the range of possible values for the metric. Moreover, normalize
method normalizes the metric's value to a range between 0 and 1.
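The normalization can be sketched as follows (a minimal sketch assuming a finite best/worst range; not the actual PerformanceMetric implementation, which may handle more cases):

```python
# Hedged sketch: map a metric value into [0, 1] given the metric's best
# and worst possible values, so 1 is always best even for error metrics
# where lower raw values are better.
def normalize(value, *, best_value, worst_value):
    if best_value == worst_value:
        raise ValueError("best and worst values must differ")
    normalized = (value - worst_value) / (best_value - worst_value)
    # Clamp, in case the value falls slightly outside the declared range.
    return max(0.0, min(1.0, normalized))
```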
Load D3M dataset qualities only after data is loaded. This fixes
lazy loading of datasets with qualities which was broken before.
Added load_all_primitives argument to the default pipeline Resolver
which allows one to control loading of primitives outside of the resolver.
Added primitives_blacklist argument to the default pipeline Resolver
which allows one to specify a collection of primitive path prefixes to not
(try to) load.
Fixed return value of the fit method in TransformerPrimitiveBase.
It now correctly returns CallResult instead of None.
Fixed a typo and renamed get_primitive_hyparparams to get_primitive_hyperparams
in PrimitiveStep.
Backwards incompatible.
Additional methods were added to the Pipeline class and step classes,
to support runtime and easier manipulation of pipelines programmatically
(get_free_hyperparams, get_input_data_references, has_placeholder,
replace_step, get_exposable_outputs).
Added reference implementation of the runtime. It is available
in the d3m.runtime module. This module also has an extensive
command line interface you can access through python3 -m d3m.runtime.
#115
!57
!72
GeneratorPrimitiveBase interface has been changed so that the produce method
accepts a list of non-negative integers as an input instead of a list of None values.
This allows for batching and gives the caller control over which outputs to generate.
Previously, outputs would depend on the number of calls to produce and the number
of outputs requested in each call. Now these integers serve as an index into the
set of potential outputs.
Backwards incompatible.
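The idea of index-based generation can be illustrated with a toy class (names are illustrative, not the d3m API):

```python
# Hedged sketch: outputs are a pure function of the requested indices,
# so batching and repeated calls are reproducible.
class ToyGenerator:
    def __init__(self, random_seed=0):
        self.random_seed = random_seed

    def produce(self, *, inputs):
        # "inputs" is a list of non-negative integers indexing into the
        # conceptual set of all potential outputs.
        return [f"sample-{self.random_seed}-{index}" for index in inputs]
```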
We now try to preserve metadata log in default implementation of can_accept.
Added sample_rate field to dimension metadata.
python3 -m d3m.index download command now accepts --prefix argument to limit the
primitives for which static files are downloaded. Useful for testing.
Added check argument to DataMetadata's update and remove methods which allows
one to control whether the selector check against for_value should be done or not.
When it is known that the selector is valid, skipping the check can speed up those methods.
Defined metadata field file_columns which allows storing known columns metadata for
tables referenced from columns. This is now used by the D3M dataset reader to store known
columns metadata for collections of CSV files. Previously, this metadata was lost despite
being available in Lincoln Labs dataset metadata.
OBJECT_DETECTION_AVERAGE_PRECISION metric supports operation on …
… Params fails to type check, a name of the parameter is now …
python3 -m d3m.index now has an additional command download which downloads all static …
… volumes …
… log_likelihoods, log_likelihood, losses, and loss …
can_accept receives primitive arguments and not just method arguments.
… https://metadata.datadrivendiscovery.org/types/FilesCollection for resources which are …
… https://metadata.datadrivendiscovery.org/types/Confidence semantic type for columns …
… confidence_for metadata which can help a confidence column refer …
… can_accept implementation to return type unwrapped from CallResult.
… DataMetadata.remove to preserve for_value value (and allow it to be set through the call).
… query_with_exceptions metadata method to correctly return exceptions for …
Added to primitive_family:
SCHEMA_DISCOVERY
DATA_AUGMENTATION
Added to algorithm_types:
HEURISTIC
MARKOV_RANDOM_FIELD
LEARNING_USING_PRIVILEGED_INFORMATION
APPROXIMATE_DATA_AUGMENTATION
Added PrimitiveNotFittedError, DimensionalityMismatchError, and MissingValueError
exceptions to d3m.exceptions.
Added video/avi media type to lists of known media types.
… Union type.
… ({}) from metadata when empty dicts were …
… python_path matches the path under which …
… d3m.primitives. This also prevents …
(d3m.primitives.<name1>.<name2>.<primitive>).
Metadata class got additional methods to manipulate metadata:
remove(selector) removes metadata at selector.
query_with_exceptions(selector) to return metadata for selectors which … ALL_ELEMENTS.
add_semantic_type, has_semantic_type, remove_semantic_type, and
get_elements_with_semantic_type to help with semantic types.
query_column, update_column, remove_column, get_columns_with_semantic_type …
Container List now inherits from a regular Python list and not from typing.List.
It no longer has a type variable. Typing information is stored in metadata
anyway (structural_type). This simplifies type checking (and improves performance)
and fixes pickling issues.
Backwards incompatible.
Hyperparams class' defaults method now accepts optional path argument which
allows one to fetch defaults from nested hyper-parameters.
Hyperparameters class and its subclasses now have get_default method instead
of a property default.
Backwards incompatible.
Hyperparams class got a new method replace which makes it easier to modify
hyper-parameter values.
Set hyper-parameter can now also accept a hyper-parameters configuration as elements,
which allows one to define a set of multiple hyper-parameters for each set element.
#94
Pipeline's check method now checks structural types of inputs and outputs and assures
they match.
!19
Set hyper-parameter now uses tuple of unique elements instead of set to represent the set.
This assures that the order of elements is preserved to help with reproducibility when
iterating over a set.
Backwards incompatible.
#109
Set hyper-parameter can now be defined without max_samples argument to allow a set
without an upper limit on the number of elements.
min_samples and max_samples arguments to Set constructor have been switched as
a consequence, to have a more intuitive order.
Similar changes have been done to sample_multiple method of hyper-parameters.
Backwards incompatible.
#110
Core dependencies have been upgraded: numpy==1.14.3. pytypes is now a released version.
When converting a numpy array with more than 2 dimensions to a DataFrame, higher dimensions are
automatically converted to nested numpy arrays inside a DataFrame.
#80
Metadata is now automatically preserved when converting between container types.
#76
Basic metadata for data values is now automatically generated when using D3M container types.
The value is traversed over its structure, and structural_type and dimension (with its
length) keys are populated. Some semantic_types are added in simple cases, and dimension's
name as well. In some cases analyzing all data to generate metadata can take time,
so you might consider disabling automatic generation by setting generate_metadata
to False in container's constructor or set_for_value calls and then manually populating
necessary metadata.
#35
!6
!11
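The traversal described above can be sketched in plain Python. This is an illustrative sketch only, not the d3m implementation; the generate_basic_metadata helper is hypothetical:

```python
# Illustrative sketch only (not the d3m implementation): what automatic basic
# metadata generation conceptually computes for a nested container value.

def generate_basic_metadata(value):
    """Return a minimal metadata dict with structural type and dimension lengths."""
    metadata = {'structural_type': type(value).__name__}
    if isinstance(value, list) and value:
        metadata['dimension'] = {'length': len(value)}
        # Describe nested structure based on the first element.
        metadata['elements'] = generate_basic_metadata(value[0])
    return metadata

table = [[1, 2, 3], [4, 5, 6]]  # 2 rows x 3 columns
meta = generate_basic_metadata(table)
```

The real generation also attaches semantic types and dimension names where it can; disabling it and populating such keys manually, as suggested above, trades this convenience for speed.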
When reading D3M datasets, media_types metadata now includes proper media types
for the column, and also media type for each particular row (file).
D3M dataset and problem description parsing has been updated to 3.1.2 version:
Dataset class now supports loading edgeList resources.
primitive_family now includes OBJECT_DETECTION.
task_type now includes OBJECT_DETECTION.
performance_metrics now includes PRECISION, RECALL, OBJECT_DETECTION_AVERAGE_PRECISION.
targets of a problem description now includes clusters_number.
boundary_for can now describe for which other column
realVector, json and geojson column types.
boundingBox column role.
https://metadata.datadrivendiscovery.org/types/EdgeList
https://metadata.datadrivendiscovery.org/types/FloatVector
https://metadata.datadrivendiscovery.org/types/JSON
https://metadata.datadrivendiscovery.org/types/GeoJSON
https://metadata.datadrivendiscovery.org/types/Interval
https://metadata.datadrivendiscovery.org/types/IntervalStart
https://metadata.datadrivendiscovery.org/types/IntervalEnd
https://metadata.datadrivendiscovery.org/types/BoundingBox
https://metadata.datadrivendiscovery.org/types/BoundingBoxXMin
https://metadata.datadrivendiscovery.org/types/BoundingBoxYMin
https://metadata.datadrivendiscovery.org/types/BoundingBoxXMax
https://metadata.datadrivendiscovery.org/types/BoundingBoxYMax
Unified the naming of attributes/features metafeatures to attributes.
Backwards incompatible.
!13
Unified the naming of categorical/nominal metafeatures to categorical.
Backwards incompatible.
!12
Added more metafeatures:
pca
random_tree
decision_stump
naive_bayes
linear_discriminant_analysis
knn_1_neighbor
c45_decision_tree
rep_tree
categorical_joint_entropy
numeric_joint_entropy
number_distinct_values_of_numeric_features
class_probabilities
number_of_features
number_of_instances
canonical_correlation
entropy_of_categorical_features
entropy_of_numeric_features
equivalent_number_of_categorical_features
equivalent_number_of_numeric_features
mutual_information_of_categorical_features
mutual_information_of_numeric_features
categorical_noise_to_signal_ratio
numeric_noise_to_signal_ratio
Added metafeatures for present values:
number_of_instances_with_present_values
ratio_of_instances_with_present_values
number_of_present_values
ratio_of_present_values
Implemented interface for saving datasets.
#31
To remove a key in metadata, instead of using None value one should now use
special NO_VALUE value.
Backwards incompatible.
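The sentinel pattern behind NO_VALUE can be illustrated with plain Python. This is a sketch with a hypothetical merge_update helper, not d3m's actual implementation:

```python
# Sketch of the sentinel pattern described above; NO_VALUE here is a stand-in
# and merge_update a hypothetical helper, not d3m's actual API.

NO_VALUE = object()  # unique sentinel; None stays a legitimate stored value

def merge_update(metadata, update):
    merged = dict(metadata)
    for key, value in update.items():
        if value is NO_VALUE:
            merged.pop(key, None)  # the sentinel means: remove this key
        else:
            merged[key] = value    # None is stored like any other value
    return merged

meta = merge_update({'name': 'column', 'unit': 'm'}, {'unit': NO_VALUE, 'scale': None})
```

Using a dedicated sentinel instead of None is exactly what lets None itself become a storable value.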
None is now serialized to JSON as null instead of string "None".
Could be backwards incompatible.
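The new behavior matches Python's standard json module, where None maps to the JSON literal null rather than the string "None":

```python
import json

# Python None serializes to the JSON literal null, not to the string "None".
serialized = json.dumps({'value': None})  # '{"value": null}'
```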
Unified naming and behavior of methods dealing with JSON and JSON-related
data. Now across the package:
to_json_structure returns a structure with values fully compatible with JSON and serializable with the default JSON serializer
to_simple_structure returns a structure similar to JSON, but with values left as Python values
to_json returns serialized value as a JSON string
Backwards incompatible.
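The three-method convention can be illustrated with a toy class (hypothetical, not part of the d3m package):

```python
import datetime
import json

# A toy class following the naming convention above (hypothetical class, not
# part of the d3m package).
class Record:
    def __init__(self, name, created):
        self.name = name
        self.created = created  # a datetime.datetime

    def to_simple_structure(self):
        # JSON-like structure, but values stay as Python objects.
        return {'name': self.name, 'created': self.created}

    def to_json_structure(self):
        # Only values fully compatible with JSON.
        return {'name': self.name, 'created': self.created.isoformat()}

    def to_json(self):
        # Serialized JSON string.
        return json.dumps(self.to_json_structure(), sort_keys=True)

record = Record('example', datetime.datetime(2018, 1, 1))
```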
Hyper-parameters are now required to specify at least one
semantic type from: https://metadata.datadrivendiscovery.org/types/TuningParameter,
https://metadata.datadrivendiscovery.org/types/ControlParameter,
https://metadata.datadrivendiscovery.org/types/ResourcesUseParameter,
https://metadata.datadrivendiscovery.org/types/MetafeatureParameter.
Backwards incompatible.
Made type strings in primitive annotations deterministic.
#93
Reimplemented primitives loading code to load primitives lazily.
#74
d3m.index module now has new and modified functions:
search now returns a list of Python paths of all potential
get_primitive loads and returns a primitive given its Python path
get_primitive_by_id returns a primitive given its ID, but a primitive
get_loaded_primitives returns a list of all currently loaded primitives
load_all tries to load all primitives
register_primitive now accepts full Python path instead of just suffix
Backwards incompatible.
#74
Defined model_features primitive metadata to describe features supported
by an underlying model. This is useful to allow easy matching between
problem's subtypes and relevant primitives.
#88
Made hyper-parameter space of an existing Hyperparams subclass immutable.
#91
d3m.index describe command now accepts -s/--sort-keys argument which
makes all keys in the JSON output sorted, making output JSON easier to
diff and compare.
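The effect of sorted keys on diffability can be seen with the standard json module: two mappings with different insertion order serialize identically:

```python
import json

# Sorted keys make JSON output deterministic: the same mapping serializes to
# the same string regardless of key insertion order, so diffs stay small.
a = json.dumps({'id': 1, 'name': 'x'}, sort_keys=True, indent=2)
b = json.dumps({'name': 'x', 'id': 1}, sort_keys=True, indent=2)
```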
can_accept now gets a hyperparams object with hyper-parameters under
which to check a method call. This allows can_accept to return a result
based on control hyper-parameters.
Backwards incompatible.
#81
Documented that all docstrings should be made according to
numpy docstring format.
#85
Added to semantic types:
https://metadata.datadrivendiscovery.org/types/MissingData
https://metadata.datadrivendiscovery.org/types/InvalidData
https://metadata.datadrivendiscovery.org/types/RedactedTarget
https://metadata.datadrivendiscovery.org/types/RedactedPrivilegedData
Added to primitive_family:
TIME_SERIES_EMBEDDING
Added to algorithm_types:
IVECTOR_EXTRACTION
Removed SparseDataFrame from standard container types because it is being
deprecated in Pandas.
Backwards incompatible.
#95
Defined other_names metadata field for any other names a value might have.
Optimized primitives loading time.
#87
Reduced pickling of values when a hyper-parameter has Union structural type.
#83
DataMetadata.set_for_value now first checks new value against the metadata, by default.
Could be backwards incompatible.
Added NO_NESTED_VALUES primitive precondition and effect.
This allows a primitive to specify that it cannot handle values where a container value
contains other nested values with dimensions.
Added pipeline.json JSON schema to this package, and a problem.json JSON schema
describing the parsed problem description's schema. There is also a d3m.metadata.pipeline
parser for pipelines in this schema and a Python object to represent a pipeline.
#53
Updated README to make it explicit that for tabular data the first dimension
is always rows and the second always columns, even in the case of a DataFrame
container type.
#54
Made Dataset container type return Pandas DataFrame instead of numpy ndarray
and in general suggest using Pandas DataFrame as the default container type.
Backwards incompatible.
#49
Added UniformBool hyper-parameter class.
Renamed FeaturizationPrimitiveBase to FeaturizationLearnerPrimitiveBase.
Backwards incompatible.
Defined ClusteringTransformerPrimitiveBase and renamed ClusteringPrimitiveBase
to ClusteringLearnerPrimitiveBase.
Backwards incompatible.
#20
Added inputs_across_samples decorator to mark which method arguments
are inputs which compute across samples.
#19
Converted SingletonOutputMixin to a singleton decorator. This allows
each produce method separately to be marked as a singleton produce method.
Backwards incompatible.
#17
can_accept can also raise an exception with information about why it cannot accept.
#13
Added Primitive hyper-parameter to describe a primitive or primitives.
Additionally, documented better in docstrings how to define hyper-parameters which
use primitives for their values and how such primitives-as-values should be passed
to primitives as their hyper-parameters.
#51
Hyper-parameter values can now be converted to and from JSON-compatible structure
using values_to_json and values_from_json methods. Non-primitive values
are pickled and stored as base64 strings.
#67
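A sketch of this encoding strategy follows; the wrapper format and helper names here are assumptions for illustration, not the exact d3m API:

```python
import base64
import json
import pickle

# Sketch of the encoding strategy described above; the wrapper format and
# helper names are assumptions for illustration, not the exact d3m API.
def value_to_json(value):
    if value is None or isinstance(value, (bool, int, float, str)):
        return value  # already JSON-compatible
    # Anything else is pickled and stored as a base64 string.
    return {'encoding': 'pickle', 'value': base64.b64encode(pickle.dumps(value)).decode('ascii')}

def value_from_json(structure):
    if isinstance(structure, dict) and structure.get('encoding') == 'pickle':
        return pickle.loads(base64.b64decode(structure['value']))
    return structure

# Round trip through an actual JSON string.
roundtripped = value_from_json(json.loads(json.dumps(value_to_json({1, 2, 3}))))
```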
Added Choice hyper-parameter which allows one to define a
combination of hyper-parameters which should exist together.
#28
Added Set hyper-parameter which samples another hyper-parameter multiple times.
#52
Added https://metadata.datadrivendiscovery.org/types/MetafeatureParameter
semantic type for hyper-parameters which control which meta-features are
computed by the primitive.
#41
Added supported_media_types primitive metadata to describe
which media types a primitive knows how to manipulate.
#68
Renamed metadata property mime_types to media_types.
Backwards incompatible.
Made pyarrow dependency a package extra. You can depend on it using
d3m[arrow].
#66
Added multi_produce method to primitive interface which allows primitives
to optimize calls to multiple produce methods they might have.
#21
Added d3m.utils.redirect_to_logging context manager which can help
redirect a primitive's output on stdout and stderr to the primitive's logger.
#65
Primitives can now have a dependency on static files and directories.
One can use FILE and TGZ entries in primitive's installation
metadata to ask the caller to provide paths to those files and/or
extracted directories through the new volumes constructor argument.
#18
Core dependencies have been upgraded: numpy==1.14.2, networkx==2.1.
LUPI quality in D3M datasets is now parsed into
https://metadata.datadrivendiscovery.org/types/SuggestedPrivilegedData
semantic type for a column.
#61
Support for primitives using Docker containers has been put on hold.
We are keeping a way to pass information about running containers to a
primitive and defining dependent Docker images in metadata, but currently
it is not expected that any runtime running primitives will run
Docker containers for a primitive.
#18
Primitives do not have to define all constructor arguments anymore.
This allows them to ignore arguments they do not use, e.g.,
docker_containers.
On the other hand, when creating an instance of a primitive, one
now has to check which arguments the constructor accepts, which is
available in primitive's metadata:
primitive.metadata.query()['primitive_code'].get('instance_methods', {})['__init__']['arguments'].
#63
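A caller-side check like this can also be done with plain Python introspection; this is an illustrative alternative to the metadata query above, and MinimalPrimitive is a hypothetical example class:

```python
import inspect

# Illustrative alternative to the metadata query above: use plain Python
# introspection to pass only arguments a constructor accepts.
class MinimalPrimitive:
    def __init__(self, *, hyperparams, random_seed=0):  # ignores docker_containers
        self.hyperparams = hyperparams
        self.random_seed = random_seed

def filter_supported_arguments(cls, arguments):
    accepted = inspect.signature(cls.__init__).parameters
    return {name: value for name, value in arguments.items() if name in accepted}

kwargs = filter_supported_arguments(
    MinimalPrimitive,
    {'hyperparams': {}, 'random_seed': 42, 'docker_containers': {}},
)
primitive = MinimalPrimitive(**kwargs)
```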
Information about running primitive's Docker container has changed
from just its address to a DockerContainer tuple containing both
the address and a map of all exposed ports.
At the same time, support for Docker has been put on hold, so you
do not really have to change anything for this and can simply
remove the docker_containers argument from primitive's constructor.
Backwards incompatible.
#14
Multiple exception classes have been defined in d3m.exceptions
module and are now in use. This allows easier and more precise
handling of exceptions.
#12
Fixed inheritance of Hyperparams class.
#44
Each primitive's class now automatically gets an instance of
Python's logging
logger stored into its logger class attribute. The instance is made
under the name of primitive's python_path metadata value. Primitives
can use this logger to log information at various levels (debug, warning,
error) and even associate extra data with a log record using the extra
argument to the logger calls.
#10
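A simplified sketch of this mechanism (assumed shape, not the actual d3m metaclass):

```python
import logging

# Simplified sketch of the mechanism described above (assumed shape, not the
# actual d3m metaclass): every class gets a logger named after its dotted path.
class LoggerMetaclass(type):
    def __new__(mcls, name, bases, namespace):
        cls = super().__new__(mcls, name, bases, namespace)
        # Use the class's python_path attribute as the logger name, if present.
        cls.logger = logging.getLogger(namespace.get('python_path', name))
        return cls

class ExamplePrimitive(metaclass=LoggerMetaclass):
    python_path = 'd3m.primitives.example.package.Example'  # hypothetical path

# Extra data can be attached to a log record via the standard extra argument.
ExamplePrimitive.logger.debug('fitting started', extra={'iterations': 10})
```

Naming loggers after the dotted path means standard logging configuration (per-package levels, handlers) applies to primitives automatically.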
Made sure container data types can be serialized with Arrow/Plasma
while retaining their metadata.
#29
Scores in GradientCompositionalityMixin replaced with Gradients.
Scores only makes sense in a probabilistic context.
Renamed TIMESERIES_CLASSIFICATION, TIMESERIES_FORECASTING, and
TIMESERIES_SEGMENTATION primitives families to
TIME_SERIES_CLASSIFICATION, TIME_SERIES_FORECASTING, and
TIME_SERIES_SEGMENTATION, respectively, to match naming
pattern used elsewhere.
Similarly, renamed UNIFORM_TIMESERIES_SEGMENTATION algorithm type
to UNIFORM_TIME_SERIES_SEGMENTATION.
Compound words using hyphens are separated, but hyphens for prefixes
are not separated. So "Time-series" and "Root-mean-squared error"
become TIME_SERIES and ROOT_MEAN_SQUARED_ERROR
but "Non-overlapping" and "Multi-class" are NONOVERLAPPING and MULTICLASS.
Backwards incompatible.
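The renaming rule can be sketched as follows; the prefix list here is an assumption for this illustration only (in practice the rule is applied by hand to enumeration names):

```python
# Sketch of the naming rule above; the prefix list is an assumption for this
# illustration only.
PREFIXES = ('non', 'multi')

def to_enum_name(label):
    words = label.lower().replace(' ', '-').split('-')
    parts = []
    for word in words:
        if parts and parts[-1] in PREFIXES:
            parts[-1] += word  # prefixes fuse with the following word
        else:
            parts.append(word)
    return '_'.join(parts).upper()
```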
Updated performance metrics to include PRECISION_AT_TOP_K metric.
Added to problem description parsing support for additional metric
parameters and updated performance metric functions to use them.
#42
Merged d3m_metadata, primitive_interfaces and d3m repositories
into d3m repository. This requires the following changes of
imports in existing code:
d3m_metadata to d3m.metadata
primitive_interfaces to d3m.primitive_interfaces
d3m_metadata.container to d3m.container
d3m_metadata.metadata to d3m.metadata.base
d3m_metadata.metadata.utils to d3m.utils
d3m_metadata.metadata.types to d3m.types
Backwards incompatible.
#11
Fixed computation of sampled values for LogUniform hyper-parameter class.
#47
When copying or slicing container values, metadata is now copied over
instead of cleared. This makes it easier to propagate metadata.
This also means one should make sure to update the metadata in the
new container value to reflect changes to the value itself.
Could be backwards incompatible.
DataMetadata now has set_for_value method to make a copy of
metadata and set new for_value value. You can use this when you
made a new value and you want to copy over metadata, but you also
want this value to be associated with metadata. This is done by
default for container values.
Metadata now includes SHA256 digest for primitives and datasets.
It is computed automatically during loading. This should allow one to
track exact versions of primitives and datasets used.
d3m.container.dataset.get_d3m_dataset_digest is a reference
implementation of computing digest for D3M datasets.
You can set compute_digest to False to disable this.
You can set strict_digest to True to raise an exception instead
of a warning if computed digest does not match one in metadata.
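A minimal sketch of digest computation and strict/non-strict verification; this is illustrative only (d3m computes digests over dataset contents in a defined order, and the verify helper here is hypothetical):

```python
import hashlib

# Minimal, illustrative sketch of digest computation and verification.
def compute_digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def verify(data: bytes, expected_digest: str, strict: bool = False) -> bool:
    matches = compute_digest(data) == expected_digest
    if not matches and strict:
        # Corresponds to strict_digest=True: raise instead of warning.
        raise ValueError('digest mismatch')
    return matches

payload = b'dataset bytes'
digest = compute_digest(payload)
```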
Datasets can be now loaded in "lazy" mode: only metadata is loaded
when creating a Dataset object. You can use is_lazy method to
check if a dataset is lazy and its data has not yet been loaded. You can use
load_lazy to load data for a lazy object, making it non-lazy.
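The lazy-loading pattern can be sketched as follows (illustrative shape only, not the actual Dataset implementation; the URI and resource are made up):

```python
# Sketch of the lazy-loading pattern described above (illustrative only).
class LazyDataset:
    def __init__(self, uri, lazy=False):
        self.uri = uri
        self.metadata = {'source_uri': uri}  # metadata is available immediately
        self._data = None if lazy else self._load()

    def _load(self):
        # Stand-in for actually reading resources from self.uri.
        return {'learningData': [[1, 2], [3, 4]]}

    def is_lazy(self):
        return self._data is None

    def load_lazy(self):
        if self.is_lazy():
            self._data = self._load()  # becomes non-lazy

dataset = LazyDataset('file:///tmp/datasetDoc.json', lazy=True)
```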
There is now a utility metaclass d3m.metadata.utils.AbstractMetaclass
which makes classes which use it automatically inherit docstrings
for methods from the parent. Primitive base class and some other D3M
classes are now using it.
d3m.metadata.base.CONTAINER_SCHEMA_VERSION and
d3m.metadata.base.DATA_SCHEMA_VERSION were fixed to point to the
correct URI.
Many data_metafeatures properties in metadata schema had type
numeric which does not exist in JSON schema. They were fixed to
number.
Added to a list of known semantic types:
https://metadata.datadrivendiscovery.org/types/Target,
https://metadata.datadrivendiscovery.org/types/PredictedTarget,
https://metadata.datadrivendiscovery.org/types/TrueTarget,
https://metadata.datadrivendiscovery.org/types/Score,
https://metadata.datadrivendiscovery.org/types/DatasetEntryPoint,
https://metadata.datadrivendiscovery.org/types/SuggestedPrivilegedData,
https://metadata.datadrivendiscovery.org/types/PrivilegedData.
Added to algorithm_types: ARRAY_CONCATENATION, ARRAY_SLICING,
ROBUST_PRINCIPAL_COMPONENT_ANALYSIS, SUBSPACE_CLUSTERING,
SPECTRAL_CLUSTERING, RELATIONAL_ALGEBRA, MULTICLASS_CLASSIFICATION,
MULTILABEL_CLASSIFICATION, OVERLAPPING_CLUSTERING, SOFT_CLUSTERING,
STRICT_PARTITIONING_CLUSTERING, STRICT_PARTITIONING_CLUSTERING_WITH_OUTLIERS,
UNIVARIATE_REGRESSION, NONOVERLAPPING_COMMUNITY_DETECTION,
OVERLAPPING_COMMUNITY_DETECTION.
location_uris metadata.
#egg= package URI suffix in metadata.
get_params and set_params in __getstate__ and __setstate__ methods.
RandomPrimitive test primitive.
numpy dependency to 1.14 and pandas to 0.22.
https://metadata.datadrivendiscovery.org/types/ResourcesUseParameter as a known URI for
semantic_types to help convey which hyper-parameters control the use of resources by the
numpy values in Params and Hyperparams.
upper_inclusive argument to UniformInt, Uniform, and LogUniform classes
semantic_types and description keyword-only arguments in hyper-parameter description classes.
Hyperparams subclasses can be pickled and unpickled.
NO_JAGGED_VALUES to preconditions and effects.
algorithm_types: AGGREGATE_FUNCTION, AUDIO_STREAM_MANIPULATION, BACKWARD_DIFFERENCE_CODING,
BAYESIAN_LINEAR_REGRESSION, CATEGORY_ENCODER, CROSS_VALIDATION, DISCRETIZATION, ENCODE_BINARY,
ENCODE_ORDINAL, FEATURE_SCALING, FORWARD_DIFFERENCE_CODING, FREQUENCY_TRANSFORM, GAUSSIAN_PROCESS,
HASHING, HELMERT_CODING, HOLDOUT, K_FOLD, LEAVE_ONE_OUT, MERSENNE_TWISTER, ORTHOGONAL_POLYNOMIAL_CODING,
PASSIVE_AGGRESSIVE, PROBABILISTIC_DATA_CLEANING, QUADRATIC_DISCRIMINANT_ANALYSIS, RECEIVER_OPERATING_CHARACTERISTIC,
RELATIONAL_DATA_MINING, REVERSE_HELMERT_CODING, SEMIDEFINITE_EMBEDDING, SIGNAL_ENERGY, SOFTMAX_FUNCTION,
SPRUCE, STOCHASTIC_GRADIENT_DESCENT, SUM_CODING, TRUNCATED_NORMAL_DISTRIBUTION, UNIFORM_DISTRIBUTION.
primitive_family: DATA_GENERATION, DATA_VALIDATION, DATA_WRANGLING, VIDEO_PROCESSING.
NoneType to the list of data types allowed inside container types.
PIP dependencies specified by a package_uri git URI, an #egg=package_name URI suffix is
--process-dependency-links argument during installation.
learning_rate and weight_decay in GradientCompositionalityMixin renamed to fine_tune_learning_rate and fine_tune_weight_decay, respectively.
learning_rate is a common hyper-parameter name.
https://metadata.datadrivendiscovery.org/types/TuningParameter and
https://metadata.datadrivendiscovery.org/types/ControlParameter as two known URIs for
semantic_types to help convey which hyper-parameters are true tuning parameters (should be
installation metadata optional. This allows local-only primitives.
d3m.index.register_primitive.
q argument.
PIP dependency package has to be installed with --process-dependency-link argument
package_uri with both --process-dependency-link and --editable, so that primitives can have access
git+http and git+https URI schemes are allowed for git repository URIs for package_uri.
algorithm_types: AUDIO_MIXING, CANONICAL_CORRELATION_ANALYSIS, DATA_PROFILING, DEEP_FEATURE_SYNTHESIS,
INFORMATION_ENTROPY, MFCC_FEATURE_EXTRACTION, MULTINOMIAL_NAIVE_BAYES, MUTUAL_INFORMATION, PARAMETRIC_TRAJECTORY_MODELING,
SIGNAL_DITHERING, SIGNAL_TO_NOISE_RATIO, STATISTICAL_MOMENT_ANALYSIS, UNIFORM_TIMESERIES_SEGMENTATION.
primitive_family: SIMILARITY_MODELING, TIMESERIES_CLASSIFICATION, TIMESERIES_SEGMENTATION.
produce method for ClusteringPrimitiveBase and added ClusteringDistanceMatrixMixin.
can_accept class method to primitive base class and implemented its
Params should now be a subclass of d3m.metadata.params.Params, which is a
Graph class. There is no need for it anymore because we can identify
timeout and iterations arguments to more methods.
forward and backward backprop methods to GradientCompositionalityMixin
log_likelihoods method to ProbabilisticCompositionalityMixin.
docker_containers argument with addresses of running
CallMetadata and get_call_metadata and changed so that some methods
CallResult.
produce but different output types.
SingletonOutputMixin to signal that primitive's output contains
get_loss_primitive to allow access to the loss primitive
set_training_data back to the base class.
__metadata__ to metadata attribute.
set_random_seed method has been removed and replaced with a random_seed argument to the constructor, which is also exposed as an attribute.
hyperparams attribute which returns a
Params and Hyperparams are now required to be picklable and copyable.
Hyperparams type variable as well.
LossFunctionMixin's get_loss_function method now returns a value from Metric enumeration.
LossFunctionMixin now has loss and losses methods which allow one
Params class.
Graph class in favor of NetworkX Graph class.
Metadata class with subclasses and documented the use of selectors.
Hyperparams class.
Dataset class.
d3m.container and not under d3m.metadata.sequence anymore.
__metadata__ attribute was renamed to metadata.
d3m_types to d3m_metadata.
d3m.metadata.problem module.
d3m.index command tool rewritten to support three commands: search, discover, describe. See details by running python -m d3m.index -h.
d3m.index module with API to register primitives into a d3m.primitives module
d3m.index is also a command-line tool to list available primitives and automatically
d3m.primitives module which automatically populates itself with primitives