TA1 API for primitives
====================================

A collection of standard Python interfaces for TA1 primitives. All
primitives should extend one of the base classes available and
optionally implement available mixins.

Design principles
-----------------

Standard TA1 primitive interfaces have been designed so that TA2 systems
can call primitives automatically and combine them into pipelines.

Some design principles applied:

- Use of a de facto standard language for "glue" between different
  components and libraries, Python.
- Use of keyword-only arguments for all methods so that the caller does
  not have to worry about the order of arguments.
- Every primitive should implement only one functionality, more or less
  a function, with clear inputs and outputs. Not all parameters of the
  function have to be known in advance, and the function can be
  "fitted" as part of the training step of the pipeline.
- Use of Python 3 typing extensions to annotate methods and classes
  with typing information to make it easier for TA2 systems to prune
  incompatible combinations of inputs and outputs and to reuse existing
  Python type-checking tooling.
- Typing information can serve both to detect issues and
  incompatibilities in primitive implementations and to help with
  pipeline construction.
- All values being passed through a primitive have metadata associated
  with them.
- Primitives can operate at a metadata level only, to help guide the
  pipeline construction process without having to operate on the data
  itself.
- Primitive metadata is kept close to its source, the primitive code,
  and not in separate files, to minimize the chance that it goes out of
  sync. Metadata which can be automatically determined from the code
  should be automatically determined from the code. Similarly for data
  metadata.
- All randomness of primitives is captured by a random seed argument to
  ensure reproducibility.
- Operations can work in iterations, under time budgets, and the caller
  might not always want to compute values fully.
- Through the use of mixins, primitives can signal which capabilities
  they support.
- Primitives are to be composed and executed in a data-flow manner.

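Two of the principles above, keyword-only arguments and a single random
seed capturing all randomness, can be illustrated with a minimal sketch.
``ToyRandomPrimitive`` and its ``scale`` hyper-parameter are hypothetical
names invented for this illustration; the class does not use or mirror
the actual base classes.

```python
import random
import typing


class ToyRandomPrimitive:
    """Hypothetical stand-in for a primitive; not one of the real base classes."""

    # The bare ``*`` makes every following argument keyword-only.
    def __init__(self, *, hyperparams: typing.Dict[str, typing.Any], random_seed: int = 0) -> None:
        self.hyperparams = hyperparams
        # All randomness flows from this one seeded generator.
        self._random = random.Random(random_seed)

    def produce(self, *, inputs: typing.List[float]) -> typing.List[float]:
        scale = self.hyperparams['scale']
        return [value + scale * self._random.random() for value in inputs]


# Argument order does not matter, and the same seed gives the same outputs.
first = ToyRandomPrimitive(hyperparams={'scale': 0.1}, random_seed=42).produce(inputs=[1.0, 2.0])
second = ToyRandomPrimitive(random_seed=42, hyperparams={'scale': 0.1}).produce(inputs=[1.0, 2.0])
```

Passing the arguments positionally, as in ``ToyRandomPrimitive({'scale': 0.1}, 42)``,
raises ``TypeError``, which is how keyword-only arguments keep callers
from depending on argument order.
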
Main concepts
-------------

Interface classes, mixins, and methods are documented in detail through
use of docstrings and typing annotations. Here we note some higher-level
concepts which can help in understanding the basic ideas behind the
interfaces and what they are trying to achieve: the big picture. This
section is not normative.

A primitive should extend one of the base classes available and
optionally mixins as well. Not all mixins apply to all primitives. That
being said, you probably do not want to subclass ``PrimitiveBase``
directly, but instead one of the other base classes, to signal to a
caller more about what your primitive is doing. If your primitive
belongs to a larger set of primitives that no existing
non-\ ``PrimitiveBase`` base class suits well, consider suggesting that
a new base class be created by opening an issue or making a merge
request.

Base classes and mixins generally have four type arguments you have to
provide: ``Inputs``, ``Outputs``, ``Params``, and ``Hyperparams``. One
can see a primitive as parameterized by those four type arguments. You
can access them at runtime through metadata:

.. code:: python

    FooBarPrimitive.metadata.query()['class_type_arguments']

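To make the idea of a class parameterized by four type arguments more
concrete, here is a minimal sketch using only the standard ``typing``
module. ``ToyPrimitiveBase``, ``ToyPrimitive`` and the helper
``class_type_arguments`` are hypothetical names; the real interfaces
expose this information through primitive metadata, and this sketch only
shows one way such a mapping could be recovered from a generic class.

```python
import typing

Inputs = typing.TypeVar('Inputs')
Outputs = typing.TypeVar('Outputs')
Params = typing.TypeVar('Params')
Hyperparams = typing.TypeVar('Hyperparams')


class ToyPrimitiveBase(typing.Generic[Inputs, Outputs, Params, Hyperparams]):
    """Hypothetical generic base class with four type arguments."""


class ToyPrimitive(ToyPrimitiveBase[list, list, dict, dict]):
    """A concrete primitive fixes all four type arguments."""


def class_type_arguments(cls: type) -> typing.Dict[str, typing.Any]:
    # The parameterized base, e.g. ToyPrimitiveBase[list, list, dict, dict].
    base = cls.__orig_bases__[0]
    # Names of the type variables declared on the unparameterized base.
    names = [variable.__name__ for variable in typing.get_origin(base).__parameters__]
    return dict(zip(names, typing.get_args(base)))


arguments = class_type_arguments(ToyPrimitive)
```

Here ``arguments['Inputs']`` is ``list`` and ``arguments['Hyperparams']``
is ``dict``.
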
``Inputs`` should be set to the primary input type of a primitive.
Primary, because you can define additional inputs your primitive might
need, but we will go into these details later. Similarly for
``Outputs``. The ``produce`` method then produces outputs from inputs.
Other primitive methods help the primitive (and its ``produce`` method)
achieve that, or help the runtime execute the primitive as a whole, or
optimize its behavior.

Both ``Inputs`` and ``Outputs`` should be of a
:ref:`container_types`. We allow a limited set of value types to be
passed between primitives so that both TA2 and TA3 systems can
implement introspection for those values if needed, or a user interface
for them, etc. Moreover, this also allows us to ensure that they can be
used efficiently with the Arrow/Plasma store.

Container values can then in turn contain values of an :ref:`extended but
still limited set of data types <data_types>`.

Those values being passed between primitives also hold metadata.
Metadata is available on their ``metadata`` attribute. Metadata on
values is stored in an instance of the
:class:`~d3m.metadata.base.DataMetadata` class. This is one
reason why we have :ref:`our own versions of some standard container
types <container_types>`: to have the ``metadata`` attribute.

All metadata is immutable and updating a metadata object returns a new,
updated, copy. Metadata internally remembers the history of changes, but
there is no API yet to access that. The idea is that you will be
able to follow the whole history of changes to data in a pipeline
through metadata. See :ref:`metadata API <metadata_api>` for more
information on how to manipulate metadata.

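The immutability described above can be sketched with a small class
whose ``update`` method returns a new object instead of changing the
existing one. ``ToyMetadata`` and its ``history`` method are
hypothetical and written only for illustration; they are not the
``DataMetadata`` API, and the text above notes that no API for accessing
history exists yet.

```python
import copy
import types
import typing


class ToyMetadata:
    """Hypothetical immutable metadata holder; not the DataMetadata API."""

    def __init__(self, values: typing.Optional[dict] = None,
                 previous: typing.Optional['ToyMetadata'] = None) -> None:
        # A read-only view over a private deep copy, so later changes to the
        # caller's dict cannot leak into this object.
        self._values = types.MappingProxyType(copy.deepcopy(values or {}))
        self._previous = previous

    def query(self) -> dict:
        # Return a copy so callers cannot mutate the stored values.
        return copy.deepcopy(dict(self._values))

    def update(self, changes: dict) -> 'ToyMetadata':
        # Never modify in place: build a new object that remembers this one.
        return ToyMetadata({**self._values, **changes}, previous=self)

    def history(self) -> typing.List[dict]:
        # Walk back through previous versions, oldest first.
        current: typing.Optional['ToyMetadata'] = self
        versions = []
        while current is not None:
            versions.append(current.query())
            current = current._previous
        return list(reversed(versions))


original = ToyMetadata({'semantic_types': ['Table']})
updated = original.update({'dimension': {'length': 3}})
```

After the call to ``update``, ``original`` still holds only
``semantic_types``, while ``updated`` holds both keys and can report both
versions through ``history()``.
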
Primitives have a similar class ``PrimitiveMetadata``, which when
created automatically analyses its primitive and populates parts of
metadata based on that. In this way, the author does not have to keep
information in two places (metadata and code) but just in code, and
metadata is extracted from it when possible. Some metadata the author of
the primitive still has to provide directly.

Currently most standard interface base classes have only one ``produce``
method, but the design allows for multiple: their names have to be
prefixed with ``produce_``, and they must have similar arguments and the
same semantics as all produce methods. The main motivation for this is
that some primitives might be able to expose the same results in
different ways. Having multiple produce methods allows the caller to
pick which type of result they want.

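As a rough illustration of multiple produce methods, the sketch below
exposes the same underlying result (the nearest center for each input) in
two ways and finds the produce methods by their name prefix.
``ToyClusteringPrimitive``, ``produce_distances`` and
``produce_method_names`` are hypothetical names, and the discovery helper
is an assumption about how a caller might enumerate such methods, not the
mechanism the interfaces actually use.

```python
import typing


class ToyClusteringPrimitive:
    """Hypothetical primitive exposing the same result in two ways."""

    def __init__(self, *, centers: typing.List[float]) -> None:
        self._centers = centers

    def produce(self, *, inputs: typing.List[float]) -> typing.List[int]:
        # Index of the nearest center for every input value.
        return [
            min(range(len(self._centers)), key=lambda i: abs(x - self._centers[i]))
            for x in inputs
        ]

    def produce_distances(self, *, inputs: typing.List[float]) -> typing.List[float]:
        # Distance to the nearest center for every input value.
        return [min(abs(x - center) for center in self._centers) for x in inputs]


def produce_method_names(primitive_class: type) -> typing.List[str]:
    # ``produce`` itself plus anything prefixed with ``produce_``.
    return sorted(
        name for name in dir(primitive_class)
        if name == 'produce' or name.startswith('produce_')
    )


primitive = ToyClusteringPrimitive(centers=[0.0, 10.0])
labels = primitive.produce(inputs=[1.0, 9.0, 4.0])
distances = primitive.produce_distances(inputs=[1.0, 9.0, 4.0])
```

Both methods take the same ``inputs`` argument, so a caller that can
satisfy it can choose either form of the result.
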
To keep primitives simple from the outside and allow easier composition
in pipelines, primitives have arguments defined per primitive and not
per method. The idea here is that once a caller satisfies
(computes a value to be passed to) an argument, any method which
requires that argument can be called on a primitive.

There are three types of arguments:

- pipeline – arguments which are provided by the pipeline; they are
  required (otherwise the caller would be able to trivially satisfy them
  by always passing ``None`` or another default value)
- runtime – arguments which the caller provides during pipeline
  execution and which control various aspects of the execution
- hyper-parameter – a method can declare that a primitive's
  hyper-parameter can be overridden for the call of the method; such
  arguments have to match the hyper-parameter definition

Methods can accept additional pipeline and hyper-parameter arguments and
not just those from the standard interfaces.

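The idea that satisfying an argument unlocks every method requiring it
can be sketched with the standard ``inspect`` module. The class
``ToyPrimitive`` and the helper ``callable_methods`` below are
hypothetical; the method names and signatures are assumptions chosen for
illustration. Required (pipeline) arguments are modelled as keyword-only
arguments without defaults, and runtime arguments as ones with defaults.

```python
import inspect
import typing


class ToyPrimitive:
    """Hypothetical primitive; method bodies are omitted on purpose."""

    def set_training_data(self, *, inputs, outputs): ...

    def fit(self, *, timeout=None, iterations=None): ...

    def produce(self, *, inputs, timeout=None, iterations=None): ...


def callable_methods(primitive_class: type, available: typing.Iterable[str]) -> typing.List[str]:
    """List public methods whose required arguments are all available."""
    satisfied = set(available)
    names = []
    for name, method in inspect.getmembers(primitive_class, inspect.isfunction):
        if name.startswith('_'):
            continue
        required = {
            parameter.name
            for parameter in inspect.signature(method).parameters.values()
            if parameter.name != 'self' and parameter.default is inspect.Parameter.empty
        }
        if required <= satisfied:
            names.append(name)
    return sorted(names)


with_inputs_only = callable_methods(ToyPrimitive, {'inputs'})
with_both = callable_methods(ToyPrimitive, {'inputs', 'outputs'})
```

With only ``inputs`` satisfied, ``fit`` and ``produce`` can be called;
once ``outputs`` is satisfied as well, ``set_training_data`` becomes
callable too.
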
Produce methods and some other methods return results wrapped in
``CallResult``. In this way, primitives can expose information about
an internal iterative or optimization process and allow the caller to
decide how long to run.

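A minimal sketch of this pattern follows. ``ToyCallResult`` is a
hypothetical stand-in written for this example, and its field names
(``value``, ``has_finished``, ``iterations_done``) are assumptions rather
than a statement of what ``CallResult`` contains. ``ToySqrtPrimitive`` is
likewise invented: it approximates a square root with Newton's method and
stops after the number of iterations the caller allows.

```python
import dataclasses
import typing


@dataclasses.dataclass(frozen=True)
class ToyCallResult:
    """Hypothetical result wrapper; field names are assumptions."""
    value: typing.Any
    has_finished: bool
    iterations_done: int


class ToySqrtPrimitive:
    """Hypothetical iterative primitive using Newton's method."""

    tolerance = 1e-9

    def produce(self, *, inputs: float, iterations: int) -> ToyCallResult:
        estimate = max(inputs, 1.0)
        done = 0
        # Stop when converged or when the caller's iteration budget is spent.
        while done < iterations and abs(estimate * estimate - inputs) > self.tolerance:
            estimate = 0.5 * (estimate + inputs / estimate)
            done += 1
        finished = abs(estimate * estimate - inputs) <= self.tolerance
        return ToyCallResult(value=estimate, has_finished=finished, iterations_done=done)


primitive = ToySqrtPrimitive()
partial = primitive.produce(inputs=2.0, iterations=2)
complete = primitive.produce(inputs=2.0, iterations=20)
```

The caller can inspect ``partial.has_finished`` to see that two
iterations were not enough and decide whether to spend a larger budget.
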
When calling a primitive, to access the ``Hyperparams`` class you can do:

.. code:: python

    hyperparams_class = FooBarPrimitive.metadata.query()['class_type_arguments']['Hyperparams']

You can now create an instance of the class by directly providing values
for hyper-parameters, use available simple sampling, or just use default
values:

.. code:: python

    hp1 = hyperparams_class({'threshold': 0.01})
    hp2 = hyperparams_class.sample(random_state=42)
    hp3 = hyperparams_class.defaults

You can then pass those instances as the ``hyperparams`` argument to the
primitive's constructor.

The author of a primitive has to define what internal parameters the
primitive has, if any, by extending the ``Params`` class. It is just a
fancy dict, so you can both create an instance of it in the same way,
and access its values:

.. code:: python

    import numpy
    from d3m.metadata import params

    class Params(params.Params):
        coefficients: numpy.ndarray

    # numpy.array is a function, so it is called with parentheses.
    ps = Params({'coefficients': numpy.array([1, 2, 3])})
    ps['coefficients']

The ``Hyperparams`` class and the ``Params`` class have to be picklable
and copyable so that instances of primitives can be serialized and
restored as needed.

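One way to check this requirement for a set of values is to try a pickle
round trip and a deep copy. The helper ``survives_round_trip`` below is a
hypothetical name and only a sketch; it uses plain dicts with list values
so that equality comparison is unambiguous, and it is not part of the
interfaces.

```python
import copy
import pickle
import typing


def survives_round_trip(value: typing.Any) -> bool:
    """Return True if the value can be pickled, restored, and deep-copied."""
    try:
        restored = pickle.loads(pickle.dumps(value))
        duplicate = copy.deepcopy(value)
    except Exception:
        # Pickling failures surface as different exception types.
        return False
    return restored == value and duplicate == value


plain_values = {'coefficients': [1.0, 2.0, 3.0]}
# A lambda cannot be pickled, so values holding one fail the check.
values_with_lambda = {'callback': lambda x: x}

plain_ok = survives_round_trip(plain_values)
lambda_ok = survives_round_trip(values_with_lambda)
```

Values such as lambdas, open files, or locks are typical reasons for such
a check to fail, so they are best kept out of parameters and
hyper-parameters.
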
Primitives (and some other values) are uniquely identified by their ID
and version. The ID does not change between versions.

Primitives should not modify any input argument in place, but should
always first make a copy before any modification.

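The copy-before-modify rule can be sketched as follows.
``produce_incremented`` is a hypothetical function standing in for a
produce method, and nested lists stand in for a container type.

```python
import copy
import typing

Rows = typing.List[typing.List[float]]


def produce_incremented(*, inputs: Rows, amount: float) -> Rows:
    # Work on a deep copy so the caller's data is left untouched.
    outputs = copy.deepcopy(inputs)
    for row in outputs:
        for index in range(len(row)):
            row[index] += amount
    return outputs


data = [[1.0, 2.0], [3.0, 4.0]]
result = produce_incremented(inputs=data, amount=1.0)
```

After the call, ``data`` is unchanged and ``result`` holds the
incremented values, so other pipeline steps consuming ``data`` are not
affected.
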
Checklist for creating a new primitive
--------------------------------------

1. Implement as many interfaces as are applicable to your
   primitive. An up-to-date list of mixins you can implement can be
   found at
   <https://gitlab.com/datadrivendiscovery/d3m/blob/devel/d3m/primitive_interfaces/base.py>

2. Create unit tests to test all methods you implement

3. Include all relevant hyperparameters and use the appropriate
   ``Hyperparameter`` subclass for specifying the range of values a
   hyperparameter can take. Try to provide good default values where
   possible. Also include all relevant ``semantic_types``
   <https://metadata.datadrivendiscovery.org/types/>

4. Include ``metadata`` and ``__author__`` fields in your class
   definition. The ``__author__`` field should include a name or team
   as well as an email. The ``metadata`` object has many fields which
   should be filled in:

   * id, this is a UUID unique to this primitive. It can be generated with :code:`import uuid; uuid.uuid4()`
   * version
   * python_path, the path through which you want this primitive to be imported
   * keywords, keywords you want your primitive to be discovered by
   * installation, how to install the package which has this primitive. This is easiest if this is just a Python package on PyPI
   * algorithm_types, specify which PrimitiveAlgorithmType the algorithm is, a complete list can be found in TODO
   * primitive_family, specify the broad family a primitive falls under, a complete list can be found in TODO
   * hyperparameters_to_tune, specify which hyperparameters you would prefer a TA2 system tune

5. Make sure the primitive uses the correct container type

6. If the container type is a dataframe, specify which column is the
   target value, which columns are the input values, and which columns
   are the output values.

7. Create an example pipeline which includes this primitive and uses one of the seed datasets as input.

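The metadata fields from step 4 can be kept track of with a small
sketch like the one below. Everything in it is a placeholder: the dict
is a plain Python dict rather than a real metadata object, the shape of
the ``installation`` entry is an assumption, and the values for
``python_path``, ``algorithm_types`` and ``primitive_family`` are
deliberately left as placeholders to be replaced with real values.

```python
import uuid

# Generate an ID once and then hard-code it; generating it on every import
# would change the primitive's ID on every run.
fresh_id = str(uuid.uuid4())

CHECKLIST_FIELDS = (
    'id', 'version', 'python_path', 'keywords', 'installation',
    'algorithm_types', 'primitive_family', 'hyperparameters_to_tune',
)

# Hypothetical placeholder values, for illustration only.
toy_metadata = {
    'id': '00000000-0000-4000-8000-000000000000',
    'version': '0.1.0',
    'python_path': 'placeholder.python.path.ToyPrimitive',
    'keywords': ['toy', 'example'],
    'installation': [{'description': 'placeholder: how to install the package'}],
    'algorithm_types': ['PLACEHOLDER_ALGORITHM_TYPE'],
    'primitive_family': 'PLACEHOLDER_PRIMITIVE_FAMILY',
    'hyperparameters_to_tune': ['threshold'],
}


def missing_fields(metadata: dict) -> list:
    """Return checklist fields that are absent or empty."""
    return [name for name in CHECKLIST_FIELDS if not metadata.get(name)]


# uuid.UUID raises ValueError if the stored ID is not a well-formed UUID.
parsed_id = uuid.UUID(toy_metadata['id'])
still_missing = missing_fields(toy_metadata)
```

A check like ``missing_fields`` could be run in a unit test (step 2) so
that an incomplete metadata definition is caught early.
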
Examples
--------

Examples of simple primitives using these interfaces can be found `in
this
repository <https://gitlab.com/datadrivendiscovery/tests-data/tree/master/primitives>`__:

- `MonomialPrimitive <https://gitlab.com/datadrivendiscovery/tests-data/blob/master/primitives/test_primitives/monomial.py>`__
  is a simple regressor which shows how to use ``container.List``,
  define and use ``Params`` and ``Hyperparams``, and implement multiple
  methods needed by a supervised learner primitive
- `IncrementPrimitive <https://gitlab.com/datadrivendiscovery/tests-data/blob/master/primitives/test_primitives/increment.py>`__
  is a transformer and shows how to have ``container.ndarray`` as
  inputs and outputs, and how to set metadata for outputs
- `SumPrimitive <https://gitlab.com/datadrivendiscovery/tests-data/blob/master/primitives/test_primitives/sum.py>`__
  is a transformer as well, but it is just a wrapper around a Docker
  image. It shows how to define a Docker image in metadata and how to
  connect to a running Docker container. It also shows how
  inputs can be a union type of multiple other types
- `RandomPrimitive <https://gitlab.com/datadrivendiscovery/tests-data/blob/master/primitives/test_primitives/random.py>`__
  is a generator which shows how to use ``random_seed``, too.

