.. _submit:

==========================================
Learnware Preparation and Submission
==========================================

This section is a guide to submitting your own learnware to the ``Learnware Market``.
We first describe the components of a valid learnware, then explain how to upload learnwares to and remove them from the ``Learnware Market``.

Prepare Learnware
====================

A valid learnware is packaged as a zipfile comprising four essential components.
The following subsections describe each component of a learnware zipfile in detail.

``__init__.py``
---------------

Within the ``Learnware Market``, every uploader must provide a unified set of interfaces for their model
so that future users can use it easily.
The ``__init__.py`` file serves as the Python interface for your model's fitting, prediction, and fine-tuning processes.
For example, the code snippet below trains an SVM model on the sklearn digits classification dataset and saves it:

.. code-block:: python

    import joblib
    from sklearn import svm
    from sklearn.datasets import load_digits
    from sklearn.model_selection import train_test_split

    X, y = load_digits(return_X_y=True)
    data_X, _, data_y, _ = train_test_split(X, y, test_size=0.3, shuffle=True)

    # input dimension: (64, ), output dimension: (10, )
    clf = svm.SVC(kernel="linear", probability=True)
    clf.fit(data_X, data_y)

    joblib.dump(clf, "svm.pkl")  # model is stored as file "svm.pkl"

The corresponding ``__init__.py`` for this SVM model should then be structured as follows:

.. code-block:: python

    import os
    import joblib
    import numpy as np
    from learnware.model import BaseModel


    class SVM(BaseModel):
        def __init__(self):
            super(SVM, self).__init__(input_shape=(64,), output_shape=(10,))
            # load the saved model stored next to this file
            dir_path = os.path.dirname(os.path.abspath(__file__))
            self.model = joblib.load(os.path.join(dir_path, "svm.pkl"))

        def fit(self, X: np.ndarray, y: np.ndarray):
            pass

        def predict(self, X: np.ndarray) -> np.ndarray:
            return self.model.predict_proba(X)

        def finetune(self, X: np.ndarray, y: np.ndarray):
            pass

Please remember to specify the ``input_shape`` and ``output_shape`` corresponding to your model.
In our sklearn digits classification example, these are ``(64,)`` and ``(10,)`` respectively.

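
Before packaging, it can help to confirm that ``__init__.py`` defines the three interface methods described above. The following is a minimal sketch using only the Python standard library. The helper name ``missing_interface_methods`` is hypothetical and not part of the ``learnware`` package, and the check is purely static: it reads the source text and never imports or runs the model.

```python
import ast

# Methods the interface described above is expected to expose.
REQUIRED_METHODS = ("fit", "predict", "finetune")


def missing_interface_methods(source: str, class_name: str) -> list:
    """Return the required methods that `class_name` does not define in `source`.

    Hypothetical helper: a static check only, nothing is imported or executed.
    """
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef) and node.name == class_name:
            defined = {
                item.name for item in node.body if isinstance(item, ast.FunctionDef)
            }
            return [name for name in REQUIRED_METHODS if name not in defined]
    raise ValueError(f"class {class_name!r} not found")


# Illustrative source text with one method deliberately left out.
example_source = '''
class SVM:
    def fit(self, X, y):
        pass

    def predict(self, X):
        return X
'''

print(missing_interface_methods(example_source, "SVM"))  # ['finetune']
```

In practice you would pass the contents of your own ``__init__.py`` and the class name you intend to list in the configuration file.
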
``stat.json``
-------------

To match users accurately and effectively with appropriate learnwares for their tasks, we require information about your training dataset.
Specifically, you are required to provide a statistical specification
stored as a json file, such as ``stat.json``, which contains the statistical information of the dataset.
This json file meets all our requirements regarding your training data, so you don't need to upload the local original data.

There are various methods to generate a statistical specification.
If you choose to use Reduced Kernel Mean Embedding (RKME) as your statistical specification,
the following code snippet shows how to construct and store the RKME of a dataset:

.. code-block:: python

    from learnware.specification import generate_rkme_spec

    # generate rkme specification for digits dataset
    spec = generate_rkme_spec(X=data_X)
    spec.save("stat.json")

Notably, the RKME generation process runs entirely on your local machine, without any involvement of cloud services,
which protects the security and privacy of your local original data.

``learnware.yaml``
------------------

Additionally, you are asked to prepare a configuration file in YAML format.
The file should specify your model's class name, the type of statistical specification (e.g. ``RKMETableSpecification`` for Reduced Kernel Mean Embedding), and
the file name of your statistical specification file. Based on the example above, the following ``learnware.yaml`` shows
how your learnware configuration file should be structured:

.. code-block:: yaml

    model:
      class_name: SVM
      kwargs: {}
    stat_specifications:
      - module_path: learnware.specification
        class_name: RKMETableSpecification
        file_name: stat.json
        kwargs: {}

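
If you prepare several learnwares, the configuration above can be generated rather than typed by hand. The sketch below is only an illustration: ``render_learnware_yaml`` is a hypothetical helper, not part of the ``learnware`` package, and it assumes the configuration needs only the fields shown in the example above.

```python
def render_learnware_yaml(
    model_class: str,
    spec_class: str,
    spec_file: str,
    spec_module: str = "learnware.specification",
) -> str:
    """Render a learnware.yaml with the same layout as the example above.

    Hypothetical helper: plain string formatting, no YAML library required.
    """
    lines = [
        "model:",
        f"  class_name: {model_class}",
        "  kwargs: {}",
        "stat_specifications:",
        f"  - module_path: {spec_module}",
        f"    class_name: {spec_class}",
        f"    file_name: {spec_file}",
        "    kwargs: {}",
    ]
    return "\n".join(lines) + "\n"


# Reproduces the configuration for the SVM example.
print(render_learnware_yaml("SVM", "RKMETableSpecification", "stat.json"), end="")
```

The ``class_name`` under ``model`` should match the class defined in ``__init__.py``, and ``file_name`` should match the file you saved the statistical specification to.
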
``environment.yaml`` or ``requirements.txt``
--------------------------------------------

To allow others to execute your learnware, you need to specify your model's dependencies.
You can do this by providing either an ``environment.yaml`` file or a ``requirements.txt`` file.

- ``environment.yaml`` for conda:

    If you provide an ``environment.yaml``, a new conda environment will be created based on this file
    when users install your learnware. You can generate this yaml file using one of the following commands:

    - For Windows users:

        .. code-block::

            conda env export | findstr /v "^prefix: " > environment.yaml

    - For macOS and Linux users:

        .. code-block::

            conda env export | grep -v "^prefix: " > environment.yaml

- ``requirements.txt`` for pip:

    If you provide a ``requirements.txt``, the dependent packages will be installed using the ``-r`` option of pip.
    You can find more information about ``requirements.txt`` in the
    `pip documentation <https://pip.pypa.io/en/stable/user_guide/#requirements-files>`_.

We recommend using ``environment.yaml`` as it can help minimize conflicts between different packages.

.. note::
    Whether you choose to use ``environment.yaml`` or ``requirements.txt``,
    it's important to keep your dependencies as minimal as possible.
    This may involve manually opening the file and removing any unnecessary packages.

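
Both export commands above do the same thing: they drop the ``prefix:`` line, which records a path specific to your machine. A platform-independent sketch of that filtering step is shown below. The function name ``strip_prefix_line`` and the sample export text are illustrative assumptions, not output from a real environment.

```python
def strip_prefix_line(export_text: str) -> str:
    """Drop the machine-specific ``prefix:`` line from ``conda env export`` output."""
    kept = [
        line for line in export_text.splitlines() if not line.startswith("prefix: ")
    ]
    return "\n".join(kept) + "\n"


# Made-up export text, for illustration only.
sample_export = """name: demo
channels:
  - defaults
dependencies:
  - python=3.8
prefix: /home/user/miniconda3/envs/demo
"""

print(strip_prefix_line(sample_export), end="")
```

The same pass over the file is a convenient place to delete dependency lines your model does not actually need.
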
Upload Learnware
==================

After preparing the four required files described above,
you can bundle them into your own learnware zipfile. Together with the generated semantic specification, which
succinctly describes the features of your task and model (for more details, please refer to :ref:`semantic specification<components/spec:Semantic Specification>`),
you can upload your learnware to the ``Learnware Market`` with a single line of code:

.. code-block:: python

    import learnware
    from learnware.market import EasyMarket

    learnware.init()

    # EasyMarket: most basic set of functions in a Learnware Market
    easy_market = EasyMarket(market_id="demo", rebuild=True)

    # single line uploading
    easy_market.add_learnware(zip_path, semantic_spec)

Here, ``zip_path`` refers to the path of your learnware zipfile.

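
The bundling step itself can be done with any zip tool. As a minimal sketch using Python's standard ``zipfile`` module, the hypothetical helper below checks that the four components are present and then archives every file in a directory. It assumes the files sit at the top level of the zipfile and that the statistical specification is named ``stat.json``; adjust both to match your own layout and ``learnware.yaml``.

```python
import os
import zipfile

# One of these two dependency files is expected to be present.
DEPENDENCY_FILES = ("environment.yaml", "requirements.txt")


def build_learnware_zip(src_dir: str, zip_path: str, spec_file: str = "stat.json") -> list:
    """Zip every regular file in `src_dir` after checking the four components.

    Hypothetical helper, not part of the learnware package. Files are stored
    at the top level of the archive (an assumption about the expected layout).
    """
    names = sorted(os.listdir(src_dir))
    missing = [
        name for name in ("__init__.py", spec_file, "learnware.yaml") if name not in names
    ]
    if not any(name in names for name in DEPENDENCY_FILES):
        missing.append("environment.yaml or requirements.txt")
    if missing:
        raise FileNotFoundError("missing learnware components: " + ", ".join(missing))

    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for name in names:
            path = os.path.join(src_dir, name)
            if os.path.isfile(path):  # also picks up model files such as svm.pkl
                archive.write(path, arcname=name)
        return archive.namelist()
```

The returned list of archived names makes it easy to confirm that the saved model file (``svm.pkl`` in the example) was included alongside the four components.
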
Remove Learnware
==================

Administrators of the ``Learnware Market`` need to be able to remove learnwares that exhibit suspicious uploading motives.
Once you have the necessary permissions and approvals, you can use the following code to remove a learnware
from the ``Learnware Market``:

.. code-block:: python

    easy_market.delete_learnware(learnware_id)

Here, ``learnware_id`` refers to the ID that the market assigned to the learnware to be removed.