.. _submit:

==========================================
Learnware Preparation and Uploading
==========================================

In this section, we provide a comprehensive guide on submitting your custom learnware to the ``Learnware Market``.
We will first discuss the necessary components of a valid learnware, followed by a detailed explanation of how to upload and remove learnwares within the ``Learnware Market``.

Prepare Learnware
====================================

In the ``learnware`` package, each learnware is encapsulated in a ``zip`` package, which should contain at least the following four files:

- ``learnware.yaml``: learnware configuration file.
- ``__init__.py``: methods for using the model.
- ``stat.json``: the statistical specification of the learnware. Its filename can be customized and recorded in ``learnware.yaml``.
- ``environment.yaml`` or ``requirements.txt``: specifies the environment for the model.

To facilitate the construction of a learnware, we provide a `Learnware Template <https://www.bmwu.cloud/static/learnware-template.zip>`_ that you can use as a basis for building your own learnware.

Next, we will provide detailed explanations for the content of these four files.

Model Invocation File ``__init__.py``
-------------------------------------

To ensure that the uploaded learnware can be used by subsequent users, you need to provide interfaces for model fitting ``fit(X, y)``, prediction ``predict(X)``, and fine-tuning ``finetune(X, y)`` in ``__init__.py``. Among these interfaces, only the ``predict(X)`` interface is mandatory, while the others depend on the functionality of your model.

Below is a reference template for the ``__init__.py`` file. Please make sure that the input parameter format (the number of parameters and parameter names) for each interface in your model invocation file matches the template below.

.. code-block:: python

    import os
    import pickle
    import numpy as np
    from learnware.model import BaseModel


    class MyModel(BaseModel):
        def __init__(self):
            super(MyModel, self).__init__(input_shape=(37,), output_shape=(1,))
            # Locate model.pkl relative to this file, not the working directory
            dir_path = os.path.dirname(os.path.abspath(__file__))
            model_path = os.path.join(dir_path, "model.pkl")
            with open(model_path, "rb") as f:
                self.model = pickle.load(f)

        def fit(self, X: np.ndarray, y: np.ndarray):
            # Train the wrapped model on features X and labels y
            self.model.fit(X, y)

        def predict(self, X: np.ndarray) -> np.ndarray:
            return self.model.predict(X)

        def finetune(self, X: np.ndarray, y: np.ndarray):
            # Optional: leave empty if the model does not support fine-tuning
            pass

Please ensure that the ``MyModel`` class inherits from ``BaseModel`` in the ``learnware.model`` module, and specify the class name (e.g., ``MyModel``) in the ``learnware.yaml`` file later.

Input and Output Dimensions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``input_shape`` and ``output_shape`` represent the input and output dimensions of the model, respectively. You can refer to the following guidelines when filling them out:

- ``input_shape`` specifies the dimension of a single input sample, and ``output_shape`` specifies the dimension of the model's output for a single sample.
- When the data being processed is text, there are no specific requirements for the value of ``input_shape``, and it can be set to ``None``.
- For tasks with variable-size outputs (such as object detection or text segmentation), there are no specific requirements for the value of ``output_shape``, and it can be set to ``None``.
- For classification tasks, ``output_shape`` should be ``(1,)`` if the model directly outputs predicted labels, and the sample labels need to start from 0. If the model outputs logits, ``output_shape`` should be the number of classes, i.e., ``(class_num,)``.

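
To make the last guideline concrete, here is a minimal sketch in plain Python (independent of the ``learnware`` package) of how a logits output of shape ``(class_num,)`` relates to a label output of shape ``(1,)``. The helper name ``logits_to_label`` is hypothetical and used only for illustration.

```python
def logits_to_label(logits):
    """Return the 0-based index of the largest logit (the predicted class)."""
    return max(range(len(logits)), key=lambda i: logits[i])


# A model that returns these three values per sample would declare
# output_shape=(3,), i.e. (class_num,).
logits = [0.2, 1.7, -0.4]

# A model that returns only the predicted label per sample would declare
# output_shape=(1,); labels start from 0.
label = logits_to_label(logits)
```

Whichever form the model returns, the declared ``output_shape`` should describe the output for one sample, not for a batch.
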
File Path
^^^^^^^^^^^^^^^^^^

If you need to load certain files within the zip package in the ``__init__.py`` file (and any other Python files that may be involved), please follow the method shown in the template above for obtaining ``model_path``:

- First, obtain the root directory path of the entire package by getting ``dir_path``.
- Then, based on the specific file's relative location within the package, obtain the specific file's path, ``model_path``.

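
The two steps above can be wrapped in a small helper, sketched here with the standard library only. The function name ``resolve_package_file`` and the ``weights`` subdirectory are hypothetical; the point is that the path is built from the location of the Python file rather than from the current working directory.

```python
import os


def resolve_package_file(module_file, *relative_parts):
    """Build an absolute path to a file stored relative to ``module_file``."""
    # Step 1: directory that contains the calling Python file
    dir_path = os.path.dirname(os.path.abspath(module_file))
    # Step 2: append the file's relative location inside the package
    return os.path.join(dir_path, *relative_parts)


# Inside __init__.py you would pass __file__, for example:
#   model_path = resolve_package_file(__file__, "model.pkl")
#   weight_path = resolve_package_file(__file__, "weights", "model.pkl")
```
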
Module Imports
^^^^^^^^^^^^^^^^^^

Please note that module imports between Python files within the zip package should be done using **relative imports**. For instance:

.. code-block:: python

    from .package_name import *
    from .package_name import module_name

Learnware Statistical Specification ``stat.json``
---------------------------------------------------

A learnware consists of a model and a specification. Therefore, after preparing the model, you need to generate a statistical specification for it. Specifically, using the previously installed ``learnware`` package, you can pass the training data ``train_x`` (supported types include ``numpy.ndarray``, ``pandas.DataFrame``, and ``torch.Tensor``) as input to generate the statistical specification of the model.

Here is an example of the code:

.. code-block:: python

    from learnware.specification import generate_stat_spec

    data_type = "table"  # Data types: ["table", "image", "text"]
    spec = generate_stat_spec(type=data_type, X=train_x)
    spec.save("stat.json")

It's worth noting that the above code only runs on your local computer and does not interact with any cloud servers or leak any local private data.

Additionally, if the model's training data is too large, causing the above code to fail, you can consider sampling the training data down to a suitable size before generating the specification.

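
One way to do such sampling is to draw a fixed number of row indices without replacement and keep only those rows. The sketch below uses only the standard library; the helper name ``sample_row_indices`` and the chosen sizes are assumptions for illustration, not part of the ``learnware`` API.

```python
import random


def sample_row_indices(n_rows, max_rows, seed=0):
    """Pick at most ``max_rows`` distinct row indices, in ascending order."""
    if n_rows <= max_rows:
        # Small enough already: keep every row
        return list(range(n_rows))
    rng = random.Random(seed)  # fixed seed keeps the subsample reproducible
    return sorted(rng.sample(range(n_rows), max_rows))


# Example: keep 10,000 of 1,000,000 rows. With a NumPy array this would be
# followed by ``train_x = train_x[indices]`` before generating the specification.
indices = sample_row_indices(n_rows=1_000_000, max_rows=10_000)
```
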
Learnware Configuration File ``learnware.yaml``
-------------------------------------------------

This file is used to specify the class name (``MyModel``) in the model invocation file ``__init__.py``, the module called for generating the statistical specification (``learnware.specification``), the category of the statistical specification (``RKMETableSpecification``), and the specific filename (``stat.json``):

.. code-block:: yaml

    model:
      class_name: MyModel
      kwargs: {}
    stat_specifications:
      - module_path: learnware.specification
        class_name: RKMETableSpecification
        file_name: stat.json
        kwargs: {}

Please note that the statistical specification class names for the data types ``['table', 'image', 'text']`` are ``[RKMETableSpecification, RKMEImageSpecification, RKMETextSpecification]``, respectively.

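
If you generate this file from a script, the data-type-to-class mapping above can be kept in one place. The following is a minimal sketch that renders the same YAML layout as plain text; the names ``SPEC_CLASS_BY_DATA_TYPE`` and ``render_learnware_yaml`` are hypothetical helpers, not part of the ``learnware`` package.

```python
# Mapping taken from the note above: data type -> specification class name
SPEC_CLASS_BY_DATA_TYPE = {
    "table": "RKMETableSpecification",
    "image": "RKMEImageSpecification",
    "text": "RKMETextSpecification",
}


def render_learnware_yaml(model_class, data_type, stat_file="stat.json"):
    """Return the text of a learnware.yaml for the given model and data type."""
    spec_class = SPEC_CLASS_BY_DATA_TYPE[data_type]
    return (
        "model:\n"
        f"  class_name: {model_class}\n"
        "  kwargs: {}\n"
        "stat_specifications:\n"
        "  - module_path: learnware.specification\n"
        f"    class_name: {spec_class}\n"
        f"    file_name: {stat_file}\n"
        "    kwargs: {}\n"
    )


# Example: configuration text for an image model
yaml_text = render_learnware_yaml("MyModel", "image")
```

Writing ``yaml_text`` to ``learnware.yaml`` with ``encoding="utf-8"`` gives a file in the format shown above.
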
Model Runtime Dependent File
--------------------------------------------

To ensure that your uploaded learnware can be used by other users, the ``zip`` package of the uploaded learnware should specify the model's runtime dependencies. The Beimingwu System supports the following two ways to specify runtime dependencies:

- Provide an ``environment.yaml`` file supported by ``conda``.
- Provide a ``requirements.txt`` file supported by ``pip``.

You can choose either method, but please remove unnecessary dependencies to keep the dependency list as minimal as possible.

Using ``environment.yaml`` File
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

You can export the ``environment.yaml`` file directly from the ``conda`` virtual environment using the following command:

- For Linux and macOS systems:

.. code-block:: bash

    conda env export | grep -v "^prefix: " > environment.yaml

- For Windows systems:

.. code-block:: bash

    conda env export | findstr /v "^prefix: " > environment.yaml

Note that the ``environment.yaml`` file in the ``zip`` package needs to be encoded in ``UTF-8``. Please check the encoding of the ``environment.yaml`` file after running the above command. Depending on the ``conda`` version and the operating system, the exported file may not be ``UTF-8`` encoded (for example, it may be ``UTF-16LE``). In that case, you need to convert the file to ``UTF-8`` manually, which most text editors support. The following ``Python`` code for encoding conversion is provided for reference:

.. code-block:: python

    import codecs

    # Read the output file from the 'conda env export' command
    # Assuming the file name is environment.yaml and the export format is UTF-16LE
    with codecs.open('environment.yaml', 'r', encoding='utf-16le') as file:
        content = file.read()

    # Convert the content to UTF-8 encoding
    output_content = content.encode('utf-8')

    # Write to UTF-8 encoded file
    with open('environment.yaml', 'wb') as file:
        file.write(output_content)

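
Before converting, it can help to check whether the exported file already looks like ``UTF-8``. The sketch below is a heuristic using only the standard library, and the function name ``looks_like_utf8_text`` is hypothetical. It rejects files that fail to decode as ``UTF-8`` and also files that contain NUL bytes, since ``UTF-16`` text made of ASCII characters is full of NUL bytes yet can still decode as ``UTF-8`` when it has no byte order mark.

```python
def looks_like_utf8_text(path):
    """Heuristically check that the file at ``path`` is UTF-8 encoded text."""
    with open(path, "rb") as f:
        data = f.read()
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        # Invalid byte sequence, e.g. a UTF-16 byte order mark
        return False
    # NUL bytes are valid UTF-8 but do not occur in normal YAML text;
    # they usually indicate a UTF-16 file without a byte order mark.
    return b"\x00" not in data


# Example: looks_like_utf8_text("environment.yaml")
```
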
Additionally, because local ``conda`` virtual environments can be complex, you can execute the following command before uploading to confirm that there are no dependency conflicts in the ``environment.yaml`` file:

.. code-block:: bash

    conda env create --name test_env --file environment.yaml

The above command creates a virtual environment based on the ``environment.yaml`` file. If it succeeds, there are no dependency conflicts. You can delete the created virtual environment using the following command:

.. code-block:: bash

    conda env remove --name test_env

Using ``requirements.txt`` File
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``requirements.txt`` file should list the packages required for running the ``__init__.py`` file and their specific versions. You can obtain these version details by executing the ``pip show <package_name>`` or ``conda list <package_name>`` command. Here is an example file:

.. code-block:: text

    numpy==1.23.5
    scikit-learn==1.2.2

Manually listing these dependencies can be cumbersome, so you can also use the ``pipreqs`` package to automatically scan your entire project and export the packages used along with their specific versions (though some manual verification may be required):

.. code-block:: bash

    pip install pipreqs
    pipreqs ./  # Run this command in the project's root directory

Please note that if you use the ``requirements.txt`` file to specify runtime dependencies, the system will by default install these dependencies in a ``conda`` virtual environment running ``Python 3.8`` during the learnware deployment.

Furthermore, for version-sensitive packages like ``torch``, it's essential to specify package versions in the ``requirements.txt`` file to ensure successful deployment of the uploaded learnware on other machines.

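
As a quick sanity check before uploading, you could scan ``requirements.txt`` for entries that do not pin an exact version. The following is a minimal sketch under simple assumptions: one requirement per line, ``#`` starts a comment, and a pinned entry contains ``==``. The function name ``unpinned_requirements`` is hypothetical.

```python
def unpinned_requirements(lines):
    """Return the requirement lines that do not pin a version with '=='."""
    unpinned = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue  # skip blank and comment-only lines
        if "==" not in line:
            unpinned.append(line)
    return unpinned


example = ["numpy==1.23.5", "scikit-learn==1.2.2", "torch", "# a comment"]
missing_pins = unpinned_requirements(example)  # only "torch" lacks a pin here
```

To check a real file, pass the result of ``open("requirements.txt", encoding="utf-8").read().splitlines()`` to the function.
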
Upload Learnware
==================================

After preparing the four required files mentioned above, you can bundle them into your own learnware ``zip`` package.

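
The bundling step can be done with any archive tool. As one possibility, here is a minimal sketch using Python's standard ``zipfile`` module. The function name ``build_learnware_zip`` is hypothetical, and the sketch assumes the files should sit at the top level of the archive.

```python
import os
import zipfile


def build_learnware_zip(source_dir, zip_path, file_names):
    """Write the listed files from ``source_dir`` into the root of a zip archive."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for name in file_names:
            # arcname keeps only the file name, so no parent folders are stored
            zf.write(os.path.join(source_dir, name), arcname=name)
    return zip_path


# Example (paths are placeholders):
#   build_learnware_zip(
#       "my_learnware",
#       "my_learnware.zip",
#       ["learnware.yaml", "__init__.py", "stat.json", "requirements.txt"],
#   )
```

Any additional files that ``__init__.py`` loads at runtime, such as the ``model.pkl`` used in the template, need to be added to the list as well.
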
Prepare Semantic Specification
--------------------------------

The semantic specification succinctly describes the features of your task and model. To upload a learnware ``zip`` package, you need to prepare the semantic specification. Here is an example for "Table" data and a "Classification" task:

.. code-block:: python

    from learnware.specification import generate_semantic_spec

    # Prepare input description when data_type="Table"
    input_description = {
        "Dimension": 5,
        "Description": {
            "0": "age",
            "1": "weight",
            "2": "body length",
            "3": "animal type",
            "4": "claw length"
        },
    }

    # Prepare output description when task_type in ["Classification", "Regression"]
    output_description = {
        "Dimension": 3,
        "Description": {
            "0": "cat",
            "1": "dog",
            "2": "bird",
        },
    }

    # Create semantic specification
    semantic_spec = generate_semantic_spec(
        name="learnware_example",
        description="Just an example for uploading learnware",
        data_type="Table",
        task_type="Classification",
        library_type="Scikit-learn",
        scenarios=["Business", "Financial"],
        input_description=input_description,
        output_description=output_description,
    )

For more details, please refer to :ref:`semantic specification<components/spec:Semantic Specification>`.

Uploading
--------------

You can upload your learnware to the ``Learnware Market`` as follows:

.. code-block:: python

    from learnware.market import BaseChecker
    from learnware.market import instantiate_learnware_market

    # instantiate a demo market
    demo_market = instantiate_learnware_market(market_id="demo", name="hetero", rebuild=True)

    # upload the learnware into the market
    learnware_id, learnware_status = demo_market.add_learnware(zip_path, semantic_spec)

    # assert whether the learnware passed the check and was uploaded successfully.
    assert learnware_status != BaseChecker.INVALID_LEARNWARE, "Insert learnware failed!"

Here, ``zip_path`` refers to the path of your learnware ``zip`` package, ``learnware_id`` is the ID assigned by the ``Learnware Market``, and ``learnware_status`` is the check status of the learnware.

.. note::
    The learnware ``zip`` package uploaded into ``LearnwareMarket`` will be checked semantically and statistically, and ``add_learnware`` will return the concrete check status. The check status ``BaseChecker.INVALID_LEARNWARE`` indicates that the learnware did not pass the check. For more details about the learnware checker, please refer to `Learnware Market <../components/market.html#easy-checker>`_.

Remove Learnware
==================

Administrators of the ``Learnware Market`` need to remove learnwares that exhibit suspicious uploading motives.
Once you have the necessary permissions and approvals, you can use the following code to remove a learnware
from the ``Learnware Market``:

.. code-block:: python

    # easy_market is an existing market instance
    easy_market.delete_learnware(learnware_id)

Here, ``learnware_id`` refers to the market ID of the learnware to be removed.