.. _submit:

==========================================
Learnware Preparation and Uploading
==========================================

In this section, we provide a comprehensive guide to submitting your custom learnware to the ``Learnware Market``.
We first describe the components required for a valid learnware, and then explain in detail how to upload and remove learnwares in the ``Learnware Market``.


Prepare Learnware
====================================

In the ``learnware`` package, each learnware is encapsulated in a ``zip`` package, which should contain at least the following four files:

- ``learnware.yaml``: learnware configuration file.
- ``__init__.py``: methods for using the model.
- ``stat.json``: the statistical specification of the learnware. Its filename can be customized and is recorded in ``learnware.yaml``.
- ``environment.yaml`` or ``requirements.txt``: specifies the environment for the model.
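
Before zipping, it can help to confirm that nothing is missing. The following is a minimal sketch using Python's standard ``zipfile`` module; ``missing_learnware_files`` is a hypothetical helper (not part of the ``learnware`` package), and it assumes that the files sit at the root of the archive and that the statistical specification keeps the default name ``stat.json``.

.. code-block:: python

    import zipfile

    # Files that every learnware zip package is expected to contain (per the list above)
    REQUIRED_FILES = ("learnware.yaml", "__init__.py")
    # At least one of these must be present to describe the runtime environment
    ENV_FILES = ("environment.yaml", "requirements.txt")


    def missing_learnware_files(zip_path, stat_file="stat.json"):
        """Return the required files that are absent from the root of the zip package."""
        with zipfile.ZipFile(zip_path) as zf:
            names = set(zf.namelist())
        missing = [name for name in (*REQUIRED_FILES, stat_file) if name not in names]
        if not any(env in names for env in ENV_FILES):
            missing.append(" or ".join(ENV_FILES))
        return missing

For example, ``missing_learnware_files("my_learnware.zip")`` returns an empty list when all required files are present. If you renamed the specification file, pass its name through ``stat_file``.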

To facilitate the construction of a learnware, we provide a `Learnware Template <https://www.bmwu.cloud/static/learnware-template.zip>`_ that you can use as a basis for building your own learnware.

Next, we explain the content of each of these four files in detail.

Model Invocation File ``__init__.py``
-------------------------------------

To ensure that the uploaded learnware can be used by subsequent users, you need to provide the interfaces for model fitting ``fit(X, y)``, prediction ``predict(X)``, and fine-tuning ``finetune(X, y)`` in ``__init__.py``. Of these interfaces, only ``predict(X)`` is mandatory; whether to implement the others depends on the functionality of your model.

Below is a reference template for the ``__init__.py`` file. Please make sure that the input parameter format (the number of parameters and parameter names) for each interface in your model invocation file matches the template below.

.. code-block:: python

    import os
    import pickle
    import numpy as np
    from learnware.model import BaseModel

    class MyModel(BaseModel):
        def __init__(self):
            super(MyModel, self).__init__(input_shape=(37,), output_shape=(1,))
            dir_path = os.path.dirname(os.path.abspath(__file__))
            model_path = os.path.join(dir_path, "model.pkl")
            with open(model_path, "rb") as f:
                self.model = pickle.load(f)

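        def fit(self, X: np.ndarray, y: np.ndarray):
            # Pass the labels as well; scikit-learn style estimators return themselves from fit()
            self.model = self.model.fit(X, y)

        def predict(self, X: np.ndarray) -> np.ndarray:
            return self.model.predict(X)

        def finetune(self, X: np.ndarray, y: np.ndarray):
            # Optional: implement fine-tuning on user data if your model supports it
            pass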


Please ensure that the ``MyModel`` class inherits from ``BaseModel`` in the ``learnware.model`` module, and specify the class name (e.g., ``MyModel``) in the ``learnware.yaml`` file later. 
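
Because the interface names and parameter names must match the template, a quick self-check with the standard ``inspect`` module can catch mismatches before uploading. This is a minimal sketch; ``check_interfaces`` and ``EXPECTED_PARAMS`` are hypothetical names, and the check only compares parameter names (not type annotations) against the template above.

.. code-block:: python

    import inspect

    # Parameter names expected by the template above (excluding ``self``)
    EXPECTED_PARAMS = {
        "fit": ["X", "y"],
        "predict": ["X"],
        "finetune": ["X", "y"],
    }


    def check_interfaces(model_cls):
        """Return a list of problems found in the interface signatures of ``model_cls``."""
        problems = []
        for method_name, expected in EXPECTED_PARAMS.items():
            method = getattr(model_cls, method_name, None)
            if method is None:
                # Only ``predict`` is mandatory; the other interfaces are optional
                if method_name == "predict":
                    problems.append("missing mandatory method 'predict'")
                continue
            params = [p for p in inspect.signature(method).parameters if p != "self"]
            if params != expected:
                problems.append(f"{method_name}{tuple(params)} should take {tuple(expected)}")
        return problems

Calling ``check_interfaces(MyModel)`` on the template class returns an empty list; a non-empty list describes each mismatch.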

Input and Output Dimensions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``input_shape`` and ``output_shape`` represent the input and output dimensions of the model, respectively. Use the following guidelines when filling them in:

- ``input_shape`` is the dimension of a single input sample, and ``output_shape`` is the dimension of the model's output for a single sample.
- For text data, there is no specific requirement on ``input_shape``, and it can be set to ``None``.
- For tasks with variable-length outputs (such as object detection or text segmentation), there is no specific requirement on ``output_shape``, and it can be set to ``None``.
- For classification tasks, if the model directly outputs predicted labels, ``output_shape`` should be ``(1,)`` and the class labels should be numbered starting from 0. If the model outputs logits, ``output_shape`` should be the number of classes, i.e., ``(class_num,)``.
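
If your raw labels are strings or do not start from 0, you can remap them before training. The sketch below uses a hypothetical helper, ``encode_labels``, which assigns consecutive integers in sorted label order (so the labels must be mutually comparable); scikit-learn's ``LabelEncoder`` provides the same mapping if you already depend on it.

.. code-block:: python

    def encode_labels(labels):
        """Map arbitrary class labels to consecutive integers starting from 0."""
        classes = sorted(set(labels))
        mapping = {label: index for index, label in enumerate(classes)}
        encoded = [mapping[label] for label in labels]
        return encoded, classes


    encoded, classes = encode_labels(["dog", "cat", "bird", "cat"])
    # A model that outputs one logit per class would then use this output_shape
    logits_output_shape = (len(classes),)

Keep ``classes`` so that predicted indices can be mapped back to the original labels.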

File Path
^^^^^^^^^^^^^^^^^^
If the ``__init__.py`` file (or any other Python file in the package) needs to load files from the ``zip`` package, obtain their paths in the same way as ``model_path`` in the template above:

- First, obtain the root directory of the package, ``dir_path``.
- Then, join ``dir_path`` with the file's relative location within the package to obtain the file's path, e.g., ``model_path``.

Module Imports
^^^^^^^^^^^^^^^^^^
Please note that module imports between Python files within the ``zip`` package should be done using **relative imports**. For instance:

.. code-block:: python

    from .package_name import *
    from .package_name import module_name


Learnware Statistical Specification ``stat.json``
---------------------------------------------------

A learnware consists of a model and a specification. Therefore, after preparing the model, you need to generate a statistical specification for it. Specifically, with the ``learnware`` package installed, you can pass the training data ``train_x`` (supported types include ``numpy.ndarray``, ``pandas.DataFrame``, and ``torch.Tensor``) as input to generate the statistical specification of the model.

Here is an example of the code:

.. code-block:: python

    from learnware.specification import generate_stat_spec

    data_type = "table" # Data types: ["table", "image", "text"]
    spec = generate_stat_spec(type=data_type, X=train_x)
    spec.save("stat.json")

Note that the code above runs entirely on your local machine: it does not interact with any cloud server or upload any of your local private data.

Additionally, if the training data is too large for the code above to run successfully, consider subsampling it to a suitable size before generating the specification.
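
A minimal sketch of such subsampling, assuming ``train_x`` is a ``numpy.ndarray`` or ``pandas.DataFrame``. The helper ``subsample_rows`` and its default limit of 10,000 rows are hypothetical choices rather than values required by the ``learnware`` package, so adjust the limit to what your machine can handle.

.. code-block:: python

    import numpy as np


    def subsample_rows(train_x, max_samples=10000, seed=0):
        """Randomly keep at most ``max_samples`` rows of ``train_x`` (without replacement)."""
        n_samples = len(train_x)
        if n_samples <= max_samples:
            return train_x
        rng = np.random.default_rng(seed)
        indices = np.sort(rng.choice(n_samples, size=max_samples, replace=False))
        # pandas DataFrames need ``.iloc`` for positional row selection
        if hasattr(train_x, "iloc"):
            return train_x.iloc[indices]
        return train_x[indices]

The result can then be passed as ``X`` to ``generate_stat_spec``, e.g., ``generate_stat_spec(type="table", X=subsample_rows(train_x))``. Sampling uniformly without replacement keeps the subsample representative of the original data in expectation, which matters because the statistical specification summarizes that distribution.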


Learnware Configuration File ``learnware.yaml``
-------------------------------------------------

This file specifies the class name (``MyModel``) in the model invocation file ``__init__.py``, the module that provides the statistical specification class (``learnware.specification``), the class of the statistical specification (``RKMETableSpecification``), and the filename of the specification (``stat.json``):

.. code-block:: yaml

    model:
      class_name: MyModel
      kwargs: {}
    stat_specifications:
      - module_path: learnware.specification
        class_name: RKMETableSpecification
        file_name: stat.json
        kwargs: {}

Please note that the statistical specification class names for the data types ``table``, ``image``, and ``text`` are ``RKMETableSpecification``, ``RKMEImageSpecification``, and ``RKMETextSpecification``, respectively. The ``kwargs`` fields are reserved and can be left as ``{}``.
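
If you build several learnwares, you may prefer to generate ``learnware.yaml`` from a script. The sketch below only reproduces the layout shown above with Python string formatting (so PyYAML is not required); ``write_learnware_yaml`` and ``SPEC_CLASS`` are hypothetical names, and the mapping follows the class names listed in the previous paragraph.

.. code-block:: python

    # Statistical specification class for each data type, as listed above
    SPEC_CLASS = {
        "table": "RKMETableSpecification",
        "image": "RKMEImageSpecification",
        "text": "RKMETextSpecification",
    }

    # Doubled braces render as a literal "{}" after str.format()
    TEMPLATE = """\
    model:
      class_name: {model_class}
      kwargs: {{}}
    stat_specifications:
      - module_path: learnware.specification
        class_name: {spec_class}
        file_name: {stat_file}
        kwargs: {{}}
    """


    def write_learnware_yaml(path, data_type, model_class="MyModel", stat_file="stat.json"):
        """Write a learnware.yaml that follows the layout shown above and return its text."""
        content = TEMPLATE.format(
            model_class=model_class, spec_class=SPEC_CLASS[data_type], stat_file=stat_file
        )
        with open(path, "w", encoding="utf-8") as f:
            f.write(content)
        return content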

Model Runtime Dependent File
--------------------------------------------

To ensure that your uploaded learnware can be used by other users, its ``zip`` package must specify the model's runtime dependencies. The Beimingwu System supports the following two ways to specify runtime dependencies:

- Provide an ``environment.yaml`` file for ``conda``.
- Provide a ``requirements.txt`` file for ``pip``.

You can choose either method, but please try to remove unnecessary dependencies to keep the dependency list as minimal as possible.

Using ``environment.yaml`` File
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

You can export the ``environment.yaml`` file directly from the ``conda`` virtual environment using the following command:

- For Linux and macOS systems:

.. code-block:: bash
    
    conda env export | grep -v "^prefix: " > environment.yaml

- For Windows systems:

.. code-block:: bash
    
    conda env export | findstr /v "^prefix: " > environment.yaml

Note that the ``environment.yaml`` file in the ``zip`` package must be encoded in ``UTF-8``. Please check the encoding of the ``environment.yaml`` file after running the command above: depending on the ``conda`` version and operating system, the exported file may not be ``UTF-8`` encoded (e.g., it may be ``UTF-16LE`` encoded). In that case, convert the file to ``UTF-8`` manually, which most text editors support, or use the following ``Python`` code for reference:

.. code-block:: python

    # Read the output file of the 'conda env export' command.
    # Assumes the file is named environment.yaml and is encoded in UTF-16LE.
    with open("environment.yaml", "r", encoding="utf-16le") as file:
        content = file.read()

    # Drop the byte-order mark (BOM), if present, so it is not carried into the UTF-8 file
    content = content.lstrip("\ufeff")

    # Rewrite the same file in UTF-8 encoding
    with open("environment.yaml", "w", encoding="utf-8") as file:
        file.write(content)
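
To check the encoding before converting, you can inspect the raw bytes. The following is a heuristic sketch, not a full encoding detector: the hypothetical ``detect_yaml_encoding`` helper reports ``utf-16`` when it finds a UTF-16 byte-order mark (BOM) or NUL bytes (which UTF-16 uses for ASCII characters but a YAML text file normally does not contain), ``utf-8-sig`` for UTF-8 with a BOM, and ``utf-8`` when the bytes decode as UTF-8.

.. code-block:: python

    import codecs


    def detect_yaml_encoding(path):
        """Roughly classify the encoding of a text file from its raw bytes."""
        with open(path, "rb") as f:
            raw = f.read()
        # UTF-16 text without a BOM still contains NUL bytes for ASCII characters
        if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)) or b"\x00" in raw:
            return "utf-16"
        if raw.startswith(codecs.BOM_UTF8):
            return "utf-8-sig"
        try:
            raw.decode("utf-8")
            return "utf-8"
        except UnicodeDecodeError:
            return "unknown"

If it reports ``utf-16``, convert the file with the code above before packaging it.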


Additionally, because users' local ``conda`` virtual environments can be complex, you can run the following command before uploading to confirm that the ``environment.yaml`` file has no dependency conflicts:

.. code-block:: bash
    
    conda env create --name test_env --file environment.yaml

The above command creates a virtual environment from the ``environment.yaml`` file; if it succeeds, the file has no dependency conflicts. You can then delete the created virtual environment with the following command:

.. code-block:: bash

    conda env remove --name test_env

Using ``requirements.txt`` File
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``requirements.txt`` file should list the packages required for running the ``__init__.py`` file and their specific versions. You can obtain these version details by executing the ``pip show <package_name>`` or ``conda list <package_name>`` command. Here is an example file:

.. code-block:: text

    numpy==1.23.5
    scikit-learn==1.2.2
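
Instead of running ``pip show`` for each package, you can also read the installed versions from Python. Below is a minimal sketch using the standard ``importlib.metadata`` module (available since Python 3.8); ``pin_requirements`` is a hypothetical helper, and it expects distribution names as used by ``pip`` (e.g., ``scikit-learn`` rather than the import name ``sklearn``).

.. code-block:: python

    from importlib import metadata


    def pin_requirements(package_names):
        """Build ``name==version`` lines for installed packages and collect the missing ones."""
        pinned, missing = [], []
        for name in package_names:
            try:
                pinned.append(f"{name}=={metadata.version(name)}")
            except metadata.PackageNotFoundError:
                missing.append(name)
        return pinned, missing

For example, ``pinned, missing = pin_requirements(["numpy", "scikit-learn"])`` yields lines in the same ``name==version`` form as the example file (with whatever versions are installed locally), which you can write to ``requirements.txt`` one per line.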

Manually listing these dependencies can be cumbersome, so you can also use the ``pipreqs`` package to automatically scan your entire project and export the packages used along with their specific versions (though some manual verification may be required):

.. code-block:: bash

    pip install pipreqs
    pipreqs ./  # Run this command in the project's root directory

Please note that if you use the ``requirements.txt`` file to specify runtime dependencies, the system will by default install these dependencies in a ``conda`` virtual environment running ``Python 3.8`` during the learnware deployment.

Furthermore, for version-sensitive packages such as ``torch``, it is essential to specify exact versions in the ``requirements.txt`` file so that the uploaded learnware can be deployed successfully on other machines.
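
To catch unpinned packages before uploading, a simple line-based check is often enough. The sketch below uses a hypothetical helper, ``unpinned_requirements``, which treats a line as pinned only if it uses an exact ``==`` version; it is not a full parser of the ``pip`` requirements format, so option lines such as ``-r other.txt`` are also reported.

.. code-block:: python

    import re

    # A requirement counts as pinned here only if it uses an exact "==" version
    PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]*\])?\s*==\s*\S+")


    def unpinned_requirements(path="requirements.txt"):
        """Return the requirement lines in ``path`` that do not pin an exact version."""
        unpinned = []
        with open(path, encoding="utf-8") as f:
            for line in f:
                # Ignore comments and blank lines
                requirement = line.split("#", 1)[0].strip()
                if requirement and not PINNED.match(requirement):
                    unpinned.append(requirement)
        return unpinned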

Upload Learnware
==================================

After preparing the four required files mentioned above, you can bundle them into your own learnware ``zip`` package.
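
One way to create the package is with Python's standard ``zipfile`` module. The sketch below assumes that the required files sit at the root of a local folder (for example, an unpacked copy of the learnware template) and that the archive should mirror that folder's layout; ``pack_learnware`` is a hypothetical helper, and ``zip_path`` should point outside ``src_dir`` so that the archive does not include itself.

.. code-block:: python

    import os
    import zipfile


    def pack_learnware(src_dir, zip_path):
        """Zip every file under ``src_dir``, storing paths relative to ``src_dir``."""
        with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
            for root, _dirs, files in os.walk(src_dir):
                for file_name in files:
                    full_path = os.path.join(root, file_name)
                    # e.g. "<src_dir>/learnware.yaml" is stored as "learnware.yaml"
                    zf.write(full_path, arcname=os.path.relpath(full_path, src_dir))
        return zip_path

The resulting ``zip_path`` is what you pass to ``add_learnware`` when uploading.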

Prepare Semantic Specification
--------------------------------

The semantic specification succinctly describes the features of your task and model. Before uploading a learnware ``zip`` package, you need to prepare its semantic specification. Here is an example for a ``Classification`` task on ``Table`` data:

.. code-block:: python

    from learnware.specification import generate_semantic_spec

    # Prepare input description when data_type="Table"
    input_description = {
        "Dimension": 5,
        "Description": {
            "0": "age",
            "1": "weight",
            "2": "body length",
            "3": "animal type",
            "4": "claw length"
        },
    }

    # Prepare output description when task_type in ["Classification", "Regression"]
    output_description = {
        "Dimension": 3,
        "Description": {
            "0": "cat",
            "1": "dog",
            "2": "bird",
        },
    }

    # Create semantic specification
    semantic_spec = generate_semantic_spec(
        name="learnware_example",
        description="Just an example for uploading learnware",
        data_type="Table",
        task_type="Classification",
        library_type="Scikit-learn",
        scenarios=["Business", "Financial"],
        license="MIT",
        input_description=input_description,
        output_description=output_description,
    )
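
In the example above, ``Dimension`` equals the number of entries in ``Description``, whose keys are numbered from ``"0"``. Building both from an ordered list of names keeps them consistent. A minimal sketch with the hypothetical helper ``make_description``, reproducing the two dictionaries from the example:

.. code-block:: python

    def make_description(names):
        """Build a {"Dimension": ..., "Description": ...} dict from an ordered list of names."""
        return {
            "Dimension": len(names),
            "Description": {str(index): name for index, name in enumerate(names)},
        }


    input_description = make_description(["age", "weight", "body length", "animal type", "claw length"])
    output_description = make_description(["cat", "dog", "bird"])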

For more details, please refer to :ref:`semantic specification<components/spec:Semantic Specification>`.

Uploading
--------------

You can upload your learnware to the ``Learnware Market`` as follows:

.. code-block:: python

    from learnware.market import BaseChecker
    from learnware.market import instantiate_learnware_market

    # Instantiate a demo market
    demo_market = instantiate_learnware_market(market_id="demo", name="hetero", rebuild=True)

    # Upload the learnware into the market; zip_path is the path to your learnware zip package,
    # and semantic_spec is the semantic specification prepared above
    learnware_id, learnware_status = demo_market.add_learnware(zip_path, semantic_spec)

    # Verify that the learnware passed the check and was uploaded successfully
    assert learnware_status != BaseChecker.INVALID_LEARNWARE, "Insert learnware failed!"

Here, ``zip_path`` is the file path of your learnware ``zip`` package, ``learnware_id`` is the ID assigned to the learnware by the ``Learnware Market``, and ``learnware_status`` is the learnware's check status.

.. note::
    The learnware ``zip`` package uploaded to the ``Learnware Market`` is checked semantically and statistically, and ``add_learnware`` returns the resulting check status. The status ``BaseChecker.INVALID_LEARNWARE`` indicates that the learnware did not pass the check. For more details about the learnware checker, please refer to `Learnware Market <../components/market.html#easy-checker>`_.

Remove Learnware
==================

Administrators of the ``Learnware Market`` should remove learnwares that appear to have been uploaded with suspicious motives.
Once you have the necessary permissions and approvals, you can remove a learnware from the ``Learnware Market`` with the following code:

.. code-block:: python

    demo_market.delete_learnware(learnware_id)

Here, ``learnware_id`` is a string that refers to the market ID of the learnware to be removed.