.. _submit:

==========================================
Learnware Preparation and Submission
==========================================

In this section, we provide a comprehensive guide to submitting your custom learnware to the ``Learnware Market``.
We first discuss the components of a valid learnware, then explain how to upload and remove learnwares within the ``Learnware Market``.


Prepare Learnware
====================

A valid learnware is packaged as a zipfile containing four essential components.
Each component is described in detail below.

``__init__.py``
---------------

Within the ``Learnware Market``, every uploader must provide a unified set of interfaces for their model
so that future users can use it easily.
The ``__init__.py`` file serves as the Python interface for your model's fitting, prediction, and fine-tuning processes.
For example, the code snippet below trains and saves an SVM model on the sklearn digits classification dataset:

.. code-block:: python

    import joblib
    from sklearn import svm
    from sklearn.datasets import load_digits
    from sklearn.model_selection import train_test_split

    X, y = load_digits(return_X_y=True)
    data_X, _, data_y, _ = train_test_split(X, y, test_size=0.3, shuffle=True)

    # input dimension: (64,), output dimension: (10,)
    clf = svm.SVC(kernel="linear", probability=True)
    clf.fit(data_X, data_y)

    joblib.dump(clf, "svm.pkl")  # the model is stored in the file "svm.pkl"


Then the corresponding ``__init__.py`` for this SVM model should be structured as follows:

.. code-block:: python
    
    import os
    import joblib
    import numpy as np
    from learnware.model import BaseModel


    class SVM(BaseModel):
        def __init__(self):
            super(SVM, self).__init__(input_shape=(64,), output_shape=(10,))
            dir_path = os.path.dirname(os.path.abspath(__file__))
            self.model = joblib.load(os.path.join(dir_path, "svm.pkl"))

        def fit(self, X: np.ndarray, y: np.ndarray):
            pass

        def predict(self, X: np.ndarray) -> np.ndarray:
            return self.model.predict_proba(X)

        def finetune(self, X: np.ndarray, y: np.ndarray):
            pass
    
Please remember to specify the ``input_shape`` and ``output_shape`` that correspond to your model.
In our sklearn digits classification example, these are ``(64,)`` and ``(10,)`` respectively.
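
Before packaging, it can help to confirm that your model class actually defines the three methods used in the ``__init__.py`` example above.
The following is a minimal sketch using only the standard library. ``check_interface`` and ``IncompleteModel`` are hypothetical names introduced for illustration and are not part of the ``learnware`` package, and the expected argument counts are an assumption taken from the example signatures:

.. code-block:: python

    import inspect

    # Method names come from the __init__.py example; the argument counts
    # (excluding self) are an assumption based on that example.
    REQUIRED_METHODS = {"fit": 2, "predict": 1, "finetune": 2}


    def check_interface(cls):
        """Return a list of problems found; an empty list means none."""
        problems = []
        for name, n_args in REQUIRED_METHODS.items():
            method = getattr(cls, name, None)
            if not callable(method):
                problems.append(f"missing method: {name}")
                continue
            # Drop the leading ``self`` parameter before counting.
            params = list(inspect.signature(method).parameters)[1:]
            if len(params) != n_args:
                problems.append(
                    f"{name} takes {len(params)} argument(s), expected {n_args}"
                )
        return problems


    class IncompleteModel:
        """Hypothetical stand-in that forgot to define ``finetune``."""

        def fit(self, X, y):
            pass

        def predict(self, X):
            return X


    print(check_interface(IncompleteModel))  # ['missing method: finetune']

This only inspects method names and argument counts. It does not verify that ``predict`` returns an array matching ``output_shape``.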


``stat.json``
-------------

To match users with appropriate learnwares for their tasks accurately and efficiently, we require information about your training dataset.
Specifically, you must provide a statistical specification
stored as a JSON file, such as ``stat.json``, which contains the statistical information of the dataset.
This JSON file is all we require about your training data, so you do not need to upload the original data.

There are various methods to generate a statistical specification.
If you choose to use Reduced Kernel Mean Embedding (RKME) as your statistical specification, 
the following code snippet offers guidance on how to construct and store the RKME of a dataset:

.. code-block:: python
    
    from learnware.specification import generate_rkme_spec
    
    # generate rkme specification for digits dataset
    spec = generate_rkme_spec(X=data_X)
    spec.save("stat.json")

Notably, the RKME generation process runs entirely on your local machine, without any involvement of cloud services,
which protects the security and privacy of your original data.
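
For readers unfamiliar with RKME, the underlying idea can be summarized as follows.
This is a sketch of the usual formulation rather than a description of what ``generate_rkme_spec`` does internally. The symbols :math:`k`, :math:`\mathcal{H}`, :math:`z_j`, :math:`\beta_j`, and :math:`m` are notation introduced here for illustration.
Given training inputs :math:`x_1, \dots, x_n` and a kernel :math:`k` with reproducing kernel Hilbert space :math:`\mathcal{H}`, RKME approximates the empirical kernel mean embedding with a small weighted set of :math:`m \ll n` points:

.. math::

    \min_{\beta, z} \left\| \frac{1}{n} \sum_{i=1}^{n} k(x_i, \cdot) - \sum_{j=1}^{m} \beta_j \, k(z_j, \cdot) \right\|_{\mathcal{H}}^{2}

Under this formulation, the specification only needs to carry the reduced points :math:`z_j` and weights :math:`\beta_j`, not the original samples :math:`x_i`.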


``learnware.yaml``
------------------

Additionally, you need to prepare a configuration file in YAML format.
The file should specify your model's class name, the type of statistical specification (e.g., ``RKMETableSpecification`` for Reduced Kernel Mean Embedding), and
the file name of your statistical specification file. Based on the examples above, the following ``learnware.yaml`` shows
how your learnware configuration file should be structured:

.. code-block:: yaml

    model:
      class_name: SVM
      kwargs: {}
    stat_specifications:
      - module_path: learnware.specification
        class_name: RKMETableSpecification
        file_name: stat.json
        kwargs: {}  


``environment.yaml`` or ``requirements.txt``
--------------------------------------------

In order to allow others to execute your learnware, it's necessary to specify your model's dependencies. 
You can do this by providing either an ``environment.yaml`` file or a ``requirements.txt`` file.


- ``environment.yaml`` for conda:

  If you provide an ``environment.yaml``, a new conda environment will be created based on this file
  when users install your learnware. You can generate this YAML file using the following command:

  - For Windows users:

    .. code-block:: bat

        conda env export | findstr /v "^prefix: " > environment.yaml

  - For macOS and Linux users:

    .. code-block:: bash

        conda env export | grep -v "^prefix: " > environment.yaml

- ``requirements.txt`` for pip:

  If you provide a ``requirements.txt``, the dependent packages will be installed using the ``-r`` option of pip.
  You can find more information about ``requirements.txt`` in the
  `pip documentation <https://pip.pypa.io/en/stable/user_guide/#requirements-files>`_.
    
        
We recommend using ``environment.yaml`` as it can help minimize conflicts between different packages.

.. note::
    Whether you choose to use ``environment.yaml`` or ``requirements.txt``, 
    it's important to keep your dependencies as minimal as possible. 
    This may involve manually opening the file and removing any unnecessary packages.
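
If you use ``requirements.txt``, the manual trimming described in the note can also be scripted.
The sketch below keeps only the requirement lines whose package name appears in a list you supply. ``prune_requirements`` is a hypothetical helper written for illustration, and the package names and versions are made-up sample data. It handles simple ``name==version`` style lines only and does not apply pip's name normalization:

.. code-block:: python

    import re


    def prune_requirements(lines, keep):
        """Keep only requirement lines whose package name is in ``keep``."""
        keep = {name.lower() for name in keep}
        kept = []
        for line in lines:
            stripped = line.strip()
            if not stripped or stripped.startswith("#"):
                continue  # drop blank lines and comments
            # The package name is everything before the first version
            # specifier, extras bracket, environment marker, or whitespace.
            name = re.split(r"[<>=!~\[;\s]", stripped, maxsplit=1)[0].lower()
            if name in keep:
                kept.append(stripped)
        return kept


    # Illustrative sample data, e.g. the output of ``pip freeze``.
    frozen = [
        "joblib==1.3.2",
        "numpy>=1.24",
        "pytest==7.4.0",
        "scikit-learn==1.3.0",
        "# dev tools",
        "black",
    ]
    print(prune_requirements(frozen, keep={"joblib", "numpy", "scikit-learn"}))
    # ['joblib==1.3.2', 'numpy>=1.24', 'scikit-learn==1.3.0']

Review the result by hand before writing it back to ``requirements.txt``, since a package your model imports indirectly may still be required.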


Upload Learnware 
==================

After preparing the four required files described above,
you can bundle them into your own learnware zipfile. Together with a semantic specification that
succinctly describes the features of your task and model (for more details, please refer to :ref:`semantic specification<components/spec:Semantic Specification>`),
you can upload your learnware to the ``Learnware Market`` with a single call:

.. code-block:: python

    import learnware
    from learnware.market import EasyMarket

    learnware.init()
    
    # EasyMarket: most basic set of functions in a Learnware Market
    easy_market = EasyMarket(market_id="demo", rebuild=True) 
    
    # single line uploading
    easy_market.add_learnware(zip_path, semantic_spec) 

Here, ``zip_path`` refers to the path of your learnware zipfile.
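
One way to produce such a zipfile is with Python's standard ``zipfile`` module.
The sketch below is illustrative only. ``build_learnware_zip`` is a hypothetical helper, the archive name is made up, and placing every file at the top level of the archive is an assumption.
The file list follows the SVM example, including ``svm.pkl``, which ``__init__.py`` loads from its own directory:

.. code-block:: python

    import os
    import tempfile
    import zipfile


    def build_learnware_zip(src_dir, zip_path, file_names):
        """Bundle the named files from ``src_dir`` into a zipfile."""
        missing = [
            name for name in file_names
            if not os.path.isfile(os.path.join(src_dir, name))
        ]
        if missing:
            raise FileNotFoundError(f"missing learnware files: {missing}")
        with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
            for name in file_names:
                # arcname keeps each file at the top level of the archive
                # (assumed layout).
                zf.write(os.path.join(src_dir, name), arcname=name)
        return zip_path


    FILES = ["__init__.py", "svm.pkl", "stat.json", "learnware.yaml", "environment.yaml"]

    # Demonstration with placeholder files in a temporary directory;
    # in practice, point src_dir at the folder holding your real files.
    with tempfile.TemporaryDirectory() as tmp:
        for name in FILES:
            with open(os.path.join(tmp, name), "w") as f:
                f.write("placeholder\n")
        zip_path = build_learnware_zip(tmp, os.path.join(tmp, "svm_learnware.zip"), FILES)
        with zipfile.ZipFile(zip_path) as zf:
            archived = sorted(zf.namelist())

    print(archived)

The helper raises ``FileNotFoundError`` before writing anything if a listed file is absent, so an incomplete learnware is caught before upload.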


Remove Learnware
==================

Administrators of the ``Learnware Market`` may need to remove learnwares that appear to have been uploaded with suspicious motives.
If you have the necessary permissions and approvals, you can use the following code to remove a learnware
from the ``Learnware Market``:

.. code-block:: python

    easy_market.delete_learnware(learnware_id)

Here, ``learnware_id`` refers to the ID of the learnware in the market that is to be removed.