[DOC] modify details in docs

tags/v0.3.2
liuht 2 years ago
parent
commit
0b28701494
3 changed files with 72 additions and 71 deletions
  1. docs/components/learnware.rst (+23, -22)
  2. docs/components/market.rst (+35, -35)
  3. docs/components/spec.rst (+14, -14)

docs/components/learnware.rst (+23, -22)

@@ -4,31 +4,32 @@
Learnware & Reuser
==========================================

``Learnware`` is the most basic concept in the ``learnware paradigm``. In this section, we will introduce the concept and design of ``Learnware`` and its extension for ``Hetero Reuse``. Then we will introduce the ``Reuse Methods``, which applies one or several ``Learnware``\ s to solve the user's task.
``Learnware`` is the most basic concept in the ``learnware paradigm``. This section will introduce the concept and design of ``Learnware`` and its extension for ``Hetero Reuse``. Then, we will introduce the ``Reuse Methods``, which apply one or several ``Learnware``\ s to solve the user's task.

Concepts
===================
In the learnware paradigm, a learnware is a well-performed trained machine learning model with a specification which enables it to be adequately identified to reuse according to the requirement of future users who know nothing about the learnware in advance. The introduction of specifications are shown in `COMPONENTS: Specification <./spec.html>`_.
In the learnware paradigm, a learnware is a well-performing trained machine learning model with a specification that enables it to be adequately identified for reuse according to the requirements of future users who know nothing about the learnware in advance. Specifications are introduced in `COMPONENTS: Specification <./spec.html>`_.

In our implementation, the class ``Learnware`` has 3 important member variables:
In our implementation, the class ``Learnware`` has three important member variables:

- ``id``: The learnware id is generated by ``market``.
- ``model``: The model in the learnware, can be a ``BaseModel`` or a dict including model name and path. When it is a dict, the function ``Learnware.instantiate_model`` is used to transform it to a ``BaseModel``. The function ``Learnware.predict`` use the model to predict for an input ``X``. See more in `COMPONENTS: Model <./model.html>`_.
- ``specification``: The specification including the semantic specification and the statistic specification.
- ``model``: The model in the learnware, which can be a ``BaseModel`` or a dict containing the model name and path. When it is a dict, the function ``Learnware.instantiate_model`` is used to transform it to a ``BaseModel``. The function ``Learnware.predict`` uses the model to predict for an input ``X``. See more in `COMPONENTS: Model <./model.html>`_.
- ``specification``: The specification includes the semantic specification and the statistical specification.

Learnware for Hetero Reuse
=======================================================================

In the Hetero Market(see `COMPONENTS: Hetero Market <./market.html#hetero-market>`_ for details), ``HeteroSearcher`` identifies and recommends helpful learnwares among all learnwares in the market,
including learnwares with feature/label spaces different from the user's task requirements(heterogeneous learnwares). ``FeatureAlignLearnware`` and ``HeteroMapLearnware``
are designed to enable the reuse of heterogeneous learnwares, which extends ``Learnware`` with the ability to align the feature space and label space of the learnware to the user's task requirements,
and provide basic interfaces for heterogeneous learnwares to be applied to tasks beyond their original purposes.
In the Hetero Market (refer to `COMPONENTS: Hetero Market <./market.html#hetero-market>`_ for more details), ``HeteroSearcher`` identifies and recommends valuable learnwares from the entire market. This includes learnwares with different feature/label spaces compared to the user's task requirements, known as "heterogeneous learnwares."

To enable the reuse of these heterogeneous learnwares, we have developed ``FeatureAlignLearnware`` and ``HeteroMapAlignLearnware``.
These components expand the capabilities of standard ``Learnware`` by aligning the feature and label spaces to match the user's task requirements.
They also provide essential interfaces for effectively applying heterogeneous learnwares to tasks beyond their original purposes.

``FeatureAlignLearnware``
---------------------------

``FeatureAlignLearnware`` employs a neural network to align the feature space of the learnware to the user's task.
It is initialized with a ``Learnware``, and has the following methods to expand the applicable scope of this ``Learnware``:
It is initialized with a ``Learnware`` and has the following methods to expand the applicable scope of this ``Learnware``:

- **align**: Trains a neural network to align ``user_rkme``, which is the ``RKMETableSpecification`` of the user's data, with the learnware's statistical specification.
- **predict**: Predict the output for user data using the trained neural network and the original learnware's model.
@@ -39,7 +40,7 @@ It is initialized with a ``Learnware``, and has the following methods to expand

If user data is not only heterogeneous in feature space but also in label space, ``HeteroMapAlignLearnware`` uses the help of
a small amount of labeled data ``(x_train, y_train)`` required from the user task to align heterogeneous learnwares with the user task.
There are two key interfaces in ``HeteroMapAlignLearnware``:
There are two critical interfaces in ``HeteroMapAlignLearnware``:

- ``HeteroMapAlignLearnware.align(self, user_rkme: RKMETableSpecification, x_train: np.ndarray, y_train: np.ndarray)``

@@ -48,7 +49,7 @@ There are two key interfaces in ``HeteroMapAlignLearnware``:

- ``HeteroMapAlignLearnware.predict(self, user_data)``

- If input space and output space alignment are both performed, use the ``FeatureAugmentReuser`` to predict the output for ``user_data``.
- If input space and output space alignment are performed, use the ``FeatureAugmentReuser`` to predict the output for ``user_data``.


All Reuse Methods
@@ -56,7 +57,7 @@ All Reuse Methods

In addition to applying ``Learnware``, ``FeatureAlignLearnware`` or ``HeteroMapAlignLearnware`` objects directly by calling their ``predict`` interface,
the ``learnware`` package also provides a set of ``Reuse Methods`` for users to further customize a single or multiple learnwares, with the hope of enabling learnwares to be
helpful beyond their original purposes, and eliminating the need for users to build models from scratch.
helpful beyond their original purposes and eliminating the need for users to build models from scratch.

There are two main categories of ``Reuse Methods``: (1) direct reuse and (2) reuse based on a small amount of labeled data.

@@ -87,7 +88,7 @@ The most important methods of ``JobSelectorReuser`` are ``job_selector`` and ``p
- Estimate the mixture weight based on user raw data and the statistical specifications of learnwares in ``learnware_list``
- Use the mixture weight to generate ``herding_num`` auxiliary data points which mimic the user task's distribution through the kernel herding method
- Finally learns the ``job selector`` on the auxiliary data points.
- Finally, it learns the ``job selector`` on the auxiliary data points.
- **predict**: The ``job selector`` is essentially a multi-class classifier :math:`g(\boldsymbol{x}):\mathcal{X}\rightarrow \mathcal{I}` with :math:`\mathcal{I}=\{1,\ldots, C\}`, where :math:`C` is the size of ``learnware_list``. Given a testing sample :math:`\boldsymbol{x}`, the ``JobSelectorReuser`` predicts it by using the :math:`g(\boldsymbol{x})`-th learnware in ``learnware_list``.
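
The routing described in **predict** can be illustrated with a minimal sketch. It does not use the ``learnware`` package: ``predict_with_job_selector``, the toy models, and the toy selector are all hypothetical stand-ins, and indices are zero-based here rather than :math:`1,\ldots,C`.

```python
from typing import Callable, List, Sequence

Sample = Sequence[float]
Model = Callable[[Sample], float]


def predict_with_job_selector(
    job_selector: Callable[[Sample], int],
    models: List[Model],
    samples: List[Sample],
) -> List[float]:
    """Route each sample to the model whose index the selector returns."""
    predictions = []
    for x in samples:
        index = job_selector(x)  # plays the role of g(x), zero-based here
        predictions.append(models[index](x))
    return predictions


# Two toy "learnwares" standing in for the entries of learnware_list.
models = [lambda x: 2.0 * x[0], lambda x: -x[0]]
# Toy selector: model 0 for non-negative inputs, model 1 otherwise.
selector = lambda x: 0 if x[0] >= 0 else 1

outputs = predict_with_job_selector(selector, models, [[1.0], [-3.0]])
```

In the actual reuser the selector is learned on the auxiliary data points. Here it is hard-coded only to keep the example short.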

@@ -95,7 +96,7 @@ The most important methods of ``JobSelectorReuser`` are ``job_selector`` and ``p
AveragingReuser
^^^^^^^^^^^^^^^^^^

``AveragingReuser`` uses an ensemble method to make predictions. It is initialized with a list of ``Learnware`` objects, and has a member variable ``mode`` which
``AveragingReuser`` uses an ensemble method to make predictions. It is initialized with a list of ``Learnware`` objects and has a member variable ``mode`` which
specifies the ensemble method (default is set to ``mean``).

- **predict**: The member variable ``mode`` provides different options for classification and regression tasks:
@@ -108,28 +109,28 @@ Reuse Learnware with Labeled Data
----------------------------------

When users have a small amount of labeled data available, the ``learnware`` package provides two methods: ``EnsemblePruningReuser`` and ``FeatureAugmentReuser`` to help reuse learnwares.
They are both initialized with a list of ``Learnware`` objects ``learnware_list``, and have different implementations of ``fit`` and ``predict`` methods.
They are both initialized with a list of ``Learnware`` objects ``learnware_list`` and have different implementations of ``fit`` and ``predict`` methods.

EnsemblePruningReuser
^^^^^^^^^^^^^^^^^^^^^^

The ``EnsemblePruningReuser`` class implements a selective ensemble approach inspired by the MDEP algorithm, as detailed in [1]_.
It selects a subset of learnwares from ``learnware_list``, utilizing user's labeled data for effective ensemble integration on user tasks.
The ``EnsemblePruningReuser`` class implements a selective ensemble approach inspired by the MDEP algorithm [1]_.
It selects a subset of learnwares from ``learnware_list``, utilizing the user's labeled data for effective ensemble integration on user tasks.
This method effectively balances validation error, margin ratio, and ensemble size, leading to a robust and optimized selection of learnwares for task-specific ensemble creation.

- **fit**: Effectively prunes the large set of learnwares ``learnware_list`` by evaluating and comparing the learnwares based on their performance on user's labeled validation data ``(val_X, val_y)``. Returns the most suitable subset of learnwares.
- **predict**: The ``mode`` member variable has two available options. Set ``mode`` to ``regression`` for regression tasks, and ``classification`` for classification tasks. The prediction is the average of the selected learnwares' outputs.
- **predict**: The ``mode`` member variable has two available options. Set ``mode`` to ``regression`` for regression tasks and ``classification`` for classification tasks. The prediction is the average of the selected learnwares' outputs.


FeatureAugmentReuser
^^^^^^^^^^^^^^^^^^^^^^

``FeatureAugmentReuser`` helps users reuse learnwares by augmenting features. In this method,
outputs of the learnwares from ``learnware_list`` on user's validation data ``val_X`` are taken as augmented features and are concatenated with original features ``val_X``.
The augmented data(concatenated features combined with validation labels ``val_y``) are then used to train a simple model ``augment_reuser`` which gives the final prediction
outputs of the learnwares from ``learnware_list`` on the user's validation data ``val_X`` are taken as augmented features and are concatenated with original features ``val_X``.
The augmented data (concatenated features combined with validation labels ``val_y``) are then used to train a simple model ``augment_reuser``, which gives the final prediction
on ``user_data``.

- **fit**: Trains the ``augment_reuser`` using augmented user validation data. For classification tasks, ``mode`` should be set to ``classification``, and ``augment_reuser`` is a ``LogisticRegression`` model. For regression tasks, mode should be set to ``classification``, and ``augment_reuser`` is a ``RidgeCV`` model.
- **fit**: Trains the ``augment_reuser`` using augmented user validation data. For classification tasks, ``mode`` should be set to ``classification``, and ``augment_reuser`` is a ``LogisticRegression`` model. For regression tasks, ``mode`` should be set to ``regression``, and ``augment_reuser`` is a ``RidgeCV`` model.
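
The feature augmentation step alone can be sketched as follows. This is an illustration under stated assumptions, not the package's implementation: ``augment_features`` and the toy models are hypothetical, and each model is assumed to return a single number per row.

```python
from typing import Callable, List, Sequence

Row = List[float]
Model = Callable[[Sequence[float]], float]


def augment_features(models: List[Model], X: List[Row]) -> List[Row]:
    """Append every model's output on a row to that row's original features."""
    return [list(row) + [model(row) for model in models] for row in X]


# Hypothetical stand-ins for the learnwares in learnware_list.
models = [lambda row: sum(row), lambda row: max(row)]
val_X = [[1.0, 2.0], [3.0, 0.5]]

augmented_X = augment_features(models, val_X)
```

The augmented rows, paired with ``val_y``, would then be passed to the simple model described above.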


References


docs/components/market.rst (+35, -35)

@@ -6,49 +6,49 @@ Learnware Market

The ``Learnware Market`` receives high-performance machine learning models from developers, incorporates them into the system, and provides services to users by identifying and reusing learnware to help users solve current tasks. Developers voluntarily submit various learnwares to the learnware market, and the market conducts quality checks and further organization of these learnwares. When users submit task requirements, the learnware market automatically selects whether to recommend a single learnware or a combination of multiple learnwares.

The ``Learnware Market`` will receive various kinds of learnwares, and learnwares from different feature/label spaces form numerous islands of specifications. All these islands together constitute the ``specification world`` in the learnware market. The market should discover and establish connections between different islands, and then merge them into a unified specification world. This further organization of learnwares support search learnwares among all learnwares, not just among learnwares which has the same feature space and label space with the user's task requirements.
The ``Learnware Market`` will receive various kinds of learnwares, and learnwares from different feature/label spaces form numerous islands of specifications. All these islands constitute the ``specification world`` in the learnware market. The market should discover and establish connections between different islands and merge them into a unified specification world. This further organization of learnwares supports searching across all learnwares, not just among learnwares that have the same feature space and label space as the user's task requirements.

Framework
======================================

The ``Learnware Market`` consists of an ``organizer``, a ``searcher``, and a list of ``checker``\ s.

The ``organizer`` can store and organize learnwares in the market. It supports ``add``, ``delete``, and ``update`` operations for learnwares. It also provides the interface for ``searcher`` to search learnwares based on user requirement.
The ``organizer`` can store and organize learnwares in the market. It supports ``add``, ``delete``, and ``update`` operations for learnwares. It also provides the interface for the ``searcher`` to search learnwares based on user requirements.

The ``searcher`` can search learnwares based on user requirement. The implementation of ``searcher`` is dependent on the concrete implementation and interface for ``organizer``, where usually an ``organizer`` can be compatible with multiple different ``searcher``\ s.
The ``searcher`` can search learnwares based on user requirements. The implementation of ``searcher`` depends on the concrete implementation and interface for ``organizer``, where usually an ``organizer`` can be compatible with multiple different ``searcher``\ s.

The ``checker`` is used for checking the learnware in some standards. It should check the utility of a learnware and is supposed to return the status and a message related to the learnware's check result. Only the learnwares who passed the ``checker`` could be able to be stored and added into the ``Learnware Market``.
The ``checker`` checks a learnware against certain standards. It should check the utility of a learnware and return the status and a message related to the learnware's check result. Only learnwares that pass the ``checker`` can be stored and added to the ``Learnware Market``.



Current Checkers
======================================

The ``learnware`` package provide two different implementation of ``market`` where both of them share the same ``checker`` list. So we first introduce the details of ``checker``\ s.
The ``learnware`` package provides two different implementations of ``Learnware Market`` where both share the same ``checker`` list. So we first introduce the details of ``checker``\ s.

The ``checker``s check a learnware object in different aspects, including environment configuration (``CondaChecker``), semantic specifications (``EasySemanticChecker``), and statistical specifications (``EasyStatChecker``). The ``__call__`` method of each checker is designed to be invoked as a function to conduct the respective checks on the learnware and return the outcomes. It defines three types of learnwares: ``INVALID_LEARNWARE`` denotes the learnware does not pass the check, ``NONUSABLE_LEARNWARE`` denotes the learnware pass the check but cannot make prediction, ``USABLE_LEARNWARE`` denotes the leanrware pass the check and can make prediction. Currently, we have three ``checker``\ s, which are described below.
The ``checker``\ s check a learnware object in different aspects, including environment configuration (``CondaChecker``), semantic specifications (``EasySemanticChecker``), and statistical specifications (``EasyStatChecker``). Each checker's ``__call__`` method is designed to be invoked as a function to conduct the respective checks on the learnware and return the outcomes. It defines three types of learnwares: ``INVALID_LEARNWARE`` denotes that the learnware does not pass the check, ``NONUSABLE_LEARNWARE`` denotes that the learnware passes the check but cannot make predictions, and ``USABLE_LEARNWARE`` denotes that the learnware passes the check and can make predictions. Currently, we have three ``checker``\ s, which are described below.


``CondaChecker``
------------------
This ``checker`` checks a the environment of the learnware object. It creates a ``LearnwaresContainer`` instance to handle the Learnware and uses ``inner_checker`` to check the Learnware. If an exception occurs, it logs the error and returns ``NONUSABLE_LEARNWARE`` status and error message.
This ``checker`` checks the environment of the learnware object. It creates a ``LearnwaresContainer`` instance to handle the Learnware and uses ``inner_checker`` to check the Learnware. If an exception occurs, it logs the error and returns the ``NONUSABLE_LEARNWARE`` status and error message.


``EasySemanticChecker``
-------------------------
This ``checker`` checks the semantic specification of a learnware object. It checks if the given semantic specification conforms to predefined standards. It verifies each key in predefined dictionary. If the check fails, it logs the error and returns ``NONUSABLE_LEARNWARE`` status and error message.
This ``checker`` checks the semantic specification of a learnware object. It checks if the given semantic specification conforms to predefined standards. It verifies each key in a predefined dictionary. If the check fails, it logs the error and returns the ``NONUSABLE_LEARNWARE`` status and error message.


``EasyStatChecker``
---------------------

This ``checker`` checks the statistical specification and functionality of a learnware object. It performs multiple checks to validate the learnware. It checks for model instantiation, verifies input shape and statistical specifications, and test output shape using random generated data. In case of any exceptions, it logs the error and returns ``NONUSABLE_LEARNWARE`` status and error message.
This ``checker`` checks the statistical specification and functionality of a learnware object. It performs multiple checks to validate the learnware. It checks for model instantiation, verifies input shape and statistical specifications, and tests output shape using randomly generated data. In case of exceptions, it logs the error and returns the ``NONUSABLE_LEARNWARE`` status and error message.


Current Markets
======================================

The ``learnware`` package provide two different implementation of ``market``, i.e. ``Easy Market`` and ``Hetero Market``. They have different implementation of ``organizer`` and ``searcher``.
The ``learnware`` package provides two different implementations of ``market``, i.e., ``Easy Market`` and ``Hetero Market``. They have different implementations of ``organizer`` and ``searcher``.

Easy Market
-------------
@@ -61,9 +61,9 @@ Easy market is a basic realization of the learnware market. It consists of ``Eas

``EasyOrganizer`` mainly has the following methods to store learnwares, which is an easy way to organize learnwares.

- **reload_market**: Reload the learnware market when server restarted, and return a flag indicating whether the market is reloaded successfully.
- **add_learnware**: Add a learnware with ``learnware_id``, ``semantic_spec`` and model files in ``zip_path`` into the market. Return the ``learnware_id`` and ``learnwere_status``. The ``learnwere_status`` is set ``check_status`` if it is provided, else ``checker`` will be called to generate the ``learnwere_status``.
- **delete_learnware**: Delete the learnware with ``id`` from the market, return a flag of whether the deletion is successfully.
- **reload_market**: Reload the learnware market when the server restarts and return a flag indicating whether the market is reloaded successfully.
- **add_learnware**: Add a learnware with ``learnware_id``, ``semantic_spec`` and model files in ``zip_path`` into the market. Return the ``learnware_id`` and ``learnwere_status``. The ``learnwere_status`` is set to ``check_status`` if it is provided. Otherwise, the ``checker`` will be called to generate the ``learnwere_status``.
- **delete_learnware**: Delete the learnware with ``id`` from the market and return a flag indicating whether the deletion is successful.
- **update_learnware**: Update the learnware's ``zip_path``, ``semantic_spec``, ``check_status``. If None, the corresponding item is not updated. Return a flag indicating whether it passed the ``checker``.
- **get_learnwares**: Similar to **get_learnware_ids**, but returns a list of learnwares instead of ids.
- **reload_learnware**: Reload all the attributes of the learnware with ``learnware_id``.
@@ -74,14 +74,14 @@ Easy market is a basic realization of the learnware market. It consists of ``Eas
``EasySearcher`` consists of ``EasyFuzzSemanticSearcher`` and ``EasyStatSearcher``. ``EasyFuzzSemanticSearcher`` is a kind of ``Semantic Specification Searcher``, while ``EasyStatSearcher`` is a kind of ``Statistical Specification Searcher``. All these searchers return helpful learnwares based on ``BaseUserInfo`` provided by users.

``BaseUserInfo`` is a ``Python API`` for users to provide enough information to identify helpful learnwares.
When initializing ``BaseUserInfo``, three optional information can be provided: ``id``, ``semantic_spec`` and ``stat_info``. The introductions of these specifications is shown in `COMPONENTS: Specification <./spec.html>`_.
When initializing ``BaseUserInfo``, three optional pieces of information can be provided: ``id``, ``semantic_spec`` and ``stat_info``. These specifications are introduced in `COMPONENTS: Specification <./spec.html>`_.


The semantic specification search and statistical specification search have been integrated into the same interface ``EasySearcher``.

- **EasySearcher.__call__(self, user_info: BaseUserInfo, check_status: int = None, max_search_num: int = 5, search_method: str = "greedy",) -> SearchResults**

- It conducts the semantic searcher ``EasyFuzzsematicSearcher`` on all the learnwares from the ``organizer`` with the same ``check_status`` (All learnwares if ``check_status`` is None). If the result is not empty and the ``stat_info`` is provided in ``user_info``, then it conducts ``EasyStatSearcher``, and return the ``SearchResults``.
- It conducts the semantic searcher ``EasyFuzzSemanticSearcher`` on all the learnwares from the ``organizer`` with the same ``check_status`` (all learnwares if ``check_status`` is None). If the result is not empty and the ``stat_info`` is provided in ``user_info``, it conducts ``EasyStatSearcher`` and returns the ``SearchResults``.
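
The two-stage control flow can be sketched in plain Python. Everything below is a hypothetical stand-in (``two_stage_search``, the dict-based catalog, and both toy stages); it only mirrors the order of the stages and the condition under which the statistical stage runs.

```python
from typing import Any, Callable, Dict, List, Optional

Item = Dict[str, Any]


def two_stage_search(
    items: List[Item],
    semantic_filter: Callable[[List[Item]], List[Item]],
    stat_ranker: Callable[[List[Item], Any], List[Item]],
    stat_info: Optional[Any] = None,
) -> List[Item]:
    """Run the semantic stage, then the statistical stage only when it applies."""
    matched = semantic_filter(items)
    # The second stage needs a non-empty first-stage result and user statistics.
    if matched and stat_info is not None:
        return stat_ranker(matched, stat_info)
    return matched


catalog = [
    {"name": "a", "task": "Classification", "center": 0.9},
    {"name": "b", "task": "Regression", "center": 0.1},
    {"name": "c", "task": "Classification", "center": 0.2},
]
only_classification = lambda items: [i for i in items if i["task"] == "Classification"]
closest_first = lambda items, target: sorted(items, key=lambda i: abs(i["center"] - target))

semantic_only = two_stage_search(catalog, only_classification, closest_first)
both_stages = two_stage_search(catalog, only_classification, closest_first, stat_info=0.0)
```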


``Semantic Specification Searcher``
@@ -89,28 +89,28 @@ The semantic specification search and statistical specification search have been

``Semantic Specification Searcher`` is the first-stage search based on ``user_semantic``, identifying potentially helpful learnwares whose models solve tasks similar to your requirements. There are two types of Semantic Specification Search: ``EasyExactSemanticSearcher`` and ``EasyFuzzSemanticSearcher``.

In these two searchers, each learnware in the ``learnware_list`` is compared with ``user_info`` according to their ``semantic_spec``, and added to the search result if mathched. Two semantic_spec are matched when all the key words are matched or empty in ``user_info``. Different keys have different matching rules. Their ``__call__`` functions are the same:
In these two searchers, each learnware in the ``learnware_list`` is compared with ``user_info`` according to their ``semantic_spec`` and added to the search result if matched. Two ``semantic_spec`` values match when every key is either matched or empty in ``user_info``. Different keys have different matching rules. Their ``__call__`` functions are the same:

- **EasyExactSemanticSearcher/EasyFuzzSemanticSearcher.__call__(self, learnware_list: List[Learnware], user_info: BaseUserInfo)-> SearchResults**

- For keys ``Data``, ``Task``, ``Library`` and ``license``, two ``semantic_spec`` keys are matched only if the value (only one value for each key) in the learnware ``semantic_spec`` exists among the values (possibly multiple values for one key) in the user ``semantic_spec``.
- For the key ``Scenario``, two ``semantic_spec`` keys are matched if their values have a nonempty intersection.
- For keys ``Name`` and ``Description``, the values are strings and caseis ignored. In ``EasyExactSemanticSearcher``, two ``semantic_spec`` keysare matched if these values of learnware ``semantic_spec`` is a substringof user ``semantic_spec``; In ``EasyFuzzSemanticSearcher``, first theexact semantic searcher is conducted like ``EasyExactSemanticSearcher``.If the result is empty, the fuzz semantic searcher is activated: the``learnware_list`` is sorted according to the fuzz score function ``fuzzpartial_ratio`` in ``rapidfuzz``.
- For keys ``Name`` and ``Description``, the values are strings and case is ignored. In ``EasyExactSemanticSearcher``, two ``semantic_spec`` keys are matched if the value in the learnware ``semantic_spec`` is a substring of the value in the user ``semantic_spec``. In ``EasyFuzzSemanticSearcher``, the exact semantic search is conducted first, as in ``EasyExactSemanticSearcher``. If the result is empty, the fuzzy semantic search is activated: the ``learnware_list`` is sorted according to the fuzz score function ``fuzz.partial_ratio`` in ``rapidfuzz``.

The results are returned storing in ``single_results`` of ``SearchResults``.
The results are returned and stored in ``single_results`` of ``SearchResults``.
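
The matching rules for the class-like keys and for ``Scenario`` can be sketched as below. This is a simplified illustration, not the package's code: ``semantic_match`` is hypothetical, specifications are assumed to be plain dicts mapping each key to a list of values, the example values are made up, and the string rules for ``Name`` and ``Description`` are omitted.

```python
from typing import Dict, List

Spec = Dict[str, List[str]]

SINGLE_VALUE_KEYS = ("Data", "Task", "Library", "license")
TAG_KEYS = ("Scenario",)


def semantic_match(learnware_spec: Spec, user_spec: Spec) -> bool:
    """Return True when every non-empty user key is satisfied by the learnware."""
    for key in SINGLE_VALUE_KEYS:
        wanted = user_spec.get(key, [])
        if not wanted:
            continue  # an empty user key matches any learnware
        offered = learnware_spec.get(key, [])
        # The learnware carries one value, which must be among the user's values.
        if not offered or offered[0] not in wanted:
            return False
    for key in TAG_KEYS:
        wanted = set(user_spec.get(key, []))
        # Tag keys match on a nonempty intersection.
        if wanted and not wanted & set(learnware_spec.get(key, [])):
            return False
    return True


learnware_spec = {"Data": ["Table"], "Task": ["Classification"], "Scenario": ["Business", "Financial"]}
user_ok = {"Data": ["Table"], "Task": ["Classification", "Regression"], "Scenario": ["Financial"]}
user_bad = {"Data": ["Image"], "Task": [], "Scenario": []}

match_ok = semantic_match(learnware_spec, user_ok)
match_bad = semantic_match(learnware_spec, user_bad)
```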


``Statistical Specification Searcher``
''''''''''''''''''''''''''''''''''''''''''

If user's statistical specification ``stat_info`` is provided, the learnware market can perform a more accurate leanware selection using ``EasyStatSearcher``.
If the user's statistical specification ``stat_info`` is provided, the learnware market can perform a more accurate learnware selection using ``EasyStatSearcher``.

- **EasyStatSearcher.__call__(self, learnware_list: List[Learnware], user_info: BaseUserInfo, max_search_num: int = 5, search_method: str = "greedy",) -> SearchResults**
- It searches for helpful learnwares from ``learnware_list`` based on the ``stat_info`` in ``user_info``.
- The result ``SingleSearchItem`` and ``MultipleSearchItem`` are both stored in ``SearchResults``. In ``SingleSearchItem``, it searches for single learnwares that could solve the user task; scores are also provided to represent the fitness of each single learnware and user task. In ``MultipleSearchItem``, it searches for a mixture of learnwares that could solve the user task better; the mixture learnware list and a score for the mixture is returned.
- The parameter ``search_method`` provides two choice of search strategies for mixture learnwares: ``greedy`` and ``auto``. For the search method ``greedy``, each time it chooses a learnware to make their mixture closer to the user's ``stat_info``; for the search method ``auto``, it directly calculates a best mixture weight for the ``learnware_list``.
- For single learnware search, we only return the learnwares with score larger than 0.6; For multiple learnware search, the parameter ``max_search_num`` specifies the maximum length of the returned mixture learnware list.
- The result ``SingleSearchItem`` and ``MultipleSearchItem`` are both stored in ``SearchResults``. In ``SingleSearchItem``, it searches for single learnwares that could solve the user task; scores are also provided to represent the fitness of each single learnware and user task. In ``MultipleSearchItem``, it searches for a mixture of learnwares that could solve the user task better; the mixture learnware list and a score for the mixture are returned.
- The parameter ``search_method`` provides two choices of search strategies for mixture learnwares: ``greedy`` and ``auto``. For the search method ``greedy``, each time it chooses a learnware to make their mixture closer to the user's ``stat_info``; for the search method ``auto``, it directly calculates the best mixture weight for the ``learnware_list``.
- For single learnware search, we only return the learnwares with a score larger than 0.6; for multiple learnware search, the parameter ``max_search_num`` specifies the maximum length of the returned mixture learnware list.
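
The idea behind the ``greedy`` strategy can be illustrated with a deliberately simplified sketch. It swaps in a much cruder setting than the one described above: each learnware and the user task are summarized by a plain mean vector instead of a statistical specification, the mixture is a uniform average, and closeness is squared Euclidean distance. The function ``greedy_mixture`` and its helpers are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def _mean(vectors: List[Vector]) -> List[float]:
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def _sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def greedy_mixture(candidates: List[Vector], target: Vector, max_search_num: int = 5) -> List[int]:
    """Greedily pick candidate indices whose uniform mixture approaches ``target``."""
    chosen: List[int] = []
    best = float("inf")
    while len(chosen) < max_search_num:
        step_best, step_index = best, None
        for index in range(len(candidates)):
            if index in chosen:
                continue
            trial = _mean([candidates[i] for i in chosen + [index]])
            distance = _sq_dist(trial, target)
            if distance < step_best:
                step_best, step_index = distance, index
        if step_index is None:  # no remaining candidate improves the mixture
            break
        chosen.append(step_index)
        best = step_best
    return chosen


candidates = [[0.0, 0.0], [2.0, 2.0], [10.0, 10.0]]
selected = greedy_mixture(candidates, target=[1.0, 1.0], max_search_num=2)
```

Stopping as soon as no candidate improves the mixture is a choice made for this sketch. The only limit stated above is ``max_search_num``.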


``Easy Checker``
@@ -118,35 +118,35 @@ If user's statistical specification ``stat_info`` is provided, the learnware ma

``EasySemanticChecker`` and ``EasyStatChecker`` are used to check the validity of the learnwares. They are used as:

- ``EasySemanticChecker`` mainly check the integrity and legitimacy of the ``semantic_spec`` in the learnware. A legal ``semantic_spec`` should includes all the keys, and the type of each key should meet our requirements. For keys with type ``Class``, the values should be unique and in our ``valid_list``; for keys with type ``Tag``, the values should not be empty; for keys with type ``String``, a non-empty string is expected as the value; for a table learnware, the dimensions and description of inputs is needed; for ``classification`` or ``regression`` learnwares, the dimensions and description of outputs is indispensable. The learnwares that pass the ``EasySemanticChecker`` is marked as ``NONUSABLE_LEARNWARE``; otherwise, it is ``INVALID_LEARNWARE`` and error information will be returned.
- ``EasySemanticChecker`` mainly checks the integrity and legitimacy of the ``semantic_spec`` in the learnware. A legal ``semantic_spec`` should include all the keys, and the type of each key should meet our requirements. For keys with type ``Class``, the values should be unique and in our ``valid_list``; for keys with type ``Tag``, the values should not be empty; for keys with type ``String``, a non-empty string is expected as the value; for a table learnware, the dimensions and description of inputs are needed; for ``classification`` or ``regression`` learnwares, the dimensions and description of outputs are indispensable. The learnwares that pass the ``EasySemanticChecker`` are marked as ``NONUSABLE_LEARNWARE``; otherwise, they are marked as ``INVALID_LEARNWARE``, and error information will be returned.
- ``EasyStatChecker`` mainly checks the ``model`` and ``stat_spec`` of the learnwares. It includes the following steps:

- **Check model instantiation**: ``learnware.instantiate_model`` to instantiate the model and transform it to a ``BaseModel``.
- **Check input shape**: Check whether the shape of ``semantic_spec`` input(if exists), ``learnware.input_shape`` and shape of ``stat_spec`` are consistent, and then generate an example input with that shape.
- **Check model prediction**: Use the model to predict the label of the example input, and record the output shape.
- **Check input shape**: Check whether the shape of ``semantic_spec`` input (if it exists), ``learnware.input_shape``, and the shape of ``stat_spec`` are consistent, and then generate an example input with that shape.
- **Check model prediction**: Use the model to predict the label of the example input and record the output shape.
- **Check output shape**: For ``Classification``, ``Regression`` and ``Feature Extraction`` tasks, the output shape should be consistent with that in ``semantic_spec`` and ``learnware.output_shape``. Besides, for ``Classification`` tasks, the output should be a legal class in ``semantic_spec``.

If any step above fails or meets a error, the learnware will be marked as ``INVALID_LEARNWARE``. The learnwares that pass the ``EasyStatChecker`` is marked as ``USABLE_LEARNWARE``.
If any step above fails or meets an error, the learnware will be marked as ``INVALID_LEARNWARE``. The learnwares that pass the ``EasyStatChecker`` are marked as ``USABLE_LEARNWARE``.
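
The checking steps above can be sketched in plain Python. This is only an illustrative sketch: ``ToyLearnware``, ``toy_stat_check``, and the numeric status codes are hypothetical stand-ins, not the classes or constants of the ``learnware`` package, and shapes are simplified to single dimensions.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical status codes for illustration; the real package defines its own.
INVALID_LEARNWARE = -1
USABLE_LEARNWARE = 1

Batch = List[List[float]]


@dataclass
class ToyLearnware:
    """Hypothetical stand-in for a learnware (not the package's class)."""

    predict: Callable[[Batch], Batch]
    input_dim: int
    output_dim: int
    stat_spec_dim: int
    semantic_input_dim: Optional[int] = None


def toy_stat_check(lw: ToyLearnware, n_examples: int = 10) -> int:
    """Mimic the input-shape, prediction, and output-shape checks."""
    try:
        # Check input shape: every declared dimension must agree.
        if lw.input_dim != lw.stat_spec_dim:
            return INVALID_LEARNWARE
        if lw.semantic_input_dim is not None and lw.semantic_input_dim != lw.input_dim:
            return INVALID_LEARNWARE
        # Check model prediction on a randomly generated example input.
        example = [[random.gauss(0.0, 1.0) for _ in range(lw.input_dim)] for _ in range(n_examples)]
        output = lw.predict(example)
        # Check output shape: one row per example, each with output_dim entries.
        if len(output) != n_examples or any(len(row) != lw.output_dim for row in output):
            return INVALID_LEARNWARE
    except Exception:
        # Any error raised during the checks invalidates the learnware.
        return INVALID_LEARNWARE
    return USABLE_LEARNWARE


good = ToyLearnware(predict=lambda x: [[sum(row)] for row in x], input_dim=4, output_dim=1, stat_spec_dim=4)
bad = ToyLearnware(predict=lambda x: x, input_dim=4, output_dim=1, stat_spec_dim=4)
print(toy_stat_check(good), toy_stat_check(bad))
```

In this sketch, ``good`` passes because its output has one column as declared, while ``bad`` returns its four-column input unchanged and is therefore rejected.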


Hetero Market
-------------

Hetero Market consists of ``HeteroMapTableOrganizer``, ``HeteroSearcher``, and the checker list ``[EasySemanticChecker, EasyStatChecker]``.
It is an extended version of the Easy Market which accommodates table learnwares from different feature spaces(heterogeneous table learnwares), expanding the applicable scope of learnware paradigm.
This market trains a heterogeneous engine based on existing learnware specifications in the market to merge different specification islands and assign new specifications(``HeteroMapTableSpecification``) to learnwares.
With more learnwares submitted, the heterogeneous engine will continuously update and is expected to build a more precise specification world.
The Hetero Market encompasses ``HeteroMapTableOrganizer``, ``HeteroSearcher``, and the checker list ``[EasySemanticChecker, EasyStatChecker]``.
It represents an extended version of the Easy Market, capable of accommodating table learnwares from diverse feature spaces (referred to as heterogeneous table learnwares), thereby broadening the applicable scope of the learnware paradigm.
This market trains a heterogeneous engine by utilizing existing learnware specifications to merge distinct specification islands and assign new specifications, referred to as ``HeteroMapTableSpecification``, to learnwares.
As more learnwares are submitted, the heterogeneous engine will undergo continuous updates, with the aim of constructing a more precise specification world.


``HeteroMapTableOrganizer``
+++++++++++++++++++++++++++

``HeteroMapTableOrganizer`` overrides methods from ``EasyOrganizer`` and implements new methods to support organization of heterogeneous table learnwares. Key features include:
``HeteroMapTableOrganizer`` overrides methods from ``EasyOrganizer`` and implements new methods to support the organization of heterogeneous table learnwares. Key features include:

- **reload_market**: Reloads the heterogeneous engine if there is one, otherwise initializes an engine with default configurations. Returns a flag indicating whether the market is reloaded successfully.
- **reload_market**: Reloads the heterogeneous engine if there is one. Otherwise, initializes an engine with default configurations. Returns a flag indicating whether the market is reloaded successfully.
- **reset**: Resets the heterogeneous market with specific settings regarding the heterogeneous engine such as ``auto_update``, ``auto_update_limit`` and ``training_args`` configurations.
- **add_learnware**: Adds a learnware to the market while assigning a ``HeteroMapTableSpecification`` to the learnware using the heterogeneous engine. The engine's update process will be triggered if ``auto_update`` is set to True and the number of learnwares in the market with ``USABLE_LEARNWARE`` status exceeds ``auto_update_limit``. Returns the ``learnware_id`` and ``learnware_status``.
- **delete_learnware**: Removes the learnware with ``id`` from the market, also remove its new specification if there is one. Return a flag of whether the deletion is successful.
- **delete_learnware**: Removes the learnware with ``id`` from the market and also removes its new specification if there is one. Returns a flag indicating whether the deletion is successful.
- **update_learnware**: Update the learnware's ``zip_path``, ``semantic_spec``, ``check_status`` and its new specification if there is one. Return a flag indicating whether it passed the ``checker``.
- **generate_hetero_map_spec**: Generate ``HeteroMapTableSpecification`` for users based on the information provided in ``user_info``.
- **train**: Build the heterogeneous engine using learnwares from the market that supports heterogeneous market training.
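
The ``auto_update`` rule described for ``add_learnware`` can be illustrated with a small toy class. This is a sketch under stated assumptions: ``ToyHeteroOrganizer``, its attributes, and the status codes are hypothetical and only mirror the wording above (retrain when ``auto_update`` is on and the count of usable learnwares exceeds ``auto_update_limit``); the real organizer may apply additional conditions.

```python
from typing import Dict, Tuple

# Hypothetical status codes for illustration only.
INVALID_LEARNWARE = -1
USABLE_LEARNWARE = 1


class ToyHeteroOrganizer:
    """Toy model of the auto-update trigger, not the package's organizer."""

    def __init__(self, auto_update: bool = False, auto_update_limit: int = 100):
        self.auto_update = auto_update
        self.auto_update_limit = auto_update_limit
        self.statuses: Dict[str, int] = {}
        self.train_calls = 0  # counts how often the engine would be rebuilt

    def train(self) -> None:
        # Placeholder for rebuilding the heterogeneous engine.
        self.train_calls += 1

    def add_learnware(self, status: int) -> Tuple[str, int]:
        learnware_id = f"{len(self.statuses):08d}"
        self.statuses[learnware_id] = status
        usable = sum(1 for s in self.statuses.values() if s == USABLE_LEARNWARE)
        # Trigger an engine update only when both conditions from the text hold.
        if self.auto_update and usable > self.auto_update_limit:
            self.train()
        return learnware_id, status


organizer = ToyHeteroOrganizer(auto_update=True, auto_update_limit=2)
for _ in range(3):
    organizer.add_learnware(USABLE_LEARNWARE)
print(organizer.train_calls)
```

With a limit of 2, the first two usable learnwares do not trigger training; the third one does.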
@@ -158,10 +158,10 @@ With more learnwares submitted, the heterogeneous engine will continuously updat
``HeteroSearcher`` builds upon ``EasySearcher`` with additional support for searching among heterogeneous table learnwares, returning helpful learnwares with feature space and label space different from the user's task requirements.
The semantic specification search and statistical specification search have been integrated into the same interface ``HeteroSearcher``.

- **HeteroSearcher.__call__(self, user_info: BaseUserInfo, check_status: int = None, max_search_num: int = 5, search_method: str = "greedy",) -> SearchResults**
- **HeteroSearcher.__call__(self, user_info: BaseUserInfo, check_status: int = None, max_search_num: int = 5, search_method: str = "greedy") -> SearchResults**

- It conducts the semantic searcher ``EasyFuzzSemanticSearcher`` on all the learnwares from the ``HeteroOrganizer`` with the same ``check_status`` (all learnwares if ``check_status`` is None).
- If the ``stat_info`` is provided in ``user_info``, it conducts one of the following two types of statistical specification search using ``EasySearcher``, depending on whether heterogeneous learnware search is enabled. If enabled, ``stat_info`` will be updated with a ``HeteroMapTableSpecification`` generated for the user, and the Hetero Market performs heterogeneous learnware selection based on the updated ``stat_info``. If not enabled, the Hetero Market performs homogeneous learnware selection based on the original ``stat_info``.
- If ``stat_info`` is provided within ``user_info``, it conducts one of two types of statistical specification searches using ``EasySearcher``, depending on whether heterogeneous learnware search is enabled. If enabled, ``stat_info`` will be updated with a user-specific ``HeteroMapTableSpecification``, and the Hetero Market performs heterogeneous learnware search based on the updated ``stat_info``. If not enabled, the Hetero Market performs homogeneous learnware search based on the original ``stat_info``.
.. note::
The heterogeneous learnware search is enabled when ``user_info`` contains valid heterogeneous search information. Please refer to `WORKFLOWS: Hetero Search <../workflows/search.html#hetero-search>`_ for details.

+ 14
- 14
docs/components/spec.rst View File

@@ -3,10 +3,10 @@
Specification
================================

Learnware specification is the core component of the learnware paradigm, linking all processes about learnwares, including uploading, organizing, searching, deploying and reusing.
Learnware specification is the core component of the learnware paradigm, linking all processes about learnwares, including uploading, organizing, searching, deploying, and reusing.

In this section, we will introduce the concept and design of learnware specification in the ``learnware`` package.
We will then explore ``regular specification``\ s tailored for different data types such as tables, images and texts.
We will then explore ``regular specification``\ s tailored for different data types such as tables, images, and texts.
Lastly, we cover a ``system specification`` specifically assigned to table learnwares by the learnware market, aimed at accommodating all available table learnwares into a unified "specification world" despite their heterogeneity.

Concepts & Types
@@ -19,7 +19,7 @@ The ``learnware`` package employs a highly extensible specification design, whic
- **Statistical specification** characterizes the statistical information contained in the model using various machine learning techniques. It plays a crucial role in locating the appropriate place for the model within the specification island.

When searching in the learnware market, the system first locates specification islands based on the semantic specification of the user's task,
then pinpoints highly beneficial learnwares on theses islands based on the statistical specification of the user's task.
then pinpoints highly beneficial learnwares on these islands based on the statistical specification of the user's task.

Statistical Specification
---------------------------
@@ -28,8 +28,8 @@ We employ the ``Reduced Kernel Mean Embedding (RKME) Specification`` as the foun
with adjustments made according to the characteristics of each data type.
The RKME specification is a recent development in learnware specification design, which represents the distribution of a model's training data in a privacy-preserving manner.

Within the ``learnware`` package, you'll find two types of statistical specifications: ``regular specification`` and ``system specification``. The former is generated locally
by users to express their model's statistical information, while the latter is assigned by the learnware market to accommodate and organize heterogeneous learnwares.
Within the ``learnware`` package, you will find two types of statistical specifications: ``regular specification`` and ``system specification``. The former is generated locally
by users to express their model's statistical information, while the learnware market assigns the latter to accommodate and organize heterogeneous learnwares.
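
To make the idea behind a kernel-mean-embedding style specification more concrete, the sketch below compares a dataset with a small weighted set of points through the squared distance between their kernel mean embeddings (the squared maximum mean discrepancy). It is a minimal illustration using an RBF kernel and hand-picked points; the function names and the ``gamma`` value are assumptions, and it does not reproduce how the ``learnware`` package actually optimizes its reduced set.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def rbf_kernel(x: Point, y: Point, gamma: float = 0.1) -> float:
    """RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def weighted_mmd_sq(
    xs: List[Point], wx: List[float], zs: List[Point], wz: List[float], gamma: float = 0.1
) -> float:
    """Squared RKHS distance between two weighted kernel mean embeddings."""

    def cross(a: List[Point], wa: List[float], b: List[Point], wb: List[float]) -> float:
        return sum(
            wi * wj * rbf_kernel(p, q, gamma)
            for p, wi in zip(a, wa)
            for q, wj in zip(b, wb)
        )

    return cross(xs, wx, xs, wx) - 2.0 * cross(xs, wx, zs, wz) + cross(zs, wz, zs, wz)


# Four data points with uniform weights, summarized by a single weighted point.
data = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
uniform = [0.25] * 4
near = weighted_mmd_sq(data, uniform, [[0.5, 0.5]], [1.0])
far = weighted_mmd_sq(data, uniform, [[10.0, 10.0]], [1.0])
print(near < far)
```

A summary point placed at the center of the data yields a much smaller distance than one placed far away, which is the sense in which a small weighted set can stand in for the full data distribution without exposing the raw samples.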

Semantic Specification
-----------------------
@@ -56,7 +56,7 @@ as shown in the following code:
regular_spec = generate_stat_spec(type=data_type, x=train_x)
regular_spec.save("stat.json")

It's worth noting that the above code only runs on user's local computer and does not interact with any cloud servers or leak any local private data.
It is worth noting that the above code only runs on the user's local computer and does not interact with cloud servers or leak local private data.

.. note::

@@ -72,7 +72,7 @@ Image Specification

Image data lives in a higher dimensional space than other data types. Unlike lower dimensional spaces, metrics defined based on Euclidean distances (or similar distances) will fail in higher dimensional spaces. This means that measuring the similarity between image samples becomes difficult.

To address these issues, we use the Neural Tangent Kernel (NTK) based on Convolutional Neural Networks (CNN) to measure the similarity of image samples. As we all know, CNN has greatly advanced the field of computer vision and is still a mainstream deep learning technique.
To address these issues, we use the Neural Tangent Kernel (NTK) based on Convolutional Neural Networks (CNN) to measure the similarity of image samples. As we all know, CNN has greatly advanced the field of computer vision and is still a mainstream deep-learning technique.

Usage & Example
^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -82,7 +82,7 @@ Note that the Image Specification is generated on a subset of the CIFAR-10 datas
Then, it is saved to file "cifar10.json" using ``spec.save``.

In many cases, it is difficult to construct Image Specification on the full dataset.
By randomly sampling a subset of the dataset, we can construct Image Specification based on it efficiently, with a strong enough statistical description of the full dataset.
By randomly sampling a subset of the dataset, we can efficiently construct Image Specification based on it, with a strong enough statistical description of the full dataset.

.. tip::
Typically, sampling 3,000 to 10,000 images is sufficient to generate the Image Specification.
@@ -105,11 +105,11 @@ By randomly sampling a subset of the dataset, we can construct Image Specificati
Privacy Protection
^^^^^^^^^^^^^^^^^^^^^^^^^^

In the third row of the figure, we show the eight pseudo-data with the largest weights :math:`\beta` in the Image Specification generated on the CIFAR-10 dataset.
In the third row of the figure, we show the eight pseudo-data with the largest weights :math:`\beta` in the Image Specification generated on the CIFAR-10 dataset.
Notice that the Image Specification generated based on Neural Tangent Kernel (NTK) protects the user's privacy very well.

In contrast, we show the performance of the RBF kernel on image dat in the first row of the figure below.
The RBF not only exposes the real data (plotted in the corresponding position in the second row), but also fails to fully utilise the weights :math:`\beta`.
In contrast, we show the performance of the RBF kernel on image data in the first row of the figure below.
The RBF not only exposes the real data (plotted in the corresponding position in the second row) but also fails to fully utilize the weights :math:`\beta`.

.. image:: ../_static/img/image_spec.png
:align: center
@@ -117,18 +117,18 @@ The RBF not only exposes the real data (plotted in the corresponding position in
Text Specification
--------------------------

Different from tabular data, each text input is a string of different length, so we should first transform them to equal-length arrays. Sentence embedding is used here to complete this transformation. We choose the model ``paraphrase-multilingual-MiniLM-L12-v2``, a lightweight multilingual embedding model. Then, we calculate the RKME specification on the embedding, just like we do with tabular data. Besides, we use the package ``langdetect`` to detect and store the language of the text inputs for further search. We hope to search for the learnware which supports the language of the user task.
Different from tabular data, each text input is a string of different length, so we should first transform them to equal-length arrays. Sentence embedding is used here to complete this transformation. We choose the model ``paraphrase-multilingual-MiniLM-L12-v2``, a lightweight multilingual embedding model. Then, we calculate the RKME specification on the embedding, just like we do with tabular data. Besides, we use the package ``langdetect`` to detect and store the language of the text inputs for further search. We hope to search for the learnware that supports the language of the user task.

System Specification
======================================

In contrast to ``regular specification``\ s which are generated solely by users,
In contrast to ``regular specification``\ s, which are generated solely by users,
``system specification``\ s are higher-level statistical specifications assigned by learnware markets
to effectively accommodate and organize heterogeneous learnwares.
This implies that ``regular specification``\ s are usually applicable across different markets, while ``system specification``\ s are generally closely associated
with particular learnware market implementations.

``system specification`` play a critical role in heterogeneous markets such as the ``Hetero Market``:
``system specification`` plays a critical role in heterogeneous markets such as the ``Hetero Market``:

- Learnware organizers use these specifications to connect isolated specification islands into unified "specification world"s.
- Learnware searchers perform helpful learnware recommendations among all table learnwares in the market, leveraging the ``system specification``\ s generated for users.

