@@ -340,15 +342,15 @@ feature_augment_predict_y = reuse_feature_augment.predict(user_data=test_x)
## Image Scenario Experiments
-Next, we evaluated our algorithm on image datasets. Notably, images of different sizes can be standardized through resizing, so heterogeneous features need not be considered.
+Next, we evaluated our algorithm on image datasets. Since differences in image size can be standardized through resizing, there is no need to consider feature heterogeneity.
### Experimental Setup
-We chose the classic image classification dataset [CIFAR-10](https://www.cs.toronto.edu/~kriz/cifar.html), which contains 60000 32x32 color images in 10 classes. A total of 50 learnwares were uploaded: each learnware contains a convolutional neural network trained on an unbalanced subset of 12000 samples from four classes, with a sampling ratio of `0.4:0.4:0.1:0.1`. A total of 100 user tasks were tested, each containing 3000 CIFAR-10 samples across six classes, with a sampling ratio of `0.3:0.3:0.1:0.1:0.1:0.1`.
+We conducted experiments on the classic image classification dataset [CIFAR-10](https://www.cs.toronto.edu/~kriz/cifar.html), which comprises 60000 32x32 color images in 10 classes. We uploaded 50 learnwares, each containing a convolutional neural network trained on an unbalanced subset of 12000 samples from four classes, with a sampling ratio of `0.4:0.4:0.1:0.1`. We defined 100 user tasks, each consisting of 3000 CIFAR-10 samples covering six classes, with a sampling ratio of `0.3:0.3:0.1:0.1:0.1:0.1`.
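The class-unbalanced sampling described above can be sketched as follows. This is a toy reconstruction with synthetic labels standing in for CIFAR-10 annotations, not the experiment's actual code; `sample_unbalanced_subset` is a hypothetical helper:

```python
import numpy as np

def sample_unbalanced_subset(y, classes, ratios, total, rng):
    """Draw `total` indices so that classes[i] contributes ratios[i] of them."""
    idx = []
    for c, r in zip(classes, ratios):
        pool = np.where(y == c)[0]
        idx.append(rng.choice(pool, size=int(total * r), replace=False))
    return np.concatenate(idx)

# Toy labels standing in for CIFAR-10 annotations (10 classes, 600 samples each).
rng = np.random.default_rng(0)
y = np.repeat(np.arange(10), 600)
subset = sample_unbalanced_subset(y, classes=[0, 1, 2, 3],
                                  ratios=[0.4, 0.4, 0.1, 0.1],
                                  total=1200, rng=rng)
counts = np.bincount(y[subset], minlength=10)
print(counts[:4])  # → [480 480 120 120]
```

Each learnware's training subset is dominated by two classes, which is what makes identifying the right learnware for a user task non-trivial.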
### Experimental Results
-We use `1 - Accuracy` as the loss metric to evaluate the average performance of various methods. The experimental results below show that when users face a scarcity of labeled data, or have only a limited amount of it (fewer than 2000 instances), leveraging the learnware market leads to better performance.
+We use `1 - Accuracy` as the loss metric to evaluate the average performance of various methods. The experimental results show that when labeled data is scarce or limited in quantity (no more than 2000 instances), leveraging the resources of the learnware market achieves superior performance.
@@ -368,15 +370,16 @@ feature_augment_predict_y = reuse_feature_augment.predict(user_data=test_x)
## Text Scenario Experiments
-Finally, we evaluated our algorithm on text datasets. The features of text data are naturally heterogeneous, but this issue can be addressed by using a Sentence Embedding Extractor.
+Finally, we evaluated our algorithm on text datasets. Since the features of text data are naturally heterogeneous, we handle this issue uniformly by using a Sentence Embedding Extractor.
### Experimental Setup
-We conducted experiments on the classic text classification dataset [20-newsgroup](http://qwone.com/~jason/20Newsgroups/), which contains approximately 20000 news documents across 20 different newsgroups. Similar to the image experiments, we uploaded a total of 50 learnwares. Each learnware was trained on a subset containing only half of the samples from three superclasses, and its model combines a `tf-idf` feature extractor with a naive Bayes classifier. We defined 10 user tasks, each covering two superclasses.
+We conducted experiments on the classic text classification dataset [20-newsgroup](http://qwone.com/~jason/20Newsgroups/), which contains approximately 20000 news documents across 20 different newsgroups. Similar to the image experiments, we uploaded a total of 50 learnwares. Each learnware's model combines a tf-idf feature extractor with a naive Bayes classifier and is trained on a subset containing only half of the samples from three superclasses. We set up 10 user tasks, each covering two superclasses.
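A minimal sketch of the learnware models described above, using scikit-learn's `TfidfVectorizer` and `MultinomialNB`. The toy documents and labels are invented for illustration; the real learnwares are trained on 20-newsgroup subsets:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy documents standing in for 20-newsgroup posts (labels are assumptions).
train_docs = ["the rocket launch was delayed", "nasa plans a new space mission",
              "the team won the baseball game", "the pitcher threw a perfect game"]
train_labels = ["sci.space", "sci.space", "rec.sport.baseball", "rec.sport.baseball"]

# The model inside each learnware: tf-idf features fed to naive Bayes.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(train_docs, train_labels)
print(model.predict(["the space shuttle mission"]))  # → ['sci.space']
```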
+
### Experimental Results
-The results are shown in the table and figure below. Similarly, even without labeled data, the performance achieved through learnware identification and reuse rivals the best learnware in the market. Moreover, leveraging the learnware market can save about 2000 samples compared to training models from scratch.
+The results are shown in the table and figure below. Similarly, even without labeled data, the performance achieved through learnware identification and reuse rivals the best learnware in the market. Moreover, compared to training models from scratch, leveraging the learnware market can save about 2000 samples.
@@ -416,7 +419,7 @@ feature_augment_predict_y = reuse_feature_augment.predict(user_data=test_x)
## How to Contribute
-`learnware` is still young and may contain bugs and issues. We warmly welcome everyone to contribute to `learnware`. We provide a detailed [development guide](https://learnware.readthedocs.io/en/latest/about/dev.html) for all developers, and have set up corresponding commit formats and pre-commit configurations; please follow them. Thank you very much for your contributions!
+`learnware` is still young and may contain bugs and issues. We warmly welcome everyone to contribute to `learnware`. We provide a detailed [development guide](https://learnware.readthedocs.io/en/latest/about/dev.html) for all developers, and have set up corresponding commit formats and pre-commit configurations; please follow them. Thank you very much for your participation and support!
## About Us
diff --git a/docs/components/learnware.rst b/docs/components/learnware.rst
index a98070d..7109604 100644
--- a/docs/components/learnware.rst
+++ b/docs/components/learnware.rst
@@ -19,60 +19,61 @@ In our implementation, the class ``Learnware`` has three important member variab
Learnware for Hetero Reuse
=======================================================================
-In the Hetero Market (refer to `COMPONENTS: Hetero Market <./market.html#hetero-market>`_ for more details), ``HeteroSearcher`` identifies and recommends valuable learnwares from the entire market. This includes learnwares with different feature/label spaces compared to the user's task requirements, known as "heterogeneous learnwares".
+In the Hetero Market (refer to `COMPONENTS: Hetero Market <./market.html#hetero-market>`_ for more details),
+``HeteroSearcher`` identifies and recommends valuable learnwares from the entire market, returning learnwares with different feature and prediction spaces compared to the user's task requirements,
+known as "heterogeneous learnwares".
-To enable the reuse of these heterogeneous learnwares, we have developed ``FeatureAlignLearnware`` and ``HeteroMapLearnware``.
-These components expand the capabilities of standard ``Learnware`` by aligning the feature and label spaces to match the user's task requirements.
-They also provide essential interfaces for effectively applying heterogeneous learnwares to tasks beyond their original purposes.
+``FeatureAlignLearnware`` and ``HeteroMapLearnware`` facilitate the deployment and reuse of heterogeneous learnwares.
+They extend the capabilities of standard ``Learnware`` by aligning the input and output domain of heterogeneous learnwares to match those of the user's task.
+These feature-aligned learnwares can then be utilized with either data-free reusers or data-dependent reusers.
``FeatureAlignLearnware``
---------------------------
-``FeatureAlignLearnware`` employs a neural network to align the feature space of the learnware to the user's task.
-It is initialized with a ``Learnware`` and has the following methods to expand the applicable scope of this ``Learnware``:
+``FeatureAlignLearnware`` utilizes a neural network to align the feature space of the learnware to the user's task.
+It is initialized with a ``Learnware`` and offers the following methods to extend the ability of this ``Learnware``:
-- **align**: Trains a neural network to align ``user_rkme``, which is the ``RKMETableSpecification`` of the user's data, with the learnware's statistical specification.
-- **predict**: Predict the output for user data using the trained neural network and the original learnware's model.
+- **align**: This method trains a neural network to align ``user_rkme`` (the ``RKMETableSpecification`` of the user's data) with the learnware's statistical specification.
+- **predict**: Using the trained neural network and the original learnware's model, this method predicts the output for the user's data.
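A toy illustration of the feature-alignment idea, assuming paired samples and a plain least-squares map instead of the neural network trained against RKME specifications in the real ``FeatureAlignLearnware``; ``ToyFeatureAlign`` is a hypothetical stand-in:

```python
import numpy as np

class ToyFeatureAlign:
    """Toy stand-in for FeatureAlignLearnware: a linear map into the
    learnware's feature space, fit here by least squares on paired samples
    (the real package trains a neural network against RKME specifications)."""
    def __init__(self, learnware_model):
        self.model = learnware_model
        self.W = None

    def align(self, user_x, learnware_x):
        # Solve user_x @ W ≈ learnware_x in the least-squares sense.
        self.W, *_ = np.linalg.lstsq(user_x, learnware_x, rcond=None)

    def predict(self, user_x):
        # Map into the learnware's feature space, then apply its model.
        return self.model(user_x @ self.W)

rng = np.random.default_rng(0)
user_x = rng.normal(size=(50, 3))
true_map = rng.normal(size=(3, 2))
learnware_x = user_x @ true_map             # perfectly alignable toy data
learnware_model = lambda x: x.sum(axis=1)   # the "model" inside the learnware

aligned = ToyFeatureAlign(learnware_model)
aligned.align(user_x, learnware_x)
pred = aligned.predict(user_x)
print(np.allclose(pred, learnware_x.sum(axis=1)))  # → True
```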
``HeteroMapAlignLearnware``
-----------------------------
-If user data is not only heterogeneous in feature space but also in label space, ``HeteroMapAlignLearnware`` uses the help of
-a small amount of labeled data ``(x_train, y_train)`` required from the user task to align heterogeneous learnwares with the user task.
-There are two critical interfaces in ``HeteroMapAlignLearnware``:
+If user data is heterogeneous not only in feature space but also in label space, ``HeteroMapAlignLearnware`` employs
+minor labeled data ``(x_train, y_train)`` from the user task to align heterogeneous learnwares with the user task.
+``HeteroMapAlignLearnware`` provides two key interfaces:
- ``HeteroMapAlignLearnware.align(self, user_rkme: RKMETableSpecification, x_train: np.ndarray, y_train: np.ndarray)``
- - **input space alignment**: Align the feature space of the learnware to the user task's statistical specification ``user_rkme`` using ``FeatureAlignLearnware``.
- - **output space alignment**: Further align the label space of the aligned learnware to the user task through supervised learning of ``FeatureAugmentReuser`` using ``(x_train, y_train)``.
+ - **Input space alignment**: Aligns the learnware's feature space to the user task's statistical specification ``user_rkme`` using ``FeatureAlignLearnware``.
+  - **Output space alignment**: Further aligns the label space of the aligned learnware to the user task through a simple model ``FeatureAugmentReuser``, which conducts feature augmentation and is trained on ``(x_train, y_train)``.
- ``HeteroMapAlignLearnware.predict(self, user_data)``
- - If input space and output space alignment are performed, use the ``FeatureAugmentReuser`` to predict the output for ``user_data``.
+ - If input space and output space alignment are performed, it uses ``FeatureAugmentReuser`` to predict the output for ``user_data``.
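The two-stage procedure can be sketched on synthetic data as follows. The input-alignment map and all data are invented for illustration; the output-alignment step mirrors the described use of a simple model trained on ``(x_train, y_train)`` (here ``RidgeCV`` from scikit-learn):

```python
import numpy as np
from sklearn.linear_model import RidgeCV

# Toy two-stage alignment mirroring HeteroMapAlignLearnware: (1) map the
# user's features into the learnware's space, (2) correct the label space
# with a small labeled set via feature augmentation. All data is synthetic.
rng = np.random.default_rng(1)
x_train = rng.normal(size=(40, 3))                     # user's labeled data
to_learnware = rng.normal(size=(3, 2))                 # assumed input-alignment map
learnware_pred = (x_train @ to_learnware).sum(axis=1)  # heterogeneous outputs
y_train = 2.0 * learnware_pred + 1.0                   # user labels differ in scale

# Output alignment: augment the user's features with the aligned learnware's
# predictions and fit a simple model on the labeled data.
augmented = np.hstack([x_train, learnware_pred[:, None]])
reuser = RidgeCV().fit(augmented, y_train)
print(reuser.score(augmented, y_train) > 0.99)  # → True
```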
All Reuse Methods
===========================
-In addition to applying ``Learnware``, ``FeatureAlignLearnware`` or ``HeteroMapAlignLearnware`` objects directly by calling their ``predict`` interface,
-the ``learnware`` package also provides a set of ``Reuse Methods`` for users to further customize a single or multiple learnwares, with the hope of enabling learnwares to be
-helpful beyond their original purposes and eliminating the need for users to build models from scratch.
+In addition to directly applying ``Learnware``, ``FeatureAlignLearnware`` or ``HeteroMapAlignLearnware`` objects by calling their ``predict`` interface,
+the ``learnware`` package also provides a set of baseline ``Reuse Methods`` for users to further customize single or multiple learnwares, with the hope of enabling learnwares to be
+helpful beyond their original purposes and reducing the need for users to build models from scratch.
-There are two main categories of ``Reuse Methods``: (1) direct reuse and (2) reuse based on a small amount of labeled data.
+There are two main categories of ``Reuse Methods``: (1) data-free reusers which reuse learnwares directly and (2) data-dependent reusers which reuse learnwares with a small amount of labeled data.
.. note::
- Combine ``HeteroMapAlignLearnware`` with the following reuse methods to enable the reuse of heterogeneous learnwares. See `WORKFLOW: Hetero Reuse <../workflows/reuse.html#hetero-reuse>`_ for details.
-
-Direct Reuse of Learnware
---------------------------
+ Combine ``HeteroMapAlignLearnware`` with the following reuse methods to reuse heterogeneous learnwares conveniently. See `WORKFLOW: Hetero Reuse <../workflows/reuse.html#hetero-reuse>`_ for details.
+Data-Free Reusers
+------------------
Two methods for direct reuse of learnwares are provided: ``JobSelectorReuser`` and ``AveragingReuser``.
JobSelectorReuser
^^^^^^^^^^^^^^^^^^
-``JobSelectorReuser`` trains a classifier ``job selector`` that identifies the optimal learnware for each data point in user data.
+``JobSelectorReuser`` trains a classifier ``job selector`` that identifies the most suitable learnware for each data point in user data.
There are three member variables:
- ``learnware_list``: A list of ``Learnware`` objects for the ``JobSelectorReuser`` to choose from.
@@ -81,14 +82,14 @@ There are three member variables:
The most important methods of ``JobSelectorReuser`` are ``job_selector`` and ``predict``:
-- **job_selector**: Train a ``job selector`` based on user's data and the ``learnware_list``. Processions are different based on the value of ``use_herding``:
+- **job_selector**: Train a ``job selector`` based on the user's data and the ``learnware_list``. The approach varies based on the ``use_herding`` setting:
- - If ``use_herding`` is False: Statistical specifications of learnwares in ``learnware_list`` combined with the corresponding learnware index are used to train the ``job selector``.
+ - If ``use_herding`` is False: Statistical specifications of learnwares in ``learnware_list``, along with their respective indices, are used to train the ``job selector``.
- If ``use_herding`` is True:
- - Estimate the mixture weight based on user raw data and the statistical specifications of learnwares in ``learnware_list``
- - Use the mixture weight to generate ``herding_num`` auxiliary data points which mimic the user task's distribution through the kernel herding method
- - Finally, it learns the ``job selector`` on the auxiliary data points.
+ - The mixture weight is estimated based on user raw data and the statistical specifications of learnwares in ``learnware_list``
+ - The kernel herding method generates ``herding_num`` auxiliary data points to mimic the user task's distribution using the mixture weight
+ - The ``job selector`` is then trained on these auxiliary data points
- **predict**: The ``job selector`` is essentially a multi-class classifier :math:`g(\boldsymbol{x}):\mathcal{X}\rightarrow \mathcal{I}` with :math:`\mathcal{I}=\{1,\ldots, C\}`, where :math:`C` is the size of ``learnware_list``. Given a testing sample :math:`\boldsymbol{x}`, the ``JobSelectorReuser`` predicts it by using the :math:`g(\boldsymbol{x})`-th learnware in ``learnware_list``.
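A toy version of the job-selector idea, assuming each learnware is competent on one region of the input space. The selector here is a plain ``LogisticRegression`` over learnware indices; the real ``JobSelectorReuser`` builds its training set from statistical specifications (optionally via kernel herding):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Two well-separated regions, each covered by a different learnware.
rng = np.random.default_rng(0)
x_a = rng.normal(loc=-2, size=(100, 2))        # region of learnware 0
x_b = rng.normal(loc=+2, size=(100, 2))        # region of learnware 1
selector_x = np.vstack([x_a, x_b])
selector_y = np.array([0] * 100 + [1] * 100)   # learnware indices as labels

job_selector = LogisticRegression().fit(selector_x, selector_y)

learnwares = [lambda x: np.full(len(x), "model-0"),
              lambda x: np.full(len(x), "model-1")]

test_x = np.array([[-2.0, -2.0], [2.0, 2.0]])
chosen = job_selector.predict(test_x)          # g(x): one index per sample
preds = [str(learnwares[c](x[None])[0]) for c, x in zip(chosen, test_x)]
print(preds)  # → ['model-0', 'model-1']
```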
@@ -105,8 +106,8 @@ specifies the ensemble method(default is set to ``mean``).
- For classification tasks, ``mode`` has two available options. If ``mode`` is set to ``vote_by_label``, the prediction is the majority vote label based on learnwares' output labels. If ``mode`` is set to ``vote_by_prob``, the prediction is the mean vector of all learnwares' output label probabilities.
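The two classification modes can be illustrated with hard-coded toy outputs (the arrays below are invented):

```python
import numpy as np

# Labels predicted by three learnwares on three samples.
outputs = np.array([[0, 1, 1],
                    [1, 1, 0],
                    [1, 1, 1]])

# mode="vote_by_label": majority vote over the learnwares' output labels.
vote = np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, outputs)
print(vote)  # → [1 1 1]

# mode="vote_by_prob": mean of the learnwares' output probability vectors.
probs = np.array([[[0.9, 0.1], [0.2, 0.8]],
                  [[0.6, 0.4], [0.4, 0.6]]])   # two learnwares, two samples
mean_probs = probs.mean(axis=0)                # averages to [[0.75, 0.25], [0.3, 0.7]]
```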
-Reuse Learnware with Labeled Data
-----------------------------------
+Data-Dependent Reusers
+------------------------
When users have a small amount of labeled data available, the ``learnware`` package provides two methods: ``EnsemblePruningReuser`` and ``FeatureAugmentReuser`` to help reuse learnwares.
They are both initialized with a list of ``Learnware`` objects ``learnware_list`` and have different implementations of ``fit`` and ``predict`` methods.
@@ -115,8 +116,8 @@ EnsemblePruningReuser
^^^^^^^^^^^^^^^^^^^^^^
The ``EnsemblePruningReuser`` class implements a selective ensemble approach inspired by the MDEP algorithm [1]_.
-It selects a subset of learnwares from ``learnware_list``, utilizing the user's labeled data for effective ensemble integration on user tasks.
-This method effectively balances validation error, margin ratio, and ensemble size, leading to a robust and optimized selection of learnwares for task-specific ensemble creation.
+It selects a subset of learnwares from ``learnware_list`` using a multi-objective evolutionary algorithm and uses the ``AveragingReuser`` for average ensemble.
+This method effectively balances validation error, margin ratio, and ensemble size, leading to a robust selection of learnwares for specific user tasks.
- **fit**: Effectively prunes the large set of learnwares ``learnware_list`` by evaluating and comparing the learnwares based on their performance on user's labeled validation data ``(val_X, val_y)``. Returns the most suitable subset of learnwares.
- **predict**: The ``mode`` member variable has two available options. Set ``mode`` to ``regression`` for regression tasks and ``classification`` for classification tasks. The prediction is the average of the selected learnwares' outputs.
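A deliberately simplified sketch of pruning by validation error alone. The real ``EnsemblePruningReuser`` runs a multi-objective evolutionary algorithm that also weighs margin ratio and ensemble size; the synthetic learnware outputs below are assumptions:

```python
import numpy as np

# Four toy learnwares with different error rates on the validation labels.
rng = np.random.default_rng(0)
val_y = rng.integers(0, 2, size=200)

def noisy_model(flip_rate):
    # A learnware output that flips each validation label with `flip_rate`.
    noise = rng.random(200) < flip_rate
    return np.where(noise, 1 - val_y, val_y)

outputs = [noisy_model(r) for r in (0.05, 0.10, 0.45, 0.50)]

# Prune: keep learnwares whose validation error is at or below the median.
errors = [np.mean(o != val_y) for o in outputs]
selected = [i for i, e in enumerate(errors) if e <= np.median(errors)]
print(selected)  # → [0, 1]  (the accurate learnwares; near-random ones dropped)
```

The selected subset would then be combined with ``AveragingReuser`` for the final prediction, as described above.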
@@ -130,10 +131,10 @@ outputs of the learnwares from ``learnware_list`` on the user's validation data
The augmented data (concatenated features combined with validation labels ``val_y``) are then used to train a simple model ``augment_reuser``, which gives the final prediction
on ``user_data``.
-- **fit**: Trains the ``augment_reuser`` using augmented user validation data. For classification tasks, ``mode`` should be set to ``classification``, and ``augment_reuser`` is a ``LogisticRegression`` model. For regression tasks, the mode should be set to ``classification``, and ``augment_reuser`` is a ``RidgeCV`` model.
+- **fit**: Trains the ``augment_reuser`` using augmented user validation data. For classification tasks, ``mode`` should be set to ``classification``, and ``augment_reuser`` is a ``LogisticRegression`` model. For regression tasks, the mode should be set to ``regression``, and ``augment_reuser`` is a ``RidgeCV`` model.
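The augmentation step can be sketched on synthetic data: concatenate the validation features with a learnware's outputs on them, then fit the simple model (``LogisticRegression`` for the ``classification`` mode described above). All data below is invented:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
val_X = rng.normal(size=(100, 2))
val_y = (val_X[:, 0] + val_X[:, 1] > 0).astype(int)   # user's validation labels
learnware_out = (val_X[:, 0] > 0).astype(int)         # an imperfect learnware

# Augment: original features concatenated with the learnware's outputs.
augmented = np.hstack([val_X, learnware_out[:, None]])
augment_reuser = LogisticRegression().fit(augmented, val_y)
print(augment_reuser.score(augmented, val_y) >= 0.9)  # → True
```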
References
-----------
-.. [1] Yu-Chang Wu, Yi-Xiao He, Chao Qian, and Zhi-Hua Zhou. Multi-objective Evolutionary Ensemble Pruning Guided by Margin Distribution. In: Proceedings of the 17th International Conference on Parallel Problem Solving from Nature (PPSN'22), Dortmund, Germany, 2022.
\ No newline at end of file
+.. [1] Yu-Chang Wu, Yi-Xiao He, Chao Qian, and Zhi-Hua Zhou. Multi-objective evolutionary ensemble pruning guided by margin distribution. In *Proceedings of the 17th International Conference on Parallel Problem Solving from Nature*, 2022.
\ No newline at end of file
diff --git a/docs/components/market.rst b/docs/components/market.rst
index f27a5cc..c394edd 100644
--- a/docs/components/market.rst
+++ b/docs/components/market.rst
@@ -4,34 +4,36 @@
Learnware Market
================================
-The ``Learnware Market`` receives high-performance machine learning models from developers, incorporates them into the system, and provides services to users by identifying and reusing learnware to help users solve current tasks. Developers voluntarily submit various learnwares to the learnware market, and the market conducts quality checks and further organization of these learnwares. When users submit task requirements, the learnware market automatically selects whether to recommend a single learnware or a combination of multiple learnwares.
+The ``Learnware Market``, serving as the implementation of the learnware doc system, receives high-performance machine learning models from developers, incorporates them into the system, and provides services to users by identifying and reusing learnware to help users solve current tasks. Developers voluntarily submit various learnwares to the learnware doc system, and the market conducts quality checks and further organization of these learnwares. When users submit task requirements, the learnware doc system automatically selects whether to recommend a single learnware or a combination of multiple learnwares.
-The ``Learnware Market`` will receive various kinds of learnwares, and learnwares from different feature/label spaces form numerous islands of specifications. All these islands constitute the ``specification world`` in the learnware market. The market should discover and establish connections between different islands and merge them into a unified specification world. This further organization of learnwares supports search learnwares among all learnwares, not just among learnwares that have the same feature space and label space with the user's task requirements.
+The ``Learnware Market`` will receive various kinds of learnwares, and learnwares from different feature and prediction spaces form numerous islands of specifications. Collectively, these islands constitute the ``specification world`` in the learnware doc system. The doc system should discover and establish connections between different islands and integrate them into a unified specification world, with the hope of broadening the search scope and preliminarily supporting learnware identification from the entire learnware collection, not just among learnwares that share the same feature and prediction space with the user's task requirements.
Framework
======================================
-The ``Learnware Market`` is combined with a ``organizer``, a ``searcher``, and a list of ``checker``\ s.
+The ``Learnware Market`` implements the market module which is designed for learnware organization, identification and usability testing. A single market module consists of one ``organizer`` module, one ``searcher`` module, and multiple ``checker`` modules.
-The ``organizer`` can store and organize learnwares in the market. It supports ``add``, ``delete``, and ``update`` operations for learnwares. It also provides the interface for the ``searcher`` to search learnwares based on user requirements.
+The ``organizer`` module oversees the storage and organization of learnware, supporting operations such as reloading the entire learnware collection and performing insertions, deletions and updates.
-The ``searcher`` can search learnwares based on user requirements. The implementation of ``searcher`` depends on the concrete implementation and interface for ``organizer``, where usually an ``organizer`` can be compatible with multiple different ``searcher``\ s.
+The ``searcher`` module conducts learnware identification based on user information, which encompasses statistical and semantic specifications. It implements several ``searcher``\ s to retrieve learnwares that meet user requirements and recommends them as search results, where each ``searcher`` employs a different search algorithm.
-The ``checker`` is used for checking the learnware in some standards. It should check the utility of a learnware and return the status and a message related to the learnware's check result. Only the learnwares who passed the ``checker`` could be able to be stored and added into the ``Learnware Market``.
+The ``checker`` module is responsible for checking the usability and quality of learnwares by verifying the availability of semantic and statistical specifications and creating a runtime environment to test learnware models based on the model container. The learnwares that pass the ``checker`` module are then inserted and stored by the organizer module, appearing in the ``Learnware Market``.
Current Checkers
======================================
-The ``learnware`` package provides two different implementations of ``Learnware Market`` where both share the same ``checker`` list. So we first introduce the details of ``checker``\ s.
-
-The ``checker``\ s check a learnware object in different aspects, including environment configuration (``CondaChecker``), semantic specifications (``EasySemanticChecker``), and statistical specifications (``EasyStatChecker``). Each checker's ``__call__`` method is designed to be invoked as a function to conduct the respective checks on the learnware and return the outcomes. It defines three types of learnwares: ``INVALID_LEARNWARE`` denotes the learnware does not pass the check, ``NONUSABLE_LEARNWARE`` denotes the learnware passes the check but cannot make predictions, ``USABLE_LEARNWARE`` denotes the leanrware pass the check and can make predictions. Currently, we have three ``checker``\ s, which are described below.
+The ``checker`` module checks a learnware from different aspects using different ``checker``\ s, including environment configuration (``CondaChecker``), semantic specifications (``EasySemanticChecker``), and statistical specifications (``EasyStatChecker``).
+Each checker's ``__call__`` method is designed to be invoked as a function to conduct the respective checks on the learnware and return the outcomes.
+Three types of learnware statuses are defined: ``INVALID_LEARNWARE`` indicates the learnware fails the check,
+``NONUSABLE_LEARNWARE`` indicates the learnware passes the check but is unable to make predictions, and ``USABLE_LEARNWARE`` indicates the learnware passes the check and can make predictions.
+Currently, there are three implemented ``checker``\ s within this module, described as follows.
``CondaChecker``
------------------
-This ``checker`` checks the environment of the learnware object. It creates a ``LearnwaresContainer`` instance to handle the Learnware and uses ``inner_checker`` to check the Learnware. If an exception occurs, it logs the error and returns the ``NONUSABLE_LEARNWARE`` status and error message.
+This ``checker`` checks the environment of the learnware object. It creates a ``LearnwaresContainer`` instance to containerize the learnware and uses ``inner_checker`` to check the learnware. If an exception occurs, it logs the error and returns the ``NONUSABLE_LEARNWARE`` status with an error message.
``EasySemanticChecker``
@@ -48,12 +50,13 @@ This ``checker`` checks the statistical specification and functionality of a lea
Current Markets
======================================
-The ``learnware`` package provides two different implementations of ``market``, i.e., ``Easy Market`` and ``Hetero Market``. They have different implementations of ``organizer`` and ``searcher``.
+The ``learnware`` package provides two different implementations of ``market``, i.e., ``Easy Market`` and ``Hetero Market``.
+They share the same ``checker`` module and have different implementations of ``organizer`` and ``searcher``.
Easy Market
-------------
-Easy market is a basic realization of the learnware market. It consists of ``EasyOrganizer``, ``EasySearcher``, and the checker list ``[EasySemanticChecker, EasyStatChecker]``.
+Easy market is a basic realization of the learnware doc system. It consists of ``EasyOrganizer``, ``EasySearcher``, and the checker list ``[EasySemanticChecker, EasyStatChecker]``.
``Easy Organizer``
@@ -73,8 +76,7 @@ Easy market is a basic realization of the learnware market. It consists of ``Eas
``EasySearcher`` consists of ``EasyFuzzsemanticSearcher`` and ``EasyStatSearcher``. ``EasyFuzzsemanticSearcher`` is a kind of ``Semantic Specification Searcher``, while ``EasyStatSearcher`` is a kind of ``Statistical Specification Searcher``. All these searchers return helpful learnwares based on ``BaseUserInfo`` provided by users.
-``BaseUserInfo`` is a ``Python API`` for users to provide enough information to identify helpful learnwares.
-When initializing ``BaseUserInfo``, three optional information can be provided: ``id``, ``semantic_spec`` and ``stat_info``. These specifications' introductions are shown in `COMPONENTS: Specification <./spec.html>`_.
+``BaseUserInfo`` is a ``Python API`` for users to provide enough information to identify helpful learnwares. When initializing ``BaseUserInfo``, three optional fields can be provided: ``id``, ``semantic_spec`` and ``stat_info``. These specifications are introduced in `COMPONENTS: Specification <./spec.html>`_.
The semantic specification search and statistical specification search have been integrated into the same interface ``EasySearcher``.
@@ -89,13 +91,13 @@ The semantic specification search and statistical specification search have been
``Semantic Specification Searcher`` is the first-stage search based on ``user_semantic``, identifying potentially helpful learnwares whose models solve tasks similar to your requirements. There are two types of Semantic Specification Search: ``EasyExactSemanticSearcher`` and ``EasyFuzzSemanticSearcher``.
-In these two searchers, each learnware in the ``learnware_list`` is compared with ``user_info`` according to their ``semantic_spec`` and added to the search result if matched. Two semantic_spec are matched when all the key words are matched or empty in ``user_info``. Different keys have different matching rules. Their ``__call__`` functions are the same:
+In these two searchers, each learnware in the ``learnware_list`` is compared with ``user_info`` based on their ``semantic_spec``. A learnware is added to the search result if a match is found. Two ``semantic_spec``\ s are considered matched when all the key words either match or are empty in ``user_info``. Different keys follow different matching rules. The ``__call__`` functions of these searchers are the same:
- **EasyExactSemanticSearcher/EasyFuzzSemanticSearcher.__call__(self, learnware_list: List[Learnware], user_info: BaseUserInfo)-> SearchResults**
- - For keys ``Data``, ``Task``, ``Library`` and ``license``, two``semantic_spec`` keys are matched only if these values(only one value foreach key) of learnware ``semantic_spec`` exists in values(may be muliplevalues for one key) of user ``semantic_spec``.
- - For the key ``Scenario``, two ``semantic_spec`` keys are matched iftheir values have nonempty intersections.
- - For keys ``Name`` and ``Description``, the values are strings and caseis ignored. In ``EasyExactSemanticSearcher``, two ``semantic_spec`` keys are matched if these values of learnware ``semantic_spec`` is a substring of user ``semantic_spec``. In ``EasyFuzzSemanticSearcher``, it starts with the same kind of exact semantic search as ``EasyExactSemanticSearcher``. If the result is empty, the fuzz semantic searcher is activated: the ``learnware_list`` is sorted according to the fuzz score function ``fuzzpartial_ratio`` in ``rapidfuzz``.
+ - For the keys ``Data``, ``Task``, ``Library``, and ``license`` in ``semantic_spec``, a match occurs only when the value (only one value for each key) in a learnware's ``semantic_spec`` is also found in the values (which may be multiple for one key) in the user's ``semantic_spec``.
+ - For the key ``Scenario``, two ``semantic_spec`` keys are matched if their values have nonempty intersections.
+   - For the keys ``Name`` and ``Description``, the values are strings and case is ignored. In ``EasyExactSemanticSearcher``, two ``semantic_spec`` keys are matched if the value in the learnware ``semantic_spec`` is a substring of the corresponding value in the user ``semantic_spec``. ``EasyFuzzSemanticSearcher`` begins with the same exact semantic search as ``EasyExactSemanticSearcher``. If no results are found, it activates a fuzz semantic search: the ``learnware_list`` is then sorted according to the fuzz score function ``fuzz.partial_ratio`` provided by ``rapidfuzz``.
The results are returned and stored in ``single_results`` of ``SearchResults``.
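The matching rules above can be condensed into a toy matcher. Hedged: ``semantic_match`` is a hypothetical illustration of the rules, not the package's implementation, and the fuzzy fallback is omitted:

```python
# Toy semantic matching: single-valued keys must appear in the user's value
# list; "Scenario" matches on a nonempty intersection; "Name"/"Description"
# match case-insensitively as substrings. Empty user keys always match.
def semantic_match(learnware_spec, user_spec):
    for key in ("Data", "Task", "Library", "license"):
        values = user_spec.get(key, [])
        if values and learnware_spec[key] not in values:
            return False
    scenario = user_spec.get("Scenario", [])
    if scenario and not set(learnware_spec["Scenario"]) & set(scenario):
        return False
    for key in ("Name", "Description"):
        text = user_spec.get(key, "")
        if text and learnware_spec[key].lower() not in text.lower():
            return False
    return True

# Invented example specifications for illustration.
learnware_spec = {"Data": "Table", "Task": "Classification",
                  "Library": "Scikit-learn", "license": "MIT",
                  "Scenario": ["Finance"], "Name": "credit model",
                  "Description": "predicts loan default"}
user_spec = {"Data": ["Table", "Text"], "Scenario": ["Finance", "Business"],
             "Name": "a Credit Model for banks"}
print(semantic_match(learnware_spec, user_spec))  # → True
```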
@@ -103,28 +105,28 @@ The results are returned and stored in ``single_results`` of ``SearchResults``.
``Statistical Specification Searcher``
''''''''''''''''''''''''''''''''''''''''''
-If the user's statistical specification ``stat_info`` is provided, the learnware market can perform a more accurate learnware selection using ``EasyStatSearcher``.
+If the user's statistical specification ``stat_info`` is provided, the learnware doc system can perform more targeted learnware identification using ``EasyStatSearcher``.
- **EasyStatSearcher.__call__(self, learnware_list: List[Learnware], user_info: BaseUserInfo, max_search_num: int = 5, search_method: str = "greedy",) -> SearchResults**
- It searches for helpful learnwares from ``learnware_list`` based on the ``stat_info`` in ``user_info``.
- - The result ``SingleSearchItem`` and ``MultipleSearchItem`` are both stored in ``SearchResults``. In ``SingleSearchItem``, it searches for individual learnware solutions for the user's task, and it also assigns scores to indicate the compatibility of each learnware with the user's task. In ``MultipleSearchItem``, it searches for a mixture of learnwares that could solve the user task better; the mixture learnware list and a score for the mixture are returned.
- - The parameter ``search_method`` provides two choice of search strategies for mixture learnwares: ``greedy`` and ``auto``. For the search method ``greedy``, each time it chooses a learnware to make their mixture closer to the user's ``stat_info``; for the search method ``auto``, it directly calculates the best mixture weight for the ``learnware_list``.
- - For single learnware search, we only return the learnwares with a score larger than 0.6. For multiple learnware search, the parameter ``max_search_num`` specifies the maximum length of the returned mixture learnware list.
+  - ``SingleSearchItem`` and ``MultipleSearchItem`` are the types of results stored in ``SearchResults``. ``SingleSearchItem`` contains single recommended learnwares for the user's task, along with scores indicating each learnware's compatibility with the task. ``MultipleSearchItem`` includes a combination of learnwares that attempts to address the task better, together with an overall score for this mixture.
+  - The parameter ``search_method`` offers two search strategies for mixture learnwares: ``greedy`` and ``auto``. With the ``greedy`` method, it incrementally adds learnwares that significantly reduce the distribution distance, thereby bringing the mixture closer to the user's ``stat_info``. With the ``auto`` method, it directly calculates the optimal mixture weights for the ``learnware_list``.
+ - For single learnware search, only learnwares with a score higher than 0.6 are returned. For multiple learnware search, the parameter ``max_search_num`` specifies the maximum number of learnwares in the returned mixture learnware list.
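The ``greedy`` strategy can be illustrated with a toy in which each statistical specification is reduced to a mean vector (an assumption for brevity; real RKME specifications are sets of weighted points compared with an MMD-style distance):

```python
import numpy as np

# User distribution summarized by a mean; four candidate learnware "specs".
user_mean = np.array([0.0, 0.0])
spec_means = [np.array([2.0, 0.0]), np.array([-2.0, 0.0]),
              np.array([0.0, 5.0]), np.array([1.9, 0.1])]

# Greedily add the learnware whose inclusion brings the uniform mixture of
# specification means closest to the user's mean.
chosen = []
for _ in range(2):   # max_search_num = 2 in this toy
    best = min((i for i in range(len(spec_means)) if i not in chosen),
               key=lambda i: np.linalg.norm(
                   np.mean([spec_means[j] for j in chosen + [i]], axis=0)
                   - user_mean))
    chosen.append(best)
print(chosen)  # → [3, 1]
```

Learnware 3 is the best single pick, and adding learnware 1 then cancels most of the residual; the ``auto`` method would instead solve for the mixture weights directly.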
``Easy Checker``
++++++++++++++++++++
-``EasySemanticChecker`` and ``EasyStatChecker`` are used to check the validity of the learnwares. They are used as:
+``EasySemanticChecker`` and ``EasyStatChecker`` are used to verify the validity of the learnwares:
-- ``EasySemanticChecker`` mainly check the integrity and legitimacy of the ``semantic_spec`` in the learnware. A legal ``semantic_spec`` should include all the keys, and the type of each key should meet our requirements. For keys with type ``Class``, the values should be unique and in our ``valid_list``; for keys with type ``Tag``, the values should not be empty; for keys with type ``String``, a non-empty string is expected as the value; for a table learnware, the dimensions and description of inputs are needed; for ``classification`` or ``regression`` learnwares, the dimensions and description of outputs are indispensable. The learnwares that pass the ``EasySemanticChecker`` is marked as ``NONUSABLE_LEARNWARE``; otherwise, it is ``INVALID_LEARNWARE``, and error information will be returned.
-- ``EasyStatChecker`` mainly check the ``model`` and ``stat_spec`` of the learnwares. It includes the following steps:
+- ``EasySemanticChecker`` checks the integrity and legitimacy of the ``semantic_spec`` in a learnware. (1) A valid ``semantic_spec`` must include all necessary keys, with each key's type conforming to the specified requirements. For ``Class`` type keys, values should be unique and in the ``valid_list``; for ``Tag`` type keys, values should not be empty; for ``String`` type keys, a non-empty string is expected. (2) Tabular learnwares should include input dimensions and feature descriptions within their ``semantic_spec``. (3) ``Classification`` or ``Regression`` learnwares should provide output dimensions and descriptions. Learnwares passing the ``EasySemanticChecker`` are marked as ``NONUSABLE_LEARNWARE``; otherwise, as ``INVALID_LEARNWARE``, with error information returned.
+- ``EasyStatChecker`` checks the ``model`` and ``stat_spec`` of the learnwares, involving:
- - **Check model instantiation**: ``learnware.instantiate_model`` to instantiate the model and transform it to a ``BaseModel``.
- - **Check input shape**: Check whether the shape of ``semantic_spec`` input(if it exists), ``learnware.input_shape``, and the shape of ``stat_spec`` are consistent, and then generate an example input with that shape.
- - **Check model prediction**: Use the model to predict the label of the example input and record the output shape.
- - **Check output shape**: For ``Classification``, ``Regression`` and ``Feature Extraction`` tasks, the output shape should be consistent with that in ``semantic_spec`` and ``learnware.output_shape``. Besides, for ``Regression`` tasks, the output should be a legal class in ``semantic_spec``.
+ - **Model instantiation check**: Utilizing ``learnware.instantiate_model`` to instantiate the model as a ``BaseModel``.
+ - **Input shape check**: Checking whether the ``semantic_spec`` input shape (if present), ``learnware.input_shape``, and ``stat_spec`` shape are consistent, and then generating an example input of that shape.
+ - **Model prediction check**: Using the model to predict the label of the example input and recording the model output.
+ - **Output shape check**: For ``Classification``, ``Regression``, and ``Feature Extraction`` tasks, the output's shape should align with ``semantic_spec`` and ``learnware.output_shape``. For ``Regression`` tasks, the output's shape should also be consistent with the output dimension provided in the ``semantic_spec``. For ``Classification`` tasks, the output should either contain valid classification labels or match the output dimension provided in the ``semantic_spec``.
If any step above fails or meets an error, the learnware will be marked as ``INVALID_LEARNWARE``. The learnwares that pass the ``EasyStatChecker`` are marked as ``USABLE_LEARNWARE``.
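
The input/output shape checks above amount to the following simplified sketch. It is illustrative only, not the package's actual checker; the status constants and the callable model are hypothetical stand-ins.

```python
import numpy as np

# Hypothetical status constants mirroring the checker's outcomes
USABLE_LEARNWARE, INVALID_LEARNWARE = 1, 0

def stat_check(model, input_shape, output_shape):
    """Simplified sketch of a statistical shape check: build an example
    input of the declared shape, run the model, and verify that the
    output shape matches the declared output shape."""
    try:
        example = np.random.randn(10, *input_shape)  # example batch of 10
        out = np.asarray(model(example))
        if out.shape[1:] != tuple(output_shape):  # per-sample shape mismatch
            return INVALID_LEARNWARE
        return USABLE_LEARNWARE
    except Exception:  # any failure during prediction invalidates the learnware
        return INVALID_LEARNWARE
```

A model mapping 4-dimensional inputs to 2-dimensional outputs passes with ``input_shape=(4,)`` and ``output_shape=(2,)``; a shape mismatch or a raised exception yields ``INVALID_LEARNWARE``.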
@@ -133,29 +135,30 @@ Hetero Market
-------------
The Hetero Market encompasses ``HeteroMapTableOrganizer``, ``HeteroSearcher``, and the checker list ``[EasySemanticChecker, EasyStatChecker]``.
-It represents an extended version of the Easy Market, capable of accommodating table learnwares from diverse feature spaces (referred to as heterogeneous table learnwares), thereby broadening the applicable scope of the learnware paradigm.
-This market trains a heterogeneous engine by utilizing existing learnware specifications to merge distinct specification islands and assign new specifications, referred to as ``HeteroMapTableSpecification``, to learnwares.
-As more learnwares are submitted, the heterogeneous engine will undergo continuous updates, with the aim of constructing a more precise specification world.
+It represents a preliminary extension of the Easy Market, designed to support tabular tasks by accommodating tabular learnwares from diverse feature spaces (referred to as heterogeneous table learnwares).
+This extension broadens the search scope and facilitates learnware identification and reuse across the entire learnware collection.
+The Hetero Market utilizes existing learnware specifications to train a heterogeneous engine, which merges distinct specification islands and assigns new specifications, known as ``HeteroMapTableSpecification``, to learnwares.
+As more learnwares are submitted, this heterogeneous engine will continuously update, hopefully leading to a more precise specification world.
``HeteroMapTableOrganizer``
+++++++++++++++++++++++++++
-``HeteroMapTableOrganizer`` overrides methods from ``EasyOrganizer`` and implements new methods to support the organization of heterogeneous table learnwares. Key features include:
+``HeteroMapTableOrganizer`` overrides methods from ``EasyOrganizer`` and implements new methods to support the management of heterogeneous table learnwares. Key features include:
- **reload_market**: Reloads the heterogeneous engine if there is one. Otherwise, initialize an engine with default configurations. Returns a flag indicating whether the market is reloaded successfully.
- **reset**: Resets the heterogeneous market with specific settings regarding the heterogeneous engine such as ``auto_update``, ``auto_update_limit`` and ``training_args`` configurations.
-- **add_learnware**: Add a learnware into the market, meanwhile assigning ``HeteroMapTableSpecification`` to the learnware using the heterogeneous engine. The engine's update process will be triggered if ``auto_update`` is set to True and the number of learnwares in the market with ``USABLE_LEARNWARE`` status exceeds ``auto_update_limit``. Return the ``learnware_id`` and ``learnwere_status``.
+- **add_learnware**: Adds a learnware into the market, meanwhile generating ``HeteroMapTableSpecification`` for the learnware using the heterogeneous engine. The engine's update process is triggered if ``auto_update`` is set to True and the number of learnwares in the market with ``USABLE_LEARNWARE`` status exceeds ``auto_update_limit``. Returns the ``learnware_id`` and ``learnware_status``.
- **delete_learnware**: Removes the learnware with ``id`` from the market and also removes its new specification if there is one. Return a flag of whether the deletion is successful.
- **update_learnware**: Update the learnware's ``zip_path``, ``semantic_spec``, ``check_status`` and its new specification if there is one. Return a flag indicating whether it passed the ``checker``.
-- **generate_hetero_map_spec**: Generate ``HeteroMapTableSpecification`` for users based on the information provided in ``user_info``.
+- **generate_hetero_map_spec**: Generate ``HeteroMapTableSpecification`` for users based on the user's statistical specification provided in ``user_info``.
- **train**: Build the heterogeneous engine using learnwares from the market that supports heterogeneous market training.
``HeteroSearcher``
++++++++++++++++++
-``HeteroSearcher`` builds upon ``EasySearcher`` with additional support for searching among heterogeneous table learnwares, returning helpful learnwares with feature space and label space different from the user's task requirements.
+``HeteroSearcher`` builds upon ``EasySearcher`` with additional support for searching among heterogeneous table learnwares, returning potentially helpful learnwares with feature and prediction spaces different from the user's task requirements.
The semantic specification search and statistical specification search have been integrated into the same interface ``HeteroSearcher``.
- **HeteroSearcher.__call__(self, user_info: BaseUserInfo, check_status: int = None, max_search_num: int = 5, search_method: str = "greedy") -> SearchResults**
diff --git a/docs/components/model.rst b/docs/components/model.rst
index 76952ea..7c33674 100644
--- a/docs/components/model.rst
+++ b/docs/components/model.rst
@@ -5,19 +5,16 @@ Model
A learnware is a well-performed trained model with a specification, where the model is an indispensable component of the learnware.
-
-In this section, we will first introduce the ``BaseModel``, which defines the standard format for models in the learnware package.
-Following that, we will introduce the ``ModelContainer``, which implements model deployment in conda virtual environments and Docker containers.
+In this section, we will introduce the model module implemented within the ``learnware`` package. We will first introduce the ``BaseModel``, which defines the standard format for models in the ``learnware`` package. Following that, we will introduce the ``ModelContainer``, which implements model deployment in conda virtual environments and Docker containers.
BaseModel
======================================
-The ``BaseModel`` class is a fundamental component of the learnware package and serves as a standard interface for defining machine learning models.
+The ``BaseModel`` class is a fundamental component of the learnware package which provides a standardized interface for model training, prediction, and fine-tuning.
This class is created to make it easier for users to submit learnwares to the market.
It helps ensure that submitted models follow a clear set of rules and requirements.
-The model in a learnware should inherit the ``BaseModel`` class.
-Here's a more detailed explanation of key components:
+All user models should inherit the ``BaseModel`` class. Here's a more detailed explanation of key components:
- ``input_shape``: Specify the shape of the input features your model expects.
- ``output_shape``: Define the shape of the output predictions generated by your model.
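
A user model might look like the following minimal sketch. The stand-in ``BaseModel`` below mirrors the interface described above (``input_shape``/``output_shape`` plus fit and predict); the real class lives in the ``learnware`` package, so treat exact signatures as assumptions and consult the package source.

```python
import numpy as np

class BaseModel:
    """Stand-in mirroring the described BaseModel interface;
    not the actual class from the ``learnware`` package."""
    def __init__(self, input_shape, output_shape):
        self.input_shape = input_shape    # expected input feature shape
        self.output_shape = output_shape  # shape of produced predictions
    def fit(self, X, y):
        raise NotImplementedError
    def predict(self, X):
        raise NotImplementedError

class MeanRegressor(BaseModel):
    """Toy model: always predicts the mean of the training targets."""
    def __init__(self):
        super().__init__(input_shape=(3,), output_shape=(1,))
        self._mean = 0.0
    def fit(self, X, y):
        self._mean = float(np.mean(y))
    def predict(self, X):
        return np.full((len(X), 1), self._mean)
```

Keeping ``input_shape`` and ``output_shape`` consistent with the model's actual behavior is what allows the market's checkers to validate a submitted learnware automatically.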
diff --git a/docs/components/spec.rst b/docs/components/spec.rst
index ed116f0..e3fe5f8 100644
--- a/docs/components/spec.rst
+++ b/docs/components/spec.rst
@@ -3,11 +3,11 @@
Specification
================================
-Learnware specification is the core component of the learnware paradigm, linking all processes about learnwares, including uploading, organizing, searching, deploying, and reusing.
+Learnware specification is the central component of the learnware paradigm, linking all processes related to learnwares, including uploading, organizing, searching, deploying, and reusing.
-In this section, we will introduce the concept and design of learnware specification in the ``learnware`` package.
-We will then explore ``regular specification``\ s tailored for different data types such as tables, images, and texts.
-Lastly, we cover a ``system specification`` specifically assigned to table learnwares by the learnware market, aimed at accommodating all available table learnwares into a unified "specification world" despite their heterogeneity.
+In this section, we will introduce the concept and design of learnware specification within the ``learnware`` package.
+We will then explore ``regular specification``\ s covering data types including tables, images, and texts.
+Lastly, we introduce a ``system specification`` specifically generated for tabular learnwares by the learnware doc system using its knowledge, enhancing learnware management and further characterizing their capabilities.
Concepts & Types
==================
@@ -18,18 +18,18 @@ The ``learnware`` package employs a highly extensible specification design, whic
- **Semantic specification** describes the model's type and functionality through a set of descriptions and tags. Learnwares with similar semantic specifications reside in the same specification island
- **Statistical specification** characterizes the statistical information contained in the model using various machine learning techniques. It plays a crucial role in locating the appropriate place for the model within the specification island.
-When searching in the learnware market, the system first locates specification islands based on the semantic specification of the user's task,
-then pinpoints highly beneficial learnwares on these islands based on the statistical specification of the user's task.
+When searching in the learnware doc system, the system first locates specification islands based on the semantic specification of the user's task,
+then pinpoints potentially beneficial learnwares on these islands based on the statistical specification of the user's task.
Statistical Specification
---------------------------
-We employ the ``Reduced Kernel Mean Embedding (RKME) Specification`` as the foundation for implementing statistical specification for diverse data types,
+We employ the ``Reduced Kernel Mean Embedding (RKME) Specification`` as the basis for implementing statistical specification for diverse data types,
with adjustments made according to the characteristics of each data type.
-The RKME specification is a recent development in learnware specification design, which represents the distribution of a model's training data in a privacy-preserving manner.
+The RKME specification is a recent development in learnware specification design, which captures the data distribution while not disclosing the raw data.
-Within the ``learnware`` package, you will find two types of statistical specifications: ``regular specification`` and ``system specification``. The former is generated locally
-by users to express their model's statistical information, while the learnware market assigns the latter to accommodate and organize heterogeneous learnwares.
+There are two types of statistical specifications within the ``learnware`` package: ``regular specification`` and ``system specification``. The former is generated locally
+by users to express their model's statistical information. In contrast, the latter is generated by the learnware doc system to enhance learnware management and further characterizing the learnwares' capabilities.
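
Conceptually, an RKME summarizes a dataset with a few weighted pseudo-samples whose kernel mean embedding stays close to that of the raw data; the quality of the summary can be measured by the maximum mean discrepancy (MMD). The following numpy sketch with an RBF kernel is illustrative only, not the package's implementation:

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """RBF kernel matrix: k(a, b) = exp(-gamma * ||a - b||^2)."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def mmd_sq(X, Z, beta, gamma=1.0):
    """Squared MMD between the empirical distribution of raw data X
    and an RKME given by pseudo-samples Z with weights beta."""
    alpha = np.full(len(X), 1.0 / len(X))  # uniform weights on raw data
    return (alpha @ rbf_kernel(X, X, gamma) @ alpha
            - 2 * alpha @ rbf_kernel(X, Z, gamma) @ beta
            + beta @ rbf_kernel(Z, Z, gamma) @ beta)
```

A good RKME ``(Z, beta)`` makes ``mmd_sq`` small; only the pseudo-samples and weights, never ``X`` itself, need to be shared.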
Semantic Specification
-----------------------
@@ -37,8 +37,8 @@ Semantic Specification
The semantic specification consists of a "dict" structure that includes keywords "Data", "Task", "Library", "Scenario", "License", "Description", and "Name".
In the case of table learnwares, users should additionally provide descriptions for each feature dimension and output dimension through the "Input" and "Output" keywords.
-- If "data_type" is "Table", you need to specify the semantics of each dimension of the model's input data to make the uploaded learnware suitable for tasks with heterogeneous feature spaces.
-- If "task_type" is "Classification", you need to provide the semantics of model output labels (prediction labels start from 0), making the uploaded learnware suitable for classification tasks with heterogeneous output spaces.
+- If "data_type" is "Table", you need to specify the semantics of each dimension of the model's input data for compatibility with tasks in heterogeneous feature spaces.
+- If "task_type" is "Classification", you need to provide the semantics of model output labels (prediction labels start from 0) for use in classification tasks with heterogeneous output spaces.
- If "task_type" is "Regression", you need to specify the semantics of each dimension of the model output, making the uploaded learnware suitable for regression tasks with heterogeneous output spaces.
Regular Specification
@@ -56,7 +56,7 @@ as shown in the following code:
regular_spec = generate_stat_spec(type=data_type, x=train_x)
regular_spec.save("stat.json")
-It is worth noting that the above code only runs on the user's local computer and does not interact with cloud servers or leak local private data.
+It is worth noting that the above code only runs on the user's local computer and does not interact with cloud servers or leak local raw data.
.. note::
@@ -65,14 +65,17 @@ It is worth noting that the above code only runs on the user's local computer an
Table Specification
--------------------------
-The ``regular specification`` for tabular learnware is essentially the RKME specification of the model's training table data. No additional adjustment is needed.
+``RKMETableSpecification`` implements the RKME specification, which is the basis of tabular learnwares. It facilitates learnware identification and reuse for homogeneous tasks with identical input and output domains.
Image Specification
--------------------------
-Image data lives in a higher dimensional space than other data types. Unlike lower dimensional spaces, metrics defined based on Euclidean distances (or similar distances) will fail in higher dimensional spaces. This means that measuring the similarity between image samples becomes difficult.
+Image data lives in a higher dimensional space than other data types. Unlike lower dimensional spaces,
+metrics defined based on Euclidean distances (or similar distances) will fail in higher dimensional spaces.
+This means that measuring the similarity between image samples becomes difficult.
-To address these issues, we use the Neural Tangent Kernel (NTK) based on Convolutional Neural Networks (CNN) to measure the similarity of image samples. As we all know, CNN has greatly advanced the field of computer vision and is still a mainstream deep-learning technique.
+The specification for image data ``RKMEImageSpecification`` introduces a new kernel function that transforms images implicitly before RKME calculation.
+It employs the Neural Tangent Kernel (NTK) [1]_, a theoretical tool that characterizes the training dynamics of deep neural networks in the infinite width limit, to enhance the measurement of image similarity in high-dimensional spaces.
Usage & Example
^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -102,14 +105,14 @@ By randomly sampling a subset of the dataset, we can efficiently construct Image
spec = generate_rkme_image_spec(X, sample_size=5000)
spec.save("cifar10.json")
-Privacy Protection
+Raw Data Protection
^^^^^^^^^^^^^^^^^^^^^^^^^^
-In the third row of the figure, we show the eight pseudo-data with the largest weights`\beta` in the Image Specification generated on the CIFAR-10 dataset.
-Notice that the Image Specification generated based on Neural Tangent Kernel (NTK) protects the user's privacy very well.
+In the third row of the figure, we show the eight pseudo-data with the largest weights :math:`\beta` in the ``RKMEImageSpecification`` generated on the CIFAR-10 dataset.
+Notice that the ``RKMEImageSpecification`` generated based on Neural Tangent Kernel (NTK) doesn't compromise raw data security.
In contrast, we show the performance of the RBF kernel on image data in the first row of the figure below.
-The RBF not only exposes the real data (plotted in the corresponding position in the second row) but also fails to fully utilize the weights :math:`\beta`.
+The RBF not only exposes the original data (plotted in the corresponding position in the second row) but also fails to fully utilize the weights :math:`\beta`.
.. image:: ../_static/img/image_spec.png
:align: center
@@ -122,19 +125,16 @@ Different from tabular data, each text input is a string of different length, so
System Specification
======================================
-In contrast to ``regular specification``\ s, which are generated solely by users,
-``system specification``\ s are higher-level statistical specifications assigned by learnware markets
-to effectively accommodate and organize heterogeneous learnwares.
-This implies that ``regular specification``\ s are usually applicable across different markets, while ``system specification``\ s are generally closely associated
-with particular learnware market implementations.
+In addition to ``regular specification``\ s, the learnware doc system leverages its knowledge to generate new ``system specification``\ s for learnwares.
+For newly inserted learnwares, the ``organizer`` automatically generates system specifications based on existing learnware statistical specifications, facilitating search operations and expanding the search scope.
-``system specification`` plays a critical role in heterogeneous markets such as the ``Hetero Market``:
-- Learnware organizers use these specifications to connect isolated specification islands into unified "specification world"s.
-- Learnware searchers perform helpful learnware recommendations among all table learnwares in the market, leveraging the ``system specification``\ s generated for users.
+Currently, the ``learnware`` package has implemented the ``HeteroMapTableSpecification`` which enables learnwares organized by the ``Hetero Market`` to support tasks with varying feature and prediction spaces.
+This specification is derived by mapping the ``RKMETableSpecification`` to a unified semantic embedding space, utilizing the heterogeneous engine, a tabular network trained on the feature semantics of all tabular learnwares.
+Please refer to `COMPONENTS: Hetero Market <../components/market.html#hetero-market>`_ for implementation details.
-The ``learnware`` package now includes a type of ``system specification``, named ``HeteroMapTableSpecification``, made especially for the ``Hetero Market`` implementation.
-This specification is automatically given to all table learnwares when they are added to the ``Hetero Market``.
-It is also set up to be updated periodically, ensuring it remains accurate as the learnware market evolves and builds more precise specification worlds.
-Please refer to `COMPONENTS: Hetero Market <../components/market.html#hetero-market>`_ for implementation details.
\ No newline at end of file
+References
+-----------
+
+.. [1] Adrià Garriga-Alonso, Laurence Aitchison, and Carl Edward Rasmussen. Deep convolutional networks as shallow gaussian processes. In *International Conference on Learning Representations*, 2019.
\ No newline at end of file
diff --git a/docs/index.rst b/docs/index.rst
index 946a5c5..0ea88af 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -7,9 +7,11 @@
``Learnware`` Documentation
============================================================
-The ``learnware`` package provides a fundamental implementation of the central concepts and procedures for the learnware paradigm.
-A learnware is a well-performed trained machine learning model with a specification that enables it to be adequately identified to reuse according to the requirement of future users who may know nothing about the learnware in advance.
-The learnware paradigm is a new paradigm aimed at enabling users to reuse existed well-trained models to solve their AI tasks instead of starting from scratch.
+The ``learnware`` package provides a fundamental implementation of the central concepts and procedures and encompasses all processes within the *learnware paradigm*,
+including the submission, usability testing, organization, identification, deployment, and reuse of learnwares.
+Its well-structured design ensures high scalability and facilitates the seamless integration of additional features and techniques in the future.
+In addition, the ``learnware`` package serves as the core engine for the `Beimingwu System `_, which supports the computational and algorithmic aspects of ``Beimingwu``
+and offers rich algorithmic interfaces for learnware-related tasks and research experiments.
.. _user_guide:
diff --git a/docs/references/beimingwu.rst b/docs/references/beimingwu.rst
index e5f626b..a25a6b1 100644
--- a/docs/references/beimingwu.rst
+++ b/docs/references/beimingwu.rst
@@ -5,25 +5,25 @@ Beimingwu System
`Beimingwu `_ is the first systematic open-source implementation of learnware dock system, providing a preliminary research platform for learnware studies. Developers worldwide can submit their models freely to the learnware dock. They can generate specifications for the model with the help of Beimingwu without disclosing their raw data, and then the model and specification can be assembled into a learnware, which will be accommodated in the learnware dock. Future users can solve their tasks by submitting their requirements and reusing helpful learnwares returned by Beimingwu, while also not disclosing their own data. It is anticipated that after Beimingwu accumulates millions of learnwares, an "emergent" behavior may occur: machine learning tasks that have never been specifically tackled may be solved by assembling and reusing some existing learnwares.
-The ``learnware`` package is the cornerstone of the Beimingwu system, functioning as its core engine.
-It offers a comprehensive suite of central APIs that encompass a wide range of functionalities, including the submission, verification, organization, search, and deployment of learnware.
-This integration ensures a streamlined and efficient process, facilitating seamless interactions within the system.
+The ``learnware`` package serves as the core engine for the ``Beimingwu`` system, which supports the computational and algorithmic aspects of ``Beimingwu``.
+It offers a comprehensive suite of unified and scalable interfaces that encompass all processes within the learnware paradigm, including the submission, usability testing, organization, management, identification, deployment, and reuse of learnwares.
+This integration ensures a streamlined and efficient process, facilitating seamless interactions within the system and provides a foundation for future research in organization, identification and reuse algorithms.
Core Features in the Beimingwu System
=======================================
The Beimingwu learnware dock system, serving as a preliminary research platform for learnware, systematically implements the core processes of the learnware paradigm for the first time:
-- ``Submitting Stage``: The system includes multiple detection mechanisms to ensure the quality of uploaded learnwares. Additionally, the system trains a heterogeneous engine based on existing learnware specifications in the system to merge different specification islands and assign new specifications to learnwares. With more learnwares are submitted, the heterogeneous engine will continue to update, achieving continuous iteration of learnware specifications and building a more precise specification world.
-- ``Deploying Stage``: After users upload task requirements, the system automatically selects whether to recommend a single learnware or multiple learnware combinations and provides efficient deployment methods. Whether it's a single learnware or a combination of multiple learnwares, the system offers convenient learnware reuse tools.
+- ``Submitting Stage``: The system includes multiple detection mechanisms to ensure the quality of uploaded learnwares. Additionally, the system trains a heterogeneous engine based on existing learnware specifications in the system to merge different specification islands and assign new specifications to learnwares. With the submission of more learnwares, the heterogeneous engine will continually update, aiming to construct a more precise specification world through the constant iteration of learnware specifications.
+- ``Deploying Stage``: After users upload task requirements, the system automatically selects whether to recommend a single learnware or multiple learnware combinations and provides efficient deployment methods. Whether it's a single learnware or a combination of multiple learnwares, the system offers baseline learnware reuse methods in a uniform format for convenient usage.
In addition, the Beimingwu system also has the following features:
-- ``Learnware Specification Generation``: The Beimingwu system provides specification generation interfaces in the learnware package, supporting various data types (tables, images, and text) for efficient local generation.
+- ``Learnware Specification Generation``: The Beimingwu system provides specification generation interfaces in the ``learnware`` package, supporting various data types (tables, images, and text) for efficient local generation.
- ``Learnware Quality Inspection``: The Beimingwu system includes multiple detection mechanisms to ensure the quality of each learnware in the system.
-- ``Diverse Learnware Search``: The Beimingwu system supports both semantic specifications and statistical specifications searches, covering data types such as tables, images, and text. In addition, for table-based tasks, the system also supports the search for heterogeneous table learnwares.
-- ``Local Learnware Deployment``: The Beimingwu system provides interfaces for learnware deployment and learnware reuse in the learnware package, facilitating users' convenient and secure learnware deployment.
-- ``Data Privacy Protection``: The Beimingwu system operations, including learnware upload, search, and deployment, do not require users to upload local data. All relevant statistical specifications are generated locally by users, ensuring data privacy.
-- ``Open Source System``: The Beimingwu system's source code is open-source, including the learnware package and frontend/backend code. The learnware package is highly extensible, making it easy to integrate new specification designs, learnware system designs, and learnware reuse methods in the future.
+- ``Diverse Learnware Search``: The Beimingwu system supports both semantic specifications and statistical specifications searches, covering data types such as tables, images, and text. In addition, for table-based tasks, the system preliminarily supports the search for heterogeneous table learnwares.
+- ``Local Learnware Deployment``: The Beimingwu system provides a unified interface for learnware deployment and learnware reuse in the ``learnware`` package, facilitating users' convenient and secure deployment and reuse of arbitrary learnwares.
+- ``Raw Data Protection``: The Beimingwu system operations, including learnware upload, search, and deployment, do not require users to upload raw data. All relevant statistical specifications are generated locally by users using open-source APIs.
+- ``Open Source System``: The Beimingwu system's source code is open-source, including the learnware package and frontend/backend code. The ``learnware`` package is highly extensible, making it easy to integrate new specification designs, learnware system designs, and learnware reuse methods in the future.
Building the learnware paradigm requires collective efforts from the community. As the first learnware dock system, Beimingwu is still in its early stages, with much room for improvement in related technologies. We sincerely invite the community to upload models, collaborate in system development, and engage in research and enhancements in learnware algorithms. Your valuable feedback is essential for the continuous improvement of the system.
\ No newline at end of file
diff --git a/docs/start/exp.rst b/docs/start/exp.rst
index b683929..b18888f 100644
--- a/docs/start/exp.rst
+++ b/docs/start/exp.rst
@@ -4,7 +4,7 @@
Experiments and Examples
================================
-This chapter will introduce related experiments to illustrate the search and reuse performance of our learnware system.
+In this section, we build various experimental scenarios and conduct an extensive empirical study to evaluate the baseline algorithms, implemented and refined in the ``learnware`` package, for specification generation, learnware identification, and reuse on tabular, image, and text data.
Environment
====================
diff --git a/docs/start/intro.rst b/docs/start/intro.rst
index d1dcd6e..b29553b 100644
--- a/docs/start/intro.rst
+++ b/docs/start/intro.rst
@@ -5,9 +5,9 @@ Introduction
*Learnware* was proposed by Professor Zhi-Hua Zhou in 2016 [1, 2]. In the *learnware paradigm*, developers worldwide can share models with the *learnware dock system*, which effectively searches for and reuse learnware(s) to help users solve machine learning tasks efficiently without starting from scratch.
-The ``learnware`` package provides a fundamental implementation of the central concepts and procedures within the learnware paradigm. Its well-structured design ensures high scalability and facilitates the seamless integration of additional features and techniques in the future.
+The ``learnware`` package provides a fundamental implementation of the central concepts and procedures and encompasses all processes within the *learnware paradigm*, including the submission, usability testing, organization, identification, deployment, and reuse of learnwares. Its well-structured design ensures high scalability and facilitates the seamless integration of additional features and techniques in the future.
-In addition, the ``learnware`` package serves as the engine for the `Beimingwu System `_ and can be effectively employed for conducting experiments related to learnware.
+In addition, the ``learnware`` package serves as the core engine for the `Beimingwu System `_, which supports the computational and algorithmic aspects of ``Beimingwu`` and offers rich algorithmic interfaces for learnware-related tasks and research experiments.
| [1] Zhi-Hua Zhou. Learnware: on the future of machine learning. *Frontiers of Computer Science*, 2016, 10(4): 589–590
| [2] Zhi-Hua Zhou. Machine Learning: Development and Future. *Communications of CCF*, 2017, vol.13, no.1 (2016 CNCC keynote)
@@ -32,31 +32,30 @@ The Benefits of Learnware Paradigm
Machine learning has achieved great success in many fields but still faces various challenges, such as the need for extensive training data and advanced training techniques, the difficulty of continuous learning, the risk of catastrophic forgetting, and the leakage of data privacy.
-Although there are many efforts focusing on one of these issues separately, they are entangled, and solving one problem may exacerbate others. The learnware paradigm aims to address many of these challenges through a unified framework.
-
-+-----------------------+-----------------------------------------------------------------------------------------------+
-| Benefit | Description |
-+=======================+===============================================================================================+
-| Lack of training data | Strong models can be built with small data by adapting well-performed learnwares. |
-+-----------------------+-----------------------------------------------------------------------------------------------+
-| Lack of training | Ordinary users can obtain strong models by leveraging well-performed learnwares instead of |
-| skills | building models from scratch. |
-+-----------------------+-----------------------------------------------------------------------------------------------+
-| Catastrophic | Accepted learnwares are always stored in the learnware market, retaining old knowledge. |
-| forgetting | |
-+-----------------------+-----------------------------------------------------------------------------------------------+
-| Continual learning | The learnware market continually enriches its knowledge with constant submissions of |
-| | well-performed learnwares. |
-+-----------------------+-----------------------------------------------------------------------------------------------+
-| Data privacy/ | Developers only submit models, not data, preserving data privacy/proprietary. |
-| proprietary | |
-+-----------------------+-----------------------------------------------------------------------------------------------+
-| Unplanned tasks | Open to all legal developers, the learnware market can accommodate helpful learnwares for |
-| | various tasks. |
-+-----------------------+-----------------------------------------------------------------------------------------------+
-| Carbon emission | Assembling small models may offer good-enough performance, reducing interest in training |
-| | large models and the carbon footprint. |
-+-----------------------+-----------------------------------------------------------------------------------------------+
+Although many efforts focus on one of these issues separately, in practice these issues are entangled, and solving one may exacerbate others. The learnware paradigm aims to tackle many of these challenges through a unified framework:
+
++-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
+| Challenges | Learnware Paradigm Solutions |
++=======================+================================================================================================================================================================================+
+| Lack of training data | Strong models can be built with a small amount of data by refining well-performing learnwares. |
++-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
+| Lack of training | Users across all levels of expertise can adequately utilize numerous high-quality and potentially helpful learnwares |
+| skills | identified by the system for their specific tasks. |
++-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
+| Catastrophic | Learnwares which pass the usability checks are always stored in the learnware doc system, retaining old knowledge. |
+| forgetting | |
++-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
+| Continual learning | The learnware doc system continually expands its knowledge base with constant submissions of |
+| | well-performed learnwares. |
++-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
+| Data privacy/ | Developers worldwide freely share their high-performing models, without revealing their training data. |
+| proprietary | |
++-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
+| Unplanned tasks | Open to all legal developers, the learnware doc system accommodate helpful learnwares for |
+| | various tasks, especially for unplanned, specialized, data-sensitive scenarios. |
++-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
+| Carbon emission | By assembling the most suitable small learnwares, local deployment becomes feasible, offering a practical alternative to large cloud-based models and their carbon footprints. |
++-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
How to Solve Future Tasks with Learnware Paradigm?
----------------------------------------------------
@@ -64,13 +63,13 @@ How to Solve Future Tasks with Learnware Paradigm?
.. image:: ../_static/img/learnware_paradigm.jpg
:align: center
-Instead of building a model from scratch, users can submit their requirements to the learnware market, which then identifies and deploys helpful learnware(s) based on the specifications. Users can apply the learnware directly, adapt it using their data, or exploit it in other ways to improve their models. This process is more efficient and less expensive than building a model from scratch.
+When a user wants to solve a new machine learning task, she can submit her task requirements to the learnware dock system, which will identify and assemble helpful learnware(s) from among its numerous learnwares based on their specifications and return them to the user. She can apply the learnware(s) directly, adapt them with her own data, or exploit them in other ways to improve her own model. Whichever reuse mechanism is adopted, the whole process can be far less expensive and more efficient than building a model from scratch.
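As a rough intuition for this identify-and-reuse workflow, here is a toy, self-contained sketch. Everything in it is an illustrative assumption: a mock market of two models, feature-mean "specifications", and nearest-specification matching. The real system uses RKME specifications and the ``learnware`` package APIs, which are far more sophisticated.

```python
# Toy sketch of the learnware paradigm: a mock "market" holds models,
# each paired with a simple statistical "specification" (here, just the
# feature mean of its training data). Identification returns the model
# whose specification is closest to the user's data distribution.

def make_model(weight):
    # Hypothetical linear "model": predicts weight * sum(features)
    return lambda x: weight * sum(x)

market = [
    {"spec": [0.0, 0.0], "model": make_model(1.0)},  # trained near the origin
    {"spec": [5.0, 5.0], "model": make_model(2.0)},  # trained on larger values
]

def mean_vector(data):
    n = len(data)
    return [sum(row[i] for row in data) / n for i in range(len(data[0]))]

def identify(market, user_data):
    """Return the learnware whose specification best matches the user's data."""
    user_spec = mean_vector(user_data)
    sq_dist = lambda a, b: sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(market, key=lambda lw: sq_dist(lw["spec"], user_spec))

user_data = [[4.5, 5.5], [5.5, 4.5]]       # the user's unlabeled task data
best = identify(market, user_data)         # the system identifies a learnware
predictions = [best["model"](x) for x in user_data]  # direct reuse
```

Because the user's data centers around ``[5.0, 5.0]``, the second model is identified and reused directly, without any training on the user's side.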
Procedure of Learnware Paradigm
==================================
-- ``Submitting Stage``: Developers voluntarily submit various learnwares to the learnware market, and the system conducts quality checks and further organization of these learnwares.
-- ``Deploying Stage``: When users submit task requirements, the learnware market automatically selects whether to recommend a single learnware or a combination of multiple learnwares and provides efficient deployment methods. Whether it's a single learnware or a combination of multiple learnwares, the system offers convenient learnware reuse interfaces.
+- ``Submitting Stage``: Developers voluntarily submit various learnwares to the learnware dock system, and the system conducts quality checks and further organization of these learnwares.
+- ``Deploying Stage``: The user submits her task requirements to the learnware dock system, and the system identifies and returns helpful learnwares based on specifications, which can then be reused on the user's data.
.. image:: ../_static/img/learnware_market.svg
:align: center
@@ -86,15 +85,15 @@ The architecture is designed based on the guidelines including *decoupling*, *au
- At the workflow level, the ``learnware`` package consists of ``Submitting Stage`` and ``Deploying Stage``.
-+---------------------+-------------------------------------------------------------------------------------------------------------------+
-| Module | Workflow |
-+=====================+===================================================================================================================+
-| ``Submitting Stage``| The learnware developers submit learnwares to the learnware market, which conducts usability checks and further |
-| | organization of these learnwares. |
-+---------------------+-------------------------------------------------------------------------------------------------------------------+
-| ``Deploying Stage`` | The `learnware` package identifies learnwares according to users’ task requirements and provides efficient |
-| | reuse and deployment methods. |
-+---------------------+-------------------------------------------------------------------------------------------------------------------+
++----------------------+---------------------------------------------------------------------------------------------------------------------+
+| Module | Workflow |
++======================+=====================================================================================================================+
+| ``Submitting Stage`` | The learnware developers submit learnwares to the learnware doc system, which conducts usability checks and further |
+| | organization of these learnwares. |
++----------------------+---------------------------------------------------------------------------------------------------------------------+
+| ``Deploying Stage`` | The `learnware` package identifies learnwares according to users’ task requirements and provides efficient |
+| | reuse and deployment methods. |
++----------------------+---------------------------------------------------------------------------------------------------------------------+
- At the module level, the ``learnware`` package is a platform that consists of ``Learnware``, ``Market``, ``Specification``, ``Model``, ``Reuse``, and ``Interface`` modules.
diff --git a/docs/start/quick.rst b/docs/start/quick.rst
index dc4fb59..d5b00cb 100644
--- a/docs/start/quick.rst
+++ b/docs/start/quick.rst
@@ -7,8 +7,7 @@ Quick Start
Introduction
====================
-This ``Quick Start`` guide aims to illustrate the straightforward process of establishing a full ``Learnware`` workflow
-and utilizing ``Learnware`` to handle user tasks.
+This ``Quick Start`` guide aims to illustrate the straightforward process of establishing a full ``Learnware`` workflow and utilizing ``Learnware`` to handle user tasks.
Installation
@@ -47,9 +46,8 @@ Learnware Package Workflow
Users can start a ``Learnware`` workflow according to the following steps:
Initialize a Learnware Market
--------------------------------
+------------------------------
-The ``EasyMarket`` class provides the core functions of a ``Learnware Market``.
You can initialize a basic ``Learnware Market`` named "demo" using the code snippet below:
.. code-block:: python
@@ -63,12 +61,9 @@ You can initialize a basic ``Learnware Market`` named "demo" using the code snip
Upload Learnware
-------------------------------
-Before uploading your learnware to the ``Learnware Market``,
-you'll need to create a semantic specification, ``semantic_spec``. This involves selecting or inputting values for predefined semantic tags
-to describe the features of your task and model.
+Before uploading your learnware to the ``Learnware Market``, you'll need to create a semantic specification, ``semantic_spec``. This involves selecting or inputting values for semantic tags to describe the features of your task and model.
-For instance, the following codes illustrates the semantic specification for a Scikit-Learn type model.
-This model is tailored for education scenarios and performs classification tasks on tabular data:
+For instance, the following code illustrates the semantic specification for a Scikit-Learn type model. This model is tailored for education scenarios and performs classification tasks on tabular data:
.. code-block:: python
@@ -83,8 +78,7 @@ This model is tailored for education scenarios and performs classification tasks
license="MIT",
)
-After defining the semantic specification,
-you can upload your learnware using a single line of code:
+After preparing the semantic specification, you can insert your learnware into the ``Learnware Market`` using a single line of code:
.. code-block:: python
@@ -96,8 +90,7 @@ Here, ``zip_path`` is the directory of your learnware ``zip`` package.
Semantic Specification Search
-------------------------------
-To find learnwares that align with your task's purpose, you'll need to provide a semantic specification, ``user_semantic``, that outlines your task's characteristics.
-The ``Learnware Market`` will then perform an initial search using ``user_semantic``, identifying potentially useful learnwares with models that solve tasks similar to your requirements.
+To identify learnwares that align with your task's purpose, you'll need to provide a semantic specification, ``user_semantic``, that outlines your task's characteristics. The ``Learnware Market`` will then perform an initial search based on ``user_semantic``, which filters learnwares by considering the semantic information of your task.
.. code-block:: python
@@ -105,7 +98,7 @@ The ``Learnware Market`` will then perform an initial search using ``user_semant
user_info = BaseUserInfo(id="user", semantic_spec=semantic_spec)
# search_learnware: performs semantic specification search when user_info doesn't include a statistical specification
- search_result = easy_market.search_learnware(user_info)
+    search_results = demo_market.search_learnware(user_info)
single_result = search_results.get_single_results()
# single_result: the List of Tuple[Score, Learnware] returned by semantic specification search
@@ -115,9 +108,7 @@ The ``Learnware Market`` will then perform an initial search using ``user_semant
Statistical Specification Search
---------------------------------
-If you decide in favor of porviding your own statistical specification file, ``stat.json``,
-the ``Learnware Market`` can further refine the selection of learnwares from the previous step.
-This second-stage search leverages statistical information to identify one or more learnwares that are most likely to be beneficial for your task.
+If you generate and provide a statistical specification file ``rkme.json``, the ``Learnware Market`` will conduct learnware identification based on statistical information and return more targeted models. Using the API we provide, you can easily generate this statistical specification locally.
For example, the code below executes learnware search when using Reduced Kernel Mean Embedding (RKME) as the statistical specification:
@@ -132,7 +123,7 @@ For example, the code below executes learnware search when using Reduced Kernel
user_info = BaseUserInfo(
semantic_spec=user_semantic, stat_info={"RKMETableSpecification": user_spec}
)
- search_result = easy_market.search_learnware(user_info)
+    search_results = demo_market.search_learnware(user_info)
single_result = search_results.get_single_results()
multiple_result = search_results.get_multiple_results()
@@ -153,31 +144,30 @@ For example, the code below executes learnware search when using Reduced Kernel
Reuse Learnwares
-------------------------------
-With the list of learnwares, ``mixture_learnware_list``, returned from the previous step, you can readily apply them to make predictions on your own data, bypassing the need to train a model from scratch.
-We offer provide two methods for reusing a given list of learnwares: ``JobSelectorReuser`` and ``AveragingReuser``.
-Just substitute ``test_x`` in the code snippet below with your own testing data, and you're all set to reuse learnwares:
+We offer two data-free methods ``JobSelectorReuser`` and ``AveragingReuser`` for reusing a given list of learnwares. Please substitute ``test_x`` in the code snippet below with your own testing data:
.. code-block:: python
from learnware.reuse import JobSelectorReuser, AveragingReuser
- # using jobselector reuser to reuse the searched learnwares to make prediction
+ # Use job selector reuser to reuse the searched learnwares to make prediction
reuse_job_selector = JobSelectorReuser(learnware_list=mixture_item.learnwares)
job_selector_predict_y = reuse_job_selector.predict(user_data=test_x)
- # using averaging ensemble reuser to reuse the searched learnwares to make prediction
+ # Use averaging ensemble reuser to reuse the searched learnwares to make prediction
reuse_ensemble = AveragingReuser(learnware_list=mixture_item.learnwares)
ensemble_predict_y = reuse_ensemble.predict(user_data=test_x)
-We also provide two method when the user has labeled data for reusing a given list of learnwares: ``EnsemblePruningReuser`` and ``FeatureAugmentReuser``.
-Just substitute ``test_x`` in the code snippet below with your own testing data, and substitute ``train_X, train_y`` with your own training labeled data, and you're all set to reuse learnwares:
+We also provide two data-dependent methods, ``EnsemblePruningReuser`` and ``FeatureAugmentReuser``, for cases where the user has a small amount of labeled data for refining a given list of learnwares. Here's an example of adopting multiple returned learnwares with labeled data to solve classification tasks:
.. code-block:: python
from learnware.reuse import EnsemblePruningReuser, FeatureAugmentReuser
# Use ensemble pruning reuser to reuse the searched learnwares to make prediction
+    # (train_X, train_y) is the small amount of labeled data
+ # `mode` has two options "classification" and "regression"
reuse_ensemble = EnsemblePruningReuser(learnware_list=mixture_item.learnwares, mode="classification")
reuse_ensemble.fit(train_X, train_y)
ensemble_pruning_predict_y = reuse_ensemble.predict(user_data=test_x)
@@ -190,6 +180,5 @@ Just substitute ``test_x`` in the code snippet below with your own testing data,
Auto Workflow Example
============================
-The ``Learnware`` also offers automated workflow examples.
-This includes preparing learnwares, uploading and deleting learnwares from the market, and searching for learnwares using both semantic and statistical specifications.
+The ``Learnware`` also offers automated workflow examples. This includes preparing learnwares, uploading and deleting learnwares from the market, and searching for learnwares using both semantic and statistical specifications.
To experience the basic workflow of the Learnware Market, please refer to `Learnware Examples `_.
diff --git a/docs/workflows/client.rst b/docs/workflows/client.rst
index 0482a3c..45a34f4 100644
--- a/docs/workflows/client.rst
+++ b/docs/workflows/client.rst
@@ -6,7 +6,7 @@ Learnware Client
Introduction
====================
-``Learnware Client`` is a ``Python API`` that provides a convenient interface for interacting with the ``BeimingWu`` system. You can easily use the client to upload, download, delete, update, and search learnwares.
+``Learnware Client`` is a ``Python API`` that provides a convenient interface for interacting with the ``Beimingwu`` system. You can easily use the client to upload, download, delete, update, and search learnwares.
Prepare access token
@@ -36,10 +36,12 @@ Where email is the registered mailbox of the system and token is the token obtai
Upload Learnware
-------------------------------
-Before uploading a learnware, you'll need to prepare the semantic specification of your learnware. Let's take the classification task for tabular data as an example. You can create a semantic specification by a helper function ``create_semantic_specification``.
+Before uploading a learnware, you'll need to prepare its semantic specification. Let's take the classification task on tabular data as an example. You can create a semantic specification with the helper function ``generate_semantic_spec``.
.. code-block:: python
+    from learnware.specification import generate_semantic_spec
+
# Prepare input description when data_type="Table"
input_description = {
"Dimension": 5,
@@ -63,7 +65,7 @@ Before uploading a learnware, you'll need to prepare the semantic specification
}
# Create semantic specification
- semantic_spec = client.create_semantic_specification(
+ semantic_spec = generate_semantic_spec(
name="learnware_example",
description="Just a example for uploading a learnware",
data_type="Table",
@@ -75,7 +77,7 @@ Before uploading a learnware, you'll need to prepare the semantic specification
output_description=output_description,
)
-Ensure that the input parameters for the semantic specification fall within the specified ranges provided by ``client.list_semantic_specification_values(key)``:
+Please ensure that the input parameters for the semantic specification fall within the specified ranges provided by ``client.list_semantic_specification_values(key)``:
* "data_type" must be within the range of ``key=SemanticSpecificationKey.DATA_TYPE``.
* "task_type" must be within the range of ``key=SemanticSpecificationKey.TASK_TYPE``.
@@ -87,7 +89,7 @@ Ensure that the input parameters for the semantic specification fall within the
Finally, fill in the semantic specification and the zip package path of the learnware to upload it.
-Remember to verify the learnware before uploading it, as shown in the following code example:
+Remember to validate your learnware before uploading it, as shown in the following code example:
.. code-block:: python
@@ -104,7 +106,7 @@ Remember to verify the learnware before uploading it, as shown in the following
learnware_zip_path=zip_path, semantic_specification=semantic_spec
)
-After uploading the learnware successfully, you can see it in ``My Learnware``, the background will check it. Click on the learnware, which can be viewed in the ``Verify Status``. After the check passes, the Unverified tag of the learnware will disappear, and the uploaded learnware will appear in the system.
+After the learnware is uploaded successfully, you can see it in ``Personal Information - My Learnware``, where the system will check it in the background. Click on the learnware to view its ``Verify Status``. After the check passes, the ``Unverified`` tag of the learnware will disappear, and the uploaded learnware will appear in the system.
Update Learnware
-------------------------------
@@ -153,37 +155,40 @@ The ``delete_learnware`` method is used to delete a learnware from the server.
Semantic Specification Search
-------------------------------
-You can search the learnware in the system through the semantic specification, and all the learnware conforming to the semantic specification will be returned through the API. For example, the following code will give you all the learnware in the system whose task type is classified:
+You can search for learnware(s) in the system through semantic specifications, and all learnwares that meet the semantic specifications will be returned via the API. For example, the following code retrieves all learnware in the system with a task type of "Classification":
.. code-block:: python
from learnware.market import BaseUserInfo
- user_semantic = client.create_semantic_specification(
+ user_semantic = generate_semantic_spec(
task_type="Classification"
)
user_info = BaseUserInfo(semantic_spec=user_semantic)
- learnware_list = client.search_learnware(user_info, page_size=None)
-
+ search_result = client.search_learnware(user_info)
+
Statistical Specification Search
---------------------------------
-You can also search the learnware in the system through the statistical specification, and all the learnware with similar distribution will be returned through the API. Using the ``generate_stat_spec`` function mentioned above, you can easily get the ``stat_spec`` for your current task, and then get the learnware that meets the statistical specification for the same type of data in the system by using the following code:
+Moreover, you can also search for learnware(s) in the learnware dock system through statistical specifications, and more targeted learnwares for your task will be returned through the API. Using the ``generate_stat_spec`` function mentioned above, you can generate your task's statistical specification ``stat_spec``. Then, you can use the following code to easily obtain suitable learnware(s) identified by the system for your specific task:
.. code-block:: python
user_info = BaseUserInfo(stat_info={stat_spec.type: stat_spec})
- learnware_list = client.search_learnware(user_info, page_size=None)
+ search_result = client.search_learnware(user_info)
Combine Semantic and Statistical Search
----------------------------------------
-By combining statistical and semantic specifications, you can perform more detailed searches, such as the following code that searches tabular data for pieces of learnware that satisfy your semantic specifications:
+
+By combining both semantic and statistical specifications, you can perform more accurate searches. For instance, the code below demonstrates how to search for learnware(s) in tabular data that satisfy both the semantic and statistical specifications:
.. code-block:: python
- user_semantic = client.create_semantic_specification(
+ from learnware.specification import generate_stat_spec
+
+ user_semantic = generate_semantic_spec(
task_type="Classification",
scenarios=["Business"],
)
@@ -191,11 +196,12 @@ By combining statistical and semantic specifications, you can perform more detai
user_info = BaseUserInfo(
semantic_spec=user_semantic, stat_info={rkme_table.type: rkme_table}
)
- learnware_list = client.search_learnware(user_info, page_size=None)
+ search_result = client.search_learnware(user_info)
+
Heterogeneous Table Search
----------------------------------------
-When you provide a statistical specification for tabular data, the task type is "Classification" or "Regression", and your semantic specification includes descriptions for each dimension, the system will automatically enable heterogeneous table search. It won't only search in the tabular learnwares with same dimensions. The following code will perform heterogeneous table search through the API:
+For tabular tasks, if the task type is "Classification" or "Regression" and you have provided a statistical specification along with descriptions for each feature dimension in the semantic specification, the system will enable heterogeneous table search. This provides preliminary support for identifying models from different feature spaces. The following code example shows how to perform a heterogeneous table search via the API:
.. code-block:: python
@@ -206,7 +212,7 @@ When you provide a statistical specification for tabular data, the task type is
"1": "leaf length",
},
}
- user_semantic = client.create_semantic_specification(
+ user_semantic = generate_semantic_spec(
task_type="Classification",
scenarios=["Business"],
input_description=input_description,
@@ -215,19 +221,18 @@ When you provide a statistical specification for tabular data, the task type is
user_info = BaseUserInfo(
semantic_spec=user_semantic, stat_info={rkme_table.type: rkme_table}
)
- learnware_list = client.search_learnware(user_info)
+ search_result = client.search_learnware(user_info)
Download and Use Learnware
-------------------------------
-When the search is complete, you can download the learnware and configure the environment through the following code:
+After the learnware search is completed, you can locally load and use the learnwares through the learnware IDs in ``search_result``, as shown in the following example:
.. code-block:: python
- for temp_learnware in learnware_list:
- learnware_id = temp_learnware["learnware_id"]
-
- # you can use the learnware to make prediction now
- learnware = client.load_learnware(
- learnware_id=learnware_id, runnable_option="conda"
- )
\ No newline at end of file
+ learnware_id = search_result["single"]["learnware_ids"][0]
+ learnware = client.load_learnware(
+ learnware_id=learnware_id, runnable_option="conda"
+ )
+ # test_x is the user's data for prediction
+ predict_y = learnware.predict(test_x)
\ No newline at end of file
diff --git a/docs/workflows/reuse.rst b/docs/workflows/reuse.rst
index ef729d0..2c81bc5 100644
--- a/docs/workflows/reuse.rst
+++ b/docs/workflows/reuse.rst
@@ -2,9 +2,10 @@
Learnwares Reuse
==========================================
-``Learnware Reuser`` is a ``Python API`` that offers a variety of convenient tools for learnware reuse. Users can reuse a single learnware, combination of multiple learnwares,
-and heterogeneous learnwares using these tools efficiently, thereby saving the laborious time and effort of building models from scratch. There are mainly two types of
-reuse tools, based on whether user has gathered a small amount of labeled data beforehand: (1) data-free reuser and (2) data-dependent reuser.
+``Learnware Reuser`` is a core module providing various basic methods for convenient learnware reuse.
+Users can efficiently reuse a single learnware, a combination of multiple learnwares,
+and heterogeneous learnwares using these methods.
+There are two main categories of reuse methods: (1) data-free reusers, which reuse learnwares directly, and (2) data-dependent reusers, which reuse learnwares with a small amount of labeled data.
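The two categories can be contrasted with a minimal, package-independent sketch. The toy label-predicting callables and the simple validation-accuracy selection below are illustrative assumptions, not the actual reuser implementations in the ``learnware`` package:

```python
# Three toy "learnwares", each a callable mapping an input to a class label
model_a = lambda x: 1                     # always predicts class 1
model_b = lambda x: 1 if x > 0 else 0    # thresholds on the sign of x
model_c = lambda x: 0                     # always predicts class 0

learnware_list = [model_a, model_b, model_c]

def data_free_vote(models, x):
    """Data-free reuse: majority vote over all models, no labels needed."""
    votes = [m(x) for m in models]
    return max(set(votes), key=votes.count)

def data_dependent_select(models, val_x, val_y):
    """Data-dependent reuse: keep the model with the best validation accuracy."""
    accuracy = lambda m: sum(m(x) == y for x, y in zip(val_x, val_y)) / len(val_y)
    return max(models, key=accuracy)

val_x, val_y = [-2, -1, 1, 2], [0, 0, 1, 1]  # a small labeled validation set
vote = data_free_vote(learnware_list, -1)    # combine predictions without labels
chosen = data_dependent_select(learnware_list, val_x, val_y)
```

The data-free route combines all returned learnwares directly, while the data-dependent route spends a few labeled examples to pick (or refine) the most suitable ones, which is exactly the trade-off the two reuser families address.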
.. note::
@@ -40,7 +41,7 @@ Data-Free Reuser
# predict_y is the prediction result of the reused learnwares
predict_y = reuse_job_selector.predict(user_data=test_x)
-- ``AveragingReuser`` uses an ensemble method to make predictions. The ``mode`` parameter specifies the specific ensemble method:
+- ``AveragingReuser`` uses an ensemble method to make predictions. The ``mode`` parameter specifies the type of ensemble method:
.. code:: python
@@ -61,9 +62,9 @@ Data-Free Reuser
Data-Dependent Reuser
------------------------------------
-When users have a small amount of labeled data, they can also adapt/polish the received learnware(s) by reusing them with the labeled data, gaining even better performance.
+When users have a small amount of labeled data, they can also adapt the received learnware(s) by reusing them together with the labeled data.
-- ``EnsemblePruningReuser`` selectively ensembles a subset of learnwares to choose the ones that are most suitable for the user's task:
+- ``EnsemblePruningReuser`` selects a subset of suitable learnwares using a multi-objective evolutionary algorithm and uses an average ensemble for prediction:
.. code:: python
@@ -79,7 +80,7 @@ When users have a small amount of labeled data, they can also adapt/polish the r
reuse_ensemble_pruning.fit(val_X, val_y)
predict_y = reuse_ensemble_pruning.predict(user_data=test_x)
-- ``FeatureAugmentReuser`` helps users reuse learnwares by augmenting features. This reuser regards each received learnware as a feature augmentor, taking its output as a new feature and then build a simple model on the augmented feature set(``logistic regression`` for classification tasks and ``ridge regression`` for regression tasks):
+- ``FeatureAugmentReuser`` assists in reusing learnwares by augmenting features. It concatenates the output of the original learnware with the user's task features, creating enhanced labeled data, on which a simple model is then trained (logistic regression for classification tasks and ridge regression for regression tasks):
.. code:: python
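The idea behind feature augmentation can be sketched without the package API. The helper names below are made up, and ridge regression is fit in closed form on the augmented features:

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy "learnware" trained on a related task (illustrative, not the package API).
def learnware_predict(X):
    return X @ np.array([2.0, -1.0])

# User's small labeled set for a regression task.
train_x = rng.normal(size=(30, 2))
train_y = 2.0 * train_x[:, 0] - 1.0 * train_x[:, 1] + 0.5

# Feature augmentation: concatenate the learnware's output with the original features.
aug_train = np.column_stack([train_x, learnware_predict(train_x)])

# Fit ridge regression (closed form) on the augmented features, with a bias term.
A = np.column_stack([aug_train, np.ones(len(aug_train))])
lam = 1e-3
w = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ train_y)

test_x = rng.normal(size=(5, 2))
aug_test = np.column_stack([test_x, learnware_predict(test_x)])
pred = np.column_stack([aug_test, np.ones(len(aug_test))]) @ w
```

The augmented column carries the related task's knowledge, so the simple downstream model needs far less user data to perform well.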
@@ -99,12 +100,14 @@ When users have a small amount of labeled data, they can also adapt/polish the r
Hetero Reuse
====================
-When heterogeneous learnware search is activated(see `WORKFLOWS: Hetero Search <../workflows/search.html#hetero-search>`_), users would receive heterogeneous learnwares which are identified from the whole "specification world".
-Though these recommended learnwares are trained from tasks with different feature/label spaces from the user's task, they can still be helpful and perform well beyond their original purpose.
-Normally these learnwares are hard to be used, leave alone polished by users, due to the feature/label space heterogeneity. However with the help of ``HeteroMapAlignLearnware`` class which align heterogeneous learnware
-with the user's task, users can easily reuse them with the same set of reuse methods mentioned above.
+When heterogeneous learnware search is activated,
+users receive potentially helpful heterogeneous learnwares identified from the whole "specification world" (see `WORKFLOWS: Hetero Search <../workflows/search.html#hetero-search>`_).
+Normally, these learnwares cannot be directly applied to the user's task due to discrepancies in input and prediction spaces.
+Nevertheless, the ``learnware`` package facilitates the reuse of heterogeneous learnwares through ``HeteroMapAlignLearnware``,
+which aligns the input and output domain of learnwares to match those of the users' tasks.
+These feature-aligned learnwares can then be utilized with either data-free reusers or data-dependent reusers.
-During the alignment process of heterogeneous learnware, the statistical specifications of the learnware and the user's task ``(user_spec)`` are used for input space alignment,
+During the alignment process of a heterogeneous learnware, the statistical specifications of the learnware and the user's task ``(user_spec)`` are used for input space alignment,
and a small amount of labeled data ``(val_x, val_y)`` is required for output space alignment. This can be done with the following code:
.. code:: python
@@ -120,7 +123,7 @@ and a small amount of labeled data ``(val_x, val_y)`` is mandatory to be used fo
predict_y = hetero_learnware.predict(user_data=test_x)
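A rough sketch of the output-space step alone, with made-up stand-ins rather than the internals of ``HeteroMapAlignLearnware``: once the input space is aligned, the small labeled set is used to learn a least-squares map from the learnware's output space into the user's label space.

```python
import numpy as np

# Stand-in for a learnware whose input space is already aligned but whose
# outputs live in a different (here 2-dimensional) label space.
def aligned_input_predict(X):
    return np.column_stack([X.sum(axis=1), X[:, 0]])

rng = np.random.default_rng(1)
val_x = rng.normal(size=(20, 3))
val_y = 3.0 * val_x.sum(axis=1) - 1.0 * val_x[:, 0]  # user's 1-dim target

# Least-squares map from the learnware's output space to the user's label space.
out = aligned_input_predict(val_x)
M, *_ = np.linalg.lstsq(out, val_y, rcond=None)

test_x = rng.normal(size=(4, 3))
predict_y = aligned_input_predict(test_x) @ M
```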
To reuse multiple heterogeneous learnwares,
-combine ``HeteroMapAlignLearnware`` with the homogeneous reuse methods ``AveragingReuser`` and ``EnsemblePruningReuser`` mentioned above will do the trick:
+combine ``HeteroMapAlignLearnware`` with the homogeneous reuse methods ``AveragingReuser`` and ``EnsemblePruningReuser`` mentioned above:
.. code:: python
@@ -157,7 +160,7 @@ Run the following codes to try run a learnware with ``Model Container``:
learnware = env_container.get_learnwares_with_container()[0]
print(learnware.predict(test_x))
-The ``mode`` parameter has two options, each for a specific learnware environment loading method:
+The ``mode`` parameter includes two options, each corresponding to a specific learnware environment loading method:
- ``'conda'``: Install a separate conda virtual environment for each learnware (automatically deleted after execution); run each learnware independently within its virtual environment.
- ``'docker'``: Install a conda virtual environment inside a Docker container (automatically destroyed after execution); run each learnware independently within the container (requires Docker privileges).
diff --git a/docs/workflows/search.rst b/docs/workflows/search.rst
index d4491c5..506752b 100644
--- a/docs/workflows/search.rst
+++ b/docs/workflows/search.rst
@@ -2,75 +2,81 @@
Learnwares Search
============================================================
-``Learnware Searcher`` is a key component of ``Learnware Market`` that identifies and recommends helpful learnwares to users according to their ``UserInfo``. Based on whether the returned learnware dimensions are consistent with user tasks, the searchers can be divided into two categories: homogeneous searchers and heterogeneous searchers.
+``Learnware Searcher`` is a key module within the ``Learnware Market`` that identifies and recommends helpful learnwares to users according to their user information. The ``learnware`` package currently provides two types of learnware searchers:
-All the searchers are implemented as a subclass of ``BaseSearcher``. When initializing, you should assign a ``organizer`` to it. The introduction of ``organizer`` is shown in `COMPONENTS: Market - Framework <../components/market.html>`_. Then these searchers can be called with ``UserInfo`` and return ``SearchResults``.
+- homogeneous searchers conduct homogeneous learnware identification and return helpful learnware(s) within the same feature space as the user's task;
+- heterogeneous searchers offer preliminary support for heterogeneous learnware identification on tabular tasks, broadening the search scope and returning helpful learnware(s) from different feature spaces.
+
+All the searchers are implemented as a subclass of ``BaseSearcher``. When initializing, an ``organizer`` should be assigned to it.
+The introduction of ``organizer`` is shown in `COMPONENTS: Market - Framework <../components/market.html>`_.
+Then, these searchers can be invoked with user information provided in ``BaseUserInfo``, and they will return ``SearchResults`` containing identification results.
Homo Search
======================
-The homogeneous search of helpful learnwares can be divided into two stages: semantic specification search and statistical specification search. Both of them needs ``BaseUserInfo`` as input. The following codes shows how to use the searcher to search for helpful learnwares from a market ``easy_market`` for a user. The introduction of ``EasyMarket`` is in `COMPONENTS: Market <../components/market.html>`_.
+The homogeneous search of helpful learnwares can be divided into two stages: semantic specification search and statistical specification search. Both of them need ``BaseUserInfo`` as input.
+The following code shows how to use the searcher to search for helpful learnwares from a market ``easy_market`` for a user.
+The introduction of ``EasyMarket`` is in `COMPONENTS: Market <../components/market.html>`_.
.. code-block:: python
+ from learnware.market import BaseUserInfo, instantiate_learnware_market
+ from learnware.specification import generate_semantic_spec, generate_stat_spec
+
+ easy_market = instantiate_learnware_market(market_id="demo", name="easy", rebuild=True)
+
# generate BaseUserInfo(semantic_spec + stat_info)
- user_semantic = {
- "Data": {"Values": ["Table"], "Type": "Class"},
- "Task": {"Values": ["Regression"], "Type": "Class"},
- "Library": {"Values": ["Scikit-learn"], "Type": "Class"},
- "Scenario": {"Values": ["Business"], "Type": "Tag"},
- "Description": {"Values": "", "Type": "String"},
- "Name": {"Values": "", "Type": "String"},
- "Input": {"Dimension": 82, "Description": {},},
- "Output": {"Dimension": 1, "Description": {},},
- "License": {"Values": ["MIT"], "Type": "Class"},
- }
- user_spec = generate_rkme_table_spec(X=x)
+ user_semantic = generate_semantic_spec(
+ task_type="Classification",
+ scenarios=["Business"],
+ )
+ rkme_table = generate_stat_spec(type="table", X=train_x)
user_info = BaseUserInfo(
- semantic_spec=user_semantic,
- stat_info={"RKMETableSpecification": user_spec}
+ semantic_spec=user_semantic, stat_info={rkme_table.type: rkme_table}
)
-
- # search the market for the user
search_result = easy_market.search_learnware(user_info)
- # search result: single_result
- single_result = search_result.get_single_results()
- print(f"single model num: {len(single_result)},
- max_score: {single_result[0].score},
- min_score: {single_result[-1].score}"
- )
-
- # search result: multiple_result
- multiple_result = search_result.get_multiple_results()
- mixture_id = " ".join([learnware.id for learnware in multiple_result[0].learnwares])
- print(f"mixture_score: {multiple_result[0].score}, mixture_learnwares: {mixture_id}")
+In the above code, ``search_result`` is a dict with the following structure (``"single"`` and ``"multiple"`` correspond to the search results for a single learnware and for multiple learnwares, respectively):
+
+.. code-block:: python
+
+ search_result = {
+ "single": {
+ "learnware_ids": List[str],
+ "semantic_specifications": List[dict],
+ "matching": List[float],
+ },
+ "multiple": {
+ "learnware_ids": List[str],
+ "semantic_specifications": List[dict],
+ "matching": float,
+ },
+ }
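The structure above can be post-processed directly. A small illustrative sketch (the IDs and scores below are made up) that ranks the single-learnware candidates by their matching scores:

```python
# Assuming the documented dict structure; illustrative post-processing only.
search_result = {
    "single": {
        "learnware_ids": ["00000082", "00000154", "00000101"],  # hypothetical IDs
        "semantic_specifications": [{}, {}, {}],
        "matching": [0.95, 0.52, 0.71],
    },
    "multiple": {"learnware_ids": [], "semantic_specifications": [], "matching": 0.0},
}

single = search_result["single"]
# Sort (id, score) pairs so the best-matching learnware comes first.
ranked = sorted(
    zip(single["learnware_ids"], single["matching"]),
    key=lambda pair: pair[1],
    reverse=True,
)
best_id, best_score = ranked[0]
```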
Hetero Search
======================
-For table-based user tasks,
-homogeneous searchers like ``EasySearcher`` fail to recommend learnwares when no table learnware matches the user task's feature dimension, returning empty results.
-To enhance functionality, the ``learnware`` package includes the heterogeneous learnware search feature, whose processions is as follows:
+For tabular tasks, homogeneous searchers like ``EasySearcher`` may fail to recommend learnwares if no table learnware shares the same feature space as the user's task, returning empty results. The ``learnware`` package offers preliminary support for searching learnwares from different feature spaces through heterogeneous searchers. The process is as follows:
-- Learnware markets such as ``Hetero Market`` integrate different specification islands into a unified "specification world" by assigning system-level specifications to all learnwares. This allows heterogeneous searchers like ``HeteroSearcher`` to find helpful learnwares from all available table learnwares.
-- Searchers assign system-level specifications to users based on ``UserInfo``'s statistical specification, using methods provided by corresponding organizers. In ``Hetero Market``, for example, ``HeteroOrganizer.generate_hetero_map_spec`` generates system-level specifications for users.
-- Finally searchers conduct statistical specification search across the "specification world". User's system-level specification will guide the searcher in pinpointing helpful heterogeneous learnwares.
+- Learnware markets such as ``Hetero Market`` integrate different tabular specification islands into a unified "specification world" by generating new system specifications for learnwares. This allows heterogeneous searchers like ``HeteroSearcher`` to recommend tabular learnwares from the entire learnware collection.
+- Searchers then assign new specifications to users based on the users' statistical specifications, using methods provided by the corresponding organizers. For instance, in ``Hetero Market``, ``HeteroOrganizer.generate_hetero_map_spec`` generates new specifications for users.
+- Finally, searchers conduct statistical specification search across the unified "specification world" based on users' new specifications and return potentially helpful heterogeneous learnwares.
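As a loose intuition for this last step — not the package's actual matching computation, which relies on RKME-based statistical specifications — once every learnware and the user task carry a specification in one shared space, search reduces to ranking by similarity between embeddings:

```python
import numpy as np

# Hypothetical unified "specification world": 5 learnwares with 8-dim system
# specifications, and a user specification close to that of learnware 3.
rng = np.random.default_rng(2)
learnware_specs = rng.normal(size=(5, 8))
user_spec = learnware_specs[3] + 0.01 * rng.normal(size=8)

# Rank learnwares by distance to the user's specification: closest first.
dists = np.linalg.norm(learnware_specs - user_spec, axis=1)
ranking = np.argsort(dists)
```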
-To activate heterogeneous learnware search, ``UserInfo`` should contain both semantic and statistical specifications. What's more, the semantic specification should meet the following requirements:
+To activate heterogeneous learnware search, ``UserInfo`` needs to include both semantic and statistical specifications. Furthermore, the semantic specification should meet the following requirements:
- The task type should be ``Classification`` or ``Regression``.
- The data type should be ``Table``.
-- It should include description for at least one feature dimension.
-- The feature dimension stated here should match with the feature dimension in the statistical specification.
+- There should be a description for at least one feature dimension.
+- The feature dimension mentioned here must align with that in the statistical specification.
+
+The code below demonstrates how to search for potentially useful heterogeneous learnwares from a market ``hetero_market`` for a user.
+For more information about ``HeteroMarket``, see `COMPONENTS: Hetero Market <../components/market.html#hetero-market>`_.
-The following codes shows how to search for helpful heterogeneous learnwares from a market
-``hetero_market`` for a user. The introduction of ``HeteroMarket`` is in `COMPONENTS: Hetero Market <../components/market.html#hetero-market>`_.
.. code-block:: python
# initiate a Hetero Market
- hetero_market = initiate_learnware_market(market_id="test_hetero", name="hetero")
+ hetero_market = instantiate_learnware_market(market_id="demo", name="hetero", rebuild=True)
# user_semantic should meet the above requirements
input_description = {
diff --git a/docs/workflows/upload.rst b/docs/workflows/upload.rst
index 382843a..5a0d9d7 100644
--- a/docs/workflows/upload.rst
+++ b/docs/workflows/upload.rst
@@ -26,7 +26,7 @@ Model Invocation File ``__init__.py``
To ensure that the uploaded learnware can be used by subsequent users, you need to provide interfaces for model fitting ``fit(X, y)``, prediction ``predict(X)``, and fine-tuning ``finetune(X, y)`` in ``__init__.py``. Among these interfaces, only the ``predict(X)`` interface is mandatory, while the others depend on the functionality of your model.
-Below is a reference template for the ```__init__.py``` file. Please make sure that the input parameter format (the number of parameters and parameter names) for each interface in your model invocation file matches the template below.
+Below is a reference template for the ``__init__.py`` file. Please make sure that the input parameter format (the number of parameters and parameter names) for each interface in your model invocation file matches the template below.
.. code-block:: python
@@ -250,7 +250,7 @@ For more details, please refer to :ref:`semantic specification