[MNT] merge main branch

2 years ago · a108ef950b
--- a/.gitignore
+++ b/.gitignore
@@ -43,4 +43,5 @@ cache/
 tmp/
 learnware_pool/
 PFS/
 data/
 data/
 examples/results/
--- a/docs/_static/img/image_labeled.png
+++ b/docs/_static/img/image_labeled.png
--- a/docs/_static/img/table_hetero_labeled.png
+++ b/docs/_static/img/table_hetero_labeled.png
--- a/docs/_static/img/table_homo_labeled.png
+++ b/docs/_static/img/table_homo_labeled.png
--- a/docs/_static/img/text_labeled_curves.png
+++ b/docs/_static/img/text_labeled_curves.png
--- a/docs/advanced/evolve.rst
+++ b/docs/advanced/evolve.rst
@@ -1,5 +1,5 @@
 ==============================
 Specification evolvement
 Specification Evolvement
 ==============================

 The specification is the core of the learnware paradigm.
--- a/docs/components/learnware.rst
+++ b/docs/components/learnware.rst
@@ -1,87 +1,138 @@
 .. _learnware:

 ==========================================
 Learnware & Reuser
 ==========================================

 Learnware and Reuser are related...
 ``Learnware`` is the most basic concept in the ``learnware paradigm``. In this section, we will introduce the concept and design of ``learnware`` and its extension for ``Hetero Reuse``. Then we will introduce the ``Reuse Methods``, which apply one or several ``learnware``\ s to solve the user's task.

 Concepts
 ===================
 The learnware paradigm, first introduced by Zhi-Hua Zhou, is defined as a proficiently trained machine learning model accompanied by a specification that allows future users with no prior knowledge of the learnware to identify and reuse it according to their needs.
 In the learnware paradigm, a learnware is a well-performing trained machine learning model with a specification that enables future users who know nothing about the learnware in advance to adequately identify and reuse it according to their requirements. Specifications are introduced in `COMPONENTS: Specification <./spec.html>`_.

 Developers or owners of trained machine learning models can voluntarily submit their models to a learnware marketplace. If the marketplace accepts the model, it assigns a specification to the model and makes it available in the marketplace.
 In our implementation, the class ``Learnware`` has 3 important member variables:

 Utilizing Learnware in Practice
 -------------------------------
 - ``id``: The learnware id is generated by ``market``.
 - ``model``: The model in the learnware, which can be a ``BaseModel`` or a dict including the model name and path. When it is a dict, the function ``Learnware.instantiate_model`` is used to transform it into a ``BaseModel``. The function ``Learnware.predict`` uses the model to predict for an input ``X``. See more in `COMPONENTS: Model <./model.html>`_.
 - ``specification``: The specification, including the semantic specification and the statistical specification.

 With a learnware marketplace in place, users can tackle machine learning tasks without having to create models from scratch. 
 Learnware for Hetero Reuse (Feature Align + Hetero Map Learnware)
 =======================================================================

 Addressing Concerns with Learnware
 ----------------------------------
 In the Hetero Market (see `COMPONENTS: Hetero Market <./market.html#hetero-market>`_ for details), ``HeteroSearcher`` identifies and recommends helpful learnwares among all learnwares in the market, 
 including learnwares with feature/label spaces different from the user's task requirements (heterogeneous learnwares). ``FeatureAlignLearnware`` and ``HeteroMapAlignLearnware``
 are designed to enable the reuse of heterogeneous learnwares. They extend ``Learnware`` with the ability to align the feature space and label space of the learnware to the user's task requirements, 
 and provide basic interfaces for heterogeneous learnwares to be applied to tasks beyond their original purposes.

 ``FeatureAlignLearnware``
 ---------------------------

 ``FeatureAlignLearnware`` employs a neural network to align the feature space of the learnware to the user's task. 
 It is initialized with a ``Learnware``, and has the following methods to expand the applicable scope of this ``Learnware``:

 - **align**: Trains a neural network to align ``user_rkme``, which is the ``RKMETableSpecification`` of the user's data, with the learnware's statistical specification.
 - **predict**: Predict the output for user data using the trained neural network and the original learnware's model.


 ``HeteroMapAlignLearnware``
 -----------------------------

 If user data is not only heterogeneous in feature space but also in label space, ``HeteroMapAlignLearnware`` uses the help of 
 a small amount of labeled data ``(x_train, y_train)`` required from the user task to align heterogeneous learnwares with the user task.
 There are two key interfaces in ``HeteroMapAlignLearnware``:

 - ``HeteroMapAlignLearnware.align(self, user_rkme: RKMETableSpecification, x_train: np.ndarray, y_train: np.ndarray)``

    - **input space alignment**: Align the feature space of the learnware to the user task's statistical specification ``user_rkme`` using ``FeatureAlignLearnware``.
    - **output space alignment**: Further align the label space of the aligned learnware to the user task through supervised learning of ``FeatureAugmentReuser`` using ``(x_train, y_train)``.

 - ``HeteroMapAlignLearnware.predict(self, user_data)``

    - If input space and output space alignment are both performed, use the ``FeatureAugmentReuser`` to predict the output for ``user_data``.

 The learnware approach aims to address several challenges:


 +------------------------+----------------------------------------------------------------------------------------+
 | Concern                | Solution                                                                               |
 +========================+========================================================================================+
 | Limited training data  | Use existing high-quality learnware and require only a small amount of data for        |
 |                        | adaptation or refinement.                                                              |
 +------------------------+----------------------------------------------------------------------------------------+
 | Lack of training skills| Leverage existing learnware instead of building a model from scratch.                  |
 +------------------------+----------------------------------------------------------------------------------------+
 | Catastrophic forgetting| Retain old knowledge in the marketplace as accepted learnware remain available.        |
 +------------------------+----------------------------------------------------------------------------------------+
 | Continual learning     | Facilitate continuous and lifelong learning with the constant influx of high-quality   |
 |                        | learnware, enriching the knowledge base.                                               |
 +------------------------+----------------------------------------------------------------------------------------+
 | Data privacy and       | Ensure data privacy and proprietary protection by having developers only submit        |
 | proprietary concerns   | models, not their data.                                                                |
 +------------------------+----------------------------------------------------------------------------------------+
 | Unplanned tasks        | Ensure the availability of helpful learnware for various tasks, unless entirely new    |
 |                        | to all legal developers.                                                               |
 +------------------------+----------------------------------------------------------------------------------------+
 | Carbon emissions       | Reduce the need to train numerous large models by assembling smaller models that       |
 |                        | provide satisfactory performance.                                                      |
 +------------------------+----------------------------------------------------------------------------------------+

 Future Work and Progress
 ------------------------

 Despite the promising potential of the learnware proposal, much work remains to bring it to fruition. The following sections will discuss some of the progress made thus far.


 Learnware for Hetero Reuse (Feature Align + Hetero Map Learnware)
 =======================================================================

 All Reuse Methods
 ===========================

 In addition to applying ``Learnware``, ``FeatureAlignLearnware`` or ``HeteroMapAlignLearnware`` objects directly by calling their ``predict`` interface, 
 the ``learnware`` package also provides a set of ``Reuse Methods`` for users to further customize a single or multiple learnwares, with the hope of enabling learnwares to be 
 helpful beyond their original purposes, and eliminating the need for users to build models from scratch.

 There are two main categories of ``Reuse Methods``: (1) direct reuse and (2) reuse based on a small amount of labeled data.

 .. note:: 
    Combine ``HeteroMapAlignLearnware`` with the following reuse methods to enable the reuse of heterogeneous learnwares. See `WORKFLOW: Hetero Reuse <../workflows/reuse.html#hetero-reuse>`_ for details.

 Direct Reuse of Learnware
 --------------------------

 Two methods for direct reuse of learnwares are provided: ``JobSelectorReuser`` and ``AveragingReuser``.

 JobSelectorReuser
 --------------------
 ^^^^^^^^^^^^^^^^^^

 The ``JobSelectorReuser`` is a class that inherits from the base reuse class ``BaseReuser``.
 Its purpose is to create a job selector that identifies the optimal learnware for each data point in user data.
 There are three parameters required to initialize the class:
 ``JobSelectorReuser`` trains a classifier ``job selector`` that identifies the optimal learnware for each data point in user data.
 There are three member variables:

 - ``learnware_list``: A list of objects of type ``Learnware``. Each ``Learnware`` object should have an RKME specification.
 - ``learnware_list``: A list of ``Learnware`` objects for the ``JobSelectorReuser`` to choose from.
 - ``herding_num``: An optional integer that specifies the number of items to herd, which defaults to 1000 if not provided.
 - ``use_herding``: A boolean flag indicating whether to use kernel herding.

 The job selector is essentially a multi-class classifier :math:`g(\boldsymbol{x}):\mathcal{X}\rightarrow \mathcal{I}` with :math:`\mathcal{I}=\{1,\ldots, C\}`, where :math:`C` is the size of ``learnware_list``.
 Given a testing sample :math:`\boldsymbol{x}`, the ``JobSelectorReuser`` predicts it by using the :math:`g(\boldsymbol{x})`-th learnware in ``learnware_list``.
 If ``use_herding`` is set to false, the ``JobSelectorReuser`` uses data points in each learnware's RKME specification with the corresponding learnware index to train a job selector.
 If ``use_herding`` is true, the algorithm estimates the mixture weight based on RKME specifications and raw user data, uses the weight to generate ``herding_num`` auxiliary data points mimicking the user distribution through the kernel herding method, and learns a job selector on these data.
 The most important methods of ``JobSelectorReuser`` are ``job_selector`` and ``predict``:

 - **job_selector**: Train a ``job selector`` based on the user's data and the ``learnware_list``. The procedure depends on the value of ``use_herding``:

    - If ``use_herding`` is False: Statistical specifications of learnwares in ``learnware_list`` combined with the corresponding learnware index are used to train the ``job selector``.
    - If ``use_herding`` is True:
  
       - Estimate the mixture weight based on the user's raw data and the statistical specifications of learnwares in ``learnware_list``.
       - Use the mixture weight to generate ``herding_num`` auxiliary data points which mimic the user task's distribution through the kernel herding method.
       - Finally, learn the ``job selector`` on the auxiliary data points.
  
 - **predict**: The ``job selector`` is essentially a multi-class classifier :math:`g(\boldsymbol{x}):\mathcal{X}\rightarrow \mathcal{I}` with :math:`\mathcal{I}=\{1,\ldots, C\}`, where :math:`C` is the size of ``learnware_list``. Given a testing sample :math:`\boldsymbol{x}`, the ``JobSelectorReuser`` predicts it by using the :math:`g(\boldsymbol{x})`-th learnware in ``learnware_list``.
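 The routing step described in **predict** can be sketched as follows. This is a minimal illustration, not the package's implementation: ``predict_with_job_selector``, the toy ``models`` and the toy ``selector`` are hypothetical names, the learnwares are stood in for by plain callables, and indices are 0-based rather than the 1-based set used in the formula above.

```python
import numpy as np


def predict_with_job_selector(X, job_selector, models):
    """Route each sample to the model picked by the job selector g(x)."""
    X = np.asarray(X, dtype=float)
    # g(x): for every row, the (0-based) index of the model to use.
    chosen = np.asarray(job_selector(X))
    preds = np.empty(len(X), dtype=float)
    for idx, model in enumerate(models):
        mask = chosen == idx
        if mask.any():
            # Only the rows assigned to this model are predicted by it.
            preds[mask] = model(X[mask])
    return preds


# Hypothetical stand-ins for two learnwares, each suited to one input region.
models = [lambda X: X[:, 0] * 2.0, lambda X: X[:, 0] - 10.0]
# Hypothetical selector: negative inputs go to model 0, the rest to model 1.
selector = lambda X: (X[:, 0] >= 0).astype(int)

X = np.array([[-1.0], [3.0], [-4.0]])
y_hat = predict_with_job_selector(X, selector, models)
```

 In the real reuser the selector is a trained classifier rather than a hand-written rule; the sketch only shows how its output decides which learnware answers each sample.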


 AveragingReuser
 ------------------
 ^^^^^^^^^^^^^^^^^^

 ``AveragingReuser`` uses an ensemble method to make predictions. It is initialized with a list of ``Learnware`` objects, and has a member variable ``mode`` which
 specifies the ensemble method(default is set to ``mean``). 

 - **predict**: The member variable ``mode`` provides different options for classification and regression tasks:

    - For regression tasks, ``mode`` should be set to ``mean``. The prediction is the average of the learnwares' outputs.
    - For classification tasks, ``mode`` has two available options. If ``mode`` is set to ``vote_by_label``, the prediction is the majority vote label based on learnwares' output labels. If ``mode`` is set to ``vote_by_prob``, the prediction is the mean vector of all learnwares' output label probabilities.
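 The three ``mode`` options can be illustrated with a small NumPy sketch. It is a simplified illustration under stated assumptions, not the package's code: ``ensemble_predict`` is a hypothetical helper, and the learnwares' outputs are assumed to be already stacked along the first axis.

```python
import numpy as np


def ensemble_predict(outputs, mode="mean"):
    """Combine stacked model outputs; axis 0 indexes the models."""
    outputs = np.asarray(outputs, dtype=float)
    if mode in ("mean", "vote_by_prob"):
        # Regression outputs or class-probability vectors: plain average.
        return outputs.mean(axis=0)
    if mode == "vote_by_label":
        labels = outputs.astype(int)  # shape: (n_models, n_samples)
        n_classes = labels.max() + 1
        # counts[i, c] = number of models that voted class c for sample i.
        counts = np.stack(
            [(labels == c).sum(axis=0) for c in range(n_classes)], axis=1
        )
        # Ties are broken in favour of the smallest label in this sketch.
        return counts.argmax(axis=1)
    raise ValueError(f"unknown mode: {mode}")


reg = ensemble_predict([[1.0, 2.0], [3.0, 4.0]], mode="mean")
votes = ensemble_predict([[0, 1], [0, 2], [1, 2]], mode="vote_by_label")
probs = ensemble_predict([[[0.8, 0.2]], [[0.4, 0.6]]], mode="vote_by_prob")
```

 Here ``vote_by_prob`` returns the averaged probability vector itself, matching the description above; taking its ``argmax`` would give a hard label.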


 Reuse Learnware with Labeled Data
 ----------------------------------

 When users have a small amount of labeled data available, the ``learnware`` package provides two methods to help reuse learnwares: ``EnsemblePruningReuser`` and ``FeatureAugmentReuser``.
 They are both initialized with a list of ``Learnware`` objects ``learnware_list``, and have different implementations of ``fit`` and ``predict`` methods.

 EnsemblePruningReuser
 ^^^^^^^^^^^^^^^^^^^^^^

 The ``EnsemblePruningReuser`` class implements a selective ensemble approach inspired by the MDEP algorithm, as detailed in [1]_.
 It selects a subset of learnwares from ``learnware_list``, utilizing user's labeled data for effective ensemble integration on user tasks. 
 This method effectively balances validation error, margin ratio, and ensemble size, leading to a robust and optimized selection of learnwares for task-specific ensemble creation. 

 - **fit**: Effectively prunes the large set of learnwares ``learnware_list`` by evaluating and comparing the learnwares based on their performance on user's labeled validation data ``(val_X, val_y)``. Returns the most suitable subset of learnwares. 
 - **predict**: The ``mode`` member variable has two available options. Set ``mode`` to ``regression`` for regression tasks, and ``classification`` for classification tasks. The prediction is the average of the selected learnwares' outputs.


 FeatureAugmentReuser
 ^^^^^^^^^^^^^^^^^^^^^^

 ``FeatureAugmentReuser`` helps users reuse learnwares by augmenting features. In this method, 
 outputs of the learnwares from ``learnware_list`` on user's validation data ``val_X`` are taken as augmented features and are concatenated with original features ``val_X``.
 The augmented data(concatenated features combined with validation labels ``val_y``) are then used to train a simple model ``augment_reuser`` which gives the final prediction
 on ``user_data``.

 - **fit**: Trains the ``augment_reuser`` using augmented user validation data. For classification tasks, ``mode`` should be set to ``classification``, and ``augment_reuser`` is a ``LogisticRegression`` model. For regression tasks, ``mode`` should be set to ``regression``, and ``augment_reuser`` is a ``RidgeCV`` model. 
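 The feature augmentation idea can be sketched for the regression case as follows. This is only an illustration under assumptions: the learnwares are stood in for by plain callables, and a closed-form ridge regression with a fixed ``alpha`` replaces the ``RidgeCV`` model named above so that the example depends on NumPy alone. The helper names ``augment``, ``fit_ridge`` and ``predict_ridge`` are hypothetical.

```python
import numpy as np


def augment(X, models):
    """Concatenate the original features with each model's output on X."""
    X = np.asarray(X, dtype=float)
    outs = [np.asarray(m(X), dtype=float).reshape(len(X), -1) for m in models]
    return np.hstack([X] + outs)


def fit_ridge(Z, y, alpha=1e-3):
    """Closed-form ridge regression with an unpenalized intercept."""
    Zb = np.hstack([Z, np.ones((len(Z), 1))])
    penalty = alpha * np.eye(Zb.shape[1])
    penalty[-1, -1] = 0.0  # do not shrink the intercept
    return np.linalg.solve(Zb.T @ Zb + penalty, Zb.T @ np.asarray(y, dtype=float))


def predict_ridge(Z, w):
    return np.hstack([Z, np.ones((len(Z), 1))]) @ w


# Hypothetical learnware that captures the slope but misses the offset.
models = [lambda X: 3.0 * X[:, 0]]
val_X = np.array([[0.0], [1.0], [2.0], [3.0]])
val_y = 3.0 * val_X[:, 0] + 1.0

w = fit_ridge(augment(val_X, models), val_y)
pred = predict_ridge(augment(np.array([[4.0]]), models), w)
```

 The small labeled set is enough to correct the learnware's systematic offset, which is the kind of adaptation the augmented model is meant to provide.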

 The ``AveragingReuser`` is a class that inherits from the base reuse class ``BaseReuser``, that implements the average ensemble method by averaging each learnware's output to predict user data.
 There are two parameters required to initialize the class:

 - ``learnware_list``: A list of objects of type ``Learnware``.
 - ``mode``: The mode of averaging learnware outputs, which can be set to "mean" or "vote" and defaults to "mean".
 References
 -----------

 If ``mode`` is set to "mean", the ``AveragingReuser`` computes the mean of the learnware's output to predict user data, which is commonly used in regression tasks.
 If ``mode`` is set to "vote", the ``AveragingReuser`` computes the mean of the softmax of the learnware's output to predict each label probability of user data, which is commonly used in classification tasks.
 .. [1] Yu-Chang Wu, Yi-Xiao He, Chao Qian, and Zhi-Hua Zhou. Multi-objective Evolutionary Ensemble Pruning Guided by Margin Distribution. In: Proceedings of the 17th International Conference on Parallel Problem Solving from Nature (PPSN'22), Dortmund, Germany, 2022.
--- a/docs/components/market.rst
+++ b/docs/components/market.rst
@@ -1,6 +1,7 @@
 .. _market:

 ================================
 Market
 Learnware Market
 ================================

 The ``learnware market`` receives high-performance machine learning models from developers, incorporates them into the system, and provides services to users by identifying and reusing learnware to help users solve current tasks. Developers voluntarily submit various learnwares to the learnware market, and the market conducts quality checks and further organization of these learnwares. When users submit task requirements, the learnware market automatically selects whether to recommend a single learnware or a combination of multiple learnwares. 
@@ -10,11 +11,11 @@ The ``learnware market`` will receive various kinds of learnwares, and learnware
 Framework
 ======================================

 The ``learnware market`` is composed of an ``organizer``, a ``searcher``, and a list of ``checker``s. 
 The ``learnware market`` is composed of an ``organizer``, a ``searcher``, and a list of ``checker``\ s. 

 The ``organizer`` can store and organize learnwares in the market. It supports ``add``, ``delete``, and ``update`` operations for learnwares. It also provides the interface for ``searcher`` to search learnwares based on user requirement.

 The ``searcher`` can search learnwares based on user requirement. The implementation of ``searcher`` is dependent on the concrete implementation and interface for ``organizer``, where usually an ``organizer`` can be compatible with multiple different ``searcher``s.
 The ``searcher`` can search learnwares based on user requirement. The implementation of ``searcher`` is dependent on the concrete implementation and interface for ``organizer``, where usually an ``organizer`` can be compatible with multiple different ``searcher``\ s.

 The ``checker`` checks the learnware against certain standards. It should check the utility of a learnware and is supposed to return the status and a message related to the learnware's check result. Only learnwares that pass the ``checker`` can be stored and added into the ``learnware market``. 

@@ -23,9 +24,9 @@ The ``checker`` is used for checking the learnware in some standards. It should
 Current Checkers
 ======================================

 The ``learnware`` package provides two different implementations of ``market``, both of which share the same ``checker`` list. So we first introduce the details of ``checker``s.
 The ``learnware`` package provides two different implementations of ``market``, both of which share the same ``checker`` list. So we first introduce the details of ``checker``\ s.

 The ``checker``s check a learnware object in different aspects, including environment configuration (``CondaChecker``), semantic specifications (``EasySemanticChecker``), and statistical specifications (``EasyStatChecker``). The ``__call__`` method of each checker is designed to be invoked as a function to conduct the respective checks on the learnware and return the outcomes. It defines three types of learnwares: ``INVALID_LEARNWARE`` denotes that the learnware does not pass the check, ``NONUSABLE_LEARNWARE`` denotes that the learnware passes the check but cannot make predictions, and ``USABLE_LEARNWARE`` denotes that the learnware passes the check and can make predictions. Currently, we have three ``checker``s, which are described below.
 The ``checker``\ s check a learnware object in different aspects, including environment configuration (``CondaChecker``), semantic specifications (``EasySemanticChecker``), and statistical specifications (``EasyStatChecker``). The ``__call__`` method of each checker is designed to be invoked as a function to conduct the respective checks on the learnware and return the outcomes. It defines three types of learnwares: ``INVALID_LEARNWARE`` denotes that the learnware does not pass the check, ``NONUSABLE_LEARNWARE`` denotes that the learnware passes the check but cannot make predictions, and ``USABLE_LEARNWARE`` denotes that the learnware passes the check and can make predictions. Currently, we have three ``checker``\ s, which are described below.


 ``CondaChecker``
@@ -52,15 +53,115 @@ The ``learnware`` package provide two different implementation of ``market``, i.
 Easy Market
 -------------

 Easy market is a basic realization of the learnware market. It consists of ``EasyOrganizer``, ``EasySearcher``, and the checker list ``[EasySemanticChecker, EasyStatChecker]``.


 ``Easy Organizer``
 ++++++++++++++++++++

 ``EasyOrganizer`` provides a simple way to store and organize learnwares. Its main methods are:

 - **reload_market**: Reload the learnware market when the server restarts, and return a flag indicating whether the market is reloaded successfully.
 - **add_learnware**: Add a learnware with ``learnware_id``, ``semantic_spec`` and model files in ``zip_path`` into the market. Return the ``learnware_id`` and ``learnware_status``. The ``learnware_status`` is set to ``check_status`` if it is provided; otherwise the ``checker`` is called to generate the ``learnware_status``.
 - **delete_learnware**: Delete the learnware with ``id`` from the market, and return a flag indicating whether the deletion is successful.
 - **update_learnware**: Update the learnware's ``zip_path``, ``semantic_spec`` and ``check_status``. If an item is None, it is not updated. Return a flag indicating whether the learnware passed the ``checker``.
 - **get_learnwares**: Similar to **get_learnware_ids**, but return a list of learnwares instead of ids.
 - **reload_learnware**: Reload all the attributes of the learnware with ``learnware_id``.

 ``Easy Searcher``
 ++++++++++++++++++++

 ``EasySearcher`` consists of ``EasyFuzzSemanticSearcher`` and ``EasyStatSearcher``. ``EasyFuzzSemanticSearcher`` is a kind of ``Semantic Specification Searcher``, while ``EasyStatSearcher`` is a kind of ``Statistical Specification Searcher``. All these searchers return helpful learnwares based on ``BaseUserInfo`` provided by users.

 ``BaseUserInfo`` is a ``Python API`` for users to provide enough information to identify helpful learnwares.
 When initializing ``BaseUserInfo``, three optional pieces of information can be provided: ``id``, ``semantic_spec`` and ``stat_info``. These specifications are introduced in `COMPONENTS: Specification <./spec.html>`_.


 The semantic specification search and statistical specification search have been integrated into the same interface ``EasySearcher``. 

 - **EasySearcher.__call__(self, user_info: BaseUserInfo, check_status: int = None, max_search_num: int = 5, search_method: str = "greedy",) -> SearchResults**

  - It runs the semantic searcher ``EasyFuzzSemanticSearcher`` on all the learnwares from the ``organizer`` with the same ``check_status`` (all learnwares if ``check_status`` is None). If the result is not empty and ``stat_info`` is provided in ``user_info``, it then runs ``EasyStatSearcher`` and returns the ``SearchResults``.


 ``Semantic Specification Searcher``
 ''''''''''''''''''''''''''''''''''''

 ``Semantic Specification Searcher`` is the first-stage search based on ``user_semantic``, identifying potentially helpful learnwares whose models solve tasks similar to your requirements. There are two types of Semantic Specification Search: ``EasyExactSemanticSearcher`` and ``EasyFuzzSemanticSearcher``. 

 In these two searchers, each learnware in the ``learnware_list`` is compared with ``user_info`` according to their ``semantic_spec``, and added to the search result if matched. Two ``semantic_spec``\ s are matched when every key is either matched or empty in ``user_info``. Different keys have different matching rules. Their ``__call__`` functions are the same:

 - **EasyExactSemanticSearcher/EasyFuzzSemanticSearcher.__call__(self, learnware_list: List[Learnware], user_info: BaseUserInfo)-> SearchResults**

  - For keys ``Data``, ``Task``, ``Library`` and ``license``, two ``semantic_spec`` keys are matched only if the value of the learnware ``semantic_spec`` (only one value for each key) exists in the values of the user ``semantic_spec`` (possibly multiple values for one key).
  - For the key ``Scenario``, two ``semantic_spec`` keys are matched if their values have a nonempty intersection.
  - For keys ``Name`` and ``Description``, the values are strings and case is ignored. In ``EasyExactSemanticSearcher``, two ``semantic_spec`` keys are matched if the value in the learnware ``semantic_spec`` is a substring of the value in the user ``semantic_spec``. In ``EasyFuzzSemanticSearcher``, the exact semantic search is conducted first, as in ``EasyExactSemanticSearcher``. If the result is empty, the fuzzy semantic search is activated: the ``learnware_list`` is sorted according to the fuzz score function ``fuzz.partial_ratio`` in ``rapidfuzz``.

 The results are stored in ``single_results`` of ``SearchResults`` and returned.
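 The exact matching rules listed above can be sketched in plain Python. This is a simplified illustration, not the package's implementation: the flat dict layout of the specifications, the helper names ``key_matches`` and ``spec_matches``, and the example values are all assumptions, and the fuzzy fallback based on ``rapidfuzz`` is left out.

```python
SINGLE_VALUE_KEYS = ("Data", "Task", "Library", "license")
STRING_KEYS = ("Name", "Description")


def key_matches(key, learnware_value, user_value):
    """Apply the per-key rule; an empty user value matches anything."""
    if not user_value:
        return True
    if key in SINGLE_VALUE_KEYS:
        # The learnware holds one value; the user may list several.
        return learnware_value in user_value
    if key == "Scenario":
        return bool(set(learnware_value) & set(user_value))
    if key in STRING_KEYS:
        # Case-insensitive substring test, in the direction stated above.
        return learnware_value.lower() in user_value.lower()
    return True


def spec_matches(learnware_spec, user_spec):
    """Two specs match when every key is matched or empty on the user side."""
    return all(
        key_matches(key, value, user_spec.get(key))
        for key, value in learnware_spec.items()
    )


# Hypothetical, flattened specifications used only for illustration.
learnware_spec = {
    "Data": "Table",
    "Task": "Classification",
    "Library": "Scikit-learn",
    "license": "MIT",
    "Scenario": ["Business", "Financial"],
    "Name": "demo",
    "Description": "",
}
user_ok = {"Data": ["Table"], "Task": ["Classification", "Regression"], "Scenario": ["Financial"]}
user_bad = {"Data": ["Table"], "Task": ["Regression"]}

matched = spec_matches(learnware_spec, user_ok)
rejected = spec_matches(learnware_spec, user_bad)
```

 Treating missing or empty user entries as wildcards is what lets a user search with only a few keys filled in.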


 ``Statistical Specification Searcher``
 ''''''''''''''''''''''''''''''''''''''''''

 If the user's statistical specification ``stat_info`` is provided, the learnware market can perform a more accurate learnware selection using ``EasyStatSearcher``. 

 - **EasyStatSearcher.__call__(self, learnware_list: List[Learnware], user_info: BaseUserInfo, max_search_num: int = 5, search_method: str = "greedy",) -> SearchResults**
 
  - It searches for helpful learnwares from ``learnware_list`` based on the ``stat_info`` in ``user_info``.
  - The result ``SingleSearchItem`` and ``MultipleSearchItem`` are both stored in ``SearchResults``. In ``SingleSearchItem``, it searches for single learnwares that could solve the user task; scores are also provided to represent the fitness of each single learnware and user task. In ``MultipleSearchItem``, it searches for a mixture of learnwares that could solve the user task better; the mixture learnware list and a score for the mixture is returned.
  - The parameter ``search_method`` provides two choices of search strategy for mixture learnwares: ``greedy`` and ``auto``. With ``greedy``, the searcher repeatedly adds the learnware that brings the mixture closest to the user's ``stat_info``; with ``auto``, it directly calculates the best mixture weight for the ``learnware_list``.
  - For single learnware search, we only return the learnwares with score larger than 0.6; For multiple learnware search, the parameter ``max_search_num`` specifies the maximum length of the returned mixture learnware list. 


 ``Easy Checker``
 ++++++++++++++++++++

 ``EasySemanticChecker`` and ``EasyStatChecker`` are used to check the validity of the learnwares. They work as follows:

 - ``EasySemanticChecker`` mainly checks the integrity and legitimacy of the ``semantic_spec`` in the learnware. A legal ``semantic_spec`` should include all the keys, and the type of each key should meet our requirements. For keys with type ``Class``, the values should be unique and in our ``valid_list``; for keys with type ``Tag``, the values should not be empty; for keys with type ``String``, a non-empty string is expected as the value; for a table learnware, the dimensions and description of the inputs are needed; for ``classification`` or ``regression`` learnwares, the dimensions and description of the outputs are required. A learnware that passes the ``EasySemanticChecker`` is marked as ``NONUSABLE_LEARNWARE``; otherwise, it is marked as ``INVALID_LEARNWARE`` and error information is returned.
 - ``EasyStatChecker`` mainly checks the ``model`` and ``stat_spec`` of the learnwares. It includes the following steps:

  - **Check model instantiation**: Call ``learnware.instantiate_model`` to instantiate the model and transform it into a ``BaseModel``.
  - **Check input shape**: Check whether the shape of ``semantic_spec`` input(if exists), ``learnware.input_shape`` and shape of ``stat_spec`` are consistent, and then generate an example input with that shape. 
  - **Check model prediction**: Use the model to predict the label of the example input, and record the output shape. 
  - **Check output shape**: For ``Classification``, ``Regression`` and ``Feature Extraction`` tasks, the output shape should be consistent with that in ``semantic_spec`` and ``learnware.output_shape``. Besides, for ``Classification`` tasks, the output should be a legal class in ``semantic_spec``.

 If any step above fails or raises an error, the learnware is marked as ``INVALID_LEARNWARE``. A learnware that passes the ``EasyStatChecker`` is marked as ``USABLE_LEARNWARE``.


 Hetero Market
 --------------
 -------------

 Hetero Market consists of ``HeteroMapTableOrganizer``, ``HeteroSearcher``, and the checker list ``[EasySemanticChecker, EasyStatChecker]``.
 It is an extended version of the Easy Market which accommodates table learnwares from different feature spaces(heterogeneous table learnwares), expanding the applicable scope of learnware paradigm. 
 This market trains a heterogeneous engine based on existing learnware specifications in the market to merge different specification islands and assign new specifications(``HeteroMapTableSpecification``) to learnwares. 
 With more learnwares submitted, the heterogeneous engine will continuously update and is expected to build a more precise specification world.


 ``HeteroMapTableOrganizer``
 +++++++++++++++++++++++++++

 ``HeteroMapTableOrganizer`` overrides methods from ``EasyOrganizer`` and implements new methods to support organization of heterogeneous table learnwares. Key features include:

 - **reload_market**: Reloads the heterogeneous engine if there is one, otherwise initializes an engine with default configurations. Returns a flag indicating whether the market is reloaded successfully.
 - **reset**: Resets the heterogeneous market with specific settings regarding the heterogeneous engine such as ``auto_update``, ``auto_update_limit`` and ``training_args`` configurations.
 - **add_learnware**: Add a learnware into the market, and assign a ``HeteroMapTableSpecification`` to it using the heterogeneous engine. The engine's update process is triggered if ``auto_update`` is set to True and the number of learnwares in the market with ``USABLE_LEARNWARE`` status exceeds ``auto_update_limit``. Return the ``learnware_id`` and ``learnware_status``.
 - **delete_learnware**: Removes the learnware with ``id`` from the market, also remove its new specification if there is one. Return a flag of whether the deletion is successful.
 - **update_learnware**: Update the learnware's ``zip_path``, ``semantic_spec``, ``check_status`` and its new specification if there is one. Return a flag indicating whether it passed the ``checker``.
 - **generate_hetero_map_spec**: Generate ``HeteroMapTableSpecification`` for users based on the information provided in ``user_info``.
 - **train**: Build the heterogeneous engine using the learnwares in the market that support heterogeneous market training.


 The learnware market naturally consists of models with different feature spaces, different label spaces, or different objectives. It is beneficial for the market to accommodate these heterogeneous learnwares and provide corresponding learnware recommendation and reuse services to the user so as to expand the applicable scope of learnware paradigm.
 ``HeteroSearcher``
 ++++++++++++++++++

 Models are submitted to the market with their original specifications. However, these specifications are hard to use when responding to user requirements because of heterogeneity: specifications of heterogeneous models reside in different specification spaces, and the market needs to merge these spaces into a unified one. To achieve this, you need to implement the class ``EvolvedMarket``, especially the function ``EvolvedMarket.generate_new_stat_specification``, which generates a new statistical specification in an identical space for each submitted model.
 ``HeteroSearcher`` builds upon ``EasySearcher`` with additional support for searching among heterogeneous table learnwares, returning helpful learnwares with feature space and label space different from the user's task requirements.
 The semantic specification search and statistical specification search have been integrated into the same interface ``HeteroSearcher``.

 One important case is that models have different feature spaces. In order to enable the learnware market to handle heterogeneous feature spaces, you need to implement the class ``HeterogeneousFeatureMarket`` in the following way:
 - **HeteroSearcher.__call__(self, user_info: BaseUserInfo, check_status: int = None, max_search_num: int = 5, search_method: str = "greedy",) -> SearchResults**

 - First, design a method for the market to connect different feature spaces to a common subspace and implement the function ``HeterogeneousFeatureMarket.learn_mapping_functions``. This function uses the specifications of all submitted models to learn mapping functions that map data from the original feature space to the common subspace and vice versa.
 - Second, use learned mapping functions to implement the functions ``HeterogeneousFeatureMarket.transform_original_to_subspace`` and ``HeterogeneousFeatureMarket.transform_subspace_to_original``.
 - Third, use the functions ``HeterogeneousFeatureMarket.transform_original_to_subspace`` and ``HeterogeneousFeatureMarket.transform_subspace_to_original`` to override the methods ``EvolvedMarket.generate_new_stat_specification`` and ``EvolvedMarket.EvolvedMarket.evolve_learnware_list`` of the base class ``EvolvedMarket``.
  - It runs the semantic searcher ``EasyFuzzsematicSearcher`` on all the learnwares from the ``HeteroOrganizer`` with the same ``check_status`` (all learnwares if ``check_status`` is None).
  - If the ``stat_info`` is provided in ``user_info``, it conducts one of the following two types of statistical specification search using ``EasySearcher``, depending on whether heterogeneous learnware search is enabled. If enabled, ``stat_info`` will be updated with a ``HeteroMapTableSpecification`` generated for the user, and the Hetero Market performs heterogeneous learnware selection based on the updated ``stat_info``. If not enabled, the Hetero Market performs homogeneous learnware selection based on the original ``stat_info``.
  
 .. note:: 
  The heterogeneous learnware search is enabled when ``user_info`` contains valid heterogeneous search information. Please refer to `WORKFLOWS: Hetero Search  <../workflows/search.html#hetero-search>`_ for details.
--- a/docs/components/spec.rst
+++ b/docs/components/spec.rst
@@ -3,80 +3,67 @@
 Specification
 ================================

 Concepts & Types
 ======================================

 The search of helpful learnwares can be divided into two stages: statistical specification and semantic specification.

 Statistical Specification
 ---------------------------

 The learnware specification should ideally provide essential information about every model in the learnware market, enabling efficient and accurate identification for future users. Our current specification design has two components. The first part consists of a string of descriptions or tags assigned by the learnware market based on developer-submitted information. These descriptions or tags help identify the model's specification island. Different learnware market enterprises may use different descriptions or tags.

 The second part of the specification is crucial for determining the model's position in the functional space :math:`F: \mathcal{X} \mapsto \mathcal{Y}` with respect to obj. A recent development in this area is the RKME (Reduced Kernel Mean Embedding) specification, which builds on the reduced set of KME (Kernel Mean Embedding) techniques. KME is a powerful method for mapping a probability distribution to a point in RKHS (Reproducing Kernel Hilbert Space), while the reduced set retains this ability with a concise representation that doesn't reveal the original data.

 The RKME specification assumes that each learnware is a well-performed model on its training data. The RKME specification is based on RKME :math:`\widetilde{\Phi}`, which aims to provide a good representation by constructing a reduced set to approximate the empirical KME :math:`\Phi=\int_{\mathcal{X}} k(\boldsymbol{x}, \cdot) \mathrm{d} P(\boldsymbol{x})` of the underlying distribution. Theoretically, when the kernel function satisfies :math:`k(\boldsymbol{x}, \boldsymbol{x}) \leq 1` for all :math:`x \in \mathcal{X}`, we have the guarantee that

 .. math::

   \|\widetilde{\Phi}-\Phi\|_{\mathcal{H}} \leq 2 \sqrt{\frac{2}{n}}+\sqrt{\frac{1}{m}}+\sqrt{\frac{2 \log (1 / \delta)}{m}},

 with a probability of at least :math:`1-\delta`, where :math:`n, m` are the sizes of the RKME reduced set and the original data, respectively. It is known that when using characteristic kernels such as the Gaussian kernel, KME can capture all information about the distribution. Additionally, when the RKHS of the kernel function is finite-dimensional, RKME has a linear convergence rate :math:`O\left(e^{-n}\right)` to empirical KME; for infinite-dimensional RKHS, it has been proved constructively that RKME can enjoy :math:`O(\sqrt{d} / n)` convergence rate under :math:`L_{\infty}` measure, where :math:`d` is the dimension of the original data. Therefore, RKME is guaranteed to be a good estimation of KME and a valid representation for data distribution that encodes the ability of a trained model.

 Under certain assumptions, the risk for the user task can be bounded, such as assuming that the distribution corresponding to the user's task matches that of a learnware, or that it can be approximated by a mixture of distributions corresponding to a set of learnwares' tasks, i.e.,

 .. math::

   \mathcal{D}_u=\sum_{i=1}^N w_i \mathcal{D}_i

 where :math:`\mathcal{D}_u` is the distribution corresponding to the user's task, :math:`N` is the number of learnwares, and :math:`\mathcal{D}_i` are their corresponding distributions. We have :math:`\sum_{i=1}^N w_i=1` and :math:`w_i \geq 0`. These two assumptions are known as task-recurrent and instance-recurrent assumptions. Additionally, assume that all learnwares are well-performed ones:

 .. math::

   \mathbb{E}_{\mathcal{D}_i}\left[\ell\left(\widehat{f}_i(\boldsymbol{x}), \boldsymbol{y}\right)\right] \leq \epsilon, \forall i \in[N],
 Learnware specification is the core component of the learnware paradigm, linking all processes about learnwares, including uploading, organizing, searching, deploying and reusing. 

 where :math:`\widehat{f}_i` is the function corresponding to the :math:`i`-th learnware, :math:`\ell` is the loss function, and :math:`\boldsymbol{y}` is assumed to be determined by a ground-truth global function :math:`h`. Under these assumptions, recent studies have attempted to bound the risk on the user's task. With the task-recurrent assumption and selecting the learnware :math:`\left(\widehat{f}_i, \tilde{\Phi}_i\right)` with the smallest RKHS distance :math:`\eta` according to RKME, given the loss function
 In this section, we will introduce the concept and design of learnware specification in the ``learnware`` package.
 We will then explore ``regular specification``\ s tailored for different data types such as tables, images and texts.
 Lastly, we cover a ``system specification`` specifically assigned to table learnwares by the learnware market, aimed at accommodating all available table learnwares into a unified "specification world" despite their heterogeneity.

 .. math::

   \left|\ell\left(\widehat{f}_i(\boldsymbol{x}), h(\boldsymbol{x})\right)\right| \leq U, \forall \boldsymbol{x} \in \mathcal{X}, \forall i \in[N],
 Concepts & Types
 ==================

 we have
 The learnware specification describes the model's specialty and utility in a certain format, allowing the model to be identified and reused by future users who may have no prior knowledge of the learnware.
 The ``learnware`` package employs a highly extensible specification design, which consists of two parts:

 .. math::
 - **Semantic specification** describes the model's type and functionality through a set of descriptions and tags. Learnwares with similar semantic specifications reside in the same specification island.
 - **Statistical specification** characterizes the statistical information contained in the model using various machine learning techniques. It plays a crucial role in locating the appropriate place for the model within the specification island.

   \mathbb{E}_{\mathcal{D}_u}\left[\ell\left(\widehat{f}_i(\boldsymbol{x}), \boldsymbol{y}\right)\right] \leq \epsilon+U \eta+O\left(\frac{1}{\sqrt{m}}+\frac{1}{\sqrt{n}}\right).
 When searching in the learnware market, the system first locates specification islands based on the semantic specification of the user's task, 
 then pinpoints highly beneficial learnwares on these islands based on the statistical specification of the user's task.
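
The two-stage search can be sketched in a few lines. This is a toy illustration, not the package's search code: the dictionary layout, the tag-overlap rule and the Euclidean distance between small vectors are all assumptions standing in for real semantic and statistical specifications.

```python
import math


def two_stage_search(user_tags, user_stat, market, top_k=2):
    # Stage 1: keep learnwares whose semantic tags overlap the user's tags
    # (a stand-in for locating the specification island).
    island = [lw for lw in market if user_tags & lw["tags"]]
    # Stage 2: rank the candidates by distance between statistical vectors
    # (a stand-in for comparing statistical specifications).
    island.sort(key=lambda lw: math.dist(user_stat, lw["stat"]))
    return [lw["id"] for lw in island[:top_k]]


market = [
    {"id": "a", "tags": {"table", "regression"}, "stat": (0.0, 1.0)},
    {"id": "b", "tags": {"table", "regression"}, "stat": (0.9, 0.1)},
    {"id": "c", "tags": {"image", "classification"}, "stat": (1.0, 0.0)},
]
result = two_stage_search({"table"}, (1.0, 0.0), market)
```

Here learnware ``c`` is statistically closest to the user but is never considered, because the semantic stage has already excluded its island.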

 As for the instance-recurrent assumption and the 0/1-loss
 Statistical Specification
 ---------------------------

 .. math::
 We employ the ``Reduced Kernel Mean Embedding (RKME) Specification`` as the foundation for implementing statistical specification for diverse data types, 
 with adjustments made according to the characteristics of each data type. 
 The RKME specification is a recent development in learnware specification design, which represents the distribution of a model's training data in a privacy-preserving manner.

   \ell_{01}(f(\boldsymbol{x}), \boldsymbol{y})=\mathbb{I}(f(\boldsymbol{x}) \neq \boldsymbol{y}),
 Within the ``learnware`` package, you'll find two types of statistical specifications: ``regular specification`` and ``system specification``. The former is generated locally
 by users to express their model's statistical information, while the latter is assigned by the learnware market to accommodate and organize heterogeneous learnwares. 

 a more general result has been achieved:
 Semantic Specification
 -----------------------

 .. math::
 The semantic specification consists of a "dict" structure that includes keywords "Data", "Task", "Library", "Scenario", "License", "Description", and "Name". 
 In the case of table learnwares, users should additionally provide descriptions for each feature dimension and output dimension through the "Input" and "Output" keywords.
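
As a rough illustration, the keywords listed above can be checked with a small helper. The keyword names come from the text; the plain-string values and the ``missing_keys`` helper are hypothetical, and the value format expected by the package may well be richer than shown here.

```python
REQUIRED_KEYS = {"Data", "Task", "Library", "Scenario", "License", "Description", "Name"}
TABLE_ONLY_KEYS = {"Input", "Output"}


def missing_keys(spec: dict, is_table: bool = False) -> set:
    """Return the keywords that a semantic specification dict still lacks."""
    required = REQUIRED_KEYS | (TABLE_ONLY_KEYS if is_table else set())
    return required - set(spec)


# Hypothetical values, for illustration only.
semantic_spec = {
    "Data": "Table",
    "Task": "Regression",
    "Library": "Others",
    "Scenario": "Business",
    "License": "MIT",
    "Description": "Store-level sales forecasting model",
    "Name": "demo_learnware",
    "Input": {"0": "item price", "1": "sales in the previous month"},
    "Output": {"0": "sales in the next month"},
}
```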

   \mathbb{E}_{\mathcal{D}_u}\left[\ell_{01}(f(\boldsymbol{x}), \boldsymbol{y})\right] \leq \epsilon+R(g),

 where :math:`R(g)=\sum_{i=1}^N w_i \mathbb{E}_{\mathcal{D}_i}\left[\ell_{01}(g(\boldsymbol{x}), i)\right]` represents the weighted risk of any learnware selector :math:`g(\boldsymbol{x})`, which takes unlabeled data as input and assigns it to the appropriate model, and :math:`f(\boldsymbol{x})=\widehat{f}_{g(\boldsymbol{x})}(\boldsymbol{x})` is the final model for the user's task.
 Regular Specification
 ======================================

 Efforts have been made to enable the learnware market to handle unseen tasks, where the user's task involves some unseen aspects that have never been addressed by the current learnwares in the market. A more general theoretical analysis has been presented based on mixture proportion estimation.
 The ``learnware`` package provides a unified interface, ``generate_stat_spec``, for generating ``regular specification``\ s across different data types. 
 Users can use the training data ``train_x`` (supported types include numpy.ndarray, pandas.DataFrame, and torch.Tensor) as input to generate the ``regular specification`` of the model,
 as shown in the following code:

 .. code:: python

 Semantic Specification
 ---------------------------
   from learnware.specification import generate_stat_spec

 The semantic specification describes the characteristics of the user's task, and the market uses it to identify potentially helpful learnwares whose models solve tasks similar to your requirements. The semantic specification is described in detail in `Identification Learnwares <../workflow/identify.html>`_.
   data_type = "table" # supported data types: ["table", "image", "text"]
   regular_spec = generate_stat_spec(type=data_type, x=train_x)
   regular_spec.save("stat.json")

 It's worth noting that the above code only runs on user's local computer and does not interact with any cloud servers or leak any local private data.

 Regular Specification
 ======================================
 .. note:: 

   In cases where the model's training data is too large, causing the above code to fail, you can consider sampling the training data to ensure it's of a suitable size before proceeding with reduction generation.
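
A minimal sketch of such sampling with NumPy is shown below. The helper ``subsample_rows`` and the row limit are hypothetical choices, not part of the ``learnware`` package; an appropriate size depends on available memory.

```python
import numpy as np


def subsample_rows(train_x, max_rows, seed=0):
    """Return at most max_rows rows of train_x, drawn uniformly without replacement."""
    n = len(train_x)
    if n <= max_rows:
        return train_x
    rng = np.random.default_rng(seed)
    idx = rng.choice(n, size=max_rows, replace=False)
    return train_x[idx]


# Synthetic stand-in for a large training table.
train_x = np.random.default_rng(42).normal(size=(100_000, 8))
sampled_x = subsample_rows(train_x, max_rows=10_000)
```

The sampled array can then be passed as ``x`` to ``generate_stat_spec`` in place of the full training data.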

 Table Specification
 --------------------------

 The ``regular specification`` for tabular learnware is essentially the RKME specification of the model's training table data. No additional adjustment is needed.
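
To give a feel for what a reduced kernel mean embedding is, the sketch below fixes a small set of points ``z`` and solves for weights ``beta`` so that the weighted set approximates the empirical kernel mean embedding of the data ``x``. It is a simplified illustration under stated assumptions: the reduced points are just a random subset, the weights are unconstrained, and the kernel parameter and ridge term are arbitrary. It is not the algorithm used by the package, which is expected to optimize the reduced set as well.

```python
import numpy as np


def gaussian_kernel(a, b, gamma=0.5):
    # Pairwise k(a_i, b_j) = exp(-gamma * ||a_i - b_j||^2).
    sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq)


def reduced_set_weights(x, z, gamma=0.5, ridge=1e-6):
    # Minimise ||sum_j beta_j k(z_j, .) - mean_i k(x_i, .)||^2 over beta
    # (plus a small ridge term for numerical stability).
    k_zz = gaussian_kernel(z, z, gamma)
    k_zx_mean = gaussian_kernel(z, x, gamma).mean(axis=1)
    return np.linalg.solve(k_zz + ridge * np.eye(len(z)), k_zx_mean)


def rkhs_distance_sq(x, z, beta, gamma=0.5):
    # Squared RKHS distance between the weighted reduced set and the empirical KME.
    k_xx = gaussian_kernel(x, x, gamma).mean()
    k_zx_mean = gaussian_kernel(z, x, gamma).mean(axis=1)
    k_zz = gaussian_kernel(z, z, gamma)
    return float(k_xx - 2 * beta @ k_zx_mean + beta @ k_zz @ beta)


rng = np.random.default_rng(0)
x = rng.normal(size=(200, 2))                       # stand-in for training data
z = x[rng.choice(len(x), size=10, replace=False)]   # reduced set: 10 points
beta = reduced_set_weights(x, z)
uniform = np.full(len(z), 1 / len(z))
```

Comparing ``rkhs_distance_sq(x, z, beta)`` with ``rkhs_distance_sq(x, z, uniform)`` shows how fitted weights bring the ten-point summary closer to the embedding of all 200 points than equal weights do.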

 Image Specification
 --------------------------

@@ -99,19 +86,17 @@ By randomly sampling a subset of the dataset, we can construct Image Specificati

 .. code-block:: python

   import torchvision
   from torch.utils.data import DataLoader
   from learnware.specification import generate_rkme_image_spec
    import torchvision
    from torch.utils.data import DataLoader
    from learnware.specification import generate_rkme_image_spec

   SAMPLED_SIZE = 5000

   full_set = torchvision.datasets.CIFAR10(
      root='./data', train=True, download=True, transform=torchvision.transforms.ToTensor())
   loader =  DataLoader(full_set, batch_size=SAMPLED_SIZE, shuffle=True)
   sampled_X, _ = next(iter(loader))
    cifar10 = torchvision.datasets.CIFAR10(
       root='./data', train=True, download=True, transform=torchvision.transforms.ToTensor())
    X, _ = next(iter(DataLoader(cifar10, batch_size=len(cifar10))))

   spec = generate_rkme_image_spec(sampled_X)
   spec.save("cifar10.json")
    spec = generate_rkme_image_spec(X, sample_size=5000)
    spec.save("cifar10.json")

 Privacy Protection
 ^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -130,4 +115,21 @@ Text Specification


 System Specification
 ======================================
 ======================================

 In contrast to ``regular specification``\ s which are generated solely by users,
 ``system specification``\ s are higher-level statistical specifications assigned by learnware markets 
 to effectively accommodate and organize heterogeneous learnwares. 
 This implies that ``regular specification``\ s are usually applicable across different markets, while ``system specification``\ s are generally closely associated
 with particular learnware market implementations.

 ``system specification``\ s play a critical role in heterogeneous markets such as the ``Hetero Market``:

 - Learnware organizers use these specifications to connect isolated specification islands into unified "specification world"s.
 - Learnware searchers perform helpful learnware recommendations among all table learnwares in the market, leveraging the ``system specification``\ s generated for users.


 The ``learnware`` package now includes a type of ``system specification``, named ``HeteroMapTableSpecification``, made especially for the ``Hetero Market`` implementation.
 This specification is automatically given to all table learnwares when they are added to the ``Hetero Market``.
 It is also set up to be updated periodically, ensuring it remains accurate as the learnware market evolves and builds more precise specification worlds.
 Please refer to `COMPONENTS: Hetero Market  <../components/market.html#hetero-market>`_ for implementation details.
--- a/docs/conf.py
+++ b/docs/conf.py
@@ -92,8 +92,8 @@ html_theme = "sphinx_book_theme"
 html_theme_path = [sphinx_book_theme.get_html_theme_path()]
 html_theme_options = {
    "logo_only": True,
 #    "collapse_navigation": False,
 #    "display_version": False,
    "collapse_navigation": False,
    # "display_version": False,
    "navigation_depth": 4,
 }
 html_logo = "_static/img/logo/logo1.png"
--- a/docs/start/exp.rst
+++ b/docs/start/exp.rst
@@ -21,50 +21,195 @@ Table: homo+hetero

 Datasets
 ------------------
 We designed experiments on three publicly available datasets, namely `Prediction Future Sales (PFS) <https://www.kaggle.com/c/competitive-data-science-predict-future-sales/data>`_,
 `M5 Forecasting (M5) <https://www.kaggle.com/competitions/m5-forecasting-accuracy/data>`_ and `CIFAR 10 <https://www.cs.toronto.edu/~kriz/cifar.html>`_.
 For the two sales forecasting data sets of PFS and M5, we divide the user data according to different stores, and train the Ridge model and LightGBM model on the corresponding data respectively.
 For the CIFAR10 image classification task, we first randomly pick 6 to 10 categories, and randomly select 800 to 2000 samples from each category from the categories corresponding to the training set, constituting a total of 50 different uploaders.
 For test users, we first randomly pick 3 to 6 categories, and randomly select 150 to 350 samples from each category from the corresponding categories from the test set, constituting a total of 20 different users.
 Our study involved three public datasets in the sales forecasting field: `Predict Future Sales (PFS) <https://www.kaggle.com/c/competitive-data-science-predict-future-sales/data>`_,
 `M5 Forecasting (M5) <https://www.kaggle.com/competitions/m5-forecasting-accuracy/data>`_ and `Corporacion <https://www.kaggle.com/competitions/favorita-grocery-sales-forecasting/data>`_.
 We applied various pre-processing methods to these datasets to enhance the richness of the data.
 After pre-processing, we first divided each dataset by store and then split the data for each store into training and test sets. Specifically:

 - For PFS, the test set consisted of the last month's data from each store.
 - For M5, we designated the final 28 days' data from each store as the test set.
 - For Corporacion, the test set was composed of the last 16 days of data from each store.
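
The time-based splits above follow one pattern: hold out the final days of each store's data. A minimal sketch, assuming each store's records are ``(date, value)`` pairs (a hypothetical layout, not the experiments' actual preprocessing code):

```python
from datetime import date, timedelta


def split_last_days(rows, n_days):
    """Split (date, value) rows so that the final n_days form the test set."""
    last_day = max(d for d, _ in rows)
    cutoff = last_day - timedelta(days=n_days)
    train = [r for r in rows if r[0] <= cutoff]
    test = [r for r in rows if r[0] > cutoff]
    return train, test


# 100 consecutive days of synthetic data for one store.
rows = [(date(2016, 1, 1) + timedelta(days=i), float(i)) for i in range(100)]
train, test = split_last_days(rows, n_days=28)  # e.g. the M5 setting
```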

 In the submitting stage, the Corporacion dataset's 55 stores are regarded as 165 uploaders, each employing one of three different feature engineering methods. 
 For the PFS dataset, 100 uploaders are established, each using one of two feature engineering approaches. 
 These uploaders then utilize their respective stores' training data to develop LightGBM models. 
 As a result, the learnware market comprises 265 learnwares, derived from five types of feature spaces and two types of label spaces.

 Based on the specific design of user tasks, our experiments were primarily categorized into two types:

 - ``homogeneous experiments`` are designed to evaluate performance when users can reuse learnwares in the learnware market that have the same feature space as their tasks (homogeneous learnwares).
  This contributes to showing the effectiveness of using learnwares that align closely with the user's specific requirements.
   
 - ``heterogeneous experiments`` aim to evaluate the performance of identifying and reusing helpful heterogeneous learnwares in situations where 
  no available learnwares match the feature space of the user's task. This helps to highlight the potential of learnwares for applications beyond their original purpose.

 Homo Experiments
 -----------------------

 For homogeneous experiments, the 55 stores in the Corporacion dataset act as 55 users, each applying one feature engineering method, 
 and using the test data from their respective store as user data. These users can then search for homogeneous learnwares in the market with the same feature spaces as their tasks.

 The Mean Squared Error (MSE) of search and reuse is presented in the table below:

 +-----------------------------------+---------------------+
 | Mean in Market (Single)           | 0.331 ± 0.040       |
 +-----------------------------------+---------------------+
 | Best in Market (Single)           | 0.151 ± 0.046       |
 +-----------------------------------+---------------------+
 | Top-1 Reuse (Single)              | 0.280 ± 0.090       |
 +-----------------------------------+---------------------+
 | Job Selector Reuse (Multiple)     | 0.274 ± 0.064       |
 +-----------------------------------+---------------------+
 | Average Ensemble Reuse (Multiple) | 0.267 ± 0.051       |
 +-----------------------------------+---------------------+

 When users have both test data and limited training data derived from their original data, reusing single or multiple searched learnwares from the market can often yield
 better results than training models from scratch on limited training data. We present the change curves in MSE for the user's self-trained model, as well as for the Feature Augmentation single learnware reuse method and the Ensemble Pruning multiple learnware reuse method. 
 These curves display their performance on the user's test data as the amount of labeled training data increases. 
 The average results across 55 users are depicted in the figure below:

 .. image:: ../_static/img/table_homo_labeled.png
   :align: center
   :alt: Table Homo Limited Labeled Data

 From the figure, it's evident that when users have limited training data, the performance of reusing single/multiple table learnwares is superior to that of the user's own model. 
 This emphasizes the benefit of learnware reuse in significantly reducing the need for extensive training data and achieving enhanced results when available user training data is limited.


 Hetero Experiments
 -------------------------

 In heterogeneous experiments, the learnware market recommends helpful heterogeneous learnwares whose feature spaces differ from 
 that of the user's task. Based on whether there are learnwares in the market that handle tasks similar to the user's task, the experiments can be further subdivided into the following two types:

 Cross Feature Space Experiments
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 We designate the 41 stores in the PFS dataset as users, creating their user data with an alternative feature engineering approach that varies from the methods employed by learnwares in the market. 
 Consequently, while the market's learnwares from the PFS dataset undertake tasks very similar to our users, the feature spaces do not match exactly. In this experimental configuration,
 we tested various heterogeneous learnware reuse methods (without using user's labeled data) and compared them to the user's self-trained model based on a small amount of training data.
 The average MSE performance across 41 users is as follows:

 +-----------------------------------+---------------------+
 | Mean in Market (Single)           | 1.459 ± 1.066       |
 +-----------------------------------+---------------------+
 | Best in Market (Single)           | 1.226 ± 1.032       |
 +-----------------------------------+---------------------+
 | Top-1 Reuse (Single)              | 1.407 ± 1.061       |
 +-----------------------------------+---------------------+
 | Average Ensemble Reuse (Multiple) | 1.312 ± 1.099       |
 +-----------------------------------+---------------------+
 | User model with 50 labeled data   | 1.267 ± 1.055       |
 +-----------------------------------+---------------------+

 From the results, it is noticeable that the learnware market still performs quite well even when users lack labeled data, 
 provided it includes learnwares addressing tasks that are similar but not identical to the user's. 
 In these instances, the market's effectiveness can rival that of scenarios where users have access to a limited quantity of labeled data.

 Cross Task Experiments
 ^^^^^^^^^^^^^^^^^^^^^^^

 Here we have chosen the 10 stores from the M5 dataset to act as users. Although the broad task of sales forecasting is similar to the tasks addressed by the learnwares in the market, 
 there are no learnwares available that directly cater to the M5 sales forecasting requirements. All learnwares show variations in both feature and label spaces compared to the tasks of M5 users.
 We present the change curves in RMSE for the user's self-trained model and several learnware reuse methods. 
 These curves display their performance on the user's test data as the amount of labeled training data increases. 
 The average results across 10 users are depicted in the figure below:

 .. image:: ../_static/img/table_hetero_labeled.png
   :align: center
   :alt: Table Hetero Limited Labeled Data

 We can observe that heterogeneous learnwares are beneficial when there's a limited amount of the user's labeled training data available, 
 aiding in better alignment with the user's specific task. This underscores the potential of learnwares to be applied to tasks beyond their original purpose.

 We tested the efficiency of the specification generation and the accuracy of the search and reuse model respectively.
 The evaluation index on PFS and M5 data is RMSE, and the evaluation index on CIFAR10 classification task is classification accuracy
 Text Experiment
 ====================

 Datasets
 ------------------
 We conducted experiments on the widely used text benchmark dataset: `20-newsgroup <http://qwone.com/~jason/20Newsgroups/>`_.
 20-newsgroup is a renowned text classification benchmark with a hierarchical structure, featuring 5 superclasses {comp, rec, sci, talk, misc}.

 In the submitting stage, we enumerated all combinations of three superclasses from the five available, randomly sampling 50% of each combination from the training set to create datasets for 50 uploaders.

 In the deploying stage, we considered all combinations of two superclasses out of the five, selecting all data for each combination from the testing set as a test dataset for one user. This resulted in 10 users.
 The user's own training data was generated using the same sampling procedure as the user test data, despite originating from the training dataset.
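
The enumeration of superclass combinations can be reproduced with ``itertools``. The superclass names come from the text; that each three-superclass combination yields five uploaders is an inference from the stated total of 50 uploaders, not something the text states directly.

```python
from itertools import combinations

SUPERCLASSES = ["comp", "rec", "sci", "talk", "misc"]

# Uploaders: every combination of three superclasses.
uploader_combos = list(combinations(SUPERCLASSES, 3))
# Users: every combination of two superclasses.
user_combos = list(combinations(SUPERCLASSES, 2))

# Assumed: 50 uploaders spread evenly over the uploader combinations.
uploaders_per_combo = 50 // len(uploader_combos)
```

Both enumerations contain ten combinations, which matches the 10 users reported for the deploying stage.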

 Model training comprised two parts: the first part involved training a tfidf feature extractor, and the second part used the extracted text feature vectors to train a naive Bayes classifier.

 Our experiments comprise two components:

 * ``unlabeled_text_example`` is designed to evaluate performance when users possess only testing data, searching and reusing learnware available in the market.
 * ``labeled_text_example`` aims to assess performance when users have both testing and limited training data, searching and reusing learnware directly from the market instead of training a model from scratch. This helps determine the amount of training data saved for the user.

 Results
 ----------------

 The time consumed by specification generation is shown in the table below:
 * ``unlabeled_text_example``:

 ====================  ====================  =================================
 Dataset               Data Dimensions       Specification Generation Time (s)
 ====================  ====================  =================================
 PFS                   8714274*31            < 1.5
 M5                    46027957*82           9~15
 CIFAR10               9000*3*32*32          7~10
 ====================  ====================  =================================
 The accuracy of search and reuse is presented in the table below:

 The accuracy of search and reuse is shown in the table below:
 +-----------------------------------+---------------------+
 | Mean in Market (Single)           | 0.507 ± 0.030       |
 +-----------------------------------+---------------------+
 | Best in Market (Single)           | 0.859 ± 0.051       |
 +-----------------------------------+---------------------+
 | Top-1 Reuse (Single)              | 0.846 ± 0.054       |
 +-----------------------------------+---------------------+
 | Job Selector Reuse (Multiple)     | 0.845 ± 0.053       |
 +-----------------------------------+---------------------+
 | Average Ensemble Reuse (Multiple) | 0.862 ± 0.051       |
 +-----------------------------------+---------------------+

 ====================  ==================== ================================= =================================
 Dataset               Top-1 Performance    Job Selector Reuse                Average Ensemble Reuse
 ====================  ==================== ================================= =================================
 PFS                     1.955 +/- 2.866    2.175 +/- 2.847                    1.950 +/- 2.888
 M5                      2.066 +/- 0.424    2.116 +/- 0.472                    2.512 +/- 0.573
 CIFAR10                 0.619 +/- 0.138    0.585 +/- 0.056                    0.715 +/- 0.075
 ====================  ==================== ================================= =================================
 * ``labeled_text_example``:

 We present the change curves in classification error rates for both the user's self-trained model and the multiple learnware reuse (EnsemblePrune), showcasing their performance on the user's test data as the user's training data increases. The average results across 10 users are depicted below:

 .. image:: ../_static/img/text_labeled_curves.png
   :align: center
   :alt: Text Limited Labeled Data

 Text Experiment
 ====================

 From the figure above, it is evident that when the user's own training data is limited, the performance of multiple learnware reuse surpasses that of the user's own model. As the user's training data grows, it is expected that the user's model will eventually outperform the learnware reuse. This underscores the value of reusing learnware to significantly conserve training data and achieve superior performance when user training data is limited.


 Image Experiment
 ====================

 For the CIFAR-10 dataset, we sampled the training set unevenly by category and constructed unbalanced training datasets for the 50 learnwares that contained only some of the categories. This makes it unlikely that there exists any learnware in the learnware market that can accurately handle all categories of data; only the learnware whose training data is closest to the data distribution of the target task is likely to perform well on the target task. Specifically, the probability of each category being sampled obeys a random multinomial distribution, with a non-zero probability of sampling on only 4 categories, and the sampling ratio is 0.4: 0.4: 0.1: 0.1. Ultimately, the training set for each learnware contains 12,000 samples covering the data of 4 categories in CIFAR-10.

 We constructed 50 target tasks using data from the test set of CIFAR-10. Similar to constructing the training set for the learnwares, in order to allow for some variation between tasks, we sampled the test set unevenly. Specifically, the probability of each category being sampled obeys a random multinomial distribution, with non-zero sampling probability on 6 categories, and the sampling ratio is 0.3: 0.3: 0.1: 0.1: 0.1: 0.1. Ultimately, each target task contains 3000 samples covering the data of 6 categories in CIFAR-10.
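
The uneven sampling used for both learnwares and target tasks can be simulated at the level of class labels. The sketch below only draws label counts; the helper name, the seeds and the use of Python's ``random`` module are assumptions, and the actual experiments additionally select the corresponding images from CIFAR-10.

```python
import random
from collections import Counter


def sample_class_counts(ratios, total, num_classes=10, seed=0):
    """Pick len(ratios) random classes and draw `total` labels with the given ratios."""
    rng = random.Random(seed)
    classes = rng.sample(range(num_classes), k=len(ratios))
    labels = rng.choices(classes, weights=ratios, k=total)
    return Counter(labels)


# Learnware training set: 4 classes with ratio 0.4 : 0.4 : 0.1 : 0.1, 12,000 samples.
learnware_counts = sample_class_counts([0.4, 0.4, 0.1, 0.1], total=12000)
# Target task: 6 classes with ratio 0.3 : 0.3 : 0.1 : 0.1 : 0.1 : 0.1, 3,000 samples.
task_counts = sample_class_counts([0.3, 0.3, 0.1, 0.1, 0.1, 0.1], total=3000, seed=1)
```

Because each task covers six classes while each learnware covers four, no single learnware in this setup has seen every class of a target task.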

 With this experimental setup, we evaluated the performance of RKME Image using 1 - Accuracy as the loss.

 +-----------------------------------+---------------------+
 | Mean in Market (Single)           | 0.655 ± 0.021       |
 +-----------------------------------+---------------------+
 | Best in Market (Single)           | 0.304 ± 0.046       |
 +-----------------------------------+---------------------+
 | Top-1 Reuse (Single)              | 0.406 ± 0.128       |
 +-----------------------------------+---------------------+
 | Job Selector Reuse (Multiple)     | 0.406 ± 0.128       |
 +-----------------------------------+---------------------+
 | Average Ensemble Reuse (Multiple) | 0.310 ± 0.112       |
 +-----------------------------------+---------------------+

 In some specific settings, the user will have a small number of labelled samples. In such settings, learning the weight of selected learnwares on a limited number of labelled samples can result in a better performance than training directly on a limited number of labelled samples.

 .. image:: ../_static/img/image_labeled.png
   :align: center

 Get Start Examples
 =========================
 Examples for `PFS, M5` and `CIFAR10` are available at [xxx]. You can run { main.py } directly to reproduce related experiments.
 The test code is mainly composed of three parts, namely data preparation (optional), specification generation and market construction, and search test.
 You can load the data prepared by us and skip the data preparation step.


 Examples for the `20-newsgroup` dataset are available at [examples/dataset_text_workflow].
 We utilize the `fire` module to construct our experiments. You can execute the experiment with the following commands:

 * `python main.py prepare_market`: Prepares the market.
 * `python main.py unlabeled_text_example`: Executes the unlabeled_text_example experiment; the results will be printed in the terminal.
 * `python main.py labeled_text_example`: Executes the labeled_text_example experiment; result curves will be automatically saved in the `figs` directory.
 * Additionally, you can use `python main.py unlabeled_text_example True` to combine steps 1 and 2. The same approach applies to running labeled_text_example directly.
--- a/docs/workflows/reuse.rst
+++ b/docs/workflows/reuse.rst
@@ -2,36 +2,135 @@
 Learnwares Reuse
 ==========================================

 This part introduces two baseline methods for reusing a given list of learnwares, namely ``JobSelectorReuser`` and ``AveragingReuser``.
 Instead of training a model from scratch, the user can easily reuse a list of learnwares (``List[Learnware]``) to predict the labels of their own data (``numpy.ndarray`` or ``torch.Tensor``).
 ``Learnware Reuser`` is a ``Python API`` that offers a variety of convenient tools for learnware reuse. With these tools, users can efficiently reuse a single learnware, a combination of multiple learnwares,
 or heterogeneous learnwares, saving the time and effort of building models from scratch. There are two main types of
 reuse tools, depending on whether the user has gathered a small amount of labeled data beforehand: (1) direct reuse and (2) customized reuse based on labeled data.

 To illustrate, we provide a code demonstration that obtains the user dataset using ``sklearn.datasets.load_digits``, where ``test_data`` represents the data that requires prediction.
 Assuming that ``learnware_list`` is the list of learnwares searched by the learnware market based on user specifications, the user can reuse each learnware in the ``learnware_list`` through ``JobSelectorReuser`` or ``AveragingReuser`` to predict the label of ``test_data``, thereby avoiding training a model from scratch.
 .. note:: 

    For detailed explanations of the learnware reusers mentioned below, please refer to `COMPONENTS: All Reuse Methods  <../components/learnware.html#all-reuse-methods>`_ .

 Homo Reuse
 ====================

 .. code-block:: python
 This part introduces baseline methods for reusing homogeneous learnwares to make predictions on unlabeled data.

 Direct reuse of Learnware
 --------------------------

    from sklearn.datasets import load_digits
    from learnware.learnware import JobSelectorReuser, AveragingReuser
 - ``JobSelector`` selects different learnwares for different data by training a ``job selector`` classifier. The following code shows how to use it:

    # Load user data
    X, y = load_digits(return_X_y=True)
    test_data = X
 .. code:: python

    # Based on user information, the learnware market returns a list of learnwares (learnware_list)
    # Use jobselector reuser to reuse the searched learnwares to make prediction
    from learnware.reuse import JobSelectorReuser

    # learnware_list is the list of searched learnware
    reuse_job_selector = JobSelectorReuser(learnware_list=learnware_list)
    job_selector_predict_y = reuse_job_selector.predict(user_data=test_data)

    # Use averaging ensemble reuser to reuse the searched learnwares to make prediction
    reuse_ensemble = AveragingReuser(learnware_list=learnware_list)
    ensemble_predict_y = reuse_ensemble.predict(user_data=test_data)
    # test_x is the user's data for prediction
    # predict_y is the prediction result of the reused learnwares
    predict_y = reuse_job_selector.predict(user_data=test_x)
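
To make the idea of a job selector concrete, here is a toy sketch that does not use the ``learnware`` package. It assumes each learnware comes with a few representative samples of its training distribution and uses a nearest-centroid rule as the selector; the class ``NearestCentroidSelector`` and all data are hypothetical, and the real ``JobSelectorReuser`` may train its selector quite differently.

```python
def centroid(points):
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


class NearestCentroidSelector:
    """Toy job selector: routes a sample to the learnware with the closest centroid."""

    def fit(self, samples_per_learnware):
        self.centroids = [centroid(samples) for samples in samples_per_learnware]
        return self

    def select(self, x):
        distances = [squared_distance(x, c) for c in self.centroids]
        return distances.index(min(distances))


def job_selector_predict(models, selector, user_data):
    # Each sample is predicted by the single model the selector assigns to it.
    return [models[selector.select(x)](x) for x in user_data]


# Hypothetical learnwares: plain callables standing in for trained models.
models = [lambda x: "low", lambda x: "high"]
# Hypothetical representative samples of each learnware's training distribution.
representatives = [[[0.0, 0.1], [0.2, 0.0]], [[5.0, 5.1], [4.8, 5.3]]]
selector = NearestCentroidSelector().fit(representatives)
predictions = job_selector_predict(models, selector, [[0.1, 0.3], [4.9, 5.0], [0.3, 0.2]])
```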

 - ``AveragingReuser`` uses an ensemble method to make predictions. The ``mode`` parameter specifies the specific ensemble method:

 .. code:: python

    from learnware.reuse import AveragingReuser

    # Regression tasks:
    #   - mode="mean": average the learnware outputs.
    # Classification tasks:
    #   - mode="vote_by_label": majority vote for learnware output labels.
    #   - mode="vote_by_prob": majority vote for learnware output label probabilities.
    
    reuse_ensemble = AveragingReuser(
        learnware_list=learnware_list, mode="vote_by_label"
    )
    ensemble_predict_y = reuse_ensemble.predict(user_data=test_x)
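
The three modes can be illustrated with plain Python lists. This is a sketch of the ensembling rules only, not the ``AveragingReuser`` implementation; in particular, treating ``vote_by_prob`` as "average the class probabilities, then take the argmax" is an assumption.

```python
from collections import Counter


def average_outputs(outputs):
    """mode="mean": average the per-sample outputs across learnwares."""
    return [sum(values) / len(values) for values in zip(*outputs)]


def vote_by_label(label_outputs):
    """mode="vote_by_label": most common predicted label per sample."""
    return [Counter(labels).most_common(1)[0][0] for labels in zip(*label_outputs)]


def vote_by_prob(prob_outputs):
    """mode="vote_by_prob" (assumed): average class probabilities, then argmax."""
    predictions = []
    for per_model in zip(*prob_outputs):
        mean_probs = [sum(p) / len(p) for p in zip(*per_model)]
        predictions.append(mean_probs.index(max(mean_probs)))
    return predictions


# Outputs of three hypothetical learnwares on two samples.
regression_outputs = [[1.0, 4.0], [2.0, 5.0], [3.0, 6.0]]
label_outputs = [[0, 1], [0, 2], [1, 2]]
prob_outputs = [
    [[0.9, 0.1], [0.4, 0.6]],
    [[0.2, 0.8], [0.3, 0.7]],
    [[0.7, 0.3], [0.45, 0.55]],
]
```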


 Reusing Learnware with Labeled Data
 ------------------------------------

 When users have a small amount of labeled data, they can also adapt/polish the received learnware(s) by reusing them with the labeled data, gaining even better performance. 

 - ``EnsemblePruningReuser`` selectively ensembles a subset of learnwares to choose the ones that are most suitable for the user's task:

 .. code:: python

    from learnware.reuse import EnsemblePruningReuser

    # mode="regression": Suitable for regression tasks
    # mode="classification": Suitable for classification tasks
    reuse_ensemble_pruning = EnsemblePruningReuser(
        learnware_list=learnware_list, mode="regression"
    )

    # (val_X, val_y) is the small amount of labeled data
    reuse_ensemble_pruning.fit(val_X, val_y)
    predict_y = reuse_ensemble_pruning.predict(user_data=test_x)
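
As a rough intuition for ensemble pruning, the sketch below greedily adds learnwares while the averaged validation error keeps decreasing. Greedy forward selection is a stand-in chosen for brevity; it is not claimed to be the algorithm inside ``EnsemblePruningReuser``, and the function names and data are hypothetical.

```python
def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def ensemble_prediction(predictions, members):
    # Average the validation predictions of the selected learnwares.
    return [
        sum(predictions[m][i] for m in members) / len(members)
        for i in range(len(predictions[0]))
    ]


def greedy_prune(predictions, y_val):
    """Greedily add learnwares while the averaged validation error keeps dropping."""
    selected, best_error = [], float("inf")
    remaining = list(range(len(predictions)))
    while remaining:
        scored = [
            (mean_squared_error(y_val, ensemble_prediction(predictions, selected + [m])), m)
            for m in remaining
        ]
        error, candidate = min(scored)
        if error >= best_error:
            break
        selected.append(candidate)
        remaining.remove(candidate)
        best_error = error
    return selected, best_error


val_y = [1.0, 2.0, 3.0]
val_predictions = [
    [1.5, 2.5, 3.5],  # learnware 0: biased upwards
    [0.5, 1.5, 2.5],  # learnware 1: biased downwards
    [3.0, 0.0, 6.0],  # learnware 2: unrelated to the task
]
selected, error = greedy_prune(val_predictions, val_y)
```

Here the two complementary learnwares are kept and the unrelated one is pruned away.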

 - ``FeatureAugmentReuser`` helps users reuse learnwares by augmenting features. This reuser treats each received learnware as a feature augmentor, takes its output as a new feature, and then builds a simple model on the augmented feature set (``logistic regression`` for classification tasks and ``ridge regression`` for regression tasks):

 .. code:: python

    from learnware.reuse import FeatureAugmentReuser

    # mode="regression": Suitable for regression tasks
    # mode="classification": Suitable for classification tasks
    reuse_feature_augment = FeatureAugmentReuser(
        learnware_list=learnware_list, mode="regression"
    )

    # (val_X, val_y) is the small amount of labeled data
    reuse_feature_augment.fit(val_X, val_y)
    predict_y = reuse_feature_augment.predict(user_data=test_x)
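
The feature augmentation idea for a regression task can be sketched in pure Python as below. This is an illustration under simplifying assumptions: the learnware is a hypothetical callable, the ridge solver is a small hand-written one that also regularizes the bias term, and none of the helper names belong to the ``learnware`` package.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def augment(X, learnwares):
    # Append every learnware's output as an extra feature, plus a bias term.
    return [list(x) + [model(x) for model in learnwares] + [1.0] for x in X]


def fit_ridge(features, y, alpha=1e-3):
    """Ridge regression via the normal equations (F^T F + alpha I) w = F^T y."""
    dim = len(features[0])
    gram = [
        [sum(f[i] * f[j] for f in features) + (alpha if i == j else 0.0) for j in range(dim)]
        for i in range(dim)
    ]
    target = [sum(f[i] * t for f, t in zip(features, y)) for i in range(dim)]
    return solve_linear_system(gram, target)


# Hypothetical learnware whose output is informative but offset from the user's target.
learnwares = [lambda x: x[0] ** 2]
val_X = [[-2.0], [-1.0], [0.0], [1.0], [2.0]]
val_y = [5.0, 2.0, 1.0, 2.0, 5.0]  # y = x^2 + 1

weights = fit_ridge(augment(val_X, learnwares), val_y)
prediction = sum(w * f for w, f in zip(weights, augment([[3.0]], learnwares)[0]))
```

With only five labeled points, the simple model on the augmented features recovers the offset that the learnware alone misses.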


 Hetero Reuse
 ====================

 When heterogeneous learnware search is activated (see `WORKFLOWS: Hetero Search <../workflows/search.html#hetero-search>`_), users receive heterogeneous learnwares that are identified from the whole "specification world".
 Though these recommended learnwares are trained on tasks whose feature/label spaces differ from the user's task, they can still be helpful and perform well beyond their original purpose.
 Normally such learnwares are hard to use, let alone polish, because of the feature/label space heterogeneity. However, with the help of the ``HeteroMapAlignLearnware`` class, which aligns a heterogeneous learnware
 with the user's task, users can easily reuse them with the same set of reuse methods mentioned above.

 During the alignment of a heterogeneous learnware, the statistical specifications of the learnware and the user's task ``(user_spec)`` are used for input space alignment,
 and a small amount of labeled data ``(val_x, val_y)`` is required for output space alignment. This can be done with the following code:

 .. code:: python

    from learnware.reuse import HeteroMapAlignLearnware

    # mode="regression": For user tasks of regression
    # mode="classification": For user tasks of classification
    hetero_learnware = HeteroMapAlignLearnware(learnware=learnware, mode="regression")
    hetero_learnware.align(user_spec, val_x, val_y)

    # Make predictions using the aligned heterogeneous learnware
    predict_y = hetero_learnware.predict(user_data=test_x)

 To reuse multiple heterogeneous learnwares,
 combine ``HeteroMapAlignLearnware`` with the homogeneous reuse methods ``AveragingReuser`` and ``EnsemblePruningReuser`` mentioned above:

 .. code:: python

    hetero_learnware_list = []
    for learnware in learnware_list:
        hetero_learnware = HeteroMapAlignLearnware(learnware, mode="regression")
        hetero_learnware.align(user_spec, val_x, val_y)
        hetero_learnware_list.append(hetero_learnware)
                
    # Reuse multiple heterogeneous learnwares using AveragingReuser
    reuse_ensemble = AveragingReuser(learnware_list=hetero_learnware_list, mode="mean")
    ensemble_predict_y = reuse_ensemble.predict(user_data=test_x)

    # Reuse multiple heterogeneous learnwares using EnsemblePruningReuser
    reuse_ensemble = EnsemblePruningReuser(
        learnware_list=hetero_learnware_list, mode="regression"
    )
    reuse_ensemble.fit(val_x, val_y)
    ensemble_pruning_predict_y = reuse_ensemble.predict(user_data=test_x)

 Reuse with Container
 =====================
--- a/docs/workflows/search.rst
+++ b/docs/workflows/search.rst
@@ -2,111 +2,95 @@
 Learnwares Search
 ============================================================

 When a user comes with her requirements, the market should identify helpful learnwares and recommend them to the user.
 The search of helpful learnwares is based on the user information, and can be divided into two stages: semantic specification search and statistical specification search.
 ``Learnware Searcher`` is a key component of ``Learnware Market`` that identifies and recommends helpful learnwares to users according to their ``UserInfo``. Based on whether the returned learnware dimensions are consistent with user tasks, the searchers can be divided into two categories: homogeneous searchers and heterogeneous searchers. 

 All the searchers are implemented as subclasses of ``BaseSearcher``. When initializing a searcher, you should assign an ``organizer`` to it. The ``organizer`` is introduced in `COMPONENTS: Market - Framework <../components/market.html>`_. The searchers can then be called with ``UserInfo`` and return ``SearchResults``.

 Homo Search
 ======================

 User information
 -------------------------------
 The user should provide her requirements in ``BaseUserInfo``. The class ``BaseUserInfo`` consists of user's semantic specification ``user_semantic`` and statistical information ``stat_info``. 

 The semantic specification ``user_semantic`` is stored in a ``dict`` with the keys 'Data', 'Task', 'Library', 'Scenario', 'Description' and 'Name'. An example is shown below, and you can choose the values according to the figure. For keys of type 'Class', you should choose one legal value; for keys of type 'Tag', you can choose one or more values; for keys of type 'String', you should provide a string. The key 'Description' is used in learnwares' semantic specifications and is ignored in the user's semantic specification. The value of any key can be left empty if the user has no idea what to choose.
 The homogeneous search for helpful learnwares can be divided into two stages: semantic specification search and statistical specification search. Both of them need ``BaseUserInfo`` as input. The following code shows how to use the searcher to search for helpful learnwares from a market ``easy_market`` for a user. ``EasyMarket`` is introduced in `COMPONENTS: Market <../components/market.html>`_.

 .. code-block:: python

    # An example of user_semantic
    # generate BaseUserInfo(semantic_spec + stat_info)
    user_semantic = {
        "Data": {"Values": ["Image"], "Type": "Class"},
        "Task": {"Values": ["Classification"], "Type": "Class"},
        "Library": {"Values": ["Scikit-learn"], "Type": "Tag"},
        "Scenario": {"Values": ["Education"], "Type": "Class"},
        "Data": {"Values": ["Table"], "Type": "Class"},
        "Task": {"Values": ["Regression"], "Type": "Class"},
        "Library": {"Values": ["Scikit-learn"], "Type": "Class"},
        "Scenario": {"Values": ["Business"], "Type": "Tag"},
        "Description": {"Values": "", "Type": "String"},
        "Name": {"Values": "digits", "Type": "String"},
        "Name": {"Values": "", "Type": "String"},
        "Input": {"Dimension": 82, "Description": {},},
        "Output": {"Dimension": 1, "Description": {},}, 
        "License": {"Values": ["MIT"], "Type": "Class"},
    }


 .. image:: ../_static/img/semantic_spec.png
   :align: center


 The user's statistical information ``stat_info`` is stored in a ``json`` file, e.g., ``stat.json``. The generation of this file is seen in `Learnware Preparation <./submit>`_.



 Semantic Specification Search
 -------------------------------
 To search for learnwares that fit your task,
 you should first provide a semantic specification ``user_semantic`` that describes the characteristics of your task.
 The Learnware Market will perform a first-stage search based on ``user_semantic``,
 identifying potentially helpful learnwares whose models solve tasks similar to your requirements.

 .. code-block:: python

    # construct user_info which includes semantic specification for searching learnware
    user_info = BaseUserInfo(semantic_spec=user_semantic)

    # search_learnware performs semantic specification search if user_info doesn't include a statistical specification
    _, single_learnware_list, _ = easy_market.search_learnware(user_info) 

    # single_learnware_list is the learnware list by semantic specification searching
    print(single_learnware_list)

 In semantic specification search, we go through all learnwares in the market, compare their semantic specifications with the user's, and return all the learnwares that pass the comparison. When comparing two semantic specifications, different semantic keys are handled in different ways:

 - Semantic keys of type 'Class' match only if they have the same value.

 - Semantic keys of type 'Tag' match only if their values have a nonempty intersection.

 - The user's input in the search box matches a learnware's semantic specification only if it is a substring of the learnware's 'Name' or 'Description'. All strings are converted to lower case before matching.

 - When a key value is missing, it does not participate in the match. The user may also provide no semantic specification at all.
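
The matching rules listed above can be sketched as a single function. This is an illustrative reading of the rules using the ``dict`` layout of the example specification, not the market's actual implementation; the function name ``match_semantic_spec`` and the sample specifications are hypothetical.

```python
def match_semantic_spec(user_spec, learnware_spec, search_text=""):
    """Return True if a learnware's semantic spec passes the comparison rules."""
    for key, user_field in user_spec.items():
        user_values = user_field.get("Values")
        if not user_values or key not in learnware_spec:
            continue  # Missing or empty keys do not take part in the match.
        learnware_values = learnware_spec[key].get("Values")
        if not learnware_values:
            continue
        if user_field.get("Type") == "Class" and user_values != learnware_values:
            return False
        if user_field.get("Type") == "Tag" and not set(user_values) & set(learnware_values):
            return False

    if search_text:
        # Case-insensitive substring match against 'Name' or 'Description'.
        needle = search_text.lower()
        fields = (learnware_spec.get(k, {}).get("Values", "") for k in ("Name", "Description"))
        return any(needle in text.lower() for text in fields)
    return True


learnware_semantic = {
    "Data": {"Values": ["Table"], "Type": "Class"},
    "Task": {"Values": ["Regression"], "Type": "Class"},
    "Scenario": {"Values": ["Business", "Finance"], "Type": "Tag"},
    "Name": {"Values": "Sales Forecaster", "Type": "String"},
    "Description": {"Values": "Predicts weekly store sales", "Type": "String"},
}
user_query = {
    "Data": {"Values": ["Table"], "Type": "Class"},
    "Task": {"Values": ["Regression"], "Type": "Class"},
    "Scenario": {"Values": ["Business"], "Type": "Tag"},
    "Library": {"Values": [], "Type": "Class"},  # left empty, so it is ignored
}
```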

 Statistical Specification Search
 ---------------------------------

 If you choose to provide your own statistical specification file ``stat.json``, 
 the Learnware Market can perform a more accurate learnware selection from 
 the learnwares returned by the previous step. This second-stage search is based on statistical information and returns one or more learnwares that are most likely to be helpful for your task. 

 For example, the following code is designed to work with Reduced Kernel Mean Embedding (RKME) as a statistical specification:

 .. code-block:: python

    import learnware.specification as specification

    user_spec = specification.RKMETableSpecification()
    user_spec.load(os.path.join("rkme.json"))
    user_spec = generate_rkme_table_spec(X=x)
    user_info = BaseUserInfo(
        semantic_spec=user_semantic, stat_info={"RKMETableSpecification": user_spec}
        semantic_spec=user_semantic, 
        stat_info={"RKMETableSpecification": user_spec}
    )
    (sorted_score_list, single_learnware_list,
        mixture_score, mixture_learnware_list) = easy_market.search_learnware(user_info)

    # sorted_score_list is the learnware scores based on MMD distances, sorted in descending order
    print(sorted_score_list) 
    # search the market for the user
    search_result = easy_market.search_learnware(user_info)

    # single_learnware_list is the learnwares sorted in descending order based on their scores
    print(single_learnware_list)
    # search result: single_result
    single_result = search_result.get_single_results()
    print(
        f"single model num: {len(single_result)}, "
        f"max_score: {single_result[0].score}, "
        f"min_score: {single_result[-1].score}"
    )
    
    # search result: multiple_result
    multiple_result = search_result.get_multiple_results()
    mixture_id = " ".join([learnware.id for learnware in multiple_result[0].learnwares])
    print(f"mixture_score: {multiple_result[0].score}, mixture_learnwares: {mixture_id}")

    # mixture_learnware_list is the learnwares whose mixture is helpful for your task
    print(mixture_learnware_list) 
 Hetero Search
 ======================

    # mixture_score is the score of the mixture of learnwares
    print(mixture_score)
 For table-based user tasks, 
 homogeneous searchers like ``EasySearcher`` fail to recommend learnwares when no table learnware matches the user task's feature dimension, returning empty results.
 To enhance functionality, the ``learnware`` package includes a heterogeneous learnware search feature, which proceeds as follows: 

 The return values of statistical specification search are ``sorted_score_list``, ``single_learnware_list``, ``mixture_score`` and ``mixture_learnware_list``.
 ``sorted_score_list`` and ``single_learnware_list`` hold the scores and the ranking of the single learnwares, respectively. We return at least 15 learnwares unless there are not enough. If there are more than 15 matched learnwares, the ones with scores less than 50 will be ignored.
 ``mixture_score`` and ``mixture_learnware_list`` hold the score and the members of the chosen learnware mixture, respectively. At most 5 learnwares will be chosen, whose mixture may have a relatively good performance on the user's task.
 - Learnware markets such as ``Hetero Market`` integrate different specification islands into a unified "specification world" by assigning system-level specifications to all learnwares. This allows heterogeneous searchers like ``HeteroSearcher`` to find helpful learnwares from all available table learnwares.
 - Searchers assign system-level specifications to users based on ``UserInfo``'s statistical specification, using methods provided by corresponding organizers. In ``Hetero Market``, for example, ``HeteroOrganizer.generate_hetero_map_spec`` generates system-level specifications for users.
 - Finally, searchers conduct a statistical specification search across the "specification world". The user's system-level specification guides the searcher in pinpointing helpful heterogeneous learnwares.

 To activate heterogeneous learnware search, ``UserInfo`` should contain both semantic and statistical specifications. What's more, the semantic specification should meet the following requirements: 

 The statistical specification search is done in the following way.
 We first filter by the dimension of RKME specifications; only those with the same dimension as the user's will enter the subsequent stage.
 - The task type should be ``Classification`` or ``Regression``.
 - The data type should be ``Table``.
 - It should include a description for at least one feature dimension.
 - The feature dimension stated here should match the feature dimension in the statistical specification.
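
The four requirements can be checked with a few lines of Python. The sketch below assumes the semantic specification uses the ``dict`` layout shown in the examples of this page (``"Task"``, ``"Data"``, ``"Input"``); the function name ``can_activate_hetero_search`` is hypothetical and the package's own validation may differ.

```python
def can_activate_hetero_search(semantic_spec, stat_dimension):
    """Check the four semantic-specification requirements for hetero search."""
    task = semantic_spec.get("Task", {}).get("Values", [])
    data = semantic_spec.get("Data", {}).get("Values", [])
    input_spec = semantic_spec.get("Input", {})
    return (
        len(task) == 1
        and task[0] in ("Classification", "Regression")
        and data == ["Table"]
        and len(input_spec.get("Description", {})) >= 1
        and input_spec.get("Dimension") == stat_dimension
    )


example_spec = {
    "Data": {"Values": ["Table"], "Type": "Class"},
    "Task": {"Values": ["Classification"], "Type": "Class"},
    "Input": {
        "Dimension": 2,
        "Description": {"0": "leaf width", "1": "leaf length"},
    },
}
```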

 The single_learnware_list is calculated using the distances between two RKMEs. The greater the distance from the user's RKME, the lower the score is. 
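
As an illustration of ranking by RKME distance, the sketch below computes the squared distance between two kernel mean embeddings, each given as a weighted set of points, and orders learnwares by it. The Gaussian kernel, its ``gamma`` value, and the toy data are assumptions; how the market converts distances into scores is not shown here.

```python
import math


def gaussian_kernel(a, b, gamma=0.1):
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def inner_product(points_a, weights_a, points_b, weights_b):
    # Inner product of two weighted kernel mean embeddings in the RKHS.
    return sum(
        wa * wb * gaussian_kernel(pa, pb)
        for pa, wa in zip(points_a, weights_a)
        for pb, wb in zip(points_b, weights_b)
    )


def squared_mmd(rkme_a, rkme_b):
    """Squared distance between two embeddings given as (points, weights)."""
    (za, wa), (zb, wb) = rkme_a, rkme_b
    return (
        inner_product(za, wa, za, wa)
        - 2 * inner_product(za, wa, zb, wb)
        + inner_product(zb, wb, zb, wb)
    )


# Toy RKMEs: a user specification and two hypothetical learnware specifications.
user_rkme = ([[0.0, 0.0], [1.0, 1.0]], [0.5, 0.5])
market_rkmes = {
    "near": ([[0.1, 0.0], [1.0, 0.9]], [0.5, 0.5]),
    "far": ([[6.0, 6.0], [7.0, 7.0]], [0.5, 0.5]),
}
# Smaller distance to the user's RKME means a better-ranked learnware.
ranking = sorted(market_rkmes, key=lambda name: squared_mmd(user_rkme, market_rkmes[name]))
```
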
 The following code shows how to search for helpful heterogeneous learnwares from a market 
 ``hetero_market`` for a user. ``HeteroMarket`` is introduced in `COMPONENTS: Hetero Market <../components/market.html#hetero-market>`_.

 The mixture_learnware_list is calculated in a greedy way. Each time, we choose a learnware that brings the mixture closer to the user's RKME. Specifically, we go through all the remaining learnwares to find the one whose combination with the already chosen learnwares minimizes the distance between the mixture's RKME and the user's RKME. The mixture weights are calculated by minimizing the RKME distance, which is solved by quadratic programming. If the distance becomes larger or the number of chosen learnwares reaches a threshold, the process ends and the chosen learnwares and their weights are returned.
 .. code-block:: python

 Hetero Search
 ======================
  # instantiate a Hetero Market
  hetero_market = instantiate_learnware_market(market_id="test_hetero", name="hetero")
  
  # user_semantic should meet the above requirements
  input_description = {
      "Dimension": 2,
      "Description": {
          "0": "leaf width",
          "1": "leaf length",
      },
  }
  user_semantic = generate_semantic_spec(
      data_type="table",
      task_type="Classification",
      scenarios=["Business"],
      input_description=input_description,
  )
  user_spec = generate_stat_spec(type="table", X=train_x)
  user_info = BaseUserInfo(
      semantic_spec=user_semantic,
      stat_info={user_spec.type: user_spec}
  )

  # search for heterogeneous learnwares in hetero_market
  search_result = hetero_market.search_learnware(user_info)
--- a/examples/dataset_text_workflow/README.md
+++ b/examples/dataset_text_workflow/README.md
@@ -0,0 +1,58 @@
 # Text Dataset Workflow Example

 ## Introduction

 We conducted experiments on the widely used text benchmark dataset: [``20-newsgroup``](http://qwone.com/~jason/20Newsgroups/).
 ``20-newsgroup`` is a renowned text classification benchmark with a hierarchical structure, featuring 5 superclasses {comp, rec, sci, talk, misc}.

 In the submitting stage, we enumerated all combinations of three superclasses from the five available, randomly sampling 50% of each combination from the training set to create datasets for 50 uploaders.

 In the deploying stage, we considered all combinations of two superclasses out of the five, selecting all data for each combination from the testing set as a test dataset for one user. This resulted in 10 users.
 The user's own training data was generated using the same sampling procedure as the user test data, except that it was drawn from the training dataset.
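
The superclass combinations used in both stages can be enumerated with ``itertools.combinations``. This is only a sketch of the enumeration step; the variable names are hypothetical, and how the ten three-superclass combinations are turned into 50 uploader datasets (for example, how many random 50% samples are drawn per combination) is not spelled out here.

```python
from itertools import combinations

SUPERCLASSES = ["comp", "rec", "sci", "talk", "misc"]

# Submitting stage: every choice of three superclasses out of five.
uploader_combos = list(combinations(SUPERCLASSES, 3))

# Deploying stage: every choice of two superclasses, one per user.
user_combos = list(combinations(SUPERCLASSES, 2))
```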

 Model training comprised two parts: the first part involved training a tfidf feature extractor, and the second part used the extracted text feature vectors to train a naive Bayes classifier.

 Our experiments comprise two components:

 * ``unlabeled_text_example`` is designed to evaluate performance when users possess only testing data, searching and reusing learnware available in the market.

 * ``labeled_text_example`` aims to assess performance when users have both testing and limited training data, searching and reusing learnware directly from the market instead of training a model from scratch. This helps determine the amount of training data saved for the user.


 ## Run the code

 Run the following command to start the ``unlabeled_text_example``.

 ```bash
 python workflow.py unlabeled_text_example
 ```

 Run the following command to start the ``labeled_text_example``.

 ```bash
 python workflow.py labeled_text_example
 ```

 ## Results

 ### ``unlabeled_text_example``:

 The accuracy of search and reuse is presented in the table below:

 | Metric                               | Value               |
 |--------------------------------------|---------------------|
 | Mean in Market (Single)              | 0.507 ± 0.030       |
 | Best in Market (Single)              | 0.859 ± 0.051       |
 | Top-1 Reuse (Single)                 | 0.846 ± 0.054       |
 | Job Selector Reuse (Multiple)        | 0.845 ± 0.053       |
 | Average Ensemble Reuse (Multiple)    | 0.862 ± 0.051       |

 ### ``labeled_text_example``:

 We present the classification error rate curves for both the user's self-trained model and multiple learnware reuse (EnsemblePrune), showing their performance on the user's test data as the user's training data increases. The average results across 10 users are depicted below:

 <div align=center>
  <img src="../../docs/_static/img/text_labeled_curves.png" alt="Text Limited Labeled Data" style="width:50%;" />
 </div>

 From the figure above, it is evident that when the user's own training data is limited, the performance of multiple learnware reuse surpasses that of the user's own model. As the user's training data grows, it is expected that the user's model will eventually outperform the learnware reuse. This underscores the value of reusing learnware to significantly conserve training data and achieve superior performance when user training data is limited.
--- a/examples/dataset_text_workflow/config.py
+++ b/examples/dataset_text_workflow/config.py
@@ -0,0 +1,62 @@
 from learnware.tests.benchmarks import BenchmarkConfig


 text_benchmark_config = BenchmarkConfig(
    name="20-Newsgroups",
    user_num=10,
    learnware_ids=[
        "00002193",
        "00002192",
        "00002191",
        "00002190",
        "00002189",
        "00002188",
        "00002187",
        "00002186",
        "00002185",
        "00002184",
        "00002183",
        "00002182",
        "00002181",
        "00002180",
        "00002179",
        "00002178",
        "00002177",
        "00002176",
        "00002175",
        "00002174",
        "00002173",
        "00002172",
        "00002171",
        "00002170",
        "00002169",
        "00002168",
        "00002167",
        "00002166",
        "00002165",
        "00002164",
        "00002163",
        "00002162",
        "00002161",
        "00002160",
        "00002159",
        "00002158",
        "00002157",
        "00002156",
        "00002155",
        "00002154",
        "00002153",
        "00002152",
        "00002151",
        "00002150",
        "00002149",
        "00002148",
        "00002147",
        "00002146",
        "00002145",
        "00002144",
    ],
    test_data_path="20-Newsgroups/test_data.zip",
    train_data_path="20-Newsgroups/train_data.zip",
    extra_info_path="20-Newsgroup/extra_info.zip",
 )
--- a/examples/dataset_text_workflow/example_files/example_init.py
+++ b/examples/dataset_text_workflow/example_files/example_init.py
@@ -1,29 +0,0 @@
 import os
 import pickle

 import numpy as np

 from learnware.model import BaseModel


 class Model(BaseModel):
    def __init__(self):
        super(Model, self).__init__(input_shape=(1,), output_shape=(1,))
        dir_path = os.path.dirname(os.path.abspath(__file__))

        modelv_path = os.path.join(dir_path, "modelv.pth")
        with open(modelv_path, "rb") as f:
            self.modelv = pickle.load(f)

        modell_path = os.path.join(dir_path, "modell.pth")
        with open(modell_path, "rb") as f:
            self.modell = pickle.load(f)

    def fit(self, X: np.ndarray, y: np.ndarray):
        pass

    def predict(self, X: np.ndarray) -> np.ndarray:
        return self.modell.predict(self.modelv.transform(X))

    def finetune(self, X: np.ndarray, y: np.ndarray):
        pass
--- a/examples/dataset_text_workflow/example_files/example_yaml.yaml
+++ b/examples/dataset_text_workflow/example_files/example_yaml.yaml
@@ -1,8 +0,0 @@
 model:
  class_name: Model
  kwargs: { }
 stat_specifications:
  - module_path: learnware.specification
    class_name: RKMETextSpecification
    file_name: rkme.json
    kwargs: { }
--- a/examples/dataset_text_workflow/example_files/requirements.txt
+++ b/examples/dataset_text_workflow/example_files/requirements.txt
@@ -1,4 +0,0 @@
 numpy
 pickle
 lightgbm
 scikit-learn
--- a/examples/dataset_text_workflow/get_data.py
+++ b/examples/dataset_text_workflow/get_data.py
@@ -1,16 +0,0 @@
 import os

 import pandas as pd


 def get_data(data_root="./data"):
    dtrain = pd.read_csv(os.path.join(data_root, "train.csv"))
    dtest = pd.read_csv(os.path.join(data_root, "test.csv"))

    # returned X(DataFrame), y(Series)
    return (
        dtrain[["discourse_text", "discourse_type"]],
        dtrain["discourse_effectiveness"],
        dtest[["discourse_text", "discourse_type"]],
        dtest["discourse_effectiveness"],
    )
--- a/examples/dataset_text_workflow/main.py
+++ b/examples/dataset_text_workflow/main.py
@@ -1,274 +0,0 @@
 import os
 import fire
 import pickle
 import time
 import zipfile
 from shutil import copyfile, rmtree

 import numpy as np

 import learnware.specification as specification
 from get_data import get_data
 from learnware.logger import get_module_logger
 from learnware.market import instantiate_learnware_market, BaseUserInfo
 from learnware.reuse import JobSelectorReuser, AveragingReuser, EnsemblePruningReuser
 from utils import generate_uploader, generate_user, TextDataLoader, train, eval_prediction

 logger = get_module_logger("text_test", level="INFO")
 origin_data_root = "./data/origin_data"
 processed_data_root = "./data/processed_data"
 tmp_dir = "./data/tmp"
 learnware_pool_dir = "./data/learnware_pool"
 dataset = "ae"  # argumentative essays
 n_uploaders = 7
 n_users = 7
 n_classes = 3
 data_root = os.path.join(origin_data_root, dataset)
 data_save_root = os.path.join(processed_data_root, dataset)
 user_save_root = os.path.join(data_save_root, "user")
 uploader_save_root = os.path.join(data_save_root, "uploader")
 model_save_root = os.path.join(data_save_root, "uploader_model")
 os.makedirs(data_root, exist_ok=True)
 os.makedirs(user_save_root, exist_ok=True)
 os.makedirs(uploader_save_root, exist_ok=True)
 os.makedirs(model_save_root, exist_ok=True)

 output_description = {
    "Dimension": 1,
    "Description": {
        "0": "classify as 0(ineffective), 1(effective), or 2(adequate).",
    },
 }
 semantic_specs = [
    {
        "Data": {"Values": ["Text"], "Type": "Class"},
        "Task": {"Values": ["Classification"], "Type": "Class"},
        "Library": {"Values": ["Scikit-learn"], "Type": "Class"},
        "Scenario": {"Values": ["Education"], "Type": "Tag"},
        "Description": {"Values": "", "Type": "String"},
        "Name": {"Values": "learnware_1", "Type": "String"},
        "Output": output_description,
        "License": {"Values": ["MIT"], "Type": "Class"},
    }
 ]

 user_semantic = {
    "Data": {"Values": ["Text"], "Type": "Class"},
    "Task": {"Values": ["Classification"], "Type": "Class"},
    "Library": {"Values": ["Scikit-learn"], "Type": "Class"},
    "Scenario": {"Values": ["Education"], "Type": "Tag"},
    "Description": {"Values": "", "Type": "String"},
    "Name": {"Values": "", "Type": "String"},
    "Output": output_description,
    "License": {"Values": ["MIT"], "Type": "Class"},
 }


 class TextDatasetWorkflow:
    def _init_text_dataset(self):
        self._prepare_data()
        self._prepare_model()

    def _prepare_data(self):
        X_train, y_train, X_test, y_test = get_data(data_root)

        generate_uploader(X_train, y_train, n_uploaders=n_uploaders, data_save_root=uploader_save_root)
        generate_user(X_test, y_test, n_users=n_users, data_save_root=user_save_root)

    def _prepare_model(self):
        dataloader = TextDataLoader(data_save_root, train=True)
        for i in range(n_uploaders):
            logger.info("Train on uploader: %d" % (i))
            X, y = dataloader.get_idx_data(i)
            vectorizer, lgbm = train(X, y, out_classes=n_classes)

            modelv_save_path = os.path.join(model_save_root, "uploader_v_%d.pth" % (i))
            modell_save_path = os.path.join(model_save_root, "uploader_l_%d.pth" % (i))

            with open(modelv_save_path, "wb") as f:
                pickle.dump(vectorizer, f)

            with open(modell_save_path, "wb") as f:
                pickle.dump(lgbm, f)

            logger.info("Model saved to '%s' and '%s'" % (modelv_save_path, modell_save_path))

    def _prepare_learnware(
        self, data_path, modelv_path, modell_path, init_file_path, yaml_path, env_file_path, save_root, zip_name
    ):
        os.makedirs(save_root, exist_ok=True)
        tmp_spec_path = os.path.join(save_root, "rkme.json")

        tmp_modelv_path = os.path.join(save_root, "modelv.pth")
        tmp_modell_path = os.path.join(save_root, "modell.pth")

        tmp_yaml_path = os.path.join(save_root, "learnware.yaml")
        tmp_init_path = os.path.join(save_root, "__init__.py")
        tmp_env_path = os.path.join(save_root, "requirements.txt")

        with open(data_path, "rb") as f:
            X = pickle.load(f)
        semantic_spec = semantic_specs[0]

        st = time.time()

        user_spec = specification.RKMETextSpecification()

        user_spec.generate_stat_spec_from_data(X=X)
        ed = time.time()
        logger.info("Stat spec generated in %.3f s" % (ed - st))
        user_spec.save(tmp_spec_path)

        copyfile(modelv_path, tmp_modelv_path)
        copyfile(modell_path, tmp_modell_path)

        copyfile(yaml_path, tmp_yaml_path)
        copyfile(init_file_path, tmp_init_path)
        copyfile(env_file_path, tmp_env_path)
        zip_file_name = os.path.join(learnware_pool_dir, "%s.zip" % (zip_name))
        with zipfile.ZipFile(zip_file_name, "w", compression=zipfile.ZIP_DEFLATED) as zip_obj:
            zip_obj.write(tmp_spec_path, "rkme.json")

            zip_obj.write(tmp_modelv_path, "modelv.pth")
            zip_obj.write(tmp_modell_path, "modell.pth")

            zip_obj.write(tmp_yaml_path, "learnware.yaml")
            zip_obj.write(tmp_init_path, "__init__.py")
            zip_obj.write(tmp_env_path, "requirements.txt")
        rmtree(save_root)
        logger.info("New Learnware Saved to %s" % (zip_file_name))
        return zip_file_name

    def prepare_market(self, regenerate_flag=False):
        if regenerate_flag:
            self._init_text_dataset()
        text_market = instantiate_learnware_market(market_id="ae", rebuild=True)
        # Remove any stale pool directory; a missing directory is not an error.
        rmtree(learnware_pool_dir, ignore_errors=True)
        os.makedirs(learnware_pool_dir, exist_ok=True)
        for i in range(n_uploaders):
            data_path = os.path.join(uploader_save_root, "uploader_%d_X.pkl" % (i))

            modelv_path = os.path.join(model_save_root, "uploader_v_%d.pth" % (i))
            modell_path = os.path.join(model_save_root, "uploader_l_%d.pth" % (i))

            init_file_path = "./example_files/example_init.py"
            yaml_file_path = "./example_files/example_yaml.yaml"
            env_file_path = "./example_files/requirements.txt"
            new_learnware_path = self._prepare_learnware(
                data_path,
                modelv_path,
                modell_path,
                init_file_path,
                yaml_file_path,
                env_file_path,
                tmp_dir,
                "%s_%d" % (dataset, i),
            )
            semantic_spec = semantic_specs[0]
            semantic_spec["Name"]["Values"] = "learnware_%d" % (i)
            semantic_spec["Description"]["Values"] = "test_learnware_number_%d" % (i)
            text_market.add_learnware(new_learnware_path, semantic_spec)

        logger.info("Total Item: %d" % (len(text_market)))

    def test(self, regenerate_flag=False):
        self.prepare_market(regenerate_flag)
        text_market = instantiate_learnware_market(market_id="ae")
        print("Total Item: %d" % len(text_market))

        select_list = []
        avg_list = []
        improve_list = []
        job_selector_score_list = []
        ensemble_score_list = []
        pruning_score_list = []
        for i in range(n_users):
            user_data_path = os.path.join(user_save_root, "user_%d_X.pkl" % (i))
            user_label_path = os.path.join(user_save_root, "user_%d_y.pkl" % (i))
            with open(user_data_path, "rb") as f:
                user_data = pickle.load(f)
            with open(user_label_path, "rb") as f:
                user_label = pickle.load(f)

            user_stat_spec = specification.RKMETextSpecification()
            user_stat_spec.generate_stat_spec_from_data(X=user_data)
            user_info = BaseUserInfo(semantic_spec=user_semantic, stat_info={"RKMETextSpecification": user_stat_spec})
            logger.info("Searching Market for user: %d" % (i))

            search_result = text_market.search_learnware(user_info)
            single_result = search_result.get_single_results()
            multiple_result = search_result.get_multiple_results()

            print(f"search result of user{i}:")
            print(
                f"single model num: {len(single_result)}, max_score: {single_result[0].score}, min_score: {single_result[-1].score}"
            )

            # Evaluate every returned single learnware on the user's data.
            acc_list = []
            for result in single_result:
                pred_y = result.learnware.predict(user_data)
                acc_list.append(eval_prediction(pred_y, user_label))
            print(
                f"Top1-score: {single_result[0].score}, learnware_id: {single_result[0].learnware.id}, acc: {acc_list[0]}"
            )

            if len(multiple_result) > 0:
                mixture_id = " ".join([learnware.id for learnware in multiple_result[0].learnwares])
                print(f"mixture_score: {multiple_result[0].score}, mixture_learnware: {mixture_id}")
                mixture_learnware_list = multiple_result[0].learnwares
            else:
                mixture_learnware_list = [single_result[0].learnware]

            # test reuse (job selector)
            reuse_baseline = JobSelectorReuser(learnware_list=mixture_learnware_list, herding_num=100)
            reuse_predict = reuse_baseline.predict(user_data=user_data)
            reuse_score = eval_prediction(reuse_predict, user_label)
            job_selector_score_list.append(reuse_score)
            print(f"mixture reuse accuracy (job selector): {reuse_score}")

            # test reuse (ensemble)
            reuse_ensemble = AveragingReuser(learnware_list=mixture_learnware_list, mode="vote_by_label")
            ensemble_predict_y = reuse_ensemble.predict(user_data=user_data)
            ensemble_score = eval_prediction(ensemble_predict_y, user_label)
            ensemble_score_list.append(ensemble_score)
            print(f"mixture reuse accuracy (ensemble): {ensemble_score}")

            # test reuse (ensemblePruning)
            reuse_pruning = EnsemblePruningReuser(learnware_list=mixture_learnware_list)
            pruning_predict_y = reuse_pruning.predict(user_data=user_data)
            pruning_score = eval_prediction(pruning_predict_y, user_label)
            pruning_score_list.append(pruning_score)
            print(f"mixture reuse accuracy (ensemble Pruning): {pruning_score}\n")

            select_list.append(acc_list[0])
            avg_list.append(np.mean(acc_list))
            improve_list.append((acc_list[0] - np.mean(acc_list)) / np.mean(acc_list))

        logger.info(
            "Accuracy of selected learnware: %.3f +/- %.3f, Average performance: %.3f +/- %.3f"
            % (np.mean(select_list), np.std(select_list), np.mean(avg_list), np.std(avg_list))
        )
        logger.info("Average performance improvement: %.3f" % (np.mean(improve_list)))
        logger.info(
            "Average Job Selector Reuse Performance: %.3f +/- %.3f"
            % (np.mean(job_selector_score_list), np.std(job_selector_score_list))
        )
        logger.info(
            "Averaging Ensemble Reuse Performance: %.3f +/- %.3f"
            % (np.mean(ensemble_score_list), np.std(ensemble_score_list))
        )
        logger.info(
            "Selective Ensemble Reuse Performance: %.3f +/- %.3f"
            % (np.mean(pruning_score_list), np.std(pruning_score_list))
        )


 if __name__ == "__main__":
    fire.Fire(TextDatasetWorkflow)
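
The packaging step in `_prepare_learnware` above writes each file into the archive under a fixed member name. A minimal standard-library sketch of that pattern follows; `pack_learnware_zip` and `demo` are hypothetical helpers for illustration, not part of the learnware package, and the placeholder files stand in for the real specification, model, and config files.

```python
import os
import tempfile
import zipfile


def pack_learnware_zip(files, zip_path):
    """Write each (source_path, archive_name) pair into a compressed zip archive."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zip_obj:
        for source_path, archive_name in files:
            # The second argument sets the member name inside the archive,
            # independent of where the file lives on disk.
            zip_obj.write(source_path, archive_name)
    return zip_path


def demo():
    """Pack placeholder files and return the sorted member names of the archive."""
    with tempfile.TemporaryDirectory() as workdir:
        names = ["rkme.json", "learnware.yaml", "__init__.py", "requirements.txt"]
        files = []
        for name in names:
            source_path = os.path.join(workdir, "src_" + name)
            with open(source_path, "w") as f:
                f.write("placeholder")
            files.append((source_path, name))
        zip_path = pack_learnware_zip(files, os.path.join(workdir, "demo.zip"))
        with zipfile.ZipFile(zip_path) as zip_obj:
            return sorted(zip_obj.namelist())
```

Passing explicit member names keeps the archive layout stable even when the source files carry per-uploader names such as `uploader_v_0.pth`.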
--- a/examples/dataset_text_workflow/utils.py
+++ b/examples/dataset_text_workflow/utils.py
@@ -1,104 +0,0 @@
 import os
 import pickle

 import numpy as np
 import pandas as pd
 from lightgbm import LGBMClassifier
 from sklearn.feature_extraction.text import TfidfVectorizer


 class TextDataLoader:
    def __init__(self, data_root, train: bool = True):
        self.data_root = data_root
        self.train = train

    def get_idx_data(self, idx=0):
        if self.train:
            X_path = os.path.join(self.data_root, "uploader", "uploader_%d_X.pkl" % (idx))
            y_path = os.path.join(self.data_root, "uploader", "uploader_%d_y.pkl" % (idx))
            if not (os.path.exists(X_path) and os.path.exists(y_path)):
                raise FileNotFoundError("No uploader data found for index %d" % (idx))
            with open(X_path, "rb") as f:
                X = pickle.load(f)
            with open(y_path, "rb") as f:
                y = pickle.load(f)
        else:
            X_path = os.path.join(self.data_root, "user", "user_%d_X.pkl" % (idx))
            y_path = os.path.join(self.data_root, "user", "user_%d_y.pkl" % (idx))
            if not (os.path.exists(X_path) and os.path.exists(y_path)):
                raise FileNotFoundError("No user data found for index %d" % (idx))
            with open(X_path, "rb") as f:
                X = pickle.load(f)
            with open(y_path, "rb") as f:
                y = pickle.load(f)
        return X, y


 def generate_uploader(data_x: pd.DataFrame, data_y: pd.Series, n_uploaders=50, data_save_root=None):
    if data_save_root is None:
        return
    os.makedirs(data_save_root, exist_ok=True)

    types = data_x["discourse_type"].unique()

    for i in range(n_uploaders):
        indices = data_x["discourse_type"] == types[i]
        selected_X = data_x[indices]["discourse_text"].to_list()
        selected_y = data_y[indices].to_list()

        X_save_dir = os.path.join(data_save_root, "uploader_%d_X.pkl" % (i))
        y_save_dir = os.path.join(data_save_root, "uploader_%d_y.pkl" % (i))
        with open(X_save_dir, "wb") as f:
            pickle.dump(selected_X, f)
        with open(y_save_dir, "wb") as f:
            pickle.dump(selected_y, f)

        print("Saving to %s" % (X_save_dir))


 def generate_user(data_x, data_y, n_users=50, data_save_root=None):
    if data_save_root is None:
        return
    os.makedirs(data_save_root, exist_ok=True)

    types = data_x["discourse_type"].unique()

    for i in range(n_users):
        indices = data_x["discourse_type"] == types[i]
        selected_X = data_x[indices]["discourse_text"].to_list()
        selected_y = data_y[indices].to_list()

        X_save_dir = os.path.join(data_save_root, "user_%d_X.pkl" % (i))
        y_save_dir = os.path.join(data_save_root, "user_%d_y.pkl" % (i))
        with open(X_save_dir, "wb") as f:
            pickle.dump(selected_X, f)
        with open(y_save_dir, "wb") as f:
            pickle.dump(selected_y, f)

        print("Saving to %s" % (X_save_dir))


 # Train Uploaders' models
 def train(X, y, out_classes):
    vectorizer = TfidfVectorizer(stop_words="english")
    X_tfidf = vectorizer.fit_transform(X)

    lgbm = LGBMClassifier(boosting_type="dart", n_estimators=500, num_leaves=21)
    lgbm.fit(X_tfidf, y)

    return vectorizer, lgbm


 def eval_prediction(pred_y, target_y):
    if not isinstance(pred_y, np.ndarray):
        pred_y = pred_y.detach().cpu().numpy()
    if len(pred_y.shape) == 1:
        predicted = np.array(pred_y)
    else:
        predicted = np.argmax(pred_y, 1)
    annos = np.array(target_y)

    total = predicted.shape[0]
    correct = (predicted == annos).sum().item()

    return correct / total
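
The evaluation helper above accepts either predicted labels (1-D) or per-class scores (2-D). A minimal NumPy-only sketch of the same idea follows; `accuracy_from_predictions` is a hypothetical name, and unlike `eval_prediction` it assumes the input is already array-like rather than a tensor that needs `.detach()`.

```python
import numpy as np


def accuracy_from_predictions(pred_y, target_y):
    """Return the fraction of correct predictions.

    A 1-D pred_y is treated as class labels; a 2-D pred_y is treated as
    per-class scores and reduced with argmax along axis 1.
    """
    pred_y = np.asarray(pred_y)
    predicted = pred_y if pred_y.ndim == 1 else np.argmax(pred_y, axis=1)
    return float(np.mean(predicted == np.asarray(target_y)))
```

For example, `accuracy_from_predictions([0, 1, 2, 1], [0, 1, 1, 1])` compares labels directly, while a 2-D score matrix is first converted to labels.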
--- a/examples/dataset_text_workflow/workflow.py
+++ b/examples/dataset_text_workflow/workflow.py
@@ -0,0 +1,285 @@
 import os
 import fire
 import time
 import random
 import pickle
 import tempfile
 import numpy as np
 import matplotlib.pyplot as plt
 from sklearn.metrics import accuracy_score
 from sklearn.naive_bayes import MultinomialNB
 from sklearn.feature_extraction.text import TfidfVectorizer

 from learnware.client import LearnwareClient
 from learnware.logger import get_module_logger
 from learnware.specification import RKMETextSpecification
 from learnware.tests.benchmarks import LearnwareBenchmark
 from learnware.market import instantiate_learnware_market, BaseUserInfo
 from learnware.reuse import JobSelectorReuser, AveragingReuser, EnsemblePruningReuser
 from config import text_benchmark_config

 logger = get_module_logger("text_workflow", level="INFO")


 class TextDatasetWorkflow:
    @staticmethod
    def _train_model(X, y):
        vectorizer = TfidfVectorizer(stop_words="english")
        X_tfidf = vectorizer.fit_transform(X)
        clf = MultinomialNB(alpha=0.1)
        clf.fit(X_tfidf, y)
        return vectorizer, clf

    @staticmethod
    def _eval_prediction(pred_y, target_y):
        if not isinstance(pred_y, np.ndarray):
            pred_y = pred_y.detach().cpu().numpy()

        pred_y = np.array(pred_y) if len(pred_y.shape) == 1 else np.argmax(pred_y, 1)
        target_y = np.array(target_y)
        return accuracy_score(target_y, pred_y)

    def _plot_labeled_peformance_curves(self, all_user_curves_data):
        plt.figure(figsize=(10, 6))
        plt.xticks(range(len(self.n_labeled_list)), self.n_labeled_list)

        styles = [
            {"color": "navy", "linestyle": "-", "marker": "o"},
            {"color": "magenta", "linestyle": "-.", "marker": "d"},
        ]
        labels = ["User Model", "Multiple Learnware Reuse (EnsemblePrune)"]

        user_mat, pruning_mat = all_user_curves_data
        user_mat, pruning_mat = np.array(user_mat), np.array(pruning_mat)
        for mat, style, label in zip([user_mat, pruning_mat], styles, labels):
            mean_curve, std_curve = 1 - np.mean(mat, axis=0), np.std(mat, axis=0)
            plt.plot(mean_curve, **style, label=label)
            plt.fill_between(
                range(len(mean_curve)),
                mean_curve - 0.5 * std_curve,
                mean_curve + 0.5 * std_curve,
                color=style["color"],
                alpha=0.2,
            )

        plt.xlabel("Labeled Data Size")
        plt.ylabel("1 - Accuracy")
        plt.title("Text Limited Labeled Data")
        plt.legend()
        plt.tight_layout()
        plt.savefig(os.path.join(self.fig_path, "text_labeled_curves.png"), bbox_inches="tight", dpi=700)

    def _prepare_market(self, rebuild=False):
        client = LearnwareClient()
        self.text_benchmark = LearnwareBenchmark().get_benchmark(text_benchmark_config)
        self.text_market = instantiate_learnware_market(market_id=self.text_benchmark.name, rebuild=rebuild)
        self.user_semantic = client.get_semantic_specification(self.text_benchmark.learnware_ids[0])
        self.user_semantic["Name"]["Values"] = ""

        if len(self.text_market) == 0 or rebuild:
            for learnware_id in self.text_benchmark.learnware_ids:
                with tempfile.TemporaryDirectory(prefix="text_benchmark_") as tempdir:
                    zip_path = os.path.join(tempdir, f"{learnware_id}.zip")
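                    # Retry up to 20 times, since fetching from the remote market can fail transiently.
                    for attempt in range(20):
                        try:
                            semantic_spec = client.get_semantic_specification(learnware_id)
                            client.download_learnware(learnware_id, zip_path)
                            self.text_market.add_learnware(zip_path, semantic_spec)
                            break
                        except Exception as err:
                            logger.warning(f"Attempt {attempt + 1} failed for learnware {learnware_id}: {err}")
                            time.sleep(1)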

        logger.info("Total Item: %d" % (len(self.text_market)))

    def unlabeled_text_example(self, rebuild=False):
        self._prepare_market(rebuild)

        select_list = []
        avg_list = []
        best_list = []
        improve_list = []
        job_selector_score_list = []
        ensemble_score_list = []
        all_learnwares = self.text_market.get_learnwares()

        for i in range(self.text_benchmark.user_num):
            user_data, user_label = self.text_benchmark.get_test_data(user_ids=i)

            user_stat_spec = RKMETextSpecification()
            user_stat_spec.generate_stat_spec_from_data(X=user_data)
            user_info = BaseUserInfo(
                semantic_spec=self.user_semantic, stat_info={"RKMETextSpecification": user_stat_spec}
            )
            logger.info("Searching Market for user: %d" % (i))

            search_result = self.text_market.search_learnware(user_info)
            single_result = search_result.get_single_results()
            multiple_result = search_result.get_multiple_results()

            print(f"search result of user{i}:")
            print(
                f"single model num: {len(single_result)}, max_score: {single_result[0].score}, min_score: {single_result[-1].score}"
            )

            acc_list = []
            for idx in range(len(all_learnwares)):
                learnware = all_learnwares[idx]
                pred_y = learnware.predict(user_data)
                acc = self._eval_prediction(pred_y, user_label)
                acc_list.append(acc)

            learnware = single_result[0].learnware
            pred_y = learnware.predict(user_data)
            best_acc = self._eval_prediction(pred_y, user_label)
            best_list.append(np.max(acc_list))
            select_list.append(best_acc)
            avg_list.append(np.mean(acc_list))
            improve_list.append((best_acc - np.mean(acc_list)) / np.mean(acc_list))
            print(f"market mean accuracy: {np.mean(acc_list)}, market best accuracy: {np.max(acc_list)}")
            print(
                f"Top1-score: {single_result[0].score}, learnware_id: {single_result[0].learnware.id}, acc: {best_acc}"
            )

            if len(multiple_result) > 0:
                mixture_id = " ".join([learnware.id for learnware in multiple_result[0].learnwares])
                print(f"mixture_score: {multiple_result[0].score}, mixture_learnware: {mixture_id}")
                mixture_learnware_list = multiple_result[0].learnwares
            else:
                mixture_learnware_list = [single_result[0].learnware]

            # test reuse (job selector)
            reuse_baseline = JobSelectorReuser(learnware_list=mixture_learnware_list, herding_num=100)
            reuse_predict = reuse_baseline.predict(user_data=user_data)
            reuse_score = self._eval_prediction(reuse_predict, user_label)
            job_selector_score_list.append(reuse_score)
            print(f"mixture reuse accuracy (job selector): {reuse_score}")

            # test reuse (ensemble)
            reuse_ensemble = AveragingReuser(learnware_list=mixture_learnware_list, mode="vote_by_label")
            ensemble_predict_y = reuse_ensemble.predict(user_data=user_data)
            ensemble_score = self._eval_prediction(ensemble_predict_y, user_label)
            ensemble_score_list.append(ensemble_score)
            print(f"mixture reuse accuracy (ensemble): {ensemble_score}\n")

        logger.info(
            "Accuracy of selected learnware: %.3f +/- %.3f, Average performance: %.3f +/- %.3f, Best performance: %.3f +/- %.3f"
            % (
                np.mean(select_list),
                np.std(select_list),
                np.mean(avg_list),
                np.std(avg_list),
                np.mean(best_list),
                np.std(best_list),
            )
        )
        logger.info("Average performance improvement: %.3f" % (np.mean(improve_list)))
        logger.info(
            "Average Job Selector Reuse Performance: %.3f +/- %.3f"
            % (np.mean(job_selector_score_list), np.std(job_selector_score_list))
        )
        logger.info(
            "Averaging Ensemble Reuse Performance: %.3f +/- %.3f"
            % (np.mean(ensemble_score_list), np.std(ensemble_score_list))
        )

    def labeled_text_example(self, rebuild=False, train_flag=True):
        self.n_labeled_list = [100, 200, 500, 1000, 2000, 4000]
        self.repeated_list = [10, 10, 10, 3, 3, 3]
        self.root_path = os.path.dirname(os.path.abspath(__file__))
        self.fig_path = os.path.join(self.root_path, "figs")
        self.curve_path = os.path.join(self.root_path, "curves")

        if train_flag:
            self._prepare_market(rebuild)
            os.makedirs(self.fig_path, exist_ok=True)
            os.makedirs(self.curve_path, exist_ok=True)

            for i in range(self.text_benchmark.user_num):
                user_model_score_mat = []
                pruning_score_mat = []
                single_score_mat = []
                test_x, test_y = self.text_benchmark.get_test_data(user_ids=i)
                test_y = np.array(test_y)

                train_x, train_y = self.text_benchmark.get_train_data(user_ids=i)
                train_y = np.array(train_y)

                user_stat_spec = RKMETextSpecification()
                user_stat_spec.generate_stat_spec_from_data(X=test_x)
                user_info = BaseUserInfo(
                    semantic_spec=self.user_semantic, stat_info={"RKMETextSpecification": user_stat_spec}
                )
                logger.info(f"Searching Market for user_{i}")

                search_result = self.text_market.search_learnware(user_info)
                single_result = search_result.get_single_results()
                multiple_result = search_result.get_multiple_results()

                learnware = single_result[0].learnware
                pred_y = learnware.predict(test_x)
                best_acc = self._eval_prediction(pred_y, test_y)
                print(f"search result of user_{i}:")
                print(
                    f"single model num: {len(single_result)}, max_score: {single_result[0].score}, min_score: {single_result[-1].score}, single model acc: {best_acc}"
                )

                if len(multiple_result) > 0:
                    mixture_id = " ".join([learnware.id for learnware in multiple_result[0].learnwares])
                    print(f"mixture_score: {multiple_result[0].score}, mixture_learnware: {mixture_id}")
                    mixture_learnware_list = multiple_result[0].learnwares
                else:
                    mixture_learnware_list = [single_result[0].learnware]
                print(len(train_x))

                for n_label, repeated in zip(self.n_labeled_list, self.repeated_list):
                    user_model_score_list, reuse_pruning_score_list = [], []
                    if n_label > len(train_x):
                        n_label = len(train_x)
                    for _ in range(repeated):
                        x_train, y_train = zip(*random.sample(list(zip(train_x, train_y)), k=n_label))
                        x_train = list(x_train)
                        y_train = np.array(list(y_train))

                        modelv, modell = self._train_model(x_train, y_train)
                        user_model_predict_y = modell.predict(modelv.transform(test_x))
                        user_model_score = self._eval_prediction(user_model_predict_y, test_y)
                        user_model_score_list.append(user_model_score)

                        reuse_pruning = EnsemblePruningReuser(
                            learnware_list=mixture_learnware_list, mode="classification"
                        )
                        reuse_pruning.fit(x_train, y_train)
                        reuse_pruning_predict_y = reuse_pruning.predict(user_data=test_x)
                        reuse_pruning_score = self._eval_prediction(reuse_pruning_predict_y, test_y)
                        reuse_pruning_score_list.append(reuse_pruning_score)

                    single_score_mat.append([best_acc] * repeated)
                    user_model_score_mat.append(user_model_score_list)
                    pruning_score_mat.append(reuse_pruning_score_list)
                    print(n_label, np.mean(user_model_score_mat[-1]), np.mean(pruning_score_mat[-1]))

                logger.info(f"Saving Curves for User_{i}")
                user_curves_data = (single_score_mat, user_model_score_mat, pruning_score_mat)
                with open(os.path.join(self.curve_path, f"curve{str(i)}.pkl"), "wb") as f:
                    pickle.dump(user_curves_data, f)

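        # With train_flag=False the market is never prepared, so load the benchmark here to obtain user_num.
        if not train_flag:
            self.text_benchmark = LearnwareBenchmark().get_benchmark(text_benchmark_config)

        pruning_curves_data, user_model_curves_data = [], []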
        for i in range(self.text_benchmark.user_num):
            with open(os.path.join(self.curve_path, f"curve{str(i)}.pkl"), "rb") as f:
                user_curves_data = pickle.load(f)
                (single_score_mat, user_model_score_mat, pruning_score_mat) = user_curves_data
            # Use separate loop variables so the outer user index `i` is not shadowed.
            for j in range(len(single_score_mat)):
                user_model_score_mat[j] = np.mean(user_model_score_mat[j])
                pruning_score_mat[j] = np.mean(pruning_score_mat[j])
            if len(user_model_score_mat) < 6:
                for _ in range(6 - len(user_model_score_mat)):
                    user_model_score_mat.append(user_model_score_mat[-1])
                    pruning_score_mat.append(pruning_score_mat[-1])
            user_model_curves_data.append(user_model_score_mat[:6])
            pruning_curves_data.append(pruning_score_mat[:6])
        self._plot_labeled_peformance_curves([user_model_curves_data, pruning_curves_data])


 if __name__ == "__main__":
    fire.Fire(TextDatasetWorkflow)
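
The labeled-data experiment above draws `n_label` paired examples at random, capping the request at the size of the training set. A minimal sketch of that sampling step follows; `sample_labeled_subset` and its `seed` parameter are hypothetical additions for reproducibility, and the sketch assumes a non-empty training set with `n_label >= 1`.

```python
import random


def sample_labeled_subset(train_x, train_y, n_label, seed=None):
    """Sample n_label (x, y) pairs without replacement, keeping pairs aligned."""
    rng = random.Random(seed)
    # Cap the request at the amount of data actually available.
    n_label = min(n_label, len(train_x))
    pairs = rng.sample(list(zip(train_x, train_y)), k=n_label)
    x_sub, y_sub = zip(*pairs)
    return list(x_sub), list(y_sub)
```

Sampling the zipped pairs, rather than sampling inputs and labels separately, guarantees that each text stays matched with its own label.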
--- a/learnware/init.py
+++ b/learnware/init.py
@@ -1,4 +1,4 @@
 __version__ = "0.2.0.7"
 __version__ = "0.2.0.9"

 import os
 import json
--- a/learnware/client/learnware_client.py
+++ b/learnware/client/learnware_client.py
@@ -14,10 +14,11 @@ from typing import Union, List, Optional
 from ..config import C
 from .container import LearnwaresContainer
 from ..market import BaseChecker
 from ..specification import generate_semantic_spec
 from ..logger import get_module_logger
 from ..learnware import get_learnware_from_dirpath
 from ..market import BaseUserInfo
 from ..tests import get_semantic_specification


 CHUNK_SIZE = 1024 * 1024
 logger = get_module_logger(module_name="LearnwareClient")
@@ -52,8 +53,8 @@ class SemanticSpecificationKey(Enum):
    DATA_TYPE = "Data"
    TASK_TYPE = "Task"
    LIBRARY_TYPE = "Library"
    LICENSE = "License"
    SENARIOES = "Scenario"
    LICENSE = "License"


 class LearnwareClient:
@@ -67,8 +68,16 @@ class LearnwareClient:

        self.chunk_size = 1024 * 1024
        self.tempdir_list = []
        self.login_status = False
        atexit.register(self.cleanup)

    def is_connected(self):
        url = f"{self.host}/auth/login_by_token"
        response = requests.post(url)
        if response.status_code == 404:
            return False
        return True

    def login(self, email, token):
        url = f"{self.host}/auth/login_by_token"

@@ -80,6 +89,10 @@ class LearnwareClient:

        token = result["data"]["token"]
        self.headers = {"Authorization": f"Bearer {token}"}
        self.login_status = True

    def is_login(self):
        return self.login_status

    @require_login
    def logout(self):
@@ -166,7 +179,18 @@ class LearnwareClient:
        if result["code"] != 0:
            raise Exception("update failed: " + json.dumps(result))

    def download_learnware(self, learnware_id, save_path):
    def get_semantic_specification(self, learnware_id: str):
        url = f"{self.host}/engine/learnware_info"
        response = requests.get(url, params={"learnware_id": learnware_id}, headers=self.headers, stream=True)

        result = response.json()

        if result["code"] != 0:
            raise Exception("get learnware semantic specification failed: " + json.dumps(result))

        return result["data"]["learnware_info"]["semantic_specification"]

    def download_learnware(self, learnware_id: str, save_path: str):
        url = f"{self.host}/engine/download_learnware"

        response = requests.get(
@@ -251,7 +275,6 @@ class LearnwareClient:
                headers=self.headers,
            )
            result = response.json()

            if result["code"] != 0:
                raise Exception("search failed: " + json.dumps(result))

@@ -260,12 +283,11 @@ class LearnwareClient:
                returns["single"]["semantic_specifications"].append(learnware["semantic_specification"])
                returns["single"]["matching"].append(learnware["matching"])

            if len(result["data"]["learnware_list_multi"]) > 0:
                multi_learnware = result["data"]["learnware_list_multi"][0]
                returns["multiple"]["learnware_ids"].append(multi_learnware["learnware_id"])
                returns["multiple"]["semantic_specifications"].append(multi_learnware["semantic_specification"])
            for learnware in result["data"]["learnware_list_multi"]:
                returns["multiple"]["learnware_ids"].append(learnware["learnware_id"])
                returns["multiple"]["semantic_specifications"].append(learnware["semantic_specification"])
                returns["multiple"]["matching"] = learnware["matching"]
        

        # Delete temp json file
        os.remove(temp_file_name)

@@ -281,41 +303,6 @@ class LearnwareClient:
        if result["code"] != 0:
            raise Exception("delete failed: " + json.dumps(result))

    def create_semantic_specification(
        self,
        name: Optional[str] = None,
        description: Optional[str] = None,
        data_type: Optional[str] = None,
        task_type: Optional[str] = None,
        library_type: Optional[str] = None,
        scenarios: Optional[Union[str, List[str]]] = None,
        license: Optional[Union[str, List[str]]] = None,
        input_description: Optional[dict] = None,
        output_description: Optional[dict] = None,
    ):
        semantic_specification = dict()
        semantic_specification["Data"] = {"Type": "Class", "Values": [data_type] if data_type is not None else []}
        semantic_specification["Task"] = {"Type": "Class", "Values": [task_type] if task_type is not None else []}
        semantic_specification["Library"] = {
            "Type": "Class",
            "Values": [library_type] if library_type is not None else [],
        }

        license = [license] if isinstance(license, str) else license
        semantic_specification["License"] = {"Type": "Class", "Values": license if license is not None else []}
        scenarios = [scenarios] if isinstance(scenarios, str) else scenarios
        semantic_specification["Scenario"] = {"Type": "Tag", "Values": scenarios if scenarios is not None else []}

        semantic_specification["Name"] = {"Type": "String", "Values": name if name is not None else ""}
        semantic_specification["Description"] = {
            "Type": "String",
            "Values": description if description is not None else "",
        }
        semantic_specification["Input"] = {} if input_description is None else input_description
        semantic_specification["Output"] = {} if output_description is None else output_description

        return semantic_specification

    def list_semantic_specification_values(self, key: SemanticSpecificationKey):
        url = f"{self.host}/engine/semantic_specification"
        response = requests.get(url, headers=self.headers)
@@ -435,7 +422,17 @@ class LearnwareClient:
    @staticmethod
    def check_learnware(learnware_zip_path, semantic_specification=None):
        semantic_specification = (
            get_semantic_specification() if semantic_specification is None else semantic_specification
            generate_semantic_spec(
                name="test",
                description="test",
                data_type="Text",
                task_type="Segmentation",
                scenarios="Financial",
                library_type="Scikit-learn",
                license="Apache-2.0",
            )
            if semantic_specification is None
            else semantic_specification
        )

        check_status, message = LearnwareClient._check_semantic_specification(semantic_specification)
@@ -446,12 +443,9 @@ class LearnwareClient:
                z_file.extractall(tempdir)

            learnware = get_learnware_from_dirpath(
                id="test", semantic_spec=semantic_specification, learnware_dirpath=tempdir
                id="test", semantic_spec=semantic_specification, learnware_dirpath=tempdir, ignore_error=False
            )

            if learnware is None:
                raise Exception("The learnware is not valid.")

            check_status, message = LearnwareClient._check_stat_specification(learnware)
            assert check_status is True, message

--- a/learnware/client/scripts/__init__.py
+++ b/learnware/client/scripts/__init__.py
--- a/learnware/client/utils.py
+++ b/learnware/client/utils.py
@@ -28,8 +28,6 @@ def system_execute(args, timeout=None, env=None, stdout=subprocess.DEVNULL, stde

 def remove_enviroment(conda_env):
    system_execute(args=["conda", "env", "remove", "-n", f"{conda_env}"])
    logger.info(f"The learnware conda env [{conda_env}] is removed.")


 def install_environment(learnware_dirpath, conda_env):
    """Install environment of a learnware
@@ -51,7 +49,7 @@ def install_environment(learnware_dirpath, conda_env):
        if "environment.yaml" in os.listdir(learnware_dirpath):
            yaml_path: str = os.path.join(learnware_dirpath, "environment.yaml")
            yaml_path_filter: str = os.path.join(tempdir, "environment_filter.yaml")
            logger.info(f"checking the avaliabe conda packages for {conda_env}")
            logger.info(f"checking the available conda packages for {conda_env}")
            filter_nonexist_conda_packages_file(yaml_file=yaml_path, output_yaml_file=yaml_path_filter)
            # create environment
            logger.info(f"create conda env [{conda_env}] according to .yaml file")
@@ -60,7 +58,7 @@ def install_environment(learnware_dirpath, conda_env):
        elif "requirements.txt" in os.listdir(learnware_dirpath):
            requirements_path: str = os.path.join(learnware_dirpath, "requirements.txt")
            requirements_path_filter: str = os.path.join(tempdir, "requirements_filter.txt")
            logger.info(f"checking the avaliabe pip packages for {conda_env}")
            logger.info(f"checking the available pip packages for {conda_env}")
            filter_nonexist_pip_packages_file(requirements_file=requirements_path, output_file=requirements_path_filter)
            logger.info(f"create empty conda env [{conda_env}]")
            system_execute(args=["conda", "create", "-y", "--name", f"{conda_env}", "python=3.8"])
--- a/learnware/learnware/__init__.py
+++ b/learnware/learnware/__init__.py
@@ -1,8 +1,9 @@
 import os
 import copy
 from typing import Optional
 import traceback

 from .base import Learnware

 from .utils import get_stat_spec_from_config
 from ..specification import Specification
 from ..utils import read_yaml_to_dict
@@ -12,7 +13,7 @@ from ..config import C
 logger = get_module_logger("learnware.learnware")


 def get_learnware_from_dirpath(id: str, semantic_spec: dict, learnware_dirpath, ignore_error=True) -> Learnware:
 def get_learnware_from_dirpath(id: str, semantic_spec: dict, learnware_dirpath, ignore_error=True) -> Optional[Learnware]:
    """Get the learnware object from dirpath, and provide the management interface for the Learnware class

    Parameters
@@ -45,7 +46,12 @@ def get_learnware_from_dirpath(id: str, semantic_spec: dict, learnware_dirpath,
    }

    try:
        yaml_config = read_yaml_to_dict(os.path.join(learnware_dirpath, C.learnware_folder_config["yaml_file"]))
        
        learnware_yaml_path = os.path.join(learnware_dirpath, C.learnware_folder_config["yaml_file"])
        assert os.path.exists(learnware_yaml_path), f"learnware.yaml is not found for learnware_{id}, please check the learnware folder or zipfile."
        
        
        yaml_config = read_yaml_to_dict(learnware_yaml_path)

        if "name" in yaml_config:
            learnware_config["name"] = yaml_config["name"]
@@ -60,7 +66,10 @@ def get_learnware_from_dirpath(id: str, semantic_spec: dict, learnware_dirpath,
        learnware_spec = Specification()
        for _stat_spec in learnware_config["stat_specifications"]:
            stat_spec = _stat_spec.copy()
            stat_spec["file_name"] = os.path.join(learnware_dirpath, stat_spec["file_name"])
            stat_spec_path = os.path.join(learnware_dirpath, stat_spec["file_name"])
            assert os.path.exists(stat_spec_path), f"statistical specification file {stat_spec['file_name']} is not found for learnware_{id}, please check the learnware folder or zipfile."
            
            stat_spec["file_name"] = stat_spec_path
            stat_spec_inst = get_stat_spec_from_config(stat_spec)
            learnware_spec.update_stat_spec(**{stat_spec_inst.type: stat_spec_inst})

@@ -69,7 +78,7 @@ def get_learnware_from_dirpath(id: str, semantic_spec: dict, learnware_dirpath,
    except Exception as e:
        if not ignore_error:
            raise e
        logger.warning(f"Load Learnware {id} failed! Due to {repr(e)}")
        logger.warning(f"Load Learnware {id} failed! Due to {e}; details:\n{traceback.format_exc()}")
        return None

    return Learnware(
--- a/learnware/market/base.py
+++ b/learnware/market/base.py
@@ -139,7 +139,7 @@ class LearnwareMarket:
    def check_learnware(self, zip_path: str, semantic_spec: dict, checker_names: List[str] = None, **kwargs) -> bool:
        try:
            final_status = BaseChecker.NONUSABLE_LEARNWARE
            if len(checker_names):
            if checker_names is not None and len(checker_names):
                with tempfile.TemporaryDirectory(prefix="pending_learnware_") as tempdir:
                    with zipfile.ZipFile(zip_path, mode="r") as z_file:
                        z_file.extractall(tempdir)
@@ -236,14 +236,14 @@ class LearnwareMarket:
        int
            The final learnware check_status.
        """
        zip_path = self.get_learnware_zip_path_by_ids(id) if zip_path is None else zip_path
        zip_path_for_check = self.get_learnware_zip_path_by_ids(id) if zip_path is None else zip_path
        semantic_spec = (
            self.get_learnware_by_ids(id).get_specification().get_semantic_spec()
            if semantic_spec is None
            else semantic_spec
        )
        checker_names = list(self.learnware_checker.keys()) if checker_names is None else checker_names
        update_status = self.check_learnware(zip_path, semantic_spec, checker_names)
        update_status = self.check_learnware(zip_path_for_check, semantic_spec, checker_names)
        check_status = (
            update_status if check_status is None or update_status == BaseChecker.INVALID_LEARNWARE else check_status
        )
--- a/learnware/market/easy/searcher.py
+++ b/learnware/market/easy/searcher.py
@@ -15,6 +15,13 @@ logger = get_module_logger("easy_seacher")


 class EasyExactSemanticSearcher(BaseSearcher):
    def _learnware_id_search(self, learnware_id: str, learnware_list: List[Learnware]) -> List[Learnware]:
        match_learnwares = []
        for learnware in learnware_list:
            if learnware_id == learnware.id:
                match_learnwares.append(learnware)
        return match_learnwares

    def _match_semantic_spec(self, semantic_spec1, semantic_spec2):
        """
        semantic_spec1: semantic spec input by user
@@ -29,12 +36,10 @@ class EasyExactSemanticSearcher(BaseSearcher):

        for key in semantic_spec1.keys():
            v1 = semantic_spec1[key].get("Values", "")
            v2 = semantic_spec2[key].get("Values", "")

            if len(v1) == 0:
                # user input is empty, no need to search
            if key not in semantic_spec2 or len(v1) == 0:
                continue

            v2 = semantic_spec2[key].get("Values", "")
            if key in ("Name", "Description"):
                v1 = v1.lower()
                if v1 not in name2 and v1 not in description2:
@@ -57,9 +62,15 @@ class EasyExactSemanticSearcher(BaseSearcher):

    def __call__(self, learnware_list: List[Learnware], user_info: BaseUserInfo) -> SearchResults:
        match_learnwares = []
        user_semantic_spec = user_info.get_semantic_spec()

        # Learnware id search
        if "learnware_id" in user_semantic_spec:
            learnware_list = self._learnware_id_search(user_semantic_spec["learnware_id"]["Values"], learnware_list)

        # Semantic tag match
        for learnware in learnware_list:
            learnware_semantic_spec = learnware.get_specification().get_semantic_spec()
            user_semantic_spec = user_info.get_semantic_spec()
            if self._match_semantic_spec(user_semantic_spec, learnware_semantic_spec):
                match_learnwares.append(learnware)
        logger.info("semantic_spec search: choose %d from %d learnwares" % (len(match_learnwares), len(learnware_list)))
@@ -67,6 +78,13 @@ class EasyExactSemanticSearcher(BaseSearcher):


 class EasyFuzzSemanticSearcher(BaseSearcher):
    def _learnware_id_search(self, learnware_id: str, learnware_list: List[Learnware]) -> List[Learnware]:
        match_learnwares = []
        for learnware in learnware_list:
            if learnware_id in learnware.id:
                match_learnwares.append(learnware)
        return match_learnwares

    def _match_semantic_spec_tag(self, semantic_spec1, semantic_spec2) -> bool:
        """Judge if tags of two semantic specs are consistent

@@ -128,6 +146,11 @@ class EasyFuzzSemanticSearcher(BaseSearcher):
        final_result = []
        user_semantic_spec = user_info.get_semantic_spec()

        # Learnware id search
        if "learnware_id" in user_semantic_spec:
            learnware_list = self._learnware_id_search(user_semantic_spec["learnware_id"]["Values"], learnware_list)

        # Semantic tag match
        for learnware in learnware_list:
            learnware_semantic_spec = learnware.get_specification().get_semantic_spec()
            if self._match_semantic_spec_tag(user_semantic_spec, learnware_semantic_spec):
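Review note: the two `_learnware_id_search` helpers added in this file differ only in the comparison they use. The exact searcher requires the whole id to match, while the fuzzy searcher accepts any id that contains the query as a substring. A minimal sketch of that difference follows. `FakeLearnware` and both function names are hypothetical stand-ins, not part of the library.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FakeLearnware:
    """Hypothetical stand-in that exposes only the `id` attribute."""

    id: str


def exact_id_search(learnware_id: str, learnwares: List[FakeLearnware]) -> List[FakeLearnware]:
    # Keep only learnwares whose id equals the query exactly.
    return [lw for lw in learnwares if learnware_id == lw.id]


def fuzzy_id_search(learnware_id: str, learnwares: List[FakeLearnware]) -> List[FakeLearnware]:
    # Keep learnwares whose id contains the query as a substring.
    return [lw for lw in learnwares if learnware_id in lw.id]


pool = [FakeLearnware("00000012"), FakeLearnware("00000123"), FakeLearnware("00000999")]
exact_hits = [lw.id for lw in exact_id_search("00000012", pool)]
fuzzy_hits = [lw.id for lw in fuzzy_id_search("12", pool)]
```

In both searchers the id filter runs before the semantic tag match, so it only narrows the candidate list and the remaining semantic checks still apply.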
--- a/learnware/market/heterogeneous/organizer/__init__.py
+++ b/learnware/market/heterogeneous/organizer/__init__.py
@@ -12,7 +12,7 @@ from ...base import BaseChecker, BaseUserInfo
 from ...easy import EasyOrganizer
 from ....learnware import Learnware
 from ....logger import get_module_logger
 from ....specification import HeteroMapTableSpecification, RKMETableSpecification
 from ....specification import HeteroMapTableSpecification


 logger = get_module_logger("hetero_map_table_organizer")
@@ -165,9 +165,11 @@ class HeteroMapTableOrganizer(EasyOrganizer):
        int
            The final learnware check_status.
        """
        old_semantic_spec = self.learnware_list[id].get_specification().get_semantic_spec()
        final_status = super(HeteroMapTableOrganizer, self).update_learnware(id, zip_path, semantic_spec, check_status)
        if final_status == BaseChecker.USABLE_LEARWARE and len(self._get_hetero_learnware_ids(id)):
            self._update_learware_hetero_spec(id)
            if zip_path is not None or old_semantic_spec.get("Input", {}) != semantic_spec.get("Input", {}):
                self._update_learware_hetero_spec(id)
        return final_status

    def _reload_learnware_hetero_spec(self, learnware_id):
@@ -245,7 +247,7 @@ class HeteroMapTableOrganizer(EasyOrganizer):
        ret = []
        for idx in ids:
            spec = self.learnware_list[idx].get_specification()
            if is_hetero(stat_specs=spec.get_stat_spec(), semantic_spec=spec.get_semantic_spec()):
            if is_hetero(stat_specs=spec.get_stat_spec(), semantic_spec=spec.get_semantic_spec(), verbose=False):
                ret.append(idx)
        return ret

--- a/learnware/market/heterogeneous/utils.py
+++ b/learnware/market/heterogeneous/utils.py
@@ -1,9 +1,10 @@
 import traceback
 from ...logger import get_module_logger

 logger = get_module_logger("hetero_utils")


 def is_hetero(stat_specs: dict, semantic_spec: dict) -> bool:
 def is_hetero(stat_specs: dict, semantic_spec: dict, verbose=True) -> bool:
    """Check if user_info satisfies all the criteria required for enabling heterogeneous learnware search

    Parameters
@@ -35,15 +36,17 @@ def is_hetero(stat_specs: dict, semantic_spec: dict) -> bool:
        semantic_decription_feature_num = len(semantic_input_description["Description"])

        if semantic_decription_feature_num <= 0:
            logger.warning("At least one of Input.Description in semantic spec should be provides.")
            if verbose:
                logger.warning("At least one of Input.Description in semantic spec should be provided.")
            return False

        if table_input_shape != semantic_description_dim:
            logger.warning("User data feature dimensions mismatch with semantic specification.")
            if verbose:
                logger.warning("User data feature dimensions mismatch with semantic specification.")
            return False

        return True

    except Exception as e:
        logger.warning(f"Invalid heterogeneous search information provided due to {e}. Use homogeneous search instead.")
    except Exception:
        if verbose:
            logger.warning("Invalid heterogeneous search information provided.")
        return False
--- a/learnware/market/module.py
+++ b/learnware/market/module.py
@@ -1,9 +1,10 @@
 from .base import LearnwareMarket
 from .classes import CondaChecker
 from .easy import EasyOrganizer, EasySearcher, EasySemanticChecker, EasyStatChecker
 from .heterogeneous import HeteroMapTableOrganizer, HeteroSearcher


 def get_market_component(name, market_id, rebuild, organizer_kwargs=None, searcher_kwargs=None, checker_kwargs=None):
 def get_market_component(name, market_id, rebuild, organizer_kwargs=None, searcher_kwargs=None, checker_kwargs=None, conda_checker=False):
    organizer_kwargs = {} if organizer_kwargs is None else organizer_kwargs
    searcher_kwargs = {} if searcher_kwargs is None else searcher_kwargs
    checker_kwargs = {} if checker_kwargs is None else checker_kwargs
@@ -11,7 +12,7 @@ def get_market_component(name, market_id, rebuild, organizer_kwargs=None, search
    if name == "easy":
        easy_organizer = EasyOrganizer(market_id=market_id, rebuild=rebuild)
        easy_searcher = EasySearcher(organizer=easy_organizer)
        easy_checker_list = [EasySemanticChecker(), EasyStatChecker()]
        easy_checker_list = [EasySemanticChecker(), EasyStatChecker() if conda_checker is False else CondaChecker(EasyStatChecker())]
        market_component = {
            "organizer": easy_organizer,
            "searcher": easy_searcher,
@@ -20,7 +21,7 @@ def get_market_component(name, market_id, rebuild, organizer_kwargs=None, search
    elif name == "hetero":
        hetero_organizer = HeteroMapTableOrganizer(market_id=market_id, rebuild=rebuild, **organizer_kwargs)
        hetero_searcher = HeteroSearcher(organizer=hetero_organizer)
        hetero_checker_list = [EasySemanticChecker(), EasyStatChecker()]
        hetero_checker_list = [EasySemanticChecker(), EasyStatChecker() if conda_checker is False else CondaChecker(EasyStatChecker())]

        market_component = {
            "organizer": hetero_organizer,
@@ -40,9 +41,10 @@ def instantiate_learnware_market(
    organizer_kwargs: dict = None,
    searcher_kwargs: dict = None,
    checker_kwargs: dict = None,
    conda_checker: bool = False,
    **kwargs,
 ):
    market_componets = get_market_component(name, market_id, rebuild, organizer_kwargs, searcher_kwargs, checker_kwargs)
    market_componets = get_market_component(name, market_id, rebuild, organizer_kwargs, searcher_kwargs, checker_kwargs, conda_checker)
    return LearnwareMarket(
        organizer=market_componets["organizer"],
        searcher=market_componets["searcher"],
--- a/learnware/specification/__init__.py
+++ b/learnware/specification/__init__.py
@@ -17,5 +17,12 @@ if not is_torch_available(verbose=False):
    generate_rkme_table_spec = None
    generate_rkme_image_spec = None
    generate_rkme_text_spec = None
    generate_semantic_spec = None
 else:
    from .module import generate_stat_spec, generate_rkme_table_spec, generate_rkme_image_spec, generate_rkme_text_spec
    from .module import (
        generate_stat_spec,
        generate_rkme_table_spec,
        generate_rkme_image_spec,
        generate_rkme_text_spec,
        generate_semantic_spec,
    )
--- a/learnware/specification/module.py
+++ b/learnware/specification/module.py
@@ -1,7 +1,7 @@
 import torch
 import numpy as np
 import pandas as pd
 from typing import Union, List
 from typing import Union, List, Optional

 from .utils import convert_to_numpy
 from .base import BaseStatSpecification
@@ -78,7 +78,7 @@ def generate_rkme_image_spec(
    reduce: bool = True,
    verbose: bool = True,
    cuda_idx: int = None,
    **kwargs
    **kwargs,
 ) -> RKMEImageSpecification:
    """
        Interface for users to generate Reduced Kernel Mean Embedding (RKME) specification for Image.
@@ -168,7 +168,7 @@ def generate_rkme_text_spec(
    # Check input type
    if not isinstance(X, list) or not all(isinstance(item, str) for item in X):
        raise TypeError("Input data must be a list of strings.")
    

    # Generate rkme text spec
    rkme_text_spec = RKMETextSpecification(gamma=gamma, cuda_idx=cuda_idx)
    rkme_text_spec.generate_stat_spec_from_data(X, reduced_set_size, step_size, steps, nonnegative_beta, reduce)
@@ -177,7 +177,7 @@ def generate_rkme_text_spec(

 def generate_stat_spec(
    type: str, X: Union[np.ndarray, pd.DataFrame, torch.Tensor, List[str]], *args, **kwargs
 ) -> BaseStatSpecification:
 ) -> Union[RKMETableSpecification, RKMEImageSpecification, RKMETextSpecification]:
    """
        Interface for users to generate statistical specification.
        Return a StatSpecification object, use .save() method to save as npy file.
@@ -204,3 +204,41 @@ def generate_stat_spec(
        return generate_rkme_image_spec(X=X, *args, **kwargs)
    else:
        raise TypeError(f"type {type} is not supported!")


 def generate_semantic_spec(
    name: Optional[str] = None,
    description: Optional[str] = None,
    data_type: Optional[str] = None,
    task_type: Optional[str] = None,
    library_type: Optional[str] = None,
    scenarios: Optional[Union[str, List[str]]] = None,
    license: Optional[Union[str, List[str]]] = None,
    input_description: Optional[dict] = None,
    output_description: Optional[dict] = None,
 ):
    semantic_specification = dict()
    semantic_specification["Data"] = {"Type": "Class", "Values": [data_type] if data_type is not None else []}
    semantic_specification["Task"] = {"Type": "Class", "Values": [task_type] if task_type is not None else []}
    semantic_specification["Library"] = {
        "Type": "Class",
        "Values": [library_type] if library_type is not None else [],
    }

    license = [license] if isinstance(license, str) else license
    semantic_specification["License"] = {"Type": "Class", "Values": license if license is not None else []}
    scenarios = [scenarios] if isinstance(scenarios, str) else scenarios
    semantic_specification["Scenario"] = {"Type": "Tag", "Values": scenarios if scenarios is not None else []}

    semantic_specification["Name"] = {"Type": "String", "Values": name if name is not None else ""}
    semantic_specification["Description"] = {
        "Type": "String",
        "Values": description if description is not None else "",
    }
    if input_description is not None:
        semantic_specification["Input"] = input_description

    if output_description is not None:
        semantic_specification["Output"] = output_description

    return semantic_specification
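Review note: `generate_semantic_spec` accepts either a single string or a list of strings for `scenarios` and `license`, and stores an empty list when the argument is omitted. The sketch below isolates that normalization step so the resulting field shape is easy to see. `as_value_list` and `tag_field` are hypothetical helper names introduced only for this illustration; the library inlines the same logic.

```python
from typing import Dict, List, Optional, Union


def as_value_list(value: Optional[Union[str, List[str]]]) -> List[str]:
    # None becomes an empty list, a bare string becomes a one-element list,
    # and an existing list is copied unchanged.
    if value is None:
        return []
    return [value] if isinstance(value, str) else list(value)


def tag_field(field_type: str, value: Optional[Union[str, List[str]]]) -> Dict[str, object]:
    # Build one entry of the semantic specification dict.
    return {"Type": field_type, "Values": as_value_list(value)}


scenario_field = tag_field("Tag", "Financial")
license_field = tag_field("Class", ["Apache-2.0", "MIT"])
empty_field = tag_field("Class", None)
```

Keeping `Values` a list in every case lets the searchers treat single-valued and multi-valued fields uniformly.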
--- a/learnware/specification/regular/image/rkme.py
+++ b/learnware/specification/regular/image/rkme.py
@@ -39,8 +39,7 @@ class RKMEImageSpecification(RegularStatSpecification):
        """
        self.RKME_IMAGE_VERSION = 1  # Please maintain backward compatibility.

        # TODO: remove this
        self.msg=None
        self.msg = None

        self.z = None
        self.beta = None
@@ -170,7 +169,8 @@ class RKMEImageSpecification(RegularStatSpecification):
            import torch_optimizer
        except ModuleNotFoundError:
            raise ModuleNotFoundError(
                f"RKMEImageSpecification is not available because 'torch-optimizer' is not installed! Please install it manually.")
                f"RKMEImageSpecification is not available because 'torch-optimizer' is not installed! Please install it manually."
            )

        # Cross-platform by default, unless the spec is specified to be generated specifically for local experiments.
        cross_platform = "experimental" not in kwargs or not kwargs["experimental"]
@@ -422,7 +422,9 @@ class RKMEImageSpecification(RegularStatSpecification):
            for d in self.get_states():
                if d in rkme_load.keys():
                    if d == "type" and rkme_load[d] != self.type:
                        raise TypeError(f"The type of loaded RKME ({rkme_load[d]}) is different from the expected type ({self.type})!")
                        raise TypeError(
                            f"The type of loaded RKME ({rkme_load[d]}) is different from the expected type ({self.type})!"
                        )
                    setattr(self, d, rkme_load[d])

            self.beta = self.beta.to(self._device)
@@ -441,9 +443,8 @@ def _get_zca_matrix(X, reg_coef=0.1):


 class RandomGenerator:

    def __init__(self, seed=0, cross_platform=True):
        self.cross_platform=cross_platform
        self.cross_platform = cross_platform
        self.state = RandomState(seed)

    def normal_(self, tensor: torch.Tensor, mean=0.0, std=1.0):
@@ -462,24 +463,24 @@ def deterministic(cross_platform, device):
    deterministic_state = torch.backends.cudnn.deterministic
    torch.backends.cudnn.deterministic = True
    if cross_platform and torch.cuda.is_available():
        torch.cuda.set_rng_state(
            new_state=torch.cuda.get_rng_state(device.index),
            device="cpu")
        torch.cuda.set_rng_state(new_state=torch.cuda.get_rng_state(device.index), device="cpu")

    yield RandomGenerator(seed=0, cross_platform=cross_platform)

    torch.backends.cudnn.deterministic = deterministic_state
    if cross_platform and torch.cuda.is_available():
        torch.cuda.set_rng_state(
            new_state=torch.cuda.get_rng_state(device.index),
            device="cuda")
        torch.cuda.set_rng_state(new_state=torch.cuda.get_rng_state(device.index), device="cuda")


 class _ConvNet_wide(nn.Module):
    def __init__(self, channel, random_generator, mu=None, sigma=None, k=2, net_width=128, net_depth=3, im_size=(32, 32)):
    def __init__(
        self, channel, random_generator, mu=None, sigma=None, k=2, net_width=128, net_depth=3, im_size=(32, 32)
    ):
        self.k = k
        super().__init__()
        self.features, shape_feat = self._make_layers(channel, net_width, net_depth, im_size, mu, sigma, random_generator)
        self.features, shape_feat = self._make_layers(
            channel, net_width, net_depth, im_size, mu, sigma, random_generator
        )
        # self.aggregation = nn.AvgPool2d(kernel_size=shape_feat[1])

    def forward(self, x):
@@ -495,7 +496,9 @@ class _ConvNet_wide(nn.Module):
        in_channels = channel
        shape_feat = [in_channels, im_size[0], im_size[1]]
        for d in range(net_depth):
            layers += [_build_conv2d_gaussian(in_channels, int(k * net_width), random_generator, 3, 1, mean=mu, std=sigma)]
            layers += [
                _build_conv2d_gaussian(in_channels, int(k * net_width), random_generator, 3, 1, mean=mu, std=sigma)
            ]
            shape_feat[0] = int(k * net_width)

            layers += [nn.ReLU(inplace=True)]
@@ -508,7 +511,9 @@ class _ConvNet_wide(nn.Module):
        return nn.Sequential(*layers), shape_feat


 def _build_conv2d_gaussian(in_channels, out_channels, random_generator: RandomGenerator, kernel=3, padding=1, mean=None, std=None):
 def _build_conv2d_gaussian(
    in_channels, out_channels, random_generator: RandomGenerator, kernel=3, padding=1, mean=None, std=None
 ):
    layer = nn.Conv2d(in_channels, out_channels, kernel, padding=padding)
    if mean is None:
        mean = 0
--- a/learnware/tests/__init__.py
+++ b/learnware/tests/__init__.py
@@ -1 +1 @@
 from .module import get_semantic_specification
 from .utils import parametrize
--- a/learnware/tests/benchmarks/__init__.py
+++ b/learnware/tests/benchmarks/__init__.py
@@ -0,0 +1,173 @@
 import os
 import pickle
 import tempfile
 import zipfile
 from dataclasses import dataclass
 from typing import Tuple, Optional, List, Union

 from .config import BenchmarkConfig, benchmark_configs
 from ..data import GetData
 from ...config import C


@dataclass
 class Benchmark:
    name: str
    user_num: int
    learnware_ids: List[str]
    test_X_paths: List[str]
    test_y_paths: List[str]
    train_X_paths: Optional[List[str]] = None
    train_y_paths: Optional[List[str]] = None
    extra_info_path: Optional[str] = None

    def get_test_data(self, user_ids: Union[int, List[int]]):
        raw_user_ids = user_ids
        if isinstance(user_ids, int):
            user_ids = [user_ids]

        ret = []
        for user_id in user_ids:
            with open(self.test_X_paths[user_id], "rb") as fin:
                test_X = pickle.load(fin)

            with open(self.test_y_paths[user_id], "rb") as fin:
                test_y = pickle.load(fin)

            ret.append((test_X, test_y))

        if isinstance(raw_user_ids, int):
            return ret[0]
        else:
            return ret

    def get_train_data(self, user_ids: Union[int, List[int]]):
        if self.train_X_paths is None or self.train_y_paths is None:
            return None

        raw_user_ids = user_ids
        if isinstance(user_ids, int):
            user_ids = [user_ids]

        ret = []
        for user_id in user_ids:
            with open(self.train_X_paths[user_id], "rb") as fin:
                train_X = pickle.load(fin)

            with open(self.train_y_paths[user_id], "rb") as fin:
                train_y = pickle.load(fin)

            ret.append((train_X, train_y))

        if isinstance(raw_user_ids, int):
            return ret[0]
        else:
            return ret


 class LearnwareBenchmark:
    def __init__(self):
        self.benchmark_configs = benchmark_configs

    def list_benchmarks(self):
        return list(self.benchmark_configs.keys())

    def _check_cache_data_valid(self, benchmark_config: BenchmarkConfig, data_type: str) -> bool:
        """Check if the cache data is valid

        Parameters
        ----------
        benchmark_config : BenchmarkConfig
            benchmark config
        data_type : str
            "test" for test data or "train" for train data

        Returns
        -------
        bool
            A flag indicating if the cache data is valid
        """
        cache_folder = os.path.join(C.cache_path, benchmark_config.name, f"{data_type}_data")
        if os.path.exists(cache_folder):
            for user_id in range(benchmark_config.user_num):
                X_path = os.path.join(cache_folder, f"user{user_id}_X.pkl")
                y_path = os.path.join(cache_folder, f"user{user_id}_y.pkl")
                if not os.path.isfile(X_path) or not os.path.isfile(y_path):
                    return False
            return True
        else:
            return False

    def _download_data(self, download_path: str, save_path: str):
        """Download data from backend

        Parameters
        ----------
        download_path : str
            data path for download in backend
        save_path : str
            local cache path for saving data
        """
        with tempfile.TemporaryDirectory(prefix="learnware_benchmark_") as tempdir:
            test_data_zippath = os.path.join(tempdir, "benchmark_data.zip")
            GetData().download_file(download_path, test_data_zippath)

            os.makedirs(save_path, exist_ok=True)
            with zipfile.ZipFile(test_data_zippath, "r") as z_file:
                z_file.extractall(save_path)

    def _load_cache_data(self, benchmark_config: BenchmarkConfig, data_type: str) -> Tuple[List[str], List[str]]:
        """Load data from local cache path

        Parameters
        ----------
        benchmark_config : BenchmarkConfig
            benchmark config
        data_type : str
            "test" for test data or "train" for train data
        """
        cache_folder = os.path.join(C.cache_path, benchmark_config.name, f"{data_type}_data")
        if not self._check_cache_data_valid(benchmark_config, data_type):
            download_path = getattr(benchmark_config, f"{data_type}_data_path", None)
            self._download_data(download_path, cache_folder)

        X_paths, y_paths = [], []
        for user_id in range(benchmark_config.user_num):
            user_X_path = os.path.join(cache_folder, f"user{user_id}_X.pkl")
            user_y_path = os.path.join(cache_folder, f"user{user_id}_y.pkl")
            assert os.path.isfile(user_X_path), f"user {user_id} {data_type}_X is not valid!"
            assert os.path.isfile(user_y_path), f"user {user_id} {data_type}_y is not valid!"
            X_paths.append(user_X_path)
            y_paths.append(user_y_path)

        return X_paths, y_paths

    def get_benchmark(self, benchmark_config: Union[str, BenchmarkConfig]):
        if isinstance(benchmark_config, str):
            benchmark_config = self.benchmark_configs[benchmark_config]

        # Load test data
        test_X_paths, test_y_paths = self._load_cache_data(benchmark_config, "test")

        # Load train data
        train_X_paths, train_y_paths = None, None
        if benchmark_config.train_data_path is not None:
            train_X_paths, train_y_paths = self._load_cache_data(benchmark_config, "train")

        # Load extra info
        extra_info_path = None
        if benchmark_config.extra_info_path is not None:
            extra_info_path = os.path.join(C.cache_path, benchmark_config.name, "extra_info")
            if not os.path.exists(extra_info_path):
                self._download_data(benchmark_config.extra_info_path, extra_info_path)

        return Benchmark(
            name=benchmark_config.name,
            user_num=benchmark_config.user_num,
            learnware_ids=benchmark_config.learnware_ids,
            test_X_paths=test_X_paths,
            test_y_paths=test_y_paths,
            train_X_paths=train_X_paths,
            train_y_paths=train_y_paths,
            extra_info_path=extra_info_path,
        )
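Review note: the benchmark cache is only useful if every user has both a feature file and a label file on disk, following the `user{i}_X.pkl` / `user{i}_y.pkl` naming used by `_load_cache_data`. A minimal, self-contained sketch of such a completeness check is shown below. `cache_is_complete` is a hypothetical name, and the temporary directory stands in for the real cache folder.

```python
import os
import tempfile


def cache_is_complete(cache_folder: str, user_num: int) -> bool:
    # The cache is valid only if the folder exists and holds both the
    # X and the y pickle for every user index in range(user_num).
    if not os.path.isdir(cache_folder):
        return False
    for user_id in range(user_num):
        for suffix in ("X", "y"):
            path = os.path.join(cache_folder, f"user{user_id}_{suffix}.pkl")
            if not os.path.isfile(path):
                return False
    return True


with tempfile.TemporaryDirectory() as tmp:
    # Only the feature file exists: the cache must be reported as incomplete.
    open(os.path.join(tmp, "user0_X.pkl"), "wb").close()
    only_x = cache_is_complete(tmp, user_num=1)

    # After adding the label file the cache is complete.
    open(os.path.join(tmp, "user0_y.pkl"), "wb").close()
    both = cache_is_complete(tmp, user_num=1)
```

Looping over the two suffixes keeps the X and y paths from drifting apart, which is easy to get wrong when each path is spelled out by hand.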
--- a/learnware/tests/benchmarks/config.py
+++ b/learnware/tests/benchmarks/config.py
@@ -0,0 +1,15 @@
 from dataclasses import dataclass
 from typing import Optional, List


@dataclass
 class BenchmarkConfig:
    name: str
    user_num: int
    learnware_ids: List[str]
    test_data_path: str
    train_data_path: Optional[str] = None
    extra_info_path: Optional[str] = None


 benchmark_configs = {}
--- a/learnware/tests/data.py
+++ b/learnware/tests/data.py
@@ -0,0 +1,39 @@
 import json
 import requests
 from tqdm import tqdm

 from ..config import C


 class GetData:
    def __init__(self, host=None, chunk_size=1024 * 1024):
        self.headers = None

        if host is None:
            self.host = C.backend_host
        else:
            self.host = host

        self.chunk_size = chunk_size

    def download_file(self, file_path: str, save_path: str):
        url = f"{self.host}/datasets/download_datasets"

        response = requests.get(
            url,
            params={
                "dataset": file_path,
            },
            stream=True,
        )

        if response.status_code != 200:
            raise Exception("download failed: " + json.dumps(response.json()))

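        # Content-Length may be absent (e.g. chunked transfer); tqdm accepts total=None.
        content_length = response.headers.get("Content-Length")
        num_chunks = None if content_length is None else int(content_length) // self.chunk_size + 1

        # The context manager closes the progress bar even if the download is interrupted.
        with open(save_path, "wb") as f, tqdm(total=num_chunks, desc="Downloading", unit="MB") as bar:
            for chunk in response.iter_content(chunk_size=self.chunk_size):
                f.write(chunk)
                bar.update(1)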
--- a/learnware/tests/module.py
+++ b/learnware/tests/module.py
@@ -1,10 +0,0 @@
 def get_semantic_specification():
    semantic_specification = dict()
    semantic_specification["Data"] = {"Type": "Class", "Values": ["Text"]}
    semantic_specification["Task"] = {"Type": "Class", "Values": ["Segmentation"]}
    semantic_specification["Library"] = {"Type": "Class", "Values": ["Scikit-learn"]}
    semantic_specification["Scenario"] = {"Type": "Tag", "Values": ["Financial"]}
    semantic_specification["License"] = {"Type": "Class", "Values": ["Apache-2.0"]}
    semantic_specification["Name"] = {"Type": "String", "Values": "test"}
    semantic_specification["Description"] = {"Type": "String", "Values": "test"}
    return semantic_specification
--- a/learnware/tests/templates/init.py
+++ b/learnware/tests/templates/init.py
@@ -0,0 +1,101 @@
 import os
 import tempfile
 from dataclasses import dataclass, field
 from shutil import copyfile
 from typing import List, Tuple, Union, Optional

 from ...utils import save_dict_to_yaml, convert_folder_to_zipfile
 from ...config import C


@dataclass
 class ModelTemplate:
    class_name: str = field(init=False)
    template_path: str = field(init=False)
    model_kwargs: dict = field(init=False)
@dataclass
 class PickleModelTemplate(ModelTemplate):
    model_kwargs: dict
    pickle_filepath: str
    def __post_init__(self):
        self.class_name = "PickleLoadedModel"
        self.template_path = os.path.join(C.package_path, "tests", "templates", "pickle_model.py")
        default_model_kwargs = {
            "predict_method": "predict",
            "fit_method": "fit",
            "finetune_method": "finetune",
            "pickle_filename": "model.pkl",
        }
        default_model_kwargs.update(self.model_kwargs)
        self.model_kwargs = default_model_kwargs

@dataclass
 class StatSpecTemplate:
    filepath: str
    type: str = field(default="RKMETableSpecification")
    
 class LearnwareTemplate:

    @staticmethod
    def generate_requirements(filepath, requirements: Optional[List[Union[Tuple[str, str, str], str]]] = None):
        requirements = [] if requirements is None else requirements
        operators = {"==", "~=", ">=", "<=", ">", "<"}
        requirements_str = ""
        for requirement in requirements:
            if isinstance(requirement, str):
                line_str = requirement.strip() + "\n"
            elif isinstance(requirement, tuple):
                assert requirement[1] in operators, f"The operator '{requirement[1]}' of requirements is not supported."
                line_str = requirement[0].strip() + requirement[1].strip() + requirement[2].strip() + "\n"
            else:
                raise TypeError(f"requirement must be type str/tuple, rather than {type(requirement)}")
            
            requirements_str += line_str
            
        with open(filepath, "w") as fdout:
            fdout.write(requirements_str)
    
    @staticmethod
    def generate_learnware_yaml(filepath, model_config: Optional[dict] = None, stat_spec_config: Optional[List[dict]] = None):
        learnware_config = {}
        if model_config is not None:
            learnware_config["model"] = model_config
        if stat_spec_config is not None:
            learnware_config["stat_specifications"] = stat_spec_config

        save_dict_to_yaml(learnware_config, filepath)
    
    @staticmethod
    def generate_learnware_zipfile(
        learnware_zippath: str,
        model_template: ModelTemplate,
        stat_spec_template: StatSpecTemplate,
        requirements: Optional[List[Union[Tuple[str, str, str], str]]] = None,
    ):
        with tempfile.TemporaryDirectory(suffix="learnware_template") as tempdir:
            requirement_filepath = os.path.join(tempdir, "requirements.txt")
            LearnwareTemplate.generate_requirements(requirement_filepath, requirements)
            
            model_filepath =  os.path.join(tempdir, "__init__.py")
            copyfile(model_template.template_path, model_filepath)
            
            learnware_yaml_filepath = os.path.join(tempdir, "learnware.yaml")
            model_config = {
                "class_name": model_template.class_name,
                "kwargs": model_template.model_kwargs,
            }
            
            stat_spec_config = {
                "module_path": "learnware.specification",
                "class_name": stat_spec_template.type,
                "file_name": "stat_spec.json",
                "kwargs": {}
            }
            copyfile(stat_spec_template.filepath, os.path.join(tempdir, stat_spec_config["file_name"]))
            LearnwareTemplate.generate_learnware_yaml(learnware_yaml_filepath, model_config, stat_spec_config=[stat_spec_config])
            
            if isinstance(model_template, PickleModelTemplate):
                pickle_filepath = os.path.join(tempdir, model_template.model_kwargs["pickle_filename"])
                copyfile(model_template.pickle_filepath, pickle_filepath)
                
            convert_folder_to_zipfile(tempdir, learnware_zippath)
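Review note: the requirement formatting in `generate_requirements` can be sketched as a pure function, which is easier to check in isolation. This is a simplified illustration, not the code in this diff: `format_requirements` is a hypothetical name, it returns the text instead of writing a file, and it raises `ValueError` where the original uses `assert`.

```python
from typing import List, Optional, Tuple, Union

Requirement = Union[Tuple[str, str, str], str]
OPERATORS = {"==", "~=", ">=", "<=", ">", "<"}


def format_requirements(requirements: Optional[List[Requirement]] = None) -> str:
    """Render requirements as the text of a requirements.txt file."""
    lines = []
    for requirement in requirements or []:
        if isinstance(requirement, str):
            # Plain strings are passed through after trimming whitespace.
            lines.append(requirement.strip())
        elif isinstance(requirement, tuple):
            # Tuples are (package, operator, version).
            name, operator, version = (part.strip() for part in requirement)
            if operator not in OPERATORS:
                raise ValueError(f"Unsupported operator: {operator!r}")
            lines.append(name + operator + version)
        else:
            raise TypeError(f"requirement must be str or tuple, got {type(requirement)}")
    return "".join(line + "\n" for line in lines)


text = format_requirements(["numpy", ("scikit-learn", ">=", "1.0")])
```

Here `text` holds one requirement per line, each terminated by a newline, matching what the diff writes to `requirements.txt`.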
--- a/learnware/tests/templates/pickle_model.py
+++ b/learnware/tests/templates/pickle_model.py
@@ -0,0 +1,33 @@
 import os
 import pickle
 import numpy as np
 from learnware.model.base import BaseModel

 class PickleLoadedModel(BaseModel):
    
    def __init__(
        self,
        input_shape,
        output_shape,
        predict_method="predict",
        fit_method="fit",
        finetune_method="finetune",
        pickle_filename="model.pkl",
    ):
        super(PickleLoadedModel, self).__init__(input_shape=input_shape, output_shape=output_shape)
        dir_path = os.path.dirname(os.path.abspath(__file__))
        self.pickle_filepath = os.path.join(dir_path, pickle_filename)
        with open(self.pickle_filepath, "rb") as fd:
            self.model = pickle.load(fd)
        self.predict_method = predict_method
        self.fit_method = fit_method
        self.finetune_method = finetune_method
            
    def predict(self, X: np.ndarray) -> np.ndarray:
        return getattr(self.model, self.predict_method)(X)
    
    def fit(self, X: np.ndarray, y: np.ndarray):
        getattr(self.model, self.fit_method)(X, y)

    def finetune(self, X: np.ndarray, y: np.ndarray):
        getattr(self.model, self.finetune_method)(X, y)
--- a/learnware/tests/utils.py
+++ b/learnware/tests/utils.py
@@ -0,0 +1,9 @@
 import unittest

 def parametrize(test_class, **kwargs):
    test_loader = unittest.TestLoader()
    test_names = test_loader.getTestCaseNames(test_class)
    _suite = unittest.TestSuite()
    for name in test_names:
        _suite.addTest(test_class(name, **kwargs))
    return _suite
--- a/learnware/utils/init.py
+++ b/learnware/utils/init.py
@@ -3,7 +3,7 @@ import zipfile

 from .import_utils import is_torch_available
 from .module import get_module_by_module_path
 from .file import read_yaml_to_dict, save_dict_to_yaml
 from .file import read_yaml_to_dict, save_dict_to_yaml, convert_folder_to_zipfile
 from .gpu import setup_seed, choose_device, allocate_cuda_idx
 from ..config import get_platform, SystemType

--- a/learnware/utils/file.py
+++ b/learnware/utils/file.py
@@ -1,5 +1,6 @@
 import os
 import yaml

 import zipfile

 def save_dict_to_yaml(dict_value: dict, save_path: str):
    """save dict object into yaml file"""
@@ -12,3 +13,13 @@ def read_yaml_to_dict(yaml_path: str):
    with open(yaml_path, "r") as file:
        dict_value = yaml.load(file.read(), Loader=yaml.FullLoader)
        return dict_value

 def convert_folder_to_zipfile(folder_path, zip_path):
    with zipfile.ZipFile(zip_path, "w") as zip_obj:
        for foldername, subfolders, filenames in os.walk(folder_path):
            for filename in filenames:
                file_path = os.path.join(foldername, filename)
                zip_info = zipfile.ZipInfo(filename)
                zip_info.compress_type = zipfile.ZIP_STORED
                with open(file_path, "rb") as file:
                    zip_obj.writestr(zip_info, file.read())
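Review note: because each entry is created with `zipfile.ZipInfo(filename)`, the archive stores only base names, so the folder hierarchy is flattened (similar to `zip -j`). The sketch below copies the helper to stay self-contained and demonstrates this on a temporary folder; the file names are hypothetical.

```python
import os
import tempfile
import zipfile


def convert_folder_to_zipfile(folder_path, zip_path):
    # Same helper as in the diff: entries are named by base name only.
    with zipfile.ZipFile(zip_path, "w") as zip_obj:
        for foldername, subfolders, filenames in os.walk(folder_path):
            for filename in filenames:
                file_path = os.path.join(foldername, filename)
                zip_info = zipfile.ZipInfo(filename)
                zip_info.compress_type = zipfile.ZIP_STORED
                with open(file_path, "rb") as file:
                    zip_obj.writestr(zip_info, file.read())


with tempfile.TemporaryDirectory() as workdir:
    folder = os.path.join(workdir, "pkg")
    os.makedirs(os.path.join(folder, "sub"))
    # One file at the top level and one inside a subfolder.
    for relpath in ("a.txt", os.path.join("sub", "b.txt")):
        with open(os.path.join(folder, relpath), "w") as f:
            f.write(relpath)

    zip_path = os.path.join(workdir, "pkg.zip")
    convert_folder_to_zipfile(folder, zip_path)
    with zipfile.ZipFile(zip_path) as zf:
        names = sorted(zf.namelist())
```

One consequence worth keeping in mind: two files with the same base name in different subfolders would produce duplicate entry names in the archive.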
--- a/tests/test_function/test_search.py
+++ b/tests/test_function/test_search.py
@@ -0,0 +1,103 @@
 import os
 import unittest
 import tempfile
 import logging

 import learnware

 learnware.init(logging_level=logging.WARNING)

 from learnware.learnware import Learnware
 from learnware.client import LearnwareClient
 from learnware.market import instantiate_learnware_market, BaseUserInfo, EasySemanticChecker
 from learnware.config import C


 class TestSearch(unittest.TestCase):
    client = LearnwareClient()

    @classmethod
    def setUpClass(cls):
        cls.market = instantiate_learnware_market(market_id="search_test", name="hetero", rebuild=True)
        if cls.client.is_connected():
            cls._build_learnware_market()

    @classmethod
    def _build_learnware_market(cls):
        table_learnware_ids = ["00001951", "00001980", "00001987"]
        image_learnware_ids = ["00000851", "00000858", "00000841"]
        text_learnware_ids = ["00000652", "00000637"]
        learnware_ids = table_learnware_ids + image_learnware_ids + text_learnware_ids
        with tempfile.TemporaryDirectory(prefix="learnware_search_test") as tempdir:
            for learnware_id in learnware_ids:
                learnware_zippath = os.path.join(tempdir, f"learnware_{learnware_id}.zip")
                try:
                    cls.client.download_learnware(learnware_id=learnware_id, save_path=learnware_zippath)
                    semantic_spec = (
                        cls.client.load_learnware(learnware_path=learnware_zippath)
                        .get_specification()
                        .get_semantic_spec()
                    )
                except Exception:
                    print(f"Learnware '{learnware_id}' is skipped due to a network problem.")
                    continue
                cls.market.add_learnware(
                    learnware_zippath,
                    learnware_id=learnware_id,
                    semantic_spec=semantic_spec,
                    checker_names=["EasySemanticChecker"],
                )

    def _skip_test(self):
        if not self.client.is_connected():
            print("Client cannot connect!")
            return True
        return False

    def test_image_search(self):
        if not self._skip_test():
            learnware_id = "00000619"
            try:
                learnware: Learnware = self.client.load_learnware(learnware_id=learnware_id)
            except Exception:
                print("'test_image_search' is skipped due to a network problem.")
                return
            user_info = BaseUserInfo(stat_info=learnware.get_specification().get_stat_spec())
            search_result = self.market.search_learnware(user_info)
            print("Single Search Results:", search_result.get_single_results())
            print("Multiple Search Results:", search_result.get_multiple_results())

    def test_text_search(self):
        if not self._skip_test():
            learnware_id = "00000653"
            try:
                learnware: Learnware = self.client.load_learnware(learnware_id=learnware_id)
            except Exception:
                print("'test_text_search' is skipped due to a network problem.")
                return
            user_info = BaseUserInfo(stat_info=learnware.get_specification().get_stat_spec())
            search_result = self.market.search_learnware(user_info)
            print("Single Search Results:", search_result.get_single_results())
            print("Multiple Search Results:", search_result.get_multiple_results())

    def test_table_search(self):
        if not self._skip_test():
            learnware_id = "00001950"
            try:
                learnware: Learnware = self.client.load_learnware(learnware_id=learnware_id)
            except Exception:
                print("'test_table_search' is skipped due to a network problem.")
                return
            user_info = BaseUserInfo(stat_info=learnware.get_specification().get_stat_spec())
            search_result = self.market.search_learnware(user_info)
            print("Single Search Results:", search_result.get_single_results())
            print("Multiple Search Results:", search_result.get_multiple_results())


 def suite():
    _suite = unittest.TestSuite()
    _suite.addTest(TestSearch("test_image_search"))
    _suite.addTest(TestSearch("test_text_search"))
    _suite.addTest(TestSearch("test_table_search"))
    return _suite


 if __name__ == "__main__":
    runner = unittest.TextTestRunner()
    runner.run(suite())
--- a/tests/test_hetero_market/example_learnwares/learnware.yaml
+++ b/tests/test_hetero_market/example_learnwares/learnware.yaml
@@ -1,8 +0,0 @@
 model:
  class_name: MyModel
  kwargs: {}
 stat_specifications:
  - module_path: learnware.specification
    class_name: RKMETableSpecification
    file_name: stat.json
    kwargs: {}
--- a/tests/test_hetero_market/example_learnwares/model0.py
+++ b/tests/test_hetero_market/example_learnwares/model0.py
@@ -1,16 +0,0 @@
 from learnware.model import BaseModel
 import numpy as np
 import joblib
 import os


 class MyModel(BaseModel):
    def __init__(self):
        super(MyModel, self).__init__(input_shape=(20,), output_shape=(1,))
        dir_path = os.path.dirname(os.path.abspath(__file__))
        model_path = os.path.join(dir_path, "ridge.pkl")
        model = joblib.load(model_path)
        self.model = model

    def predict(self, X: np.ndarray) -> np.ndarray:
        return self.model.predict(X)
--- a/tests/test_hetero_market/example_learnwares/model1.py
+++ b/tests/test_hetero_market/example_learnwares/model1.py
@@ -1,16 +0,0 @@
 from learnware.model import BaseModel
 import numpy as np
 import joblib
 import os


 class MyModel(BaseModel):
    def __init__(self):
        super(MyModel, self).__init__(input_shape=(30,), output_shape=(1,))
        dir_path = os.path.dirname(os.path.abspath(__file__))
        model_path = os.path.join(dir_path, "ridge.pkl")
        model = joblib.load(model_path)
        self.model = model

    def predict(self, X: np.ndarray) -> np.ndarray:
        return self.model.predict(X)
--- a/tests/test_hetero_market/example_learnwares/requirements.txt
+++ b/tests/test_hetero_market/example_learnwares/requirements.txt
@@ -1 +0,0 @@
 learnware == 0.1.0.999
--- a/tests/test_hetero_market/test_hetero.py
+++ b/tests/test_hetero_market/test_hetero.py
@@ -1,414 +0,0 @@
 import torch
 import unittest
 import os
 import copy
 import joblib
 import zipfile
 import numpy as np
 import multiprocessing
 from sklearn.linear_model import Ridge
 from sklearn.datasets import make_regression
 from shutil import copyfile, rmtree
 from learnware.client import LearnwareClient
 from sklearn.metrics import mean_squared_error

 import learnware
 from learnware.market import instantiate_learnware_market, BaseUserInfo
 from learnware.specification import RKMETableSpecification, generate_rkme_table_spec
 from learnware.reuse import HeteroMapAlignLearnware, AveragingReuser, EnsemblePruningReuser
 from example_learnwares.config import (
    input_shape_list,
    input_description_list,
    output_description_list,
    user_description_list,
 )

 curr_root = os.path.dirname(os.path.abspath(__file__))

 user_semantic = {
    "Data": {"Values": ["Table"], "Type": "Class"},
    "Task": {
        "Values": ["Regression"],
        "Type": "Class",
    },
    "Library": {"Values": ["Scikit-learn"], "Type": "Class"},
    "Scenario": {"Values": ["Education"], "Type": "Tag"},
    "Description": {"Values": "", "Type": "String"},
    "Name": {"Values": "", "Type": "String"},
    "License": {"Values": ["MIT"], "Type": "Class"},
 }


 def check_learnware(learnware_name, dir_path=os.path.join(curr_root, "learnware_pool")):
    print(f"Checking Learnware: {learnware_name}")
    zip_file_path = os.path.join(dir_path, learnware_name)
    client = LearnwareClient()
    # if check_learnware doesn't raise an exception, return True, otherwise, return false
    try:
        client.check_learnware(zip_file_path)
        return True
    except Exception as e:
        print(f"Learnware {learnware_name} failed the check: {e}")
        return False


 class TestMarket(unittest.TestCase):
    @classmethod
    def setUpClass(cls) -> None:
        np.random.seed(2023)
        learnware.init()

    def _init_learnware_market(self, organizer_kwargs=None):
        """initialize learnware market"""
        hetero_market = instantiate_learnware_market(
            market_id="hetero_toy", name="hetero", rebuild=True, organizer_kwargs=organizer_kwargs
        )
        return hetero_market

    def test_prepare_learnware_randomly(self, learnware_num=5):
        self.zip_path_list = []

        for i in range(learnware_num):
            dir_path = os.path.join(curr_root, "learnware_pool", "ridge_%d" % (i))
            os.makedirs(dir_path, exist_ok=True)

            print("Preparing Learnware: %d" % (i))

            example_learnware_idx = i % 2
            input_dim = input_shape_list[example_learnware_idx]
            learnware_example_dir = "example_learnwares"

            X, y = make_regression(n_samples=5000, n_informative=15, n_features=input_dim, noise=0.1, random_state=42)

            clf = Ridge(alpha=1.0)
            clf.fit(X, y)

            joblib.dump(clf, os.path.join(dir_path, "ridge.pkl"))

            spec = generate_rkme_table_spec(X=X, gamma=0.1, cuda_idx=0)
            spec.save(os.path.join(dir_path, "stat.json"))

            init_file = os.path.join(dir_path, "__init__.py")
            copyfile(
                os.path.join(curr_root, learnware_example_dir, f"model{example_learnware_idx}.py"), init_file
            )  # cp example_init.py init_file

            yaml_file = os.path.join(dir_path, "learnware.yaml")
            copyfile(
                os.path.join(curr_root, learnware_example_dir, "learnware.yaml"), yaml_file
            )  # cp example.yaml yaml_file

            env_file = os.path.join(dir_path, "requirements.txt")
            copyfile(os.path.join(curr_root, learnware_example_dir, "requirements.txt"), env_file)

            zip_file = dir_path + ".zip"
            # zip -q -r -j zip_file dir_path
            with zipfile.ZipFile(zip_file, "w") as zip_obj:
                for foldername, subfolders, filenames in os.walk(dir_path):
                    for filename in filenames:
                        file_path = os.path.join(foldername, filename)
                        zip_info = zipfile.ZipInfo(filename)
                        zip_info.compress_type = zipfile.ZIP_STORED
                        with open(file_path, "rb") as file:
                            zip_obj.writestr(zip_info, file.read())

            rmtree(dir_path)  # rm -r dir_path

            self.zip_path_list.append(zip_file)

    def test_generated_learnwares(self):
        curr_root = os.path.dirname(os.path.abspath(__file__))
        dir_path = os.path.join(curr_root, "learnware_pool")

        # Execute multi-process checking using Pool
        mp_context = multiprocessing.get_context("spawn")
        with mp_context.Pool() as pool:
            results = pool.starmap(check_learnware, [(name, dir_path) for name in os.listdir(dir_path)])

        # Use an assert statement to ensure that all checks return True
        self.assertTrue(all(results), "Not all learnwares passed the check")

    def test_upload_delete_learnware(self, learnware_num=5, delete=True):
        hetero_market = self._init_learnware_market()
        self.test_prepare_learnware_randomly(learnware_num)
        self.learnware_num = learnware_num

        print("Total Item:", len(hetero_market))
        assert len(hetero_market) == 0, f"The market should be empty!"

        for idx, zip_path in enumerate(self.zip_path_list):
            semantic_spec = copy.deepcopy(user_semantic)
            semantic_spec["Name"]["Values"] = "learnware_%d" % (idx)
            semantic_spec["Description"]["Values"] = "test_learnware_number_%d" % (idx)
            semantic_spec["Input"] = input_description_list[idx % 2]
            semantic_spec["Output"] = output_description_list[idx % 2]
            hetero_market.add_learnware(zip_path, semantic_spec)

        print("Total Item:", len(hetero_market))
        assert len(hetero_market) == self.learnware_num, f"The number of learnwares must be {self.learnware_num}!"
        curr_inds = hetero_market.get_learnware_ids()
        print("Available ids After Uploading Learnwares:", curr_inds)
        assert len(curr_inds) == self.learnware_num, f"The number of learnwares must be {self.learnware_num}!"

        if delete:
            for learnware_id in curr_inds:
                hetero_market.delete_learnware(learnware_id)
                self.learnware_num -= 1
                assert (
                    len(hetero_market) == self.learnware_num
                ), f"The number of learnwares must be {self.learnware_num}!"

            curr_inds = hetero_market.get_learnware_ids()
            print("Available ids After Deleting Learnwares:", curr_inds)
            assert len(curr_inds) == 0, f"The market should be empty!"

        return hetero_market

    def test_train_market_model(self, learnware_num=5):
        hetero_market = self._init_learnware_market(
            organizer_kwargs={"auto_update": False, "auto_update_limit": learnware_num}
        )
        self.test_prepare_learnware_randomly(learnware_num)
        self.learnware_num = learnware_num

        print("Total Item:", len(hetero_market))
        assert len(hetero_market) == 0, f"The market should be empty!"

        for idx, zip_path in enumerate(self.zip_path_list):
            semantic_spec = copy.deepcopy(user_semantic)
            semantic_spec["Name"]["Values"] = "learnware_%d" % (idx)
            semantic_spec["Description"]["Values"] = "test_learnware_number_%d" % (idx)
            semantic_spec["Input"] = input_description_list[idx % 2]
            semantic_spec["Output"] = output_description_list[idx % 2]
            hetero_market.add_learnware(zip_path, semantic_spec)

        print("Total Item:", len(hetero_market))
        assert len(hetero_market) == self.learnware_num, f"The number of learnwares must be {self.learnware_num}!"
        curr_inds = hetero_market.get_learnware_ids()
        print("Available ids After Uploading Learnwares:", curr_inds)
        assert len(curr_inds) == self.learnware_num, f"The number of learnwares must be {self.learnware_num}!"

        # organizer=hetero_market.learnware_organizer
        # organizer.train(hetero_market.learnware_organizer.learnware_list.values())
        return hetero_market

    def test_search_semantics(self, learnware_num=5):
        hetero_market = self.test_upload_delete_learnware(learnware_num, delete=False)
        print("Total Item:", len(hetero_market))
        assert len(hetero_market) == self.learnware_num, f"The number of learnwares must be {self.learnware_num}!"

        semantic_spec = copy.deepcopy(user_semantic)
        semantic_spec["Name"]["Values"] = f"learnware_{learnware_num - 1}"

        user_info = BaseUserInfo(semantic_spec=semantic_spec)
        search_result = hetero_market.search_learnware(user_info)
        single_result = search_result.get_single_results()

        print("User info:", user_info.get_semantic_spec())
        print(f"Search result:")
        assert len(single_result) == 1, f"Exact semantic search failed!"
        for search_item in single_result:
            semantic_spec1 = search_item.learnware.get_specification().get_semantic_spec()
            print("Choose learnware:", search_item.learnware.id, semantic_spec1)
            assert semantic_spec1["Name"]["Values"] == semantic_spec["Name"]["Values"], f"Exact semantic search failed!"

        semantic_spec["Name"]["Values"] = "laernwaer"
        user_info = BaseUserInfo(semantic_spec=semantic_spec)
        search_result = hetero_market.search_learnware(user_info)
        single_result = search_result.get_single_results()

        print("User info:", user_info.get_semantic_spec())
        print(f"Search result:")
        assert len(single_result) == self.learnware_num, f"Fuzzy semantic search failed!"
        for search_item in single_result:
            semantic_spec1 = search_item.learnware.get_specification().get_semantic_spec()
            print("Choose learnware:", search_item.learnware.id, semantic_spec1)

    def test_stat_search(self, learnware_num=5):
        hetero_market = self.test_train_market_model(learnware_num)
        print("Total Item:", len(hetero_market))

        # hetero test
        print("+++++ HETERO TEST ++++++")
        user_dim = 15

        test_folder = os.path.join(curr_root, "test_stat")

        for idx, zip_path in enumerate(self.zip_path_list):
            unzip_dir = os.path.join(test_folder, f"{idx}")

            # unzip -o -q zip_path -d unzip_dir
            if os.path.exists(unzip_dir):
                rmtree(unzip_dir)
            os.makedirs(unzip_dir, exist_ok=True)
            with zipfile.ZipFile(zip_path, "r") as zip_obj:
                zip_obj.extractall(path=unzip_dir)

            user_spec = RKMETableSpecification()
            user_spec.load(os.path.join(unzip_dir, "stat.json"))
            z = user_spec.get_z()
            z = z[:, :user_dim]
            device = user_spec.device
            z = torch.tensor(z, device=device)
            user_spec.z = z

            print(">> normal case test:")
            semantic_spec = copy.deepcopy(user_semantic)
            semantic_spec["Input"] = copy.deepcopy(input_description_list[idx % 2])
            semantic_spec["Input"]["Dimension"] = user_dim
            # keep only the first user_dim descriptions
            semantic_spec["Input"]["Description"] = {
                str(key): semantic_spec["Input"]["Description"][str(key)] for key in range(user_dim)
            }
            user_info = BaseUserInfo(semantic_spec=semantic_spec, stat_info={"RKMETableSpecification": user_spec})

            search_result = hetero_market.search_learnware(user_info)
            single_result = search_result.get_single_results()
            multiple_result = search_result.get_multiple_results()

            print(f"search result of user{idx}:")
            for single_item in single_result:
                print(f"score: {single_item.score}, learnware_id: {single_item.learnware.id}")

            for multiple_item in multiple_result:
                print(
                    f"mixture_score: {multiple_item.score}, mixture_learnware_ids: {[item.id for item in multiple_item.learnwares]}"
                )

            # improper key "Task" in semantic_spec, use homo search and print invalid semantic_spec
            print(">> test for key 'Task' has empty 'Values':")
            semantic_spec["Task"] = {"Values": ["Segmentation"], "Type": "Class"}

            user_info = BaseUserInfo(semantic_spec=semantic_spec, stat_info={"RKMETableSpecification": user_spec})
            search_result = hetero_market.search_learnware(user_info)
            single_result = search_result.get_single_results()

            assert len(single_result) == 0, f"Statistical search failed!"

            # delete key "Task" in semantic_spec, use homo search and print WARNING INFO with "User doesn't provide correct task type"
            print(">> delete key 'Task' test:")
            semantic_spec.pop("Task")

            user_info = BaseUserInfo(semantic_spec=semantic_spec, stat_info={"RKMETableSpecification": user_spec})
            search_result = hetero_market.search_learnware(user_info)
            single_result = search_result.get_single_results()

            assert len(single_result) == 0, f"Statistical search failed!"

            # modify semantic info with mismatch dim, use homo search and print "User data feature dimensions mismatch with semantic specification."
            print(">> mismatch dim test")
            semantic_spec = copy.deepcopy(user_semantic)
            semantic_spec["Input"] = copy.deepcopy(input_description_list[idx % 2])
            semantic_spec["Input"]["Dimension"] = user_dim - 2
            semantic_spec["Input"]["Description"] = {
                str(key): semantic_spec["Input"]["Description"][str(key)] for key in range(user_dim)
            }

            user_info = BaseUserInfo(semantic_spec=semantic_spec, stat_info={"RKMETableSpecification": user_spec})
            search_result = hetero_market.search_learnware(user_info)
            single_result = search_result.get_single_results()

            assert len(single_result) == 0, f"Statistical search failed!"

        rmtree(test_folder)  # rm -r test_folder

        # homo test
        print("\n+++++ HOMO TEST ++++++")
        test_folder = os.path.join(curr_root, "test_stat")

        for idx, zip_path in enumerate(self.zip_path_list):
            unzip_dir = os.path.join(test_folder, f"{idx}")

            # unzip -o -q zip_path -d unzip_dir
            if os.path.exists(unzip_dir):
                rmtree(unzip_dir)
            os.makedirs(unzip_dir, exist_ok=True)
            with zipfile.ZipFile(zip_path, "r") as zip_obj:
                zip_obj.extractall(path=unzip_dir)

            user_spec = RKMETableSpecification()
            user_spec.load(os.path.join(unzip_dir, "stat.json"))
            user_info = BaseUserInfo(semantic_spec=user_semantic, stat_info={"RKMETableSpecification": user_spec})
            search_result = hetero_market.search_learnware(user_info)
            single_result = search_result.get_single_results()
            multiple_result = search_result.get_multiple_results()

            assert len(single_result) >= 1, f"Statistical search failed!"
            print(f"search result of user{idx}:")
            for single_item in single_result:
                print(f"score: {single_item.score}, learnware_id: {single_item.learnware.id}")

            for multiple_item in multiple_result:
                print(f"mixture_score: {multiple_item.score}\n")
                mixture_id = " ".join([learnware.id for learnware in multiple_item.learnwares])
                print(f"mixture_learnware: {mixture_id}\n")

        rmtree(test_folder)  # rm -r test_folder

    def test_model_reuse(self, learnware_num=5):
        # generate toy regression problem
        X, y = make_regression(n_samples=5000, n_informative=10, n_features=15, noise=0.1, random_state=0)

        # generate rkme
        user_spec = generate_rkme_table_spec(X=X, gamma=0.1, cuda_idx=0)

        # generate specification
        semantic_spec = copy.deepcopy(user_semantic)
        semantic_spec["Input"] = user_description_list[0]
        user_info = BaseUserInfo(semantic_spec=semantic_spec, stat_info={"RKMETableSpecification": user_spec})

        # learnware market search
        hetero_market = self.test_train_market_model(learnware_num)
        search_result = hetero_market.search_learnware(user_info)
        single_result = search_result.get_single_results()
        multiple_result = search_result.get_multiple_results()
        # print search results
        for single_item in single_result:
            print(f"score: {single_item.score}, learnware_id: {single_item.learnware.id}")

        for multiple_item in multiple_result:
            print(
                f"mixture_score: {multiple_item.score}, mixture_learnware_ids: {[item.id for item in multiple_item.learnwares]}"
            )

        # single model reuse
        hetero_learnware = HeteroMapAlignLearnware(single_result[0].learnware, mode="regression")
        hetero_learnware.align(user_spec, X[:100], y[:100])
        single_predict_y = hetero_learnware.predict(X)

        # multi model reuse
        hetero_learnware_list = []
        for learnware in multiple_result[0].learnwares:
            hetero_learnware = HeteroMapAlignLearnware(learnware, mode="regression")
            hetero_learnware.align(user_spec, X[:100], y[:100])
            hetero_learnware_list.append(hetero_learnware)

        # Use averaging ensemble reuser to reuse the searched learnwares to make prediction
        reuse_ensemble = AveragingReuser(learnware_list=hetero_learnware_list, mode="mean")
        ensemble_predict_y = reuse_ensemble.predict(user_data=X)

        # Use ensemble pruning reuser to reuse the searched learnwares to make prediction
        reuse_ensemble = EnsemblePruningReuser(learnware_list=hetero_learnware_list, mode="regression")
        reuse_ensemble.fit(X[:100], y[:100])
        ensemble_pruning_predict_y = reuse_ensemble.predict(user_data=X)

        print("Single model RMSE by finetune:", mean_squared_error(y, single_predict_y, squared=False))
        print("Averaging Reuser RMSE:", mean_squared_error(y, ensemble_predict_y, squared=False))
        print("Ensemble Pruning Reuser RMSE:", mean_squared_error(y, ensemble_pruning_predict_y, squared=False))


 def suite():
    _suite = unittest.TestSuite()
    _suite.addTest(TestMarket("test_prepare_learnware_randomly"))
    _suite.addTest(TestMarket("test_generated_learnwares"))
    _suite.addTest(TestMarket("test_upload_delete_learnware"))
    _suite.addTest(TestMarket("test_train_market_model"))
    _suite.addTest(TestMarket("test_search_semantics"))
    _suite.addTest(TestMarket("test_stat_search"))
    _suite.addTest(TestMarket("test_model_reuse"))
    return _suite


 if __name__ == "__main__":
    runner = unittest.TextTestRunner()
    runner.run(suite())
--- a/tests/test_learnware_client/test_all_learnware.py
+++ b/tests/test_learnware_client/test_all_learnware.py
@@ -3,17 +3,20 @@ import json
 import zipfile
 import unittest
 import tempfile
 import argparse

 from learnware.client import LearnwareClient
 from learnware.specification import Specification
 from learnware.specification import generate_semantic_spec
 from learnware.market import BaseUserInfo


 class TestAllLearnware(unittest.TestCase):
    def setUp(self):
        unittest.TestCase.setUpClass()
        dir_path = os.path.dirname(__file__)
        config_path = os.path.join(dir_path, "config.json")
    client = LearnwareClient()

    @classmethod
    def setUpClass(cls) -> None:
        config_path = os.path.join(os.path.dirname(__file__), "config.json")

        if not os.path.exists(config_path):
            data = {"email": None, "token": None}
            with open(config_path, "w") as file:
@@ -21,42 +24,53 @@ class TestAllLearnware(unittest.TestCase):

        with open(config_path, "r") as file:
            data = json.load(file)
            email = data["email"]
            token = data["token"]
            email = data.get("email")
            token = data.get("token")

        if email is None or token is None:
            raise ValueError("Please set email and token in config.json.")
        self.client = LearnwareClient()
        self.client.login(email, token)
            print("Please set email and token in config.json.")
        else:
            cls.client.login(email, token)

    def _skip_test(self):
        if not self.client.is_login():
            print("Client is not logged in!")
            return True
        return False

    def test_all_learnware(self):
        max_learnware_num = 2000
        semantic_spec = self.client.create_semantic_specification()
        user_info = BaseUserInfo(semantic_spec=semantic_spec, stat_info={})
        result = self.client.search_learnware(user_info, page_size=max_learnware_num)
        
        learnware_ids = result["single"]["learnware_ids"]
        keys = [key for key in result["single"]["semantic_specifications"][0]]
        print(f"result size: {len(learnware_ids)}")
        print(f"key in result: {keys}")

        failed_ids = []
        with tempfile.TemporaryDirectory(prefix="learnware_") as tempdir:
            for idx in learnware_ids:
                zip_path = os.path.join(tempdir, f"test_{idx}.zip")
                self.client.download_learnware(idx, zip_path)
                with zipfile.ZipFile(zip_path, "r") as zip_file:
                    with zip_file.open("semantic_specification.json") as json_file:
                        semantic_spec = json.load(json_file)
                try:
                    LearnwareClient.check_learnware(zip_path, semantic_spec)
                    print(f"check learnware {idx} succeed")
                except Exception:
                    failed_ids.append(idx)
                    print(f"check learnware {idx} failed!!!")

                print(f"The currently failed learnware ids: {failed_ids}")
        if not self._skip_test():
            semantic_spec = generate_semantic_spec()
            user_info = BaseUserInfo(semantic_spec=semantic_spec, stat_info={})
            result = self.client.search_learnware(user_info, page_size=None)

            learnware_ids = result["single"]["learnware_ids"]
            keys = [key for key in result["single"]["semantic_specifications"][0]]
            print(f"result size: {len(learnware_ids)}")
            print(f"key in result: {keys}")

            failed_ids = []
            with tempfile.TemporaryDirectory(prefix="learnware_") as tempdir:
                for idx in learnware_ids:
                    zip_path = os.path.join(tempdir, f"test_{idx}.zip")
                    self.client.download_learnware(idx, zip_path)
                    try:
                        semantic_spec = self.client.get_semantic_specification(idx)
                        LearnwareClient.check_learnware(zip_path, semantic_spec)
                        print(f"check learnware {idx} succeed")
                    except Exception:
                        failed_ids.append(idx)
                        print(f"check learnware {idx} failed!!!")

                    print(f"The currently failed learnware ids: {failed_ids}")


 def suite():
    _suite = unittest.TestSuite()
    _suite.addTest(TestAllLearnware("test_all_learnware"))
    return _suite


 if __name__ == "__main__":
    unittest.main()
    runner = unittest.TextTestRunner()
    runner.run(suite())
--- a/tests/test_learnware_client/test_check_learnware.py
+++ b/tests/test_learnware_client/test_check_learnware.py
@@ -4,7 +4,6 @@ import zipfile
 import unittest
 import tempfile


 from learnware.client import LearnwareClient


@@ -13,15 +12,19 @@ class TestCheckLearnware(unittest.TestCase):
        unittest.TestCase.setUpClass()
        self.client = LearnwareClient()

    def test_check_learnware_pip(self):
    def test_check_learnware_pip_only_zip(self):
        learnware_id = "00000208"
        with tempfile.TemporaryDirectory(prefix="learnware_") as tempdir:
            self.zip_path = os.path.join(tempdir, "test.zip")
            self.client.download_learnware(learnware_id, self.zip_path)
            LearnwareClient.check_learnware(self.zip_path)

            with zipfile.ZipFile(self.zip_path, "r") as zip_file:
                with zip_file.open("semantic_specification.json") as json_file:
                    semantic_spec = json.load(json_file)
    def test_check_learnware_pip(self):
        learnware_id = "00000208"
        with tempfile.TemporaryDirectory(prefix="learnware_") as tempdir:
            self.zip_path = os.path.join(tempdir, "test.zip")
            self.client.download_learnware(learnware_id, self.zip_path)
            semantic_spec = self.client.get_semantic_specification(learnware_id)
            LearnwareClient.check_learnware(self.zip_path, semantic_spec)

    def test_check_learnware_conda(self):
@@ -29,10 +32,7 @@ class TestCheckLearnware(unittest.TestCase):
        with tempfile.TemporaryDirectory(prefix="learnware_") as tempdir:
            self.zip_path = os.path.join(tempdir, "test.zip")
            self.client.download_learnware(learnware_id, self.zip_path)

            with zipfile.ZipFile(self.zip_path, "r") as zip_file:
                with zip_file.open("semantic_specification.json") as json_file:
                    semantic_spec = json.load(json_file)
            semantic_spec = self.client.get_semantic_specification(learnware_id)
            LearnwareClient.check_learnware(self.zip_path, semantic_spec)

    def test_check_learnware_dependency(self):
@@ -40,10 +40,7 @@ class TestCheckLearnware(unittest.TestCase):
        with tempfile.TemporaryDirectory(prefix="learnware_") as tempdir:
            self.zip_path = os.path.join(tempdir, "test.zip")
            self.client.download_learnware(learnware_id, self.zip_path)

            with zipfile.ZipFile(self.zip_path, "r") as zip_file:
                with zip_file.open("semantic_specification.json") as json_file:
                    semantic_spec = json.load(json_file)
            semantic_spec = self.client.get_semantic_specification(learnware_id)
            LearnwareClient.check_learnware(self.zip_path, semantic_spec)

    def test_check_learnware_image(self):
@@ -51,10 +48,7 @@ class TestCheckLearnware(unittest.TestCase):
        with tempfile.TemporaryDirectory(prefix="learnware_") as tempdir:
            self.zip_path = os.path.join(tempdir, "test.zip")
            self.client.download_learnware(learnware_id, self.zip_path)

            with zipfile.ZipFile(self.zip_path, "r") as zip_file:
                with zip_file.open("semantic_specification.json") as json_file:
                    semantic_spec = json.load(json_file)
            semantic_spec = self.client.get_semantic_specification(learnware_id)
            LearnwareClient.check_learnware(self.zip_path, semantic_spec)

    def test_check_learnware_text(self):
@@ -62,12 +56,21 @@ class TestCheckLearnware(unittest.TestCase):
        with tempfile.TemporaryDirectory(prefix="learnware_") as tempdir:
            self.zip_path = os.path.join(tempdir, "test.zip")
            self.client.download_learnware(learnware_id, self.zip_path)

            with zipfile.ZipFile(self.zip_path, "r") as zip_file:
                with zip_file.open("semantic_specification.json") as json_file:
                    semantic_spec = json.load(json_file)
            semantic_spec = self.client.get_semantic_specification(learnware_id)
            LearnwareClient.check_learnware(self.zip_path, semantic_spec)


 def suite():
    _suite = unittest.TestSuite()
    _suite.addTest(TestCheckLearnware("test_check_learnware_pip_only_zip"))
    _suite.addTest(TestCheckLearnware("test_check_learnware_pip"))
    _suite.addTest(TestCheckLearnware("test_check_learnware_conda"))
    _suite.addTest(TestCheckLearnware("test_check_learnware_dependency"))
    _suite.addTest(TestCheckLearnware("test_check_learnware_image"))
    _suite.addTest(TestCheckLearnware("test_check_learnware_text"))
    return _suite


 if __name__ == "__main__":
    unittest.main()
    runner = unittest.TextTestRunner()
    runner.run(suite())
--- a/tests/test_learnware_client/test_container.py
+++ b/tests/test_learnware_client/test_container.py
@@ -0,0 +1,51 @@
 import unittest
 import numpy as np

 from learnware.client import LearnwareClient
 from learnware.client.container import LearnwaresContainer

 class TestContainer(unittest.TestCase):
    def __init__(self, method_name='runTest', mode="all"):
        super(TestContainer, self).__init__(method_name)
        self.modes = []
        if mode in {"all", "conda"}:
            self.modes.append("conda")
        if mode in {"all", "docker"}:
            self.modes.append("docker")
    
    def setUp(self):
        self.client = LearnwareClient()

    def _test_container_with_pip(self, mode):
        learnware_id = "00000147"
        learnware = self.client.load_learnware(learnware_id=learnware_id)
        with LearnwaresContainer(learnware, ignore_error=False, mode=mode) as env_container:
            learnware = env_container.get_learnwares_with_container()[0]
            input_array = np.random.random(size=(20, 23))
            print(learnware.predict(input_array))

    def _test_container_with_conda(self, mode):
        learnware_id = "00000148"
        learnware = self.client.load_learnware(learnware_id=learnware_id)
        with LearnwaresContainer(learnware, ignore_error=False, mode=mode) as env_container:
            learnware = env_container.get_learnwares_with_container()[0]
            input_array = np.random.random(size=(20, 204))
            print(learnware.predict(input_array))

    def test_container_with_pip(self):
        for mode in self.modes:
            self._test_container_with_pip(mode=mode)
    
    def test_container_with_conda(self):
        for mode in self.modes:
            self._test_container_with_conda(mode=mode)

 def suite():
    _suite = unittest.TestSuite()
    _suite.addTest(TestContainer("test_container_with_pip", mode="all"))
    _suite.addTest(TestContainer("test_container_with_conda", mode="all"))
    return _suite

 if __name__ == "__main__":
    runner = unittest.TextTestRunner()
    runner.run(suite())
--- a/tests/test_learnware_client/test_load_conda.py
+++ b/tests/test_learnware_client/test_load_conda.py
@@ -1,82 +0,0 @@
 import os
 import unittest
 import zipfile
 import numpy as np

 import learnware
 from learnware.learnware import get_learnware_from_dirpath
 from learnware.client import LearnwareClient
 from learnware.client.container import ModelCondaContainer, LearnwaresContainer
 from learnware.reuse import AveragingReuser


 class TestLearnwareLoad(unittest.TestCase):
    def setUp(self):
        unittest.TestCase.setUpClass()
        self.client = LearnwareClient()

        root = os.path.dirname(__file__)
        self.learnware_ids = ["00000910", "00000899", "00000900"]
        self.zip_paths = [os.path.join(root, x) for x in ["1.zip", "2.zip", "3.zip"]]

    def test_load_single_learnware_by_zippath(self):
        for learnware_id, zip_path in zip(self.learnware_ids, self.zip_paths):
            self.client.download_learnware(learnware_id, zip_path)

        learnware_list = [
            self.client.load_learnware(learnware_path=zippath, runnable_option="conda") for zippath in self.zip_paths
        ]
        reuser = AveragingReuser(learnware_list, mode="mean")
        input_array = np.random.random(size=(20, 40))
        print(reuser.predict(input_array))

        for learnware in learnware_list:
            print(learnware.id, learnware.predict(input_array))

    def test_load_multi_learnware_by_zippath(self):
        for learnware_id, zip_path in zip(self.learnware_ids, self.zip_paths):
            self.client.download_learnware(learnware_id, zip_path)

        learnware_list = self.client.load_learnware(learnware_path=self.zip_paths, runnable_option="conda")
        reuser = AveragingReuser(learnware_list, mode="mean")
        input_array = np.random.random(size=(20, 40))
        print(reuser.predict(input_array))

        for learnware in learnware_list:
            print(learnware.id, learnware.predict(input_array))

    def test_load_single_learnware_by_id(self):
        learnware_list = [
            self.client.load_learnware(learnware_id=idx, runnable_option="conda") for idx in self.learnware_ids
        ]
        reuser = AveragingReuser(learnware_list, mode="mean")
        input_array = np.random.random(size=(20, 40))
        print(reuser.predict(input_array))

        for learnware in learnware_list:
            print(learnware.id, learnware.predict(input_array))

    def test_load_multi_learnware_by_id(self):
        learnware_list = self.client.load_learnware(learnware_id=self.learnware_ids, runnable_option="conda")
        reuser = AveragingReuser(learnware_list, mode="mean")
        input_array = np.random.random(size=(20, 40))
        print(reuser.predict(input_array))

        for learnware in learnware_list:
            print(learnware.id, learnware.predict(input_array))

    def test_load_single_learnware_by_id_pip(self):
        learnware_id = "00000147"
        learnware = self.client.load_learnware(learnware_id=learnware_id, runnable_option="conda")
        input_array = np.random.random(size=(20, 23))
        print(learnware.predict(input_array))

    def test_load_single_learnware_by_id_conda(self):
        learnware_id = "00000148"
        learnware = self.client.load_learnware(learnware_id=learnware_id, runnable_option="conda")
        input_array = np.random.random(size=(20, 204))
        print(learnware.predict(input_array))


 if __name__ == "__main__":
    unittest.main()
--- a/tests/test_learnware_client/test_load_docker.py
+++ b/tests/test_learnware_client/test_load_docker.py
@@ -1,57 +0,0 @@
 import os
 import unittest
 import zipfile
 import numpy as np

 import learnware
 from learnware.learnware import get_learnware_from_dirpath
 from learnware.client import LearnwareClient
 from learnware.client.container import ModelCondaContainer, LearnwaresContainer
 from learnware.reuse import AveragingReuser


 class TestLearnwareLoad(unittest.TestCase):
    def setUp(self):
        unittest.TestCase.setUpClass()
        self.client = LearnwareClient()

        root = os.path.dirname(__file__)
        self.learnware_ids = ["00000910", "00000899", "00000900"]
        self.zip_paths = [os.path.join(root, x) for x in ["1.zip", "2.zip", "3.zip"]]

    def test_load_multi_learnware_by_zippath(self):
        for learnware_id, zip_path in zip(self.learnware_ids, self.zip_paths):
            self.client.download_learnware(learnware_id, zip_path)

        learnware_list = self.client.load_learnware(learnware_path=self.zip_paths, runnable_option="docker")
        reuser = AveragingReuser(learnware_list, mode="mean")
        input_array = np.random.random(size=(20, 40))
        print(reuser.predict(input_array))

        for learnware in learnware_list:
            print(learnware.id, learnware.predict(input_array))

    def test_load_multi_learnware_by_id(self):
        learnware_list = self.client.load_learnware(learnware_id=self.learnware_ids, runnable_option="docker")
        reuser = AveragingReuser(learnware_list, mode="mean")
        input_array = np.random.random(size=(20, 40))
        print(reuser.predict(input_array))

        for learnware in learnware_list:
            print(learnware.id, learnware.predict(input_array))

    def test_load_single_learnware_by_id_pip(self):
        learnware_id = "00000147"
        learnware = self.client.load_learnware(learnware_id=learnware_id, runnable_option="docker")
        input_array = np.random.random(size=(20, 23))
        print(learnware.predict(input_array))

    def test_load_single_learnware_by_id_conda(self):
        learnware_id = "00000148"
        learnware = self.client.load_learnware(learnware_id=learnware_id, runnable_option="docker")
        input_array = np.random.random(size=(20, 204))
        print(learnware.predict(input_array))


 if __name__ == "__main__":
    unittest.main()
--- a/tests/test_learnware_client/test_load_learnware.py
+++ b/tests/test_learnware_client/test_load_learnware.py
@@ -0,0 +1,61 @@
 import os
 import unittest
 import numpy as np

 from learnware.client import LearnwareClient
 from learnware.reuse import AveragingReuser

 class TestLearnwareLoad(unittest.TestCase):
    def __init__(self, method_name='runTest', mode="all"):
        super(TestLearnwareLoad, self).__init__(method_name)
        self.runnable_options = []
        if mode in {"all", "conda"}:
            self.runnable_options.append("conda")
        if mode in {"all", "docker"}:
            self.runnable_options.append("docker")

    def setUp(self):
        self.client = LearnwareClient()
        root = os.path.dirname(__file__)
        self.learnware_ids = ["00000910", "00000899", "00000900"]
        self.zip_paths = [os.path.join(root, x) for x in ["1.zip", "2.zip", "3.zip"]]

    def _test_load_learnware_by_zippath(self, runnable_option):
        for learnware_id, zip_path in zip(self.learnware_ids, self.zip_paths):
            self.client.download_learnware(learnware_id, zip_path)

        learnware_list = self.client.load_learnware(learnware_path=self.zip_paths, runnable_option=runnable_option)
        reuser = AveragingReuser(learnware_list, mode="vote_by_label")
        input_array = np.random.random(size=(20, 13))
        print(reuser.predict(input_array))
        for learnware in learnware_list:
            print(learnware.id, learnware.predict(input_array))


    def _test_load_learnware_by_id(self, runnable_option):
        learnware_list = self.client.load_learnware(learnware_id=self.learnware_ids, runnable_option=runnable_option)
        reuser = AveragingReuser(learnware_list, mode="vote_by_label")
        input_array = np.random.random(size=(20, 13))
        print(reuser.predict(input_array))

        for learnware in learnware_list:
            print(learnware.id, learnware.predict(input_array))

    def test_load_learnware_by_zippath(self):
        for runnable_option in self.runnable_options:
            self._test_load_learnware_by_zippath(runnable_option=runnable_option)
    
    def test_load_learnware_by_id(self):
        for runnable_option in self.runnable_options:
            self._test_load_learnware_by_id(runnable_option=runnable_option)
            

 def suite():
    _suite = unittest.TestSuite()
    _suite.addTest(TestLearnwareLoad("test_load_learnware_by_zippath", mode="all"))
    _suite.addTest(TestLearnwareLoad("test_load_learnware_by_id", mode="all"))
    return _suite

 if __name__ == "__main__":
    runner = unittest.TextTestRunner()
    runner.run(suite())
--- a/tests/test_learnware_client/test_reuse.py
+++ b/tests/test_learnware_client/test_reuse.py
@@ -1,34 +0,0 @@
 import zipfile
 import numpy as np

 from learnware.learnware import get_learnware_from_dirpath
 from learnware.client.container import LearnwaresContainer
 from learnware.reuse import AveragingReuser
 from learnware.tests.module import get_semantic_specification

 if __name__ == "__main__":
    semantic_specification = get_semantic_specification()
    zip_paths = [
        "/home/bixd/workspace/learnware/Learnware/tests/test_learnware_client/rf_tic.zip",
        "/home/bixd/workspace/learnware/Learnware/tests/test_learnware_client/svc_tic.zip",
    ]
    dir_paths = [
        "/home/bixd/workspace/learnware/Learnware/tests/test_learnware_client/rf_tic",
        "/home/bixd/workspace/learnware/Learnware/tests/test_learnware_client/svc_tic",
    ]

    learnware_list = []
    for id, (zip_path, dir_path) in enumerate(zip(zip_paths, dir_paths)):
        with zipfile.ZipFile(zip_path, "r") as z_file:
            z_file.extractall(dir_path)

        learnware = get_learnware_from_dirpath(f"test_id{id}", semantic_specification, dir_path)
        learnware_list.append(learnware)

    with LearnwaresContainer(learnware_list) as env_container:
        learnware_list = env_container.get_learnwares_with_container()
        reuser = AveragingReuser(learnware_list, mode="vote")
        input_array = np.random.randint(0, 3, size=(20, 9))
        print(reuser.predict(input_array).argmax(axis=1))
        for id, ind_learner in enumerate(learnware_list):
            print(f"learner_{id}", ind_learner.predict(input_array).argmax(axis=1))
--- a/tests/test_learnware_client/test_upload.py
+++ b/tests/test_learnware_client/test_upload.py
@@ -4,13 +4,16 @@ import unittest
 import tempfile

 from learnware.client import LearnwareClient
 from learnware.specification import generate_semantic_spec


 class TestAllLearnware(unittest.TestCase):
    def setUp(self):
        unittest.TestCase.setUpClass()
        dir_path = os.path.dirname(__file__)
        config_path = os.path.join(dir_path, "config.json")
 class TestUpload(unittest.TestCase):
    client = LearnwareClient()

    @classmethod
    def setUpClass(cls) -> None:
        config_path = os.path.join(os.path.dirname(__file__), "config.json")

        if not os.path.exists(config_path):
            data = {"email": None, "token": None}
            with open(config_path, "w") as file:
@@ -18,52 +21,65 @@ class TestAllLearnware(unittest.TestCase):

        with open(config_path, "r") as file:
            data = json.load(file)
            email = data["email"]
            token = data["token"]
            email = data.get("email")
            token = data.get("token")

        if email is None or token is None:
            raise ValueError("Please set email and token in config.json.")
        self.client = LearnwareClient()
        self.client.login(email, token)
            print("Please set email and token in config.json.")
        else:
            cls.client.login(email, token)

    def _skip_test(self):
        if not self.client.is_login():
            print("Client does not login!")
            return True
        return False

    def test_upload(self):
        input_description = {
            "Dimension": 13,
            "Description": {"0": "age", "1": "weight", "2": "body length", "3": "animal type", "4": "claw length"},
        }
        output_description = {
            "Dimension": 1,
            "Description": {
                "0": "the probability of being a cat",
            },
        }
        semantic_spec = self.client.create_semantic_specification(
            name="learnware_example",
            description="Just an example for uploading a learnware",
            data_type="Table",
            task_type="Classification",
            library_type="Scikit-learn",
            scenarios=["Business", "Financial"],
            input_description=input_description,
            output_description=output_description,
        )
        assert isinstance(semantic_spec, dict)

        download_learnware_id = "00000084"
        with tempfile.TemporaryDirectory(prefix="learnware_") as tempdir:
            zip_path = os.path.join(tempdir, "test.zip")
            self.client.download_learnware(download_learnware_id, zip_path)
            learnware_id = self.client.upload_learnware(
                learnware_zip_path=zip_path, semantic_specification=semantic_spec
        if not self._skip_test():
            input_description = {
                "Dimension": 13,
                "Description": {"0": "age", "1": "weight", "2": "body length", "3": "animal type", "4": "claw length"},
            }
            output_description = {
                "Dimension": 2,
                "Description": {"0": "cat", "1": "not cat"},
            }
            semantic_spec = generate_semantic_spec(
                name="learnware_example",
                description="Just an example for uploading a learnware",
                data_type="Table",
                task_type="Classification",
                library_type="Scikit-learn",
                scenarios=["Business", "Financial"],
                license="MIT",
                input_description=input_description,
                output_description=output_description,
            )
            assert isinstance(semantic_spec, dict)

            download_learnware_id = "00000084"
            with tempfile.TemporaryDirectory(prefix="learnware_") as tempdir:
                zip_path = os.path.join(tempdir, "test.zip")
                self.client.download_learnware(download_learnware_id, zip_path)
                learnware_id = self.client.upload_learnware(
                    learnware_zip_path=zip_path, semantic_specification=semantic_spec
                )

                uploaded_ids = [learnware["learnware_id"] for learnware in self.client.list_learnware()]
                assert learnware_id in uploaded_ids

                self.client.delete_learnware(learnware_id)
                uploaded_ids = [learnware["learnware_id"] for learnware in self.client.list_learnware()]
                assert learnware_id not in uploaded_ids

            uploaded_ids = [learnware["learnware_id"] for learnware in self.client.list_learnware()]
            assert learnware_id in uploaded_ids

            self.client.delete_learnware(learnware_id)
            uploaded_ids = [learnware["learnware_id"] for learnware in self.client.list_learnware()]
            assert learnware_id not in uploaded_ids
 def suite():
    _suite = unittest.TestSuite()
    _suite.addTest(TestUpload("test_upload"))
    return _suite


 if __name__ == "__main__":
    unittest.main()
    runner = unittest.TextTestRunner()
    runner.run(suite())
--- a/tests/test_specification/test_hetero_spec.py
+++ b/tests/test_specification/test_hetero_spec.py
@@ -0,0 +1,43 @@
 import os
 import json
 import string
 import random
 import torch
 import unittest
 import tempfile
 import numpy as np

 from learnware.specification import RKMETableSpecification, HeteroMapTableSpecification
 from learnware.specification import generate_stat_spec
 from learnware.market.heterogeneous.organizer import HeteroMap

 class TestTableRKME(unittest.TestCase):
    
    def setUp(self):
        self.hetero_map = HeteroMap()
        
    def _test_hetero_spec(self, X):
        rkme: RKMETableSpecification = generate_stat_spec(type="table", X=X)
        hetero_spec = self.hetero_map.hetero_mapping(rkme_spec=rkme, features=dict())
        with tempfile.TemporaryDirectory(prefix="learnware_") as tempdir:
            rkme_path = os.path.join(tempdir, "rkme.json")
            hetero_spec.save(rkme_path)

            with open(rkme_path, "r") as f:
                data = json.load(f)
                assert data["type"] == "HeteroMapTableSpecification"

            rkme2 = HeteroMapTableSpecification()
            rkme2.load(rkme_path)
            assert rkme2.type == "HeteroMapTableSpecification"
        
            
    def test_hetero_rkme(self):
        self._test_hetero_spec(np.random.uniform(-10000, 10000, size=(5000, 200)))
        self._test_hetero_spec(np.random.uniform(-10000, 10000, size=(10000, 100)))
        self._test_hetero_spec(np.random.uniform(-10000, 10000, size=(5, 20)))
        self._test_hetero_spec(np.random.uniform(-10000, 10000, size=(1, 50)))
        self._test_hetero_spec(np.random.uniform(-10000, 10000, size=(100, 150)))
        
 if __name__ == "__main__":
    unittest.main()
--- a/tests/test_specification/test_image_rkme.py
+++ b/tests/test_specification/test_image_rkme.py
@@ -0,0 +1,38 @@
 import os
 import json
 import torch
 import unittest
 import tempfile
 import numpy as np

 from learnware.specification import RKMEImageSpecification
 from learnware.specification import generate_stat_spec


 class TestImageRKME(unittest.TestCase):
    @staticmethod
    def _test_image_rkme(X):
        image_rkme = generate_stat_spec(type="image", X=X, steps=10)

        with tempfile.TemporaryDirectory(prefix="learnware_") as tempdir:
            rkme_path = os.path.join(tempdir, "rkme.json")
            image_rkme.save(rkme_path)

            with open(rkme_path, "r") as f:
                data = json.load(f)
                assert data["type"] == "RKMEImageSpecification"

            rkme2 = RKMEImageSpecification()
            rkme2.load(rkme_path)
            assert rkme2.type == "RKMEImageSpecification"
                
    def test_image_rkme(self):
        self._test_image_rkme(np.random.randint(0, 255, size=(2000, 3, 32, 32)))
        self._test_image_rkme(np.random.randint(0, 255, size=(100, 1, 128, 128)))
        self._test_image_rkme(np.random.randint(0, 255, size=(50, 3, 128, 128)) / 255)
        self._test_image_rkme(torch.randint(0, 255, (2000, 3, 32, 32)))
        self._test_image_rkme(torch.randint(0, 255, (20, 3, 128, 128)))
        self._test_image_rkme(torch.randint(0, 255, (1, 1, 128, 128)) / 255)

 if __name__ == "__main__":
    unittest.main()
--- a/tests/test_specification/test_rkme.py
+++ b/tests/test_specification/test_rkme.py
@@ -1,104 +0,0 @@
 import os
 import json
 import string
 import random
 import torch
 import unittest
 import tempfile
 import numpy as np

 from learnware.specification import RKMETableSpecification, RKMEImageSpecification, RKMETextSpecification
 from learnware.specification import generate_stat_spec


 class TestRKME(unittest.TestCase):
    def test_rkme(self):
        def _test_table_rkme(X):
            rkme = generate_stat_spec(type="table", X=X)

            with tempfile.TemporaryDirectory(prefix="learnware_") as tempdir:
                rkme_path = os.path.join(tempdir, "rkme.json")
                rkme.save(rkme_path)

                with open(rkme_path, "r") as f:
                    data = json.load(f)
                    assert data["type"] == "RKMETableSpecification"

                rkme2 = RKMETableSpecification()
                rkme2.load(rkme_path)
                assert rkme2.type == "RKMETableSpecification"

        _test_table_rkme(np.random.uniform(-10000, 10000, size=(5000, 200)))
        _test_table_rkme(np.random.uniform(-10000, 10000, size=(10000, 100)))
        _test_table_rkme(np.random.uniform(-10000, 10000, size=(5, 20)))
        _test_table_rkme(np.random.uniform(-10000, 10000, size=(1, 50)))
        _test_table_rkme(np.random.uniform(-10000, 10000, size=(100, 150)))

    def test_image_rkme(self):
        def _test_image_rkme(X):
            image_rkme = generate_stat_spec(type="image", X=X, steps=10)

            with tempfile.TemporaryDirectory(prefix="learnware_") as tempdir:
                rkme_path = os.path.join(tempdir, "rkme.json")
                image_rkme.save(rkme_path)

                with open(rkme_path, "r") as f:
                    data = json.load(f)
                    assert data["type"] == "RKMEImageSpecification"

                rkme2 = RKMEImageSpecification()
                rkme2.load(rkme_path)
                assert rkme2.type == "RKMEImageSpecification"

        _test_image_rkme(np.random.randint(0, 255, size=(2000, 3, 32, 32)))
        _test_image_rkme(np.random.randint(0, 255, size=(100, 1, 128, 128)))
        _test_image_rkme(np.random.randint(0, 255, size=(50, 3, 128, 128)) / 255)

        _test_image_rkme(torch.randint(0, 255, (2000, 3, 32, 32)))
        _test_image_rkme(torch.randint(0, 255, (20, 3, 128, 128)))
        _test_image_rkme(torch.randint(0, 255, (1, 1, 128, 128)) / 255)

    def test_text_rkme(self):
        def generate_random_text_list(num, text_type="en", min_len=10, max_len=1000):
            text_list = []
            for i in range(num):
                length = random.randint(min_len, max_len)
                if text_type == "en":
                    characters = string.ascii_letters + string.digits + string.punctuation
                    result_str = "".join(random.choice(characters) for i in range(length))
                    text_list.append(result_str)
                elif text_type == "zh":
                    result_str = "".join(chr(random.randint(0x4E00, 0x9FFF)) for i in range(length))
                    text_list.append(result_str)
                else:
                    raise ValueError("Type should be en or zh")
            return text_list

        def _test_text_rkme(X):
            rkme = generate_stat_spec(type="text", X=X)

            with tempfile.TemporaryDirectory(prefix="learnware_") as tempdir:
                rkme_path = os.path.join(tempdir, "rkme.json")
                rkme.save(rkme_path)

                with open(rkme_path, "r") as f:
                    data = json.load(f)
                    assert data["type"] == "RKMETextSpecification"

                rkme2 = RKMETextSpecification()
                rkme2.load(rkme_path)
                assert rkme2.type == "RKMETextSpecification"

                return rkme2.get_z().shape[1]

        dim1 = _test_text_rkme(generate_random_text_list(3000, "en"))
        dim2 = _test_text_rkme(generate_random_text_list(100, "en"))
        dim3 = _test_text_rkme(generate_random_text_list(50, "zh"))
        dim4 = _test_text_rkme(generate_random_text_list(5000, "zh"))
        dim5 = _test_text_rkme(generate_random_text_list(1, "zh"))

        assert dim1 == dim2 and dim2 == dim3 and dim3 == dim4 and dim4 == dim5


 if __name__ == "__main__":
    unittest.main()
--- a/tests/test_specification/test_table_rkme.py
+++ b/tests/test_specification/test_table_rkme.py
@@ -0,0 +1,36 @@
 import os
 import json
 import unittest
 import tempfile
 import numpy as np

 from learnware.specification import RKMETableSpecification
 from learnware.specification import generate_stat_spec


 class TestTableRKME(unittest.TestCase):
    @staticmethod
    def _test_table_rkme(X):
        rkme = generate_stat_spec(type="table", X=X)

        with tempfile.TemporaryDirectory(prefix="learnware_") as tempdir:
            rkme_path = os.path.join(tempdir, "rkme.json")
            rkme.save(rkme_path)

            with open(rkme_path, "r") as f:
                data = json.load(f)
                assert data["type"] == "RKMETableSpecification"

            rkme2 = RKMETableSpecification()
            rkme2.load(rkme_path)
            assert rkme2.type == "RKMETableSpecification"
            
    def test_table_rkme(self):
        self._test_table_rkme(np.random.uniform(-10000, 10000, size=(5000, 200)))
        self._test_table_rkme(np.random.uniform(-10000, 10000, size=(10000, 100)))
        self._test_table_rkme(np.random.uniform(-10000, 10000, size=(5, 20)))
        self._test_table_rkme(np.random.uniform(-10000, 10000, size=(1, 50)))
        self._test_table_rkme(np.random.uniform(-10000, 10000, size=(100, 150)))

 if __name__ == "__main__":
    unittest.main()
--- a/tests/test_specification/test_text_rkme.py
+++ b/tests/test_specification/test_text_rkme.py
@@ -0,0 +1,58 @@
 import os
 import json
 import string
 import random
 import unittest
 import tempfile

 from learnware.specification import RKMETextSpecification
 from learnware.specification import generate_stat_spec


 class TestTextRKME(unittest.TestCase):
    @staticmethod
    def generate_random_text_list(num, text_type="en", min_len=10, max_len=1000):
        text_list = []
        for _ in range(num):
            length = random.randint(min_len, max_len)
            if text_type == "en":
                characters = string.ascii_letters + string.digits + string.punctuation
                result_str = "".join(random.choice(characters) for _ in range(length))
                text_list.append(result_str)
            elif text_type == "zh":
                result_str = "".join(chr(random.randint(0x4E00, 0x9FFF)) for _ in range(length))
                text_list.append(result_str)
            else:
                raise ValueError("Type should be en or zh")
        return text_list

    @staticmethod
    def _test_text_rkme(X):
        rkme = generate_stat_spec(type="text", X=X)

        with tempfile.TemporaryDirectory(prefix="learnware_") as tempdir:
            rkme_path = os.path.join(tempdir, "rkme.json")
            rkme.save(rkme_path)

            with open(rkme_path, "r") as f:
                data = json.load(f)
                assert data["type"] == "RKMETextSpecification"

            rkme2 = RKMETextSpecification()
            rkme2.load(rkme_path)
            assert rkme2.type == "RKMETextSpecification"

            return rkme2.get_z().shape[1]

    def test_text_rkme(self):
        dim1 = self._test_text_rkme(self.generate_random_text_list(3000, "en"))
        dim2 = self._test_text_rkme(self.generate_random_text_list(100, "en"))
        dim3 = self._test_text_rkme(self.generate_random_text_list(50, "zh"))
        dim4 = self._test_text_rkme(self.generate_random_text_list(5000, "zh"))
        dim5 = self._test_text_rkme(self.generate_random_text_list(1, "zh"))

        assert dim1 == dim2 and dim2 == dim3 and dim3 == dim4 and dim4 == dim5


 if __name__ == "__main__":
    unittest.main()
--- a/tests/test_hetero_market/example_learnwares/config.py
+++ b/tests/test_hetero_market/example_learnwares/config.py
--- a/tests/test_workflow/learnware_example/README.md
+++ b/tests/test_workflow/learnware_example/README.md
@@ -1,10 +0,0 @@
 ## How to Generate Environment Yaml

 * Create the environment config for conda:
 ```shell
 conda env export | grep -v "^prefix: " > environment.yml
 ```
 * Recreate the environment from the config:
 ```shell
 conda env create -f environment.yml
 ```
--- a/tests/test_workflow/learnware_example/environment.yaml
+++ b/tests/test_workflow/learnware_example/environment.yaml
@@ -1,27 +0,0 @@
 name: learnware_example_env
 channels:
  - defaults
 dependencies:
  - _libgcc_mutex=0.1=main
  - _openmp_mutex=5.1=1_gnu
  - ca-certificates=2023.01.10=h06a4308_0
  - ld_impl_linux-64=2.38=h1181459_1
  - libffi=3.4.2=h6a678d5_6
  - libgcc-ng=11.2.0=h1234567_1
  - libgomp=11.2.0=h1234567_1
  - libstdcxx-ng=11.2.0=h1234567_1
  - ncurses=6.4=h6a678d5_0
  - openssl=1.1.1t=h7f8727e_0
  - pip=23.0.1=py38h06a4308_0
  - python=3.8.16=h7a1cb2a_3
  - readline=8.2=h5eee18b_0
  - setuptools=66.0.0=py38h06a4308_0
  - sqlite=3.41.2=h5eee18b_0
  - tk=8.6.12=h1ccaba5_0
  - wheel=0.38.4=py38h06a4308_0
  - xz=5.2.10=h5eee18b_1
  - zlib=1.2.13=h5eee18b_0
  - pip:
    - joblib==1.2.0
    - learnware==0.0.1.99
    - numpy==1.19.5
--- a/tests/test_workflow/learnware_example/example.yaml
+++ b/tests/test_workflow/learnware_example/example.yaml
@@ -1,8 +0,0 @@
 model:
  class_name: SVM
  kwargs: {}
 stat_specifications:
  - module_path: learnware.specification
    class_name: RKMETableSpecification
    file_name: svm.json
    kwargs: {}  
--- a/tests/test_workflow/learnware_example/example_init.py
+++ b/tests/test_workflow/learnware_example/example_init.py
@@ -1,20 +0,0 @@
 import os
 import joblib
 import numpy as np
 from learnware.model import BaseModel


 class SVM(BaseModel):
    def __init__(self):
        super(SVM, self).__init__(input_shape=(64,), output_shape=(10,))
        dir_path = os.path.dirname(os.path.abspath(__file__))
        self.model = joblib.load(os.path.join(dir_path, "svm.pkl"))

    def fit(self, X: np.ndarray, y: np.ndarray):
        pass

    def predict(self, X: np.ndarray) -> np.ndarray:
        return self.model.predict_proba(X)

    def finetune(self, X: np.ndarray, y: np.ndarray):
        pass
--- a/tests/test_workflow/test_hetero_workflow.py
+++ b/tests/test_workflow/test_hetero_workflow.py
@@ -0,0 +1,321 @@
 import torch
 import pickle
 import unittest
 import os
 import logging
 import tempfile
 import zipfile
 from sklearn.linear_model import Ridge
 from sklearn.datasets import make_regression
 from shutil import copyfile, rmtree
 from sklearn.metrics import mean_squared_error

 import learnware
 learnware.init(logging_level=logging.WARNING)

 from learnware.market import instantiate_learnware_market, BaseUserInfo
 from learnware.specification import RKMETableSpecification, generate_rkme_table_spec, generate_semantic_spec
 from learnware.reuse import HeteroMapAlignLearnware, AveragingReuser, EnsemblePruningReuser
 from learnware.tests.templates import LearnwareTemplate, PickleModelTemplate, StatSpecTemplate

 from hetero_config import input_shape_list, input_description_list, output_description_list, user_description_list


 curr_root = os.path.dirname(os.path.abspath(__file__))

 class TestHeteroWorkflow(unittest.TestCase):
    universal_semantic_config = {
        "data_type": "Table",
        "task_type": "Regression",
        "library_type": "Scikit-learn",
        "scenarios": "Education",
        "license": "MIT",
    }

    def _init_learnware_market(self, organizer_kwargs=None):
        """initialize learnware market"""
        hetero_market = instantiate_learnware_market(
            market_id="hetero_toy", name="hetero", rebuild=True, organizer_kwargs=organizer_kwargs
        )
        return hetero_market

    def test_prepare_learnware_randomly(self, learnware_num=5):
        self.zip_path_list = []

        for i in range(learnware_num):
            learnware_pool_dirpath = os.path.join(curr_root, "learnware_pool_hetero")
            os.makedirs(learnware_pool_dirpath, exist_ok=True)
            learnware_zippath = os.path.join(learnware_pool_dirpath, "ridge_%d.zip" % (i))
            
            print("Preparing Learnware: %d" % (i))

            X, y = make_regression(n_samples=5000, n_informative=15, n_features=input_shape_list[i % 2], noise=0.1, random_state=42)
            clf = Ridge(alpha=1.0)
            clf.fit(X, y)
            pickle_filepath = os.path.join(learnware_pool_dirpath, "ridge.pkl")
            with open(pickle_filepath, "wb") as fout:
                pickle.dump(clf, fout)

            spec = generate_rkme_table_spec(X=X, gamma=0.1)
            spec_filepath = os.path.join(learnware_pool_dirpath, "stat_spec.json")
            spec.save(spec_filepath)

            LearnwareTemplate.generate_learnware_zipfile(
                learnware_zippath=learnware_zippath,
                model_template=PickleModelTemplate(pickle_filepath=pickle_filepath, model_kwargs={"input_shape":(input_shape_list[i % 2],), "output_shape": (1,)}),
                stat_spec_template=StatSpecTemplate(filepath=spec_filepath, type="RKMETableSpecification"),
                requirements=["scikit-learn==0.22"],
            )
            
            self.zip_path_list.append(learnware_zippath)

    
    def _upload_delete_learnware(self, hetero_market, learnware_num, delete):
        self.test_prepare_learnware_randomly(learnware_num)
        self.learnware_num = learnware_num

        print("Total Item:", len(hetero_market))
        assert len(hetero_market) == 0, "The market should be empty!"

        for idx, zip_path in enumerate(self.zip_path_list):
            semantic_spec = generate_semantic_spec(
                name=f"learnware_{idx}",
                description=f"test_learnware_number_{idx}",
                input_description=input_description_list[idx % 2],
                output_description=output_description_list[idx % 2],
                **self.universal_semantic_config
            )
            hetero_market.add_learnware(zip_path, semantic_spec)

        print("Total Item:", len(hetero_market))
        assert len(hetero_market) == self.learnware_num, f"The number of learnwares must be {self.learnware_num}!"
        curr_inds = hetero_market.get_learnware_ids()
        print("Available ids After Uploading Learnwares:", curr_inds)
        assert len(curr_inds) == self.learnware_num, f"The number of learnwares must be {self.learnware_num}!"

        if delete:
            for learnware_id in curr_inds:
                hetero_market.delete_learnware(learnware_id)
                self.learnware_num -= 1
                assert (
                    len(hetero_market) == self.learnware_num
                ), f"The number of learnwares must be {self.learnware_num}!"

            curr_inds = hetero_market.get_learnware_ids()
            print("Available ids After Deleting Learnwares:", curr_inds)
            assert len(curr_inds) == 0, "The market should be empty!"

        return hetero_market
    
    def test_upload_delete_learnware(self, learnware_num=5, delete=True):
        hetero_market = self._init_learnware_market()
        return self._upload_delete_learnware(hetero_market, learnware_num, delete)

    def test_train_market_model(self, learnware_num=5, delete=False):
        hetero_market = self._init_learnware_market(
            organizer_kwargs={"auto_update": True, "auto_update_limit": learnware_num}
        )
        hetero_market = self._upload_delete_learnware(hetero_market, learnware_num, delete)
        # organizer=hetero_market.learnware_organizer
        # organizer.train(hetero_market.learnware_organizer.learnware_list.values())
        return hetero_market

    def test_search_semantics(self, learnware_num=5):
        hetero_market = self.test_upload_delete_learnware(learnware_num, delete=False)
        print("Total Item:", len(hetero_market))
        assert len(hetero_market) == self.learnware_num, f"The number of learnwares must be {self.learnware_num}!"

        semantic_spec = generate_semantic_spec(
            name=f"learnware_{learnware_num - 1}",
            **self.universal_semantic_config,
        )
        
        user_info = BaseUserInfo(semantic_spec=semantic_spec)
        search_result = hetero_market.search_learnware(user_info)
        single_result = search_result.get_single_results()

        print("Search result1:")
        assert len(single_result) == 1, "Exact semantic search failed!"
        for search_item in single_result:
            semantic_spec1 = search_item.learnware.get_specification().get_semantic_spec()
            print("Choose learnware:", search_item.learnware.id)
            assert semantic_spec1["Name"]["Values"] == semantic_spec["Name"]["Values"], "Exact semantic search failed!"

        semantic_spec["Name"]["Values"] = "laernwaer"
        user_info = BaseUserInfo(semantic_spec=semantic_spec)
        search_result = hetero_market.search_learnware(user_info)
        single_result = search_result.get_single_results()

        print("Search result2:")
        assert len(single_result) == self.learnware_num, "Fuzzy semantic search failed!"
        for search_item in single_result:
            print("Choose learnware:", search_item.learnware.id)

    def test_hetero_stat_search(self, learnware_num=5):
        hetero_market = self.test_train_market_model(learnware_num, delete=False)
        print("Total Item:", len(hetero_market))
        
        user_dim = 15

        with tempfile.TemporaryDirectory(prefix="learnware_test_hetero") as test_folder:
            for idx, zip_path in enumerate(self.zip_path_list):
                with zipfile.ZipFile(zip_path, "r") as zip_obj:
                    zip_obj.extractall(path=test_folder)

                user_spec = RKMETableSpecification()
                user_spec.load(os.path.join(test_folder, "stat_spec.json"))
                z = user_spec.get_z()
                z = z[:, :user_dim]
                device = user_spec.device
                z = torch.tensor(z, device=device)
                user_spec.z = z

                print(">> normal case test:")
                semantic_spec = generate_semantic_spec(
                    input_description={
                        "Dimension": user_dim,
                        "Description": {str(key): input_description_list[idx % 2]["Description"][str(key)] for key in range(user_dim)},
                    },
                    **self.universal_semantic_config,
                )
                user_info = BaseUserInfo(semantic_spec=semantic_spec, stat_info={"RKMETableSpecification": user_spec})
                search_result = hetero_market.search_learnware(user_info)
                single_result = search_result.get_single_results()
                multiple_result = search_result.get_multiple_results()
                
                print(f"search result of user{idx}:")
                for single_item in single_result:
                    print(f"score: {single_item.score}, learnware_id: {single_item.learnware.id}")

                for multiple_item in multiple_result:
                    print(
                        f"mixture_score: {multiple_item.score}, mixture_learnware_ids: {[item.id for item in multiple_item.learnwares]}"
                    )

                # improper key "Task" in semantic_spec, use homo search and print invalid semantic_spec
                print(">> test for key 'Task' with an unsupported value:")
                semantic_spec["Task"] = {"Values": ["Segmentation"], "Type": "Class"}
                user_info = BaseUserInfo(semantic_spec=semantic_spec, stat_info={"RKMETableSpecification": user_spec})
                search_result = hetero_market.search_learnware(user_info)
                single_result = search_result.get_single_results()

                assert len(single_result) == 0, "Statistical search failed!"

                # delete key "Task" in semantic_spec, use homo search and print WARNING INFO with "User doesn't provide correct task type"
                print(">> delete key 'Task' test:")
                semantic_spec.pop("Task")
                user_info = BaseUserInfo(semantic_spec=semantic_spec, stat_info={"RKMETableSpecification": user_spec})
                search_result = hetero_market.search_learnware(user_info)
                single_result = search_result.get_single_results()

                assert len(single_result) == 0, "Statistical search failed!"

                # modify semantic info with mismatch dim, use homo search and print "User data feature dimensions mismatch with semantic specification."
                print(">> mismatch dim test")
                semantic_spec = generate_semantic_spec(
                    input_description={
                        "Dimension": user_dim - 2,
                        "Description": {str(key): input_description_list[idx % 2]["Description"][str(key)] for key in range(user_dim)},
                    },
                    **self.universal_semantic_config,
                )
                user_info = BaseUserInfo(semantic_spec=semantic_spec, stat_info={"RKMETableSpecification": user_spec})
                search_result = hetero_market.search_learnware(user_info)
                single_result = search_result.get_single_results()

                assert len(single_result) == 0, "Statistical search failed!"

    def test_homo_stat_search(self, learnware_num=5):
        hetero_market = self.test_train_market_model(learnware_num, delete=False)
        print("Total Item:", len(hetero_market))
        
        with tempfile.TemporaryDirectory(prefix="learnware_test_hetero") as test_folder:
            for idx, zip_path in enumerate(self.zip_path_list):
                with zipfile.ZipFile(zip_path, "r") as zip_obj:
                    zip_obj.extractall(path=test_folder)

                user_spec = RKMETableSpecification()
                user_spec.load(os.path.join(test_folder, "stat_spec.json"))
                user_semantic = generate_semantic_spec(**self.universal_semantic_config)
                user_info = BaseUserInfo(semantic_spec=user_semantic, stat_info={"RKMETableSpecification": user_spec})
                search_result = hetero_market.search_learnware(user_info)
                single_result = search_result.get_single_results()
                multiple_result = search_result.get_multiple_results()

                assert len(single_result) >= 1, "Statistical search failed!"
                print(f"search result of user{idx}:")
                for single_item in single_result:
                    print(f"score: {single_item.score}, learnware_id: {single_item.learnware.id}")

                for multiple_item in multiple_result:
                    print(f"mixture_score: {multiple_item.score}\n")
                    mixture_id = " ".join([learnware.id for learnware in multiple_item.learnwares])
                    print(f"mixture_learnware: {mixture_id}\n")

    def test_model_reuse(self, learnware_num=5):
        # generate toy regression problem
        X, y = make_regression(n_samples=5000, n_informative=10, n_features=15, noise=0.1, random_state=0)

        # generate rkme
        user_spec = generate_rkme_table_spec(X=X, gamma=0.1, cuda_idx=0)

        # generate specification
        semantic_spec = generate_semantic_spec(input_description=user_description_list[0], **self.universal_semantic_config)
        user_info = BaseUserInfo(semantic_spec=semantic_spec, stat_info={"RKMETableSpecification": user_spec})

        # learnware market search
        hetero_market = self.test_train_market_model(learnware_num, delete=False)
        search_result = hetero_market.search_learnware(user_info)
        single_result = search_result.get_single_results()
        multiple_result = search_result.get_multiple_results()
        
        # print search results
        for single_item in single_result:
            print(f"score: {single_item.score}, learnware_id: {single_item.learnware.id}")

        for multiple_item in multiple_result:
            print(
                f"mixture_score: {multiple_item.score}, mixture_learnware_ids: {[item.id for item in multiple_item.learnwares]}"
            )

        # single model reuse
        hetero_learnware = HeteroMapAlignLearnware(single_result[0].learnware, mode="regression")
        hetero_learnware.align(user_spec, X[:100], y[:100])
        single_predict_y = hetero_learnware.predict(X)

        # multi model reuse
        hetero_learnware_list = []
        for learnware in multiple_result[0].learnwares:
            hetero_learnware = HeteroMapAlignLearnware(learnware, mode="regression")
            hetero_learnware.align(user_spec, X[:100], y[:100])
            hetero_learnware_list.append(hetero_learnware)

        # Use averaging ensemble reuser to reuse the searched learnwares to make prediction
        reuse_ensemble = AveragingReuser(learnware_list=hetero_learnware_list, mode="mean")
        ensemble_predict_y = reuse_ensemble.predict(user_data=X)

        # Use ensemble pruning reuser to reuse the searched learnwares to make prediction
        reuse_ensemble = EnsemblePruningReuser(learnware_list=hetero_learnware_list, mode="regression")
        reuse_ensemble.fit(X[:100], y[:100])
        ensemble_pruning_predict_y = reuse_ensemble.predict(user_data=X)

        print("Single model RMSE by finetune:", mean_squared_error(y, single_predict_y, squared=False))
        print("Averaging Reuser RMSE:", mean_squared_error(y, ensemble_predict_y, squared=False))
        print("Ensemble Pruning Reuser RMSE:", mean_squared_error(y, ensemble_pruning_predict_y, squared=False))


 def suite():
    _suite = unittest.TestSuite()
    #_suite.addTest(TestHeteroWorkflow("test_prepare_learnware_randomly"))
    #_suite.addTest(TestHeteroWorkflow("test_upload_delete_learnware"))
    #_suite.addTest(TestHeteroWorkflow("test_train_market_model"))
    _suite.addTest(TestHeteroWorkflow("test_search_semantics"))
    _suite.addTest(TestHeteroWorkflow("test_hetero_stat_search"))
    _suite.addTest(TestHeteroWorkflow("test_homo_stat_search"))
    _suite.addTest(TestHeteroWorkflow("test_model_reuse"))
    return _suite


 if __name__ == "__main__":
    runner = unittest.TextTestRunner(verbosity=2)
    runner.run(suite())
--- a/tests/test_workflow/test_workflow.py
+++ b/tests/test_workflow/test_workflow.py
@@ -1,37 +1,34 @@
 import sys
 import unittest
 import os
 import copy
 import joblib
 import logging
 import tempfile
 import pickle
 import zipfile
 import numpy as np
 from sklearn import svm
 from sklearn.datasets import load_digits
 from sklearn.model_selection import train_test_split
 from shutil import copyfile, rmtree

 import learnware
 learnware.init(logging_level=logging.WARNING)

 from learnware.market import instantiate_learnware_market, BaseUserInfo
 from learnware.specification import RKMETableSpecification, generate_rkme_table_spec
 from learnware.specification import RKMETableSpecification, generate_rkme_table_spec, generate_semantic_spec
 from learnware.reuse import JobSelectorReuser, AveragingReuser, EnsemblePruningReuser, FeatureAugmentReuser
 from learnware.tests.templates import LearnwareTemplate, PickleModelTemplate, StatSpecTemplate

 curr_root = os.path.dirname(os.path.abspath(__file__))

 user_semantic = {
    "Data": {"Values": ["Table"], "Type": "Class"},
    "Task": {
        "Values": ["Classification"],
        "Type": "Class",
    },
    "Library": {"Values": ["Scikit-learn"], "Type": "Class"},
    "Scenario": {"Values": ["Education"], "Type": "Tag"},
    "Description": {"Values": "", "Type": "String"},
    "Name": {"Values": "", "Type": "String"},
    "License": {"Values": ["MIT"], "Type": "Class"},
 }


 class TestWorkflow(unittest.TestCase):
    
    universal_semantic_config = {
        "data_type": "Table",
        "task_type": "Classification",
        "library_type": "Scikit-learn",
        "scenarios": "Education",
        "license": "MIT",
    }
    
    def _init_learnware_market(self):
        """initialize learnware market"""
        easy_market = instantiate_learnware_market(market_id="sklearn_digits_easy", name="easy", rebuild=True)
@@ -42,45 +39,30 @@ class TestWorkflow(unittest.TestCase):
        X, y = load_digits(return_X_y=True)

        for i in range(learnware_num):
            dir_path = os.path.join(curr_root, "learnware_pool", "svm_%d" % (i))
            os.makedirs(dir_path, exist_ok=True)

            learnware_pool_dirpath = os.path.join(curr_root, "learnware_pool")
            os.makedirs(learnware_pool_dirpath, exist_ok=True)
            learnware_zippath = os.path.join(learnware_pool_dirpath, "svm_%d.zip" % (i))
            
            print("Preparing Learnware: %d" % (i))

            data_X, _, data_y, _ = train_test_split(X, y, test_size=0.3, shuffle=True)
            clf = svm.SVC(kernel="linear", probability=True)
            clf.fit(data_X, data_y)

            joblib.dump(clf, os.path.join(dir_path, "svm.pkl"))
            pickle_filepath = os.path.join(learnware_pool_dirpath, "model.pkl")
            with open(pickle_filepath, "wb") as fout:
                pickle.dump(clf, fout)

            spec = generate_rkme_table_spec(X=data_X, gamma=0.1, cuda_idx=0)
            spec.save(os.path.join(dir_path, "svm.json"))

            init_file = os.path.join(dir_path, "__init__.py")
            copyfile(
                os.path.join(curr_root, "learnware_example/example_init.py"), init_file
            )  # cp example_init.py init_file

            yaml_file = os.path.join(dir_path, "learnware.yaml")
            copyfile(os.path.join(curr_root, "learnware_example/example.yaml"), yaml_file)  # cp example.yaml yaml_file

            env_file = os.path.join(dir_path, "environment.yaml")
            copyfile(os.path.join(curr_root, "learnware_example/environment.yaml"), env_file)

            zip_file = dir_path + ".zip"
            # zip -q -r -j zip_file dir_path
            with zipfile.ZipFile(zip_file, "w") as zip_obj:
                for foldername, subfolders, filenames in os.walk(dir_path):
                    for filename in filenames:
                        file_path = os.path.join(foldername, filename)
                        zip_info = zipfile.ZipInfo(filename)
                        zip_info.compress_type = zipfile.ZIP_STORED
                        with open(file_path, "rb") as file:
                            zip_obj.writestr(zip_info, file.read())

            rmtree(dir_path)  # rm -r dir_path

            self.zip_path_list.append(zip_file)
            spec_filepath = os.path.join(learnware_pool_dirpath, "stat_spec.json")
            spec.save(spec_filepath)
            
            LearnwareTemplate.generate_learnware_zipfile(
                learnware_zippath=learnware_zippath,
                model_template=PickleModelTemplate(pickle_filepath=pickle_filepath, model_kwargs={"input_shape":(64,), "output_shape": (10,), "predict_method": "predict_proba"}),
                stat_spec_template=StatSpecTemplate(filepath=spec_filepath, type="RKMETableSpecification"),
                requirements=["scikit-learn==0.22"],
            )
           
            self.zip_path_list.append(learnware_zippath)

    def test_upload_delete_learnware(self, learnware_num=5, delete=True):
        easy_market = self._init_learnware_market()
@@ -91,20 +73,22 @@ class TestWorkflow(unittest.TestCase):
        assert len(easy_market) == 0, f"The market should be empty!"

        for idx, zip_path in enumerate(self.zip_path_list):
            semantic_spec = copy.deepcopy(user_semantic)
            semantic_spec["Name"]["Values"] = "learnware_%d" % (idx)
            semantic_spec["Description"]["Values"] = "test_learnware_number_%d" % (idx)
            semantic_spec["Input"] = {
                "Dimension": 64,
                "Description": {
                    f"{i}": f"The value in the grid {i // 8}{i % 8} of the image of hand-written digit."
                    for i in range(64)
            semantic_spec = generate_semantic_spec(
                name=f"learnware_{idx}",
                description=f"test_learnware_number_{idx}",
                input_description={
                    "Dimension": 64,
                    "Description": {
                        f"{i}": f"The value in the grid {i // 8}{i % 8} of the image of hand-written digit."
                        for i in range(64)
                    },
                },
                output_description={
                    "Dimension": 10,
                    "Description": {f"{i}": "The probability for each digit for 0 to 9." for i in range(10)},
                },
            }
            semantic_spec["Output"] = {
                "Dimension": 10,
                "Description": {f"{i}": "The probability for each digit for 0 to 9." for i in range(10)},
            }
                **self.universal_semantic_config
            )
            easy_market.add_learnware(zip_path, semantic_spec)

        print("Total Item:", len(easy_market))
@@ -129,70 +113,52 @@ class TestWorkflow(unittest.TestCase):
        easy_market = self.test_upload_delete_learnware(learnware_num, delete=False)
        print("Total Item:", len(easy_market))
        assert len(easy_market) == self.learnware_num, f"The number of learnwares must be {self.learnware_num}!"
        test_folder = os.path.join(curr_root, "test_semantics")

        # unzip -o -q zip_path -d unzip_dir
        if os.path.exists(test_folder):
            rmtree(test_folder)
        os.makedirs(test_folder, exist_ok=True)

        with zipfile.ZipFile(self.zip_path_list[0], "r") as zip_obj:
            zip_obj.extractall(path=test_folder)

        semantic_spec = copy.deepcopy(user_semantic)
        semantic_spec["Name"]["Values"] = f"learnware_{learnware_num - 1}"
        semantic_spec["Description"]["Values"] = f"test_learnware_number_{learnware_num - 1}"

        user_info = BaseUserInfo(semantic_spec=semantic_spec)
        search_result = easy_market.search_learnware(user_info)
        single_result = search_result.get_single_results()

        print("User info:", user_info.get_semantic_spec())
        print(f"Search result:")
        for search_item in single_result:
            print(
                "Choose learnware:",
                search_item.learnware.id,
                search_item.learnware.get_specification().get_semantic_spec(),
        
        with tempfile.TemporaryDirectory(prefix="learnware_test_workflow") as test_folder:
            with zipfile.ZipFile(self.zip_path_list[0], "r") as zip_obj:
                zip_obj.extractall(path=test_folder)

            semantic_spec = generate_semantic_spec(
                name=f"learnware_{learnware_num - 1}",
                description=f"test_learnware_number_{learnware_num - 1}",
                **self.universal_semantic_config,
            )
            
            user_info = BaseUserInfo(semantic_spec=semantic_spec)
            search_result = easy_market.search_learnware(user_info)
            single_result = search_result.get_single_results()

        rmtree(test_folder)  # rm -r test_folder

            print("Search result:")
            for search_item in single_result:
                print("Choose learnware:", search_item.learnware.id)
      
    def test_stat_search(self, learnware_num=5):
        easy_market = self.test_upload_delete_learnware(learnware_num, delete=False)
        print("Total Item:", len(easy_market))

        test_folder = os.path.join(curr_root, "test_stat")
        with tempfile.TemporaryDirectory(prefix="learnware_test_workflow") as test_folder:
            for idx, zip_path in enumerate(self.zip_path_list):
                with zipfile.ZipFile(zip_path, "r") as zip_obj:
                    zip_obj.extractall(path=test_folder)

        for idx, zip_path in enumerate(self.zip_path_list):
            unzip_dir = os.path.join(test_folder, f"{idx}")

            # unzip -o -q zip_path -d unzip_dir
            if os.path.exists(unzip_dir):
                rmtree(unzip_dir)
            os.makedirs(unzip_dir, exist_ok=True)
            with zipfile.ZipFile(zip_path, "r") as zip_obj:
                zip_obj.extractall(path=unzip_dir)

            user_spec = RKMETableSpecification()
            user_spec.load(os.path.join(unzip_dir, "svm.json"))
            user_info = BaseUserInfo(semantic_spec=user_semantic, stat_info={"RKMETableSpecification": user_spec})
            search_results = easy_market.search_learnware(user_info)
                user_spec = RKMETableSpecification()
                user_spec.load(os.path.join(test_folder, "stat_spec.json"))
                user_semantic = generate_semantic_spec(**self.universal_semantic_config)
                user_info = BaseUserInfo(semantic_spec=user_semantic, stat_info={"RKMETableSpecification": user_spec})
                search_results = easy_market.search_learnware(user_info)

            single_result = search_results.get_single_results()
            multiple_result = search_results.get_multiple_results()

            assert len(single_result) >= 1, f"Statistical search failed!"
            print(f"search result of user{idx}:")
            for search_item in single_result:
                print(f"score: {search_item.score}, learnware_id: {search_item.learnware.id}")
                single_result = search_results.get_single_results()
                multiple_result = search_results.get_multiple_results()

            for mixture_item in multiple_result:
                print(f"mixture_score: {mixture_item.score}\n")
                mixture_id = " ".join([learnware.id for learnware in mixture_item.learnwares])
                print(f"mixture_learnware: {mixture_id}\n")
                assert len(single_result) >= 1, "Statistical search failed!"
                print(f"search result of user{idx}:")
                for search_item in single_result:
                    print(f"score: {search_item.score}, learnware_id: {search_item.learnware.id}")

        rmtree(test_folder)  # rm -r test_folder
                for mixture_item in multiple_result:
                    print(f"mixture_score: {mixture_item.score}\n")
                    mixture_id = " ".join([learnware.id for learnware in mixture_item.learnwares])
                    print(f"mixture_learnware: {mixture_id}\n")

    def test_learnware_reuse(self, learnware_num=5):
        easy_market = self.test_upload_delete_learnware(learnware_num, delete=False)
@@ -202,6 +168,7 @@ class TestWorkflow(unittest.TestCase):
        train_X, data_X, train_y, data_y = train_test_split(X, y, test_size=0.3, shuffle=True)

        stat_spec = generate_rkme_table_spec(X=data_X, gamma=0.1, cuda_idx=0)
        user_semantic = generate_semantic_spec(**self.universal_semantic_config)
        user_info = BaseUserInfo(semantic_spec=user_semantic, stat_info={"RKMETableSpecification": stat_spec})

        search_results = easy_market.search_learnware(user_info)
@@ -243,5 +210,5 @@ def suite():


 if __name__ == "__main__":
    runner = unittest.TextTestRunner()
    runner = unittest.TextTestRunner(verbosity=2)
    runner.run(suite())