[DOC, MNT] update text example name, and add README for text example

2 years ago · 7b02f4667a
--- a/docs/start/exp.rst
+++ b/docs/start/exp.rst
@@ -133,13 +133,13 @@ Model training comprised two parts: the first part involved training a tfidf fea

 Our experiments comprise two components:

 * ``test_unlabeled`` is designed to evaluate performance when users possess only testing data, searching and reusing learnware available in the market.
 * ``test_labeled`` aims to assess performance when users have both testing and limited training data, searching and reusing learnware directly from the market instead of training a model from scratch. This helps determine the amount of training data saved for the user.
 * ``unlabeled_text_example`` is designed to evaluate performance when users possess only testing data, searching and reusing learnware available in the market.
 * ``labeled_text_example`` aims to assess performance when users have both testing and limited training data, searching and reusing learnware directly from the market instead of training a model from scratch. This helps determine the amount of training data saved for the user.

 Results
 ----------------

 * ``test_unlabeled``:
 * ``unlabeled_text_example``:

 The accuracy of search and reuse is presented in the table below:

@@ -149,7 +149,7 @@ The accuracy of search and reuse is presented in the table below:
  0.859 +/- 0.051          0.844 +/- 0.053                    0.858 +/- 0.051
 ==================== ================================= =================================

 * ``test_labeled``:
 * ``labeled_text_example``:

 We present the classification error rate curves for both the user's self-trained model and multiple learnware reuse (EnsemblePrune), showing their performance on the user's test data as the user's training data increases. The average results across 10 users are depicted below:

@@ -192,6 +192,6 @@ Examples for the `20-newsgroup` dataset are available at [examples/dataset_text_
 We utilize the `fire` module to construct our experiments. You can execute the experiment with the following commands:

 * `python main.py prepare_market`: Prepares the market.
 * `python main.py test_unlabeled`: Executes the test_unlabeled experiment; the results will be printed in the terminal.
 * `python main.py test_labeled`: Executes the test_labeled experiment; result curves will be automatically saved in the `figs` directory.
 * Additionally, you can use `python main.py test_unlabeled True` to combine steps 1 and 2. The same approach applies to running test_labeled directly.
 * `python main.py unlabeled_text_example`: Executes the unlabeled_text_example experiment; the results will be printed in the terminal.
 * `python main.py labeled_text_example`: Executes the labeled_text_example experiment; result curves will be automatically saved in the `figs` directory.
 * Additionally, you can use `python main.py unlabeled_text_example True` to combine steps 1 and 2. The same approach applies to running labeled_text_example directly.
--- a/examples/dataset_text_workflow/README.md
+++ b/examples/dataset_text_workflow/README.md
@@ -0,0 +1,55 @@
 # Text Dataset Workflow Example

 ## Introduction

 We conducted experiments on the widely used text benchmark dataset [20-newsgroup](http://qwone.com/~jason/20Newsgroups/).
 20-newsgroup is a renowned text classification benchmark with a hierarchical structure, featuring 5 superclasses {comp, rec, sci, talk, misc}.

 In the submitting stage, we enumerated all combinations of three superclasses from the five available, randomly sampling 50% of each combination from the training set to create datasets for 50 uploaders.

 In the deploying stage, we considered all combinations of two superclasses out of the five, selecting all data for each combination from the testing set as a test dataset for one user. This resulted in 10 users.
 The user's own training data was generated with the same sampling procedure as the user's test data, but drawn from the training set instead of the testing set.
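
The superclass combinations described above can be enumerated with a few lines of Python. This is only a minimal sketch: the variable names are illustrative and are not taken from `workflow.py`, and it does not cover how the combinations are sampled into individual uploader datasets.

```python
from itertools import combinations

# The five 20-newsgroup superclasses named in this README.
superclasses = ["comp", "rec", "sci", "talk", "misc"]

# Submitting stage: every combination of three superclasses.
uploader_combos = list(combinations(superclasses, 3))

# Deploying stage: every combination of two superclasses, one per user.
user_combos = list(combinations(superclasses, 2))

print(len(uploader_combos), "three-superclass combinations")
print(len(user_combos), "two-superclass combinations (users)")
```

Both enumerations contain 10 combinations, which matches the 10 users mentioned above.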

 Model training comprised two parts: the first part involved training a tfidf feature extractor, and the second part used the extracted text feature vectors to train a naive Bayes classifier.
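
As a rough illustration of this two-part training, the sketch below chains a tf-idf feature extractor with a naive Bayes classifier using scikit-learn. The choice of `TfidfVectorizer` and `MultinomialNB`, the default hyperparameters, and the toy documents are all assumptions made for this example; the actual learnware models may be configured differently.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy documents standing in for 20-newsgroup posts (hypothetical data).
train_texts = [
    "gpu driver crashes after windows update",
    "install linux on an old laptop",
    "hockey playoffs start next week",
    "the baseball team won the game",
]
train_labels = ["comp", "comp", "rec", "rec"]

# Part 1: tf-idf feature extraction; part 2: naive Bayes on the extracted vectors.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)

predictions = model.predict(["new linux gpu driver", "hockey game tonight"])
print(list(predictions))
```

Wrapping both steps in one pipeline keeps the fitted vocabulary and the classifier together, so the same object can be applied directly to raw text at prediction time.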

 Our experiments comprise two components:

 * ``unlabeled_text_example`` is designed to evaluate performance when users possess only testing data, searching and reusing learnware available in the market.

 * ``labeled_text_example`` aims to assess performance when users have both testing and limited training data, searching and reusing learnware directly from the market instead of training a model from scratch. This helps determine the amount of training data saved for the user.


 ## Run the code

 Run the following command to start the `unlabeled_text_example` experiment:

 ```bash
 python workflow.py unlabeled_text_example
 ```

 Run the following command to start the `labeled_text_example` experiment:

 ```bash
 python workflow.py labeled_text_example
 ```

 ## Results

 ### ``unlabeled_text_example``:

 The accuracy of search and reuse is presented in the table below:

 | Top-1 Performance   | Job Selector Reuse  | Average Ensemble Reuse |
 |---------------------|----------------------|-------------------------|
 | 0.859 +/- 0.051     | 0.844 +/- 0.053      | 0.858 +/- 0.051         |
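
Each cell in the table has the form `mean +/- std`. The sketch below shows one way such a summary could be computed from per-user accuracies. It assumes the population standard deviation (the default behaviour of `np.std`, which `workflow.py` uses), and the scores are made-up placeholders, not the real per-user results.

```python
from statistics import mean, pstdev


def summarize(scores):
    """Format a list of accuracies as 'mean +/- std' (population std)."""
    return "%.3f +/- %.3f" % (mean(scores), pstdev(scores))


# Hypothetical per-user accuracies, for illustration only.
example_scores = [0.8, 0.9]
print(summarize(example_scores))
```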


 ### ``labeled_text_example``:

 We present the classification error rate curves for both the user's self-trained model and multiple learnware reuse (EnsemblePrune), showing their performance on the user's test data as the user's training data increases. The average results across 10 users are depicted below:

 <div style="text-align:center;">
  <img src="../../docs/_static/img/text_example_labeled_curves.png" alt="Text Limited Labeled Data" style="width:50%;" />
 </div>

 From the figure above, it is evident that when the user's own training data is limited, the performance of multiple learnware reuse surpasses that of the user's own model. As the user's training data grows, it is expected that the user's model will eventually outperform the learnware reuse. This underscores the value of reusing learnware to significantly conserve training data and achieve superior performance when user training data is limited.
--- a/examples/dataset_text_workflow/workflow.py
+++ b/examples/dataset_text_workflow/workflow.py
@@ -92,7 +92,7 @@ class TextDatasetWorkflow:

        logger.info("Total Item: %d" % (len(self.text_market)))

    def test_unlabeled(self, rebuild=False):
    def unlabeled_text_example(self, rebuild=False):
        self._prepare_market(rebuild)

        select_list = []
@@ -183,7 +183,7 @@ class TextDatasetWorkflow:
            % (np.mean(ensemble_score_list), np.std(ensemble_score_list))
        )

    def test_labeled(self, rebuild=False, train_flag=True):
    def labeled_text_example(self, rebuild=False, train_flag=True):
        self.n_labeled_list = [100, 200, 500, 1000, 2000, 4000]
        self.repeated_list = [10, 10, 10, 3, 3, 3]
        self.root_path = os.path.dirname(os.path.abspath(__file__))