
Merge pull request #20 from LAMDA-NJU/client_load

[ENH]  1. add load learnware in client. 2. add doc for client
tags/v0.3.2
zouxiaochuan GitHub 2 years ago
parent
commit
e94bbc71ba
No known key found for this signature in database GPG Key ID: 4AEE18F83AFDEB23
6 changed files with 519 additions and 14 deletions
  1. docs/index.rst (+1, -0)
  2. docs/start/client.rst (+176, -0)
  3. learnware/client/__init__.py (+1, -1)
  4. learnware/client/learnware_client.py (+156, -12)
  5. learnware/client/package_utils.py (+184, -0)
  6. learnware/config.py (+1, -1)

docs/index.rst (+1, -0)

@@ -25,6 +25,7 @@ Document Structure

Introduction <start/intro.rst>
Quick Start <start/quick.rst>
Use Api <start/client.rst>
Installation <start/install.rst>
Experiments and Examples <start/performance.rst>



docs/start/client.rst (+176, -0)

@@ -0,0 +1,176 @@
============================================================
Learnware Client
============================================================


Introduction
====================

``Learnware Client`` is a Python API that provides a convenient interface for interacting with the official market. You can use the client to upload, download, and search for learnwares.


Installation
====================

``Learnware Client`` is contained in the ``learnware`` package. You can install it using pip:

.. code-block:: bash

    pip install learnware


Prepare access token
====================

Before using the ``Learnware Client``, you need to obtain a token from the `official website <https://www.lamda.nju.edu.cn/learnware/>`_. Log in to the website and click the "client token" tab in the user center.


Use Client
============================


Initialize a Learnware Client
-------------------------------


.. code-block:: python

    import learnware
    from learnware.client import LearnwareClient

    client = LearnwareClient()

    # log in to the official market
    client.login(email="your email", token="your token")


Upload Learnware
-------------------------------

Before uploading a learnware, you need to prepare its semantic specification. You can create one with the helper function ``create_semantic_specification``.

.. code-block:: python

    input_description = {
        "Dimension": 16,
        "Description": {
            "0": "gender",
            "1": "age",
            "2": "f2",
            "5": "f5",
        },
    }
    output_description = {
        "Dimension": 3,
        "Description": {
            "0": "the probability of being a cat",
            "1": "the probability of being a dog",
            "2": "the probability of being a bird",
        },
    }
    semantic_spec = client.create_semantic_specification(
        name="mylearnware1",
        description="this is my learnware",
        data_type="Table",
        task_type="Classification",
        library_type="Scikit-learn",
        senarioes=["Business", "Financial"],
        input_description=input_description,
        output_description=output_description,
    )
    # data_type, task_type, library_type and senarioes are enums; you can find the possible values in `learnware.C`
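
For orientation, the helper only packs its arguments into a nested dictionary. The sketch below restates the ``create_semantic_specification`` method added to ``learnware_client.py`` in this commit as a standalone function, so the structure can be inspected without a client or a login. The name ``build_semantic_specification`` is made up for this illustration and is not part of the package.

```python
import json


def build_semantic_specification(name, description, data_type, task_type,
                                 library_type, senarioes,
                                 input_description, output_description):
    """Standalone restatement of the dictionary layout built by the client helper."""
    return {
        "Input": input_description,
        "Output": output_description,
        # single-choice fields are stored as one-element lists
        "Data": {"Type": "Class", "Values": [data_type]},
        "Task": {"Type": "Class", "Values": [task_type]},
        "Library": {"Type": "Class", "Values": [library_type]},
        # scenarios are free to hold several tags
        "Scenario": {"Type": "Tag", "Values": senarioes},
        "Name": {"Type": "String", "Values": name},
        "Description": {"Type": "String", "Values": description},
    }


spec = build_semantic_specification(
    name="mylearnware1",
    description="this is my learnware",
    data_type="Table",
    task_type="Classification",
    library_type="Scikit-learn",
    senarioes=["Business", "Financial"],
    input_description={"Dimension": 16, "Description": {"0": "gender", "1": "age"}},
    output_description={"Dimension": 3, "Description": {"0": "the probability of being a cat"}},
)
print(json.dumps(spec, indent=2))
```

Note that ``Data``, ``Task`` and ``Library`` wrap a single value in a list, while ``Scenario`` takes the list of tags as given.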

After defining the semantic specification, you can upload your learnware using the ``upload_learnware`` function:

.. code-block:: python

    learnware_id = client.upload_learnware(
        semantic_specification=semantic_spec,
        learnware_file="path to your learnware zipfile")

Here, ``learnware_file`` is the local path of your learnware zipfile.


Semantic Specification Search
-------------------------------

You can search for learnwares in the official market using a semantic specification. The API returns all learnwares that match the semantic specification. For example, the code below searches for learnwares with the ``Table`` data type:

.. code-block:: python

    semantic_spec = client.create_semantic_specification(
        name="",
        description="",
        data_type="Table",
        task_type="",
        library_type="",
        senarioes=[],
        input_description={},
        output_description={},
    )
    user_info = learnware.specification.Specification()
    # pass the semantic specification built above, not the container itself
    user_info.update_semantic_spec(semantic_spec)
    learnware_list = client.search_learnware(user_info)
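
In this commit ``search_learnware`` also accepts ``page_size`` (default 10) and ``page_index`` (default 0), so a single call returns one page of results. A minimal sketch of collecting every page follows. It assumes that a page past the end comes back empty, which this document does not confirm. ``iter_all_results`` and ``fake_search`` are hypothetical names, and ``fake_search`` stands in for ``client.search_learnware`` so the example runs offline.

```python
def iter_all_results(search_page, page_size=10):
    """Yield results page by page until a short or empty page is returned."""
    page_index = 0
    while True:
        batch = search_page(page_size=page_size, page_index=page_index)
        yield from batch
        if len(batch) < page_size:
            # a short (or empty) page means there is nothing left to fetch
            return
        page_index += 1


# Offline stand-in for the market: 25 made-up entries.
_FAKE_MARKET = [{"learnware_id": f"id-{i}"} for i in range(25)]


def fake_search(page_size, page_index):
    """Hypothetical replacement for client.search_learnware, for illustration only."""
    start = page_index * page_size
    return _FAKE_MARKET[start:start + page_size]


all_results = list(iter_all_results(fake_search, page_size=10))
print(len(all_results))
```

With a real client, ``search_page`` could be a small wrapper that forwards ``page_size`` and ``page_index`` to ``client.search_learnware`` together with the specification you built.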

Statistical Specification Search
---------------------------------

You can search for learnwares by providing a statistical specification. The statistical specification is stored as a JSON file that contains statistical information about your training data. For example, the code below searches for learnwares with an ``RKMEStatSpecification``:

.. code-block:: python

    import os
    import learnware.specification as specification

    # unzip_path is the folder that contains your rkme.json file
    user_spec = specification.rkme.RKMEStatSpecification()
    user_spec.load(os.path.join(unzip_path, "rkme.json"))
    # use a separate name so the ``specification`` module is not shadowed
    user_info = specification.Specification()
    user_info.update_stat_spec(user_spec)

    learnware_list = client.search_learnware(user_info)

    # you can view the scores of the searched learnwares
    for item in learnware_list:
        print(f'learnware_id: {item["learnware_id"]}, score: {item["matching"]}')
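
Each entry in the result list above is read as a dictionary with ``learnware_id`` and ``matching`` keys. A small sketch of ranking the results and keeping the best few follows. It assumes that a larger ``matching`` value means a closer match, and the sample list is made-up data standing in for a real search result. ``top_matches`` is a hypothetical helper, not part of the client.

```python
def top_matches(learnware_list, k=3):
    """Return the k entries with the highest ``matching`` score."""
    return sorted(learnware_list, key=lambda item: item["matching"], reverse=True)[:k]


# Made-up search result, shaped like the entries printed above.
sample_results = [
    {"learnware_id": "lw-a", "matching": 0.41},
    {"learnware_id": "lw-b", "matching": 0.87},
    {"learnware_id": "lw-c", "matching": 0.65},
]

best = top_matches(sample_results, k=2)
for item in best:
    print(f'learnware_id: {item["learnware_id"]}, score: {item["matching"]}')
```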


Combine Semantic and Statistical Search
----------------------------------------
You can provide both a semantic and a statistical specification to search for learnwares. The engine first filters learnwares by the semantic specification and then searches among them by the statistical specification. For example, the code below searches for learnwares with the ``Table`` data type and an ``RKMEStatSpecification``:

.. code-block:: python

    semantic_spec = client.create_semantic_specification(
        name="",
        description="",
        data_type="Table",
        task_type="",
        library_type="",
        senarioes=[],
        input_description={},
        output_description={},
    )

    stat_spec = learnware.specification.rkme.RKMEStatSpecification()
    stat_spec.load(os.path.join(unzip_path, "rkme.json"))
    user_info = learnware.specification.Specification()
    user_info.update_semantic_spec(semantic_spec)
    user_info.update_stat_spec(stat_spec)

    learnware_list = client.search_learnware(user_info)
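
The two-stage order described in this section (filter by the semantic specification first, then rank by the statistical specification) can be illustrated with a toy example. This is not the market's implementation: the records, the exact-match filter, and the Euclidean distance used as a stand-in for statistical matching are all assumptions made for illustration.

```python
import math

# Made-up market entries: a semantic tag plus a tiny "statistic" vector.
TOY_MARKET = [
    {"id": "a", "data_type": "Table", "stat": [0.0, 1.0]},
    {"id": "b", "data_type": "Image", "stat": [0.1, 0.9]},
    {"id": "c", "data_type": "Table", "stat": [2.0, 2.0]},
]


def toy_search(market, data_type, user_stat):
    """Filter by semantic tag, then rank the survivors by distance to user_stat."""
    # Stage 1: the semantic filter drops entries of the wrong data type.
    candidates = [entry for entry in market if entry["data_type"] == data_type]
    # Stage 2: the statistical ranking puts the closest entry first.
    return sorted(candidates, key=lambda entry: math.dist(entry["stat"], user_stat))


ranked = toy_search(TOY_MARKET, "Table", [0.0, 0.8])
print([entry["id"] for entry in ranked])
```

Entry ``b`` is statistically close to the query but never reaches the ranking stage, because the semantic filter removes it first.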


Download and Use Learnware
-------------------------------
Once you have a learnware ID, you can download and instantiate the learnware with the following code:

.. code-block:: python

    client.download_learnware(learnware_id, zip_path)
    client.install_environment(zip_path)
    learnware = client.load_learnware(zip_path)
    # you can use the learnware to make predictions now
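
Before installing, ``install_environment`` drops packages that cannot be found (see ``package_utils.py`` in this commit). The first step is to reduce each requirement line to a bare package name. The sketch below restates ``parse_pip_requirement`` from that file as a standalone function with the same splitting rule; the sample lines are made up for illustration.

```python
from typing import Optional


def parse_pip_requirement(line: str) -> Optional[str]:
    """Return the package name of a requirements.txt line, or None to skip it."""
    line = line.strip()
    if len(line) == 0:
        return None
    # comments and pip options such as "-r other.txt" carry no package name
    if line[0] in ("#", "-"):
        return None

    package_str = line
    # cut at the first version operator or space, one separator at a time
    for split_ch in ("=", ">", "<", "!", "~", " "):
        index = package_str.find(split_ch)
        if index != -1:
            package_str = package_str[:index]
    return package_str


samples = ["numpy>=1.20", "scikit-learn==1.0.2", "# a comment", "-r other.txt", "torch ~= 2.0", ""]
names = [parse_pip_requirement(sample) for sample in samples]
print(names)
```

Lines that yield ``None`` are skipped, and only lines whose package name passes the existence check are written to the filtered requirements file.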





learnware/client/__init__.py (+1, -1)

@@ -1,2 +1,2 @@

-from .learnware_client import LearnwareClient
+from .learnware_client import LearnwareClient, SemanticSpecificationKey

learnware/client/learnware_client.py (+156, -12)

@@ -1,11 +1,17 @@
from ..specification import Specification
from ..config import C
from .. import learnware
from ..market.easy import EasyMarket
from . import package_utils
import requests
import json
from tqdm import tqdm
import hashlib
import os
import tempfile
import zipfile
import yaml
from enum import Enum


CHUNK_SIZE = 1024 * 1024
@@ -39,6 +45,13 @@ def compute_file_hash(file_path):
return file_hash.hexdigest()


class SemanticSpecificationKey(Enum):
    DATA_TYPE = "Data"
    TASK_TYPE = "Task"
    LIBRARY_TYPE = "Library"
    SENARIOES = "Scenario"
    pass

class LearnwareClient:
def __init__(self, host=None):
self.headers = None
@@ -52,14 +65,10 @@ class LearnwareClient:
self.chunk_size = 1024 * 1024
pass

-    def login(self, email, password, hash_password=True):
-        url = f"{self.host}/auth/login"
-
-        if hash_password:
-            password = hashlib.md5(password.encode()).hexdigest()
-            pass
+    def login(self, email, token):
+        url = f"{self.host}/auth/login_by_token"
-        response = requests.post(url, json={'email': email, 'password': password})
+        response = requests.post(url, json={'email': email, 'token': token})

result = response.json()
if result['code'] != 0:
@@ -84,7 +93,7 @@ class LearnwareClient:
def upload_learnware(self, semantic_specification, learnware_file):
file_hash = compute_file_hash(learnware_file)

-        url_upload = f"{self.host}/storage/chunked_upload"
+        url_upload = f"{self.host}/user/chunked_upload"

num_chunks = os.path.getsize(learnware_file) // CHUNK_SIZE + 1
bar = tqdm(total=num_chunks, desc="Uploading", unit="MB")
@@ -107,7 +116,7 @@ class LearnwareClient:
pass
bar.close()
-        url_add = f"{self.host}/storage/add_learnware_uploaded"
+        url_add = f"{self.host}/user/add_learnware_uploaded"

response = requests.post(url_add, json={
"file_hash": file_hash,
@@ -159,13 +168,13 @@ class LearnwareClient:
return learnware_list

@require_login
-    def search_learnware(self, specification: Specification):
+    def search_learnware(self, specification: Specification, page_size=10, page_index=0):
url = f"{self.host}/engine/search_learnware"

stat_spec = specification.get_stat_spec()
if len(stat_spec) > 1:
raise Exception("statistical specification must have only one key.")
if len(stat_spec) == 1:
stat_spec = list(stat_spec.values())[0]
else:
@@ -195,7 +204,7 @@ class LearnwareClient:

response = requests.post(
url, files=files,
-            data={"semantic_specification": json.dumps(specification.get_semantic_spec())},
+            data={"semantic_specification": json.dumps(specification.get_semantic_spec()), "limit": page_size, "page": page_index},
headers=self.headers)
result = response.json()
@@ -226,4 +235,139 @@ class LearnwareClient:
if result['code'] != 0:
raise Exception('delete failed: ' + json.dumps(result))
pass

    def check_learnware(self, path, semantic_specification):
        if os.path.isfile(path):
            with tempfile.TemporaryDirectory() as tempdir:
                with zipfile.ZipFile(path, "r") as z_file:
                    z_file.extractall(tempdir)
                    pass
                return self.check_learnware_folder(tempdir, semantic_specification)
            pass
        else:
            return self.check_learnware_folder(path, semantic_specification)
            pass
        pass

    def check_learnware_folder(self, folder, semantic_specification):
        learnware_obj = learnware.get_learnware_from_dirpath('test_id', semantic_specification, folder)

        check_result = EasyMarket.check_learnware(learnware_obj)
        if check_result == EasyMarket.USABLE_LEARWARE:
            return True
        else:
            return False
        pass

    def create_semantic_specification(
            self, name, description, data_type, task_type, library_type, senarioes, input_description,
            output_description):
        semantic_specification = dict()
        semantic_specification["Input"] = input_description
        semantic_specification["Output"] = output_description
        semantic_specification["Data"] = {"Type": "Class", "Values": [data_type]}
        semantic_specification["Task"] = {"Type": "Class", "Values": [task_type]}
        semantic_specification["Library"] = {"Type": "Class", "Values": [library_type]}
        semantic_specification["Scenario"] = {"Type": "Tag", "Values": senarioes}
        semantic_specification["Name"] = {"Type": "String", "Values": name}
        semantic_specification["Description"] = {"Type": "String", "Values": description}
        return semantic_specification

    def list_semantic_specification_values(self, key: SemanticSpecificationKey):
        url = f"{self.host}/engine/semantic_specification"
        response = requests.get(url, headers=self.headers)
        result = response.json()
        semantic_conf = result['data']['semantic_specification']

        return semantic_conf[key.value]['Values']

    def load_learnware(self, learnware_file: str, load_model: bool=True):
        with tempfile.TemporaryDirectory(prefix='learnware_') as tempdir:
            with zipfile.ZipFile(learnware_file, "r") as z_file:
                z_file.extractall(tempdir)
                pass

            yaml_file = C.learnware_folder_config["yaml_file"]

            with open(os.path.join(tempdir, yaml_file), "r") as fin:
                learnware_info = yaml.safe_load(fin)
                pass

            learnware_id = learnware_info.get('id')
            if learnware_id is None:
                learnware_id = "test_id"
                pass

            semantic_specification = learnware_info.get('semantic_specification')
            if semantic_specification is None:
                semantic_specification = {}
                pass
            else:
                semantic_file = semantic_specification.get('file_name')

                with open(os.path.join(tempdir, semantic_file), "r") as fin:
                    semantic_specification = json.load(fin)
                    pass
                pass

            learnware_obj = learnware.get_learnware_from_dirpath(learnware_id, semantic_specification, tempdir)

            if load_model:
                learnware_obj.instantiate_model()
                pass

            return learnware_obj
            pass
        pass

    def system(self, command):
        retcd = os.system(command)
        if retcd != 0:
            raise RuntimeError(f"Command {command} failed with return code {retcd}")
        pass


    def install_environment(self, zip_path, conda_env=None):
        '''install the environment of a learnware

        @param: zip_path: path of the learnware zip file
        @param: conda_env: if it is not None, a new conda environment will be created with the given name
            if it is None, use current environment
        '''
        with tempfile.TemporaryDirectory(prefix='learnware_') as tempdir:
            with zipfile.ZipFile(zip_path, "r") as z_file:
                # list the archive members (call the method instead of printing it)
                print(z_file.namelist())
                if 'environment.yaml' in z_file.namelist():
                    z_file.extract('environment.yaml', tempdir)
                    yaml_path = os.path.join(tempdir, 'environment.yaml')
                    yaml_path_filter = os.path.join(tempdir, 'environment_filter.yaml')
                    package_utils.filter_nonexist_conda_packages_file(yaml_path, yaml_path_filter)
                    # create environment
                    if conda_env is not None:
                        self.system(f'conda env update --name {conda_env} --file {yaml_path_filter}')
                        pass
                    else:
                        self.system(f'conda env update --file {yaml_path_filter}')
                        pass
                    pass
                elif 'requirements.txt' in z_file.namelist():
                    z_file.extract('requirements.txt', tempdir)
                    requirements_path = os.path.join(tempdir, 'requirements.txt')
                    requirements_path_filter = os.path.join(tempdir, 'requirements_filter.txt')
                    package_utils.filter_nonexist_pip_packages_file(requirements_path, requirements_path_filter)

                    if conda_env is not None:
                        self.system(f'conda create --name {conda_env}')
                        # run pip inside the new environment rather than the active one
                        self.system(f'conda run --name {conda_env} --no-capture-output python3 -m pip install -r {requirements_path_filter}')
                    else:
                        self.system(f'python3 -m pip install -r {requirements_path_filter}')
                        pass
                    pass
                else:
                    raise Exception("environment.yaml or requirements.txt not found in the learnware zip file.")
                    pass
                pass
            pass
        pass

learnware/client/package_utils.py (+184, -0)

@@ -0,0 +1,184 @@
from typing import List, Tuple
import subprocess
import yaml
import os
import time


def try_to_run(args, timeout=5, retry=5):
    # run the command, retrying on timeout up to `retry` times
    success = False
    for i in range(retry):
        try:
            subprocess.check_call(args=args, timeout=timeout)
            success = True
            break
        except subprocess.TimeoutExpired as e:
            pass
        pass

    if not success:
        raise subprocess.TimeoutExpired(args, timeout)
    pass

def parse_pip_requirement(line: str):
    '''parse pip requirement line to package name
    '''
    line = line.strip()

    if len(line) == 0:
        return None

    if line[0] in ('#', '-'):
        return None

    package_str = line
    for split_ch in ('=', '>', '<', '!', '~', ' '):
        split_ch_index = package_str.find(split_ch)
        if split_ch_index != -1:
            package_str = package_str[:split_ch_index]
            pass
        pass

    return package_str

def read_pip_packages_from_requirements(requirements_file: str) -> Tuple[List[str], List[str]]:
    '''read requirements.txt and parse it into package names and raw lines
    '''

    packages = []
    lines = []
    with open(requirements_file, 'r') as fin:
        for line in fin:
            package_str = parse_pip_requirement(line)
            packages.append(package_str)
            lines.append(line)
            pass

    return packages, lines


def filter_nonexist_pip_packages(packages: list) -> Tuple[List[str], List[str]]:
    '''filter non-exist pip requirements

    Returns:
        exist_packages: list of exist packages
        nonexist_packages: list of non-exist packages
    '''

    exist_packages = []
    nonexist_packages = []
    for package in packages:
        try:
            # os.system("python3 -m pip index versions {0}".format(package))
            print('check package existence: {0}'.format(package))
            try_to_run(args=["python3", "-m", "pip", "index", "versions", package], timeout=5)
            exist_packages.append(package)
        except Exception as e:
            print(e)
            nonexist_packages.append(package)
            pass
        pass

    return exist_packages, nonexist_packages


def filter_nonexist_conda_packages(packages: list) -> Tuple[List[str], List[str]]:
    '''filter non-exist conda requirements

    Returns:
        exist_packages: list of exist packages
        nonexist_packages: list of non-exist packages
    '''

    exist_packages = []
    nonexist_packages = []
    for package in packages:
        try:
            try_to_run(args=["conda", "search", package], timeout=5)
            exist_packages.append(package)
        except Exception as e:
            nonexist_packages.append(package)
            pass
        pass

    return exist_packages, nonexist_packages


def read_conda_packages_from_dict(
        env_desc: dict) -> Tuple[List[str], List[str]]:
    '''

    :param env_desc: dict of environment description

    :return conda packages: list of conda packages
    :return pip packages: list of pip packages
    '''

    conda_packages = env_desc.get('dependencies')
    if conda_packages is None:
        conda_packages = []
        pip_packages = []
        pass
    else:
        pip_packages = []
        conda_packages_ = []
        for package in conda_packages:
            if isinstance(package, dict) and 'pip' in package:
                pip_packages = package['pip']
                pip_packages = [parse_pip_requirement(line) for line in pip_packages]
                pass
            elif isinstance(package, str):
                conda_packages_.append(package)
                pass
            pass

        conda_packages = conda_packages_
        pass

    return conda_packages, pip_packages
    pass


def filter_nonexist_conda_packages_file(yaml_file: str, output_yaml_file: str):
    with open(yaml_file, 'r') as fin:
        env_desc = yaml.safe_load(fin)
        pass

    conda_packages, pip_packages = read_conda_packages_from_dict(env_desc)

    conda_packages, nonexist_conda_packages = filter_nonexist_conda_packages(conda_packages)
    pip_packages, nonexist_pip_packages = filter_nonexist_pip_packages(pip_packages)

    env_desc['dependencies'] = conda_packages
    if len(pip_packages) > 0:
        env_desc['dependencies'].append({'pip': pip_packages})
        pass

    with open(output_yaml_file, 'w') as fout:
        yaml.safe_dump(env_desc, fout)
        pass

    return conda_packages, pip_packages, nonexist_conda_packages, nonexist_pip_packages
    pass


def filter_nonexist_pip_packages_file(requirements_file: str, output_file: str):
    packages, lines = read_pip_packages_from_requirements(requirements_file)

    exist_packages, nonexist_packages = filter_nonexist_pip_packages(packages)

    exist_packages = set(exist_packages)

    with open(output_file, 'w') as fout:
        for package, line in zip(packages, lines):
            if package is not None and package in exist_packages:
                # lines read from the file keep their newline; write exactly one
                fout.write(line.rstrip('\n') + '\n')
                pass
            pass
        pass
    pass

    print(f"exist packages: {exist_packages}")
    return exist_packages, nonexist_packages
    pass

learnware/config.py (+1, -1)

@@ -140,7 +140,7 @@ _DEFAULT_CONFIG = {
     },
     "database_url": f"sqlite:///{DATABASE_PATH}",
     "max_reduced_set_size": 1310720,
-    "backend_host": "http://36.111.128.21:30008"
+    "backend_host": "http://www.lamda.nju.edu.cn/learnware/api"
}

C = Config(_DEFAULT_CONFIG)
