@@ -0,0 +1,134 @@
# Contents
<!-- TOC -->
- [Overview](#overview)
- [Model Architecture](#model-architecture)
- [Dataset](#dataset)
- [Environment Requirements](#environment-requirements)
- [Quick Start](#quick-start)
- [Script Detailed Description](#script-detailed-description)
<!-- /TOC -->
# Overview
This folder holds code for Training-on-Device (ToD) of a LeNet model. Part of the code runs on a server using the MindSpore infrastructure, another part uses the MindSpore Lite conversion utility, and the last part is the actual training of the model on an Android-based device.
# Model Architecture
LeNet is a very simple network composed of only 5 layers: 2 convolutional layers and 3 fully connected layers. Such a small network can be fully trained (from scratch) on a device in a short time, which makes it a good example.
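To get a feel for why this network trains quickly on-device, the parameter count can be tallied by hand. The layer sizes below are the classic LeNet-5 configuration and are an assumption here; the authoritative definition is the `LeNet5` class imported by `lenet_export.py`.

```python
# Assumed classic LeNet-5 shapes: 6/16 conv channels, 120/84/10 FC units.
def conv_params(in_ch, out_ch, k):
    """Weights + biases of a k x k convolution."""
    return in_ch * out_ch * k * k + out_ch

def fc_params(in_dim, out_dim):
    """Weights + biases of a fully connected layer."""
    return in_dim * out_dim + out_dim

layers = [
    ("conv1", conv_params(1, 6, 5)),      # 1x32x32 input -> 6 feature maps
    ("conv2", conv_params(6, 16, 5)),     # after 2x2 pooling
    ("fc1", fc_params(16 * 5 * 5, 120)),  # 16x5x5 flattened
    ("fc2", fc_params(120, 84)),
    ("fc3", fc_params(84, 10)),           # 10 MNIST classes
]
total = sum(n for _, n in layers)
print(total)  # -> 61706 (~62k parameters, tiny by modern standards)
```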
# Dataset
In this example we use the MNIST dataset of handwritten digits as published in [THE MNIST DATABASE](<http://yann.lecun.com/exdb/mnist/>)
- Dataset size: 52.4M, 70,000 28*28 grayscale images in 10 classes
    - Test: 10,000 images
    - Train: 60,000 images
- Data format: binary files
    - Note: data will be processed in dataset.cc
- The dataset directory structure is as follows:

```
mnist/
├── test
│   ├── t10k-images-idx3-ubyte
│   └── t10k-labels-idx1-ubyte
└── train
    ├── train-images-idx3-ubyte
    └── train-labels-idx1-ubyte
```
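The idx files above have a simple big-endian header (magic number, item count, and, for images, row/column counts) followed by raw bytes; `dataset.cc` parses them the same way and additionally pads each 28*28 image to 32*32 and scales pixels to [0, 1]. A minimal sketch of the image-file header parsing:

```python
import struct

def read_idx_images(buf):
    """Parse an MNIST idx3 image buffer: 16-byte big-endian header, then raw pixels."""
    magic, count, rows, cols = struct.unpack(">iiii", buf[:16])
    assert magic == 2051, "not an MNIST image file"  # 2049 marks a label file
    return [buf[16 + i * rows * cols: 16 + (i + 1) * rows * cols]
            for i in range(count)]

# Synthetic two-image file: header + 2 * 28*28 zero pixels
fake = struct.pack(">iiii", 2051, 2, 28, 28) + bytes(2 * 28 * 28)
imgs = read_idx_images(fake)
print(len(imgs), len(imgs[0]))  # -> 2 784
```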
# Environment Requirements
- Server side
    - [MindSpore Framework](https://www.mindspore.cn/install/en): it is recommended to install a docker image
    - [MindSpore ToD Framework](https://www.mindspore.cn/tutorial/tod/en/use/prparation.html)
    - [Android NDK r20b](https://dl.google.com/android/repository/android-ndk-r20b-linux-x86_64.zip)
    - [Android SDK](https://developer.android.com/studio?hl=zh-cn#cmdline-tools)
- A connected Android device
# Quick Start
After installing all the above, the script in the home directory can be run with the following arguments:

```bash
sh ./prepare_and_run.sh DATASET_PATH [MINDSPORE_DOCKER] [RELEASE.tar.gz]
```

where:
- DATASET_PATH is the path to the [dataset](#dataset),
- MINDSPORE_DOCKER is the image name of the docker that runs [MindSpore](#environment-requirements). If not provided, MindSpore will be run locally,
- and RELEASE.tar.gz is a pointer to the MindSpore ToD release tarball. If not provided, the script will attempt to find the MindSpore ToD compilation output.
# Script Detailed Description
The provided `prepare_and_run.sh` script performs the following steps:
- Prepares the trainable lenet model in the `.ms` format
- Prepares the folder that should be pushed to the device
- Copies this folder to the device and runs the scripts on the device

See how to run the script and its parameter definitions in the [Quick Start Section](#quick-start)
## Preparing the model
Within the model folder, a `prepare_model.sh` script uses the MindSpore infrastructure to export the model into a `.mindir` file. The user can specify a docker image on which MindSpore is installed. Otherwise, the Python script will be run locally.
The script then converts the `.mindir` to the `.ms` format using the MindSpore ToD converter.
The script accepts a tarball where the converter resides. Otherwise, the script will attempt to find the converter in the MindSpore ToD build output directory.
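The conversion stage can be sketched as the command below; the flags mirror the `converter_lite` invocation in `prepare_model.sh`, while the helper name and the guard around actually running it are illustrative assumptions.

```python
import shutil
import subprocess

def convert_to_ms(converter="converter_lite", mindir="lenet_tod.mindir"):
    """Build the MindSpore ToD converter command line (flags as in prepare_model.sh)."""
    return [converter,
            "--fmk=MINDIR",            # input framework format
            "--trainModel=true",       # keep the graph trainable on-device
            f"--modelFile={mindir}",
            "--outputFile=lenet_tod"]  # the converter appends the .ms suffix

cmd = convert_to_ms()
print(" ".join(cmd))
# Only invoke the converter when it is actually installed on this machine.
if shutil.which(cmd[0]):
    subprocess.run(cmd, check=True)
```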
## Preparing the Folder
The `lenet_tod.ms` model file is then copied into the `package` folder, along with the scripts, the MindSpore ToD library and the MNIST dataset.
Finally, the code (in src) is compiled for arm64 and the binary is copied into the `package` folder.
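A condensed sketch of the packaging step is shown below. The directory names follow the `package` tree in this README; the real work (copying build and conversion outputs) is done by `prepare_and_run.sh`, so the empty placeholder files here are an assumption for illustration.

```python
import pathlib
import tempfile

def build_package(root):
    """Lay out the on-device 'package' folder structure pushed by adb."""
    pkg = pathlib.Path(root) / "package"
    for sub in ("model", "lib", "bin", "dataset"):
        (pkg / sub).mkdir(parents=True)
    # In the real script these are copies of the actual artifacts.
    (pkg / "model" / "lenet_tod.ms").touch()
    (pkg / "lib" / "libmindspore-lite.so").touch()
    (pkg / "bin" / "net_runner").touch()
    for script in ("train.sh", "eval.sh"):
        (pkg / script).touch()
    return pkg

with tempfile.TemporaryDirectory() as tmp:
    names = sorted(p.name for p in build_package(tmp).iterdir())
print(names)  # -> ['bin', 'dataset', 'eval.sh', 'lib', 'model', 'train.sh']
```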
### Running the code on the device
To run the code on the device, the script first uses the `adb` tool to push the `package` folder to the device. It then runs training (which takes some time) and finally runs evaluation of the trained model using the test data.
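The push/train/eval sequence amounts to the following commands (composed but not executed here; the device path and shell invocations mirror `prepare_and_run.sh` and the `scripts/run_*.sh` helpers):

```python
def device_commands(package="package", dest="/data/local/tmp/"):
    """adb command lines mirroring prepare_and_run.sh and scripts/run_*.sh."""
    push = ["adb", "push", package, dest]
    # run_train.sh / run_eval.sh are piped into `adb shell`; the net effect is:
    train = ["adb", "shell", f"cd {dest}package && /system/bin/sh train.sh"]
    evaluate = ["adb", "shell", f"cd {dest}package && /system/bin/sh eval.sh"]
    return [push, train, evaluate]

for cmd in device_commands():
    print(" ".join(cmd))
```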
# Folder Directory Tree

```
train_lenet/
├── Makefile                      # Makefile of src code
├── model
│   ├── lenet_export.py           # Python script that exports the LeNet model to .mindir
│   ├── prepare_model.sh          # script that exports the model (using docker) and then converts it
│   └── train_utils.py            # utility functions used during the export
├── prepare_and_run.sh            # main script that creates the model, compiles it and sends it to the device
├── README.md                     # this manual
├── scripts
│   ├── eval.sh                   # on-device script that loads the trained model and evaluates its accuracy
│   ├── run_eval.sh               # adb script that launches eval.sh
│   ├── run_train.sh              # adb script that launches train.sh
│   └── train.sh                  # on-device script that loads the initial model and trains it
├── src
│   ├── dataset.cc                # dataset handler
│   ├── dataset.h                 # dataset class header
│   ├── net_runner.cc             # program that runs training/evaluation of models
│   └── net_runner.h              # net_runner header
```

When the `prepare_and_run.sh` script is run, the following folder is prepared. It is pushed to the device and training then runs:

```
├── package
│   ├── bin
│   │   └── net_runner            # the executable that performs the training/evaluation
│   ├── dataset
│   │   ├── test
│   │   │   ├── t10k-images-idx3-ubyte   # test images
│   │   │   └── t10k-labels-idx1-ubyte   # test labels
│   │   └── train
│   │       ├── train-images-idx3-ubyte  # train images
│   │       └── train-labels-idx1-ubyte  # train labels
│   ├── eval.sh                   # on-device script that loads the trained model and evaluates its accuracy
│   ├── lib
│   │   └── libmindspore-lite.so  # MindSpore Lite library
│   ├── model
│   │   └── lenet_tod.ms          # model to train
│   └── train.sh                  # on-device script that loads the initial model and trains it
```
@@ -0,0 +1,37 @@
# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""lenet_export."""
import sys

import numpy as np

from mindspore import context, Tensor
import mindspore.common.dtype as mstype
from mindspore.train.serialization import export
from train_utils import TrainWrap

sys.path.append('../../../cv/lenet/src/')
from lenet import LeNet5  # noqa: E402  # LeNet5 lives outside this folder, hence the late import

n = LeNet5()
n.set_train()
context.set_context(mode=context.PYNATIVE_MODE, device_target="GPU", save_graphs=False)

batch_size = 32
x = Tensor(np.ones((batch_size, 1, 32, 32)), mstype.float32)
label = Tensor(np.zeros([batch_size, 10]).astype(np.float32))
net = TrainWrap(n)
export(net, x, label, file_name="lenet_tod.mindir", file_format='MINDIR')
print("finished exporting")
| @@ -0,0 +1,24 @@ | |||
| CONVERTER="../../../../../mindspore/lite/build/tools/converter/converter_lite" | |||
| if [ ! -f "$CONVERTER" ]; then | |||
| if ! command -v converter_lite &> /dev/null | |||
| then | |||
| echo "converter_lite could not be found in MindSpore build directory nor in system path" | |||
| exit | |||
| else | |||
| CONVERTER=converter_lite | |||
| fi | |||
| fi | |||
| echo "============Exporting==========" | |||
| if [ -n "$1" ]; then | |||
| DOCKER_IMG=$1 | |||
| docker run -w $PWD --runtime=nvidia -v /home/$USER:/home/$USER --privileged=true ${DOCKER_IMG} /bin/bash -c "python lenet_export.py; chmod 444 lenet_tod.mindir; rm -rf __pycache__" | |||
| else | |||
| echo "MindSpore docker was not provided, attempting to run locally" | |||
| python lenet_export.py | |||
| fi | |||
| echo "============Converting=========" | |||
| $CONVERTER --fmk=MINDIR --trainModel=true --modelFile=lenet_tod.mindir --outputFile=lenet_tod | |||
| @@ -0,0 +1,34 @@ | |||
| # Copyright 2020 Huawei Technologies Co., Ltd | |||
| # | |||
| # Licensed under the Apache License, Version 2.0 (the "License"); | |||
| # you may not use this file except in compliance with the License. | |||
| # You may obtain a copy of the License at | |||
| # | |||
| # http://www.apache.org/licenses/LICENSE-2.0 | |||
| # | |||
| # Unless required by applicable law or agreed to in writing, software | |||
| # distributed under the License is distributed on an "AS IS" BASIS, | |||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | |||
| # See the License for the specific language governing permissions and | |||
| # limitations under the License. | |||
| # ============================================================================ | |||
| """train_utils.""" | |||
| import mindspore.nn as nn | |||
| from mindspore.common.parameter import ParameterTuple | |||
| def TrainWrap(net, loss_fn=None, optimizer=None, weights=None): | |||
| """ | |||
| TrainWrap | |||
| """ | |||
| if loss_fn is None: | |||
| loss_fn = nn.SoftmaxCrossEntropyWithLogits() | |||
| loss_net = nn.WithLossCell(net, loss_fn) | |||
| loss_net.set_train() | |||
| if weights is None: | |||
| weights = ParameterTuple(net.trainable_params()) | |||
| if optimizer is None: | |||
| optimizer = nn.Adam(weights, learning_rate=1e-3, beta1=0.9, beta2=0.999, eps=1e-8, use_locking=False, | |||
| use_nesterov=False, weight_decay=0.0, loss_scale=1.0) | |||
| train_net = nn.TrainOneStepCell(loss_net, optimizer) | |||
| return train_net | |||
| @@ -0,0 +1,82 @@ | |||
| #!/bin/bash | |||
| display_usage() { | |||
| echo -e "\nUsage: prepare_and_run.sh dataset_path [mindspore_docker] [release.tar.gz]\n" | |||
| } | |||
| if [ -n "$1" ]; then | |||
| MNIST_DATA_PATH=$1 | |||
| else | |||
| echo "MNIST Dataset directory path was not provided" | |||
| display_usage | |||
| exit 0 | |||
| fi | |||
| if [ -n "$2" ]; then | |||
| DOCKER=$2 | |||
| else | |||
| DOCKER="" | |||
| #echo "MindSpore docker was not provided" | |||
| #display_usage | |||
| #exit 0 | |||
| fi | |||
| if [ -n "$3" ]; then | |||
| TARBALL=$3 | |||
| else | |||
| if [ -f ../../../../output/mindspore-lite-*-runtime-arm64-cpu-train.tar.gz ]; then | |||
| TARBALL="../../../../output/mindspore-lite-*-runtime-arm64-cpu-train.tar.gz" | |||
| else | |||
| echo "release.tar.gz was not found" | |||
| display_usage | |||
| exit 0 | |||
| fi | |||
| fi | |||
| # Prepare the model | |||
| cd model/ | |||
| rm -f *.ms | |||
| ./prepare_model.sh $DOCKER | |||
| cd - | |||
| # Copy the .ms model to the package folder | |||
| rm -rf package | |||
| mkdir -p package/model | |||
| cp model/*.ms package/model | |||
| # Copy the running script to the package | |||
| cp scripts/train.sh package/ | |||
| cp scripts/eval.sh package/ | |||
| # Copy the shared MindSpore ToD library | |||
| tar -xzvf ${TARBALL} --wildcards --no-anchored libmindspore-lite.so | |||
| tar -xzvf ${TARBALL} --wildcards --no-anchored include | |||
| mv mindspore-*/lib package/ | |||
| mkdir msl | |||
| mv mindspore-*/* msl/ | |||
| rm -rf mindspore-* | |||
| # Copy the dataset to the package | |||
| cp -r ${MNIST_DATA_PATH} package/dataset | |||
| # Compile program | |||
| make TARGET=arm64 | |||
| # Copy the executable to the package | |||
| mv bin package/ | |||
| # Push the folder to the device | |||
| adb push package /data/local/tmp/ | |||
| echo "Training on Device" | |||
| adb shell < scripts/run_train.sh | |||
| echo | |||
| echo "Load trained model and evaluate accuracy" | |||
| adb shell < scripts/run_eval.sh | |||
| echo | |||
| #rm -rf src/*.o package model/__pycache__ model/*.ms | |||
| #./prepare_and_run.sh /opt/share/dataset/mnist mindspore_dev:5 | |||
| @@ -0,0 +1,19 @@ | |||
| #!/bin/bash | |||
| # Copyright 2020 Huawei Technologies Co., Ltd | |||
| # | |||
| # Licensed under the Apache License, Version 2.0 (the "License"); | |||
| # you may not use this file except in compliance with the License. | |||
| # You may obtain a copy of the License at | |||
| # | |||
| # http://www.apache.org/licenses/LICENSE-2.0 | |||
| # | |||
| # Unless required by applicable law or agreed to in writing, software | |||
| # distributed under the License is distributed on an "AS IS" BASIS, | |||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | |||
| # See the License for the specific language governing permissions and | |||
| # limitations under the License. | |||
| # ============================================================================ | |||
| # an simple tutorial as follows, more parameters can be setting | |||
| DATA_PATH=$1 | |||
| LD_LIBRARY_PATH=./lib/ bin/net_runner -f model/lenet_tod_trained_3000.ms -e 0 -d dataset | |||
| @@ -0,0 +1,2 @@ | |||
| cd /data/local/tmp/package | |||
| /system/bin/sh eval.sh | |||
| @@ -0,0 +1,2 @@ | |||
| cd /data/local/tmp/package | |||
| /system/bin/sh train.sh | |||
| @@ -0,0 +1,21 @@ | |||
| #!/bin/bash | |||
| # Copyright 2020 Huawei Technologies Co., Ltd | |||
| # | |||
| # Licensed under the Apache License, Version 2.0 (the "License"); | |||
| # you may not use this file except in compliance with the License. | |||
| # You may obtain a copy of the License at | |||
| # | |||
| # http://www.apache.org/licenses/LICENSE-2.0 | |||
| # | |||
| # Unless required by applicable law or agreed to in writing, software | |||
| # distributed under the License is distributed on an "AS IS" BASIS, | |||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | |||
| # See the License for the specific language governing permissions and | |||
| # limitations under the License. | |||
| # ============================================================================ | |||
| # an simple tutorial as follows, more parameters can be setting | |||
| script_self=$(readlink -f "$0") | |||
| self_path=$(dirname "${script_self}") | |||
| DATA_PATH=$1 | |||
| LD_LIBRARY_PATH=./lib/ bin/net_runner -f model/lenet_tod.ms -e 3000 -d dataset | |||
| @@ -0,0 +1,200 @@ | |||
| /** | |||
| * Copyright 2020 Huawei Technologies Co., Ltd | |||
| * | |||
| * Licensed under the Apache License, Version 2.0 (the "License"); | |||
| * you may not use this file except in compliance with the License. | |||
| * You may obtain a copy of the License at | |||
| * | |||
| * http://www.apache.org/licenses/LICENSE-2.0 | |||
| * | |||
| * Unless required by applicable law or agreed to in writing, software | |||
| * distributed under the License is distributed on an "AS IS" BASIS, | |||
| * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | |||
| * See the License for the specific language governing permissions and | |||
| * limitations under the License. | |||
| */ | |||
| #include "src/dataset.h" | |||
| #include <assert.h> | |||
| #include <arpa/inet.h> | |||
| #include <map> | |||
| #include <iostream> | |||
| #include <fstream> | |||
| #include <memory> | |||
| #include <filesystem> | |||
| using LabelId = std::map<std::string, int>; | |||
| char *ReadFile(const std::string &file, size_t *size) { | |||
| assert(size != nullptr); | |||
| std::string realPath(file); | |||
| std::ifstream ifs(realPath); | |||
| if (!ifs.good()) { | |||
| std::cerr << "file: " << realPath << " does not exist"; | |||
| return nullptr; | |||
| } | |||
| if (!ifs.is_open()) { | |||
| std::cerr << "file: " << realPath << " open failed"; | |||
| return nullptr; | |||
| } | |||
| ifs.seekg(0, std::ios::end); | |||
| *size = ifs.tellg(); | |||
| std::unique_ptr<char[]> buf(new (std::nothrow) char[*size]); | |||
| if (buf == nullptr) { | |||
| std::cerr << "malloc buf failed, file: " << realPath; | |||
| ifs.close(); | |||
| return nullptr; | |||
| } | |||
| ifs.seekg(0, std::ios::beg); | |||
| ifs.read(buf.get(), *size); | |||
| ifs.close(); | |||
| return buf.release(); | |||
| } | |||
| DataSet::~DataSet() { | |||
| for (auto itr = train_data_.begin(); itr != train_data_.end(); ++itr) { | |||
| auto ptr = std::get<0>(*itr); | |||
| delete[] ptr; | |||
| } | |||
| for (auto itr = test_data_.begin(); itr != test_data_.end(); ++itr) { | |||
| auto ptr = std::get<0>(*itr); | |||
| delete[] ptr; | |||
| } | |||
| } | |||
| int DataSet::Init(const std::string &data_base_directory, database_type type) { | |||
| InitializeMNISTDatabase(data_base_directory); | |||
| return 0; | |||
| } | |||
| void DataSet::InitializeMNISTDatabase(std::string dpath) { | |||
| // int total_data = 0; | |||
| num_of_classes_ = 10; | |||
| // total_data += | |||
| ReadMNISTFile(dpath + "/train/train-images-idx3-ubyte", dpath + "/train/train-labels-idx1-ubyte", &train_data_); | |||
| // total_data += | |||
| ReadMNISTFile(dpath + "/test/t10k-images-idx3-ubyte", dpath + "/test/t10k-labels-idx1-ubyte", &test_data_); | |||
| } | |||
| int DataSet::ReadMNISTFile(const std::string &ifile_name, const std::string &lfile_name, | |||
| std::vector<DataLabelTuple> *dataset) { | |||
| std::ifstream lfile(lfile_name, std::ios::binary); | |||
| if (!lfile.is_open()) { | |||
| std::cerr << "Cannot open label file " << lfile_name << std::endl; | |||
| return 0; | |||
| } | |||
| std::ifstream ifile(ifile_name, std::ios::binary); | |||
| if (!ifile.is_open()) { | |||
| std::cerr << "Cannot open data file " << ifile_name << std::endl; | |||
| return 0; | |||
| } | |||
| int magic_number = 0; | |||
| lfile.read(reinterpret_cast<char *>(&magic_number), sizeof(magic_number)); | |||
| magic_number = ntohl(magic_number); | |||
| if (magic_number != 2049) { | |||
| std::cout << "Invalid MNIST label file!" << std::endl; | |||
| return 0; | |||
| } | |||
| int number_of_labels = 0; | |||
| lfile.read(reinterpret_cast<char *>(&number_of_labels), sizeof(number_of_labels)); | |||
| number_of_labels = ntohl(number_of_labels); | |||
| ifile.read(reinterpret_cast<char *>(&magic_number), sizeof(magic_number)); | |||
| magic_number = ntohl(magic_number); | |||
| if (magic_number != 2051) { | |||
| std::cout << "Invalid MNIST image file!" << std::endl; | |||
| return 0; | |||
| } | |||
| int number_of_images = 0; | |||
| ifile.read(reinterpret_cast<char *>(&number_of_images), sizeof(number_of_images)); | |||
| number_of_images = ntohl(number_of_images); | |||
| int n_rows = 0; | |||
| ifile.read(reinterpret_cast<char *>(&n_rows), sizeof(n_rows)); | |||
| n_rows = ntohl(n_rows); | |||
| int n_cols = 0; | |||
| ifile.read(reinterpret_cast<char *>(&n_cols), sizeof(n_cols)); | |||
| n_cols = ntohl(n_cols); | |||
| if (number_of_labels != number_of_images) { | |||
| std::cout << "number of records in labels and images files does not match" << std::endl; | |||
| return 0; | |||
| } | |||
| int image_size = n_rows * n_cols; | |||
| unsigned char labels[number_of_labels]; | |||
| unsigned char data[image_size]; | |||
| lfile.read(reinterpret_cast<char *>(labels), number_of_labels); | |||
| for (int i = 0; i < number_of_labels; ++i) { | |||
| std::unique_ptr<float[]> hwc_bin_image(new (std::nothrow) float[32 * 32]); | |||
| ifile.read(reinterpret_cast<char *>(data), image_size); | |||
| for (size_t r = 0; r < 32; r++) { | |||
| for (size_t c = 0; c < 32; c++) { | |||
| if (r < 2 || r > 29 || c < 2 || c > 29) | |||
| hwc_bin_image[r * 32 + c] = 0.0; | |||
| else | |||
| hwc_bin_image[r * 32 + c] = (static_cast<float>(data[(r - 2) * 28 + (c - 2)])) / 255.0; | |||
| } | |||
| } | |||
| DataLabelTuple data_entry = std::make_tuple(reinterpret_cast<char *>(hwc_bin_image.release()), labels[i]); | |||
| dataset->push_back(data_entry); | |||
| } | |||
| return number_of_labels; | |||
| } | |||
| std::vector<FileTuple> DataSet::ReadFileList(std::string dpath) { | |||
| std::vector<FileTuple> vec; | |||
| std::ifstream ifs(dpath + "/file_list.txt"); | |||
| std::string file_name; | |||
| if (ifs.is_open()) { | |||
| int label; | |||
| while (!ifs.eof()) { | |||
| ifs >> label >> file_name; | |||
| vec.push_back(make_tuple(label, file_name)); | |||
| } | |||
| } | |||
| return vec; | |||
| } | |||
| std::vector<FileTuple> DataSet::ReadDir(const std::string dpath) { | |||
| std::filesystem::directory_iterator dir(dpath); | |||
| std::vector<FileTuple> vec; | |||
| LabelId label_id; | |||
| int class_id = 0; | |||
| int class_label; | |||
| for (const auto p : dir) { | |||
| if (p.is_directory()) { | |||
| std::string path = p.path().stem().string(); | |||
| auto label = label_id.find(path); | |||
| if (label == label_id.end()) { | |||
| label_id[path] = class_id; | |||
| class_label = class_id; | |||
| class_id++; | |||
| num_of_classes_ = class_id; | |||
| } else { | |||
| class_label = label->second; | |||
| } | |||
| std::filesystem::directory_iterator ndir(dpath + "/" + path); | |||
| for (const auto np : ndir) { | |||
| if (np.path().extension().string() == ".bin") { | |||
| std::string entry = | |||
| dpath + "/" + np.path().parent_path().stem().string() + "/" + np.path().filename().string(); | |||
| FileTuple ft = make_tuple(class_label, entry); | |||
| vec.push_back(ft); | |||
| } | |||
| } | |||
| } | |||
| } | |||
| return vec; | |||
| } | |||
| @@ -0,0 +1,56 @@ | |||
| /** | |||
| * Copyright 2020 Huawei Technologies Co., Ltd | |||
| * | |||
| * Licensed under the Apache License, Version 2.0 (the "License"); | |||
| * you may not use this file except in compliance with the License. | |||
| * You may obtain a copy of the License at | |||
| * | |||
| * http://www.apache.org/licenses/LICENSE-2.0 | |||
| * | |||
| * Unless required by applicable law or agreed to in writing, software | |||
| * distributed under the License is distributed on an "AS IS" BASIS, | |||
| * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | |||
| * See the License for the specific language governing permissions and | |||
| * limitations under the License. | |||
| */ | |||
| #ifndef MODEL_ZOO_OFFICIAL_TOD_TRAIN_LENET_SRC_DATASET_H_ | |||
| #define MODEL_ZOO_OFFICIAL_TOD_TRAIN_LENET_SRC_DATASET_H_ | |||
| #include <tuple> | |||
| #include <string> | |||
| #include <vector> | |||
| using DataLabelTuple = std::tuple<char *, int>; | |||
| using FileTuple = std::tuple<int, std::string>; | |||
| enum database_type { DS_CIFAR10_BINARY = 0, DS_MNIST_BINARY, DS_OTHER }; | |||
| char *ReadFile(const std::string &file, size_t *size); // utility function | |||
| class DataSet { | |||
| public: | |||
| DataSet() {} | |||
| ~DataSet(); | |||
| int Init(const std::string &data_base_directory, database_type type = DS_OTHER); | |||
| const std::vector<DataLabelTuple> &train_data() const { return train_data_; } | |||
| const std::vector<DataLabelTuple> &test_data() const { return test_data_; } | |||
| unsigned int num_of_classes() { return num_of_classes_; } | |||
| void set_expected_data_size(unsigned int expected_data_size) { expected_data_size_ = expected_data_size; } | |||
| unsigned int expected_data_size() { return expected_data_size_; } | |||
| private: | |||
| std::vector<FileTuple> ReadFileList(std::string dpath); | |||
| std::vector<FileTuple> ReadDir(const std::string dpath); | |||
| int ReadMNISTFile(const std::string &ifile, const std::string &lfile, std::vector<DataLabelTuple> *dataset); | |||
| void InitializeMNISTDatabase(std::string dpath); | |||
| std::vector<DataLabelTuple> train_data_; | |||
| std::vector<DataLabelTuple> test_data_; | |||
| unsigned int num_of_classes_ = 0; | |||
| unsigned int expected_data_size_ = 0; | |||
| }; | |||
| #endif // MODEL_ZOO_OFFICIAL_TOD_TRAIN_LENET_SRC_DATASET_H_ | |||
| @@ -0,0 +1,247 @@ | |||
| /** | |||
| * Copyright 2020 Huawei Technologies Co., Ltd | |||
| * | |||
| * Licensed under the Apache License, Version 2.0 (the "License"); | |||
| * you may not use this file except in compliance with the License. | |||
| * You may obtain a copy of the License at | |||
| * | |||
| * http://www.apache.org/licenses/LICENSE-2.0 | |||
| * | |||
| * Unless required by applicable law or agreed to in writing, software | |||
| * distributed under the License is distributed on an "AS IS" BASIS, | |||
| * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | |||
| * See the License for the specific language governing permissions and | |||
| * limitations under the License. | |||
| */ | |||
| #include "src/net_runner.h" | |||
| #include <math.h> | |||
| #include <getopt.h> | |||
| #include <iostream> | |||
| #include <fstream> | |||
| #include "include/context.h" | |||
| unsigned int NetRunner::seed_ = time(NULL); | |||
| // Definition of callback function after forwarding operator. | |||
| bool after_callback(const std::vector<mindspore::tensor::MSTensor *> &after_inputs, | |||
| const std::vector<mindspore::tensor::MSTensor *> &after_outputs, | |||
| const mindspore::CallBackParam &call_param) { | |||
| printf("%s\n", call_param.node_name.c_str()); | |||
| for (size_t i = 0; i < after_inputs.size(); i++) { | |||
| int num2p = (after_inputs.at(i)->ElementsNum()); | |||
| printf("in%zu(%d): ", i, num2p); | |||
| if (num2p > 10) num2p = 10; | |||
| if (after_inputs.at(i)->data_type() == mindspore::kNumberTypeInt32) { | |||
| auto d = reinterpret_cast<int *>(after_inputs.at(i)->MutableData()); | |||
| for (int j = 0; j < num2p; j++) printf("%d, ", d[j]); | |||
| } else { | |||
| auto d = reinterpret_cast<float *>(after_inputs.at(i)->MutableData()); | |||
| for (int j = 0; j < num2p; j++) printf("%f, ", d[j]); | |||
| } | |||
| printf("\n"); | |||
| } | |||
| for (size_t i = 0; i < after_outputs.size(); i++) { | |||
| auto d = reinterpret_cast<float *>(after_outputs.at(i)->MutableData()); | |||
| int num2p = (after_outputs.at(i)->ElementsNum()); | |||
| printf("ou%zu(%d): ", i, num2p); | |||
| if (num2p > 10) num2p = 10; | |||
| for (int j = 0; j < num2p; j++) printf("%f, ", d[j]); | |||
| printf("\n"); | |||
| } | |||
| return true; | |||
| } | |||
| NetRunner::~NetRunner() { | |||
| if (session_ != nullptr) delete session_; | |||
| } | |||
| void NetRunner::InitAndFigureInputs() { | |||
| mindspore::lite::Context context; | |||
| context.device_list_[0].device_info_.cpu_device_info_.cpu_bind_mode_ = mindspore::lite::NO_BIND; | |||
| context.thread_num_ = 1; | |||
| session_ = mindspore::session::TrainSession::CreateSession(ms_file_, &context); | |||
| assert(nullptr != session_); | |||
| auto inputs = session_->GetInputs(); | |||
| assert(inputs.size() > 1); | |||
| data_index_ = 0; | |||
| label_index_ = 1; | |||
| batch_size_ = inputs[data_index_]->shape()[0]; | |||
| data_size_ = inputs[data_index_]->Size() / batch_size_; // in bytes | |||
| if (verbose_) { | |||
| std::cout << "data size: " << data_size_ << std::endl << "batch size: " << batch_size_ << std::endl; | |||
| } | |||
| } | |||
| mindspore::tensor::MSTensor *NetRunner::SearchOutputsForSize(size_t size) const { | |||
| auto outputs = session_->GetOutputs(); | |||
| for (auto it = outputs.begin(); it != outputs.end(); ++it) { | |||
| if (it->second->ElementsNum() == size) return it->second; | |||
| } | |||
| std::cout << "Model does not have an output tensor with size " << size << std::endl; | |||
| return nullptr; | |||
| } | |||
| std::vector<int> NetRunner::FillInputData(const std::vector<DataLabelTuple> &dataset, bool serially) const { | |||
| std::vector<int> labels_vec; | |||
| static unsigned int idx = 1; | |||
| int total_size = dataset.size(); | |||
| auto inputs = session_->GetInputs(); | |||
| char *input_data = reinterpret_cast<char *>(inputs.at(data_index_)->MutableData()); | |||
| auto labels = reinterpret_cast<float *>(inputs.at(label_index_)->MutableData()); | |||
| assert(total_size > 0); | |||
| assert(input_data != nullptr); | |||
| std::fill(labels, labels + inputs.at(label_index_)->ElementsNum(), 0.f); | |||
| for (int i = 0; i < batch_size_; i++) { | |||
| if (serially) { | |||
| idx = ++idx % total_size; | |||
| } else { | |||
| idx = rand_r(&seed_) % total_size; | |||
| } | |||
| int label = 0; | |||
| char *data = nullptr; | |||
| std::tie(data, label) = dataset[idx]; | |||
| memcpy(input_data + i * data_size_, data, data_size_); | |||
| labels[i * num_of_classes_ + label] = 1.0; // Model expects labels in onehot representation | |||
| labels_vec.push_back(label); | |||
| } | |||
| return labels_vec; | |||
| } | |||
| float NetRunner::CalculateAccuracy(int max_tests) const { | |||
| float accuracy = 0.0; | |||
| const std::vector<DataLabelTuple> test_set = ds_.test_data(); | |||
| int tests = test_set.size() / batch_size_; | |||
| if (max_tests != -1 && tests < max_tests) tests = max_tests; | |||
| session_->Eval(); | |||
| for (int i = 0; i < tests; i++) { | |||
| auto labels = FillInputData(test_set, (max_tests == -1)); | |||
| session_->RunGraph(); | |||
| auto outputsv = SearchOutputsForSize(batch_size_ * num_of_classes_); | |||
| assert(outputsv != nullptr); | |||
| auto scores = reinterpret_cast<float *>(outputsv->MutableData()); | |||
| for (int b = 0; b < batch_size_; b++) { | |||
| int max_idx = 0; | |||
| float max_score = scores[num_of_classes_ * b]; | |||
| for (int c = 0; c < num_of_classes_; c++) { | |||
| if (scores[num_of_classes_ * b + c] > max_score) { | |||
| max_score = scores[num_of_classes_ * b + c]; | |||
| max_idx = c; | |||
| } | |||
| } | |||
| if (labels[b] == max_idx) accuracy += 1.0; | |||
| } | |||
| } | |||
| session_->Train(); | |||
| accuracy /= static_cast<float>(batch_size_ * tests); | |||
| return accuracy; | |||
| } | |||
| int NetRunner::InitDB() { | |||
| if (data_size_ != 0) ds_.set_expected_data_size(data_size_); | |||
| int ret = ds_.Init(data_dir_, DS_MNIST_BINARY); | |||
| num_of_classes_ = ds_.num_of_classes(); | |||
| if (ds_.test_data().size() == 0) { | |||
| std::cout << "No relevant data was found in " << data_dir_ << std::endl; | |||
| assert(ds_.test_data().size() != 0); | |||
| } | |||
| return ret; | |||
| } | |||
| float NetRunner::GetLoss() const { | |||
| auto outputsv = SearchOutputsForSize(1); // Search for Loss which is a single value tensor | |||
| assert(outputsv != nullptr); | |||
| auto loss = reinterpret_cast<float *>(outputsv->MutableData()); | |||
| return loss[0]; | |||
| } | |||
| int NetRunner::TrainLoop() { | |||
| session_->Train(); | |||
| float min_loss = 1000.; | |||
| float max_acc = 0.; | |||
| for (int i = 0; i < cycles_; i++) { | |||
| FillInputData(ds_.train_data()); | |||
| session_->RunGraph(nullptr, verbose_ ? after_callback : nullptr); | |||
| float loss = GetLoss(); | |||
| if (min_loss > loss) min_loss = loss; | |||
| if (save_checkpoint_ != 0 && (i + 1) % save_checkpoint_ == 0) { | |||
| auto cpkt_fn = ms_file_.substr(0, ms_file_.find_last_of('.')) + "_trained_" + std::to_string(i + 1) + ".ms"; | |||
| session_->SaveToFile(cpkt_fn); | |||
| } | |||
| if ((i + 1) % 100 == 0) { | |||
| float acc = CalculateAccuracy(10); | |||
| if (max_acc < acc) max_acc = acc; | |||
| std::cout << i + 1 << ":\tLoss is " << std::setw(7) << loss << " [min=" << min_loss << "] " | |||
| << " max_acc=" << max_acc << std::endl; | |||
| } | |||
| } | |||
| return 0; | |||
| } | |||
| int NetRunner::Main() { | |||
| InitAndFigureInputs(); | |||
| InitDB(); | |||
| TrainLoop(); | |||
| float acc = CalculateAccuracy(); | |||
| std::cout << "accuracy = " << acc << std::endl; | |||
| if (cycles_ > 0) { | |||
| auto trained_fn = ms_file_.substr(0, ms_file_.find_last_of('.')) + "_trained_" + std::to_string(cycles_) + ".ms"; | |||
| session_->SaveToFile(trained_fn); | |||
| } | |||
| return 0; | |||
| } | |||
| void NetRunner::Usage() { | |||
| std::cout << "Usage: net_runner -f <.ms model file> -d <data_dir> [-c <num of training cycles>] " | |||
| << "[-v (verbose mode)] [-s <save checkpoint every X iterations>]" << std::endl; | |||
| } | |||
| bool NetRunner::ReadArgs(int argc, char *argv[]) { | |||
| int opt; | |||
| while ((opt = getopt(argc, argv, "f:e:d:s:ihc:v")) != -1) { | |||
| switch (opt) { | |||
| case 'f': | |||
| ms_file_ = std::string(optarg); | |||
| break; | |||
| case 'e': | |||
| cycles_ = atoi(optarg); | |||
| break; | |||
| case 'd': | |||
| data_dir_ = std::string(optarg); | |||
| break; | |||
| case 'v': | |||
| verbose_ = true; | |||
| break; | |||
| case 's': | |||
| save_checkpoint_ = atoi(optarg); | |||
| break; | |||
| case 'h': | |||
| default: | |||
| Usage(); | |||
| return false; | |||
| } | |||
| } | |||
| return true; | |||
| } | |||
| int main(int argc, char **argv) { | |||
| NetRunner nr; | |||
| if (nr.ReadArgs(argc, argv)) { | |||
| nr.Main(); | |||
| } else { | |||
| return -1; | |||
| } | |||
| return 0; | |||
| } | |||
| @@ -0,0 +1,61 @@ | |||
| /** | |||
| * Copyright 2020 Huawei Technologies Co., Ltd | |||
| * | |||
| * Licensed under the Apache License, Version 2.0 (the "License"); | |||
| * you may not use this file except in compliance with the License. | |||
| * You may obtain a copy of the License at | |||
| * | |||
| * http://www.apache.org/licenses/LICENSE-2.0 | |||
| * | |||
| * Unless required by applicable law or agreed to in writing, software | |||
| * distributed under the License is distributed on an "AS IS" BASIS, | |||
| * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | |||
| * See the License for the specific language governing permissions and | |||
| * limitations under the License. | |||
| */ | |||
| #ifndef MODEL_ZOO_OFFICIAL_TOD_TRAIN_LENET_SRC_NET_RUNNER_H_ | |||
| #define MODEL_ZOO_OFFICIAL_TOD_TRAIN_LENET_SRC_NET_RUNNER_H_ | |||
| #include <tuple> | |||
| #include <filesystem> | |||
| #include <map> | |||
| #include <vector> | |||
| #include <string> | |||
| #include "include/train_session.h" | |||
| #include "include/ms_tensor.h" | |||
| #include "src/dataset.h" | |||
| class NetRunner { | |||
| public: | |||
| int Main(); | |||
| bool ReadArgs(int argc, char *argv[]); | |||
| ~NetRunner(); | |||
| private: | |||
| void Usage(); | |||
| void InitAndFigureInputs(); | |||
| int InitDB(); | |||
| int TrainLoop(); | |||
| std::vector<int> FillInputData(const std::vector<DataLabelTuple> &dataset, bool is_train_set = false) const; | |||
| float CalculateAccuracy(int max_tests = -1) const; | |||
| float GetLoss() const; | |||
| mindspore::tensor::MSTensor *SearchOutputsForSize(size_t size) const; | |||
| DataSet ds_; | |||
| mindspore::session::TrainSession *session_ = nullptr; | |||
| std::string ms_file_ = ""; | |||
| std::string data_dir_ = ""; | |||
| size_t data_size_ = 0; | |||
| size_t batch_size_ = 0; | |||
| unsigned int cycles_ = 100; | |||
| int data_index_ = 0; | |||
| int label_index_ = -1; | |||
| int num_of_classes_ = 0; | |||
| bool verbose_ = false; | |||
| int save_checkpoint_ = 0; | |||
| static unsigned int seed_; | |||
| }; | |||
| #endif // MODEL_ZOO_OFFICIAL_TOD_TRAIN_LENET_SRC_NET_RUNNER_H_ | |||