@@ -0,0 +1,134 @@
# Contents
<!-- TOC -->
- [Overview](#overview)
- [Model Architecture](#model-architecture)
- [Dataset](#dataset)
- [Environment Requirements](#environment-requirements)
- [Quick Start](#quick-start)
- [Script Detailed Description](#script-detailed-description)
<!-- /TOC -->
# Overview
This folder holds code for Training-on-Device (ToD) of a LeNet model. Part of the code runs on a server using the MindSpore infrastructure, another part uses the MindSpore Lite conversion utility, and the last part is the actual training of the model on an Android-based device.
# Model Architecture
LeNet is a very simple network composed of only 5 layers: 2 convolutional layers and 3 fully connected layers. Such a small network can be fully trained (from scratch) on a device in a short time, which makes it a good example.
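To get a feel for why this network trains quickly on-device, the parameter count can be tallied by hand. The layer sizes below are the classic LeNet-5 configuration and are an assumption here; the authoritative definition is the `LeNet5` class imported by `lenet_export.py`.

```python
# Assumed classic LeNet-5 shapes: 6/16 conv channels, 120/84/10 FC units.
def conv_params(in_ch, out_ch, k):
    """Weights + biases of a k x k convolution."""
    return in_ch * out_ch * k * k + out_ch

def fc_params(in_dim, out_dim):
    """Weights + biases of a fully connected layer."""
    return in_dim * out_dim + out_dim

layers = [
    ("conv1", conv_params(1, 6, 5)),      # 1x32x32 input -> 6 feature maps
    ("conv2", conv_params(6, 16, 5)),     # after 2x2 pooling
    ("fc1", fc_params(16 * 5 * 5, 120)),  # 16x5x5 flattened
    ("fc2", fc_params(120, 84)),
    ("fc3", fc_params(84, 10)),           # 10 MNIST classes
]
total = sum(n for _, n in layers)
print(total)  # -> 61706 (~62k parameters, tiny by modern standards)
```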
# Dataset
In this example we use the MNIST dataset of handwritten digits as published in [THE MNIST DATABASE](<http://yann.lecun.com/exdb/mnist/>)
- Dataset size: 52.4M, 70,000 28*28 grayscale images in 10 classes
    - Test: 10,000 images
    - Train: 60,000 images
- Data format: binary files
    - Note: data will be processed in dataset.cc
- The dataset directory structure is as follows:

```
mnist/
├── test
│   ├── t10k-images-idx3-ubyte
│   └── t10k-labels-idx1-ubyte
└── train
    ├── train-images-idx3-ubyte
    └── train-labels-idx1-ubyte
```
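The idx files above have a simple big-endian header (magic number, item count, and, for images, row/column counts) followed by raw bytes; `dataset.cc` parses them the same way and additionally pads each 28*28 image to 32*32 and scales pixels to [0, 1]. A minimal sketch of the image-file header parsing:

```python
import struct

def read_idx_images(buf):
    """Parse an MNIST idx3 image buffer: 16-byte big-endian header, then raw pixels."""
    magic, count, rows, cols = struct.unpack(">iiii", buf[:16])
    assert magic == 2051, "not an MNIST image file"  # 2049 marks a label file
    return [buf[16 + i * rows * cols: 16 + (i + 1) * rows * cols]
            for i in range(count)]

# Synthetic two-image file: header + 2 * 28*28 zero pixels
fake = struct.pack(">iiii", 2051, 2, 28, 28) + bytes(2 * 28 * 28)
imgs = read_idx_images(fake)
print(len(imgs), len(imgs[0]))  # -> 2 784
```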
# Environment Requirements
- Server side
    - [MindSpore Framework](https://www.mindspore.cn/install/en): it is recommended to install a docker image
    - [MindSpore ToD Framework](https://www.mindspore.cn/tutorial/tod/en/use/prparation.html)
    - [Android NDK r20b](https://dl.google.com/android/repository/android-ndk-r20b-linux-x86_64.zip)
    - [Android SDK](https://developer.android.com/studio?hl=zh-cn#cmdline-tools)
- A connected Android device
# Quick Start
After installing all the above, the script in the home directory can be run with the following arguments:

```bash
sh ./prepare_and_run.sh DATASET_PATH [MINDSPORE_DOCKER] [RELEASE.tar.gz]
```

where:
- DATASET_PATH is the path to the [dataset](#dataset),
- MINDSPORE_DOCKER is the image name of the docker that runs [MindSpore](#environment-requirements). If not provided, MindSpore will be run locally,
- and RELEASE.tar.gz is a pointer to the MindSpore ToD release tarball. If not provided, the script will attempt to find the MindSpore ToD compilation output.
# Script Detailed Description
The provided `prepare_and_run.sh` script performs the following steps:
- Prepares the trainable lenet model in the `.ms` format
- Prepares the folder that should be pushed to the device
- Copies this folder to the device and runs the scripts on the device

See how to run the script and its parameter definitions in the [Quick Start Section](#quick-start)
## Preparing the model
Within the model folder, a `prepare_model.sh` script uses the MindSpore infrastructure to export the model into a `.mindir` file. The user can specify a docker image on which MindSpore is installed. Otherwise, the Python script will be run locally.
The script then converts the `.mindir` to the `.ms` format using the MindSpore ToD converter.
The script accepts a tarball where the converter resides. Otherwise, the script will attempt to find the converter in the MindSpore ToD build output directory.
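The conversion stage can be sketched as the command below; the flags mirror the `converter_lite` invocation in `prepare_model.sh`, while the helper name and the guard around actually running it are illustrative assumptions.

```python
import shutil
import subprocess

def convert_to_ms(converter="converter_lite", mindir="lenet_tod.mindir"):
    """Build the MindSpore ToD converter command line (flags as in prepare_model.sh)."""
    return [converter,
            "--fmk=MINDIR",            # input framework format
            "--trainModel=true",       # keep the graph trainable on-device
            f"--modelFile={mindir}",
            "--outputFile=lenet_tod"]  # the converter appends the .ms suffix

cmd = convert_to_ms()
print(" ".join(cmd))
# Only invoke the converter when it is actually installed on this machine.
if shutil.which(cmd[0]):
    subprocess.run(cmd, check=True)
```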
## Preparing the Folder
The `lenet_tod.ms` model file is then copied into the `package` folder, along with the scripts, the MindSpore ToD library and the MNIST dataset.
Finally, the code (in src) is compiled for arm64 and the binary is copied into the `package` folder.
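A condensed sketch of the packaging step is shown below. The directory names follow the `package` tree in this README; the real work (copying build and conversion outputs) is done by `prepare_and_run.sh`, so the empty placeholder files here are an assumption for illustration.

```python
import pathlib
import tempfile

def build_package(root):
    """Lay out the on-device 'package' folder structure pushed by adb."""
    pkg = pathlib.Path(root) / "package"
    for sub in ("model", "lib", "bin", "dataset"):
        (pkg / sub).mkdir(parents=True)
    # In the real script these are copies of the actual artifacts.
    (pkg / "model" / "lenet_tod.ms").touch()
    (pkg / "lib" / "libmindspore-lite.so").touch()
    (pkg / "bin" / "net_runner").touch()
    for script in ("train.sh", "eval.sh"):
        (pkg / script).touch()
    return pkg

with tempfile.TemporaryDirectory() as tmp:
    names = sorted(p.name for p in build_package(tmp).iterdir())
print(names)  # -> ['bin', 'dataset', 'eval.sh', 'lib', 'model', 'train.sh']
```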
### Running the code on the device
To run the code on the device, the script first uses the `adb` tool to push the `package` folder to the device. It then runs training (which takes some time) and finally runs evaluation of the trained model using the test data.
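The push/train/eval sequence amounts to the following commands (composed but not executed here; the device path and shell invocations mirror `prepare_and_run.sh` and the `scripts/run_*.sh` helpers):

```python
def device_commands(package="package", dest="/data/local/tmp/"):
    """adb command lines mirroring prepare_and_run.sh and scripts/run_*.sh."""
    push = ["adb", "push", package, dest]
    # run_train.sh / run_eval.sh are piped into `adb shell`; the net effect is:
    train = ["adb", "shell", f"cd {dest}package && /system/bin/sh train.sh"]
    evaluate = ["adb", "shell", f"cd {dest}package && /system/bin/sh eval.sh"]
    return [push, train, evaluate]

for cmd in device_commands():
    print(" ".join(cmd))
```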
# Folder Directory Tree

```
train_lenet/
├── Makefile                      # Makefile of src code
├── model
│   ├── lenet_export.py           # Python script that exports the LeNet model to .mindir
│   ├── prepare_model.sh          # script that exports the model (using docker) and then converts it
│   └── train_utils.py            # utility functions used during the export
├── prepare_and_run.sh            # main script that creates the model, compiles it and sends it to the device
├── README.md                     # this manual
├── scripts
│   ├── eval.sh                   # on-device script that loads the trained model and evaluates its accuracy
│   ├── run_eval.sh               # adb script that launches eval.sh
│   ├── run_train.sh              # adb script that launches train.sh
│   └── train.sh                  # on-device script that loads the initial model and trains it
├── src
│   ├── dataset.cc                # dataset handler
│   ├── dataset.h                 # dataset class header
│   ├── net_runner.cc             # program that runs training/evaluation of models
│   └── net_runner.h              # net_runner header
```

When the `prepare_and_run.sh` script is run, the following folder is prepared. It is pushed to the device and training then runs:

```
├── package
│   ├── bin
│   │   └── net_runner            # the executable that performs the training/evaluation
│   ├── dataset
│   │   ├── test
│   │   │   ├── t10k-images-idx3-ubyte   # test images
│   │   │   └── t10k-labels-idx1-ubyte   # test labels
│   │   └── train
│   │       ├── train-images-idx3-ubyte  # train images
│   │       └── train-labels-idx1-ubyte  # train labels
│   ├── eval.sh                   # on-device script that loads the trained model and evaluates its accuracy
│   ├── lib
│   │   └── libmindspore-lite.so  # MindSpore Lite library
│   ├── model
│   │   └── lenet_tod.ms          # model to train
│   └── train.sh                  # on-device script that loads the initial model and trains it
```
@@ -0,0 +1,37 @@
# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""lenet_export."""
import sys

import numpy as np

from mindspore import context, Tensor
import mindspore.common.dtype as mstype
from mindspore.train.serialization import export
from train_utils import TrainWrap

sys.path.append('../../../cv/lenet/src/')
from lenet import LeNet5  # noqa: E402  # LeNet5 lives outside this folder, hence the late import

n = LeNet5()
n.set_train()
context.set_context(mode=context.PYNATIVE_MODE, device_target="GPU", save_graphs=False)

batch_size = 32
x = Tensor(np.ones((batch_size, 1, 32, 32)), mstype.float32)
label = Tensor(np.zeros([batch_size, 10]).astype(np.float32))
net = TrainWrap(n)
export(net, x, label, file_name="lenet_tod.mindir", file_format='MINDIR')
print("finished exporting")
| @@ -0,0 +1,24 @@ | |||
| CONVERTER="../../../../../mindspore/lite/build/tools/converter/converter_lite" | |||
| if [ ! -f "$CONVERTER" ]; then | |||
| if ! command -v converter_lite &> /dev/null | |||
| then | |||
| echo "converter_lite could not be found in MindSpore build directory nor in system path" | |||
| exit | |||
| else | |||
| CONVERTER=converter_lite | |||
| fi | |||
| fi | |||
| echo "============Exporting==========" | |||
| if [ -n "$1" ]; then | |||
| DOCKER_IMG=$1 | |||
| docker run -w $PWD --runtime=nvidia -v /home/$USER:/home/$USER --privileged=true ${DOCKER_IMG} /bin/bash -c "python lenet_export.py; chmod 444 lenet_tod.mindir; rm -rf __pycache__" | |||
| else | |||
| echo "MindSpore docker was not provided, attempting to run locally" | |||
| python lenet_export.py | |||
| fi | |||
| echo "============Converting=========" | |||
| $CONVERTER --fmk=MINDIR --trainModel=true --modelFile=lenet_tod.mindir --outputFile=lenet_tod | |||
| @@ -0,0 +1,34 @@ | |||
| # Copyright 2020 Huawei Technologies Co., Ltd | |||
| # | |||
| # Licensed under the Apache License, Version 2.0 (the "License"); | |||
| # you may not use this file except in compliance with the License. | |||
| # You may obtain a copy of the License at | |||
| # | |||
| # http://www.apache.org/licenses/LICENSE-2.0 | |||
| # | |||
| # Unless required by applicable law or agreed to in writing, software | |||
| # distributed under the License is distributed on an "AS IS" BASIS, | |||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | |||
| # See the License for the specific language governing permissions and | |||
| # limitations under the License. | |||
| # ============================================================================ | |||
| """train_utils.""" | |||
| import mindspore.nn as nn | |||
| from mindspore.common.parameter import ParameterTuple | |||
| def TrainWrap(net, loss_fn=None, optimizer=None, weights=None): | |||
| """ | |||
| TrainWrap | |||
| """ | |||
| if loss_fn is None: | |||
| loss_fn = nn.SoftmaxCrossEntropyWithLogits() | |||
| loss_net = nn.WithLossCell(net, loss_fn) | |||
| loss_net.set_train() | |||
| if weights is None: | |||
| weights = ParameterTuple(net.trainable_params()) | |||
| if optimizer is None: | |||
| optimizer = nn.Adam(weights, learning_rate=1e-3, beta1=0.9, beta2=0.999, eps=1e-8, use_locking=False, | |||
| use_nesterov=False, weight_decay=0.0, loss_scale=1.0) | |||
| train_net = nn.TrainOneStepCell(loss_net, optimizer) | |||
| return train_net | |||
| @@ -0,0 +1,82 @@ | |||
| #!/bin/bash | |||
| display_usage() { | |||
| echo -e "\nUsage: prepare_and_run.sh dataset_path [mindspore_docker] [release.tar.gz]\n" | |||
| } | |||
| if [ -n "$1" ]; then | |||
| MNIST_DATA_PATH=$1 | |||
| else | |||
| echo "MNIST Dataset directory path was not provided" | |||
| display_usage | |||
| exit 0 | |||
| fi | |||
| if [ -n "$2" ]; then | |||
| DOCKER=$2 | |||
| else | |||
| DOCKER="" | |||
| #echo "MindSpore docker was not provided" | |||
| #display_usage | |||
| #exit 0 | |||
| fi | |||
| if [ -n "$3" ]; then | |||
| TARBALL=$3 | |||
| else | |||
| if [ -f ../../../../output/mindspore-lite-*-runtime-arm64-cpu-train.tar.gz ]; then | |||
| TARBALL="../../../../output/mindspore-lite-*-runtime-arm64-cpu-train.tar.gz" | |||
| else | |||
| echo "release.tar.gz was not found" | |||
| display_usage | |||
| exit 0 | |||
| fi | |||
| fi | |||
| # Prepare the model | |||
| cd model/ | |||
| rm -f *.ms | |||
| ./prepare_model.sh $DOCKER | |||
| cd - | |||
| # Copy the .ms model to the package folder | |||
| rm -rf package | |||
| mkdir -p package/model | |||
| cp model/*.ms package/model | |||
| # Copy the running script to the package | |||
| cp scripts/train.sh package/ | |||
| cp scripts/eval.sh package/ | |||
| # Copy the shared MindSpore ToD library | |||
| tar -xzvf ${TARBALL} --wildcards --no-anchored libmindspore-lite.so | |||
| tar -xzvf ${TARBALL} --wildcards --no-anchored include | |||
| mv mindspore-*/lib package/ | |||
| mkdir msl | |||
| mv mindspore-*/* msl/ | |||
| rm -rf mindspore-* | |||
| # Copy the dataset to the package | |||
| cp -r ${MNIST_DATA_PATH} package/dataset | |||
| # Compile program | |||
| make TARGET=arm64 | |||
| # Copy the executable to the package | |||
| mv bin package/ | |||
| # Push the folder to the device | |||
| adb push package /data/local/tmp/ | |||
| echo "Training on Device" | |||
| adb shell < scripts/run_train.sh | |||
| echo | |||
| echo "Load trained model and evaluate accuracy" | |||
| adb shell < scripts/run_eval.sh | |||
| echo | |||
| #rm -rf src/*.o package model/__pycache__ model/*.ms | |||
| #./prepare_and_run.sh /opt/share/dataset/mnist mindspore_dev:5 | |||
| @@ -0,0 +1,19 @@ | |||
| #!/bin/bash | |||
| # Copyright 2020 Huawei Technologies Co., Ltd | |||
| # | |||
| # Licensed under the Apache License, Version 2.0 (the "License"); | |||
| # you may not use this file except in compliance with the License. | |||
| # You may obtain a copy of the License at | |||
| # | |||
| # http://www.apache.org/licenses/LICENSE-2.0 | |||
| # | |||
| # Unless required by applicable law or agreed to in writing, software | |||
| # distributed under the License is distributed on an "AS IS" BASIS, | |||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | |||
| # See the License for the specific language governing permissions and | |||
| # limitations under the License. | |||
| # ============================================================================ | |||
| # an simple tutorial as follows, more parameters can be setting | |||
| DATA_PATH=$1 | |||
| LD_LIBRARY_PATH=./lib/ bin/net_runner -f model/lenet_tod_trained_3000.ms -e 0 -d dataset | |||
| @@ -0,0 +1,2 @@ | |||
| cd /data/local/tmp/package | |||
| /system/bin/sh eval.sh | |||
| @@ -0,0 +1,2 @@ | |||
| cd /data/local/tmp/package | |||
| /system/bin/sh train.sh | |||
| @@ -0,0 +1,21 @@ | |||
| #!/bin/bash | |||
| # Copyright 2020 Huawei Technologies Co., Ltd | |||
| # | |||
| # Licensed under the Apache License, Version 2.0 (the "License"); | |||
| # you may not use this file except in compliance with the License. | |||
| # You may obtain a copy of the License at | |||
| # | |||
| # http://www.apache.org/licenses/LICENSE-2.0 | |||
| # | |||
| # Unless required by applicable law or agreed to in writing, software | |||
| # distributed under the License is distributed on an "AS IS" BASIS, | |||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | |||
| # See the License for the specific language governing permissions and | |||
| # limitations under the License. | |||
| # ============================================================================ | |||
| # an simple tutorial as follows, more parameters can be setting | |||
| script_self=$(readlink -f "$0") | |||
| self_path=$(dirname "${script_self}") | |||
| DATA_PATH=$1 | |||
| LD_LIBRARY_PATH=./lib/ bin/net_runner -f model/lenet_tod.ms -e 3000 -d dataset | |||
| @@ -0,0 +1,200 @@ | |||
| /** | |||
| * Copyright 2020 Huawei Technologies Co., Ltd | |||
| * | |||
| * Licensed under the Apache License, Version 2.0 (the "License"); | |||
| * you may not use this file except in compliance with the License. | |||
| * You may obtain a copy of the License at | |||
| * | |||
| * http://www.apache.org/licenses/LICENSE-2.0 | |||
| * | |||
| * Unless required by applicable law or agreed to in writing, software | |||
| * distributed under the License is distributed on an "AS IS" BASIS, | |||
| * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | |||
| * See the License for the specific language governing permissions and | |||
| * limitations under the License. | |||
| */ | |||
| #include "src/dataset.h" | |||
| #include <assert.h> | |||
| #include <arpa/inet.h> | |||
| #include <map> | |||
| #include <iostream> | |||
| #include <fstream> | |||
| #include <memory> | |||
| #include <filesystem> | |||
| using LabelId = std::map<std::string, int>; | |||
| char *ReadFile(const std::string &file, size_t *size) { | |||
| assert(size != nullptr); | |||
| std::string realPath(file); | |||
| std::ifstream ifs(realPath); | |||
| if (!ifs.good()) { | |||
| std::cerr << "file: " << realPath << " does not exist"; | |||
| return nullptr; | |||
| } | |||
| if (!ifs.is_open()) { | |||
| std::cerr << "file: " << realPath << " open failed"; | |||
| return nullptr; | |||
| } | |||
| ifs.seekg(0, std::ios::end); | |||
| *size = ifs.tellg(); | |||
| std::unique_ptr<char[]> buf(new (std::nothrow) char[*size]); | |||
| if (buf == nullptr) { | |||
| std::cerr << "malloc buf failed, file: " << realPath; | |||
| ifs.close(); | |||
| return nullptr; | |||
| } | |||
| ifs.seekg(0, std::ios::beg); | |||
| ifs.read(buf.get(), *size); | |||
| ifs.close(); | |||
| return buf.release(); | |||
| } | |||
| DataSet::~DataSet() { | |||
| for (auto itr = train_data_.begin(); itr != train_data_.end(); ++itr) { | |||
| auto ptr = std::get<0>(*itr); | |||
| delete[] ptr; | |||
| } | |||
| for (auto itr = test_data_.begin(); itr != test_data_.end(); ++itr) { | |||
| auto ptr = std::get<0>(*itr); | |||
| delete[] ptr; | |||
| } | |||
| } | |||
| int DataSet::Init(const std::string &data_base_directory, database_type type) { | |||
| InitializeMNISTDatabase(data_base_directory); | |||
| return 0; | |||
| } | |||
| void DataSet::InitializeMNISTDatabase(std::string dpath) { | |||
| // int total_data = 0; | |||
| num_of_classes_ = 10; | |||
| // total_data += | |||
| ReadMNISTFile(dpath + "/train/train-images-idx3-ubyte", dpath + "/train/train-labels-idx1-ubyte", &train_data_); | |||
| // total_data += | |||
| ReadMNISTFile(dpath + "/test/t10k-images-idx3-ubyte", dpath + "/test/t10k-labels-idx1-ubyte", &test_data_); | |||
| } | |||
| int DataSet::ReadMNISTFile(const std::string &ifile_name, const std::string &lfile_name, | |||
| std::vector<DataLabelTuple> *dataset) { | |||
| std::ifstream lfile(lfile_name, std::ios::binary); | |||
| if (!lfile.is_open()) { | |||
| std::cerr << "Cannot open label file " << lfile_name << std::endl; | |||
| return 0; | |||
| } | |||
| std::ifstream ifile(ifile_name, std::ios::binary); | |||
| if (!ifile.is_open()) { | |||
| std::cerr << "Cannot open data file " << ifile_name << std::endl; | |||
| return 0; | |||
| } | |||
| int magic_number = 0; | |||
| lfile.read(reinterpret_cast<char *>(&magic_number), sizeof(magic_number)); | |||
| magic_number = ntohl(magic_number); | |||
| if (magic_number != 2049) { | |||
| std::cout << "Invalid MNIST label file!" << std::endl; | |||
| return 0; | |||
| } | |||
| int number_of_labels = 0; | |||
| lfile.read(reinterpret_cast<char *>(&number_of_labels), sizeof(number_of_labels)); | |||
| number_of_labels = ntohl(number_of_labels); | |||
| ifile.read(reinterpret_cast<char *>(&magic_number), sizeof(magic_number)); | |||
| magic_number = ntohl(magic_number); | |||
| if (magic_number != 2051) { | |||
| std::cout << "Invalid MNIST image file!" << std::endl; | |||
| return 0; | |||
| } | |||
| int number_of_images = 0; | |||
| ifile.read(reinterpret_cast<char *>(&number_of_images), sizeof(number_of_images)); | |||
| number_of_images = ntohl(number_of_images); | |||
| int n_rows = 0; | |||
| ifile.read(reinterpret_cast<char *>(&n_rows), sizeof(n_rows)); | |||
| n_rows = ntohl(n_rows); | |||
| int n_cols = 0; | |||
| ifile.read(reinterpret_cast<char *>(&n_cols), sizeof(n_cols)); | |||
| n_cols = ntohl(n_cols); | |||
| if (number_of_labels != number_of_images) { | |||
| std::cout << "number of records in labels and images files does not match" << std::endl; | |||
| return 0; | |||
| } | |||
| int image_size = n_rows * n_cols; | |||
| unsigned char labels[number_of_labels]; | |||
| unsigned char data[image_size]; | |||
| lfile.read(reinterpret_cast<char *>(labels), number_of_labels); | |||
| for (int i = 0; i < number_of_labels; ++i) { | |||
| std::unique_ptr<float[]> hwc_bin_image(new (std::nothrow) float[32 * 32]); | |||
| ifile.read(reinterpret_cast<char *>(data), image_size); | |||
| for (size_t r = 0; r < 32; r++) { | |||
| for (size_t c = 0; c < 32; c++) { | |||
| if (r < 2 || r > 29 || c < 2 || c > 29) | |||
| hwc_bin_image[r * 32 + c] = 0.0; | |||
| else | |||
| hwc_bin_image[r * 32 + c] = (static_cast<float>(data[(r - 2) * 28 + (c - 2)])) / 255.0; | |||
| } | |||
| } | |||
| DataLabelTuple data_entry = std::make_tuple(reinterpret_cast<char *>(hwc_bin_image.release()), labels[i]); | |||
| dataset->push_back(data_entry); | |||
| } | |||
| return number_of_labels; | |||
| } | |||
| std::vector<FileTuple> DataSet::ReadFileList(std::string dpath) { | |||
| std::vector<FileTuple> vec; | |||
| std::ifstream ifs(dpath + "/file_list.txt"); | |||
| std::string file_name; | |||
| if (ifs.is_open()) { | |||
| int label; | |||
| while (!ifs.eof()) { | |||
| ifs >> label >> file_name; | |||
| vec.push_back(make_tuple(label, file_name)); | |||
| } | |||
| } | |||
| return vec; | |||
| } | |||
| std::vector<FileTuple> DataSet::ReadDir(const std::string dpath) { | |||
| std::filesystem::directory_iterator dir(dpath); | |||
| std::vector<FileTuple> vec; | |||
| LabelId label_id; | |||
| int class_id = 0; | |||
| int class_label; | |||
| for (const auto p : dir) { | |||
| if (p.is_directory()) { | |||
| std::string path = p.path().stem().string(); | |||
| auto label = label_id.find(path); | |||
| if (label == label_id.end()) { | |||
| label_id[path] = class_id; | |||
| class_label = class_id; | |||
| class_id++; | |||
| num_of_classes_ = class_id; | |||
| } else { | |||
| class_label = label->second; | |||
| } | |||
| std::filesystem::directory_iterator ndir(dpath + "/" + path); | |||
| for (const auto np : ndir) { | |||
| if (np.path().extension().string() == ".bin") { | |||
| std::string entry = | |||
| dpath + "/" + np.path().parent_path().stem().string() + "/" + np.path().filename().string(); | |||
| FileTuple ft = make_tuple(class_label, entry); | |||
| vec.push_back(ft); | |||
| } | |||
| } | |||
| } | |||
| } | |||
| return vec; | |||
| } | |||
| @@ -0,0 +1,56 @@ | |||
| /** | |||
| * Copyright 2020 Huawei Technologies Co., Ltd | |||
| * | |||
| * Licensed under the Apache License, Version 2.0 (the "License"); | |||
| * you may not use this file except in compliance with the License. | |||
| * You may obtain a copy of the License at | |||
| * | |||
| * http://www.apache.org/licenses/LICENSE-2.0 | |||
| * | |||
| * Unless required by applicable law or agreed to in writing, software | |||
| * distributed under the License is distributed on an "AS IS" BASIS, | |||
| * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | |||
| * See the License for the specific language governing permissions and | |||
| * limitations under the License. | |||
| */ | |||
| #ifndef MODEL_ZOO_OFFICIAL_TOD_TRAIN_LENET_SRC_DATASET_H_ | |||
| #define MODEL_ZOO_OFFICIAL_TOD_TRAIN_LENET_SRC_DATASET_H_ | |||
| #include <tuple> | |||
| #include <string> | |||
| #include <vector> | |||
| using DataLabelTuple = std::tuple<char *, int>; | |||
| using FileTuple = std::tuple<int, std::string>; | |||
| enum database_type { DS_CIFAR10_BINARY = 0, DS_MNIST_BINARY, DS_OTHER }; | |||
| char *ReadFile(const std::string &file, size_t *size); // utility function | |||
| class DataSet { | |||
| public: | |||
| DataSet() {} | |||
| ~DataSet(); | |||
| int Init(const std::string &data_base_directory, database_type type = DS_OTHER); | |||
| const std::vector<DataLabelTuple> &train_data() const { return train_data_; } | |||
| const std::vector<DataLabelTuple> &test_data() const { return test_data_; } | |||
| unsigned int num_of_classes() { return num_of_classes_; } | |||
| void set_expected_data_size(unsigned int expected_data_size) { expected_data_size_ = expected_data_size; } | |||
| unsigned int expected_data_size() { return expected_data_size_; } | |||
| private: | |||
| std::vector<FileTuple> ReadFileList(std::string dpath); | |||
| std::vector<FileTuple> ReadDir(const std::string dpath); | |||
| int ReadMNISTFile(const std::string &ifile, const std::string &lfile, std::vector<DataLabelTuple> *dataset); | |||
| void InitializeMNISTDatabase(std::string dpath); | |||
| std::vector<DataLabelTuple> train_data_; | |||
| std::vector<DataLabelTuple> test_data_; | |||
| unsigned int num_of_classes_ = 0; | |||
| unsigned int expected_data_size_ = 0; | |||
| }; | |||
| #endif // MODEL_ZOO_OFFICIAL_TOD_TRAIN_LENET_SRC_DATASET_H_ | |||
| @@ -0,0 +1,247 @@ | |||
| /** | |||
| * Copyright 2020 Huawei Technologies Co., Ltd | |||
| * | |||
| * Licensed under the Apache License, Version 2.0 (the "License"); | |||
| * you may not use this file except in compliance with the License. | |||
| * You may obtain a copy of the License at | |||
| * | |||
| * http://www.apache.org/licenses/LICENSE-2.0 | |||
| * | |||
| * Unless required by applicable law or agreed to in writing, software | |||
| * distributed under the License is distributed on an "AS IS" BASIS, | |||
| * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | |||
| * See the License for the specific language governing permissions and | |||
| * limitations under the License. | |||
| */ | |||
| #include "src/net_runner.h" | |||
| #include <math.h> | |||
| #include <getopt.h> | |||
| #include <iostream> | |||
| #include <fstream> | |||
| #include "include/context.h" | |||
| unsigned int NetRunner::seed_ = time(NULL); | |||
| // Definition of callback function after forwarding operator. | |||
| bool after_callback(const std::vector<mindspore::tensor::MSTensor *> &after_inputs, | |||
| const std::vector<mindspore::tensor::MSTensor *> &after_outputs, | |||
| const mindspore::CallBackParam &call_param) { | |||
| printf("%s\n", call_param.node_name.c_str()); | |||
| for (size_t i = 0; i < after_inputs.size(); i++) { | |||
| int num2p = (after_inputs.at(i)->ElementsNum()); | |||
| printf("in%zu(%d): ", i, num2p); | |||
| if (num2p > 10) num2p = 10; | |||
| if (after_inputs.at(i)->data_type() == mindspore::kNumberTypeInt32) { | |||
| auto d = reinterpret_cast<int *>(after_inputs.at(i)->MutableData()); | |||
| for (int j = 0; j < num2p; j++) printf("%d, ", d[j]); | |||
| } else { | |||
| auto d = reinterpret_cast<float *>(after_inputs.at(i)->MutableData()); | |||
| for (int j = 0; j < num2p; j++) printf("%f, ", d[j]); | |||
| } | |||
| printf("\n"); | |||
| } | |||
| for (size_t i = 0; i < after_outputs.size(); i++) { | |||
| auto d = reinterpret_cast<float *>(after_outputs.at(i)->MutableData()); | |||
| int num2p = (after_outputs.at(i)->ElementsNum()); | |||
| printf("ou%zu(%d): ", i, num2p); | |||
| if (num2p > 10) num2p = 10; | |||
| for (int j = 0; j < num2p; j++) printf("%f, ", d[j]); | |||
| printf("\n"); | |||
| } | |||
| return true; | |||
| } | |||
| NetRunner::~NetRunner() { | |||
| if (session_ != nullptr) delete session_; | |||
| } | |||
| void NetRunner::InitAndFigureInputs() { | |||
| mindspore::lite::Context context; | |||
| context.device_list_[0].device_info_.cpu_device_info_.cpu_bind_mode_ = mindspore::lite::NO_BIND; | |||
| context.thread_num_ = 1; | |||
| session_ = mindspore::session::TrainSession::CreateSession(ms_file_, &context); | |||
| assert(nullptr != session_); | |||
| auto inputs = session_->GetInputs(); | |||
| assert(inputs.size() > 1); | |||
| data_index_ = 0; | |||
| label_index_ = 1; | |||
| batch_size_ = inputs[data_index_]->shape()[0]; | |||
| data_size_ = inputs[data_index_]->Size() / batch_size_; // in bytes | |||
| if (verbose_) { | |||
| std::cout << "data size: " << data_size_ << std::endl << "batch size: " << batch_size_ << std::endl; | |||
| } | |||
| } | |||
| mindspore::tensor::MSTensor *NetRunner::SearchOutputsForSize(size_t size) const { | |||
| auto outputs = session_->GetOutputs(); | |||
| for (auto it = outputs.begin(); it != outputs.end(); ++it) { | |||
| if (it->second->ElementsNum() == size) return it->second; | |||
| } | |||
| std::cout << "Model does not have an output tensor with size " << size << std::endl; | |||
| return nullptr; | |||
| } | |||
| std::vector<int> NetRunner::FillInputData(const std::vector<DataLabelTuple> &dataset, bool serially) const { | |||
| std::vector<int> labels_vec; | |||
| static unsigned int idx = 1; | |||
| int total_size = dataset.size(); | |||
| auto inputs = session_->GetInputs(); | |||
| char *input_data = reinterpret_cast<char *>(inputs.at(data_index_)->MutableData()); | |||
| auto labels = reinterpret_cast<float *>(inputs.at(label_index_)->MutableData()); | |||
| assert(total_size > 0); | |||
| assert(input_data != nullptr); | |||
| std::fill(labels, labels + inputs.at(label_index_)->ElementsNum(), 0.f); | |||
| for (int i = 0; i < batch_size_; i++) { | |||
| if (serially) { | |||
| idx = ++idx % total_size; | |||
| } else { | |||
| idx = rand_r(&seed_) % total_size; | |||
| } | |||
| int label = 0; | |||
| char *data = nullptr; | |||
| std::tie(data, label) = dataset[idx]; | |||
| memcpy(input_data + i * data_size_, data, data_size_); | |||
| labels[i * num_of_classes_ + label] = 1.0; // Model expects labels in onehot representation | |||
| labels_vec.push_back(label); | |||
| } | |||
| return labels_vec; | |||
| } | |||
| float NetRunner::CalculateAccuracy(int max_tests) const { | |||
| float accuracy = 0.0; | |||
| const std::vector<DataLabelTuple> test_set = ds_.test_data(); | |||
| int tests = test_set.size() / batch_size_; | |||
| if (max_tests != -1 && tests < max_tests) tests = max_tests; | |||
| session_->Eval(); | |||
| for (int i = 0; i < tests; i++) { | |||
| auto labels = FillInputData(test_set, (max_tests == -1)); | |||
| session_->RunGraph(); | |||
| auto outputsv = SearchOutputsForSize(batch_size_ * num_of_classes_); | |||
| assert(outputsv != nullptr); | |||
| auto scores = reinterpret_cast<float *>(outputsv->MutableData()); | |||
| for (int b = 0; b < batch_size_; b++) { | |||
| int max_idx = 0; | |||
| float max_score = scores[num_of_classes_ * b]; | |||
| for (int c = 0; c < num_of_classes_; c++) { | |||
| if (scores[num_of_classes_ * b + c] > max_score) { | |||
| max_score = scores[num_of_classes_ * b + c]; | |||
| max_idx = c; | |||
| } | |||
| } | |||
| if (labels[b] == max_idx) accuracy += 1.0; | |||
| } | |||
| } | |||
| session_->Train(); | |||
| accuracy /= static_cast<float>(batch_size_ * tests); | |||
| return accuracy; | |||
| } | |||
| int NetRunner::InitDB() { | |||
| if (data_size_ != 0) ds_.set_expected_data_size(data_size_); | |||
| int ret = ds_.Init(data_dir_, DS_MNIST_BINARY); | |||
| num_of_classes_ = ds_.num_of_classes(); | |||
| if (ds_.test_data().size() == 0) { | |||
| std::cout << "No relevant data was found in " << data_dir_ << std::endl; | |||
| assert(ds_.test_data().size() != 0); | |||
| } | |||
| return ret; | |||
| } | |||
| float NetRunner::GetLoss() const { | |||
| auto outputsv = SearchOutputsForSize(1); // Search for Loss which is a single value tensor | |||
| assert(outputsv != nullptr); | |||
| auto loss = reinterpret_cast<float *>(outputsv->MutableData()); | |||
| return loss[0]; | |||
| } | |||
| int NetRunner::TrainLoop() { | |||
| session_->Train(); | |||
| float min_loss = 1000.; | |||
| float max_acc = 0.; | |||
| for (int i = 0; i < cycles_; i++) { | |||
| FillInputData(ds_.train_data()); | |||
| session_->RunGraph(nullptr, verbose_ ? after_callback : nullptr); | |||
| float loss = GetLoss(); | |||
| if (min_loss > loss) min_loss = loss; | |||
| if (save_checkpoint_ != 0 && (i + 1) % save_checkpoint_ == 0) { | |||
| auto cpkt_fn = ms_file_.substr(0, ms_file_.find_last_of('.')) + "_trained_" + std::to_string(i + 1) + ".ms"; | |||
| session_->SaveToFile(cpkt_fn); | |||
| } | |||
| if ((i + 1) % 100 == 0) { | |||
| float acc = CalculateAccuracy(10); | |||
| if (max_acc < acc) max_acc = acc; | |||
| std::cout << i + 1 << ":\tLoss is " << std::setw(7) << loss << " [min=" << min_loss << "] " | |||
| << " max_acc=" << max_acc << std::endl; | |||
| } | |||
| } | |||
| return 0; | |||
| } | |||
| int NetRunner::Main() { | |||
| InitAndFigureInputs(); | |||
| InitDB(); | |||
| TrainLoop(); | |||
| float acc = CalculateAccuracy(); | |||
| std::cout << "accuracy = " << acc << std::endl; | |||
| if (cycles_ > 0) { | |||
| auto trained_fn = ms_file_.substr(0, ms_file_.find_last_of('.')) + "_trained_" + std::to_string(cycles_) + ".ms"; | |||
| session_->SaveToFile(trained_fn); | |||
| } | |||
| return 0; | |||
| } | |||
| void NetRunner::Usage() { | |||
| std::cout << "Usage: net_runner -f <.ms model file> -d <data_dir> [-c <num of training cycles>] " | |||
| << "[-v (verbose mode)] [-s <save checkpoint every X iterations>]" << std::endl; | |||
| } | |||
| bool NetRunner::ReadArgs(int argc, char *argv[]) { | |||
| int opt; | |||
| while ((opt = getopt(argc, argv, "f:e:d:s:ihc:v")) != -1) { | |||
| switch (opt) { | |||
| case 'f': | |||
| ms_file_ = std::string(optarg); | |||
| break; | |||
| case 'e': | |||
| cycles_ = atoi(optarg); | |||
| break; | |||
| case 'd': | |||
| data_dir_ = std::string(optarg); | |||
| break; | |||
| case 'v': | |||
| verbose_ = true; | |||
| break; | |||
| case 's': | |||
| save_checkpoint_ = atoi(optarg); | |||
| break; | |||
| case 'h': | |||
| default: | |||
| Usage(); | |||
| return false; | |||
| } | |||
| } | |||
| return true; | |||
| } | |||
| int main(int argc, char **argv) { | |||
| NetRunner nr; | |||
| if (nr.ReadArgs(argc, argv)) { | |||
| nr.Main(); | |||
| } else { | |||
| return -1; | |||
| } | |||
| return 0; | |||
| } | |||
| @@ -0,0 +1,61 @@ | |||
| /** | |||
| * Copyright 2020 Huawei Technologies Co., Ltd | |||
| * | |||
| * Licensed under the Apache License, Version 2.0 (the "License"); | |||
| * you may not use this file except in compliance with the License. | |||
| * You may obtain a copy of the License at | |||
| * | |||
| * http://www.apache.org/licenses/LICENSE-2.0 | |||
| * | |||
| * Unless required by applicable law or agreed to in writing, software | |||
| * distributed under the License is distributed on an "AS IS" BASIS, | |||
| * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | |||
| * See the License for the specific language governing permissions and | |||
| * limitations under the License. | |||
| */ | |||
| #ifndef MODEL_ZOO_OFFICIAL_TOD_TRAIN_LENET_SRC_NET_RUNNER_H_ | |||
| #define MODEL_ZOO_OFFICIAL_TOD_TRAIN_LENET_SRC_NET_RUNNER_H_ | |||
| #include <tuple> | |||
| #include <filesystem> | |||
| #include <map> | |||
| #include <vector> | |||
| #include <string> | |||
| #include "include/train_session.h" | |||
| #include "include/ms_tensor.h" | |||
| #include "src/dataset.h" | |||
| class NetRunner { | |||
| public: | |||
| int Main(); | |||
| bool ReadArgs(int argc, char *argv[]); | |||
| ~NetRunner(); | |||
| private: | |||
| void Usage(); | |||
| void InitAndFigureInputs(); | |||
| int InitDB(); | |||
| int TrainLoop(); | |||
| std::vector<int> FillInputData(const std::vector<DataLabelTuple> &dataset, bool is_train_set = false) const; | |||
| float CalculateAccuracy(int max_tests = -1) const; | |||
| float GetLoss() const; | |||
| mindspore::tensor::MSTensor *SearchOutputsForSize(size_t size) const; | |||
| DataSet ds_; | |||
| mindspore::session::TrainSession *session_ = nullptr; | |||
| std::string ms_file_ = ""; | |||
| std::string data_dir_ = ""; | |||
| size_t data_size_ = 0; | |||
| size_t batch_size_ = 0; | |||
| unsigned int cycles_ = 100; | |||
| int data_index_ = 0; | |||
| int label_index_ = -1; | |||
| int num_of_classes_ = 0; | |||
| bool verbose_ = false; | |||
| int save_checkpoint_ = 0; | |||
| static unsigned int seed_; | |||
| }; | |||
| #endif // MODEL_ZOO_OFFICIAL_TOD_TRAIN_LENET_SRC_NET_RUNNER_H_ | |||