@@ -1,3 +1,107 @@
+# MindSpore 1.4.0
+## MindSpore 1.4.0 Release Notes
+### Major Features and Improvements
+#### NewModels
+#### FrontEnd
+#### Auto Parallel
+- Add distributed operators: Conv2D/Conv2DTranspose/Conv2DBackpropInput/MaxPool/AvgPool/BatchNorm/GatherD
+- Support configuring the shard strategy for the dataset
+#### Executor
+#### DataSet
+- Add SlicePatchesOperation for the remote sensing feature ([!18179](https://e.gitee.com/mind_spore/repos/mindspore/mindspore/pulls/18179))
+#### FederatedLearning
+#### Running Data Recorder
+#### GraphKernel Fusion
+#### Profiler
+- [STABLE] Support MS_DIAGNOSTIC_DATA_PATH for the profiler feature. (Ascend/GPU)
+#### Dump
+- [STABLE] Support MS_DIAGNOSTIC_DATA_PATH for the dump feature. (Ascend/GPU/CPU)
+### API Change
+#### Backwards Incompatible Change
+##### Python API
+##### Command Line Interface
+###### Dump Config
+Previously, the dump path had to be set in the dump config file. To make the dump feature easier to use on cloud, the new environment variable `MS_DIAGNOSTIC_DATA_PATH` is now supported.
+| 1.3.0 | 1.4.0 |
+| ------ | ------ |
+| `path` is a mandatory field. | The `path` field is optional. If `path` is not provided or is an empty string, `MS_DIAGNOSTIC_DATA_PATH` must be set in the environment. |
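As a hedged sketch of the new behavior (the directory below is a placeholder), a cloud job can leave `path` out of the dump config file and export the variable before the process initializes, e.g. from C++ on a POSIX system:

```cpp
#include <cstdlib>  // setenv (POSIX)

int main() {
  // Placeholder path: with "path" absent or empty in the dump config file,
  // MindSpore 1.4.0 falls back to this environment variable instead.
  setenv("MS_DIAGNOSTIC_DATA_PATH", "/cache/ms_diagnostic", /*overwrite=*/1);
  // ... build and run the model as usual; dump data lands under this path ...
  return 0;
}
```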
+### Bug fixes
+#### FrontEnd
+#### Executor
+#### Dataset
+- Fix the module 'signal' has no attribute 'SIGCHLD' problem on the Windows platform. ([!21232](https://gitee.com/mindspore/mindspore/pulls/21232))
+## MindSpore Lite
+### Major Features and Improvements
+#### Converter and runtime
+#### x86 backend optimization
+#### ARM backend optimization
+#### Cuda backend optimization
+#### OpenCL backend
+#### Post quantization
+#### Training on Device
+#### Codegen
+### API Change
+#### API Incompatible Change
+##### C++ API
+#### New features
+##### Java API
+### Bug fixes
+#### Deprecations
+### Contributors
+Thanks goes to these wonderful people:
+Adel, AGroupofProbiotocs, anthonyaje, anzhengqi, askmiao, baihuawei, baiyangfan, bai-yangfan, bingyaweng, BowenK, buxue, caifubi, CaoJian, caojian05, caozhou, Cathy, changzherui, chenbo116, chenfei, chengxianbin, chenhaozhe, chenjianping, chenzomi, chenzupeng, chujinjin, cj, cjh9368, Corleone, damon0626, danish, Danish, davidmc, dayschan, doitH, dong-li001, eric, Eric, fary86, fuzhiye, Gaoxiong, GAO_HYP_XYJ, gengdongjie, Gogery, gongdaguo, gray0v0, gukecai, guoqi, gzhcv, hangq, hanhuifeng2020, Harshvardhan, He, heleiwang, hexia, Hoai, HuangBingjian, huangdongrun, huanghui, huangxinjing, huqi, huzhifeng, hwjiaorui, Islam Amin, Jesse, Jiabin Liu, jianghui58, jiangzhiwen, Jiaqi, jin-xiulang, jinyaohui, jjfeing, John, Jonathan, jonyguo, JulyAi, jzg, kai00, kingfo, kingxian, kpy, kswang, laiyongqiang, leonwanghui, Li, liangchenghui, liangzelang, lichen_101010, lichenever, lihongkang, lilei, limingqi107, ling, linqingke, Lin Xh, liubuyu, liuwenhao4, liuxiao78, liuxiao93, liuyang_655, liuzhongkai, Lixia, lixian, liyanliu, liyong, lizhenyu, luopengting, luoyang, lvchangquan, lvliang, lz, mahdi, Mahdi, maning202007, Margaret_wangrui, mayang, mengyuanli, Ming_blue, nhussain, ougongchang, panfengfeng, panyifeng, Payne, Peilin, peixu_ren, Pengyongrong, qianlong, qianjiahong, r1chardf1d0, riemann_penn, rmdyh, Sheng, shenwei41, simson, Simson, Su, sunsuodong, tao_yunhao, tinazhang, VectorSL, Wan, wandongdong, wangdongxu, wangmin, wangnan39@huawei.com, wangyue01, wangzhe, wanyiming, Wei, wenchunjiang, wilfChen, WilliamLian, wsc, wudenggang, wukesong, wuweikang, wuxuejian, Xiao Tianci, Xiaoda, xiefangqi, xinyunfan, xuanyue, xulei2020, Xun, xuyongfei, yanghaitao, yanghaitao1, yanghaoran, YangLuo, yangruoqi713, yankai, yanzhenxiang2020, yao_yf, yepei6, yeyunpeng, Yi, yoni, yoonlee666, yuchaojie, yujianfeng, yuximiao, zengzitao, Zhang, zhanghaibo5@huawei.com, zhanghuiyao, zhanghui_china, zhangxinfeng3, zhangyihui, zhangz0911gm, zhanke, zhanyuan, zhaodezan, zhaojichen, zhaoting, zhaozhenlong, zhengjun10, Zhenglong Li, zhiqwang, zhoufeng, zhousiyi, zhouyaqiang, zhouyifengCode, Zichun, Zirui, Ziyan, zjun, ZPaC, wangfengwfwf, zymaa, gerayking.
+Contributions of any kind are welcome!
 # MindSpore 1.3.0
 ## MindSpore 1.3.0 Release Notes
@@ -9,7 +9,7 @@ endif()
 if(ENABLE_GITEE)
     set(REQ_URL "https://gitee.com/mirrors/JSON-for-Modern-CPP/repository/archive/v3.6.1.zip")
-    set(MD5 "5bda78ce308e6cfcf614dcf1d5ff27a7")
+    set(MD5 "36ea0d9a709c6667b2798a62f6b197ae")
     set(INCLUDE "./include")
 else()
     set(REQ_URL "https://github.com/nlohmann/json/releases/download/v3.6.1/include.zip")
@@ -23,4 +23,4 @@ mindspore_add_pkg(nlohmann_json
                   URL ${REQ_URL}
                   MD5 ${MD5})
 include_directories(${nlohmann_json_INC})
-add_library(mindspore::json ALIAS nlohmann_json)
+add_library(mindspore::json ALIAS nlohmann_json)
@@ -38,12 +38,19 @@ class Allocator;
 class Delegate;
 class DeviceInfoContext;
+/// \brief Context is used to store environment variables during execution.
 class MS_API Context {
  public:
   Context();
   ~Context() = default;
+  /// \brief Set the number of threads at runtime. This option is only valid for MindSpore Lite.
+  ///
+  /// \param[in] thread_num The number of threads at runtime.
   void SetThreadNum(int32_t thread_num);
+  /// \brief Get the current thread number setting.
+  ///
+  /// \return The current thread number setting.
   int32_t GetThreadNum() const;
   /// \brief Set the thread affinity to CPU cores.
@@ -60,6 +67,10 @@ class MS_API Context {
   void SetDelegate(const std::shared_ptr<Delegate> &delegate);
   std::shared_ptr<Delegate> GetDelegate() const;
+  /// \brief Get a mutable reference to the DeviceInfoContext vector in this context. Only MindSpore Lite supports
+  /// heterogeneous scenarios with multiple members in the vector.
+  ///
+  /// \return Mutable reference to the DeviceInfoContext vector in this context.
   std::vector<std::shared_ptr<DeviceInfoContext>> &MutableDeviceInfo();
  private:
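To make the documented flow concrete, here is a minimal sketch of composing a Context, assuming the public `include/api/context.h` header; the thread count is arbitrary:

```cpp
#include <memory>
#include "include/api/context.h"

std::shared_ptr<mindspore::Context> MakeCpuContext() {
  auto context = std::make_shared<mindspore::Context>();
  context->SetThreadNum(2);  // runtime thread count; effective in MindSpore Lite only

  // MutableDeviceInfo() exposes the device list; only MindSpore Lite supports
  // more than one entry (heterogeneous execution).
  context->MutableDeviceInfo().push_back(std::make_shared<mindspore::CPUDeviceInfo>());
  return context;
}
```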
@@ -67,14 +78,24 @@ class MS_API Context {
   std::shared_ptr<Data> data_;
 };
+/// \brief DeviceInfoContext defines different device contexts.
 class MS_API DeviceInfoContext : public std::enable_shared_from_this<DeviceInfoContext> {
  public:
   struct Data;
   DeviceInfoContext();
   virtual ~DeviceInfoContext() = default;
+  /// \brief Get the type of this DeviceInfoContext.
+  ///
+  /// \return Type of this DeviceInfoContext.
   virtual enum DeviceType GetDeviceType() const = 0;
+  /// \brief An RTTI-like conversion that works when the -fno-rtti compilation option is turned on: converts this
+  /// DeviceInfoContext to a shared pointer of type T, and returns nullptr if the conversion fails.
+  ///
+  /// \param T Type.
+  /// \return A pointer of type T after conversion. If the conversion fails, it will be nullptr.
   template <class T>
   std::shared_ptr<T> Cast() {
     static_assert(std::is_base_of<DeviceInfoContext, T>::value, "Wrong cast type.");
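A short sketch of the Cast<T>() behavior documented above; both smart pointers share ownership of the same object:

```cpp
std::shared_ptr<mindspore::DeviceInfoContext> info =
    std::make_shared<mindspore::CPUDeviceInfo>();
auto as_cpu = info->Cast<mindspore::CPUDeviceInfo>();       // non-null
auto as_npu = info->Cast<mindspore::KirinNPUDeviceInfo>();  // nullptr: wrong type
```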
@@ -98,27 +119,60 @@ class MS_API DeviceInfoContext : public std::enable_shared_from_this<DeviceInfoC
   std::shared_ptr<Data> data_;
 };
+/// \brief Derived from DeviceInfoContext, the configuration of the model running on the CPU. This option is only valid
+/// for MindSpore Lite.
 class MS_API CPUDeviceInfo : public DeviceInfoContext {
  public:
+  /// \brief Get the type of this DeviceInfoContext.
+  ///
+  /// \return Type of this DeviceInfoContext.
   enum DeviceType GetDeviceType() const override { return DeviceType::kCPU; };
+  /// \brief Set whether to enable float16 inference.
+  ///
+  /// \param[in] is_fp16 Enable float16 inference or not.
   void SetEnableFP16(bool is_fp16);
+  /// \brief Get whether float16 inference is enabled.
+  ///
+  /// \return Whether float16 inference is enabled.
   bool GetEnableFP16() const;
 };
+/// \brief Derived from DeviceInfoContext, the configuration of the model running on the NPU. This option is only valid
+/// for MindSpore Lite.
 class MS_API KirinNPUDeviceInfo : public DeviceInfoContext {
  public:
+  /// \brief Get the type of this DeviceInfoContext.
+  ///
+  /// \return Type of this DeviceInfoContext.
   enum DeviceType GetDeviceType() const override { return DeviceType::kKirinNPU; };
+  /// \brief Set the NPU frequency.
+  ///
+  /// \param[in] frequency Can be set to 1 (low power consumption), 2 (balanced), 3 (high performance), or 4 (extreme
+  /// performance); defaults to 3.
   void SetFrequency(int frequency);
+  /// \brief Get the NPU frequency.
+  ///
+  /// \return The NPU frequency.
   int GetFrequency() const;
 };
+/// \brief Derived from DeviceInfoContext, the configuration of the model running on the GPU.
 class MS_API GPUDeviceInfo : public DeviceInfoContext {
  public:
+  /// \brief Get the type of this DeviceInfoContext.
+  ///
+  /// \return Type of this DeviceInfoContext.
   enum DeviceType GetDeviceType() const override { return DeviceType::kGPU; };
+  /// \brief Set the device id.
+  ///
+  /// \param[in] device_id The device id.
   void SetDeviceID(uint32_t device_id);
+  /// \brief Get the device id.
+  ///
+  /// \return The device id.
   uint32_t GetDeviceID() const;
   void SetGpuTrtInferMode(bool gpu_trt_infer_mode);
@@ -127,8 +181,15 @@ class MS_API GPUDeviceInfo : public DeviceInfoContext {
   inline void SetPrecisionMode(const std::string &precison_mode);
   inline std::string GetPrecisionMode() const;
+  /// \brief Set whether to enable float16 inference.
+  ///
+  /// \param[in] is_fp16 Enable float16 inference or not.
   void SetEnableFP16(bool is_fp16);
+  /// \brief Get whether float16 inference is enabled.
+  ///
+  /// \return Whether float16 inference is enabled.
   bool GetEnableFP16() const;
  private:
   void SetPrecisionMode(const std::vector<char> &precision_mode);
   std::vector<char> GetPrecisionModeChar() const;
@@ -139,52 +200,113 @@ void GPUDeviceInfo::SetPrecisionMode(const std::string &precision_mode) {
 }
 std::string GPUDeviceInfo::GetPrecisionMode() const { return CharToString(GetPrecisionModeChar()); }
+/// \brief Derived from DeviceInfoContext, the configuration of the model running on the Ascend910. This option is
+/// invalid for MindSpore Lite.
 class MS_API Ascend910DeviceInfo : public DeviceInfoContext {
  public:
+  /// \brief Get the type of this DeviceInfoContext.
+  ///
+  /// \return Type of this DeviceInfoContext.
   enum DeviceType GetDeviceType() const override { return DeviceType::kAscend910; };
+  /// \brief Set the device id.
+  ///
+  /// \param[in] device_id The device id.
   void SetDeviceID(uint32_t device_id);
+  /// \brief Get the device id.
+  ///
+  /// \return The device id.
   uint32_t GetDeviceID() const;
 };
+/// \brief Derived from DeviceInfoContext, the configuration of the model running on the Ascend310. This option is
+/// invalid for MindSpore Lite.
 class MS_API Ascend310DeviceInfo : public DeviceInfoContext {
  public:
+  /// \brief Get the type of this DeviceInfoContext.
+  ///
+  /// \return Type of this DeviceInfoContext.
   enum DeviceType GetDeviceType() const override { return DeviceType::kAscend310; };
+  /// \brief Set the device id.
+  ///
+  /// \param[in] device_id The device id.
   void SetDeviceID(uint32_t device_id);
+  /// \brief Get the device id.
+  ///
+  /// \return The device id.
   uint32_t GetDeviceID() const;
   inline void SetDumpConfigPath(const std::string &cfg_path);
   inline std::string GetDumpConfigPath() const;
-  // aipp config file
+  /// \brief Set the AIPP configuration file path.
+  ///
+  /// \param[in] cfg_path AIPP configuration file path.
   inline void SetInsertOpConfigPath(const std::string &cfg_path);
+  /// \brief Get the AIPP configuration file path.
+  ///
+  /// \return AIPP configuration file path.
   inline std::string GetInsertOpConfigPath() const;
-  // nchw or nhwc
+  /// \brief Set the format of model inputs.
+  ///
+  /// \param[in] format Optional "NCHW", "NHWC", etc.
   inline void SetInputFormat(const std::string &format);
+  /// \brief Get the format of model inputs.
+  ///
+  /// \return The format of model inputs.
   inline std::string GetInputFormat() const;
-  // Mandatory while dynamic batch: e.g. "input_op_name1: 1,2,3,4;input_op_name2: 4,3,2,1"
+  /// \brief Set the shape of model inputs. Mandatory when the batch is dynamic.
+  ///
+  /// \param[in] shape e.g. "input_op_name1: 1,2,3,4;input_op_name2: 4,3,2,1".
   inline void SetInputShape(const std::string &shape);
+  /// \brief Get the shape of model inputs.
+  ///
+  /// \return The shape of model inputs.
   inline std::string GetInputShape() const;
+  /// \brief Set the shape of model inputs.
+  ///
+  /// \param[in] shape e.g. {{1, {1,2,3,4}}, {2, {4,3,2,1}}} means the shape of the first input is 1,2,3,4 and the
+  /// shape of the second input is 4,3,2,1.
   void SetInputShapeMap(const std::map<int, std::vector<int>> &shape);
+  /// \brief Get the shape of model inputs.
+  ///
+  /// \return The shape of model inputs.
   std::map<int, std::vector<int>> GetInputShapeMap() const;
   void SetDynamicBatchSize(const std::vector<size_t> &dynamic_batch_size);
   inline std::string GetDynamicBatchSize() const;
-  // FP32, UINT8 or FP16, default as FP32
+  /// \brief Set the type of model outputs.
+  ///
+  /// \param[in] output_type FP32, UINT8 or FP16; defaults to FP32.
   void SetOutputType(enum DataType output_type);
+  /// \brief Get the type of model outputs.
+  ///
+  /// \return The set type of model outputs.
   enum DataType GetOutputType() const;
-  // "force_fp16", "allow_fp32_to_fp16", "must_keep_origin_dtype" or "allow_mix_precision", default as "force_fp16"
+  /// \brief Set the precision mode of the model.
+  ///
+  /// \param[in] precision_mode Optional "force_fp16", "allow_fp32_to_fp16", "must_keep_origin_dtype" and
+  /// "allow_mix_precision"; "force_fp16" is set as default.
   inline void SetPrecisionMode(const std::string &precision_mode);
+  /// \brief Get the precision mode of the model.
+  ///
+  /// \return The set precision mode.
   inline std::string GetPrecisionMode() const;
-  // Optional "high_performance" and "high_precision", "high_performance" is set as default
+  /// \brief Set the op select implementation mode.
+  ///
+  /// \param[in] op_select_impl_mode Optional "high_performance" and "high_precision"; "high_performance" is set as
+  /// default.
   inline void SetOpSelectImplMode(const std::string &op_select_impl_mode);
+  /// \brief Get the op select implementation mode.
+  ///
+  /// \return The set op select implementation mode.
   inline std::string GetOpSelectImplMode() const;
   inline void SetFusionSwitchConfigPath(const std::string &cfg_path);
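A hedged sketch stringing several of the Ascend 310 options above together; every value is illustrative only, and `context` is the Context from the earlier sketch:

```cpp
auto ascend310 = std::make_shared<mindspore::Ascend310DeviceInfo>();
ascend310->SetDeviceID(0);
ascend310->SetInputFormat("NHWC");                   // "NCHW" or "NHWC"
ascend310->SetPrecisionMode("allow_fp32_to_fp16");   // default is "force_fp16"
ascend310->SetOpSelectImplMode("high_performance");  // or "high_precision"
context->MutableDeviceInfo().push_back(ascend310);
```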
@@ -37,32 +37,75 @@ class Metrics;
 namespace dataset {
 class Dataset;
 }  // namespace dataset
+/// \brief The Model class is used to define a MindSpore model, facilitating computational graph management.
 class MS_API Model {
  public:
   Model();
   ~Model();
   Model(const Model &) = delete;
   void operator=(const Model &) = delete;
+  /// \brief Builds a model so that it can run on a device.
+  ///
+  /// \param[in] graph GraphCell is a derivative of Cell. Cell is not available currently. GraphCell can be constructed
+  /// from Graph, for example, model.Build(GraphCell(graph), context).
+  /// \param[in] model_context A context used to store options during execution.
+  /// \param[in] train_cfg A config used by training.
+  ///
+  /// \return Status.
   Status Build(GraphCell graph, const std::shared_ptr<Context> &model_context = nullptr,
                const std::shared_ptr<TrainCfg> &train_cfg = nullptr);
+  /// \brief Resizes the shapes of inputs.
+  ///
+  /// \param[in] inputs A vector that includes all input tensors in order.
+  /// \param[in] dims Defines the new shapes of inputs; must be consistent with inputs.
+  ///
+  /// \return Status.
   Status Resize(const std::vector<MSTensor> &inputs, const std::vector<std::vector<int64_t>> &dims);
+  /// \brief Runs model inference.
+  ///
+  /// \param[in] inputs A vector where the model inputs are arranged in sequence.
+  /// \param[out] outputs A pointer to a vector; the model outputs are filled into the container in sequence.
+  /// \param[in] before CallBack before predict.
+  /// \param[in] after CallBack after predict.
+  ///
+  /// \return Status.
   Status Predict(const std::vector<MSTensor> &inputs, std::vector<MSTensor> *outputs,
                  const MSKernelCallBack &before = nullptr, const MSKernelCallBack &after = nullptr);
+  /// \brief Obtains all input tensors of the model.
+  ///
+  /// \return The vector that includes all input tensors.
   std::vector<MSTensor> GetInputs();
+  /// \brief Obtains the input tensor of the model by name.
+  ///
+  /// \return The input tensor with the given name; if the name is not found, an invalid tensor is returned.
   inline MSTensor GetInputByTensorName(const std::string &tensor_name);
   Status InitMetrics(std::vector<Metrics *> metrics);
   std::vector<Metrics *> GetMetrics();
+  /// \brief Obtains all output tensors of the model.
+  ///
+  /// \return The vector that includes all output tensors.
   std::vector<MSTensor> GetOutputs();
+  /// \brief Obtains the names of all output tensors of the model.
+  ///
+  /// \return A vector that includes the names of all output tensors.
   inline std::vector<std::string> GetOutputTensorNames();
+  /// \brief Obtains the output tensor of the model by name.
+  ///
+  /// \return The output tensor with the given name; if the name is not found, an invalid tensor is returned.
   inline MSTensor GetOutputByTensorName(const std::string &tensor_name);
   inline std::vector<MSTensor> GetOutputsByNodeName(const std::string &tensor_name);
+  /// \brief Checks whether the model is supported for the given device and model type.
+  ///
+  /// \param[in] device_type Device type, options are kGPU, kAscend910, etc.
+  /// \param[in] model_type The type of model file, options are ModelType::kMindIR, ModelType::kOM.
+  ///
+  /// \return Is supported or not.
   static bool CheckModelSupport(enum DeviceType device_type, ModelType model_type);
   Status SetTrainMode(bool train);
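Putting the methods above together, a minimal inference sketch; the file name is a placeholder, `MakeCpuContext()` is the helper from the Context sketch earlier, and status checks are elided:

```cpp
mindspore::Graph graph;
(void)mindspore::Serialization::Load("model.mindir", mindspore::ModelType::kMindIR, &graph);

mindspore::Model model;
(void)model.Build(mindspore::GraphCell(graph), MakeCpuContext());

auto inputs = model.GetInputs();
// ... copy input data into each tensor via MutableData()/DataSize() ...
std::vector<mindspore::MSTensor> outputs;
(void)model.Predict(inputs, &outputs);
```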
@@ -27,13 +27,43 @@
 #include "include/api/dual_abi_helper.h"
 namespace mindspore {
+/// \brief The Serialization class is used to summarize methods for reading and writing model files.
 class MS_API Serialization {
  public:
+  /// \brief Loads a model file from a memory buffer.
+  ///
+  /// \param[in] model_data A buffer filled by the model file.
+  /// \param[in] data_size The size of the buffer.
+  /// \param[in] model_type The type of model file, options are ModelType::kMindIR, ModelType::kOM.
+  /// \param[out] graph The output parameter, an object that saves the graph data.
+  /// \param[in] dec_key The decryption key; the key length is 16, 24, or 32.
+  /// \param[in] dec_mode The decryption mode, options are AES-GCM and AES-CBC.
+  ///
+  /// \return Status.
   inline static Status Load(const void *model_data, size_t data_size, ModelType model_type, Graph *graph,
                             const Key &dec_key = {}, const std::string &dec_mode = kDecModeAesGcm);
+  /// \brief Loads a model file from a path; this is not supported on MindSpore Lite.
+  ///
+  /// \param[in] file The path of the model file.
+  /// \param[in] model_type The type of model file, options are ModelType::kMindIR, ModelType::kOM.
+  /// \param[out] graph The output parameter, an object that saves the graph data.
+  /// \param[in] dec_key The decryption key; the key length is 16, 24, or 32.
+  /// \param[in] dec_mode The decryption mode, options are AES-GCM and AES-CBC.
+  ///
+  /// \return Status.
   inline static Status Load(const std::string &file, ModelType model_type, Graph *graph, const Key &dec_key = {},
                             const std::string &dec_mode = kDecModeAesGcm);
+  /// \brief Loads multiple models from multiple files; MindSpore Lite does not provide this feature.
+  ///
+  /// \param[in] files The paths of the model files.
+  /// \param[in] model_type The type of model file, options are ModelType::kMindIR, ModelType::kOM.
+  /// \param[out] graphs The output parameter, objects that save the graph data.
+  /// \param[in] dec_key The decryption key; the key length is 16, 24, or 32.
+  /// \param[in] dec_mode The decryption mode, options are AES-GCM and AES-CBC.
+  ///
+  /// \return Status.
   inline static Status Load(const std::vector<std::string> &files, ModelType model_type, std::vector<Graph> *graphs,
                             const Key &dec_key = {}, const std::string &dec_mode = kDecModeAesGcm);
   static Status SetParameters(const std::map<std::string, Buffer> &parameters, Model *model);
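A sketch of the decrypting in-memory overload above; the buffer comes from the caller, and the key bytes are placeholders the caller must fill:

```cpp
mindspore::Status LoadEncrypted(const void *model_data, size_t data_size, mindspore::Graph *graph) {
  mindspore::Key dec_key;  // fill with key material of length 16, 24, or 32
  return mindspore::Serialization::Load(model_data, data_size, mindspore::ModelType::kMindIR,
                                        graph, dec_key, mindspore::kDecModeAesGcm);
}
```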
@@ -64,18 +64,64 @@ struct QuantParam {
 };
 class Allocator;
+/// \brief The MSTensor class defines a tensor in MindSpore.
 class MS_API MSTensor {
  public:
   class Impl;
+  /// \brief Creates a MSTensor object whose data is copied from the given buffer before being accessed by Model; must
+  /// be used in pairs with DestroyTensorPtr.
+  ///
+  /// \param[in] name The name of the MSTensor.
+  /// \param[in] type The data type of the MSTensor.
+  /// \param[in] shape The shape of the MSTensor.
+  /// \param[in] data The data pointer that points to allocated memory.
+  /// \param[in] data_len The length of the memory, in bytes.
+  ///
+  /// \return A pointer to the MSTensor.
   static inline MSTensor *CreateTensor(const std::string &name, DataType type, const std::vector<int64_t> &shape,
                                        const void *data, size_t data_len) noexcept;
+  /// \brief Creates a MSTensor object whose data can be directly accessed by Model; must be used in pairs with
+  /// DestroyTensorPtr.
+  ///
+  /// \param[in] name The name of the MSTensor.
+  /// \param[in] type The data type of the MSTensor.
+  /// \param[in] shape The shape of the MSTensor.
+  /// \param[in] data The data pointer that points to allocated memory.
+  /// \param[in] data_len The length of the memory, in bytes.
+  ///
+  /// \return A pointer to the MSTensor.
   static inline MSTensor *CreateRefTensor(const std::string &name, DataType type, const std::vector<int64_t> &shape,
                                           const void *data, size_t data_len) noexcept;
+  /// \brief Creates a MSTensor object whose device data can be directly accessed by Model; must be used in pairs with
+  /// DestroyTensorPtr.
+  ///
+  /// \param[in] name The name of the MSTensor.
+  /// \param[in] type The data type of the MSTensor.
+  /// \param[in] shape The shape of the MSTensor.
+  /// \param[in] data The data pointer that points to device memory.
+  /// \param[in] data_len The length of the memory, in bytes.
+  ///
+  /// \return A pointer to the MSTensor.
   static inline MSTensor *CreateDevTensor(const std::string &name, DataType type, const std::vector<int64_t> &shape,
                                           const void *data, size_t data_len) noexcept;
+  /// \brief Creates a string-type MSTensor object whose data can be accessed by Model only after being copied; must be
+  /// used in pairs with DestroyTensorPtr.
+  ///
+  /// \param[in] name The name of the MSTensor.
+  /// \param[in] str A vector container containing several strings.
+  ///
+  /// \return A pointer to the MSTensor.
   static inline MSTensor *StringsToTensor(const std::string &name, const std::vector<std::string> &str);
+  /// \brief Parses a string-type MSTensor object into strings.
+  ///
+  /// \param[in] tensor A MSTensor object.
+  ///
+  /// \return A vector container containing several strings.
   static inline std::vector<std::string> TensorToStrings(const MSTensor &tensor);
+  /// \brief Destroys an object created by Clone, StringsToTensor, CreateRefTensor, CreateDevTensor or CreateTensor. Do
+  /// not use it to destroy a MSTensor from other sources.
+  ///
+  /// \param[in] tensor A MSTensor object.
   static void DestroyTensorPtr(MSTensor *tensor) noexcept;
   MSTensor();
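A sketch pairing CreateTensor with DestroyTensorPtr, as the contract above requires; the name, shape, and values are arbitrary:

```cpp
std::vector<float> host_data(2 * 3, 1.0f);
mindspore::MSTensor *tensor = mindspore::MSTensor::CreateTensor(
    "x", mindspore::DataType::kNumberTypeFloat32, {2, 3},
    host_data.data(), host_data.size() * sizeof(float));  // data is copied here
if (tensor != nullptr) {
  // ... hand the tensor to a Model ...
  mindspore::MSTensor::DestroyTensorPtr(tensor);  // mandatory counterpart
}
```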
@@ -85,19 +131,51 @@ class MS_API MSTensor {
   explicit MSTensor(std::nullptr_t);
   ~MSTensor();
+  /// \brief Obtains the name of the MSTensor.
+  ///
+  /// \return The name of the MSTensor.
   inline std::string Name() const;
+  /// \brief Obtains the data type of the MSTensor.
+  ///
+  /// \return The data type of the MSTensor.
   enum DataType DataType() const;
+  /// \brief Obtains the shape of the MSTensor.
+  ///
+  /// \return The shape of the MSTensor.
   const std::vector<int64_t> &Shape() const;
+  /// \brief Obtains the number of elements of the MSTensor.
+  ///
+  /// \return The number of elements of the MSTensor.
   int64_t ElementNum() const;
+  /// \brief Obtains a shared pointer to a copy of the data of the MSTensor. The data can be read on host.
+  ///
+  /// \return A shared pointer to a copy of the data of the MSTensor.
   std::shared_ptr<const void> Data() const;
+  /// \brief Obtains the pointer to the data of the MSTensor. If the MSTensor is a device tensor, the data cannot be
+  /// accessed directly on host.
+  ///
+  /// \return A pointer to the data of the MSTensor.
   void *MutableData();
+  /// \brief Obtains the length of the data of the MSTensor, in bytes.
+  ///
+  /// \return The length of the data of the MSTensor, in bytes.
   size_t DataSize() const;
+  /// \brief Gets the boolean value that indicates whether the memory of the MSTensor is on device.
+  ///
+  /// \return The boolean value that indicates whether the memory of the MSTensor is on device.
   bool IsDevice() const;
+  /// \brief Gets a deep copy of the MSTensor; must be used in pairs with DestroyTensorPtr.
+  ///
+  /// \return A pointer that points to a deep copy of the MSTensor.
   MSTensor *Clone() const;
+  /// \brief Gets the boolean value that indicates whether the MSTensor is invalid (it compares equal to nullptr).
+  ///
+  /// \return The boolean value that indicates whether the MSTensor is invalid.
   bool operator==(std::nullptr_t) const;
+  /// \brief Gets the boolean value that indicates whether the MSTensor is valid (it compares unequal to nullptr).
+  ///
+  /// \return The boolean value that indicates whether the MSTensor is valid.
   bool operator!=(std::nullptr_t) const;
   bool operator==(const MSTensor &tensor) const;
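And a sketch of host-side inspection through the accessors above, assuming a float tensor:

```cpp
#include <iostream>

void DumpTensor(const mindspore::MSTensor &t) {
  std::cout << t.Name() << ": " << t.ElementNum() << " elements, "
            << t.DataSize() << " bytes\n";
  auto data = t.Data();  // shared pointer to a host-side copy of the data
  const float *values = static_cast<const float *>(data.get());
  if (t.ElementNum() > 0) std::cout << "first value = " << values[0] << "\n";
}
```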
@@ -0,0 +1,56 @@
+/**
+ * Copyright 2021 Huawei Technologies Co., Ltd
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+#include "backend/kernel_compiler/cpu/l2loss_cpu_kernel.h"
+#include "runtime/device/cpu/cpu_device_address.h"
+namespace mindspore {
+namespace kernel {
+template <typename T>
+void L2LossCPUKernel<T>::InitKernel(const CNodePtr &kernel_node) {
+  CheckParam(kernel_node);
+  std::vector<size_t> x_shape = AnfAlgo::GetPrevNodeOutputInferShape(kernel_node, 0);
+  for (const size_t &d : x_shape) {
+    tensor_size_ *= d;
+  }
+}
+template <typename T>
+bool L2LossCPUKernel<T>::Launch(const std::vector<kernel::AddressPtr> &inputs, const std::vector<kernel::AddressPtr> &,
+                                const std::vector<kernel::AddressPtr> &outputs) {
+  auto input_addr = reinterpret_cast<T *>(inputs[0]->addr);
+  auto result_addr = reinterpret_cast<T *>(outputs[0]->addr);
+  *result_addr = (T)0;
+  for (size_t i = 0; i < tensor_size_; i++) {
+    *result_addr += input_addr[i] * input_addr[i];
+  }
+  *result_addr = *result_addr / 2;
+  return true;
+}
+template <typename T>
+void L2LossCPUKernel<T>::CheckParam(const CNodePtr &kernel_node) {
+  size_t input_num = AnfAlgo::GetInputTensorNum(kernel_node);
+  if (input_num != 1) {
+    MS_LOG(EXCEPTION) << "Input number is " << input_num << ", but L2LossCPUKernel needs 1 input.";
+  }
+  size_t output_num = AnfAlgo::GetOutputTensorNum(kernel_node);
+  if (output_num != 1) {
+    MS_LOG(EXCEPTION) << "Output number is " << output_num << ", but L2LossCPUKernel needs 1 output.";
+  }
+}
+}  // namespace kernel
+}  // namespace mindspore
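For reference, the `Launch` body above reduces the flattened input to the standard L2 loss scalar:

$$
\mathrm{L2Loss}(x) = \frac{1}{2}\sum_{i} x_i^{2}
$$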
@@ -0,0 +1,47 @@
+/**
+ * Copyright 2021 Huawei Technologies Co., Ltd
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+#ifndef MINDSPORE_CCSRC_BACKEND_KERNEL_COMPILER_CPU_L2_LOSS_CPU_KERNEL_H_
+#define MINDSPORE_CCSRC_BACKEND_KERNEL_COMPILER_CPU_L2_LOSS_CPU_KERNEL_H_
+#include <memory>
+#include <unordered_map>
+#include <vector>
+#include "backend/kernel_compiler/cpu/cpu_kernel.h"
+#include "backend/kernel_compiler/cpu/cpu_kernel_factory.h"
+namespace mindspore {
+namespace kernel {
+template <typename T>
+class L2LossCPUKernel : public CPUKernel {
+ public:
+  L2LossCPUKernel() = default;
+  ~L2LossCPUKernel() override = default;
+  void InitKernel(const CNodePtr &kernel_node) override;
+  bool Launch(const std::vector<AddressPtr> &inputs, const std::vector<AddressPtr> &workspace,
+              const std::vector<AddressPtr> &outputs) override;
+ private:
+  void CheckParam(const CNodePtr &kernel_node);
+  size_t tensor_size_{1};
+};
+MS_REG_CPU_KERNEL_T(L2Loss, KernelAttr(), L2LossCPUKernel, float16);
+MS_REG_CPU_KERNEL_T(L2Loss, KernelAttr(), L2LossCPUKernel, float);
+}  // namespace kernel
+}  // namespace mindspore
+#endif  // MINDSPORE_CCSRC_BACKEND_KERNEL_COMPILER_CPU_L2_LOSS_CPU_KERNEL_H_
@@ -16,15 +16,19 @@
 #include "nnacl/common_func.h"
-int offset(const int *shape, const int dim0, const int dim1, const int dim2, const int dim3) {
+int Offset(const int *shape, const int dim0, const int dim1, const int dim2, const int dim3) {
   return ((dim0 * shape[1] + dim1) * shape[2] + dim2) * shape[3] + dim3;
 }
-int offsetComm(const int *shape, const int dim0, const int dim1, const int dim2) {
+int OffsetComm(const int *shape, const int dim0, const int dim1, const int dim2) {
   return ((dim0 * shape[1] + dim1) * shape[2] + dim2) * shape[3];
 }
-int offset4d(const int *shape, const int *dims) { return offset(shape, dims[0], dims[1], dims[2], dims[3]); }
+int Offset4d(const int *shape, const int *dims) { return Offset(shape, dims[0], dims[1], dims[2], dims[3]); }
+int Offset6d(const int *shape, const int *dims) {
+  return ((OffsetComm(shape, dims[0], dims[1], dims[2]) + dims[3]) * shape[4] + dims[4]) * shape[5];
+}
 int8_t MinInt8(int8_t a, int8_t b) { return b ^ ((a ^ b) & -(a < b)); }
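Note that Offset6d, like OffsetComm, deliberately omits the innermost index: it returns the row-major offset of the start of a dim-5 row, and callers add the final element index (or the left padding of the last dimension) themselves before copying whole rows. For a shape $s$ and an index vector $d$:

$$
\mathrm{Offset6d}(s, d) = \Big(\big(((d_0 s_1 + d_1)\,s_2 + d_2)\,s_3 + d_3\big)\,s_4 + d_4\Big)\,s_5
$$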
@@ -36,9 +36,10 @@ void ReluFp32C8(float *data, float *dst, int ele_num);
 void Relu6Fp32C8(float *data, float *dst, int ele_num);
 #endif
 #endif
-int offset(const int *shape, const int dim0, const int dim1, const int dim2, const int dim3);
-int offsetComm(const int *shape, const int dim0, const int dim1, const int dim2);
-int offset4d(const int *shape, const int *dims);
+int Offset(const int *shape, const int dim0, const int dim1, const int dim2, const int dim3);
+int OffsetComm(const int *shape, const int dim0, const int dim1, const int dim2);
+int Offset4d(const int *shape, const int *dims);
+int Offset6d(const int *shape, const int *dims);
 static inline bool isAddOverflow(int32_t x, int32_t y) {
   int32_t sum = x + y;
@@ -19,16 +19,22 @@
 void PadFp16(const float16_t *input_data, float16_t *output_data, const int *input_shape, const int *output_shape,
              const int *paddings, const int tid, const int thread_num) {
-  int in[4], out[4];
+  int in[DEFAULT_PAD_NDIMS], out[DEFAULT_PAD_NDIMS];
   for (in[0] = 0; in[0] < input_shape[0]; in[0]++) {
     out[0] = in[0] + paddings[0];
     for (in[1] = tid; in[1] < input_shape[1]; in[1] += thread_num) {
       out[1] = in[1] + paddings[2];
       for (in[2] = 0; in[2] < input_shape[2]; in[2]++) {
         out[2] = in[2] + paddings[4];
-        float16_t *dst = output_data + offset(output_shape, out[0], out[1], out[2], paddings[6]);
-        const float16_t *src = input_data + offset(input_shape, in[0], in[1], in[2], 0);
-        memcpy(dst, src, input_shape[3] * sizeof(float16_t));
+        for (in[3] = 0; in[3] < input_shape[3]; in[3]++) {
+          out[3] = in[3] + paddings[6];
+          for (in[4] = 0; in[4] < input_shape[4]; in[4]++) {
+            out[4] = in[4] + paddings[8];
+            float16_t *dst = output_data + Offset6d(output_shape, out) + paddings[10];
+            const float16_t *src = input_data + Offset6d(input_shape, in);
+            memcpy(dst, src, input_shape[5] * sizeof(float16_t));
+          }
+        }
       }
     }
   }
@@ -23,16 +23,22 @@ void Pad(const float *input_data, float *output_data, const int *input_shape, co
   if (thread_num == 0) {
     return;
   }
-  int in[4], out[4];
+  int in[DEFAULT_PAD_NDIMS], out[DEFAULT_PAD_NDIMS];
   for (in[0] = 0; in[0] < input_shape[0]; in[0]++) {
     out[0] = in[0] + paddings[0];
     for (in[1] = tid; in[1] < input_shape[1]; in[1] += thread_num) {
       out[1] = in[1] + paddings[2];
       for (in[2] = 0; in[2] < input_shape[2]; in[2]++) {
         out[2] = in[2] + paddings[4];
-        float *dst = output_data + offset(output_shape, out[0], out[1], out[2], paddings[6]);
-        const float *src = input_data + offset(input_shape, in[0], in[1], in[2], 0);
-        memcpy(dst, src, input_shape[3] * sizeof(float));
+        for (in[3] = 0; in[3] < input_shape[3]; in[3]++) {
+          out[3] = in[3] + paddings[6];
+          for (in[4] = 0; in[4] < input_shape[4]; in[4]++) {
+            out[4] = in[4] + paddings[8];
+            float *dst = output_data + Offset6d(output_shape, out) + paddings[10];
+            const float *src = input_data + Offset6d(input_shape, in);
+            memcpy(dst, src, input_shape[5] * sizeof(float));
+          }
+        }
       }
     }
   }
@@ -57,8 +63,7 @@ int TransOut2InputDimIndex(int out_dim_index, int left_pad, int in_dim, int offs
 int GetInputFlattenIndex(int out_flatten_index, const int *input_shape, const PadParameter *pad_param) {
   int in_flatten_index = 0;
-  int i;
-  for (i = 0; i < COMM_SHAPE_SIZE; ++i) {
+  for (int i = 0; i < DEFAULT_PAD_NDIMS; ++i) {
     int left_pad = pad_param->paddings_[i * 2];
     NNACL_CHECK_ZERO_RETURN_ERR(pad_param->out_strides[i])
     int out_dim_index = out_flatten_index / pad_param->out_strides[i];
@@ -510,8 +510,8 @@ int ResizeNearestNeighbor(const float *input_data, float *output_data, const int
       } else {
        input_x = (int)(floorf(actual_x));
       }
-      int in_offset = offset(input_shape, batch, input_y, input_x, 0);
-      int out_offset = offset(output_shape, batch, y, x, 0);
+      int in_offset = Offset(input_shape, batch, input_y, input_x, 0);
+      int out_offset = Offset(output_shape, batch, y, x, 0);
       memcpy(output_data + out_offset, input_data + in_offset, c * sizeof(float));
     }
   }
@@ -32,7 +32,7 @@ int PadInferShape(const TensorC *const *inputs, size_t inputs_size, TensorC **ou
     return NNACL_INFER_INVALID;
   }
-  if (input->shape_size_ > 4) {
+  if (input->shape_size_ > DEFAULT_PAD_NDIMS) {
     return NNACL_INPUT_TENSOR_ERROR;
   }
   const TensorC *paddings = inputs[1];
@@ -48,7 +48,7 @@ int PadInferShape(const TensorC *const *inputs, size_t inputs_size, TensorC **ou
     param->paddings_[i] = ((int *)paddings->data_)[i];
   }
-  int output_shape[MAX_SHAPE_SIZE] = {0};
+  int output_shape[DEFAULT_PAD_NDIMS] = {0};
   size_t output_shape_size = 0;
   for (size_t i = 0; i < input->shape_size_; i++) {
     int shape = input->shape_[i] + param->paddings_[2 * i] + param->paddings_[2 * i + 1];
@@ -24,8 +24,8 @@ int PadConstant4D(const int8_t *in_data, int8_t *out_data, const int32_t *in_dim
   for (int n = 0; n < in_dims[0]; n++) {
     for (int h = tid; h < in_dims[1]; h += thread_num) {
       for (int w = 0; w < in_dims[2]; w++) {
-        const int8_t *in = in_data + offset(in_dims, n, h, w, 0);
-        int8_t *out = out_data + offset(out_dims, n + paddings[0], h + paddings[2], w + paddings[4], paddings[6]);
+        const int8_t *in = in_data + Offset(in_dims, n, h, w, 0);
+        int8_t *out = out_data + Offset(out_dims, n + paddings[0], h + paddings[2], w + paddings[4], paddings[6]);
         memcpy(out, in, copy_size * sizeof(int8_t));
       }
     }
@@ -173,8 +173,8 @@ int ResizeNearestNeighborInt8Simple(const int8_t *input_data, int8_t *output_dat
     for (x = 0; x < output_shape[2]; x++) {
       int input_x = 0;
       ComputeNearestNeighborInt(x, in_w, new_width, align_corners, &input_x);
-      int in_offset = offset(input_shape, batch, input_y, input_x, 0);
-      int out_offset = offset(output_shape, batch, y, x, 0);
+      int in_offset = Offset(input_shape, batch, input_y, input_x, 0);
+      int out_offset = Offset(output_shape, batch, y, x, 0);
       memcpy(output_data + out_offset, input_data + in_offset, c * sizeof(int8_t));
     }
   }
@@ -214,8 +214,8 @@ int ResizeNearestNeighborInt8(const int8_t *input_data, int8_t *output_data, con
       int input_x = 0;
       ComputeNearestNeighborInt(x, in_w, new_width, align_corners, &input_x);
       for (c = 0; c < output_shape[3]; c++) {
-        int in_offset = offset(input_shape, batch, input_y, input_x, c);
-        int out_offset = offset(output_shape, batch, y, x, c);
+        int in_offset = Offset(input_shape, batch, input_y, input_x, c);
+        int out_offset = Offset(output_shape, batch, y, x, c);
         int32_t out_value = MultiplyByQuantizedMultiplier(
           input_data[in_offset] - quant_in->zp_, multiplier->multiplier_,
@@ -18,8 +18,8 @@
 #include "nnacl/op_base.h"
-#define MAX_PAD_SIZE 8
-#define DEFAULT_PAD_NDIMS 4
+#define MAX_PAD_SIZE 12
+#define DEFAULT_PAD_NDIMS 6
 typedef struct PadQuantArg {
   QuantArg *in_quant_args_;
@@ -30,13 +30,13 @@ typedef struct PadQuantArg {
 typedef struct PadParameter {
   // Primitive parameter
   OpParameter op_parameter_;
-  int paddings_[MAX_SHAPE_SIZE];
+  int paddings_[MAX_PAD_SIZE];
   int pad_mode_;
   float constant_value_;
   // shape correlative
   int padding_length;
   // other parameter
-  int in_strides[COMM_SHAPE_SIZE];
+  int in_strides[DEFAULT_PAD_NDIMS];
   int out_strides[DEFAULT_PAD_NDIMS];
   int mirror_offset_;
   PadQuantArg pad_quant_arg_;
| @@ -197,7 +197,8 @@ const std::vector<size_t> &HcclKernel::GetWorkspaceSizeList() const { | |||||
| MS_EXCEPTION_IF_NULL(context_ptr); | MS_EXCEPTION_IF_NULL(context_ptr); | ||||
| bool is_task_sink = context_ptr->get_param<bool>(MS_CTX_ENABLE_TASK_SINK); | bool is_task_sink = context_ptr->get_param<bool>(MS_CTX_ENABLE_TASK_SINK); | ||||
| auto mode = context_ptr->get_param<int>(MS_CTX_EXECUTION_MODE); | auto mode = context_ptr->get_param<int>(MS_CTX_EXECUTION_MODE); | ||||
| if (!workspace_size_list_.empty() || hccl_data_type_list_.empty() || (!is_task_sink && mode == kGraphMode)) { | |||||
| if (!workspace_size_list_.empty() || hccl_data_type_list_.empty() || (!is_task_sink && mode == kGraphMode) || | |||||
| mode == kPynativeMode) { | |||||
| return workspace_size_list_; | return workspace_size_list_; | ||||
| } | } | ||||
| workspace_size_list_.emplace_back( | workspace_size_list_.emplace_back( | ||||
| @@ -70,11 +70,11 @@ CNodePtr CreateFusionOp(const std::vector<AnfNodePtr> &inputs_list, const std::v | |||||
| MS_EXCEPTION_IF_NULL(fusion_op); | MS_EXCEPTION_IF_NULL(fusion_op); | ||||
| std::vector<std::string> input_names; | std::vector<std::string> input_names; | ||||
| for (uint8_t i = 0; i < inputs_list.size(); i++) { | |||||
| for (size_t i = 0; i < inputs_list.size(); i++) { | |||||
| (void)input_names.emplace_back("input" + std::to_string(i)); | (void)input_names.emplace_back("input" + std::to_string(i)); | ||||
| } | } | ||||
| std::vector<std::string> output_names; | std::vector<std::string> output_names; | ||||
| for (uint8_t i = 0; i < outputs_list.size(); i++) { | |||||
| for (size_t i = 0; i < outputs_list.size(); i++) { | |||||
| (void)output_names.emplace_back("output" + std::to_string(i)); | (void)output_names.emplace_back("output" + std::to_string(i)); | ||||
| } | } | ||||
@@ -27,7 +27,7 @@ namespace opt {
 namespace {
 // insert tensormove for some cnode even if not a Ref cnode
 const std::set<std::string> kNeedInsertTensorMoveOpSet = {kLambNextMVOpName, kLambNextMVWithDecayOpName,
-                                                          kLambUpdateWithLROpName};
+                                                          kLambUpdateWithLROpName, kGetNextOpName};

 bool IsParameterOrValueNode(const AnfNodePtr &node) {
   MS_EXCEPTION_IF_NULL(node);

@@ -86,9 +86,10 @@ bool InsertTensorMoveForHcclOp::NeedInsertTensorMove(const FuncGraphPtr &graph,
   if (kernel_query_->IsTbeRef(input)) {
     return true;
   }
+  auto kernel_with_index = AnfAlgo::VisitKernelWithReturnType(input, 0, true);
+  auto real_node = kernel_with_index.first;
   // when input is some special cnodes
-  if (kNeedInsertTensorMoveOpSet.find(AnfAlgo::GetCNodeName(input)) != kNeedInsertTensorMoveOpSet.end()) {
+  if (kNeedInsertTensorMoveOpSet.find(AnfAlgo::GetCNodeName(real_node)) != kNeedInsertTensorMoveOpSet.end()) {
     return true;
   }

@@ -53,6 +53,15 @@ void SafeCheckFunction(const CNodePtr &cnode, const std::vector<int64_t> &reduce
   }
 }

+void DynamicAttrUpdate(const AnfNodePtr &node) {
+  MS_EXCEPTION_IF_NULL(node);
+  auto primitive = AnfAlgo::GetCNodePrimitive(node);
+  MS_EXCEPTION_IF_NULL(primitive);
+  auto axis_attr = primitive->GetAttr(kAttrAxis);
+  AnfAlgo::SetNodeAttr(kAttrAxes, axis_attr, node);
+  AnfAlgo::EraseNodeAttr(kAttrAxis, node);
+}
+
 void ConvertReduceAttrFraczAnd6HD(const CNodePtr &cnode) {
   auto axis = kernel::GetReduceAttrAxis(cnode);
   std::vector<int64_t> convert_axis;

@@ -95,9 +104,15 @@ const AnfNodePtr ChangeAxisOfReduceKernel::Process(const FuncGraphPtr &, const A
   }
   auto convert_map = kReduceConvertMap.find(AnfAlgo::GetInputFormat(node, 0));
   if (convert_map == kReduceConvertMap.end()) {
+    if (AnfAlgo::IsDynamicShape(node)) {
+      DynamicAttrUpdate(node);
+    }
     return nullptr;
   }
   convert_map->second(node->cast<CNodePtr>());
+  if (AnfAlgo::IsDynamicShape(node)) {
+    DynamicAttrUpdate(node);
+  }
   return nullptr;
 }
 }  // namespace opt

@@ -122,7 +122,7 @@ const AnfNodePtr BatchNorm2BNInfer::Process(const FuncGraphPtr &graph, const Anf
     return nullptr;
   }
   auto bn_infer = CreateBNInfer(graph, batchnorm, node);
-  TransferDepend(batchnorm, graph, bn_infer);
+  TransferDependOrUpdateState(batchnorm, graph, bn_infer);
   return bn_infer;
 }
 }  // namespace opt

@@ -125,7 +125,7 @@ const AnfNodePtr BatchNormGrad2BNInferGrad::Process(const FuncGraphPtr &graph, c
     return nullptr;
   }
   auto bn_infer_grad = CreateBNInferGrad(graph, batchnorm_grad, node);
-  TransferDepend(batchnorm_grad, graph, bn_infer_grad);
+  TransferDependOrUpdateState(batchnorm_grad, graph, bn_infer_grad);
   return bn_infer_grad;
 }
 }  // namespace opt

@@ -915,21 +915,34 @@ ValueNodePtr MakeValueNode(const ValueNodePtr &value_node) {
   return new_value_node;
 }

-void TransferDepend(const CNodePtr &old_node, const FuncGraphPtr &graph, const CNodePtr &new_node) {
+void TransferDependOrUpdateState(const CNodePtr &old_node, const FuncGraphPtr &graph, const CNodePtr &new_node) {
   MS_EXCEPTION_IF_NULL(old_node);
   MS_EXCEPTION_IF_NULL(graph);
   auto manager = graph->manager();
   MS_EXCEPTION_IF_NULL(manager);
   // Find BatchNorm's output which is a Depend or UpdateState.
-  for (const auto &node_index : manager->node_users()[old_node]) {
+  auto node_users = manager->node_users()[old_node];
+  for (const auto &node_index : node_users) {
     AnfNodePtr output = node_index.first;
-    size_t index = IntToSize(node_index.second);
     MS_EXCEPTION_IF_NULL(output);
     if (AnfAlgo::CheckPrimitiveType(output, prim::kPrimDepend) ||
         AnfAlgo::CheckPrimitiveType(output, prim::kPrimUpdateState)) {
-      auto depend = output->cast<CNodePtr>();
-      MS_EXCEPTION_IF_NULL(depend);
-      depend->set_input(index, new_node);
+      auto output_cnode = output->cast<CNodePtr>();
+      MS_EXCEPTION_IF_NULL(output_cnode);
+      auto inputs = output_cnode->inputs();
+      std::vector<AnfNodePtr> new_inputs{output_cnode->input(0)};
+      for (size_t i = 1; i < inputs.size(); i++) {
+        auto input = inputs[i];
+        if (input == old_node) {
+          new_inputs.emplace_back(new_node);
+        } else {
+          new_inputs.emplace_back(input);
+        }
+      }
+      auto new_output = graph->NewCNode(new_inputs);
+      new_output->set_abstract(output->abstract());
+      new_output->set_scope(output->scope());
+      manager->Replace(output, new_output);
     }
   }
 }
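
Two details of the rewritten `TransferDependOrUpdateState` are worth noting: the user list is copied before iteration because `manager->Replace` rewires the underlying `node_users()` structure, and every occurrence of `old_node` in the consumer's inputs is swapped for `new_node` rather than only the single recorded index. A framework-free sketch of the copy-before-mutate idiom (all names hypothetical, not MindSpore code):

```cpp
#include <iostream>
#include <map>
#include <string>
#include <vector>

// Toy user map: node name -> nodes that consume it (stand-in for node_users()).
std::map<std::string, std::vector<std::string>> users = {{"bn", {"depend", "update_state"}}};

// Rewriting a consumer mutates `users`, just as manager->Replace() rewires the real map,
// so iterating over users["bn"] directly while calling this would be unsafe.
void ReplaceConsumer(const std::string &old_name, const std::string &new_name) {
  auto consumers = users["bn"];
  for (auto &c : consumers) {
    if (c == old_name) c = new_name;
  }
  users["bn"] = consumers;  // reassignment may reallocate the vector
}

int main() {
  // Iterate over a snapshot so ReplaceConsumer cannot invalidate the loop.
  const auto snapshot = users["bn"];
  for (const auto &consumer : snapshot) {
    ReplaceConsumer(consumer, consumer + "_new");
  }
  for (const auto &c : users["bn"]) std::cout << c << "\n";  // depend_new, update_state_new
  return 0;
}
```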
@@ -213,8 +213,8 @@ bool CheckSupportDataType(const AnfNodePtr &node, const std::set<TypeId> &suppor
 // Create a new value node of func graph,not kernel graph
 ValueNodePtr MakeValueNode(const ValueNodePtr &value_node);

-// Transfer depend to the new node
-void TransferDepend(const CNodePtr &old_node, const FuncGraphPtr &graph, const CNodePtr &new_node);
+// Transfer depend or updatestate to the new node
+void TransferDependOrUpdateState(const CNodePtr &old_node, const FuncGraphPtr &graph, const CNodePtr &new_node);

 AbstractBasePtr CppInferShape(const PrimitivePtr &prim, const AbstractBasePtrList &args_spec_list);

@@ -381,7 +381,7 @@ void AscendSession::LoadInputData(const std::shared_ptr<KernelGraph> &kernel_gra
     MS_LOG(EXCEPTION) << "SyncHostToDevice failed.";
   }
   if (ms_context->get_param<int>(MS_CTX_EXECUTION_MODE) == kPynativeMode ||
-      AnfAlgo::IsParameterWeight(input_param)) {
+      AnfAlgo::IsParameterWeight(input_param) || kernel_graph->IsUpdatedParameter(input_param)) {
     tensor->set_device_address(device_address);
   }
   if (kernel_graph->IsUpdatedParameter(input_param)) {

@@ -523,30 +523,14 @@ void AscendSession::BuildGraphImpl(GraphId graph_id) {
   InitRuntimeResource();
   // multiple graph handle
   if (graph_id == final_graph_id_) {
-    if (!graph->executable()) {
-      return;
-    }
-    SetFinalGraphSummaryFlag(graph);
-    // OptChildGraphs
-    auto graph_order = GetGraphOrder(final_graph_id_);
-    auto &graph_type = GetGraphOrderType(final_graph_id_);
-    for (size_t i = 0; i < graph_order.size(); i++) {
-      if (!(graph_type[i] == BRANCH_END || graph_type[i] == BRANCH_START)) {
-        auto child_graph = GetGraph(graph_order[i]);
-        CompileChildGraph(child_graph);
-      }
-    }
-    SetSummaryNodes(graph.get());
-    // merge child graph
-    MergeGraphExecOrder();
-  } else {
-    auto single_graph = GetGraph(graph_id);
-    MS_EXCEPTION_IF_NULL(single_graph);
-    CompileChildGraph(single_graph);
-    // set the distinction label of single graph
-    single_graph->set_stream_distinction_label(graph_id);
-    single_graph->UpdateExecuteKernelStreamLabel();
-  }
+    MS_LOG(EXCEPTION) << "Unexpected graph id:" << graph_id << ", final_graph_id_:" << final_graph_id_;
+  }
+  auto single_graph = GetGraph(graph_id);
+  MS_EXCEPTION_IF_NULL(single_graph);
+  CompileChildGraph(single_graph);
+  // set the distinction label of single graph
+  single_graph->set_stream_distinction_label(graph_id);
+  single_graph->UpdateExecuteKernelStreamLabel();
   // adjust execution order because merge child graph and other special operations
   AdjustKernel(graph);
 #if ENABLE_CPU && ENABLE_D

@@ -863,9 +847,10 @@ void AscendSession::InitRuntimeResource() {
   if (!runtime_instance->Init()) {
     MS_LOG(EXCEPTION) << "Kernel runtime init error.";
   }
-  auto env_table_file = common::GetEnv("RANK_TABLE_FILE");
+  auto ms_context = MsContext::GetInstance();
+  MS_EXCEPTION_IF_NULL(ms_context);
   auto env_rank_id = common::GetEnv("RANK_ID");
-  if (!(env_table_file.empty() || env_rank_id.empty())) {
+  if (ms_context->get_param<bool>(MS_CTX_ENABLE_HCCL) && !env_rank_id.empty()) {
     // get actual rank id if it's distribution training case.
     rank_id_ = GetRankId();
   }

@@ -114,12 +114,12 @@ void GPUSession::Init(uint32_t device_id) {
   MS_EXCEPTION_IF_NULL(ms_context);
   ms_context->set_param<uint32_t>(MS_CTX_DEVICE_ID, device_id);
   if (collective_inited) {
-    rank_id_ = GetRankId();
     if (collective_handle_ != nullptr) {
       auto init_nccl_comm_funcptr =
         reinterpret_cast<InitNCCLComm>(dlsym(const_cast<void *>(collective_handle_), "InitNCCLComm"));
       MS_EXCEPTION_IF_NULL(init_nccl_comm_funcptr);
       (*init_nccl_comm_funcptr)();
+      rank_id_ = GetRankId();
     }
   }

@@ -470,7 +470,7 @@ void KernelGraph::CreateKernelInfoFromNewParameter(const CNodePtr &cnode) {
   }
 }

-void KernelGraph::ResetAssignInputFeaatureMapFlag(const CNodePtr &cnode) const {
+void KernelGraph::ResetAssignInputFeatureMapFlag(const CNodePtr &cnode) const {
   if (kOpAssignKernelNameList.find(AnfAlgo::GetCNodeName(cnode)) == kOpAssignKernelNameList.end()) {
     MS_LOG(EXCEPTION) << "Only supported to change the node [Assign , AssignSub, AssignAdd] node's input feature map "
                          "flag but got the node :"

@@ -493,7 +493,7 @@ void KernelGraph::SetKernelInfoForNode(const AnfNodePtr &node) const {
   node->set_kernel_info(kernel_info);
   if (node->isa<CNode>()) {
     if (kOpAssignKernelNameList.find(AnfAlgo::GetCNodeName(node)) != kOpAssignKernelNameList.end()) {
-      ResetAssignInputFeaatureMapFlag(node->cast<CNodePtr>());
+      ResetAssignInputFeatureMapFlag(node->cast<CNodePtr>());
     }
 #if defined(__APPLE__)
     std::vector<int> feature_map_input_indexs;

@@ -1362,7 +1362,9 @@ void KernelGraph::SetOptimizerFlag() {
       continue;
     }
     auto param = real_node->cast<ParameterPtr>();
-    if (AnfAlgo::IsParameterWeight(param)) {
+    auto abstract = param->abstract();
+    MS_EXCEPTION_IF_NULL(abstract);
+    if (abstract->isa<abstract::AbstractRef>()) {
       has_optimizer_ = true;
       (void)updated_parameters_.insert(param);
     }

@@ -111,7 +111,7 @@ class KernelGraph : public FuncGraph {
   CNodePtr NewCNodeWithInfos(const std::vector<AnfNodePtr> &inputs, const CNodePtr &ori_cnode = nullptr);
   void CreateKernelInfoFromNewParameter(const CNodePtr &cnode);
   CNodePtr NewCNode(const CNodePtr &cnode);
-  void ResetAssignInputFeaatureMapFlag(const CNodePtr &cnode) const;
+  void ResetAssignInputFeatureMapFlag(const CNodePtr &cnode) const;
   ParameterPtr NewParameter(const ParameterPtr &parameter = nullptr);
   ParameterPtr NewParameter(const abstract::AbstractBasePtr &abstract);
   ValueNodePtr NewValueNode(const AbstractBasePtr &abstract, const ValuePtr &value);

@@ -446,6 +446,45 @@ void UpdateGraphAquireGilAttr(const NotNull<KernelGraphPtr> &root_graph) {
   }
   return;
 }

+void SetReturnNode(const AnfNodePtr &node, KernelGraph *graph) {
+  MS_EXCEPTION_IF_NULL(graph);
+  MS_EXCEPTION_IF_NULL(node);
+  if (AnfAlgo::CheckPrimitiveType(node, prim::kPrimReturn)) {
+    constexpr auto kReturnInputIdx = 1;
+    auto return_node = node->cast<CNodePtr>();
+    graph->set_return(return_node);
+    auto graph_output = return_node->input(kReturnInputIdx);
+    MS_EXCEPTION_IF_NULL(graph_output);
+    // If return's input is value node, then the graph has no kernel, and the pass 'trans tuple to make_tuple' cannot
+    // match this pattern because that pass begin with output node but return node. So we add transform value tuple
+    // to make_tuple here.
+    if (AnfAlgo::IsTupleOutput(graph_output) && graph_output->isa<ValueNode>()) {
+      return_node->set_input(kReturnInputIdx, graph->TransTupleToMakeTuple(graph_output));
+    }
+  }
+}
+
+bool NoPartialInPartialGraph(const AnfNodePtr &partial_node) {
+  MS_EXCEPTION_IF_NULL(partial_node);
+  auto partial_cnode = partial_node->cast<CNodePtr>();
+  MS_EXCEPTION_IF_NULL(partial_cnode);
+  auto partial_graph = GetValueNode<FuncGraphPtr>(partial_cnode->input(kFirstDataInputIndex));
+  MS_EXCEPTION_IF_NULL(partial_graph);
+  auto graph_nodes = TopoSort(partial_graph->get_return());
+  for (auto &node : graph_nodes) {
+    if (AnfAlgo::CheckPrimitiveType(node, prim::kPrimPartial) || AnfAlgo::CheckPrimitiveType(node, prim::kPrimCall) ||
+        AnfAlgo::CheckPrimitiveType(node, prim::kPrimSwitch)) {
+      return false;
+    }
+    if (node->isa<CNode>() && IsValueNode<FuncGraph>(node->cast<CNodePtr>()->input(kAnfPrimitiveIndex))) {
+      return false;
+    }
+  }
+  return true;
+}
 }  // namespace
 GraphId SessionBasic::graph_sum_ = 0;

@@ -1463,9 +1502,7 @@ bool SessionBasic::CreateCNodeOfKernelGraph(const AnfNodePtr &node, KernelGraph
   new_cnode->set_fullname_with_scope(fullname);
   new_cnode->set_scope(cnode->scope());
   graph->FrontBackendlMapAdd(node, new_cnode);
-  if (AnfAlgo::CheckPrimitiveType(new_cnode, prim::kPrimReturn)) {
-    graph->set_return(new_cnode);
-  }
+  SetReturnNode(new_cnode, graph);
   return true;
 }

@@ -1957,7 +1994,8 @@ void SessionBasic::HandleInternalOutput(const AnfNodePtr &input_front_node, cons
   if (internal_output) {
     auto users = ExtendNodeUsers(front_func_graph_manager, front_node);
     for (auto &user : users) {
-      if (AnfAlgo::CheckPrimitiveType(user, prim::kPrimPartial) && kernel_target != kGPUDevice) {
+      if (AnfAlgo::CheckPrimitiveType(user, prim::kPrimPartial) && kernel_target != kGPUDevice &&
+          NoPartialInPartialGraph(user)) {
         auto partial_target = AddPartialParametersMap(front_func_graph_manager, user);
         if (partial_target != kNoTarget && partial_target != kernel_target) {
           unique_target = false;

@@ -2659,6 +2697,7 @@ uint32_t GetRankId() {
     world_group = kNcclWorldGroup;
   } else {
     MS_LOG(ERROR) << "Invalid backend: " << backend;
+    return rank_id;
   }
   if (!CommManager::GetInstance().GetRankID(world_group, &rank_id)) {
     MS_LOG(INFO) << "Failed to get rank id.";

@@ -596,7 +596,8 @@ void DumpIR(const std::string &filename, const FuncGraphPtr &graph, bool dump_fu
   std::ofstream fout(realpath.value());
   std::ostringstream buffer;
   if (!fout.is_open()) {
-    MS_LOG(ERROR) << "Open dump file '" << realpath.value() << "' failed!";
+    MS_LOG(ERROR) << "Open dump file '" << realpath.value() << "' failed!"
+                  << " Errno:" << errno << " ErrInfo:" << strerror(errno);
     return;
   }
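
Several hunks below repeat the pattern shown here: when a file stream fails to open, the log now carries `errno` and `strerror(errno)` so the OS-level cause (missing directory, permissions, too many open files) is visible instead of a bare "failed!". A minimal sketch of the pattern outside MindSpore's logging macros, assuming a POSIX platform where the failed open sets `errno` (the path is hypothetical):

```cpp
#include <cerrno>
#include <cstring>
#include <fstream>
#include <iostream>

int main() {
  const char *path = "/nonexistent_dir/dump.ir";  // hypothetical path that cannot be created
  std::ofstream fout(path);
  if (!fout.is_open()) {
    // errno was set by the underlying open; strerror turns it into readable text.
    std::cerr << "Open dump file '" << path << "' failed!"
              << " Errno:" << errno << " ErrInfo:" << std::strerror(errno) << "\n";
    return 1;
  }
  return 0;
}
```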
@@ -638,7 +639,8 @@ void DumpIRForRDR(const std::string &filename, const FuncGraphPtr &graph, bool d
   std::ofstream fout(realpath.value());
   std::ostringstream buffer;
   if (!fout.is_open()) {
-    MS_LOG(ERROR) << "Open dump file '" << realpath.value() << "' failed!";
+    MS_LOG(ERROR) << "Open dump file '" << realpath.value() << "' failed!"
+                  << " Errno:" << errno << " ErrInfo:" << strerror(errno);
     return;
   }

@@ -606,7 +606,8 @@ void AnfExporter::ExportFuncGraph(const std::string &filename, const FuncGraphPt
   std::ofstream ofs(filename);
   if (!ofs.is_open()) {
-    MS_LOG(ERROR) << "Open file '" << filename << "' failed!";
+    MS_LOG(ERROR) << "Open file '" << filename << "' failed!"
+                  << " Errno:" << errno << " ErrInfo:" << strerror(errno);
     return;
   }

@@ -274,7 +274,8 @@ bool Common::SaveStringToFile(const std::string filename, const std::string stri
   ofs.open(real_path.value());
   if (!ofs.is_open()) {
-    MS_LOG(ERROR) << "Open dump file '" << real_path.value() << "' failed!";
+    MS_LOG(ERROR) << "Open dump file '" << real_path.value() << "' failed!"
+                  << " Errno:" << errno << " ErrInfo:" << strerror(errno);
     return false;
   }
   ofs << string_info << std::endl;

@@ -1,5 +1,5 @@
 /**
- * Copyright 2020 Huawei Technologies Co., Ltd
+ * Copyright 2020-2021 Huawei Technologies Co., Ltd
  *
  * Licensed under the Apache License, Version 2.0 (the "License");
  * you may not use this file except in compliance with the License.

@@ -92,7 +92,8 @@ void DumpJsonParser::Parse() {
   std::ifstream json_file(dump_config_file.value());
   if (!json_file.is_open()) {
-    MS_LOG(EXCEPTION) << "Dump file:" << dump_config_file.value() << " open failed.";
+    MS_LOG(EXCEPTION) << "Dump file:" << dump_config_file.value() << " open failed."
+                      << " Errno:" << errno << " ErrInfo:" << strerror(errno);
   }
   nlohmann::json j;

@@ -176,7 +177,7 @@ void DumpJsonParser::CopyMSCfgJsonToDir(uint32_t rank_id) {
   auto context = MsContext::GetInstance();
   MS_EXCEPTION_IF_NULL(context);
   ms_info["device_target"] = context->get_param<std::string>(MS_CTX_DEVICE_TARGET);
-  ms_info["ms_version"] = "1.3.0";
+  ms_info["ms_version"] = "1.4.0";
   const std::string file_path = realpath.value();
   ChangeFileMode(file_path, S_IWUSR);
   std::ofstream json_create(file_path);

@@ -204,7 +205,8 @@ bool DumpJsonParser::DumpToFile(const std::string &filename, const void *data, s
   ChangeFileMode(file_path, S_IWUSR);
   std::ofstream fd(file_path, std::ios::out | std::ios::trunc | std::ios::binary);
   if (!fd.is_open()) {
-    MS_LOG(ERROR) << "Open file " << file_path << " failed.";
+    MS_LOG(ERROR) << "Open file " << file_path << " failed."
+                  << " Errno:" << errno << " ErrInfo:" << strerror(errno);
     return false;
   }
   std::string npy_header = GenerateNpyHeader(shape, type);

@@ -349,7 +351,7 @@ void DumpJsonParser::ParseIteration(const nlohmann::json &content) {
     MS_LOG(EXCEPTION) << "iteration only supports digits, {'-', '|'}, or just \"all\" but got: " << iteration_;
   }
   } else if (context->get_param<std::string>(MS_CTX_DEVICE_TARGET) == kCPUDevice) {
-    MS_LOG(WARNING) << "Dump not enabled. ";
+    MS_LOG(WARNING) << "Dump is not enabled. ";
   } else {
     MS_LOG(EXCEPTION) << "Dump Json Parse Failed. Async or E2E should be enabled. ";
   }

@@ -484,14 +486,14 @@ void DumpJsonParser::JudgeDumpEnabled() {
   }
   if (!async_dump_enabled_ && !e2e_dump_enabled_) {
-    MS_LOG(WARNING) << "Dump json parse failed. Dump not enabled";
+    MS_LOG(WARNING) << "Dump json parse failed. Dump is not enabled";
   }
   if (context->get_param<std::string>(MS_CTX_DEVICE_TARGET) != kCPUDevice) {
     auto device_id = context->get_param<uint32_t>(MS_CTX_DEVICE_ID);
     if (support_devices_.find(device_id) == support_devices_.end()) {
       async_dump_enabled_ = false;
       e2e_dump_enabled_ = false;
-      MS_LOG(WARNING) << "Dump not enabled. device_id:" << device_id << " not support";
+      MS_LOG(WARNING) << "Dump is not enabled. device_id:" << device_id << " not support";
     }
   }
   JsonConfigToString();

@@ -532,9 +534,10 @@ std::string DumpJsonParser::GetOpOverflowBinPath(uint32_t graph_id) const {
   bin_path.append("rank_");
   uint32_t rank_id = 0;
-  auto env_table_file = common::GetEnv("RANK_TABLE_FILE");
+  auto ms_context = MsContext::GetInstance();
+  MS_EXCEPTION_IF_NULL(ms_context);
   auto env_rank_id = common::GetEnv("RANK_ID");
-  if (!(env_table_file.empty() || env_rank_id.empty())) {
+  if (ms_context->get_param<bool>(MS_CTX_ENABLE_HCCL) && !env_rank_id.empty()) {
     // get actual rank id if it's distribution training case.
     if (!CommManager::GetInstance().GetRankID(kHcclWorldGroup, &rank_id)) {
       MS_LOG(INFO) << "Failed to get rank id.";

@@ -26,6 +26,7 @@
 #include <unordered_set>
 #include <utility>
 #include "pybind11/embed.h"
+#include "pybind11/stl.h"
 #ifdef ONLINE_DBG_MODE
 #include "debug/common.h"
 #include "debug/debugger/debugger.h"

@@ -80,45 +81,57 @@ void DebugServices::RemoveWatchpoint(unsigned int id) {
 std::unique_ptr<ITensorSummary> GetSummaryPtr(const std::shared_ptr<TensorData> &tensor,
                                               void *const previous_tensor_ptr, uint32_t num_elements,
-                                              int tensor_dtype) {
+                                              uint32_t prev_num_elements, int tensor_dtype) {
   switch (tensor_dtype) {
     case DbgDataType::DT_UINT8: {
-      return std::make_unique<TensorSummary<uint8_t>>(tensor->GetDataPtr(), previous_tensor_ptr, num_elements);
+      return std::make_unique<TensorSummary<uint8_t>>(tensor->GetDataPtr(), previous_tensor_ptr, num_elements,
+                                                      prev_num_elements);
     }
     case DbgDataType::DT_INT8: {
-      return std::make_unique<TensorSummary<int8_t>>(tensor->GetDataPtr(), previous_tensor_ptr, num_elements);
+      return std::make_unique<TensorSummary<int8_t>>(tensor->GetDataPtr(), previous_tensor_ptr, num_elements,
+                                                     prev_num_elements);
     }
     case DbgDataType::DT_UINT16: {
-      return std::make_unique<TensorSummary<uint16_t>>(tensor->GetDataPtr(), previous_tensor_ptr, num_elements);
+      return std::make_unique<TensorSummary<uint16_t>>(tensor->GetDataPtr(), previous_tensor_ptr, num_elements,
+                                                       prev_num_elements);
    }
     case DbgDataType::DT_INT16: {
-      return std::make_unique<TensorSummary<int16_t>>(tensor->GetDataPtr(), previous_tensor_ptr, num_elements);
+      return std::make_unique<TensorSummary<int16_t>>(tensor->GetDataPtr(), previous_tensor_ptr, num_elements,
+                                                      prev_num_elements);
     }
     case DbgDataType::DT_UINT32: {
-      return std::make_unique<TensorSummary<uint32_t>>(tensor->GetDataPtr(), previous_tensor_ptr, num_elements);
+      return std::make_unique<TensorSummary<uint32_t>>(tensor->GetDataPtr(), previous_tensor_ptr, num_elements,
+                                                       prev_num_elements);
     }
     case DbgDataType::DT_INT32:
     case DbgDataType::DT_BASE_INT: {
-      return std::make_unique<TensorSummary<int32_t>>(tensor->GetDataPtr(), previous_tensor_ptr, num_elements);
+      return std::make_unique<TensorSummary<int32_t>>(tensor->GetDataPtr(), previous_tensor_ptr, num_elements,
+                                                      prev_num_elements);
     }
     case DbgDataType::DT_UINT64: {
-      return std::make_unique<TensorSummary<uint64_t>>(tensor->GetDataPtr(), previous_tensor_ptr, num_elements);
+      return std::make_unique<TensorSummary<uint64_t>>(tensor->GetDataPtr(), previous_tensor_ptr, num_elements,
+                                                       prev_num_elements);
     }
     case DbgDataType::DT_INT64: {
-      return std::make_unique<TensorSummary<int64_t>>(tensor->GetDataPtr(), previous_tensor_ptr, num_elements);
+      return std::make_unique<TensorSummary<int64_t>>(tensor->GetDataPtr(), previous_tensor_ptr, num_elements,
+                                                      prev_num_elements);
     }
     case DbgDataType::DT_FLOAT16: {
-      return std::make_unique<TensorSummary<float16>>(tensor->GetDataPtr(), previous_tensor_ptr, num_elements);
+      return std::make_unique<TensorSummary<float16>>(tensor->GetDataPtr(), previous_tensor_ptr, num_elements,
+                                                      prev_num_elements);
     }
     case DbgDataType::DT_FLOAT32:
     case DbgDataType::DT_BASE_FLOAT: {
-      return std::make_unique<TensorSummary<float>>(tensor->GetDataPtr(), previous_tensor_ptr, num_elements);
+      return std::make_unique<TensorSummary<float>>(tensor->GetDataPtr(), previous_tensor_ptr, num_elements,
+                                                    prev_num_elements);
     }
     case DbgDataType::DT_FLOAT64: {
-      return std::make_unique<TensorSummary<double>>(tensor->GetDataPtr(), previous_tensor_ptr, num_elements);
+      return std::make_unique<TensorSummary<double>>(tensor->GetDataPtr(), previous_tensor_ptr, num_elements,
+                                                     prev_num_elements);
     }
     case DbgDataType::DT_BOOL: {
-      return std::make_unique<TensorSummary<bool>>(tensor->GetDataPtr(), previous_tensor_ptr, num_elements);
+      return std::make_unique<TensorSummary<bool>>(tensor->GetDataPtr(), previous_tensor_ptr, num_elements,
+                                                   prev_num_elements);
     }
     default:
       MS_LOG(INFO) << "Unsupported tensor type";
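
The hunk above threads `prev_num_elements` through a dozen near-identical branches. As a design note, a small macro (or templated helper) can keep such type-dispatch branches in sync, so a future signature change touches one line instead of twelve. A hedged sketch with toy stand-in types (nothing here is MindSpore code):

```cpp
#include <cstdint>
#include <iostream>
#include <memory>

struct ISummary {
  virtual ~ISummary() = default;
  virtual void Run() = 0;
};
template <typename T>
struct Summary : ISummary {
  void Run() override { std::cout << "summarizing " << sizeof(T) << "-byte elements\n"; }
};
enum class Dtype { kUInt8, kInt32, kFloat32 };

// One line per dtype; the macro keeps every branch structurally identical.
#define SUMMARY_CASE(dtype, cpp_type) \
  case Dtype::dtype:                  \
    return std::make_unique<Summary<cpp_type>>();

std::unique_ptr<ISummary> MakeSummary(Dtype t) {
  switch (t) {
    SUMMARY_CASE(kUInt8, uint8_t)
    SUMMARY_CASE(kInt32, int32_t)
    SUMMARY_CASE(kFloat32, float)
  }
  return nullptr;  // unsupported dtype
}

int main() { MakeSummary(Dtype::kFloat32)->Run(); }
```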
@@ -128,7 +141,8 @@ std::unique_ptr<ITensorSummary> GetSummaryPtr(const std::shared_ptr<TensorData>
 }

 #ifdef OFFLINE_DBG_MODE
-void *DebugServices::GetPrevTensor(const std::shared_ptr<TensorData> &tensor, bool previous_iter_tensor_needed) {
+void *DebugServices::GetPrevTensor(const std::shared_ptr<TensorData> &tensor, bool previous_iter_tensor_needed,
+                                   uint32_t *prev_num_elements) {
   void *previous_tensor_ptr = nullptr;
   std::shared_ptr<TensorData> tensor_prev;
   if (previous_iter_tensor_needed && tensor->GetIteration() >= 1) {

@@ -151,6 +165,7 @@ void *DebugServices::GetPrevTensor(const std::shared_ptr<TensorData> &tensor, bo
       tensor_prev.reset();
     } else {
       previous_tensor_ptr = tensor_prev->GetDataPtr();
+      *prev_num_elements = tensor_prev->GetNumElements();
     }
   }
   return previous_tensor_ptr;

@@ -158,9 +173,17 @@ void *DebugServices::GetPrevTensor(const std::shared_ptr<TensorData> &tensor, bo
 #endif

 void DebugServices::AddWatchPointsToCheck(bool init_dbg_suspend, bool step_end, bool recheck,
-                                          const std::string &tensor_name, const std::string &tensor_name_no_slot,
-                                          bool *previous_iter_tensor_needed, std::string *const qualified_tensor_name,
+                                          const std::shared_ptr<TensorData> &tensor, bool *previous_iter_tensor_needed,
+                                          std::string *const qualified_tensor_name,
                                           std::vector<watchpoint_t> *const watchpoints_to_check) {
+  if (tensor == nullptr) {
+    MS_LOG(DEBUG) << "tensor is nullptr.";
+    return;
+  }
+  const auto tensor_name = tensor->GetName();
+  const auto tensor_name_no_slot = tensor_name.substr(0, tensor_name.find_first_of(':'));
+  const auto tensor_device_id = tensor->GetDeviceId();
+  const auto tensor_root_graph_id = tensor->GetRootGraphId();
   for (auto w_table_item : watchpoint_table) {
     auto wp = std::get<1>(w_table_item);
     // check ONLY init conditions on initial suspended state.

@@ -178,7 +201,7 @@ void DebugServices::AddWatchPointsToCheck(bool init_dbg_suspend, bool step_end,
       wp_lock_.unlock();
       if (wp_cache_hit) continue;
     }
-    std::string found = wp.FindQualifiedTensorName(tensor_name_no_slot);
+    std::string found = wp.FindQualifiedTensorName(tensor_name_no_slot, tensor_device_id, tensor_root_graph_id);
     if (!found.empty()) {
       *qualified_tensor_name = found;
       watchpoints_to_check->push_back(w_table_item.second);

@@ -238,21 +261,26 @@ void DebugServices::CheckWatchpointsForTensor(
     bool previous_iter_tensor_needed = false;
     // Add do nothing line in case offline debug is off, prevent unused var warning
     (void)previous_iter_tensor_needed;
-    AddWatchPointsToCheck(init_dbg_suspend, step_end, recheck, tensor_name, tensor_name_no_slot,
-                          &previous_iter_tensor_needed, &qualified_tensor_name, &watchpoints_to_check);
+    AddWatchPointsToCheck(init_dbg_suspend, step_end, recheck, tensor, &previous_iter_tensor_needed,
+                          &qualified_tensor_name, &watchpoints_to_check);
     // no wp set on current tensor
     if (watchpoints_to_check.empty()) continue;
     uint32_t num_elements = tensor->GetNumElements();
+    uint32_t prev_num_elements = 0;
+    void *previous_tensor_ptr = nullptr;
 #ifdef OFFLINE_DBG_MODE
-    void *previous_tensor_ptr = GetPrevTensor(tensor, previous_iter_tensor_needed);
+    previous_tensor_ptr = GetPrevTensor(tensor, previous_iter_tensor_needed, &prev_num_elements);
 #else
-    void *previous_tensor_ptr =
-      tensor_loader_->GetPrevTensor(tensor_name) ? tensor_loader_->GetPrevTensor(tensor_name)->GetDataPtr() : nullptr;
+    std::shared_ptr<TensorData> prev_tensor_data = tensor_loader_->GetPrevTensor(tensor_name);
+    if (prev_tensor_data) {
+      previous_tensor_ptr = prev_tensor_data->GetDataPtr();
+      prev_num_elements = prev_tensor_data->GetNumElements();
+    }
 #endif
     std::unique_ptr<ITensorSummary> base_summary_ptr;
     if (!(watchpoints_to_check.size() == 1 && watchpoints_to_check[0].condition.type == IS_OVERFLOW)) {
-      base_summary_ptr = GetSummaryPtr(tensor, previous_tensor_ptr, num_elements, tensor_dtype);
+      base_summary_ptr = GetSummaryPtr(tensor, previous_tensor_ptr, num_elements, prev_num_elements, tensor_dtype);
       if (base_summary_ptr != nullptr) {
         base_summary_ptr->SummarizeTensor(watchpoints_to_check);
       }

@@ -399,7 +427,8 @@ void DebugServices::ReadTensorFromNpy(const std::string &file_name, std::string
   MS_LOG(INFO) << "Reading in file: " << file_path;
   infile.open(file_path.c_str(), std::ios::ate | std::ios::binary | std::ios::in);
   if (!infile.is_open()) {
-    MS_LOG(ERROR) << "Failed to open file (In ReadTensorFromNpy) " << file_path;
+    MS_LOG(ERROR) << "Failed to open file (In ReadTensorFromNpy) " << file_path << " Errno:" << errno
+                  << " ErrInfo:" << strerror(errno);
     return;
   }
   uint64_t file_size = infile.tellg();

@@ -439,6 +468,7 @@ void DebugServices::ConvertToHostFormat(const std::map<std::string, std::vector<
   std::string file_format = "npy";
   for (auto const &d : dir_to_files_map) {
     std::vector<std::string> files_to_convert_in_dir;
+    std::vector<std::string> files_after_convert_in_dir;
     std::string dump_key = d.first;
     for (auto const &file_name : d.second) {
       bool already_converted = false;

@@ -457,26 +487,19 @@ void DebugServices::ConvertToHostFormat(const std::map<std::string, std::vector<
       }
       if (!already_converted) {
         files_to_convert_in_dir.push_back(dump_key + "/" + file_name);
+        files_after_convert_in_dir.push_back(dump_key + "/" + file_name_without_scope);
       }
     }
-    std::ostringstream input_file_o;
-    const char *const delim = " ";
-    std::copy(files_to_convert_in_dir.begin(), files_to_convert_in_dir.end(),
-              std::ostream_iterator<std::string>(input_file_o, delim));
-    std::string input_files = input_file_o.str();
-    MS_LOG(INFO) << "Ops to convert: " << input_files;
-    if (input_files != "") {
+    MS_LOG(INFO) << "Number of files to convert: " << files_to_convert_in_dir.size();
+    if (!files_to_convert_in_dir.empty()) {
       // Look for the installation path to the conver_async package. If not found, throw exception and terminate the
       // later task.
       try {
         auto pkg = pybind11::module::import("mindspore.offline_debug.convert_async");
-        std::string convert_pkg_path = pkg.attr("__file__").cast<std::string>();
-        MS_LOG(INFO) << "The file for converting async dump data is in " << convert_pkg_path;
-        std::string convert_command = "python " + convert_pkg_path + " -out " + dump_key + " -t " + file_format +
-                                      " -d " + dump_key + " -f NCHW -l " + input_files;
-        (void)(system(convert_command.c_str()) + 1);
+        auto convert_obj = pkg.attr("AsyncDumpConverter")(pybind11::cast(files_to_convert_in_dir), dump_key);
+        (void)convert_obj.attr("convert_files")();
       } catch (pybind11::error_already_set &e) {
-        MS_LOG(EXCEPTION) << "Can't find package mindspore.offline_debug.convert_async";
+        MS_LOG(EXCEPTION) << "Failed to convert async dump data: " << e.what();
       }
       DIR *d_handle = opendir(dump_key.c_str());
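
The conversion step above now stays in-process: instead of shelling out to `python convert_async.py` through `system()`, the already-embedded interpreter imports the module and drives it directly, so Python-side failures surface through `error_already_set` rather than a lost exit code. A hedged sketch of the same pybind11 embedding pattern against a stand-in module (module, class, and file names are illustrative):

```cpp
#include <pybind11/embed.h>
#include <pybind11/stl.h>  // lets std::vector<std::string> cast to a Python list
#include <iostream>
#include <string>
#include <vector>

namespace py = pybind11;

int main() {
  py::scoped_interpreter guard{};  // start an embedded Python interpreter
  std::vector<std::string> files = {"a.bin", "b.bin"};  // hypothetical inputs
  try {
    // Stand-in for mindspore.offline_debug.convert_async.
    auto pkg = py::module::import("my_pkg.converter");
    auto converter = pkg.attr("Converter")(py::cast(files), "out_dir");
    (void)converter.attr("convert_files")();
  } catch (py::error_already_set &e) {
    // Python exceptions arrive as a C++ exception carrying the traceback text.
    std::cerr << "conversion failed: " << e.what() << "\n";
    return 1;
  }
  return 0;
}
```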
@@ -485,7 +508,7 @@ void DebugServices::ConvertToHostFormat(const std::map<std::string, std::vector<
       while ((dir = readdir(d_handle)) != NULL) {
         if (dir->d_type == DT_REG) {
           std::string candidate = dir->d_name;
-          for (const std::string &file_to_find : files_to_convert_in_dir) {
+          for (const std::string &file_to_find : files_after_convert_in_dir) {
             std::string file_n = file_to_find.substr(file_to_find.find_last_of("\\/") + 1);
             if (candidate.find(file_n) != std::string::npos && candidate.rfind(file_format) != std::string::npos) {
               // we found a converted file for this op

@@ -736,7 +759,7 @@ void DebugServices::ReadDumpedTensor(std::vector<std::string> backend_name, std:
   std::string slot_string_to_check;
   std::string prefix_dump_file_name;
   SetPrefixToCheck(&prefix_dump_file_name, &slot_string_to_check, &dump_style_kernel_name, slot[i], is_output[i]);
-  std::string prefix_dump_to_check = dump_style_kernel_name;
+  std::string prefix_dump_to_check = dump_style_kernel_name + '.';
   GetNodeNameWithoutScope(&prefix_dump_to_check);
   std::string specific_dump_dir = dump_dir + "/rank_" + std::to_string(device_id[i]) + "/" + net_name + "/" +

@@ -1069,12 +1092,12 @@ bool DebugServices::CheckOpOverflow(std::string node_name_to_find, unsigned int
 #ifdef ONLINE_DBG_MODE
   auto debugger = Debugger::GetInstance();
   overflow_bin_path = DumpJsonParser::GetInstance().GetOpOverflowBinPath(debugger->GetGraphPtr()->graph_id());
-  auto realpath = Common::GetRealPath(overflow_bin_path);
-  if (!realpath.has_value()) {
-    MS_LOG(ERROR) << "Get real path failed for overflow_bin_path.";
+  std::string check_overflow_bin_path = RealPath(overflow_bin_path);
+  if (check_overflow_bin_path.empty()) {
+    MS_LOG(WARNING) << "Get real path failed for overflow_bin_path.";
     return false;
   }
-  overflow_bin_path = realpath.value();
+  overflow_bin_path = check_overflow_bin_path;
 #else
   overflow_bin_path = dump_dir + "/rank_" + std::to_string(device_id) + "/" + net_name + "/" +
                       std::to_string(root_graph_id) + "/" + IterationString(iteration) + "/";

@@ -1108,8 +1131,8 @@ bool DebugServices::CheckOpOverflow(std::string node_name_to_find, unsigned int
   std::ifstream infile;
   infile.open(file_path.c_str(), std::ios::ate | std::ios::binary | std::ios::in);
   if (!infile.is_open()) {
-    MS_LOG(ERROR) << "Failed to open overflow bin file " << file_name;
-    MS_LOG(ERROR) << "Error: " << strerror(errno);
+    MS_LOG(ERROR) << "Failed to open overflow bin file " << file_name << " Errno:" << errno
+                  << " ErrInfo:" << strerror(errno);
     continue;
   }

@@ -1256,7 +1279,7 @@ std::string DebugServices::RealPath(const std::string &input_path) {
     MS_LOG(EXCEPTION) << "The length of file name : " << file_name.length() << " exceeds limit: " << NAME_MAX;
   }
   if (realpath(prefix_path.c_str(), real_path) == nullptr) {
-    MS_LOG(ERROR) << "The dir " << prefix_path << " does not exist.";
+    MS_LOG(WARNING) << "The dir " << prefix_path << " does not exist.";
     return "";
   }

@@ -125,16 +125,30 @@ class DebugServices {
   std::vector<parameter_t> parameter_list;
   size_t location = 0;

-  std::string FindQualifiedTensorName(const std::string &tensor_name) const {
+  std::string FindQualifiedTensorName(const std::string &tensor_name, unsigned const int &tensor_device_id,
+                                      unsigned const int &tensor_root_graph_id) const {
     std::string node_name = tensor_name.substr(0, tensor_name.find_first_of(':'));
+    int indx = 0;
     for (auto check_node : check_node_list) {
       std::string w_name = std::get<0>(check_node);
       bool w_type = std::get<1>(check_node);
       auto found = w_name.find_last_of('/');
-      if (found != std::string::npos && w_name.substr(found + 1) == tensor_name) return w_name;
-      if ((w_type && (tensor_name.find(w_name) == location || w_name == "*")) || (!w_type && node_name == w_name)) {
-        return w_name;
+      bool check_tensor_name = found != std::string::npos && w_name.substr(found + 1) == tensor_name;
+      bool check_node_name = (w_type && (node_name == w_name || w_name == "*")) || (!w_type && node_name == w_name);
+      if (check_tensor_name || check_node_name) {
+        // online debugger only support single card
+        if (check_node_device_list.empty()) {
+          return w_name;
+        }
+        auto device_vec = std::get<1>(check_node_device_list[indx]);
+        auto root_graph_vec = std::get<1>(check_node_graph_list[indx]);
+        auto iter1 = std::find(device_vec.begin(), device_vec.end(), tensor_device_id);
+        auto iter2 = std::find(root_graph_vec.begin(), root_graph_vec.end(), tensor_root_graph_id);
+        if (iter1 != device_vec.end() && iter2 != root_graph_vec.end()) {
+          return w_name;
+        }
       }
+      indx++;
     }
     return {};
   }

@@ -214,8 +228,8 @@ class DebugServices {
                 const bool step_end, const bool recheck, std::vector<unsigned int> *device_id = nullptr,
                 std::vector<unsigned int> *root_graph_id = nullptr);

-  void AddWatchPointsToCheck(bool init_dbg_suspend, bool step_end, bool recheck, const std::string &tensor_name,
-                             const std::string &tensor_name_no_slot, bool *previous_iter_tensor_needed,
+  void AddWatchPointsToCheck(bool init_dbg_suspend, bool step_end, bool recheck,
+                             const std::shared_ptr<TensorData> &tensor, bool *previous_iter_tensor_needed,
                              std::string *qualified_tensor_name, std::vector<watchpoint_t> *watchpoints_to_check);

 #ifdef OFFLINE_DBG_MODE

@@ -236,7 +250,8 @@ class DebugServices {
   std::vector<std::shared_ptr<TensorData>> ReadNeededDumpedTensors(unsigned int iteration,
                                                                    std::vector<std::string> *async_file_pool);

-  void *GetPrevTensor(const std::shared_ptr<TensorData> &tensor, bool previous_iter_tensor_needed);
+  void *GetPrevTensor(const std::shared_ptr<TensorData> &tensor, bool previous_iter_tensor_needed,
+                      uint32_t *prev_num_elements);

   void ReadTensorFromNpy(const std::string &file_name, std::string *tensor_type, std::size_t *size,
                          std::vector<int64_t> *shape, std::vector<char> **data_buffer);

@@ -113,7 +113,7 @@ void Debugger::Init(const uint32_t device_id, const std::string device_target) {
   device_id_ = device_id;
   MS_LOG(INFO) << "Debugger got device_target: " << device_target;
   device_target_ = device_target;
-  version_ = "1.3.0";
+  version_ = "1.4.0";
 }

 bool IsTypeDebuggerSupported(TypeId type) {

@@ -147,8 +147,22 @@ void Debugger::EnableDebugger() {
   }

   if (debugger_enabled_) {
-    std::string host = "localhost";
+    // configure grpc host
+    std::string env_host_str = common::GetEnv("MS_DEBUGGER_HOST");
+    std::string host;
+    if (!env_host_str.empty()) {
+      if (CheckIp(env_host_str)) {
+        MS_LOG(INFO) << "Getenv MS_DEBUGGER_HOST: " << env_host_str;
+        host = env_host_str;
+      } else {
+        debugger_enabled_ = false;
+        MS_EXCEPTION(ValueError) << "Environment variable MS_DEBUGGER_HOST isn't a valid IP address. "
+                                    "Please set environment variable MS_DEBUGGER_HOST=x.x.x.x to a valid IP";
+      }
+    } else {
+      MS_LOG(INFO) << "Environment variable MS_DEBUGGER_HOST doesn't exist. Using default debugger host: localhost";
+      host = "localhost";
+    }
     // configure grpc port
     std::string env_port_str = common::GetEnv("MS_DEBUGGER_PORT");
     std::string port;

@@ -1120,6 +1134,17 @@ bool Debugger::CheckPort(const std::string &port) const {
   return true;
 }

+bool Debugger::CheckIp(const std::string &host) const {
+  std::regex reg_ip(
+    "(25[0-4]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[1-9])"
+    "[.](25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])"
+    "[.](25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])"
+    "[.](25[0-4]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[1-9])");
+  std::smatch smat;
+  std::string host_str = host;
+  return std::regex_match(host_str, smat, reg_ip);
+}
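
The new `CheckIp` guard validates `MS_DEBUGGER_HOST` before the debugger tries to connect; note that the first and last octets exclude 0 and 255, so network and broadcast addresses are rejected along with non-numeric hosts. A self-contained sketch exercising the same regex (the driver around it is illustrative):

```cpp
#include <iostream>
#include <regex>
#include <string>

bool CheckIp(const std::string &host) {
  // Same pattern as the hunk above: octets 1-254 at the ends, 0-255 in the middle.
  std::regex reg_ip(
      "(25[0-4]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[1-9])"
      "[.](25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])"
      "[.](25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])"
      "[.](25[0-4]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[1-9])");
  std::smatch smat;
  return std::regex_match(host, smat, reg_ip);
}

int main() {
  for (const std::string host : {"127.0.0.1", "192.168.0.10", "256.1.1.1", "localhost"}) {
    std::cout << host << " -> " << (CheckIp(host) ? "valid" : "invalid") << "\n";
  }
  return 0;
}
```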
| uint32_t Debugger::GetFirstRunGraphId() const { return rungraph_id_list_.front(); } | uint32_t Debugger::GetFirstRunGraphId() const { return rungraph_id_list_.front(); } | ||||
| void Debugger::LoadSingleAnfnode(const AnfNodePtr &anf_node, const size_t output_index) { | void Debugger::LoadSingleAnfnode(const AnfNodePtr &anf_node, const size_t output_index) { | ||||
| @@ -235,6 +235,9 @@ class Debugger : public std::enable_shared_from_this<Debugger> { | |||||
| // Check if the port is valid | // Check if the port is valid | ||||
| bool CheckPort(const std::string &port) const; | bool CheckPort(const std::string &port) const; | ||||
| // Check if the IP is valid | |||||
| bool CheckIp(const std::string &host) const; | |||||
| void LoadSingleAnfnode(const AnfNodePtr &anf_node, const size_t output_index); | void LoadSingleAnfnode(const AnfNodePtr &anf_node, const size_t output_index); | ||||
| // class members | // class members | ||||
| @@ -48,7 +48,7 @@ DbgServices::~DbgServices() { | |||||
| std::string DbgServices::GetVersion() { | std::string DbgServices::GetVersion() { | ||||
| MS_LOG(INFO) << "get version is called"; | MS_LOG(INFO) << "get version is called"; | ||||
| return "1.3.0"; | |||||
| return "1.4.0"; | |||||
| } | } | ||||
| int32_t DbgServices::Initialize(std::string net_name, std::string dump_folder_path, bool is_sync_mode) { | int32_t DbgServices::Initialize(std::string net_name, std::string dump_folder_path, bool is_sync_mode) { | ||||
| @@ -573,7 +573,8 @@ void DumpIRProtoWithSrcInfo(const FuncGraphPtr &func_graph, const std::string &s | |||||
| // write to pb file | // write to pb file | ||||
| std::ofstream ofs(realpath.value()); | std::ofstream ofs(realpath.value()); | ||||
| if (!ofs.is_open()) { | if (!ofs.is_open()) { | ||||
| MS_LOG(ERROR) << "Open file '" << realpath.value() << "' failed!"; | |||||
| MS_LOG(ERROR) << "Open file '" << realpath.value() << "' failed!" | |||||
| << " Errno:" << errno << " ErrInfo:" << strerror(errno); | |||||
| return; | return; | ||||
| } | } | ||||
| ofs << graph_proto; | ofs << graph_proto; | ||||
@@ -91,10 +91,12 @@ double VarianceAndMeanCalculator::GetVariance() const {
 double VarianceAndMeanCalculator::GetStandardDeviation() { return sqrt(GetVariance()); }
 template <typename T>
-TensorSummary<T>::TensorSummary(void *current_tensor_ptr, void *const previous_tensor_ptr, uint32_t num_elements)
+TensorSummary<T>::TensorSummary(void *current_tensor_ptr, void *const previous_tensor_ptr, uint32_t num_elements,
+                                uint32_t prev_num_elements)
     : current_tensor_ptr(reinterpret_cast<T *>(current_tensor_ptr)),
       prev_tensor_ptr(reinterpret_cast<T *>(previous_tensor_ptr)),
       num_elements(num_elements),
+      prev_num_elements_(prev_num_elements),
       min(std::numeric_limits<double>::max()),
       max(std::numeric_limits<double>::lowest()),
       inf_count(0),
@@ -108,8 +110,14 @@ void TensorSummary<T>::SummarizeTensor(const std::vector<DebugServices::watchpoi
   InitCalculators(wps);
   for (size_t i = 0; i < num_elements; ++i) {
     auto current_value = static_cast<double>(current_tensor_ptr[i]);
-    double previous_value =
-      prev_tensor_ptr ? static_cast<double>(prev_tensor_ptr[i]) : std::numeric_limits<double>::quiet_NaN();
+    double previous_value = std::numeric_limits<double>::quiet_NaN();
+    if (prev_tensor_ptr) {
+      if (num_elements == prev_num_elements_) {
+        previous_value = static_cast<double>(prev_tensor_ptr[i]);
+      } else {
+        MS_LOG(DEBUG) << "Current and previous tensor are not the same size.";
+      }
+    }
     inf_count += std::isinf(current_value);
     nan_count += std::isnan(current_value);
     zero_count += (current_value == 0);
@@ -99,7 +99,7 @@ class TensorSummary : public ITensorSummary {
  public:
   TensorSummary() = default;
   ~TensorSummary() override = default;
-  TensorSummary(void *, void *, uint32_t);
+  TensorSummary(void *, void *, uint32_t, uint32_t);
   void SummarizeTensor(const std::vector<DebugServices::watchpoint_t> &) override;
   // returns hit, error_code, parameter_list
   std::tuple<bool, int, std::vector<DebugServices::parameter_t>> IsWatchpointHit(DebugServices::watchpoint_t) override;
@@ -108,6 +108,7 @@ class TensorSummary : public ITensorSummary {
   T *current_tensor_ptr;
   T *prev_tensor_ptr;
   uint32_t num_elements;
+  uint32_t prev_num_elements_;
   double min;
   double max;
   uint32_t inf_count;
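The new `prev_num_elements_` field guards the watchpoint comparison against shape changes between iterations. Below is a hedged sketch of that guard, with `PreviousValueAt` as an illustrative helper rather than MindSpore code: when the history buffer is missing or sized differently, the previous value degrades to NaN, so NaN-aware checks simply skip it.

```cpp
#include <cstdint>
#include <limits>

// Return the element-wise "previous value" only when the history tensor
// exists and has the same element count as the current tensor.
double PreviousValueAt(const float *prev, uint32_t prev_len, uint32_t cur_len, uint32_t i) {
  if (prev != nullptr && prev_len == cur_len) {
    return static_cast<double>(prev[i]);
  }
  return std::numeric_limits<double>::quiet_NaN();  // sizes differ or no history
}
```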
@@ -555,7 +555,8 @@ void DumpIRProto(const FuncGraphPtr &func_graph, const std::string &suffix) {
   // write to pb file
   std::ofstream ofs(file_path);
   if (!ofs.is_open()) {
-    MS_LOG(ERROR) << "Open file '" << file_path << "' failed!";
+    MS_LOG(ERROR) << "Open file '" << file_path << "' failed!"
+                  << " Errno:" << errno << " ErrInfo:" << strerror(errno);
     return;
   }
   ofs << GetFuncGraphProtoString(func_graph);
@@ -122,7 +122,8 @@ void EnvConfigParser::ParseFromFile() {
   std::ifstream json_file(config_file_);
   if (!json_file.is_open()) {
     MS_LOG(WARNING) << "Env config file:" << config_file_ << " open failed."
-                    << " Please check the config file '" << config_file_ << "' set by 'env_config_path' in context.";
+                    << " Please check the config file '" << config_file_ << "' set by 'env_config_path' in context."
+                    << " Errno:" << errno << " ErrInfo:" << strerror(errno);
     return;
   }
@@ -348,7 +348,8 @@ bool AnalyzeFailExporter::ExportFuncGraph(const std::string &filename, const Tra
   }
   std::ofstream ofs(filename);
   if (!ofs.is_open()) {
-    MS_LOG(ERROR) << "Open file '" << filename << "' failed!";
+    MS_LOG(ERROR) << "Open file '" << filename << "' failed!"
+                  << " Errno:" << errno << " ErrInfo:" << strerror(errno);
     return false;
   }
@@ -19,6 +19,7 @@
 #include <algorithm>
 #include <memory>
+#include <set>
 #include <unordered_map>
 #include <vector>
 #include <utility>
@@ -128,6 +129,178 @@ class GetItemTransformACrossGraph {
  private:
  std::unordered_map<FuncGraphPtr, std::unordered_map<int64_t, FuncGraphPtr>> cache_;
 };
+bool HasMoreJ(const OptimizerPtr &optimizer) {
+  bool more_j = false;
+  auto res = optimizer->resource();
+  auto resource_ptr = std::dynamic_pointer_cast<pipeline::Resource>(res);
+  if (resource_ptr != nullptr) {
+    const auto &manager = optimizer->manager();
+    MS_EXCEPTION_IF_NULL(manager);
+    more_j = manager->func_graph_j_total(resource_ptr->func_graph());
+  }
+  return more_j;
+}
+bool IsOutputShrinkable(const AnfNodePtr &output) {
+  if (IsPrimitiveCNode(output, prim::kPrimMakeTuple)) {
+    return true;
+  }
+  if (GetValueNode<ValueTuplePtr>(output)) {
+    return true;
+  }
+  return false;
+}
+size_t GetOutputSize(const AnfNodePtr &output) {
+  if (IsPrimitiveCNode(output, prim::kPrimMakeTuple)) {
+    const auto &output_cnode = output->cast<CNodePtr>();
+    MS_EXCEPTION_IF_NULL(output_cnode);
+    return output_cnode->size() - 1;
+  }
+  const auto &value_tuple = GetValueNode<ValueTuplePtr>(output);
+  if (value_tuple == nullptr) {
+    MS_LOG(EXCEPTION) << "fg output is not MakeTuple or ValueTuple, but: " << output->DebugString();
+  }
+  return value_tuple->size();
+}
+struct TpCNodeAndIndex {
+  // CNode {TupleGetItem, call, index}
+  CNodePtr tp_cnode;
+  int64_t index;
+};
+int64_t UpdateUserNodeIndex(const CNodePtr &fg_call_cnode, const int64_t current_index,
+                            const std::vector<TpCNodeAndIndex> &tp_cnodes_and_index) {
+  const auto &manager = fg_call_cnode->func_graph()->manager();
+  MS_EXCEPTION_IF_NULL(manager);
+  int64_t new_index = current_index;
+  auto txn = manager->Transact();
+  for (int64_t i = 0; i < SizeToLong(tp_cnodes_and_index.size()); ++i) {
+    const auto &cnode_and_index = tp_cnodes_and_index[i];
+    if (cnode_and_index.index != i) {
+      constexpr auto kInputIndex = 2;
+      txn.SetEdge(cnode_and_index.tp_cnode, kInputIndex, NewValueNode(i));
+    }
+    if (cnode_and_index.index == current_index) {
+      new_index = i;
+    }
+  }
+  txn.Commit();
+  return new_index;
+}
+AbstractBasePtr ShrinkAbstract(const AbstractBasePtr &original_abstract,
+                               const std::vector<TpCNodeAndIndex> &tp_cnodes_and_index) {
+  if (original_abstract != nullptr && original_abstract->isa<abstract::AbstractTuple>()) {
+    const auto &abs_tuple = original_abstract->cast<abstract::AbstractTuplePtr>();
+    MS_EXCEPTION_IF_NULL(abs_tuple);
+    const auto &abs_tuple_elements = abs_tuple->elements();
+    const int64_t before_shrink_tuple_size = SizeToLong(abs_tuple_elements.size());
+    AbstractBasePtrList shrunk_abstract_elements;
+    std::transform(tp_cnodes_and_index.cbegin(), tp_cnodes_and_index.cend(),
+                   std::back_inserter(shrunk_abstract_elements),
+                   [abs_tuple_elements, before_shrink_tuple_size](const auto &node_and_index) {
+                     if (node_and_index.index >= before_shrink_tuple_size) {
+                       MS_LOG(EXCEPTION) << "index should be less than inputs size, index: " << node_and_index.index
+                                         << ", abstract tuple size: " << before_shrink_tuple_size;
+                     }
+                     return abs_tuple_elements[node_and_index.index];
+                   });
+    return std::make_shared<abstract::AbstractTuple>(shrunk_abstract_elements);
+  }
+  return nullptr;
+}
+FuncGraphPtr ShrinkUnsedOutput(const FuncGraphPtr &fg, const std::vector<TpCNodeAndIndex> &tp_cnodes_and_index) {
+  const auto &manager = fg->manager();
+  MS_EXCEPTION_IF_NULL(manager);
+  auto new_fg = TransformableClone(fg, std::make_shared<TraceTransform>("tp_use"));
+  auto new_fg_output = new_fg->output();
+  AnfNodePtr shrunk_output = nullptr;
+  int64_t before_shrink_inputs_size = 0;
+  if (IsPrimitiveCNode(new_fg_output, prim::kPrimMakeTuple)) {
+    // Shrink output;
+    auto new_fg_output_cnode = new_fg_output->cast<CNodePtr>();
+    const auto &new_fg_output_inputs = new_fg_output_cnode->inputs();
+    constexpr auto kMinimalSize = 2;
+    if (new_fg_output_inputs.size() <= kMinimalSize) {
+      MS_LOG(EXCEPTION) << "New fg output should have at least 2 elements, but: " << new_fg_output->DebugString();
+    }
+    before_shrink_inputs_size = SizeToLong(new_fg_output_inputs.size() - 1);
+    AnfNodePtrList shrunk_inputs{NewValueNode({prim::kPrimMakeTuple})};
+    // Bypass maketuple primitive in new_fg_output_inputs;
+    std::transform(tp_cnodes_and_index.cbegin(), tp_cnodes_and_index.cend(), std::back_inserter(shrunk_inputs),
+                   [new_fg_output, new_fg_output_inputs, before_shrink_inputs_size](const auto &node_and_index) {
+                     if (node_and_index.index >= before_shrink_inputs_size) {
+                       MS_LOG(EXCEPTION) << "index should be less than inputs size, index: " << node_and_index.index
+                                         << ", output: " << new_fg_output->DebugString();
+                     }
+                     return new_fg_output_inputs[node_and_index.index + 1];
+                   });
+    shrunk_output = new_fg->NewCNode(shrunk_inputs);
+  } else {
+    auto value_tuple = GetValueNode<ValueTuplePtr>(new_fg_output);
+    if (value_tuple == nullptr) {
+      MS_LOG(EXCEPTION) << "New fg output is not MakeTuple or ValueTuple, but " << new_fg_output->DebugString();
+    }
+    ValuePtrList shrunk_inputs;
+    before_shrink_inputs_size = value_tuple->size();
+    std::transform(tp_cnodes_and_index.cbegin(), tp_cnodes_and_index.cend(), std::back_inserter(shrunk_inputs),
+                   [new_fg_output, value_tuple, before_shrink_inputs_size](const auto &node_and_index) {
+                     if (node_and_index.index >= before_shrink_inputs_size) {
+                       MS_LOG(EXCEPTION) << "index should be less than inputs size, index: " << node_and_index.index
+                                         << ", output: " << new_fg_output->DebugString();
+                     }
+                     return (*value_tuple)[node_and_index.index];
+                   });
+    shrunk_output = NewValueNode(std::make_shared<ValueTuple>(shrunk_inputs));
+  }
+  auto shrunk_abstract = ShrinkAbstract(new_fg_output->abstract(), tp_cnodes_and_index);
+  MS_EXCEPTION_IF_NULL(shrunk_abstract);
+  shrunk_output->set_abstract(shrunk_abstract);
+  new_fg->set_output(shrunk_output);
+  MS_LOG(DEBUG) << "Only part of the output is used; original size: " << before_shrink_inputs_size
+                << ", new size: " << tp_cnodes_and_index.size() << ", fg: " << fg->ToString()
+                << ", new graph: " << new_fg->ToString();
+  return new_fg;
+}
+struct FuncGraphIntVectorPairHasher {
+  std::size_t Int64VectorHash(const std::vector<int64_t> &int_vector) const {
+    std::size_t hash_value = 0;
+    constexpr auto kMaxElementsNum = 4;
+    for (size_t i = 0; (i < int_vector.size()) && (i < kMaxElementsNum); ++i) {
+      hash_value = hash_combine(hash_value, std::hash<int64_t>{}(int_vector[i]));
+    }
+    return hash_value;
+  }
+  std::size_t operator()(const std::pair<FuncGraphPtr, std::vector<int64_t>> &p) const {
+    auto h1 = std::hash<FuncGraphPtr>{}(p.first);
+    auto h2 = Int64VectorHash(p.second);
+    return hash_combine(h1, h2);
+  }
+};
+bool ShouldTransform(const AnfNodePtr &node, const std::vector<TpCNodeAndIndex> &tp_cnodes_and_index) {
+  if (node->abstract() && node->abstract()->isa<abstract::AbstractTuple>()) {
+    const auto &abs_tuple = *(node->abstract()->cast<abstract::AbstractTuplePtr>());
+    if (tp_cnodes_and_index[0].index == 0 && abs_tuple.size() > 0) {
+      if (abs_tuple[0]->isa<abstract::AbstractScalar>() && abs_tuple[0]->GetTypeTrack()->isa<EnvType>()) {
+        return true;
+      }
+    }
+    // fprop_fg will return MakeTuple(xx, bprop_fg).
+    if (tp_cnodes_and_index.size() > 1 && tp_cnodes_and_index[1].index == 1 && abs_tuple.size() > 1 &&
+        abs_tuple[1]->isa<abstract::AbstractFunction>()) {
+      return true;
+    }
+  }
+  return false;
+}
 }  // namespace internal
 // {prim::kPrimTupleGetItem, {G, Xs}, C}
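The shrink machinery above rewrites a func-graph whose tuple output is only partially consumed: the output tuple is rebuilt from the used elements (sorted by index), and every `tuple_getitem` user is redirected to the element's new position, which is what `UpdateUserNodeIndex` effects through the manager transaction. A hedged sketch of that remapping, with `RemapIndex` as an illustrative helper rather than MindSpore code:

```cpp
// If a graph's MakeTuple output has 5 elements but only indices {0, 2, 4} are
// consumed via tuple_getitem, the output shrinks to 3 elements and each old
// index maps to its rank in the sorted used-index list (0->0, 2->1, 4->2).
#include <cstdint>
#include <iostream>
#include <vector>

int64_t RemapIndex(const std::vector<int64_t> &sorted_used, int64_t old_index) {
  for (int64_t i = 0; i < static_cast<int64_t>(sorted_used.size()); ++i) {
    if (sorted_used[i] == old_index) {
      return i;  // new position inside the shrunk tuple
    }
  }
  return -1;  // index was not among the used ones
}

int main() {
  std::vector<int64_t> used = {0, 2, 4};
  std::cout << RemapIndex(used, 4) << std::endl;  // prints 2
}
```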
@@ -136,23 +309,52 @@ class IncorporateGetitem : public AnfVisitor {
   IncorporateGetitem() : getitem_transform_() {}
   ~IncorporateGetitem() override = default;
-  AnfNodePtr operator()(const OptimizerPtr &, const AnfNodePtr &node) override {
+  AnfNodePtr operator()(const OptimizerPtr &optimizer, const AnfNodePtr &node) override {
     Reset();
     AnfVisitor::Match(prim::kPrimTupleGetItem, {IsCNode, IsValueNode<Int64Imm>})(node);
     if (node->func_graph() == nullptr || idx_ == -1 || fg_ == nullptr || fg_->has_flag(FUNC_GRAPH_FLAG_DEFER_INLINE) ||
         fg_->has_flag(FUNC_GRAPH_OUTPUT_NO_RECOMPUTE)) {
       return nullptr;
     }
+    // This node had been substituted.
+    if (processed_nodes_.find(fg_call_cnode_) != processed_nodes_.end()) {
+      MS_LOG(DEBUG) << "fg call with same cnode is already replaced, node: " << node->DebugString()
+                    << ", fg_call: " << fg_call_cnode_->DebugString();
+      return nullptr;
+    }
+    const auto &manager = fg_->manager();
+    MS_EXCEPTION_IF_NULL(manager);
+    bool output_is_shrinkable = internal::IsOutputShrinkable(fg_->output());
+    std::vector<internal::TpCNodeAndIndex> tp_cnodes_and_index;
+    auto fg_call_cnode_users_counter = MultipleUse(fg_call_cnode_, fg_, &tp_cnodes_and_index);
+    bool multiple_use = (tp_cnodes_and_index.size() > 1);
+    if (output_is_shrinkable && multiple_use && (tp_cnodes_and_index.size() == fg_call_cnode_users_counter)) {
+      if (!internal::ShouldTransform(fg_call_cnode_, tp_cnodes_and_index) && !internal::HasMoreJ(optimizer)) {
+        MS_LOG(DEBUG) << "No more j and multiple use, will shrink, node: " << node->DebugString()
+                      << ", fg_call: " << fg_call_cnode_->DebugString();
+        const auto output_size = internal::GetOutputSize(fg_->output());
+        if (fg_call_cnode_users_counter == output_size) {
+          processed_nodes_.emplace(fg_call_cnode_);
+          MS_LOG(DEBUG) << "All elements in output are used, no need to transform, node: " << node->DebugString()
+                        << ", fg_call: " << fg_call_cnode_->DebugString();
+          return nullptr;
+        }
+        auto new_node = ShrinkFuncGraphOutput(node, tp_cnodes_and_index);
+        if (new_node != nullptr) {
+          return new_node;
+        }
+      }
+    }
+    MS_LOG(DEBUG) << "Cannot shrink, transform_getitem, node: " << node->DebugString()
+                  << ", fg_call: " << fg_call_cnode_->DebugString();
     auto new_fg = getitem_transform_(node, fg_, idx_);
+    MS_LOG(DEBUG) << "Original fg: " << fg_->ToString() << ", new fg: " << new_fg->ToString();
     (void)args_.insert(args_.begin(), NewValueNode(new_fg));
     auto new_node = node->func_graph()->NewCNode(args_);
     // Check if the another only usage of {G, Xs} is UpdateState{s, {G, Xs}}, if yes, replace
     // UpdateState{s, {G, Xs}} with UpdateState{s, new_node};
-    const auto &manager = fg_->manager();
-    MS_EXCEPTION_IF_NULL(manager);
     auto &node_users_map = manager->node_users();
-    auto it = node_users_map.find(fg_cnode_);
+    auto it = node_users_map.find(fg_call_cnode_);
     if (it != node_users_map.end()) {
       AnfNodePtr update_state_node = nullptr;
       auto &node_users = it->second;
@@ -166,7 +368,7 @@ class IncorporateGetitem : public AnfVisitor {
       if (update_state_node != nullptr) {
         auto update_state_cnode = update_state_node->cast<CNodePtr>();
         // double check;
-        if (update_state_cnode->input(2) == fg_cnode_) {
+        if (update_state_cnode->input(2) == fg_call_cnode_) {
           MS_LOG(DEBUG) << "Replace UpdateState node: " << update_state_cnode->DebugString(2)
                         << ", input 2 with: " << new_node->DebugString();
           manager->SetEdge(update_state_cnode, 2, new_node);
@@ -177,12 +379,98 @@ class IncorporateGetitem : public AnfVisitor {
     return new_node;
   }
+  size_t MultipleUse(const CNodePtr &fg_call, const FuncGraphPtr &fg,
+                     std::vector<internal::TpCNodeAndIndex> *cnodes_and_index) const {
+    const auto &manager = fg->manager();
+    MS_EXCEPTION_IF_NULL(manager);
+    auto &cnode_and_index_vector = *cnodes_and_index;
+    std::set<int64_t> index_set;
+    std::size_t total_usage = 0;
+    const auto &node_users_map = manager->node_users();
+    const auto &it = node_users_map.find(fg_call);
+    if (it == node_users_map.end()) {
+      return 0;
+    }
+    const auto &node_users = it->second;
+    for (const auto &user : node_users) {
+      if (IsPrimitiveCNode(user.first, prim::kPrimTupleGetItem)) {
+        const auto &cnode = user.first->cast<CNodePtr>();
+        if (cnode->input(2)->isa<ValueNode>()) {
+          auto idx = GetValue<int64_t>(cnode->input(2)->cast<ValueNodePtr>()->value());
+          cnode_and_index_vector.push_back({cnode, idx});
+          index_set.insert(idx);
+          total_usage++;
+        } else {
+          MS_LOG(EXCEPTION) << "tuple_getitem index is not a ValueNode, but: " << user.first->DebugString();
+        }
+      } else {
+        MS_LOG(DEBUG) << "fg_call user is not tuple_getitem, user: " << user.first->DebugString();
+      }
+    }
+    if (index_set.size() != total_usage) {
+      MS_LOG(DEBUG) << "some index usage is duplicated, total_usage: " << total_usage;
+      MS_LOG(DEBUG) << "index_set:";
+      for (auto idx : index_set) {
+        MS_LOG(DEBUG) << " " << idx;
+      }
+    }
+    // sort by index;
+    std::sort(cnode_and_index_vector.begin(), cnode_and_index_vector.end(),
+              [](const auto &tp1, const auto &tp2) { return tp1.index < tp2.index; });
+    return node_users.size();
+  }
+  AnfNodePtr ShrinkFuncGraphOutput(const AnfNodePtr &node,
+                                   const std::vector<internal::TpCNodeAndIndex> &tp_cnodes_and_index) {
+    const auto &manager = fg_->manager();
+    MS_EXCEPTION_IF_NULL(manager);
+    std::vector<int64_t> index_vector;
+    (void)std::transform(tp_cnodes_and_index.begin(), tp_cnodes_and_index.end(), std::back_inserter(index_vector),
+                         [](const auto &cnode_and_index) { return cnode_and_index.index; });
+    auto iter = processed_fgs_.find(std::make_pair(fg_, index_vector));
+    if (iter != processed_fgs_.end()) {
+      MS_LOG(DEBUG) << "fg is already processed, just update caller index, node: " << node->DebugString()
+                    << ", fg_call: " << fg_call_cnode_->DebugString();
+      MS_LOG(DEBUG) << "original fg: " << fg_->ToString() << ", processed_fg: " << iter->second->ToString();
+      processed_nodes_.emplace(fg_call_cnode_);
+      manager->SetEdge(fg_call_cnode_, 0, NewValueNode(iter->second));
+      auto shrunk_abstract = internal::ShrinkAbstract(fg_call_cnode_->abstract(), tp_cnodes_and_index);
+      if (shrunk_abstract != nullptr) {
+        fg_call_cnode_->set_abstract(shrunk_abstract);
+      }
+      auto new_idx = internal::UpdateUserNodeIndex(fg_call_cnode_, idx_, tp_cnodes_and_index);
+      auto new_node =
+        node->func_graph()->NewCNode({NewValueNode(prim::kPrimTupleGetItem), fg_call_cnode_, NewValueNode(new_idx)});
+      new_node->set_abstract(node->abstract());
+      return new_node;
+    }
+    const auto new_fg = internal::ShrinkUnsedOutput(fg_, tp_cnodes_and_index);
+    if (new_fg != nullptr) {
+      MS_LOG(DEBUG) << "fg output is shrunk, original fg: " << fg_->ToString() << ", new fg: " << new_fg->ToString();
+      processed_nodes_.emplace(fg_call_cnode_);
+      processed_fgs_.emplace(std::make_pair(fg_, index_vector), new_fg);
+      manager->SetEdge(fg_call_cnode_, 0, NewValueNode(new_fg));
+      auto shrunk_abstract = internal::ShrinkAbstract(fg_call_cnode_->abstract(), tp_cnodes_and_index);
+      if (shrunk_abstract != nullptr) {
+        fg_call_cnode_->set_abstract(shrunk_abstract);
+      }
+      auto new_idx = internal::UpdateUserNodeIndex(fg_call_cnode_, idx_, tp_cnodes_and_index);
+      auto new_node =
+        node->func_graph()->NewCNode({NewValueNode(prim::kPrimTupleGetItem), fg_call_cnode_, NewValueNode(new_idx)});
+      new_node->set_abstract(node->abstract());
+      return new_node;
+    }
+    MS_LOG(DEBUG) << "Shrink failed. node: " << node->DebugString()
+                  << ", fg_call: " << fg_call_cnode_->DebugString();
+    return nullptr;
+  }
   void Visit(const CNodePtr &cnode) override {
     if (cnode->size() == 0 || !IsValueNode<FuncGraph>(cnode->input(0))) {
       return;
     }
-    fg_cnode_ = cnode;
+    fg_call_cnode_ = cnode;
     auto &inputs = cnode->inputs();
     fg_ = GetValueNode<FuncGraphPtr>(inputs[0]);
     (void)std::copy(inputs.begin() + 1, inputs.end(), std::back_inserter(args_));
@@ -193,15 +481,19 @@ class IncorporateGetitem : public AnfVisitor {
   void Reset() {
     idx_ = -1;
     fg_ = nullptr;
-    fg_cnode_ = nullptr;
+    fg_call_cnode_ = nullptr;
     args_.clear();
   }
  private:
   int64_t idx_{-1};
   FuncGraphPtr fg_{nullptr};
-  AnfNodePtr fg_cnode_{nullptr};
+  CNodePtr fg_call_cnode_{nullptr};
   std::vector<AnfNodePtr> args_{};
+  std::set<AnfNodePtr> processed_nodes_;
+  std::unordered_map<std::pair<FuncGraphPtr, std::vector<int64_t>>, FuncGraphPtr,
+                     internal::FuncGraphIntVectorPairHasher>
+    processed_fgs_;
   internal::GetitemTransform getitem_transform_;
 };
@@ -298,7 +590,7 @@ class IncorporateGetitemSwitch : public AnfVisitor {
   IncorporateGetitemSwitch() : getitem_transform_() {}
   ~IncorporateGetitemSwitch() override = default;
-  AnfNodePtr operator()(const OptimizerPtr &, const AnfNodePtr &node) override {
+  AnfNodePtr operator()(const OptimizerPtr &optimizer, const AnfNodePtr &node) override {
     Reset();
     is_in_get_ = true;
     AnfVisitor::Match(prim::kPrimTupleGetItem, {IsCNode, IsValueNode<Int64Imm>})(node);
@@ -316,33 +608,57 @@ class IncorporateGetitemSwitch : public AnfVisitor {
     if (g2_ == nullptr) {
       return nullptr;
     }
+    if (processed_nodes_.find(switch_) != processed_nodes_.end()) {
+      MS_LOG(DEBUG) << "fg in switch node has been replaced. node: " << node->DebugString()
+                    << ", switch: " << switch_->DebugString();
+      return nullptr;
+    }
+    bool g1_output_is_shrinkable = internal::IsOutputShrinkable(g1_->output());
+    bool g2_output_is_shrinkable = internal::IsOutputShrinkable(g2_->output());
     auto tuple_getitem = node->cast<CNodePtr>();
     MS_EXCEPTION_IF_NULL(tuple_getitem);
-    bool has_env_type = false;
-    if (tuple_getitem->input(1)->abstract() && tuple_getitem->input(1)->abstract()->isa<abstract::AbstractTuple>()) {
-      const auto &abs_tuple = *(tuple_getitem->input(1)->abstract()->cast<abstract::AbstractTuplePtr>());
-      // eliminate (envinstance, value1, value2, ...) built by bprop func_graph()
-      if (abs_tuple.size() >= 1) {
-        // Value maybe kAnyValue, so check the type track;
-        if (abs_tuple[0]->isa<abstract::AbstractScalar>() && abs_tuple[0]->GetTypeTrack()->isa<EnvType>()) {
-          has_env_type = true;
+    const auto &switch_call = tuple_getitem->input(1);
+    MS_EXCEPTION_IF_NULL(switch_call);
+    const auto &switch_call_cnode = switch_call->cast<CNodePtr>();
+    MS_EXCEPTION_IF_NULL(switch_call_cnode);
+    // If exist env_getitem/env_setitem in this funcgraph or
+    // if g1_/g2_ is fprop func_graph and the corresponding bprop funcgraph has any env_getitem or env_setitem;
+    std::vector<internal::TpCNodeAndIndex> tp_cnodes_and_index;
+    auto switch_call_users_counter = MultipleUseOfSwitch(switch_call, fg, &tp_cnodes_and_index);
+    bool multiple_use = (tp_cnodes_and_index.size() > 1);
+    if (g1_output_is_shrinkable && g2_output_is_shrinkable && multiple_use &&
+        (tp_cnodes_and_index.size() == switch_call_users_counter)) {
+      if (!internal::HasMoreJ(optimizer) && !ExistEnvNode(fg) && !ExistEnvNodeInTupleItem(g1_) &&
+          !ExistEnvNodeInTupleItem(g2_) && !internal::ShouldTransform(switch_call, tp_cnodes_and_index)) {
+        MS_LOG(DEBUG) << "No more j, will shrink. Node: " << node->DebugString()
+                      << ", switch: " << switch_->DebugString();
+        const auto g1_output_size = internal::GetOutputSize(g1_->output());
+        const auto g2_output_size = internal::GetOutputSize(g2_->output());
+        if (g1_output_size != g2_output_size) {
+          MS_LOG(EXCEPTION) << "output of g1 and g2 should have the same tuple size, but g1 output: "
+                            << g1_->output()->DebugString() << ", g2 output: " << g2_->output()->DebugString();
         }
-      }
-      // eliminate (value, bprop_func) built by fprop func_graph
-      if (abs_tuple.size() >= 2) {
-        if (abs_tuple[1]->isa<abstract::AbstractFunction>()) {
-          has_env_type = true;
+        if (switch_call_users_counter == g1_output_size) {
+          processed_nodes_.emplace(switch_call);
+          MS_LOG(DEBUG) << "All elements in output are used, no need to transform, node: " << node->DebugString()
+                        << ", switch: " << switch_->DebugString();
+          return nullptr;
+        }
+        auto new_node = ShrinkFuncGraphOutput(node, switch_call_cnode, tp_cnodes_and_index);
+        if (new_node != nullptr) {
+          return new_node;
         }
       }
     }
-    // If exist env_getitem/env_setitem in this funcgraph or
-    // if g1_/g2_ is fprop func_graph and the corresponding bprop funcgraph has any env_getitem or env_setitem;
-    if (MultipleUseOfSwitch(tuple_getitem->input(1), fg) && !ExistEnvNode(fg) && !ExistEnvNodeInTupleItem(g1_) &&
-        !ExistEnvNodeInTupleItem(g2_) && !has_env_type) {
-      return nullptr;
-    }
+    MS_LOG(DEBUG) << "Cannot shrink output, transform_getitem_switch, node: " << node->DebugString()
+                  << ", switch: " << switch_->DebugString();
     auto new_g1 = getitem_transform_(node, g1_, idx_);
     auto new_g2 = getitem_transform_(node, g2_, idx_);
+    MS_LOG(DEBUG) << "Original fg1: " << g1_->ToString() << ", new_fg1: " << new_g1->ToString();
+    MS_LOG(DEBUG) << "Original fg2: " << g2_->ToString() << ", new_fg2: " << new_g2->ToString();
     auto sw_node = fg->NewCNode({NewValueNode(prim::kPrimSwitch), x_, NewValueNode(new_g1), NewValueNode(new_g2)});
     (void)args_.insert(args_.begin(), sw_node);
@@ -350,7 +666,60 @@ class IncorporateGetitemSwitch : public AnfVisitor {
     new_node->set_abstract(node->abstract());
     return new_node;
   }
+  AnfNodePtr ShrinkFuncGraphOutput(const AnfNodePtr &node, const CNodePtr &switch_call_cnode,
+                                   const std::vector<internal::TpCNodeAndIndex> &tp_cnodes_and_index) {
+    const auto &manager = node->func_graph()->manager();
+    MS_EXCEPTION_IF_NULL(manager);
+    auto switch_cnode = switch_->cast<CNodePtr>();
+    MS_EXCEPTION_IF_NULL(switch_cnode);
+    std::vector<int64_t> index_vector;
+    (void)std::transform(tp_cnodes_and_index.begin(), tp_cnodes_and_index.end(), std::back_inserter(index_vector),
+                         [](const auto &cnode_and_index) { return cnode_and_index.index; });
+    const auto &iter1 = processed_fgs_.find(std::make_pair(g1_, index_vector));
+    const auto &iter2 = processed_fgs_.find(std::make_pair(g2_, index_vector));
+    if (iter1 != processed_fgs_.end() && iter2 != processed_fgs_.end()) {
+      MS_LOG(DEBUG) << "fg output had been processed, no need to transform, node: " << node->DebugString()
+                    << ", switch: " << switch_->DebugString();
+      MS_LOG(DEBUG) << "Original fg1: " << g1_->ToString() << ", new_fg1: " << iter1->second->ToString();
+      MS_LOG(DEBUG) << "Original fg2: " << g2_->ToString() << ", new_fg2: " << iter2->second->ToString();
+      processed_nodes_.emplace(switch_);
+      manager->SetEdge(switch_cnode, 2, NewValueNode(iter1->second));
+      manager->SetEdge(switch_cnode, 3, NewValueNode(iter2->second));
+      auto shrunk_abstract = internal::ShrinkAbstract(switch_call_cnode->abstract(), tp_cnodes_and_index);
+      if (shrunk_abstract != nullptr) {
+        switch_call_cnode->set_abstract(shrunk_abstract);
+      }
+      auto new_idx = internal::UpdateUserNodeIndex(switch_call_cnode, idx_, tp_cnodes_and_index);
+      auto new_node = node->func_graph()->NewCNode(
+        {NewValueNode(prim::kPrimTupleGetItem), switch_call_cnode, NewValueNode(new_idx)});
+      new_node->set_abstract(node->abstract());
+      return new_node;
+    }
+    const auto &new_g1 = internal::ShrinkUnsedOutput(g1_, tp_cnodes_and_index);
+    const auto &new_g2 = internal::ShrinkUnsedOutput(g2_, tp_cnodes_and_index);
+    if (new_g1 != nullptr && new_g2 != nullptr) {
+      MS_LOG(DEBUG) << "Shrink output. node: " << node->DebugString() << ", switch: " << switch_->DebugString();
+      MS_LOG(DEBUG) << "Original fg1: " << g1_->ToString() << ", new_fg1: " << new_g1->ToString();
+      MS_LOG(DEBUG) << "Original fg2: " << g2_->ToString() << ", new_fg2: " << new_g2->ToString();
+      processed_nodes_.emplace(switch_);
+      processed_fgs_.emplace(std::make_pair(g1_, index_vector), new_g1);
+      processed_fgs_.emplace(std::make_pair(g2_, index_vector), new_g2);
+      manager->SetEdge(switch_cnode, 2, NewValueNode(new_g1));
+      manager->SetEdge(switch_cnode, 3, NewValueNode(new_g2));
+      auto shrunk_abstract = internal::ShrinkAbstract(switch_call_cnode->abstract(), tp_cnodes_and_index);
+      if (shrunk_abstract != nullptr) {
+        switch_call_cnode->set_abstract(shrunk_abstract);
+      }
+      auto new_idx = internal::UpdateUserNodeIndex(switch_call_cnode, idx_, tp_cnodes_and_index);
+      auto new_node = node->func_graph()->NewCNode(
+        {NewValueNode(prim::kPrimTupleGetItem), switch_call_cnode, NewValueNode(new_idx)});
+      new_node->set_abstract(node->abstract());
+      return new_node;
+    }
+    MS_LOG(DEBUG) << "Shrink failed. node: " << node->DebugString()
+                  << ", switch_call: " << switch_call_cnode->DebugString();
+    return nullptr;
+  }
   void Visit(const AnfNodePtr &node) override {
     if (is_in_switch_ && x_ == nullptr) {
       x_ = node;
@@ -393,22 +762,51 @@ class IncorporateGetitemSwitch : public AnfVisitor {
   }
  private:
-  bool MultipleUseOfSwitch(const AnfNodePtr &switch_call, const FuncGraphPtr &fg) const {
+  size_t MultipleUseOfSwitch(const AnfNodePtr &switch_call, const FuncGraphPtr &fg,
+                             std::vector<internal::TpCNodeAndIndex> *cnodes_and_index) const {
     auto switch_call_cnode = switch_call->cast<CNodePtr>();
     MS_EXCEPTION_IF_NULL(switch_call_cnode);
     auto manager = fg->manager();
     MS_EXCEPTION_IF_NULL(manager);
+    auto &cnode_and_index_vector = *cnodes_and_index;
+    std::set<int64_t> index_set;
+    std::size_t total_usage = 0;
     auto &node_users_map = manager->node_users();
     auto it = node_users_map.find(switch_call);
     if (it == node_users_map.end()) {
-      return false;
+      return 0;
    }
     auto &node_users = it->second;
-    // If switch was used by more than 1 tuple_getitem nodes, this pass shouldn't be execute.s
-    auto tuple_getitem_num = std::count_if(node_users.begin(), node_users.end(), [](std::pair<AnfNodePtr, int> &user) {
-      return IsPrimitiveCNode(user.first, prim::kPrimTupleGetItem);
-    });
-    return tuple_getitem_num > 1;
+    // If switch was used by more than 1 tuple_getitem nodes, this pass shouldn't be executed.
+    for (auto user : node_users) {
+      if (IsPrimitiveCNode(user.first, prim::kPrimTupleGetItem)) {
+        auto cnode = user.first->cast<CNodePtr>();
+        constexpr auto kInputIndex = 2;
+        if (cnode->input(kInputIndex)->isa<ValueNode>()) {
+          const auto &idx_node = cnode->input(kInputIndex)->cast<ValueNodePtr>();
+          MS_EXCEPTION_IF_NULL(idx_node);
+          auto idx = GetValue<int64_t>(idx_node->value());
+          cnode_and_index_vector.push_back({cnode, idx});
+          index_set.insert(idx);
+          total_usage++;
+        } else {
+          MS_LOG(EXCEPTION) << "Tuple_getitem index is not a ValueNode, but: " << user.first->DebugString(2);
+        }
+      } else {
+        MS_LOG(DEBUG) << "switch_call user is not tuple_getitem, user: " << user.first->DebugString(2);
+      }
+    }
+    if (index_set.size() != total_usage) {
+      MS_LOG(DEBUG) << "some index is duplicated, total_usage: " << total_usage;
+      MS_LOG(DEBUG) << "index_set: ";
+      for (auto idx : index_set) {
+        MS_LOG(DEBUG) << " " << idx;
+      }
+    }
+    // sort by index;
+    std::sort(cnode_and_index_vector.begin(), cnode_and_index_vector.end(),
+              [](const auto &tp1, const auto &tp2) { return tp1.index < tp2.index; });
+    return node_users.size();
   }
   static bool inline ExistEnvNode(const FuncGraphPtr &fg) {
@@ -441,6 +839,10 @@ class IncorporateGetitemSwitch : public AnfVisitor {
   FuncGraphPtr g1_{nullptr}, g2_{nullptr};
   bool is_in_get_{false}, is_in_switch_{false};
   std::vector<AnfNodePtr> args_{};
+  std::set<AnfNodePtr> processed_nodes_;
+  std::unordered_map<std::pair<FuncGraphPtr, std::vector<int64_t>>, FuncGraphPtr,
+                     internal::FuncGraphIntVectorPairHasher>
+    processed_fgs_;
   internal::GetitemTransform getitem_transform_;
 };
@@ -64,8 +64,14 @@ Status VirtualOutputInfo::GenerateStrategies(int64_t stage_id) {
   }
   for (auto &shape : inputs_shape_) {
     Shape temp;
-    temp.emplace_back(SizeToLong(total_dev_num));
-    (void)temp.insert(temp.end(), shape.size() - 1, 1);
+    if (!shape.empty()) {
+      if (shape[0] % total_dev_num == 0) {
+        temp.emplace_back(SizeToLong(total_dev_num));
+      } else {
+        temp.emplace_back(1);
+      }
+      (void)temp.insert(temp.end(), shape.size() - 1, 1);
+    }
     strategy.push_back(temp);
   }
   sp = std::make_shared<Strategy>(stage_id, strategy);
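The change above makes the auto-generated dataset strategy fall back to no sharding when the batch dimension is not divisible by the device count, instead of unconditionally sharding it. A hedged standalone sketch of that decision (`DatasetStrategyFor` is an illustrative helper, not a MindSpore API):

```cpp
#include <cstdint>
#include <vector>

// Shard the first (batch) dimension across all devices only when it divides
// evenly; otherwise keep a sharding factor of 1 for every dimension.
std::vector<int64_t> DatasetStrategyFor(const std::vector<int64_t> &shape, int64_t dev_num) {
  std::vector<int64_t> strategy;
  if (shape.empty()) {
    return strategy;  // scalar input: empty strategy
  }
  strategy.push_back(shape[0] % dev_num == 0 ? dev_num : 1);
  strategy.insert(strategy.end(), shape.size() - 1, 1);
  return strategy;
}
```

For example, shape {9, 224, 224, 3} on 8 devices yields {1, 1, 1, 1}, while {8, 224, 224, 3} yields {8, 1, 1, 1}.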
@@ -2038,7 +2038,12 @@ void SetVirtualDatasetStrategy(const CNodePtr &node) {
       if (shape_list[0][i].empty()) {
         MS_LOG(EXCEPTION) << "shape_list[ " << i << " ].size() is zero";
       }
-      Dimensions input_strategy = {dev_num};
+      Dimensions input_strategy;
+      if (!shape_list[0][i].empty() && shape_list[0][i][0] % dev_num == 0) {
+        input_strategy.push_back(dev_num);
+      } else if (!shape_list[0][i].empty()) {
+        input_strategy.push_back(1);
+      }
       for (size_t j = 1; j < shape_list[0][i].size(); j++) {
         input_strategy.push_back(1);
       }
@@ -3222,12 +3227,9 @@ void MarkForwardCNode(const FuncGraphPtr &root) {
   }
 }
-Status ParallelInit() {
-  MS_EXCEPTION_IF_NULL(ParallelContext::GetInstance());
+CommInfo GetCommInfo() {
   int64_t device_num = ParallelContext::GetInstance()->device_num();
   int64_t global_rank = ParallelContext::GetInstance()->global_rank();
-  int32_t split_stage_num = ParallelContext::GetInstance()->pipeline_stage_split_num();
-  std::string parallel_mode = ParallelContext::GetInstance()->parallel_mode();
   auto ms_context = MsContext::GetInstance();
   MS_EXCEPTION_IF_NULL(ms_context);
   std::string backend = ms_context->get_param<std::string>(MS_CTX_DEVICE_TARGET);
@@ -3240,15 +3242,8 @@
     world_group = NCCL_WORLD_GROUP;
     communication_backend = NCCL_BACKEND;
   } else {
-    MS_LOG(ERROR) << "Invalid communication backend: " << backend;
-    return FAILED;
-  }
-  if (split_stage_num <= 0) {
-    MS_LOG(ERROR) << "Invalid stage num " << split_stage_num << ", expected a positive stage number";
-    return FAILED;
+    MS_LOG(EXCEPTION) << "Invalid communication backend: " << backend;
   }
   uint32_t world_rank_size = 0;
   if (!ParallelContext::GetInstance()->device_num_is_set()) {
     if (!CommManager::GetInstance().GetRankSize(world_group, &world_rank_size)) {
@@ -3266,7 +3261,21 @@
     global_rank = UintToInt(rank_id);
     MS_LOG(INFO) << "Get global rank from communication model, the global rank is " << global_rank;
   }
+  CommInfo comm_info{device_num, global_rank, world_group, communication_backend};
+  return comm_info;
+}
+Status ParallelInit() {
+  MS_EXCEPTION_IF_NULL(ParallelContext::GetInstance());
+  int32_t split_stage_num = ParallelContext::GetInstance()->pipeline_stage_split_num();
+  std::string parallel_mode = ParallelContext::GetInstance()->parallel_mode();
+  if (split_stage_num <= 0) {
+    MS_LOG(ERROR) << "Invalid stage num " << split_stage_num << ", expected a positive stage number";
+    return FAILED;
+  }
+  auto comm_info = GetCommInfo();
+  int64_t device_num = comm_info.device_num;
+  int64_t global_rank = comm_info.global_rank;
   if ((device_num <= 0) || (device_num > MAX_DEVICE_NUM)) {
     MS_LOG(ERROR) << "Invalid device num " << device_num;
     return FAILED;
@@ -3293,13 +3302,14 @@
     return FAILED;
   }
-  if (!InitDevice(device_num, global_rank, communication_backend, stages)) {
+  if (!InitDevice(device_num, global_rank, comm_info.communication_backend, stages)) {
     MS_LOG(ERROR) << "Init device failed";
     return FAILED;
   }
   MS_LOG(INFO) << "The parallel context: dev num: " << device_num << ", global rank: " << global_rank
-               << ", backend: " << backend << ", gradients_mean: " << ParallelContext::GetInstance()->gradients_mean()
+               << ", communication_backend: " << comm_info.communication_backend
+               << ", gradients_mean: " << ParallelContext::GetInstance()->gradients_mean()
                << ", gradient_fp32_sync: " << ParallelContext::GetInstance()->gradient_fp32_sync();
   return SUCCESS;
@@ -3714,7 +3724,13 @@ void ReorderForPipelineSplit(const FuncGraphPtr &root, const FuncGraphManagerPtr
 bool IsInsertVirtualOutput(const FuncGraphPtr &root) {
   MS_EXCEPTION_IF_NULL(ParallelContext::GetInstance());
-  return (!root->has_flag(TRAINING) && ParallelContext::GetInstance()->dataset_strategy().empty());
+  auto comm_info = GetCommInfo();
+  int32_t split_stage_num = ParallelContext::GetInstance()->pipeline_stage_split_num();
+  int32_t per_stage_device_num = comm_info.device_num / split_stage_num;
+  int32_t current_stage = comm_info.global_rank / per_stage_device_num;
+  MS_LOG(INFO) << "The current stage is: " << current_stage;
+  return (!root->has_flag(TRAINING) && ParallelContext::GetInstance()->dataset_strategy().empty() &&
+          current_stage == split_stage_num - 1);
 }
 bool StepParallel(const FuncGraphPtr &root, const opt::OptimizerPtr &optimizer) {
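`IsInsertVirtualOutput` now also requires the local rank to sit in the last pipeline stage, since only that stage produces the network output under pipeline parallelism. The stage arithmetic in a hedged standalone form (`CurrentStage` is an illustrative helper, not MindSpore code):

```cpp
#include <cstdint>
#include <iostream>

// With device_num devices split into split_stage_num pipeline stages, each
// stage owns device_num / split_stage_num consecutive ranks.
int32_t CurrentStage(int32_t device_num, int32_t split_stage_num, int32_t global_rank) {
  int32_t per_stage_device_num = device_num / split_stage_num;
  return global_rank / per_stage_device_num;
}

int main() {
  // 8 devices, 2 stages: ranks 0-3 form stage 0, ranks 4-7 form stage 1.
  std::cout << CurrentStage(8, 2, 5) << std::endl;  // prints 1 (the last stage)
}
```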
@@ -47,6 +47,13 @@ struct LossNodeInfo {
   CNodePtr loss_node = nullptr;
 };
+struct CommInfo {
+  int64_t device_num = 1;
+  int64_t global_rank = 0;
+  std::string world_group;
+  std::string communication_backend;
+};
 struct ParameterSliceInfo {
   Shape slice_shape;
   RankList group_ranks;
@@ -178,6 +185,8 @@ void InsertVirtualOutput(const FuncGraphPtr &root, const std::vector<AnfNodePtr>
 std::string MirrorOpName();
+CommInfo GetCommInfo();
 void ReorderForPipelineSplit(const FuncGraphPtr &root, const FuncGraphManagerPtr &manager, int64_t pipeline_stages);
 }  // namespace parallel
 }  // namespace mindspore
@@ -18,6 +18,7 @@
 #include "minddata/dataset/api/python/pybind_register.h"
 #include "minddata/dataset/include/dataset/transforms.h"
+#include "minddata/dataset/kernels/ir/vision/adjust_gamma_ir.h"
 #include "minddata/dataset/kernels/ir/vision/auto_contrast_ir.h"
 #include "minddata/dataset/kernels/ir/vision/bounding_box_augment_ir.h"
 #include "minddata/dataset/kernels/ir/vision/center_crop_ir.h"
@@ -67,6 +68,17 @@
 namespace mindspore {
 namespace dataset {
+PYBIND_REGISTER(
+  AdjustGammaOperation, 1, ([](const py::module *m) {
+    (void)py::class_<vision::AdjustGammaOperation, TensorOperation, std::shared_ptr<vision::AdjustGammaOperation>>(
+      *m, "AdjustGammaOperation")
+      .def(py::init([](float gamma, float gain) {
+        auto adjust_gamma = std::make_shared<vision::AdjustGammaOperation>(gamma, gain);
+        THROW_IF_ERROR(adjust_gamma->ValidateParams());
+        return adjust_gamma;
+      }));
+  }));
 PYBIND_REGISTER(
   AutoContrastOperation, 1, ([](const py::module *m) {
     (void)py::class_<vision::AutoContrastOperation, TensorOperation, std::shared_ptr<vision::AutoContrastOperation>>(
@@ -21,6 +21,7 @@
 #endif
 #include "minddata/dataset/include/dataset/transforms.h"
+#include "minddata/dataset/kernels/ir/vision/adjust_gamma_ir.h"
 #include "minddata/dataset/kernels/ir/vision/affine_ir.h"
 #include "minddata/dataset/kernels/ir/vision/auto_contrast_ir.h"
 #include "minddata/dataset/kernels/ir/vision/bounding_box_augment_ir.h"
@@ -118,6 +119,19 @@ std::shared_ptr<TensorOperation> Affine::Parse() {
 }
 #ifndef ENABLE_ANDROID
+// AdjustGamma Transform Operation.
+struct AdjustGamma::Data {
+  Data(float gamma, float gain) : gamma_(gamma), gain_(gain) {}
+  float gamma_;
+  float gain_;
+};
+AdjustGamma::AdjustGamma(float gamma, float gain) : data_(std::make_shared<Data>(gamma, gain)) {}
+std::shared_ptr<TensorOperation> AdjustGamma::Parse() {
+  return std::make_shared<AdjustGammaOperation>(data_->gamma_, data_->gain_);
+}
 // AutoContrast Transform Operation.
 struct AutoContrast::Data {
   Data(float cutoff, const std::vector<uint32_t> &ignore) : cutoff_(cutoff), ignore_(ignore) {}
@@ -57,7 +57,31 @@ class AutoContrast final : public TensorTransform {
   std::shared_ptr<Data> data_;
 };
-/// \brief Apply a given image transform on a random selection of bounding box regions of a given image.
+/// \brief AdjustGamma TensorTransform.
+/// \note Apply gamma correction on input image.
+class AdjustGamma final : public TensorTransform {
+ public:
+  /// \brief Constructor.
+  /// \param[in] gamma Non-negative real number, which makes the output image pixel value
+  ///     exponential in relation to the input image pixel value.
+  /// \param[in] gain The constant multiplier.
+  explicit AdjustGamma(float gamma, float gain = 1);
+  /// \brief Destructor.
+  ~AdjustGamma() = default;
+ protected:
+  /// \brief Function to convert TensorTransform object into a TensorOperation object.
+  /// \return Shared pointer to TensorOperation object.
+  std::shared_ptr<TensorOperation> Parse() override;
+ private:
+  struct Data;
+  std::shared_ptr<Data> data_;
+};
+/// \brief BoundingBoxAugment TensorTransform.
+/// \note Apply a given image transform on a random selection of bounding box regions of a given image.
 class BoundingBoxAugment final : public TensorTransform {
  public:
   /// \brief Constructor.
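A hedged usage sketch for the new C++ `AdjustGamma` transform; the header path and the pipeline wiring are assumed from the existing C++ dataset API and may differ slightly in practice. Gamma greater than 1 darkens mid-tones, while gamma less than 1 brightens them.

```cpp
#include "minddata/dataset/include/dataset/vision.h"

// Construct the transform; it can then be handed to a dataset Map() or an
// eager Execute() like any other TensorTransform.
void MakeGammaTransform() {
  mindspore::dataset::vision::AdjustGamma adjust_gamma(/*gamma=*/2.2f, /*gain=*/1.0f);
  (void)adjust_gamma;  // placeholder: pass to Map()/Execute() in real code
}
```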
@@ -6,6 +6,7 @@ if(ENABLE_ACL)
   add_subdirectory(dvpp)
 endif()
 add_library(kernels-image OBJECT
+    adjust_gamma_op.cc
     affine_op.cc
     auto_contrast_op.cc
     bounding_box.cc
@@ -0,0 +1,43 @@
+/**
+ * Copyright 2021 Huawei Technologies Co., Ltd
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+#include "minddata/dataset/kernels/image/adjust_gamma_op.h"
+#include <memory>
+#include "minddata/dataset/kernels/data/data_utils.h"
+#include "minddata/dataset/kernels/image/image_utils.h"
+namespace mindspore {
+namespace dataset {
+const float AdjustGammaOp::kGain = 1.0;
+Status AdjustGammaOp::Compute(const std::shared_ptr<Tensor> &input, std::shared_ptr<Tensor> *output) {
+  IO_CHECK(input, output);
+  // typecast
+  CHECK_FAIL_RETURN_UNEXPECTED(input->type() != DataType::DE_STRING,
+                               "AdjustGamma: input tensor type should be [int, float, double], but got string.");
+  if (input->type().IsFloat()) {
+    std::shared_ptr<Tensor> input_tensor;
+    RETURN_IF_NOT_OK(TypeCast(input, &input_tensor, DataType(DataType::DE_FLOAT32)));
+    return AdjustGamma(input_tensor, output, gamma_, gain_);
+  } else {
+    return AdjustGamma(input, output, gamma_, gain_);
+  }
+}
+}  // namespace dataset
+}  // namespace mindspore
@@ -0,0 +1,55 @@
+/**
+ * Copyright 2021 Huawei Technologies Co., Ltd
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+#ifndef MINDSPORE_CCSRC_MINDDATA_DATASET_KERNELS_IMAGE_ADJUST_GAMMA_OP_H_
+#define MINDSPORE_CCSRC_MINDDATA_DATASET_KERNELS_IMAGE_ADJUST_GAMMA_OP_H_
+#include <memory>
+#include <string>
+#include "minddata/dataset/core/tensor.h"
+#include "minddata/dataset/core/cv_tensor.h"
+#include "minddata/dataset/kernels/tensor_op.h"
+#include "minddata/dataset/util/status.h"
+namespace mindspore {
+namespace dataset {
+class AdjustGammaOp : public TensorOp {
+ public:
+  /// Default gain to be used
+  static const float kGain;
+  AdjustGammaOp(const float &gamma, const float &gain) : gamma_(gamma), gain_(gain) {}
+  ~AdjustGammaOp() override = default;
+  /// Provide stream operator for displaying it
+  friend std::ostream &operator<<(std::ostream &out, const AdjustGammaOp &so) {
+    so.Print(out);
+    return out;
+  }
+  Status Compute(const std::shared_ptr<Tensor> &input, std::shared_ptr<Tensor> *output) override;
+  std::string Name() const override { return kAdjustGammaOp; }
+ private:
+  float gamma_;
+  float gain_;
+};
+}  // namespace dataset
+}  // namespace mindspore
+#endif  // MINDSPORE_CCSRC_MINDDATA_DATASET_KERNELS_IMAGE_ADJUST_GAMMA_OP_H_
| @@ -872,6 +872,64 @@ Status AdjustContrast(const std::shared_ptr<Tensor> &input, std::shared_ptr<Tens | |||||
| return Status::OK(); | return Status::OK(); | ||||
| } | } | ||||
| Status AdjustGamma(const std::shared_ptr<Tensor> &input, std::shared_ptr<Tensor> *output, const float &gamma, | |||||
| const float &gain) { | |||||
| try { | |||||
| int num_channels = 1; | |||||
| if (input->Rank() < 2) { | |||||
| RETURN_STATUS_UNEXPECTED("AdjustGamma: image shape is not <...,H,W,C> or <H,W>."); | |||||
| } | |||||
| if (input->Rank() > 2) { | |||||
| num_channels = input->shape()[-1]; | |||||
| } | |||||
| if (num_channels != 1 && num_channels != 3) { | |||||
| RETURN_STATUS_UNEXPECTED("AdjustGamma: channel of input image should be 1 or 3."); | |||||
| } | |||||
| if (input->type().IsFloat()) { | |||||
| for (auto itr = input->begin<float>(); itr != input->end<float>(); itr++) { | |||||
| *itr = pow((*itr) * gain, gamma); | |||||
| *itr = std::min(std::max((*itr), 0.0f), 1.0f); | |||||
| } | |||||
| *output = input; | |||||
| } else { | |||||
| std::shared_ptr<CVTensor> input_cv = CVTensor::AsCVTensor(input); | |||||
| if (!input_cv->mat().data) { | |||||
| RETURN_STATUS_UNEXPECTED("AdjustGamma: load image failed."); | |||||
| } | |||||
| cv::Mat input_img = input_cv->mat(); | |||||
| std::shared_ptr<CVTensor> output_cv; | |||||
| RETURN_IF_NOT_OK(CVTensor::CreateEmpty(input_cv->shape(), input_cv->type(), &output_cv)); | |||||
| uchar LUT[256] = {}; | |||||
| for (int i = 0; i < 256; i++) { | |||||
| float f = i / 255.0; | |||||
| f = pow(f, gamma); | |||||
| LUT[i] = static_cast<uchar>(floor(std::min(f * (255.0 + 1 - 1e-3) * gain, 255.0))); | |||||
| } | |||||
| if (input_img.channels() == 1) { | |||||
| cv::MatIterator_<uchar> it = input_img.begin<uchar>(); | |||||
| cv::MatIterator_<uchar> it_end = input_img.end<uchar>(); | |||||
| for (; it != it_end; ++it) { | |||||
| *it = LUT[(*it)]; | |||||
| } | |||||
| } else { | |||||
| cv::MatIterator_<cv::Vec3b> it = input_img.begin<cv::Vec3b>(); | |||||
| cv::MatIterator_<cv::Vec3b> it_end = input_img.end<cv::Vec3b>(); | |||||
| for (; it != it_end; ++it) { | |||||
| (*it)[0] = LUT[(*it)[0]]; | |||||
| (*it)[1] = LUT[(*it)[1]]; | |||||
| (*it)[2] = LUT[(*it)[2]]; | |||||
| } | |||||
| } | |||||
| output_cv->mat() = input_img * 1; | |||||
| *output = std::static_pointer_cast<Tensor>(output_cv); | |||||
| } | |||||
| } catch (const cv::Exception &e) { | |||||
| RETURN_STATUS_UNEXPECTED("AdjustGamma: " + std::string(e.what())); | |||||
| } | |||||
| return Status::OK(); | |||||
| } | |||||
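For reference, a self-contained sketch of the 8-bit LUT branch above, using the same arithmetic (hypothetical standalone program; note that this branch applies `gain` to the output of the power curve, whereas the float branch applies it to the input):

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cstdio>

int main() {
  const float gamma = 2.2f;  // example values, not defaults
  const float gain = 1.0f;
  uint8_t lut[256] = {};
  for (int i = 0; i < 256; i++) {
    // Normalize to [0, 1], apply the power curve, then rescale and clamp to [0, 255].
    float f = std::pow(i / 255.0f, gamma);
    lut[i] = static_cast<uint8_t>(std::floor(std::min(f * (255.0f + 1 - 1e-3f) * gain, 255.0f)));
  }
  std::printf("lut[128] = %u\n", static_cast<unsigned>(lut[128]));  // mid-gray after correction
  return 0;
}
```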
| Status AutoContrast(const std::shared_ptr<Tensor> &input, std::shared_ptr<Tensor> *output, const float &cutoff, | Status AutoContrast(const std::shared_ptr<Tensor> &input, std::shared_ptr<Tensor> *output, const float &cutoff, | ||||
| const std::vector<uint32_t> &ignore) { | const std::vector<uint32_t> &ignore) { | ||||
| try { | try { | ||||
| @@ -234,6 +234,16 @@ Status AdjustContrast(const std::shared_ptr<Tensor> &input, std::shared_ptr<Tens | |||||
| Status AutoContrast(const std::shared_ptr<Tensor> &input, std::shared_ptr<Tensor> *output, const float &cutoff, | Status AutoContrast(const std::shared_ptr<Tensor> &input, std::shared_ptr<Tensor> *output, const float &cutoff, | ||||
| const std::vector<uint32_t> &ignore); | const std::vector<uint32_t> &ignore); | ||||
| /// \brief Returns image with gamma correction. | |||||
| /// \param[in] input: Tensor of shape <H,W,3>/<H,W,1>/<H,W> in RGB/Grayscale and any OpenCV compatible type, | |||||
| /// see CVTensor. | |||||
| /// \param[in] gamma: Non-negative real number, the exponent in the correction equation. A gamma larger than 1 makes | |||||
| /// the shadows darker, while a gamma smaller than 1 makes dark regions lighter. | |||||
| /// \param[in] gain: The constant multiplier. | |||||
| /// \param[out] output: Adjusted image of same shape and type. | |||||
| Status AdjustGamma(const std::shared_ptr<Tensor> &input, std::shared_ptr<Tensor> *output, const float &gamma, | |||||
| const float &gain); | |||||
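For float inputs, the declaration above amounts to the usual gamma-correction transfer function; restated from the implementation in image_utils.cc (output clamped to [0, 1]):

```latex
I_{out} = \mathrm{clip}\bigl((gain \cdot I_{in})^{\gamma},\, 0,\, 1\bigr), \qquad I_{in} \in [0, 1]
```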
| /// \brief Returns image with adjusted saturation. | /// \brief Returns image with adjusted saturation. | ||||
| /// \param input: Tensor of shape <H,W,3> in RGB order and any OpenCv compatible type, see CVTensor. | /// \param input: Tensor of shape <H,W,3> in RGB order and any OpenCv compatible type, see CVTensor. | ||||
| /// \param alpha: Alpha value to adjust saturation by. Should be a positive number. | /// \param alpha: Alpha value to adjust saturation by. Should be a positive number. | ||||
| @@ -38,6 +38,11 @@ Status ValidateFloatScalarPositive(const std::string &op_name, const std::string | |||||
| return Status::OK(); | return Status::OK(); | ||||
| } | } | ||||
| Status ValidateFloatScalarNonNegative(const std::string &op_name, const std::string &scalar_name, float scalar) { | |||||
| RETURN_IF_NOT_OK(ValidateScalar(op_name, scalar_name, scalar, {0}, false)); | |||||
| return Status::OK(); | |||||
| } | |||||
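A quick illustration of the new helper's contract, assuming (as the non-strict flag in the call above suggests) that the boundary value 0 is accepted:

```cpp
// Hypothetical calls, not part of the diff.
Status ok = ValidateFloatScalarNonNegative("AdjustGamma", "gamma", 0.0f);    // Status::OK()
Status err = ValidateFloatScalarNonNegative("AdjustGamma", "gamma", -0.5f);  // error: gamma must be >= 0
```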
| Status ValidateVectorFillvalue(const std::string &op_name, const std::vector<uint8_t> &fill_value) { | Status ValidateVectorFillvalue(const std::string &op_name, const std::vector<uint8_t> &fill_value) { | ||||
| if (fill_value.empty() || (fill_value.size() != 1 && fill_value.size() != 3)) { | if (fill_value.empty() || (fill_value.size() != 1 && fill_value.size() != 3)) { | ||||
| std::string err_msg = | std::string err_msg = | ||||
| @@ -36,6 +36,9 @@ Status ValidateIntScalarPositive(const std::string &op_name, const std::string & | |||||
| // Helper function to validate a positive float scalar | // Helper function to validate a positive float scalar | ||||
| Status ValidateFloatScalarPositive(const std::string &op_name, const std::string &scalar_name, float scalar); | Status ValidateFloatScalarPositive(const std::string &op_name, const std::string &scalar_name, float scalar); | ||||
| // Helper function to validate a non-negative float scalar | |||||
| Status ValidateFloatScalarNonNegative(const std::string &op_name, const std::string &scalar_name, float scalar); | |||||
| // Helper function to validate scalar | // Helper function to validate scalar | ||||
| template <typename T> | template <typename T> | ||||
| Status ValidateScalar(const std::string &op_name, const std::string &scalar_name, const T scalar, | Status ValidateScalar(const std::string &op_name, const std::string &scalar_name, const T scalar, | ||||
| @@ -2,6 +2,7 @@ file(GLOB_RECURSE _CURRENT_SRC_FILES RELATIVE ${CMAKE_CURRENT_SOURCE_DIR} "*.cc" | |||||
| set_property(SOURCE ${_CURRENT_SRC_FILES} PROPERTY COMPILE_DEFINITIONS SUBMODULE_ID=mindspore::SubModuleId::SM_MD) | set_property(SOURCE ${_CURRENT_SRC_FILES} PROPERTY COMPILE_DEFINITIONS SUBMODULE_ID=mindspore::SubModuleId::SM_MD) | ||||
| set(DATASET_KERNELS_IR_VISION_SRC_FILES | set(DATASET_KERNELS_IR_VISION_SRC_FILES | ||||
| adjust_gamma_ir.cc | |||||
| affine_ir.cc | affine_ir.cc | ||||
| auto_contrast_ir.cc | auto_contrast_ir.cc | ||||
| bounding_box_augment_ir.cc | bounding_box_augment_ir.cc | ||||
| @@ -0,0 +1,58 @@ | |||||
| /** | |||||
| * Copyright 2021 Huawei Technologies Co., Ltd | |||||
| * | |||||
| * Licensed under the Apache License, Version 2.0 (the "License"); | |||||
| * you may not use this file except in compliance with the License. | |||||
| * You may obtain a copy of the License at | |||||
| * | |||||
| * http://www.apache.org/licenses/LICENSE-2.0 | |||||
| * | |||||
| * Unless required by applicable law or agreed to in writing, software | |||||
| * distributed under the License is distributed on an "AS IS" BASIS, | |||||
| * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | |||||
| * See the License for the specific language governing permissions and | |||||
| * limitations under the License. | |||||
| */ | |||||
| #include <algorithm> | |||||
| #include "minddata/dataset/kernels/ir/vision/adjust_gamma_ir.h" | |||||
| #ifndef ENABLE_ANDROID | |||||
| #include "minddata/dataset/kernels/image/adjust_gamma_op.h" | |||||
| #endif | |||||
| #include "minddata/dataset/kernels/ir/validators.h" | |||||
| namespace mindspore { | |||||
| namespace dataset { | |||||
| namespace vision { | |||||
| #ifndef ENABLE_ANDROID | |||||
| // AdjustGammaOperation | |||||
| AdjustGammaOperation::AdjustGammaOperation(float gamma, float gain) : gamma_(gamma), gain_(gain) {} | |||||
| Status AdjustGammaOperation::ValidateParams() { | |||||
| // gamma | |||||
| RETURN_IF_NOT_OK(ValidateFloatScalarNonNegative("AdjustGamma", "gamma", gamma_)); | |||||
| return Status::OK(); | |||||
| } | |||||
| std::shared_ptr<TensorOp> AdjustGammaOperation::Build() { | |||||
| std::shared_ptr<AdjustGammaOp> tensor_op = std::make_shared<AdjustGammaOp>(gamma_, gain_); | |||||
| return tensor_op; | |||||
| } | |||||
| Status AdjustGammaOperation::to_json(nlohmann::json *out_json) { | |||||
| nlohmann::json args; | |||||
| args["gamma"] = gamma_; | |||||
| args["gain"] = gain_; | |||||
| *out_json = args; | |||||
| return Status::OK(); | |||||
| } | |||||
| #endif | |||||
| } // namespace vision | |||||
| } // namespace dataset | |||||
| } // namespace mindspore | |||||
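A hedged end-to-end sketch of how the IR node above is meant to be used inside the dataset codebase (illustrative values):

```cpp
auto op = std::make_shared<vision::AdjustGammaOperation>(/*gamma=*/2.2f, /*gain=*/1.0f);
if (op->ValidateParams().IsOk()) {
  std::shared_ptr<TensorOp> kernel = op->Build();  // an AdjustGammaOp
  nlohmann::json args;
  (void)op->to_json(&args);                        // {"gamma": 2.2, "gain": 1.0}
}
```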
| @@ -0,0 +1,60 @@ | |||||
| /** | |||||
| * Copyright 2021 Huawei Technologies Co., Ltd | |||||
| * | |||||
| * Licensed under the Apache License, Version 2.0 (the "License"); | |||||
| * you may not use this file except in compliance with the License. | |||||
| * You may obtain a copy of the License at | |||||
| * | |||||
| * http://www.apache.org/licenses/LICENSE-2.0 | |||||
| * | |||||
| * Unless required by applicable law or agreed to in writing, software | |||||
| * distributed under the License is distributed on an "AS IS" BASIS, | |||||
| * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | |||||
| * See the License for the specific language governing permissions and | |||||
| * limitations under the License. | |||||
| */ | |||||
| #ifndef MINDSPORE_CCSRC_MINDDATA_DATASET_KERNELS_IR_VISION_ADJUST_GAMMA_IR_H_ | |||||
| #define MINDSPORE_CCSRC_MINDDATA_DATASET_KERNELS_IR_VISION_ADJUST_GAMMA_IR_H_ | |||||
| #include <map> | |||||
| #include <memory> | |||||
| #include <string> | |||||
| #include <utility> | |||||
| #include <vector> | |||||
| #include "include/api/status.h" | |||||
| #include "minddata/dataset/include/dataset/constants.h" | |||||
| #include "minddata/dataset/include/dataset/transforms.h" | |||||
| #include "minddata/dataset/kernels/ir/tensor_operation.h" | |||||
| namespace mindspore { | |||||
| namespace dataset { | |||||
| namespace vision { | |||||
| constexpr char kAdjustGammaOperation[] = "AdjustGamma"; | |||||
| class AdjustGammaOperation : public TensorOperation { | |||||
| public: | |||||
| explicit AdjustGammaOperation(float gamma, float gain); | |||||
| ~AdjustGammaOperation() = default; | |||||
| std::shared_ptr<TensorOp> Build() override; | |||||
| Status ValidateParams() override; | |||||
| std::string Name() const override { return kAdjustGammaOperation; } | |||||
| Status to_json(nlohmann::json *out_json) override; | |||||
| private: | |||||
| float gamma_; | |||||
| float gain_; | |||||
| }; | |||||
| } // namespace vision | |||||
| } // namespace dataset | |||||
| } // namespace mindspore | |||||
| #endif // MINDSPORE_CCSRC_MINDDATA_DATASET_KERNELS_IR_VISION_ADJUST_GAMMA_IR_H_ | |||||
| @@ -53,6 +53,7 @@ namespace dataset { | |||||
| constexpr char kTensorOp[] = "TensorOp"; | constexpr char kTensorOp[] = "TensorOp"; | ||||
| // image | // image | ||||
| constexpr char kAdjustGammaOp[] = "AdjustGammaOp"; | |||||
| constexpr char kAffineOp[] = "AffineOp"; | constexpr char kAffineOp[] = "AffineOp"; | ||||
| constexpr char kAutoContrastOp[] = "AutoContrastOp"; | constexpr char kAutoContrastOp[] = "AutoContrastOp"; | ||||
| constexpr char kBoundingBoxAugmentOp[] = "BoundingBoxAugmentOp"; | constexpr char kBoundingBoxAugmentOp[] = "BoundingBoxAugmentOp"; | ||||
| @@ -593,9 +593,18 @@ bool TaskEmitAction(const ResourcePtr &res) { | |||||
| context_ptr->set_param<bool>(MS_CTX_ENABLE_LOOP_SINK, false); | context_ptr->set_param<bool>(MS_CTX_ENABLE_LOOP_SINK, false); | ||||
| } else if (context_ptr->get_param<int>(MS_CTX_EXECUTION_MODE) != kPynativeMode) { | } else if (context_ptr->get_param<int>(MS_CTX_EXECUTION_MODE) != kPynativeMode) { | ||||
| std::string device_target = context_ptr->get_param<std::string>(MS_CTX_DEVICE_TARGET); | std::string device_target = context_ptr->get_param<std::string>(MS_CTX_DEVICE_TARGET); | ||||
| if (device_target == kAscendDevice && backend != kMsVm) { | |||||
| auto manager = func_graph->manager(); | |||||
| auto graphs = manager->func_graphs(); | |||||
| bool exist_while = | |||||
| std::any_of(graphs.cbegin(), graphs.cend(), [](const FuncGraphPtr &fg) { return fg->recursive(); }); | |||||
| if (device_target == kAscendDevice && backend != kMsVm && !exist_while) { | |||||
| MS_LOG(INFO) << "Run graph mode with multigraph sink."; | |||||
| bc_ptr->set_is_multi_graph_sink(true); | bc_ptr->set_is_multi_graph_sink(true); | ||||
| context_ptr->set_param<bool>(MS_CTX_IS_MULTI_GRAPH_SINK, true); | context_ptr->set_param<bool>(MS_CTX_IS_MULTI_GRAPH_SINK, true); | ||||
| } else { | |||||
| MS_LOG(INFO) << "Run graph mode with vm."; | |||||
| bc_ptr->set_is_multi_graph_sink(false); | |||||
| context_ptr->set_param<bool>(MS_CTX_IS_MULTI_GRAPH_SINK, false); | |||||
| } | } | ||||
| } | } | ||||
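Net effect of the new branch: multi-graph sink is now also disabled when any sub-graph is recursive, i.e. when the model contains a while loop. Condensed, the decision reads (names from the diff, context elided):

```cpp
bool exist_while = std::any_of(graphs.cbegin(), graphs.cend(),
                               [](const FuncGraphPtr &fg) { return fg->recursive(); });
bool multi_graph_sink = device_target == kAscendDevice && backend != kMsVm && !exist_while;
```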
| @@ -290,6 +290,7 @@ OptPassGroupMap GetOptPassesA(const opt::irpass::OptimizeIRPassLib &irpass) { | |||||
| irpass.merge_addn_, | irpass.merge_addn_, | ||||
| irpass.float_tuple_getitem_switch_, | irpass.float_tuple_getitem_switch_, | ||||
| irpass.float_env_getitem_switch_, | irpass.float_env_getitem_switch_, | ||||
| irpass.inline_, | |||||
| irpass.incorporate_getitem_set_, | irpass.incorporate_getitem_set_, | ||||
| irpass.incorporate_call_, | irpass.incorporate_call_, | ||||
| irpass.incorporate_call_switch_, | irpass.incorporate_call_switch_, | ||||
| @@ -142,20 +142,21 @@ std::string GetCompileExceptionInfo() { | |||||
| return oss.str(); | return oss.str(); | ||||
| } | } | ||||
| void SetGpuLoopSink(const ResourcePtr &resource) { | |||||
| void SetLoopCount(const ResourcePtr &resource) { | |||||
| MS_EXCEPTION_IF_NULL(resource); | MS_EXCEPTION_IF_NULL(resource); | ||||
| auto func_graph = resource->func_graph(); | auto func_graph = resource->func_graph(); | ||||
| if (func_graph != nullptr && func_graph->manager() != nullptr) { | if (func_graph != nullptr && func_graph->manager() != nullptr) { | ||||
| auto manager = func_graph->manager(); | auto manager = func_graph->manager(); | ||||
| size_t graph_nums = manager->func_graphs().size(); | size_t graph_nums = manager->func_graphs().size(); | ||||
| int64_t sinksize = ConfigManager::GetInstance().iter_num(); | |||||
| if (graph_nums == 1 || MsContext::GetInstance()->get_param<bool>(MS_CTX_ENABLE_MINDRT)) { | |||||
| resource->set_gpu_loopsink(true, sinksize); | |||||
| } else { | |||||
| resource->set_gpu_loopsink(false, sinksize); | |||||
| int64_t loop_size = ConfigManager::GetInstance().iter_num(); | |||||
| const auto context_ptr = MsContext::GetInstance(); | |||||
| if (context_ptr->get_param<std::string>(MS_CTX_DEVICE_TARGET) == kAscendDevice) { | |||||
| resource->set_vm_loop(!context_ptr->get_param<bool>(MS_CTX_IS_MULTI_GRAPH_SINK), loop_size); | |||||
| } else if (context_ptr->get_param<std::string>(MS_CTX_DEVICE_TARGET) == kGPUDevice) { | |||||
| bool run_with_mind_rt = graph_nums == 1 || context_ptr->get_param<bool>(MS_CTX_ENABLE_MINDRT); | |||||
| resource->set_vm_loop(!run_with_mind_rt, loop_size); | |||||
| } | } | ||||
| MS_LOG(INFO) << "Change gpu_loopsink_flag_ to " << resource->gpu_loopsink_flag() << ", set loopsink size to " | |||||
| << sinksize; | |||||
| MS_LOG(INFO) << "Change vm_loop_flag to " << resource->vm_loop_flag() << ", set loop_size to " << loop_size; | |||||
| } | } | ||||
| } | } | ||||
| @@ -206,7 +207,8 @@ void CacheFuncGraph(const ResourcePtr &resource) { | |||||
| ChangeFileMode(realpath.value(), S_IRWXU); | ChangeFileMode(realpath.value(), S_IRWXU); | ||||
| std::ofstream fout(realpath.value()); | std::ofstream fout(realpath.value()); | ||||
| if (!fout.is_open()) { | if (!fout.is_open()) { | ||||
| MS_LOG(EXCEPTION) << "Open cache file '" << realpath.value() << "' failed!"; | |||||
| MS_LOG(EXCEPTION) << "Open cache file '" << realpath.value() << "' failed!" | |||||
| << " Errno:" << errno << " ErrInfo:" << strerror(errno); | |||||
| } | } | ||||
| FuncGraphPtr fg = resource->func_graph(); | FuncGraphPtr fg = resource->func_graph(); | ||||
| mind_ir::ModelProto fg_model = GetBinaryProto(fg, true); | mind_ir::ModelProto fg_model = GetBinaryProto(fg, true); | ||||
| @@ -825,7 +827,7 @@ void Pipeline::Run(const std::string &phase_s) { | |||||
| MS_LOG(DEBUG) << "Action " << action.first << " end."; | MS_LOG(DEBUG) << "Action " << action.first << " end."; | ||||
| }; | }; | ||||
| if (action.first == "task_emit") { | if (action.first == "task_emit") { | ||||
| SetGpuLoopSink(resource_); | |||||
| SetLoopCount(resource_); | |||||
| } else if (action.first == "validate") { | } else if (action.first == "validate") { | ||||
| CacheValidateFuncGraph(phase_s, resource_); | CacheValidateFuncGraph(phase_s, resource_); | ||||
| } | } | ||||
| @@ -1001,13 +1003,17 @@ py::object ExecutorPy::Run(const py::tuple &args, const py::object &phase) { | |||||
| MS_LOG(EXCEPTION) << "Can't find run graph func for " << phase_s; | MS_LOG(EXCEPTION) << "Can't find run graph func for " << phase_s; | ||||
| } | } | ||||
| // Set loopsink size for each phase. | // Set loopsink size for each phase. | ||||
| bool is_loopsink = info_[phase_s]->resource->gpu_loopsink_flag(); | |||||
| int64_t sinksize = info_[phase_s]->resource->gpu_loopsink_size(); | |||||
| ConfigManager::GetInstance().set_gpu_loopsink_size(is_loopsink ? sinksize : 1); | |||||
| // If target is not gpu or is loopsink, keep vmloop 1. | |||||
| bool g = (MsContext::GetInstance()->get_param<std::string>(MS_CTX_DEVICE_TARGET) == kGPUDevice); | |||||
| int64_t vm_loop = (!g || is_loopsink) ? 1 : sinksize; | |||||
| MS_LOG(INFO) << "VM loop size " << vm_loop << ", loopsink size " << (is_loopsink ? sinksize : 1); | |||||
| bool vm_loop_flag = info_[phase_s]->resource->vm_loop_flag(); | |||||
| int64_t loop_size = info_[phase_s]->resource->loop_size(); | |||||
| int64_t vm_loop = 1; | |||||
| if (vm_loop_flag) { | |||||
| vm_loop = loop_size; | |||||
| } else { | |||||
| // If the graph num is 1 (loop sink enabled), set the loop size in config; there will then be a loop embracing | |||||
| // 'Execute(graph)' in GPUSession. | |||||
| ConfigManager::GetInstance().set_gpu_loopsink_size(loop_size); | |||||
| } | |||||
| MS_LOG(INFO) << "VM loop size " << vm_loop << ", loopsink size " << vm_loop; | |||||
| py::object ret; | py::object ret; | ||||
| MS_LOG(DEBUG) << "Eval run" << backend; | MS_LOG(DEBUG) << "Eval run" << backend; | ||||
| for (int64_t i = 0; i < vm_loop; i++) { | for (int64_t i = 0; i < vm_loop; i++) { | ||||
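A hedged restatement of the dispatch above as a tiny helper (hypothetical, not in the diff): the VM drives the loop itself only when `vm_loop_flag` is set; otherwise the device session loops internally and the VM runs the graph once.

```cpp
int64_t VmIterations(bool vm_loop_flag, int64_t loop_size) {
  // When the flag is false, loop_size goes to ConfigManager instead,
  // and GPUSession wraps Execute(graph) in its own loop.
  return vm_loop_flag ? loop_size : 1;
}
```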
| @@ -1157,9 +1163,6 @@ bool InitExecDatasetVm(const std::string &queue_name, int64_t size, int64_t batc | |||||
| // Convert CNodeList to LinConvertResult. | // Convert CNodeList to LinConvertResult. | ||||
| auto segment = std::make_shared<GraphSegment>(std::vector<AnfNodePtr>{app_init}, false); | auto segment = std::make_shared<GraphSegment>(std::vector<AnfNodePtr>{app_init}, false); | ||||
| auto runner = convert_fn(segment, ""); | auto runner = convert_fn(segment, ""); | ||||
| if (MsContext::GetInstance()->get_param<int>(MS_CTX_EXECUTION_MODE) != kPynativeMode) { | |||||
| backend->Link(runner.graph_id); | |||||
| } | |||||
| ConfigManager::GetInstance().set_iter_num(size); | ConfigManager::GetInstance().set_iter_num(size); | ||||
| // PS cache does not support loop sink. | // PS cache does not support loop sink. | ||||
| #if ((defined ENABLE_CPU) && (!defined _WIN32)) | #if ((defined ENABLE_CPU) && (!defined _WIN32)) | ||||
| @@ -75,12 +75,12 @@ class Resource : public ResourceBase { | |||||
| const abstract::AbstractBasePtrList &args_spec() const { return args_spec_; } | const abstract::AbstractBasePtrList &args_spec() const { return args_spec_; } | ||||
| void set_args_spec(const abstract::AbstractBasePtrList &args_spec) { args_spec_ = args_spec; } | void set_args_spec(const abstract::AbstractBasePtrList &args_spec) { args_spec_ = args_spec; } | ||||
| void set_gpu_loopsink(const bool &flag, const int64_t size) { | |||||
| gpu_loopsink_flag_ = flag; | |||||
| gpu_loopsink_size_ = size; | |||||
| void set_vm_loop(const bool &flag, const int64_t size) { | |||||
| vm_loop_flag_ = flag; | |||||
| loop_size_ = size; | |||||
| } | } | ||||
| bool gpu_loopsink_flag() { return gpu_loopsink_flag_; } | |||||
| int64_t gpu_loopsink_size() { return gpu_loopsink_size_; } | |||||
| bool vm_loop_flag() { return vm_loop_flag_; } | |||||
| int64_t loop_size() { return loop_size_; } | |||||
| // Reclaim resource and clear the cache. | // Reclaim resource and clear the cache. | ||||
| // ExecutorPy::Compile() can be called multiple times, so cache | // ExecutorPy::Compile() can be called multiple times, so cache | ||||
| // should be cleared. | // should be cleared. | ||||
| @@ -92,8 +92,8 @@ class Resource : public ResourceBase { | |||||
| abstract::AbstractBasePtrList args_spec_; | abstract::AbstractBasePtrList args_spec_; | ||||
| py::object input_; | py::object input_; | ||||
| bool is_cleaned_; | bool is_cleaned_; | ||||
| bool gpu_loopsink_flag_{false}; | |||||
| int64_t gpu_loopsink_size_{1}; | |||||
| bool vm_loop_flag_{false}; | |||||
| int64_t loop_size_{1}; | |||||
| }; | }; | ||||
| using ResourcePtr = std::shared_ptr<pipeline::Resource>; | using ResourcePtr = std::shared_ptr<pipeline::Resource>; | ||||
| @@ -76,15 +76,17 @@ class OrderEnforcer { | |||||
| } | } | ||||
| } | } | ||||
| bool CheckMakeTupleHaveLoad(const CNodePtr &cnode) { | |||||
| std::unordered_set<AnfNodePtr> CheckMakeTupleHaveLoad(const CNodePtr &cnode) { | |||||
| MS_EXCEPTION_IF_NULL(cnode); | |||||
| std::unordered_set<AnfNodePtr> loads; | |||||
| auto inputs = cnode->inputs(); | auto inputs = cnode->inputs(); | ||||
| for (size_t index = 1; index < inputs.size(); index++) { | for (size_t index = 1; index < inputs.size(); index++) { | ||||
| auto input = cnode->input(index); | auto input = cnode->input(index); | ||||
| if (IsPrimitiveCNode(input, prim::kPrimLoad)) { | if (IsPrimitiveCNode(input, prim::kPrimLoad)) { | ||||
| return true; | |||||
| loads.insert(input); | |||||
| } | } | ||||
| } | } | ||||
| return false; | |||||
| return loads; | |||||
| } | } | ||||
| std::vector<AnfNodePtr> FindUpdateStateUsers(const CNodePtr &cnode) { | std::vector<AnfNodePtr> FindUpdateStateUsers(const CNodePtr &cnode) { | ||||
| @@ -155,23 +157,31 @@ class OrderEnforcer { | |||||
| // u3 = UpdateState(u', maketuple2, addn) # need put addn or other-op into u3 inputs | // u3 = UpdateState(u', maketuple2, addn) # need put addn or other-op into u3 inputs | ||||
| // assign = Assign(para2, inputs, u3) | // assign = Assign(para2, inputs, u3) | ||||
| void HandleMakeTupleUsers(const AnfNodePtr &node) { | void HandleMakeTupleUsers(const AnfNodePtr &node) { | ||||
| MS_EXCEPTION_IF_NULL(node); | |||||
| auto maketuple = node->cast<CNodePtr>(); | auto maketuple = node->cast<CNodePtr>(); | ||||
| MS_EXCEPTION_IF_NULL(maketuple); | MS_EXCEPTION_IF_NULL(maketuple); | ||||
| if (CheckMakeTupleHaveLoad(maketuple)) { | |||||
| std::unordered_set<AnfNodePtr> loads = CheckMakeTupleHaveLoad(maketuple); | |||||
| if (!loads.empty()) { | |||||
| auto update_state = FindLastUpdateState(maketuple); | auto update_state = FindLastUpdateState(maketuple); | ||||
| if (update_state != nullptr) { | if (update_state != nullptr) { | ||||
| std::unordered_set<AnfNodePtr> maketuple_users = GetSpecialOperatorRealUsers(maketuple); | std::unordered_set<AnfNodePtr> maketuple_users = GetSpecialOperatorRealUsers(maketuple); | ||||
| std::unordered_set<AnfNodePtr> no_push_maketuple_users; | |||||
| std::unordered_set<AnfNodePtr> no_push_all_users; | |||||
| // Push and Pull are at the end of the execution order. | // Push and Pull are at the end of the execution order. | ||||
| // To ensure the push and pull operators are cut into the same graph, do not put the push operator into updatestate. | // To ensure the push and pull operators are cut into the same graph, do not put the push operator into updatestate. | ||||
| for (auto maketuple_user : maketuple_users) { | for (auto maketuple_user : maketuple_users) { | ||||
| if (!IsPrimitiveCNode(maketuple_user, prim::kPrimPush)) { | if (!IsPrimitiveCNode(maketuple_user, prim::kPrimPush)) { | ||||
| no_push_maketuple_users.insert(maketuple_user); | |||||
| no_push_all_users.insert(maketuple_user); | |||||
| } | |||||
| } | |||||
| for (auto load : loads) { | |||||
| std::unordered_set<AnfNodePtr> load_users = GetSpecialOperatorRealUsers(load); | |||||
| for (auto load_user : load_users) { | |||||
| no_push_all_users.insert(load_user); | |||||
| } | } | ||||
| } | } | ||||
| auto update_state_cnode = update_state->cast<CNodePtr>(); | auto update_state_cnode = update_state->cast<CNodePtr>(); | ||||
| MS_EXCEPTION_IF_NULL(update_state_cnode); | MS_EXCEPTION_IF_NULL(update_state_cnode); | ||||
| AddInputEdges(update_state_cnode, no_push_maketuple_users); | |||||
| AddInputEdges(update_state_cnode, no_push_all_users); | |||||
| } | } | ||||
| } | } | ||||
| } | } | ||||
| @@ -265,6 +275,8 @@ class OrderEnforcer { | |||||
| // Add load users as input edges of the update_state node. | // Add load users as input edges of the update_state node. | ||||
| void AddInputEdges(const CNodePtr &update_state, const std::unordered_set<AnfNodePtr> &load_users) { | void AddInputEdges(const CNodePtr &update_state, const std::unordered_set<AnfNodePtr> &load_users) { | ||||
| auto sorted_load_users = SortLoadUsers(load_users); | auto sorted_load_users = SortLoadUsers(load_users); | ||||
| auto inputs = update_state->inputs(); | |||||
| size_t origin_size = inputs.size(); | |||||
| for (auto &load_user : sorted_load_users) { | for (auto &load_user : sorted_load_users) { | ||||
| if (IsPrimitiveCNode(load_user, prim::kPrimMakeTuple) || IsPrimitiveCNode(load_user, prim::kPrimUpdateState)) { | if (IsPrimitiveCNode(load_user, prim::kPrimMakeTuple) || IsPrimitiveCNode(load_user, prim::kPrimUpdateState)) { | ||||
| continue; | continue; | ||||
| @@ -272,10 +284,16 @@ class OrderEnforcer { | |||||
| if (!IsDependOn(load_user, update_state)) { | if (!IsDependOn(load_user, update_state)) { | ||||
| processed_nodes_.insert(load_user); | processed_nodes_.insert(load_user); | ||||
| if (!IsInUpdateState(load_user, update_state)) { | if (!IsInUpdateState(load_user, update_state)) { | ||||
| manager_->AddEdge(update_state, load_user); | |||||
| inputs.emplace_back(load_user); | |||||
| } | } | ||||
| } | } | ||||
| } | } | ||||
| if (inputs.size() > origin_size) { | |||||
| auto new_update_state = func_graph_->NewCNode(inputs); | |||||
| new_update_state->set_abstract(update_state->abstract()); | |||||
| new_update_state->set_scope(update_state->scope()); | |||||
| manager_->Replace(update_state, new_update_state); | |||||
| } | |||||
| } | } | ||||
| // Sort load users by their topo sort order. | // Sort load users by their topo sort order. | ||||
| @@ -47,7 +47,6 @@ | |||||
| #include "backend/optimizer/mem_reuse/mem_reuse_checker.h" | #include "backend/optimizer/mem_reuse/mem_reuse_checker.h" | ||||
| #include "debug/env_config_parser.h" | #include "debug/env_config_parser.h" | ||||
| #endif | #endif | ||||
| #include "runtime/device/ascend/executor/tiling/op_tiling_calculater.h" | |||||
| #include "runtime/device/ascend/executor/hccl_dynamic_kernel.h" | #include "runtime/device/ascend/executor/hccl_dynamic_kernel.h" | ||||
| #include "utils/config_manager.h" | #include "utils/config_manager.h" | ||||
| #include "runtime/device/ascend/profiling/reporter/op_name_task_stream_reporter.h" | #include "runtime/device/ascend/profiling/reporter/op_name_task_stream_reporter.h" | ||||
| @@ -250,6 +249,8 @@ void AscendKernelRuntime::ReleaseDeviceRes() { | |||||
| MS_EXCEPTION_IF_NULL(context_ptr); | MS_EXCEPTION_IF_NULL(context_ptr); | ||||
| uint32_t device_id = context_ptr->get_param<uint32_t>(MS_CTX_DEVICE_ID); | uint32_t device_id = context_ptr->get_param<uint32_t>(MS_CTX_DEVICE_ID); | ||||
| // DestroyHccl must be called before FreeDeviceMemory | |||||
| (void)DestroyHccl(); | |||||
| if (mem_manager_ != nullptr) { | if (mem_manager_ != nullptr) { | ||||
| mem_manager_->FreeDeviceMemory(); | mem_manager_->FreeDeviceMemory(); | ||||
| } | } | ||||
| @@ -259,7 +260,6 @@ void AscendKernelRuntime::ReleaseDeviceRes() { | |||||
| MS_LOG(EXCEPTION) << "Reg SetTaskFailCallback failed, error: " << rt_ret; | MS_LOG(EXCEPTION) << "Reg SetTaskFailCallback failed, error: " << rt_ret; | ||||
| } | } | ||||
| (void)DestroyHccl(); | |||||
| (void)ResetDevice(device_id); | (void)ResetDevice(device_id); | ||||
| (void)ProfilingManager::GetInstance().StopProfiling(); | (void)ProfilingManager::GetInstance().StopProfiling(); | ||||
| current_graph_ = nullptr; | current_graph_ = nullptr; | ||||
| @@ -291,9 +291,7 @@ bool AscendKernelRuntime::Init() { | |||||
| MS_LOG(WARNING) << "Init ErrorManager failed."; | MS_LOG(WARNING) << "Init ErrorManager failed."; | ||||
| } | } | ||||
| try { | try { | ||||
| OpTilingCalculater::GetInstance().Init(); | |||||
| // Start up profiling before rtSetDevice | // Start up profiling before rtSetDevice | ||||
| bool ret = InitDevice(); | bool ret = InitDevice(); | ||||
| if (!ret) { | if (!ret) { | ||||
| return ret; | return ret; | ||||
| @@ -876,7 +874,7 @@ bool AscendKernelRuntime::HcclInit() { | |||||
| } | } | ||||
| MS_LOG(INFO) << "MINDSPORE_HCCL_CONFIG_PATH : " << full_path << ", RANK_ID: " << rank_id_str; | MS_LOG(INFO) << "MINDSPORE_HCCL_CONFIG_PATH : " << full_path << ", RANK_ID: " << rank_id_str; | ||||
| bool ret = hccl::HcclAdapter::GetInstance().InitHccl(context_ptr->get_param<uint32_t>(MS_CTX_DEVICE_ID), rank_id_str, | bool ret = hccl::HcclAdapter::GetInstance().InitHccl(context_ptr->get_param<uint32_t>(MS_CTX_DEVICE_ID), rank_id_str, | ||||
| full_path); | |||||
| full_path, mode == kGraphMode); | |||||
| free(full_path); | free(full_path); | ||||
| if (!ret) { | if (!ret) { | ||||
| MS_LOG(ERROR) << "Hcom init failed."; | MS_LOG(ERROR) << "Hcom init failed."; | ||||
| @@ -1992,6 +1992,28 @@ CNodePtr AscendStreamAssign::CreateRecvApplyKernel(const NotNull<KernelGraphPtr> | |||||
| return recv_node_ptr; | return recv_node_ptr; | ||||
| } | } | ||||
| bool AscendStreamAssign::IsNopNodeTarget(const AnfNodePtr &nop_node, const CNodePtr &target_node, | |||||
| const CNodePtr &cur_node, bool exclude_hcom) { | |||||
| MS_EXCEPTION_IF_NULL(nop_node); | |||||
| auto cnode = nop_node->cast<CNodePtr>(); | |||||
| MS_EXCEPTION_IF_NULL(cnode); | |||||
| auto new_inputs = cnode->inputs(); | |||||
| for (size_t i = 1; i < new_inputs.size(); i++) { | |||||
| if (opt::IsNopNode(new_inputs[i])) { | |||||
| if (IsNopNodeTarget(new_inputs[i], target_node, cur_node, exclude_hcom)) { | |||||
| return true; | |||||
| } | |||||
| } else { | |||||
| auto new_real_input = AnfAlgo::VisitKernel(new_inputs[i], 0); | |||||
| if (target_node == new_real_input.first) { | |||||
| if (!(exclude_hcom && IsHcom(cur_node))) { | |||||
| return true; | |||||
| } | |||||
| } | |||||
| } | |||||
| } | |||||
| return false; | |||||
| } | |||||
| vector<CNodePtr>::iterator AscendStreamAssign::FindTargetOp(vector<CNodePtr>::iterator begin, | vector<CNodePtr>::iterator AscendStreamAssign::FindTargetOp(vector<CNodePtr>::iterator begin, | ||||
| vector<CNodePtr>::iterator end, const CNodePtr &node, | vector<CNodePtr>::iterator end, const CNodePtr &node, | ||||
| bool exclude_hcom) { | bool exclude_hcom) { | ||||
| @@ -2000,18 +2022,8 @@ vector<CNodePtr>::iterator AscendStreamAssign::FindTargetOp(vector<CNodePtr>::it | |||||
| for (size_t i = 1; i < inputs.size(); i++) { | for (size_t i = 1; i < inputs.size(); i++) { | ||||
| auto input = inputs[i]; | auto input = inputs[i]; | ||||
| if (opt::IsNopNode(input)) { | if (opt::IsNopNode(input)) { | ||||
| CNodePtr cnode = input->cast<CNodePtr>(); | |||||
| auto new_inputs = cnode->inputs(); | |||||
| for (size_t j = 1; j < new_inputs.size(); j++) { | |||||
| auto new_real_input = AnfAlgo::VisitKernel(new_inputs[j], 0); | |||||
| // find target node except hcom op. insert event for hcom in:InsertEventHcomDependCommonBak function | |||||
| // only insert one time | |||||
| if (node == new_real_input.first) { | |||||
| if (!(exclude_hcom && IsHcom(*begin))) { | |||||
| MS_LOG(DEBUG) << "Nop node find target op[" << (*begin)->DebugString() << "]"; | |||||
| return begin; | |||||
| } | |||||
| } | |||||
| if (IsNopNodeTarget(input, node, *begin, exclude_hcom)) { | |||||
| return begin; | |||||
| } | } | ||||
| } else { | } else { | ||||
| auto real_input = AnfAlgo::VisitKernel(input, 0); | auto real_input = AnfAlgo::VisitKernel(input, 0); | ||||
| @@ -175,7 +175,8 @@ class AscendStreamAssign { | |||||
| uint32_t GetIndexByKey(const NotNull<KernelGraphPtr> &graph_ptr, const CNodeKey &key); | uint32_t GetIndexByKey(const NotNull<KernelGraphPtr> &graph_ptr, const CNodeKey &key); | ||||
| uint32_t GetIndependentStreamSwitchStreamId(const NotNull<KernelGraphPtr> &graph_ptr); | uint32_t GetIndependentStreamSwitchStreamId(const NotNull<KernelGraphPtr> &graph_ptr); | ||||
| void GetIndependentMaxTarget(const NotNull<KernelGraphPtr> &graph_ptr); | void GetIndependentMaxTarget(const NotNull<KernelGraphPtr> &graph_ptr); | ||||
| bool IsNopNodeTarget(const AnfNodePtr &nop_node, const CNodePtr &target_node, const CNodePtr &cur_node, | |||||
| bool exclude_hcom); | |||||
| bool IsTaskSink(); | bool IsTaskSink(); | ||||
| bool IsHcom(const CNodePtr &cur_cnode_ptr); | bool IsHcom(const CNodePtr &cur_cnode_ptr); | ||||
| bool IsIndependentNode(const CNodePtr &node_ptr); | bool IsIndependentNode(const CNodePtr &node_ptr); | ||||
| @@ -133,9 +133,11 @@ void DataDumper::SetOpMappingInfo(NotNull<aicpu::dump::OpMappingInfo *> dump_inf | |||||
| } | } | ||||
| uint32_t graph_id = kernel_graph_->graph_id(); | uint32_t graph_id = kernel_graph_->graph_id(); | ||||
| uint32_t rank_id = 0; | uint32_t rank_id = 0; | ||||
| auto env_table_file = common::GetEnv("RANK_TABLE_FILE"); | |||||
| auto ms_context = MsContext::GetInstance(); | |||||
| MS_EXCEPTION_IF_NULL(ms_context); | |||||
| auto env_rank_id = common::GetEnv("RANK_ID"); | auto env_rank_id = common::GetEnv("RANK_ID"); | ||||
| if (!(env_table_file.empty() || env_rank_id.empty())) { | |||||
| if (ms_context->get_param<bool>(MS_CTX_ENABLE_HCCL) && !env_rank_id.empty()) { | |||||
| // get actual rank id if it's distribution training case. | // get actual rank id if it's distribution training case. | ||||
| if (!CommManager::GetInstance().GetRankID(kHcclWorldGroup, &rank_id)) { | if (!CommManager::GetInstance().GetRankID(kHcclWorldGroup, &rank_id)) { | ||||
| MS_LOG(INFO) << "Failed to get rank id."; | MS_LOG(INFO) << "Failed to get rank id."; | ||||
| @@ -19,12 +19,12 @@ | |||||
| #include <memory> | #include <memory> | ||||
| #include "framework/common/debug/log.h" | #include "framework/common/debug/log.h" | ||||
| #include "utils/log_adapter.h" | #include "utils/log_adapter.h" | ||||
| #include "runtime/device/ascend/executor/tiling/op_tiling_calculater.h" | |||||
| #include "register/op_tiling.h" | #include "register/op_tiling.h" | ||||
| #include "utils/convert_utils_base.h" | #include "utils/convert_utils_base.h" | ||||
| #include "utils/ms_context.h" | #include "utils/ms_context.h" | ||||
| #include "runtime/device/kernel_runtime_manager.h" | #include "runtime/device/kernel_runtime_manager.h" | ||||
| #include "pipeline/jit/static_analysis/static_analysis.h" | #include "pipeline/jit/static_analysis/static_analysis.h" | ||||
| #include "runtime/device/ascend/executor/tiling/op_tiling_adapter.h" | |||||
| #include "common/trans.h" | #include "common/trans.h" | ||||
| namespace mindspore { | namespace mindspore { | ||||
| @@ -131,14 +131,17 @@ void AiCoreDynamicKernel::ComputeTiling() { | |||||
| auto cnode = cnode_ptr_.lock(); | auto cnode = cnode_ptr_.lock(); | ||||
| MS_EXCEPTION_IF_NULL(cnode); | MS_EXCEPTION_IF_NULL(cnode); | ||||
| MS_LOG(INFO) << "Start compute tiling of:" << cnode->fullname_with_scope(); | MS_LOG(INFO) << "Start compute tiling of:" << cnode->fullname_with_scope(); | ||||
| optiling::OpRunInfo op_run_info; | |||||
| OpTilingCalculater::GetInstance().CalculateTiling(NOT_NULL(cnode), op_compile_info_, depend_tensor_map_, | |||||
| NOT_NULL(&op_run_info)); | |||||
| block_dim_ = op_run_info.block_dim; | |||||
| workspaces_size_ = op_run_info.workspaces; | |||||
| tiling_data_ = op_run_info.tiling_data.str(); | |||||
| tiling_key_ = op_run_info.tiling_key; | |||||
| // start compute tiling | |||||
| optiling::utils::OpRunInfo op_run_info_v2(-1, true, 0); | |||||
| tiling::OpTilingCalculateAdapter converter; | |||||
| ge::ComputeGraphPtr ge_graph = std::make_shared<ge::ComputeGraph>("default"); | |||||
| auto ge_node = converter.AnfNodeToGeNodeAdapter(cnode, &ge_graph, depend_tensor_map_); | |||||
| (void)optiling::OpParaCalculateV2(*ge_node, op_run_info_v2); | |||||
| block_dim_ = op_run_info_v2.GetBlockDim(); | |||||
| op_run_info_v2.GetAllWorkspaces(workspaces_size_); | |||||
| tiling_data_ = op_run_info_v2.GetAllTilingData().str(); | |||||
| tiling_key_ = op_run_info_v2.GetTilingKey(); | |||||
| } | } | ||||
| void AiCoreDynamicKernel::AllocateWorkspace() { | void AiCoreDynamicKernel::AllocateWorkspace() { | ||||
| @@ -0,0 +1,292 @@ | |||||
| /** | |||||
| * Copyright 2021 Huawei Technologies Co., Ltd | |||||
| * | |||||
| * Licensed under the Apache License, Version 2.0 (the "License"); | |||||
| * you may not use this file except in compliance with the License. | |||||
| * You may obtain a copy of the License at | |||||
| * | |||||
| * http://www.apache.org/licenses/LICENSE-2.0 | |||||
| * | |||||
| * Unless required by applicable law or agreed to in writing, software | |||||
| * distributed under the License is distributed on an "AS IS" BASIS, | |||||
| * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | |||||
| * See the License for the specific language governing permissions and | |||||
| * limitations under the License. | |||||
| */ | |||||
| #include "runtime/device/ascend/executor/tiling/op_tiling_adapter.h" | |||||
| #include <algorithm> | |||||
| #include "backend/kernel_compiler/tbe/tbe_kernel_build.h" | |||||
| #include "backend/kernel_compiler/tbe/tbe_dynaminc_shape_util.h" | |||||
| #include "backend/session/anf_runtime_algorithm.h" | |||||
| #include "runtime/device/ascend/ge_types_convert.h" | |||||
| #include "utils/utils.h" | |||||
| #include "external/graph/tensor.h" | |||||
| #include "external/register/op_tiling_registry.h" | |||||
| #include "graph/utils/graph_utils.h" | |||||
| #include "common/ge_inner_error_codes.h" | |||||
| #include "graph/utils/op_desc_utils.h" | |||||
| namespace mindspore { | |||||
| namespace device { | |||||
| namespace tiling { | |||||
| constexpr auto COMPILE_INFO_KEY = "compile_info_key"; | |||||
| constexpr auto COMPILE_INFO_JSON = "compile_info_json"; | |||||
| constexpr auto ATOMIC_COMPILE_INFO_KEY = "_atomic_compile_info_key"; | |||||
| constexpr auto ATOMIC_COMPILE_INFO_JSON = "_atomic_compile_info_json"; | |||||
| constexpr auto ATTR_NAME_OP_INFER_DEPENDS = "_op_infer_depends"; | |||||
| constexpr auto CONSTANTOP = "Constant"; | |||||
| constexpr auto ATTR_NAME_WEIGHTS = "value"; | |||||
| constexpr auto PARAM_DYNAMIC = "dynamic"; | |||||
| std::string OpTilingCalculateAdapter::GetRealOpType(const std::string &op_type) { | |||||
| static const std::map<std::string, std::string> kOpTypeMap = { | |||||
| {"SparseApplyFtrl", "SparseApplyFtrlD"}, | |||||
| {"SparseApplyProximalAdagrad", "SparseApplyProximalAdagradD"}, | |||||
| {"SparseGatherV2", "Gather"}, | |||||
| {"Pad", "PadD"}, | |||||
| {"Concat", "ConcatD"}, | |||||
| {"Softmax", "SoftmaxV2"}, | |||||
| {"DropoutDoMask", "DropOutDoMask"}, | |||||
| }; | |||||
| auto iter = kOpTypeMap.find(op_type); | |||||
| if (iter == kOpTypeMap.end()) { | |||||
| return op_type; | |||||
| } | |||||
| return iter->second; | |||||
| } | |||||
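Illustrative behavior of the mapping above (return values follow directly from kOpTypeMap):

```cpp
GetRealOpType("Concat");   // "ConcatD"
GetRealOpType("Softmax");  // "SoftmaxV2"
GetRealOpType("MatMul");   // "MatMul" (not in the map, returned unchanged)
```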
| std::string OpTilingCalculateAdapter::GetOutputName(const CNodePtr &node, size_t index) { | |||||
| MS_EXCEPTION_IF_NULL(node); | |||||
| if (index >= output_names_.size()) { | |||||
| return "unknown_name"; | |||||
| } | |||||
| return output_names_[index]; | |||||
| } | |||||
| std::string OpTilingCalculateAdapter::GetInputName(const CNodePtr &node, size_t index) { | |||||
| MS_EXCEPTION_IF_NULL(node); | |||||
| if (index >= input_names_.size()) { | |||||
| return "unknown_name"; | |||||
| } | |||||
| return input_names_[index]; | |||||
| } | |||||
| void OpTilingCalculateAdapter::ConvertInputShapeAndType(const CNodePtr &node, ge::OpDescPtr *op_desc) { | |||||
| MS_EXCEPTION_IF_NULL(node); | |||||
| MS_EXCEPTION_IF_NULL(*op_desc); | |||||
| auto input_size = AnfAlgo::GetInputTensorNum(node); | |||||
| for (size_t i = 0; i < input_size; i++) { | |||||
| // ms info | |||||
| auto input_node_with_idx = AnfAlgo::GetPrevNodeOutput(node, i); | |||||
| auto input_node = input_node_with_idx.first; | |||||
| auto input_index = input_node_with_idx.second; | |||||
| auto ms_ori_shape = AnfAlgo::GetOutputInferShape(input_node, input_index); | |||||
| auto ms_shape = AnfAlgo::GetOutputDeviceShape(input_node, input_index); | |||||
| auto ms_format = AnfAlgo::GetOutputFormat(input_node, input_index); | |||||
| auto ms_dtype = AnfAlgo::GetOutputDeviceDataType(input_node, input_index); | |||||
| // ge info | |||||
| std::vector<int64_t> ge_shape; | |||||
| std::vector<int64_t> ge_ori_shape; | |||||
| ge::DataType ge_dtype = ascend::GeTypesConvert::TransTypeIdToGeDataType(ms_dtype); | |||||
| ge::Format ge_format = ascend::GeTypesConvert::GetGeFormat(ms_format, ms_shape.size()); | |||||
| std::transform(ms_shape.begin(), ms_shape.end(), std::back_inserter(ge_shape), | |||||
| [](size_t s) { return static_cast<int64_t>(s); }); | |||||
| std::transform(ms_ori_shape.begin(), ms_ori_shape.end(), std::back_inserter(ge_ori_shape), | |||||
| [](size_t s) { return static_cast<int64_t>(s); }); | |||||
| auto input_name = GetInputName(node, i); | |||||
| ge::GeTensorDesc ge_tensor_desc; | |||||
| ge_tensor_desc.SetFormat(ge_format); | |||||
| ge_tensor_desc.SetDataType(ge_dtype); | |||||
| ge_tensor_desc.SetShape(ge::GeShape(ge_shape)); | |||||
| ge_tensor_desc.SetOriginShape(ge::GeShape(ge_ori_shape)); | |||||
| ge_tensor_desc.SetName(input_name); | |||||
| (void)(*op_desc)->AddInputDesc(input_name, ge_tensor_desc); | |||||
| } | |||||
| } | |||||
| void OpTilingCalculateAdapter::ConvertOutputShapeAndType(const CNodePtr &node, ge::OpDescPtr *op_desc) { | |||||
| MS_EXCEPTION_IF_NULL(node); | |||||
| MS_EXCEPTION_IF_NULL(*op_desc); | |||||
| auto output_size = AnfAlgo::GetOutputTensorNum(node); | |||||
| for (size_t i = 0; i < output_size; i++) { | |||||
| auto ms_shape = AnfAlgo::GetOutputDeviceShape(node, i); | |||||
| auto ms_ori_shape = AnfAlgo::GetOutputInferShape(node, i); | |||||
| auto ms_format = AnfAlgo::GetOutputFormat(node, i); | |||||
| auto ms_dtype = AnfAlgo::GetOutputDeviceDataType(node, i); | |||||
| std::vector<int64_t> ge_shape; | |||||
| std::vector<int64_t> ge_ori_shape; | |||||
| ge::DataType ge_dtype = ascend::GeTypesConvert::TransTypeIdToGeDataType(ms_dtype); | |||||
| ge::Format ge_format = ascend::GeTypesConvert::GetGeFormat(ms_format, ms_shape.size()); | |||||
| std::transform(ms_shape.begin(), ms_shape.end(), std::back_inserter(ge_shape), | |||||
| [](size_t s) { return static_cast<int64_t>(s); }); | |||||
| std::transform(ms_ori_shape.begin(), ms_ori_shape.end(), std::back_inserter(ge_ori_shape), | |||||
| [](size_t s) { return static_cast<int64_t>(s); }); | |||||
| auto output_name = GetOutputName(node, i); | |||||
| ge::GeTensorDesc ge_tensor_desc; | |||||
| ge_tensor_desc.SetFormat(ge_format); | |||||
| ge_tensor_desc.SetDataType(ge_dtype); | |||||
| ge_tensor_desc.SetShape(ge::GeShape(ge_shape)); | |||||
| ge_tensor_desc.SetOriginShape(ge::GeShape(ge_ori_shape)); | |||||
| ge_tensor_desc.SetName(output_name); | |||||
| (void)(*op_desc)->AddOutputDesc(output_name, ge_tensor_desc); | |||||
| } | |||||
| } | |||||
| void OpTilingCalculateAdapter::ConvertCompileInfo(const CNodePtr &node, ge::OpDescPtr *op_desc) { | |||||
| MS_EXCEPTION_IF_NULL(node); | |||||
| MS_EXCEPTION_IF_NULL(*op_desc); | |||||
| if (!AnfAlgo::HasNodeAttr(kAttrCompileInfo, node)) { | |||||
| MS_LOG(EXCEPTION) << "Get compile_info failed"; | |||||
| } | |||||
| auto compile_info_attr = AnfAlgo::GetNodeAttr<std::string>(node, kAttrCompileInfo); | |||||
| MS_LOG(INFO) << "For op " << op_name_ << ", get compile_info: " << compile_info_attr; | |||||
| std::string compile_info_key = std::to_string(std::hash<std::string>()(compile_info_attr)); | |||||
| (void)ge::AttrUtils::SetStr(*(*op_desc), COMPILE_INFO_KEY, compile_info_key); | |||||
| (void)ge::AttrUtils::SetStr(*(*op_desc), COMPILE_INFO_JSON, compile_info_attr); | |||||
| } | |||||
| ge::NodePtr OpTilingCalculateAdapter::NewConstantOp(const CNodePtr &node, const std::string &name, | |||||
| const tensor::TensorPtr &tensor_data, ge::ComputeGraphPtr *ge_graph, | |||||
| size_t index) { | |||||
| MS_EXCEPTION_IF_NULL(node); | |||||
| MS_EXCEPTION_IF_NULL(*ge_graph); | |||||
| MS_EXCEPTION_IF_NULL(tensor_data); | |||||
| ge::OpDescPtr op_desc = std::make_shared<ge::OpDesc>(name, CONSTANTOP); | |||||
| auto ms_format = AnfAlgo::GetPrevNodeOutputFormat(node, index); | |||||
| auto ms_ori_shape = AnfAlgo::GetPrevNodeOutputInferShape(node, index); | |||||
| auto ms_dtype = AnfAlgo::GetPrevNodeOutputDeviceDataType(node, index); | |||||
| auto ms_shape = AnfAlgo::GetInputDeviceShape(node, index); | |||||
| std::vector<int64_t> ge_shape; | |||||
| std::vector<int64_t> ge_ori_shape; | |||||
| std::transform(ms_shape.begin(), ms_shape.end(), std::back_inserter(ge_shape), | |||||
| [](size_t s) { return static_cast<int64_t>(s); }); | |||||
| std::transform(ms_ori_shape.begin(), ms_ori_shape.end(), std::back_inserter(ge_ori_shape), | |||||
| [](size_t s) { return static_cast<int64_t>(s); }); | |||||
| ge::DataType ge_dtype = ascend::GeTypesConvert::TransTypeIdToGeDataType(ms_dtype); | |||||
| ge::Format ge_format = ascend::GeTypesConvert::GetGeFormat(ms_format, ms_shape.size()); | |||||
| ge::GeTensorDesc ge_tensor_desc; | |||||
| ge_tensor_desc.SetFormat(ge_format); | |||||
| ge_tensor_desc.SetDataType(ge_dtype); | |||||
| ge_tensor_desc.SetShape(ge::GeShape(ge_shape)); | |||||
| ge_tensor_desc.SetOriginShape(ge::GeShape(ge_ori_shape)); | |||||
| ge_tensor_desc.SetName(name); | |||||
| ge::GeTensorPtr ge_tensor = std::make_shared<ge::GeTensor>( | |||||
| ge_tensor_desc, static_cast<uint8_t *>(tensor_data->data_c()), IntToSize(tensor_data->DataSize())); | |||||
| (void)op_desc->AddOutputDesc(name, ge_tensor_desc); | |||||
| ge::NodePtr constant_op = (*ge_graph)->AddNode(op_desc); | |||||
| ge::OpDescUtils::SetWeights(constant_op, {ge_tensor}); | |||||
| (void)ge::AttrUtils::SetTensor(op_desc, ATTR_NAME_WEIGHTS, ge_tensor); | |||||
| return constant_op; | |||||
| } | |||||
| std::vector<std::tuple<std::size_t, ge::NodePtr>> OpTilingCalculateAdapter::ConvertDepends( | |||||
| const CNodePtr &node, const std::map<uint32_t, tensor::TensorPtr> &depend_tensor_map, ge::OpDescPtr *op_desc, | |||||
| ge::ComputeGraphPtr *ge_graph) { | |||||
| MS_EXCEPTION_IF_NULL(node); | |||||
| MS_EXCEPTION_IF_NULL(*op_desc); | |||||
| auto depends_list_me = abstract::GetDependsFormMap(node); | |||||
| if (depends_list_me.empty()) { | |||||
| MS_LOG(INFO) << "The node " << op_name_ << " has no infer depend "; | |||||
| return {}; | |||||
| } | |||||
| auto has_input_name_attr = AnfAlgo::HasNodeAttr("input_names", node); | |||||
| if (!has_input_name_attr) { | |||||
| MS_LOG(EXCEPTION) << "Node should has attr: input_names. " << node->fullname_with_scope(); | |||||
| } | |||||
| auto input_names_attr = AnfAlgo ::GetNodeAttr<std::vector<std::string>>(node, "input_names"); | |||||
| std::vector<std::string> op_infer_depends; | |||||
| std::vector<std::tuple<std::size_t, ge::NodePtr>> constant_ops; | |||||
| for (auto index : depends_list_me) { | |||||
| if (LongToSize(index) >= input_names_attr.size()) { | |||||
| MS_LOG(EXCEPTION) << "Input index " << index << " should be less than input_names' size " << input_names_attr.size(); | |||||
| } | |||||
| auto iter = depend_tensor_map.find(LongToSize(index)); | |||||
| if (iter == depend_tensor_map.end()) { | |||||
| MS_LOG(EXCEPTION) << "Input index " << index << " should less than depend_tensor_map' size " | |||||
| << input_names_attr.size(); | |||||
| } | |||||
| auto depend_name = input_names_attr[index]; | |||||
| auto const_tensor = iter->second; | |||||
| ge::NodePtr ge_constant_op = NewConstantOp(node, depend_name, const_tensor, ge_graph, index); | |||||
| constant_ops.emplace_back(std::tuple<std::size_t, ge::NodePtr>(index, ge_constant_op)); | |||||
| op_infer_depends.emplace_back(depend_name); | |||||
| } | |||||
| (void)(*op_desc)->SetOpInferDepends(op_infer_depends); | |||||
| return constant_ops; | |||||
| } | |||||
| void OpTilingCalculateAdapter::AddEdge(const ge::NodePtr &ge_node, | |||||
| const std::vector<std::tuple<std::size_t, ge::NodePtr>> &constant_ops) { | |||||
| MS_EXCEPTION_IF_NULL(ge_node); | |||||
| MS_LOG(INFO) << "Add edge for op " << op_name_; | |||||
| for (const auto &item : constant_ops) { | |||||
| auto index = std::get<0>(item); | |||||
| auto constant_op = std::get<1>(item); | |||||
| (void)ge_node->AddLinkFrom(index, constant_op); | |||||
| } | |||||
| } | |||||
| void OpTilingCalculateAdapter::InitOpIoName(const CNodePtr &node) { | |||||
| MS_EXCEPTION_IF_NULL(node); | |||||
| MS_LOG(INFO) << "Get the every input name of " << op_name_; | |||||
| auto op_info_ptr = mindspore::kernel::tbe::TbeDynamicShapeUtil::FindOp(op_name_, node); | |||||
| MS_EXCEPTION_IF_NULL(op_info_ptr); | |||||
| auto primitive = AnfAlgo::GetCNodePrimitive(node); | |||||
| MS_EXCEPTION_IF_NULL(primitive); | |||||
| auto inputs_ptr = op_info_ptr->inputs_ptr(); | |||||
| size_t dynamic_input_index = 0; | |||||
| std::vector<int64_t> dynamic_inputs_list; | |||||
| if (primitive->GetAttr(kAttrDynInputSizes) != nullptr) { | |||||
| dynamic_inputs_list = GetValue<std::vector<int64_t>>(primitive->GetAttr(kAttrDynInputSizes)); | |||||
| } | |||||
| for (const auto &item : inputs_ptr) { | |||||
| MS_EXCEPTION_IF_NULL(item); | |||||
| if (item->param_type() == PARAM_DYNAMIC) { | |||||
| if (dynamic_input_index >= dynamic_inputs_list.size()) { | |||||
| MS_LOG(EXCEPTION) << "Dynamic input index should be less than the dynamic inputs' size."; | |||||
| } | |||||
| auto real_inputs_num = dynamic_inputs_list[dynamic_input_index]; | |||||
| for (auto k = 0; k < real_inputs_num; k++) { | |||||
| std::string input_name = item->name() + "_dynamic_" + std::to_string(k); | |||||
| input_names_.emplace_back(input_name); | |||||
| } | |||||
| dynamic_input_index++; | |||||
| } else { | |||||
| input_names_.emplace_back(item->name()); | |||||
| } | |||||
| } | |||||
| // output io names | |||||
| auto outputs_ptr = op_info_ptr->outputs_ptr(); | |||||
| for (const auto &out_item : outputs_ptr) { | |||||
| MS_EXCEPTION_IF_NULL(out_item); | |||||
| output_names_.emplace_back(out_item->name()); | |||||
| } | |||||
| } | |||||
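For a dynamic parameter, the loop above fans a single logical input name out into one name per real input. A self-contained sketch with hypothetical values:

```cpp
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

int main() {
  std::vector<std::string> input_names;
  const std::string base = "x";       // item->name()
  const int64_t real_inputs_num = 3;  // entry from kAttrDynInputSizes
  for (int64_t k = 0; k < real_inputs_num; k++) {
    input_names.emplace_back(base + "_dynamic_" + std::to_string(k));
  }
  for (const auto &n : input_names) {
    std::printf("%s\n", n.c_str());  // x_dynamic_0, x_dynamic_1, x_dynamic_2
  }
  return 0;
}
```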
| ge::NodePtr OpTilingCalculateAdapter::AnfNodeToGeNodeAdapter( | |||||
| const CNodePtr &node, ge::ComputeGraphPtr *ge_graph, const std::map<uint32_t, tensor::TensorPtr> &depend_tensor_map) { | |||||
| MS_EXCEPTION_IF_NULL(node); | |||||
| op_name_ = AnfAlgo::GetCNodeName(node); | |||||
| auto op_type = GetRealOpType(op_name_); | |||||
| (void)InitOpIoName(node); | |||||
| ge::OpDescPtr op_desc = std::make_shared<ge::OpDesc>(op_name_, op_type); | |||||
| MS_EXCEPTION_IF_NULL(op_desc); | |||||
| ConvertInputShapeAndType(node, &op_desc); | |||||
| ConvertOutputShapeAndType(node, &op_desc); | |||||
| ConvertCompileInfo(node, &op_desc); | |||||
| std::vector<std::tuple<std::size_t, ge::NodePtr>> constant_ops = | |||||
| ConvertDepends(node, depend_tensor_map, &op_desc, ge_graph); | |||||
| MS_EXCEPTION_IF_NULL(*ge_graph); | |||||
| auto ge_node = (*ge_graph)->AddNode(op_desc); | |||||
| MS_EXCEPTION_IF_NULL(ge_node); | |||||
| AddEdge(ge_node, constant_ops); | |||||
| return ge_node; | |||||
| } | |||||
| } // namespace tiling | |||||
| } // namespace device | |||||
| } // namespace mindspore | |||||
| @@ -0,0 +1,65 @@ | |||||
| /** | |||||
| * Copyright 2021 Huawei Technologies Co., Ltd | |||||
| * | |||||
| * Licensed under the Apache License, Version 2.0 (the "License"); | |||||
| * you may not use this file except in compliance with the License. | |||||
| * You may obtain a copy of the License at | |||||
| * | |||||
| * http://www.apache.org/licenses/LICENSE-2.0 | |||||
| * | |||||
| * Unless required by applicable law or agreed to in writing, software | |||||
| * distributed under the License is distributed on an "AS IS" BASIS, | |||||
| * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | |||||
| * See the License for the specific language governing permissions and | |||||
| * limitations under the License. | |||||
| */ | |||||
| #ifndef MINDSPORE_MINDSPORE_CCSRC_RUNTIME_DEVICE_ASCEND_TILING_OP_TILING_ADAPTER_H_ | |||||
| #define MINDSPORE_MINDSPORE_CCSRC_RUNTIME_DEVICE_ASCEND_TILING_OP_TILING_ADAPTER_H_ | |||||
| #include <map> | |||||
| #include <memory> | |||||
| #include <string> | |||||
| #include <vector> | |||||
| #include <tuple> | |||||
| #include "backend/kernel_compiler/oplib/opinfo.h" | |||||
| #include "register/op_tiling.h" | |||||
| #include "external/graph/operator.h" | |||||
| #include "graph/utils/graph_utils.h" | |||||
| #include "abstract/primitive_infer_map.h" | |||||
| namespace mindspore { | |||||
| namespace device { | |||||
| namespace tiling { | |||||
| class OpTilingCalculateAdapter { | |||||
| public: | |||||
| OpTilingCalculateAdapter() = default; | |||||
| ~OpTilingCalculateAdapter() = default; | |||||
| ge::NodePtr AnfNodeToGeNodeAdapter(const CNodePtr &node, ge::ComputeGraphPtr *ge_graph, | |||||
| const std::map<uint32_t, tensor::TensorPtr> &depend_tensor_map); | |||||
| private: | |||||
| void ConvertInputShapeAndType(const CNodePtr &node, ge::OpDescPtr *op_desc); | |||||
| void ConvertOutputShapeAndType(const CNodePtr &node, ge::OpDescPtr *op_desc); | |||||
| void ConvertCompileInfo(const CNodePtr &node, ge::OpDescPtr *op_desc); | |||||
| void ConvertAttrs(const CNodePtr &node, ge::OpDescPtr *op_desc); | |||||
| std::vector<std::tuple<std::size_t, ge::NodePtr>> ConvertDepends( | |||||
| const CNodePtr &node, const std::map<uint32_t, tensor::TensorPtr> &depend_tensor_map, ge::OpDescPtr *op_desc, | |||||
| ge::ComputeGraphPtr *ge_graph); | |||||
| ge::NodePtr NewConstantOp(const CNodePtr &node, const std::string &name, const tensor::TensorPtr &tensor_data, | |||||
| ge::ComputeGraphPtr *ge_graph, size_t index); | |||||
| void AddEdge(const ge::NodePtr &ge_node, const std::vector<std::tuple<std::size_t, ge::NodePtr>> &constant_ops); | |||||
| std::string GetRealOpType(const std::string &op_type); | |||||
| std::string GetInputName(const CNodePtr &node, size_t index); | |||||
| std::string GetOutputName(const CNodePtr &node, size_t index); | |||||
| void InitOpIoName(const CNodePtr &node); | |||||
| std::string op_name_; | |||||
| std::vector<std::string> input_names_; | |||||
| std::vector<std::string> output_names_; | |||||
| }; | |||||
| } // namespace tiling | |||||
| } // namespace device | |||||
| } // namespace mindspore | |||||
| #endif // MINDSPORE_MINDSPORE_CCSRC_RUNTIME_DEVICE_ASCEND_TILING_OP_TILING_ADAPTER_H_ | |||||
| @@ -1,205 +0,0 @@ | |||||
| /** | |||||
| * Copyright 2019 Huawei Technologies Co., Ltd | |||||
| * | |||||
| * Licensed under the Apache License, Version 2.0 (the "License"); | |||||
| * you may not use this file except in compliance with the License. | |||||
| * You may obtain a copy of the License at | |||||
| * | |||||
| * http://www.apache.org/licenses/LICENSE-2.0 | |||||
| * | |||||
| * Unless required by applicable law or agreed to in writing, software | |||||
| * distributed under the License is distributed on an "AS IS" BASIS, | |||||
| * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | |||||
| * See the License for the specific language governing permissions and | |||||
| * limitations under the License. | |||||
| */ | |||||
| #include "runtime/device/ascend/executor/tiling/op_tiling_calculater.h" | |||||
| #include <dlfcn.h> | |||||
| #include <map> | |||||
| #include <vector> | |||||
| #include <memory> | |||||
| #include <string> | |||||
| #include <algorithm> | |||||
| #include "backend/session/anf_runtime_algorithm.h" | |||||
| #include "runtime/device/ascend/ge_types_convert.h" | |||||
| #include "utils/utils.h" | |||||
| #include "external/graph/tensor.h" | |||||
| #include "external/register/op_tiling_registry.h" | |||||
| namespace mindspore { | |||||
| namespace device { | |||||
| namespace ascend { | |||||
| ge::Tensor MakeTempGeTensor(const TypeId &type_id, const std::vector<size_t> &shape, const std::string &format) { | |||||
| auto ge_type = GeTypesConvert::TransTypeIdToGeDataType(type_id); | |||||
| std::vector<int64_t> int_shape; | |||||
| std::transform(shape.begin(), shape.end(), std::back_inserter(int_shape), SizeToLong); | |||||
| auto ge_format = GeTypesConvert::GetGeFormat(format, shape.size()); | |||||
| ge::Tensor ge_tensor; | |||||
| ge_tensor.SetTensorDesc(ge::TensorDesc(ge::Shape(int_shape), ge_format, ge_type)); | |||||
| return ge_tensor; | |||||
| } | |||||
| void FeedTeOpTensorInputArg(const NotNull<CNodePtr> &cnode, | |||||
| NotNull<std::vector<optiling::TeOpTensorArg> *> tensor_arg_list) { | |||||
| MS_LOG(INFO) << "FeedTeOpTensorInputArg start, node:" << cnode->fullname_with_scope(); | |||||
| auto input_size = AnfAlgo::GetInputTensorNum(cnode.get()); | |||||
| // Skip Dynamic Shape Depend Input | |||||
| for (size_t i = 0; i < input_size; ++i) { | |||||
| auto input_node_with_index = AnfAlgo::GetPrevNodeOutput(cnode.get(), i); | |||||
| auto input_node = input_node_with_index.first; | |||||
| auto input_index = input_node_with_index.second; | |||||
| auto output_shape = AnfAlgo::GetOutputDeviceShape(input_node, input_index); | |||||
| auto output_ori_shape = AnfAlgo::GetOutputInferShape(input_node, input_index); | |||||
| auto output_format = AnfAlgo::GetOutputFormat(input_node, input_index); | |||||
| auto output_dtype = AnfAlgo::GetOutputDeviceDataType(input_node, input_index); | |||||
| auto iter = type_name_map.find(output_dtype); | |||||
| if (iter == type_name_map.end()) { | |||||
| MS_LOG(EXCEPTION) << "Cannot found typeId:" << output_dtype; | |||||
| } | |||||
| auto ge_output_dtype = iter->second; | |||||
| optiling::TeOpTensorArg tensor_arg; | |||||
| optiling::TeOpTensor tensor; | |||||
| tensor_arg.arg_type = optiling::TA_SINGLE; | |||||
| tensor.dtype = ge_output_dtype; | |||||
| tensor.shape.insert(tensor.shape.end(), output_shape.begin(), output_shape.end()); | |||||
| tensor.ori_shape.insert(tensor.ori_shape.end(), output_ori_shape.begin(), output_ori_shape.end()); | |||||
| tensor.format = GeTypesConvert::GetGeTilingFormat(GeTypesConvert::GetGeFormat(output_format, output_shape.size())); | |||||
| MS_LOG(INFO) << "Tiling Format:" << tensor.format; | |||||
| tensor_arg.tensor.emplace_back(tensor); | |||||
| tensor_arg_list->emplace_back(tensor_arg); | |||||
| } | |||||
| } | |||||
| void FeedTeOpTensorOutputArg(const NotNull<CNodePtr> &cnode, | |||||
| NotNull<std::vector<optiling::TeOpTensorArg> *> tensor_arg_list) { | |||||
| MS_LOG(INFO) << "FeedTeOpTensorOutputArg start, node:" << cnode->fullname_with_scope(); | |||||
| auto output_size = AnfAlgo::GetOutputTensorNum(cnode.get()); | |||||
| for (size_t i = 0; i < output_size; ++i) { | |||||
| auto output_shape = AnfAlgo::GetOutputDeviceShape(cnode.get(), i); | |||||
| auto output_ori_shape = AnfAlgo::GetOutputInferShape(cnode.get(), i); | |||||
| auto output_format = AnfAlgo::GetOutputFormat(cnode.get(), i); | |||||
| auto data_type = AnfAlgo::GetOutputDeviceDataType(cnode.get(), i); | |||||
| auto iter = type_name_map.find(data_type); | |||||
| if (iter == type_name_map.end()) { | |||||
| MS_LOG(EXCEPTION) << "Cannot found typeId:" << data_type; | |||||
| } | |||||
| optiling::TeOpTensorArg tensor_arg; | |||||
| optiling::TeOpTensor tensor; | |||||
| tensor_arg.arg_type = optiling::TA_SINGLE; | |||||
| tensor.dtype = iter->second; | |||||
| tensor.shape.insert(tensor.shape.end(), output_shape.begin(), output_shape.end()); | |||||
| tensor.ori_shape.insert(tensor.ori_shape.end(), output_ori_shape.begin(), output_ori_shape.end()); | |||||
| tensor.format = GeTypesConvert::GetGeTilingFormat(GeTypesConvert::GetGeFormat(output_format, output_shape.size())); | |||||
| MS_LOG(INFO) << "Tiling Format:" << tensor.format; | |||||
| tensor_arg.tensor.emplace_back(tensor); | |||||
| tensor_arg_list->emplace_back(tensor_arg); | |||||
| } | |||||
| } | |||||
| void FeedTeOpConstTensor(const NotNull<CNodePtr> &cnode, const std::map<uint32_t, tensor::TensorPtr> &depend_tensor_map, | |||||
| NotNull<std::map<std::string, optiling::TeConstTensorData> *> const_inputs) { | |||||
| MS_LOG(INFO) << "FeedTeOpConstTensor start, node:" << cnode->fullname_with_scope(); | |||||
| auto depends_list_me = abstract::GetDependsFormMap(cnode); | |||||
| if (depends_list_me.empty()) { | |||||
| MS_LOG(INFO) << "No input depend found, " << cnode->fullname_with_scope(); | |||||
| return; | |||||
| } | |||||
| std::vector<int> depends_list; | |||||
| (void)std::transform(depends_list_me.begin(), depends_list_me.end(), std::back_inserter(depends_list), | |||||
| [](const int64_t &value) { return static_cast<int>(value); }); | |||||
| for (auto index : depends_list) { | |||||
| auto iter = depend_tensor_map.find(IntToSize(index)); | |||||
| if (iter == depend_tensor_map.end()) { | |||||
| MS_LOG(EXCEPTION) << "Index not found in depend_tensor_map"; | |||||
| } | |||||
| auto const_tensor = iter->second; | |||||
| auto have_input_names_attr = AnfAlgo::HasNodeAttr("input_names", cnode); | |||||
| if (!have_input_names_attr) { | |||||
| MS_LOG(EXCEPTION) << "cnode:" << cnode->fullname_with_scope() << " no input_names attr"; | |||||
| } | |||||
| auto input_names_attr = AnfAlgo::GetNodeAttr<std::vector<std::string>>(cnode.get(), "input_names"); | |||||
| if (IntToSize(index) >= input_names_attr.size()) { | |||||
| MS_LOG(EXCEPTION) << "input index" << index << " >= input_name_attr.size:" << input_names_attr.size(); | |||||
| } | |||||
| auto input_name = input_names_attr[index]; | |||||
| MS_LOG(INFO) << "input_name is " << input_name; | |||||
| auto type_id = AnfAlgo::GetPrevNodeOutputDeviceDataType(cnode.get(), IntToSize(index)); | |||||
| auto shape = AnfAlgo::GetPrevNodeOutputInferShape(cnode.get(), IntToSize(index)); | |||||
| auto format = AnfAlgo::GetPrevNodeOutputFormat(cnode.get(), IntToSize(index)); | |||||
| const_inputs->try_emplace( | |||||
| input_name, | |||||
| optiling::TeConstTensorData{static_cast<const uint8_t *>(const_tensor->data_c()), | |||||
| IntToSize(const_tensor->DataSize()), MakeTempGeTensor(type_id, shape, format)}); | |||||
| } | |||||
| MS_LOG(INFO) << "FeedTeOpConstTensor end"; | |||||
| } | |||||
| void OpTilingCalculater::Init() { | |||||
| MS_LOG(INFO) << "Start init OpTilingCalculater"; | |||||
| tiling_func_map_ = optiling::OpTilingRegistryInterf::RegisteredOpInterf(); | |||||
| if (tiling_func_map_.empty()) { | |||||
| MS_LOG(EXCEPTION) << "Get register tiling func failed."; | |||||
| } | |||||
| } | |||||
| std::string GetRealOpType(const std::string &op_type) { | |||||
| static const std::map<std::string, std::string> kOpTypeMap = { | |||||
| {"SparseApplyFtrl", "SparseApplyFtrlD"}, | |||||
| {"SparseApplyProximalAdagrad", "SparseApplyProximalAdagradD"}, | |||||
| {"SparseGatherV2", "Gather"}, | |||||
| {"Pad", "PadD"}, | |||||
| {"Concat", "ConcatD"}, | |||||
| {"Softmax", "SoftmaxV2"}, | |||||
| {"DropoutDoMask", "DropOutDoMask"}, | |||||
| }; | |||||
| auto iter = kOpTypeMap.find(op_type); | |||||
| if (iter == kOpTypeMap.end()) { | |||||
| return op_type; | |||||
| } | |||||
| return iter->second; | |||||
| } | |||||
| void OpTilingCalculater::CalculateTiling(const NotNull<CNodePtr> &cnode, const optiling::OpCompileInfo &op_compile_info, | |||||
| const std::map<uint32_t, tensor::TensorPtr> &depend_tensor_map, | |||||
| const NotNull<optiling::OpRunInfo *> op_run_info) { | |||||
| optiling::TeOpParas op_param; | |||||
| std::string op_type = AnfAlgo::GetCNodeName(cnode.get()); | |||||
| MS_LOG(INFO) << "[DynamicShape] calculate tiling, op_type:" << op_type; | |||||
| FeedTeOpTensorInputArg(cnode, NOT_NULL(&op_param.inputs)); | |||||
| FeedTeOpTensorOutputArg(cnode, NOT_NULL(&op_param.outputs)); | |||||
| FeedTeOpConstTensor(cnode, depend_tensor_map, NOT_NULL(&op_param.const_inputs)); | |||||
| op_type = GetRealOpType(op_type); | |||||
| auto iter = tiling_func_map_.find(op_type); | |||||
| if (iter == tiling_func_map_.end()) { | |||||
| iter = tiling_func_map_.find("AutoTiling"); | |||||
| if (iter == tiling_func_map_.end()) { | |||||
| MS_LOG(EXCEPTION) << "AutoTiling Func Not Found"; | |||||
| } | |||||
| } | |||||
| MS_LOG(INFO) << "Get tiling func:" << iter->first; | |||||
| // iter is guaranteed valid here: the lookup above either found the op-specific | |||||
| // tiling func or fell back to AutoTiling, and throws if neither is registered. | |||||
| bool ret = (iter->second)(op_param, op_compile_info, *op_run_info); | |||||
| if (!ret) { | |||||
| MS_LOG(EXCEPTION) << "Calculate tiling failed"; | |||||
| } | |||||
| MS_LOG(INFO) << "CalculateTiling success"; | |||||
| } | |||||
| } // namespace ascend | |||||
| } // namespace device | |||||
| } // namespace mindspore | |||||
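For orientation, a minimal Python restatement of the op-type resolution used by the removed `CalculateTiling` above: names are first normalized through the static alias table from `GetRealOpType`, then looked up among the registered tiling funcs, falling back to the generic "AutoTiling" entry. Function and variable names are stand-ins.

```python
# Alias table copied from GetRealOpType above; unmapped names pass through unchanged.
OP_TYPE_MAP = {
    "SparseApplyFtrl": "SparseApplyFtrlD",
    "SparseApplyProximalAdagrad": "SparseApplyProximalAdagradD",
    "SparseGatherV2": "Gather",
    "Pad": "PadD",
    "Concat": "ConcatD",
    "Softmax": "SoftmaxV2",
    "DropoutDoMask": "DropOutDoMask",
}

def resolve_tiling_func(tiling_func_map, op_type):
    op_type = OP_TYPE_MAP.get(op_type, op_type)
    func = tiling_func_map.get(op_type) or tiling_func_map.get("AutoTiling")
    if func is None:
        raise RuntimeError("AutoTiling Func Not Found")
    return func

print(resolve_tiling_func({"SoftmaxV2": "softmax_tiling"}, "Softmax"))  # softmax_tiling
```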
| @@ -1,55 +0,0 @@ | |||||
| /** | |||||
| * Copyright 2019 Huawei Technologies Co., Ltd | |||||
| * | |||||
| * Licensed under the Apache License, Version 2.0 (the "License"); | |||||
| * you may not use this file except in compliance with the License. | |||||
| * You may obtain a copy of the License at | |||||
| * | |||||
| * http://www.apache.org/licenses/LICENSE-2.0 | |||||
| * | |||||
| * Unless required by applicable law or agreed to in writing, software | |||||
| * distributed under the License is distributed on an "AS IS" BASIS, | |||||
| * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | |||||
| * See the License for the specific language governing permissions and | |||||
| * limitations under the License. | |||||
| */ | |||||
| #ifndef MINDSPORE_MINDSPORE_CCSRC_RUNTIME_DEVICE_ASCEND_TILING_OP_TILING_CALCULATE_H_ | |||||
| #define MINDSPORE_MINDSPORE_CCSRC_RUNTIME_DEVICE_ASCEND_TILING_OP_TILING_CALCULATE_H_ | |||||
| #include <map> | |||||
| #include <memory> | |||||
| #include <string> | |||||
| #include "utils/ms_utils.h" | |||||
| #include "utils/contract.h" | |||||
| #include "ir/anf.h" | |||||
| #include "ir/tensor.h" | |||||
| #include "register/op_tiling.h" | |||||
| #include "abstract/primitive_infer_map.h" | |||||
| namespace mindspore { | |||||
| namespace device { | |||||
| namespace ascend { | |||||
| class OpTilingCalculater { | |||||
| public: | |||||
| static OpTilingCalculater &GetInstance() { | |||||
| static OpTilingCalculater instance; | |||||
| return instance; | |||||
| } | |||||
| void Init(); | |||||
| void CalculateTiling(const NotNull<CNodePtr> &cnode, const optiling::OpCompileInfo &op_compile_info, | |||||
| const std::map<uint32_t, tensor::TensorPtr> &depend_tensor_map, | |||||
| NotNull<optiling::OpRunInfo *> op_run_info); | |||||
| private: | |||||
| OpTilingCalculater() = default; | |||||
| ~OpTilingCalculater() = default; | |||||
| DISABLE_COPY_AND_ASSIGN(OpTilingCalculater); | |||||
| std::map<std::string, optiling::OpTilingFunc> tiling_func_map_; | |||||
| }; | |||||
| } // namespace ascend | |||||
| } // namespace device | |||||
| } // namespace mindspore | |||||
| #endif // MINDSPORE_MINDSPORE_CCSRC_RUNTIME_DEVICE_ASCEND_TILING_OP_TILING_CALCULATE_H_ | |||||
| @@ -137,43 +137,53 @@ bool HcclAdapter::InitHccl() { | |||||
| return true; | return true; | ||||
| } | } | ||||
| bool HcclAdapter::InitHccl(uint32_t device_id, std::string_view rank_id, std::string_view rank_file) { | |||||
| MS_LOG(INFO) << "Start init hccl adapter."; | |||||
| bool HcclAdapter::InitHccl(uint32_t device_id, std::string_view rank_id, std::string_view rank_file, | |||||
| bool is_graph_mode) { | |||||
| MS_LOG(INFO) << "Start init hccl adapter for " << (is_graph_mode ? "graph mode." : "pynative mode."); | |||||
| std::lock_guard<std::mutex> lock(init_mutex_); | std::lock_guard<std::mutex> lock(init_mutex_); | ||||
| if (init_flag_) { | if (init_flag_) { | ||||
| MS_LOG(INFO) << "Hccl has been inited, skip."; | MS_LOG(INFO) << "Hccl has been inited, skip."; | ||||
| return true; | return true; | ||||
| } | } | ||||
| is_graph_mode_ = is_graph_mode; | |||||
| InitPlugin(); | InitPlugin(); | ||||
| bool ret = InitKernelInfoStore(device_id, rank_id, rank_file); | |||||
| if (!ret) { | |||||
| return false; | |||||
| } | |||||
| ret = InitHcclComm(rank_id, rank_file); | |||||
| if (!ret) { | |||||
| return false; | |||||
| } | |||||
| ret = InitHcclExec(); | |||||
| if (!ret) { | |||||
| return false; | |||||
| if (is_graph_mode_) { | |||||
| bool ret = InitKernelInfoStore(device_id, rank_id, rank_file); | |||||
| if (!ret) { | |||||
| return false; | |||||
| } | |||||
| ret = InitHcclExec(); | |||||
| if (!ret) { | |||||
| return false; | |||||
| } | |||||
| } else { | |||||
| bool ret = InitHcclComm(rank_id, rank_file); | |||||
| if (!ret) { | |||||
| return false; | |||||
| } | |||||
| } | } | ||||
| init_flag_ = true; | init_flag_ = true; | ||||
| MS_LOG(INFO) << "Init hccl adapter success."; | MS_LOG(INFO) << "Init hccl adapter success."; | ||||
| return true; | return true; | ||||
| } | } | ||||
| bool HcclAdapter::FinalizeHccl() { | bool HcclAdapter::FinalizeHccl() { | ||||
| MS_LOG(INFO) << "Start destroy hccl adapter."; | |||||
| std::lock_guard<std::mutex> lock(init_mutex_); | std::lock_guard<std::mutex> lock(init_mutex_); | ||||
| MS_LOG(INFO) << "Start destroy hccl adapter for " << (is_graph_mode_ ? "graph mode." : "pynative mode."); | |||||
| if (!init_flag_) { | if (!init_flag_) { | ||||
| MS_LOG(INFO) << "Hccl has never been inited, skip."; | MS_LOG(INFO) << "Hccl has never been inited, skip."; | ||||
| return true; | return true; | ||||
| } | } | ||||
| (void)FinalizeHcclExec(); | |||||
| (void)FinalizeHcclComm(); | |||||
| (void)FinalizeKernelInfoStore(); | |||||
| if (is_graph_mode_) { | |||||
| (void)FinalizeHcclExec(); | |||||
| (void)FinalizeKernelInfoStore(); | |||||
| } else { | |||||
| (void)FinalizeHcclComm(); | |||||
| } | |||||
| FinalizePlugin(); | FinalizePlugin(); | ||||
| init_flag_ = false; | init_flag_ = false; | ||||
| MS_LOG(INFO) << "Destroy hccl adapter success."; | MS_LOG(INFO) << "Destroy hccl adapter success."; | ||||
| @@ -42,7 +42,7 @@ class HcclAdapter { | |||||
| static HcclAdapter &GetInstance(); | static HcclAdapter &GetInstance(); | ||||
| // common | // common | ||||
| bool InitHccl(uint32_t device_id, std::string_view rank_id, std::string_view rank_file); | |||||
| bool InitHccl(uint32_t device_id, std::string_view rank_id, std::string_view rank_file, bool is_graph_mode); | |||||
| bool InitHccl(); | bool InitHccl(); | ||||
| bool FinalizeHccl(); | bool FinalizeHccl(); | ||||
| @@ -121,6 +121,7 @@ class HcclAdapter { | |||||
| std::shared_ptr<::ge::OpsKernelBuilder> ops_kernel_builder_ = nullptr; | std::shared_ptr<::ge::OpsKernelBuilder> ops_kernel_builder_ = nullptr; | ||||
| bool init_flag_ = false; | bool init_flag_ = false; | ||||
| bool is_graph_mode_ = false; | |||||
| std::mutex init_mutex_; | std::mutex init_mutex_; | ||||
| }; | }; | ||||
| } // namespace mindspore::hccl | } // namespace mindspore::hccl | ||||
| @@ -78,7 +78,8 @@ class DfGraphConvertor { | |||||
| void DrawComputeGraph(const std::string &name) { | void DrawComputeGraph(const std::string &name) { | ||||
| std::ofstream fout(name); | std::ofstream fout(name); | ||||
| if (!fout.is_open()) { | if (!fout.is_open()) { | ||||
| MS_LOG(ERROR) << "Open file '" << name << "' failed!"; | |||||
| MS_LOG(ERROR) << "Open file '" << name << "' failed!" | |||||
| << " Errno:" << errno << " ErrInfo:" << strerror(errno); | |||||
| return; | return; | ||||
| } | } | ||||
| fout << compute_sout_.str(); | fout << compute_sout_.str(); | ||||
| @@ -87,7 +88,8 @@ class DfGraphConvertor { | |||||
| void DrawInitGraph(const std::string &name) { | void DrawInitGraph(const std::string &name) { | ||||
| std::ofstream fout(name); | std::ofstream fout(name); | ||||
| if (!fout.is_open()) { | if (!fout.is_open()) { | ||||
| MS_LOG(ERROR) << "Open file '" << name << "' failed!"; | |||||
| MS_LOG(ERROR) << "Open file '" << name << "' failed!" | |||||
| << " Errno:" << errno << " ErrInfo:" << strerror(errno); | |||||
| return; | return; | ||||
| } | } | ||||
| fout << init_sout_.str(); | fout << init_sout_.str(); | ||||
| @@ -96,7 +98,8 @@ class DfGraphConvertor { | |||||
| void DrawSaveCheckpointGraph(const std::string &name) { | void DrawSaveCheckpointGraph(const std::string &name) { | ||||
| std::ofstream fout(name); | std::ofstream fout(name); | ||||
| if (!fout.is_open()) { | if (!fout.is_open()) { | ||||
| MS_LOG(ERROR) << "Open file '" << name << "' failed!"; | |||||
| MS_LOG(ERROR) << "Open file '" << name << "' failed!" | |||||
| << " Errno:" << errno << " ErrInfo:" << strerror(errno); | |||||
| return; | return; | ||||
| } | } | ||||
| fout << checkpoint_sout_.str(); | fout << checkpoint_sout_.str(); | ||||
| @@ -335,6 +335,7 @@ constexpr auto kAttrDataShape = "data_shape"; | |||||
| constexpr auto kAttrFormat = "format"; | constexpr auto kAttrFormat = "format"; | ||||
| constexpr auto kAttrReshapeType = "reshape_type"; | constexpr auto kAttrReshapeType = "reshape_type"; | ||||
| constexpr auto kAttrAxis = "axis"; | constexpr auto kAttrAxis = "axis"; | ||||
| constexpr auto kAttrAxes = "axes"; | |||||
| constexpr auto kAttrKeepDims = "keep_dims"; | constexpr auto kAttrKeepDims = "keep_dims"; | ||||
| constexpr auto kAttrShapeGamma = "shape_gamma"; | constexpr auto kAttrShapeGamma = "shape_gamma"; | ||||
| constexpr auto kAttrPerm = "perm"; | constexpr auto kAttrPerm = "perm"; | ||||
| @@ -289,14 +289,6 @@ VectorRef MsBackend::MsRunGraph(const GraphId &g, const VectorRef &args, const s | |||||
| return outputs; | return outputs; | ||||
| } | } | ||||
| void MsBackend::Link(GraphId graph_id) { | |||||
| MS_EXCEPTION_IF_NULL(target_sess_); | |||||
| if (graph_id == kInvalidGraphId) { | |||||
| graph_id = target_sess_->GetFinalRunGraph(); | |||||
| } | |||||
| target_sess_->BuildGraph(graph_id); | |||||
| } | |||||
| MsBackend::MsBackend(const std::string &name, const std::string &target, uint32_t device_id) : Backend(name) { | MsBackend::MsBackend(const std::string &name, const std::string &target, uint32_t device_id) : Backend(name) { | ||||
| convert_fn_ = std::bind(&MsBackend::MsConvert, this, std::placeholders::_1, std::placeholders::_2); | convert_fn_ = std::bind(&MsBackend::MsConvert, this, std::placeholders::_1, std::placeholders::_2); | ||||
| target_sess_ = session::SessionFactory::Get().Create(target); | target_sess_ = session::SessionFactory::Get().Create(target); | ||||
| @@ -61,7 +61,6 @@ class Backend { | |||||
| virtual bool GetCond(const BaseRef &c, bool *value); | virtual bool GetCond(const BaseRef &c, bool *value); | ||||
| virtual bool GetIndex(const BaseRef &c, int64_t *value); | virtual bool GetIndex(const BaseRef &c, int64_t *value); | ||||
| virtual GraphId CompileGraph(NotNull<FuncGraphPtr> fg) { return kInvalidGraphId; } | virtual GraphId CompileGraph(NotNull<FuncGraphPtr> fg) { return kInvalidGraphId; } | ||||
| virtual void Link(GraphId) {} | |||||
| virtual void SetDebugger() {} | virtual void SetDebugger() {} | ||||
| bool is_multi_graph_sink() const { return is_multi_graph_sink_; } | bool is_multi_graph_sink() const { return is_multi_graph_sink_; } | ||||
| @@ -82,7 +81,6 @@ class MsBackend : public Backend { | |||||
| VectorRef MsRunGraph(const GraphId &g, const VectorRef &args, const std::string &target = ""); | VectorRef MsRunGraph(const GraphId &g, const VectorRef &args, const std::string &target = ""); | ||||
| VectorRef MsSimuRunGraph(const GraphId &g); | VectorRef MsSimuRunGraph(const GraphId &g); | ||||
| void Link(GraphId) override; | |||||
| GraphId CompileGraph(NotNull<FuncGraphPtr> fg) override; | GraphId CompileGraph(NotNull<FuncGraphPtr> fg) override; | ||||
| VectorRef RunGraph(GraphId graph_id, const VectorRef &args); | VectorRef RunGraph(GraphId graph_id, const VectorRef &args); | ||||
| void ClearSessionGraphs(); | void ClearSessionGraphs(); | ||||
| @@ -388,6 +388,13 @@ int64_t CompileGraph::AddCall(const FuncGraphPtr &graph, const CNodePtr &node) { | |||||
| MS_LOG(DEBUG) << "Call:" << Ref(fn) << ", " << height_ << ", " << (size - 1); | MS_LOG(DEBUG) << "Call:" << Ref(fn) << ", " << height_ << ", " << (size - 1); | ||||
| AddInst(Instruction::kCall, Ref(fn)); | AddInst(Instruction::kCall, Ref(fn)); | ||||
| Ret(static_cast<int64_t>(size - 1)); | Ret(static_cast<int64_t>(size - 1)); | ||||
| for (size_t i = size - 1; i > 0; i--) { | |||||
| const auto iter = slots_.find(inputs[i]); | |||||
| if (iter != slots_.end() && iter->second >= height_) { | |||||
| slots_.erase(inputs[i]); | |||||
| } | |||||
| } | |||||
| return RET_SUCCESS; | return RET_SUCCESS; | ||||
| } | } | ||||
| @@ -573,9 +580,6 @@ BackendPtr CreateBackend() { | |||||
| if (MsContext::GetInstance()->get_param<int>(MS_CTX_EXECUTION_MODE) == kPynativeMode) { | if (MsContext::GetInstance()->get_param<int>(MS_CTX_EXECUTION_MODE) == kPynativeMode) { | ||||
| backend->set_is_multi_graph_sink(false); | backend->set_is_multi_graph_sink(false); | ||||
| context_ptr->set_param<bool>(MS_CTX_IS_MULTI_GRAPH_SINK, false); | context_ptr->set_param<bool>(MS_CTX_IS_MULTI_GRAPH_SINK, false); | ||||
| } else { | |||||
| backend->set_is_multi_graph_sink(true); | |||||
| context_ptr->set_param<bool>(MS_CTX_IS_MULTI_GRAPH_SINK, true); | |||||
| } | } | ||||
| } | } | ||||
| return backend; | return backend; | ||||
| @@ -759,10 +759,8 @@ FuncGraphPtr TransformableClone(const FuncGraphPtr &func_graph, const TraceInfoP | |||||
| new_func_graph->set_param_default_value(item.first, cloner[item.second]); | new_func_graph->set_param_default_value(item.first, cloner[item.second]); | ||||
| } | } | ||||
| if (MsContext::GetInstance()->get_param<bool>(MS_CTX_IS_MULTI_GRAPH_SINK)) { | |||||
| if (func_graph->has_flag(FUNC_GRAPH_FLAG_IGNORE_VALUES)) { | |||||
| new_func_graph->set_flag(FUNC_GRAPH_FLAG_IGNORE_VALUES, true); | |||||
| } | |||||
| if (func_graph->has_flag(FUNC_GRAPH_FLAG_IGNORE_VALUES)) { | |||||
| new_func_graph->set_flag(FUNC_GRAPH_FLAG_IGNORE_VALUES, true); | |||||
| } | } | ||||
| if (func_graph->has_attr(FUNC_GRAPH_ATTR_GRAPH_KERNEL)) { | if (func_graph->has_attr(FUNC_GRAPH_ATTR_GRAPH_KERNEL)) { | ||||
| @@ -54,7 +54,7 @@ from .validators import check_prob, check_crop, check_center_crop, check_resize_ | |||||
| check_uniform_augment_cpp, \ | check_uniform_augment_cpp, \ | ||||
| check_bounding_box_augment_cpp, check_random_select_subpolicy_op, check_auto_contrast, check_random_affine, \ | check_bounding_box_augment_cpp, check_random_select_subpolicy_op, check_auto_contrast, check_random_affine, \ | ||||
| check_random_solarize, check_soft_dvpp_decode_random_crop_resize_jpeg, check_positive_degrees, FLOAT_MAX_INTEGER, \ | check_random_solarize, check_soft_dvpp_decode_random_crop_resize_jpeg, check_positive_degrees, FLOAT_MAX_INTEGER, \ | ||||
| check_cut_mix_batch_c, check_posterize, check_gaussian_blur, check_rotate, check_slice_patches | |||||
| check_cut_mix_batch_c, check_posterize, check_gaussian_blur, check_rotate, check_slice_patches, check_adjust_gamma | |||||
| from ..transforms.c_transforms import TensorOperation | from ..transforms.c_transforms import TensorOperation | ||||
| @@ -107,6 +107,37 @@ def parse_padding(padding): | |||||
| return padding | return padding | ||||
| class AdjustGamma(ImageTensorOperation): | |||||
| r""" | |||||
| Apply gamma correction on input image. Input image is expected to be in [..., H, W, C] or [H, W, C] format. | |||||
| .. math:: | |||||
| I_{\text{out}} = 255 \times \text{gain} \times \left(\frac{I_{\text{in}}}{255}\right)^{\gamma} | |||||
| See `Gamma Correction`_ for more details. | |||||
| .. _Gamma Correction: https://en.wikipedia.org/wiki/Gamma_correction | |||||
| Args: | |||||
| gamma (float): Non-negative real number. | |||||
| The output image pixel value is exponentially related to the input image pixel value. | |||||
| gamma larger than 1 makes the shadows darker, | |||||
| while gamma smaller than 1 makes dark regions lighter. | |||||
| gain (float, optional): The constant multiplier (default=1). | |||||
| Examples: | |||||
| >>> transforms_list = [c_vision.Decode(), c_vision.AdjustGamma(gamma=10.0, gain=1.0)] | |||||
| >>> image_folder_dataset = image_folder_dataset.map(operations=transforms_list, | |||||
| ... input_columns=["image"]) | |||||
| """ | |||||
| @check_adjust_gamma | |||||
| def __init__(self, gamma, gain=1): | |||||
| self.gamma = gamma | |||||
| self.gain = gain | |||||
| def parse(self): | |||||
| return cde.AdjustGammaOperation(self.gamma, self.gain) | |||||
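As a quick sanity check of the formula in the docstring above, here is a plain-NumPy restatement, independent of the operator itself (`adjust_gamma_np` is illustrative, not part of the API):

```python
import numpy as np

# Plain-NumPy check of the docstring formula: I_out = 255 * gain * (I_in / 255) ** gamma.
def adjust_gamma_np(img, gamma, gain=1.0):
    out = 255.0 * gain * (img / 255.0) ** gamma
    return np.clip(out, 0, 255).astype(np.uint8)

img = np.array([[0, 64, 128, 255]], dtype=np.uint8)
print(adjust_gamma_np(img, gamma=2.0))  # [[  0  16  64 255]] -> gamma > 1 darkens
print(adjust_gamma_np(img, gamma=0.5))  # [[  0 127 180 255]] -> gamma < 1 brightens
```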
| class AutoContrast(ImageTensorOperation): | class AutoContrast(ImageTensorOperation): | ||||
| """ | """ | ||||
| Apply automatic contrast on input image. This operator calculates histogram of image, reassign cutoff percent | Apply automatic contrast on input image. This operator calculates histogram of image, reassign cutoff percent | ||||
| @@ -31,7 +31,8 @@ from .validators import check_prob, check_center_crop, check_five_crop, check_re | |||||
| check_normalize_py, check_normalizepad_py, check_random_crop, check_random_color_adjust, check_random_rotation, \ | check_normalize_py, check_normalizepad_py, check_random_crop, check_random_color_adjust, check_random_rotation, \ | ||||
| check_ten_crop, check_num_channels, check_pad, check_rgb_to_hsv, check_hsv_to_rgb, \ | check_ten_crop, check_num_channels, check_pad, check_rgb_to_hsv, check_hsv_to_rgb, \ | ||||
| check_random_perspective, check_random_erasing, check_cutout, check_linear_transform, check_random_affine, \ | check_random_perspective, check_random_erasing, check_cutout, check_linear_transform, check_random_affine, \ | ||||
| check_mix_up, check_positive_degrees, check_uniform_augment_py, check_auto_contrast, check_rgb_to_bgr | |||||
| check_mix_up, check_positive_degrees, check_uniform_augment_py, check_auto_contrast, check_rgb_to_bgr, \ | |||||
| check_adjust_gamma | |||||
| from .utils import Inter, Border | from .utils import Inter, Border | ||||
| from .py_transforms_util import is_pil | from .py_transforms_util import is_pil | ||||
| @@ -1375,7 +1376,6 @@ class RgbToBgr: | |||||
| return util.rgb_to_bgrs(rgb_imgs, self.is_hwc) | return util.rgb_to_bgrs(rgb_imgs, self.is_hwc) | ||||
| class RgbToHsv: | class RgbToHsv: | ||||
| """ | """ | ||||
| Convert a NumPy RGB image or a batch of NumPy RGB images to HSV images. | Convert a NumPy RGB image or a batch of NumPy RGB images to HSV images. | ||||
| @@ -1525,6 +1525,44 @@ class RandomSharpness: | |||||
| return util.random_sharpness(img, self.degrees) | return util.random_sharpness(img, self.degrees) | ||||
| class AdjustGamma: | |||||
| """ | |||||
| Adjust gamma of the input PIL image. | |||||
| Args: | |||||
| gamma (float): Non-negative real number, same as gamma in the equation. | |||||
| gain (float, optional): The constant multiplier (default=1.0). | |||||
| Examples: | |||||
| >>> from mindspore.dataset.transforms.py_transforms import Compose | |||||
| >>> transforms_list = Compose([py_vision.Decode(), | |||||
| ... py_vision.AdjustGamma(gamma=10.0), | |||||
| ... py_vision.ToTensor()]) | |||||
| >>> # apply the transform to dataset through map function | |||||
| >>> image_folder_dataset = image_folder_dataset.map(operations=transforms_list, | |||||
| ... input_columns="image") | |||||
| """ | |||||
| @check_adjust_gamma | |||||
| def __init__(self, gamma, gain=1.0): | |||||
| self.gamma = gamma | |||||
| self.gain = gain | |||||
| self.random = False | |||||
| def __call__(self, img): | |||||
| """ | |||||
| Call method. | |||||
| Args: | |||||
| img (PIL image): Image to be augmented with AdjustGamma. | |||||
| Returns: | |||||
| img (PIL image), Augmented image. | |||||
| """ | |||||
| return util.adjust_gamma(img, self.gamma, self.gain) | |||||
| class AutoContrast: | class AutoContrast: | ||||
| """ | """ | ||||
| Automatically maximize the contrast of the input PIL image. | Automatically maximize the contrast of the input PIL image. | ||||
| @@ -19,7 +19,6 @@ import math | |||||
| import numbers | import numbers | ||||
| import random | import random | ||||
| import colorsys | import colorsys | ||||
| import numpy as np | import numpy as np | ||||
| from PIL import Image, ImageOps, ImageEnhance, __version__ | from PIL import Image, ImageOps, ImageEnhance, __version__ | ||||
| @@ -1243,6 +1242,7 @@ def rgb_to_bgr(np_rgb_img, is_hwc): | |||||
| np_bgr_img = np_rgb_img[::-1, :, :] | np_bgr_img = np_rgb_img[::-1, :, :] | ||||
| return np_bgr_img | return np_bgr_img | ||||
| def rgb_to_bgrs(np_rgb_imgs, is_hwc): | def rgb_to_bgrs(np_rgb_imgs, is_hwc): | ||||
| """ | """ | ||||
| Convert RGB imgs to BGR imgs. | Convert RGB imgs to BGR imgs. | ||||
| @@ -1473,6 +1473,32 @@ def random_sharpness(img, degrees): | |||||
| return ImageEnhance.Sharpness(img).enhance(v) | return ImageEnhance.Sharpness(img).enhance(v) | ||||
| def adjust_gamma(img, gamma, gain): | |||||
| """ | |||||
| Adjust gamma of the input PIL image. | |||||
| Args: | |||||
| img (PIL image): Image to be augmented with AdjustGamma. | |||||
| gamma (float): Non-negative real number, same as gamma in the equation. | |||||
| gain (float): The constant multiplier. | |||||
| Returns: | |||||
| img (PIL image), Augmented image. | |||||
| """ | |||||
| if not is_pil(img): | |||||
| raise TypeError("img should be PIL image. Got {}.".format(type(img))) | |||||
| gamma_table = [(255 + 1 - 1e-3) * gain * pow(x / 255., gamma) for x in range(256)] | |||||
| if len(img.split()) == 3: | |||||
| gamma_table = gamma_table * 3 | |||||
| img = img.point(gamma_table) | |||||
| elif len(img.split()) == 1: | |||||
| img = img.point(gamma_table) | |||||
| return img | |||||
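The PIL path above is LUT-based: it precomputes a 256-entry gamma table and applies it with `Image.point`, tiling the table once per channel for RGB images. A small standalone check, assuming Pillow and NumPy are available:

```python
import numpy as np
from PIL import Image

# Standalone check of the LUT approach used above: one 256-entry gamma table,
# tiled x3 for RGB, applied with Image.point.
gamma, gain = 2.0, 1.0
table = [(255 + 1 - 1e-3) * gain * pow(x / 255., gamma) for x in range(256)]
img = Image.fromarray(np.full((2, 2, 3), 128, dtype=np.uint8))
out = img.point(table * 3)
print(np.asarray(out)[0, 0])  # mid-gray 128 maps to roughly 64 under gamma = 2
```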
| def auto_contrast(img, cutoff, ignore): | def auto_contrast(img, cutoff, ignore): | ||||
| """ | """ | ||||
| Automatically maximize the contrast of the input PIL image. | Automatically maximize the contrast of the input PIL image. | ||||
| @@ -19,10 +19,10 @@ from functools import wraps | |||||
| import numpy as np | import numpy as np | ||||
| from mindspore._c_dataengine import TensorOp, TensorOperation | from mindspore._c_dataengine import TensorOp, TensorOperation | ||||
| from mindspore.dataset.core.validator_helpers import check_value, check_uint8, FLOAT_MAX_INTEGER, check_pos_float32, \ | |||||
| check_float32, check_2tuple, check_range, check_positive, INT32_MAX, INT32_MIN, parse_user_args, type_check, \ | |||||
| type_check_list, check_c_tensor_op, UINT8_MAX, check_value_normalize_std, check_value_cutoff, check_value_ratio, \ | |||||
| check_odd | |||||
| from mindspore.dataset.core.validator_helpers import check_value, check_uint8, FLOAT_MIN_INTEGER, FLOAT_MAX_INTEGER, \ | |||||
| check_pos_float32, check_float32, check_2tuple, check_range, check_positive, INT32_MAX, INT32_MIN, \ | |||||
| parse_user_args, type_check, type_check_list, check_c_tensor_op, UINT8_MAX, check_value_normalize_std, \ | |||||
| check_value_cutoff, check_value_ratio, check_odd | |||||
| from .utils import Inter, Border, ImageBatchFormat, SliceMode | from .utils import Inter, Border, ImageBatchFormat, SliceMode | ||||
| @@ -788,6 +788,22 @@ def check_bounding_box_augment_cpp(method): | |||||
| return new_method | return new_method | ||||
| def check_adjust_gamma(method): | |||||
| """Wrapper method to check the parameters of AdjustGamma ops (Python and C++).""" | |||||
| @wraps(method) | |||||
| def new_method(self, *args, **kwargs): | |||||
| [gamma, gain], _ = parse_user_args(method, *args, **kwargs) | |||||
| type_check(gamma, (float, int), "gamma") | |||||
| check_value(gamma, (0, FLOAT_MAX_INTEGER)) | |||||
| if gain is not None: | |||||
| type_check(gain, (float, int), "gain") | |||||
| check_value(gain, (FLOAT_MIN_INTEGER, FLOAT_MAX_INTEGER)) | |||||
| return method(self, *args, **kwargs) | |||||
| return new_method | |||||
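In effect the wrapper above admits any non-negative int/float `gamma` and any finite int/float `gain`. Hypothetical calls showing what would pass or raise, assuming the MindSpore 1.x import layout and that `check_value` raises `ValueError` on an out-of-range value:

```python
import mindspore.dataset.vision.c_transforms as c_vision

c_vision.AdjustGamma(gamma=2.2, gain=1.0)  # accepted: non-negative gamma, finite gain
for bad in (-1.0, "2"):
    try:
        c_vision.AdjustGamma(gamma=bad)
    except (ValueError, TypeError) as e:   # check_value / type_check reject these
        print(type(e).__name__, e)
```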
| def check_auto_contrast(method): | def check_auto_contrast(method): | ||||
| """Wrapper method to check the parameters of AutoContrast ops (Python and C++).""" | """Wrapper method to check the parameters of AutoContrast ops (Python and C++).""" | ||||
| @@ -226,13 +226,13 @@ build_lite() { | |||||
| compile_nnie_script=${BASEPATH}/mindspore/lite/tools/providers/NNIE/Hi3516D/compile_nnie.sh | compile_nnie_script=${BASEPATH}/mindspore/lite/tools/providers/NNIE/Hi3516D/compile_nnie.sh | ||||
| cd ${BASEPATH}/../ | cd ${BASEPATH}/../ | ||||
| if [[ "${local_lite_platform}" == "x86_64" ]]; then | if [[ "${local_lite_platform}" == "x86_64" ]]; then | ||||
| sh ${compile_nnie_script} -I x86_64 -b nnie_master -j $THREAD_NUM | |||||
| sh ${compile_nnie_script} -I x86_64 -b nnie_r1.4 -j $THREAD_NUM | |||||
| if [[ $? -ne 0 ]]; then | if [[ $? -ne 0 ]]; then | ||||
| echo "compile x86_64 for nnie failed." | echo "compile x86_64 for nnie failed." | ||||
| exit 1 | exit 1 | ||||
| fi | fi | ||||
| elif [[ "${local_lite_platform}" == "arm32" ]]; then | elif [[ "${local_lite_platform}" == "arm32" ]]; then | ||||
| sh ${compile_nnie_script} -I arm32 -b nnie_master -j $THREAD_NUM | |||||
| sh ${compile_nnie_script} -I arm32 -b nnie_r1.4 -j $THREAD_NUM | |||||
| if [[ $? -ne 0 ]]; then | if [[ $? -ne 0 ]]; then | ||||
| echo "compile arm32 for nnie failed." | echo "compile arm32 for nnie failed." | ||||
| exit 1 | exit 1 | ||||
| @@ -133,6 +133,7 @@ set(TRAIN_SRC | |||||
| ${CMAKE_CURRENT_SOURCE_DIR}/train/accuracy_monitor.cc | ${CMAKE_CURRENT_SOURCE_DIR}/train/accuracy_monitor.cc | ||||
| ${CMAKE_CURRENT_SOURCE_DIR}/train/classification_train_accuracy_monitor.cc | ${CMAKE_CURRENT_SOURCE_DIR}/train/classification_train_accuracy_monitor.cc | ||||
| ${CMAKE_CURRENT_SOURCE_DIR}/train/train_export.cc | ${CMAKE_CURRENT_SOURCE_DIR}/train/train_export.cc | ||||
| ${CMAKE_CURRENT_SOURCE_DIR}/train/opt_allocator.cc | |||||
| ${CMAKE_CURRENT_SOURCE_DIR}/../tools/common/storage.cc | ${CMAKE_CURRENT_SOURCE_DIR}/../tools/common/storage.cc | ||||
| ) | ) | ||||
| if(ENABLE_V0) | if(ENABLE_V0) | ||||
| @@ -209,7 +209,10 @@ int LiteSession::ConvertTensors(const lite::Model *model) { | |||||
| MS_LOG(ERROR) << "Graph output shouldn't have data"; | MS_LOG(ERROR) << "Graph output shouldn't have data"; | ||||
| return RET_ERROR; | return RET_ERROR; | ||||
| } | } | ||||
| dst_tensor->set_category(Tensor::GRAPH_OUTPUT); | |||||
| // a tensor that is both a graph input and a graph output is treated as an input. | |||||
| if (!dst_tensor->IsGraphInput()) { | |||||
| dst_tensor->set_category(Tensor::GRAPH_OUTPUT); | |||||
| } | |||||
| } | } | ||||
| if (src_tensor->name() != nullptr) { | if (src_tensor->name() != nullptr) { | ||||
| dst_tensor->set_tensor_name(src_tensor->name()->str()); | dst_tensor->set_tensor_name(src_tensor->name()->str()); | ||||
| @@ -364,16 +367,13 @@ void LiteSession::InitGraphInOutTensorsMap(const lite::Model *model) { | |||||
| InitGraphInputMap(model); | InitGraphInputMap(model); | ||||
| InitGraphOutputNodeMap(model); | InitGraphOutputNodeMap(model); | ||||
| InitGraphOutputTensorMap(model); | InitGraphOutputTensorMap(model); | ||||
| for (auto *tensor : this->inputs_) { | |||||
| tensor->set_category(Tensor::Category::GRAPH_INPUT); | |||||
| } | |||||
| for (auto *tensor : this->outputs_) { | |||||
| tensor->set_category(Tensor::Category::GRAPH_OUTPUT); | |||||
| } | |||||
| } | } | ||||
| void LiteSession::IsolateOutputTensor() { | void LiteSession::IsolateOutputTensor() { | ||||
| for (Tensor *src_tensor : outputs_) { | for (Tensor *src_tensor : outputs_) { | ||||
| if (src_tensor->IsGraphInput()) { | |||||
| continue; | |||||
| } | |||||
| Tensor *new_tensor = | Tensor *new_tensor = | ||||
| new Tensor(src_tensor->data_type(), src_tensor->shape(), src_tensor->format(), Tensor::GRAPH_OUTPUT); | new Tensor(src_tensor->data_type(), src_tensor->shape(), src_tensor->format(), Tensor::GRAPH_OUTPUT); | ||||
| new_tensor->set_allocator(src_tensor->allocator()); /* GPU use opencl allocator */ | new_tensor->set_allocator(src_tensor->allocator()); /* GPU use opencl allocator */ | ||||
| @@ -24,16 +24,24 @@ | |||||
| namespace mindspore::lite { | namespace mindspore::lite { | ||||
| void MindrtExecutor::PrepareInputData(const std::vector<kernel::LiteKernel *> &kernels, | void MindrtExecutor::PrepareInputData(const std::vector<kernel::LiteKernel *> &kernels, | ||||
| const std::vector<Tensor *> &inputs) { | const std::vector<Tensor *> &inputs) { | ||||
| for (size_t i = 0; i < inputs.size(); ++i) { | |||||
| for (size_t j = 0; j < kernels.size(); ++j) { | |||||
| auto in_tensor_size = kernels[j]->in_tensors().size(); | |||||
| for (size_t k = 0; k < in_tensor_size; ++k) { | |||||
| if (inputs[i] != kernels[j]->in_tensors()[k]) { | |||||
| continue; | |||||
| } | |||||
| auto data = std::make_shared<OpData<Tensor>>(op_actors_[j]->GetAID(), inputs[i], static_cast<int>(k)); | |||||
| input_data_.emplace_back(data); | |||||
| for (size_t j = 0; j < kernels.size(); ++j) { | |||||
| auto in_tensor_size = kernels[j]->in_tensors().size(); | |||||
| for (size_t k = 0; k < in_tensor_size; ++k) { | |||||
| auto tensor = kernels[j]->in_tensors()[k]; | |||||
| if (!tensor->IsGraphInput()) { | |||||
| continue; | |||||
| } | } | ||||
| size_t idx = std::find(inputs.begin(), inputs.end(), tensor) - inputs.begin(); | |||||
| if (idx == inputs.size()) { | |||||
| MS_LOG(ERROR) << "The input is not found."; | |||||
| return; | |||||
| } | |||||
| auto data = std::make_shared<OpData<Tensor>>(op_actors_[j]->GetAID(), inputs.at(idx), static_cast<int>(k)); | |||||
| if (data == nullptr) { | |||||
| MS_LOG(ERROR) << "new opdata failed."; | |||||
| return; | |||||
| } | |||||
| input_data_.emplace_back(data); | |||||
| } | } | ||||
| } | } | ||||
| } | } | ||||
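The rewritten loop above inverts the old search order: instead of scanning every kernel for each session input, it walks each kernel's input slots, keeps only graph-input tensors, and binds each one to its position in the session inputs. A minimal Python restatement with dict stand-ins for tensors:

```python
# Sketch of the input-binding loop above; tensors are dict stand-ins.
def prepare_input_data(kernels, inputs):
    bindings = []  # (kernel_index, slot_index, tensor)
    for j, kernel in enumerate(kernels):
        for k, tensor in enumerate(kernel["in_tensors"]):
            if not tensor.get("is_graph_input"):
                continue  # non-graph inputs are produced by other kernels
            if tensor not in inputs:
                raise RuntimeError("The input is not found.")
            bindings.append((j, k, tensor))
    return bindings
```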
| @@ -42,6 +50,9 @@ void MindrtExecutor::PrepareOutputData(const std::vector<kernel::LiteKernel *> & | |||||
| const std::vector<Tensor *> &outputs) { | const std::vector<Tensor *> &outputs) { | ||||
| for (size_t i = 0; i < outputs.size(); ++i) { | for (size_t i = 0; i < outputs.size(); ++i) { | ||||
| Tensor *graph_output_tensor = outputs[i]; | Tensor *graph_output_tensor = outputs[i]; | ||||
| if (graph_output_tensor->IsGraphInput()) { | |||||
| continue; | |||||
| } | |||||
| auto current_output_map = | auto current_output_map = | ||||
| std::find_if(output_tensor_map_->begin(), output_tensor_map_->end(), [&](const auto output_map_tensor) { | std::find_if(output_tensor_map_->begin(), output_tensor_map_->end(), [&](const auto output_map_tensor) { | ||||
| if (graph_output_tensor == output_map_tensor.second) { | if (graph_output_tensor == output_map_tensor.second) { | ||||
| @@ -74,6 +74,10 @@ void GroupConvolutionBaseCPUKernel::FreeSubKernel() { | |||||
| sub_conv = nullptr; | sub_conv = nullptr; | ||||
| } | } | ||||
| group_convs_.clear(); | group_convs_.clear(); | ||||
| if (group_conv_creator_ != nullptr) { | |||||
| delete group_conv_creator_; | |||||
| group_conv_creator_ = nullptr; | |||||
| } | |||||
| } | } | ||||
| int GroupConvolutionBaseCPUKernel::PreProcess() { | int GroupConvolutionBaseCPUKernel::PreProcess() { | ||||
| @@ -26,7 +26,8 @@ using mindspore::schema::PrimitiveType_PadFusion; | |||||
| namespace mindspore::kernel { | namespace mindspore::kernel { | ||||
| namespace { | namespace { | ||||
| constexpr size_t kPadMaxInputSize = 2; | |||||
| constexpr size_t kPadCommonInputSize = 2; | |||||
| constexpr size_t kPadMaxInputSize = 3; | |||||
| } // namespace | } // namespace | ||||
| int PadFp16CPUKernel::RunImpl(int task_id) { | int PadFp16CPUKernel::RunImpl(int task_id) { | ||||
| PadFp16(input_, output_, in_, out_, pad_param_->paddings_, task_id, op_parameter_->thread_num_); | PadFp16(input_, output_, in_, out_, pad_param_->paddings_, task_id, op_parameter_->thread_num_); | ||||
| @@ -53,8 +54,14 @@ int PadFp16CPUKernel::RunMirrorPadImpl(int task_id) { | |||||
| for (int b = 0; b < block.size_[1]; b++) { | for (int b = 0; b < block.size_[1]; b++) { | ||||
| int out_b_index = out_a_index + b * block.out_stride_[1]; | int out_b_index = out_a_index + b * block.out_stride_[1]; | ||||
| for (int c = 0; c < block.size_[2]; ++c) { | for (int c = 0; c < block.size_[2]; ++c) { | ||||
| int output_index = out_b_index + c * block.out_stride_[2]; | |||||
| MirrorPadFp16(input_data, output_data, in_, pad_param_, output_index, output_index + block.size_[3]); | |||||
| int out_c_index = out_b_index + c * block.out_stride_[2]; | |||||
| for (int d = 0; d < block.size_[3]; ++d) { | |||||
| int out_d_index = out_c_index + d * block.out_stride_[3]; | |||||
| for (int e = 0; e < block.size_[4]; ++e) { | |||||
| int output_index = out_d_index + e * block.out_stride_[4]; | |||||
| MirrorPadFp16(input_data, output_data, in_, pad_param_, output_index, output_index + block.size_[5]); | |||||
| } | |||||
| } | |||||
| } | } | ||||
| } | } | ||||
| } | } | ||||
| @@ -88,13 +95,16 @@ int PadFp16CPUKernel::Run() { | |||||
| MS_ASSERT(output_ != nullptr); | MS_ASSERT(output_ != nullptr); | ||||
| int ret = 0; | int ret = 0; | ||||
| if (pad_param_->pad_mode_ == static_cast<int>(schema::PaddingMode_CONSTANT)) { | if (pad_param_->pad_mode_ == static_cast<int>(schema::PaddingMode_CONSTANT)) { | ||||
| if (in_tensors_.size() == kPadMaxInputSize) { | |||||
| if (in_tensors_.size() >= kPadCommonInputSize) { | |||||
| ret = CopyPaddingFromInput(); | ret = CopyPaddingFromInput(); | ||||
| if (ret != RET_OK) { | if (ret != RET_OK) { | ||||
| MS_LOG(ERROR) << "PadFp16CPUKernel CopyPaddingFromInput failed"; | MS_LOG(ERROR) << "PadFp16CPUKernel CopyPaddingFromInput failed"; | ||||
| return RET_ERROR; | return RET_ERROR; | ||||
| } | } | ||||
| } | } | ||||
| if (in_tensors_.size() == kPadMaxInputSize) { | |||||
| pad_param_->constant_value_ = reinterpret_cast<float *>(in_tensors_.at(2)->data_c())[0]; | |||||
| } | |||||
| if (pad_param_->constant_value_ - 0.0f < 1e-5) { | if (pad_param_->constant_value_ - 0.0f < 1e-5) { | ||||
| memset(output_, 0, output_tensor->ElementsNum() * sizeof(float16_t)); | memset(output_, 0, output_tensor->ElementsNum() * sizeof(float16_t)); | ||||
| } else { | } else { | ||||
| @@ -28,7 +28,8 @@ using mindspore::schema::PrimitiveType_PadFusion; | |||||
| namespace mindspore::kernel { | namespace mindspore::kernel { | ||||
| namespace { | namespace { | ||||
| constexpr size_t kMirrorPadInputSize = 2; | constexpr size_t kMirrorPadInputSize = 2; | ||||
| constexpr size_t kPadMaxInputSize = 2; | |||||
| constexpr size_t kPadCommonInputSize = 2; | |||||
| constexpr size_t kPadMaxInputSize = 3; | |||||
| } // namespace | } // namespace | ||||
| int PadCPUKernel::Init() { | int PadCPUKernel::Init() { | ||||
| if (!InferShapeDone()) { | if (!InferShapeDone()) { | ||||
| @@ -40,30 +41,30 @@ int PadCPUKernel::Init() { | |||||
| int PadCPUKernel::ReSize() { | int PadCPUKernel::ReSize() { | ||||
| auto input = in_tensors_.at(0); | auto input = in_tensors_.at(0); | ||||
| auto rank = input->shape().size(); | auto rank = input->shape().size(); | ||||
| if (rank > COMM_SHAPE_SIZE) { | |||||
| MS_LOG(ERROR) << "Pad input rank should <= " << COMM_SHAPE_SIZE << ", got " << rank; | |||||
| if (rank > DEFAULT_PAD_NDIMS) { | |||||
| MS_LOG(ERROR) << "Pad input rank should <= " << DEFAULT_PAD_NDIMS << ", got " << rank; | |||||
| return RET_ERROR; | return RET_ERROR; | ||||
| } | } | ||||
| auto output = out_tensors_.at(0); | auto output = out_tensors_.at(0); | ||||
| if (pad_param_->pad_mode_ == static_cast<int>(schema::PaddingMode_CONSTANT)) { | if (pad_param_->pad_mode_ == static_cast<int>(schema::PaddingMode_CONSTANT)) { | ||||
| auto ret = ExtendShape(in_, COMM_SHAPE_SIZE, input->shape().data(), rank); | |||||
| auto ret = ExtendShape(in_, DEFAULT_PAD_NDIMS, input->shape().data(), rank); | |||||
| if (ret != RET_OK) { | if (ret != RET_OK) { | ||||
| return ret; | return ret; | ||||
| } | } | ||||
| ret = ExtendShape(out_, COMM_SHAPE_SIZE, output->shape().data(), rank); | |||||
| ret = ExtendShape(out_, DEFAULT_PAD_NDIMS, output->shape().data(), rank); | |||||
| if (ret != RET_OK) { | if (ret != RET_OK) { | ||||
| return ret; | return ret; | ||||
| } | } | ||||
| if (pad_param_->padding_length < MAX_SHAPE_SIZE) { | |||||
| int ori_paddings[MAX_SHAPE_SIZE]; | |||||
| if (pad_param_->padding_length < MAX_PAD_SIZE) { | |||||
| int ori_paddings[MAX_PAD_SIZE]; | |||||
| for (auto i = 0; i < pad_param_->padding_length; ++i) { | for (auto i = 0; i < pad_param_->padding_length; ++i) { | ||||
| ori_paddings[i] = pad_param_->paddings_[i]; | ori_paddings[i] = pad_param_->paddings_[i]; | ||||
| } | } | ||||
| ret = ExtendPaddings(pad_param_->paddings_, MAX_SHAPE_SIZE, ori_paddings, pad_param_->padding_length); | |||||
| ret = ExtendPaddings(pad_param_->paddings_, MAX_PAD_SIZE, ori_paddings, pad_param_->padding_length); | |||||
| if (ret != RET_OK) { | if (ret != RET_OK) { | ||||
| return ret; | return ret; | ||||
| } | } | ||||
| pad_param_->padding_length = MAX_SHAPE_SIZE; | |||||
| pad_param_->padding_length = MAX_PAD_SIZE; | |||||
| } | } | ||||
| } | } | ||||
| return RET_OK; | return RET_OK; | ||||
| @@ -71,8 +72,8 @@ int PadCPUKernel::ReSize() { | |||||
| void PadCPUKernel::InitMirrorPadBlock() { | void PadCPUKernel::InitMirrorPadBlock() { | ||||
| mirror_pad_block_.clear(); | mirror_pad_block_.clear(); | ||||
| std::vector<int> left_pads(COMM_SHAPE_SIZE); | |||||
| for (size_t i = 0; i < COMM_SHAPE_SIZE; ++i) { | |||||
| std::vector<int> left_pads(DEFAULT_PAD_NDIMS); | |||||
| for (size_t i = 0; i < DEFAULT_PAD_NDIMS; ++i) { | |||||
| left_pads[i] = pad_param_->paddings_[2 * i]; | left_pads[i] = pad_param_->paddings_[2 * i]; | ||||
| } | } | ||||
| @@ -83,7 +84,7 @@ void PadCPUKernel::InitMirrorPadBlock() { | |||||
| /* init separate dims */ | /* init separate dims */ | ||||
| int cur_input = 1; | int cur_input = 1; | ||||
| int cur_output = 1; | int cur_output = 1; | ||||
| for (size_t i = 0; i < COMM_SHAPE_SIZE; ++i) { | |||||
| for (size_t i = 0; i < DEFAULT_PAD_NDIMS; ++i) { | |||||
| if (cur_input > 1) { | if (cur_input > 1) { | ||||
| input_separate_dims.emplace_back(cur_input); | input_separate_dims.emplace_back(cur_input); | ||||
| output_separate_dims.emplace_back(cur_output); | output_separate_dims.emplace_back(cur_output); | ||||
| @@ -149,7 +150,7 @@ void PadCPUKernel::InitMirrorPadBlock() { | |||||
| } | } | ||||
| MirrorPadBlock block; | MirrorPadBlock block; | ||||
| const int size_offset = COMM_SHAPE_SIZE - static_cast<int>(pad_region.size()); | |||||
| const int size_offset = DEFAULT_PAD_NDIMS - static_cast<int>(pad_region.size()); | |||||
| for (size_t i = 0; i < pad_region.size(); ++i) { | for (size_t i = 0; i < pad_region.size(); ++i) { | ||||
| int di = size_offset + i; | int di = size_offset + i; | ||||
| int si = remain_dim_offset + i; | int si = remain_dim_offset + i; | ||||
| @@ -265,8 +266,14 @@ int PadCPUKernel::RunMirrorPadImpl(int task_id) { | |||||
| for (int b = 0; b < block.size_[1]; b++) { | for (int b = 0; b < block.size_[1]; b++) { | ||||
| int out_b_index = out_a_index + b * block.out_stride_[1]; | int out_b_index = out_a_index + b * block.out_stride_[1]; | ||||
| for (int c = 0; c < block.size_[2]; ++c) { | for (int c = 0; c < block.size_[2]; ++c) { | ||||
| int output_index = out_b_index + c * block.out_stride_[2]; | |||||
| MirrorPad(input_data, output_data, in_, pad_param_, output_index, output_index + block.size_[3]); | |||||
| int out_c_index = out_b_index + c * block.out_stride_[2]; | |||||
| for (int d = 0; d < block.size_[3]; ++d) { | |||||
| int out_d_index = out_c_index + d * block.out_stride_[3]; | |||||
| for (int e = 0; e < block.size_[4]; ++e) { | |||||
| int output_index = out_d_index + e * block.out_stride_[4]; | |||||
| MirrorPad(input_data, output_data, in_, pad_param_, output_index, output_index + block.size_[5]); | |||||
| } | |||||
| } | |||||
| } | } | ||||
| } | } | ||||
| } | } | ||||
| @@ -310,8 +317,8 @@ int PadCPUKernel::CheckPaddings(int *paddings, int length, int *input_shape, int | |||||
| } | } | ||||
| int PadCPUKernel::CopyPaddingFromInput() { | int PadCPUKernel::CopyPaddingFromInput() { | ||||
| if (in_tensors_.size() != kMirrorPadInputSize) { | |||||
| MS_LOG(ERROR) << "Pad Reflect or Symmetric mode need 2 inputs, got " << in_tensors_.size(); | |||||
| if (in_tensors_.size() < kMirrorPadInputSize) { | |||||
| MS_LOG(ERROR) << "Pad Reflect or Symmetric mode need at least 2 inputs, got " << in_tensors_.size(); | |||||
| return RET_ERROR; | return RET_ERROR; | ||||
| } | } | ||||
| auto padding_tensor = in_tensors_.at(1); | auto padding_tensor = in_tensors_.at(1); | ||||
| @@ -327,28 +334,28 @@ int PadCPUKernel::CopyPaddingFromInput() { | |||||
| return RET_ERROR; | return RET_ERROR; | ||||
| } | } | ||||
| auto ret = ExtendShape(in_, COMM_SHAPE_SIZE, input_shape.data(), rank); | |||||
| auto ret = ExtendShape(in_, DEFAULT_PAD_NDIMS, input_shape.data(), rank); | |||||
| if (ret != RET_OK) { | if (ret != RET_OK) { | ||||
| return ret; | return ret; | ||||
| } | } | ||||
| ret = ExtendPaddings(pad_param_->paddings_, MAX_SHAPE_SIZE, paddings, padding_tensor->ElementsNum()); | |||||
| ret = ExtendPaddings(pad_param_->paddings_, MAX_PAD_SIZE, paddings, padding_tensor->ElementsNum()); | |||||
| if (ret != RET_OK) { | if (ret != RET_OK) { | ||||
| return ret; | return ret; | ||||
| } | } | ||||
| pad_param_->padding_length = MAX_SHAPE_SIZE; | |||||
| pad_param_->padding_length = MAX_PAD_SIZE; | |||||
| return RET_OK; | return RET_OK; | ||||
| } | } | ||||
| void PadCPUKernel::CalculateStrides() { | void PadCPUKernel::CalculateStrides() { | ||||
| pad_param_->in_strides[COMM_SHAPE_SIZE - 1] = 1; | |||||
| for (auto i = COMM_SHAPE_SIZE - 2; i >= 0; --i) { | |||||
| pad_param_->in_strides[DEFAULT_PAD_NDIMS - 1] = 1; | |||||
| for (auto i = DEFAULT_PAD_NDIMS - 2; i >= 0; --i) { | |||||
| pad_param_->in_strides[i] = in_[i + 1] * pad_param_->in_strides[i + 1]; | pad_param_->in_strides[i] = in_[i + 1] * pad_param_->in_strides[i + 1]; | ||||
| } | } | ||||
| for (auto i = 0; i < COMM_SHAPE_SIZE; ++i) { | |||||
| for (auto i = 0; i < DEFAULT_PAD_NDIMS; ++i) { | |||||
| out_[i] = in_[i] + pad_param_->paddings_[i * 2] + pad_param_->paddings_[i * 2 + 1]; | out_[i] = in_[i] + pad_param_->paddings_[i * 2] + pad_param_->paddings_[i * 2 + 1]; | ||||
| } | } | ||||
| pad_param_->out_strides[COMM_SHAPE_SIZE - 1] = 1; | |||||
| for (auto i = COMM_SHAPE_SIZE - 2; i >= 0; --i) { | |||||
| pad_param_->out_strides[DEFAULT_PAD_NDIMS - 1] = 1; | |||||
| for (auto i = DEFAULT_PAD_NDIMS - 2; i >= 0; --i) { | |||||
| pad_param_->out_strides[i] = out_[i + 1] * pad_param_->out_strides[i + 1]; | pad_param_->out_strides[i] = out_[i + 1] * pad_param_->out_strides[i + 1]; | ||||
| } | } | ||||
| } | } | ||||
| @@ -358,7 +365,7 @@ int PadCPUKernel::HandleMirrorPad() { | |||||
| if (in_tensors_.size() == 1) { | if (in_tensors_.size() == 1) { | ||||
| auto input_shape = in_tensors_.at(0)->shape(); | auto input_shape = in_tensors_.at(0)->shape(); | ||||
| int rank = static_cast<int>(input_shape.size()); | int rank = static_cast<int>(input_shape.size()); | ||||
| ret = ExtendShape(in_, COMM_SHAPE_SIZE, input_shape.data(), rank); | |||||
| ret = ExtendShape(in_, DEFAULT_PAD_NDIMS, input_shape.data(), rank); | |||||
| if (ret != RET_OK) { | if (ret != RET_OK) { | ||||
| return ret; | return ret; | ||||
| } | } | ||||
| @@ -368,7 +375,7 @@ int PadCPUKernel::HandleMirrorPad() { | |||||
| return ret; | return ret; | ||||
| } | } | ||||
| } | } | ||||
| ret = CheckPaddings(pad_param_->paddings_, COMM_SHAPE_SIZE, in_, pad_param_->pad_mode_); | |||||
| ret = CheckPaddings(pad_param_->paddings_, DEFAULT_PAD_NDIMS, in_, pad_param_->pad_mode_); | |||||
| if (ret != RET_OK) { | if (ret != RET_OK) { | ||||
| return ret; | return ret; | ||||
| } | } | ||||
| @@ -391,13 +398,16 @@ int PadCPUKernel::Run() { | |||||
| } | } | ||||
| int error_code = 0; | int error_code = 0; | ||||
| if (pad_param_->pad_mode_ == static_cast<int>(schema::PaddingMode_CONSTANT)) { | if (pad_param_->pad_mode_ == static_cast<int>(schema::PaddingMode_CONSTANT)) { | ||||
| if (in_tensors_.size() == kPadMaxInputSize) { | |||||
| if (in_tensors_.size() >= kPadCommonInputSize) { | |||||
| error_code = CopyPaddingFromInput(); | error_code = CopyPaddingFromInput(); | ||||
| if (error_code != RET_OK) { | if (error_code != RET_OK) { | ||||
| MS_LOG(ERROR) << "Pad run error, error_code[" << error_code << "]"; | MS_LOG(ERROR) << "Pad run error, error_code[" << error_code << "]"; | ||||
| return RET_ERROR; | return RET_ERROR; | ||||
| } | } | ||||
| } | } | ||||
| if (in_tensors_.size() == kPadMaxInputSize) { | |||||
| pad_param_->constant_value_ = reinterpret_cast<float *>(in_tensors_.at(2)->data_c())[0]; | |||||
| } | |||||
| auto output = out_tensors_.at(0); | auto output = out_tensors_.at(0); | ||||
| int output_size = output->ElementsNum(); | int output_size = output->ElementsNum(); | ||||
| auto output_data = reinterpret_cast<float *>(output->data_c()); | auto output_data = reinterpret_cast<float *>(output->data_c()); | ||||
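Replacing `COMM_SHAPE_SIZE` with `DEFAULT_PAD_NDIMS` throughout means every input is first normalized to a fixed 6-D working view (consistent with the six nested mirror-pad loop levels above). A hedged sketch of that normalization, assuming shapes are extended with leading 1s and paddings with leading 0s (the usual convention, not quoted from this diff):

```python
DEFAULT_PAD_NDIMS = 6  # fixed working rank after the change above (assumed value)

def extend_shape(shape, ndims=DEFAULT_PAD_NDIMS):
    # prepend size-1 dims until the shape has exactly `ndims` entries
    return [1] * (ndims - len(shape)) + list(shape)

def extend_paddings(paddings, ndims=DEFAULT_PAD_NDIMS):
    # prepend zero (before, after) pairs for the added leading dims
    return [0] * (2 * ndims - len(paddings)) + list(paddings)

print(extend_shape([2, 3, 4]))              # [1, 1, 1, 2, 3, 4]
print(extend_paddings([1, 1, 2, 2, 0, 0]))  # [0, 0, 0, 0, 0, 0, 1, 1, 2, 2, 0, 0]
```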
| @@ -55,8 +55,8 @@ class PadCPUKernel : public InnerKernel { | |||||
| int HandleMirrorPad(); | int HandleMirrorPad(); | ||||
| int CopyPaddingFromInput(); | int CopyPaddingFromInput(); | ||||
| PadParameter *pad_param_ = nullptr; | PadParameter *pad_param_ = nullptr; | ||||
| int in_[4] = {0}; | |||||
| int out_[4] = {0}; | |||||
| int in_[DEFAULT_PAD_NDIMS] = {0}; | |||||
| int out_[DEFAULT_PAD_NDIMS] = {0}; | |||||
| std::vector<MirrorPadBlock> mirror_pad_block_; | std::vector<MirrorPadBlock> mirror_pad_block_; | ||||
| }; | }; | ||||
| @@ -316,7 +316,9 @@ void Tensor::FreeData() { | |||||
| this->data_ = nullptr; | this->data_ = nullptr; | ||||
| } else { | } else { | ||||
| allocator_->Free(this->data_); | allocator_->Free(this->data_); | ||||
| this->data_ = nullptr; | |||||
| if (!IS_STATIC_ALLOCATOR(allocator_) || (allocator_->RefCount(this->data_) != 0)) { | |||||
| this->data_ = nullptr; | |||||
| } | |||||
| } | } | ||||
| } | } | ||||
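The new guard keeps `data_` addressable when a "static" allocator still owns the block. Restated as a predicate (the `STATIC_ALLOCATION` sentinel is defined in the tensor header hunk below; `ref_count` mirrors `RefCount`):

```python
STATIC_ALLOCATION = -271964  # sentinel RefCount(nullptr) value marking a static allocator

def should_clear_data(allocator, data):
    # Mirrors the condition above: clear the pointer unless the allocator is the
    # static kind and it reports zero live references for this block.
    is_static = allocator is not None and allocator.ref_count(None) == STATIC_ALLOCATION
    return (not is_static) or allocator.ref_count(data) != 0
```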
| @@ -34,12 +34,15 @@ | |||||
| namespace mindspore { | namespace mindspore { | ||||
| namespace lite { | namespace lite { | ||||
| #define STATIC_ALLOCATION -271964 | |||||
| #define IS_STATIC_ALLOCATOR(allocator) ((allocator != nullptr) && (allocator->RefCount(nullptr) == STATIC_ALLOCATION)) | |||||
| struct LiteQuantParam { | struct LiteQuantParam { | ||||
| double scale; | double scale; | ||||
| int32_t zeroPoint; | int32_t zeroPoint; | ||||
| float var_corr{1}; | float var_corr{1}; | ||||
| float mean_corr{0}; | float mean_corr{0}; | ||||
| bool inited; | |||||
| bool inited{false}; | |||||
| std::vector<float> clusters{}; | std::vector<float> clusters{}; | ||||
| int bitNum; | int bitNum; | ||||
| int roundType; | int roundType; | ||||
| @@ -133,7 +136,6 @@ class Tensor : public mindspore::tensor::MSTensor { | |||||
| void set_format(mindspore::Format format) override { this->format_ = format; } | void set_format(mindspore::Format format) override { this->format_ = format; } | ||||
| mindspore::Format format() const override { return this->format_; } | mindspore::Format format() const override { return this->format_; } | ||||
| virtual int ref_count() const { return ref_count_; } | virtual int ref_count() const { return ref_count_; } | ||||
| virtual int init_ref_count() const { return this->init_ref_count_; } | virtual int init_ref_count() const { return this->init_ref_count_; } | ||||
| @@ -0,0 +1,90 @@ | |||||
| /** | |||||
| * Copyright 2021 Huawei Technologies Co., Ltd | |||||
| * | |||||
| * Licensed under the Apache License, Version 2.0 (the "License"); | |||||
| * you may not use this file except in compliance with the License. | |||||
| * You may obtain a copy of the License at | |||||
| * | |||||
| * http://www.apache.org/licenses/LICENSE-2.0 | |||||
| * | |||||
| * Unless required by applicable law or agreed to in writing, software | |||||
| * distributed under the License is distributed on an "AS IS" BASIS, | |||||
| * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | |||||
| * See the License for the specific language governing permissions and | |||||
| * limitations under the License. | |||||
| */ | |||||
| #include "src/train/opt_allocator.h" | |||||
| #include <limits> | |||||
| #include "nnacl/op_base.h" | |||||
| namespace mindspore { | |||||
| size_t OptAllocator::FindFree(size_t size) { | |||||
| size_t min_size = std::numeric_limits<size_t>::max(); | |||||
| size_t min_addr = std::numeric_limits<size_t>::max(); | |||||
| for (auto const &itr : arena_) { | |||||
| // best fit | |||||
| if (itr.second >= size) { | |||||
| if (min_size > itr.second) { | |||||
| min_size = itr.second; | |||||
| min_addr = itr.first; | |||||
| } | |||||
| } | |||||
| } | |||||
| return min_addr; | |||||
| } | |||||
| void OptAllocator::Reorder(size_t addr) { | |||||
| size_t length = arena_[addr]; | |||||
| size_t post = addr + length; | |||||
| // connect to upper block | |||||
| auto it = arena_.find(post); | |||||
| if (it != arena_.end()) { | |||||
| size_t post_size = it->second; | |||||
| arena_[addr] = length + post_size; | |||||
| arena_.erase(post); | |||||
| } | |||||
| // connect to lower block | |||||
| auto itr = arena_.lower_bound(addr); | |||||
| if (itr != arena_.begin()) { | |||||
| itr--; | |||||
| size_t last = itr->first; | |||||
| if ((last + arena_[last]) == addr) { | |||||
| arena_[last] = arena_[last] + arena_[addr]; | |||||
| arena_.erase(addr); | |||||
| } | |||||
| } | |||||
| } | |||||
| size_t OptAllocator::Malloc(size_t size) { | |||||
| size = UP_DIV(size, align_size_) * align_size_; | |||||
| size_t addr = FindFree(size); | |||||
| // free block not found | |||||
| if (addr == std::numeric_limits<size_t>::max()) { | |||||
| if (!arena_.empty()) { | |||||
| addr = arena_.rbegin()->first; | |||||
| if (addr + arena_[addr] < heap_) { | |||||
| addr = heap_; | |||||
| } else { | |||||
| arena_.erase(addr); | |||||
| } | |||||
| } else { | |||||
| addr = heap_; | |||||
| } | |||||
| heap_ = addr + size; | |||||
| } else { | |||||
| if (arena_[addr] > size) { | |||||
| arena_[addr + size] = arena_[addr] - size; | |||||
| } | |||||
| arena_.erase(addr); | |||||
| } | |||||
| alloc_[addr] = size; | |||||
| return addr; | |||||
| } | |||||
| void OptAllocator::Free(size_t addr) { | |||||
| arena_[addr] = alloc_[addr]; | |||||
| alloc_.erase(addr); | |||||
| Reorder(addr); | |||||
| } | |||||
| } // namespace mindspore | |||||
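To summarize the allocator just added: `Malloc` rounds each request up to the alignment, takes the best-fit free block (splitting off any remainder), grows the heap only when no block fits, absorbing the topmost free block when it touches the heap end, and `Free` returns the block to the arena and coalesces it with adjacent free neighbours. A compact Python model of the same policy (illustrative only; `OptAllocatorSketch` is not part of the source):

```python
class OptAllocatorSketch:
    """Best-fit arena allocator modelling src/train/opt_allocator.cc above."""

    def __init__(self, align=32):
        self.align = align
        self.heap = 0    # high-water mark of the address space
        self.arena = {}  # free blocks: addr -> size
        self.alloc = {}  # live blocks: addr -> size

    def malloc(self, size):
        size = -(-size // self.align) * self.align      # round up to alignment
        fits = [(s, a) for a, s in self.arena.items() if s >= size]
        if not fits:                                    # no free block large enough
            addr = self.heap
            if self.arena:
                top = max(self.arena)
                if top + self.arena[top] == self.heap:  # topmost free block touches
                    addr = top                          # the heap end: extend it
                    del self.arena[top]
            self.heap = addr + size
        else:
            _, addr = min(fits)                         # best fit: smallest block that fits
            if self.arena[addr] > size:
                self.arena[addr + size] = self.arena[addr] - size  # split the remainder
            del self.arena[addr]
        self.alloc[addr] = size
        return addr

    def free(self, addr):
        size = self.alloc.pop(addr)
        self.arena[addr] = size
        if addr + size in self.arena:                   # merge with the following block
            self.arena[addr] += self.arena.pop(addr + size)
        prev = max((a for a in self.arena if a < addr), default=None)
        if prev is not None and prev + self.arena[prev] == addr:
            self.arena[prev] += self.arena.pop(addr)    # merge with the preceding block

alloc = OptAllocatorSketch(align=32)
a = alloc.malloc(100)  # -> 0   (rounded up to 128)
b = alloc.malloc(40)   # -> 128 (rounded up to 64)
alloc.free(a)
c = alloc.malloc(64)   # -> 0   (best-fit reuse of the freed 128-byte block, split)
```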