@@ -0,0 +1,220 @@
# Contents

- [SimCLR Description](#simclr-description)
- [Model Architecture](#model-architecture)
- [Dataset](#dataset)
- [Environment Requirements](#environment-requirements)
- [Quick Start](#quick-start)
- [Script Description](#script-description)
    - [Script and Sample Code](#script-and-sample-code)
    - [Script Parameters](#script-parameters)
    - [Training Process](#training-process)
        - [Training](#training)
    - [Evaluation Process](#evaluation-process)
        - [Evaluation](#evaluation)
- [Model Description](#model-description)
    - [Performance](#performance)
        - [Evaluation Performance](#evaluation-performance)
- [Description of Random Situation](#description-of-random-situation)
- [ModelZoo Homepage](#modelzoo-homepage)
## [SimCLR Description](#contents)

SimCLR is a simple framework for contrastive learning of visual representations.

[Paper](https://arxiv.org/pdf/2002.05709.pdf): Ting Chen, Simon Kornblith, Mohammad Norouzi and Geoffrey Hinton. A Simple Framework for Contrastive Learning of Visual Representations. *arXiv preprint arXiv:2002.05709*. 2020.
## [Model Architecture](#contents)

SimCLR learns representations by maximizing agreement between differently augmented views of the same data example via a contrastive loss in the latent space. The framework comprises four major components: a stochastic data augmentation module, a neural network base encoder, a small neural network projection head, and a contrastive loss function.
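The base encoder produces a representation h, and the projection head maps it to the latent vector z on which the contrastive loss (src/nt_xent.py) is computed. The actual head is implemented in src/simclr_model.py, which is not shown in this section; the sketch below is only a minimal MindSpore illustration of such a two-layer head, with hypothetical class and parameter names:

```python
import mindspore.nn as nn

class ProjectionHead(nn.Cell):
    """Hypothetical two-layer MLP g(.): maps encoder features h to latent z."""
    def __init__(self, n_features, projection_dimension=128):
        super(ProjectionHead, self).__init__()
        self.dense1 = nn.Dense(n_features, n_features)
        self.relu = nn.ReLU()
        self.dense2 = nn.Dense(n_features, projection_dimension)

    def construct(self, h):
        # z = W2 @ ReLU(W1 @ h); the contrastive loss is applied to z, not h.
        return self.dense2(self.relu(self.dense1(h)))
```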
## [Dataset](#contents)

In the following sections, we will introduce how to run the scripts using the related dataset below.

Dataset used: [CIFAR-10](http://www.cs.toronto.edu/~kriz/cifar.html)

- Dataset size: 175 MB, 60,000 32×32 color images in 10 classes
    - Train: 146 MB, 50,000 images
    - Test: 29.3 MB, 10,000 images
- Data format: binary files
    - Note: data will be processed in dataset.py
- Download the dataset; the directory structure is as follows:

```bash
├─cifar-10-batches-bin
│
└─cifar-10-verify-bin
```
## [Environment Requirements](#contents)

- Hardware (Ascend)
    - Prepare hardware environment with Ascend processor.
- Framework
    - [MindSpore](https://www.mindspore.cn/install/en)
- For more information, please check the resources below:
    - [MindSpore Tutorials](https://www.mindspore.cn/tutorial/training/en/master/index.html)
    - [MindSpore Python API](https://www.mindspore.cn/doc/api_python/en/master/index.html)
## [Quick Start](#contents)

After installing MindSpore via the official website, you can start training and evaluation as follows:

```bash
# enter script dir, train SimCLR standalone
sh run_standalone_train_ascend.sh [cifar10] [TRAIN_DATASET_PATH] [DEVICE_ID]
# or train distributed
sh run_distribution_ascend.sh [DEVICENUM] [RANK_TABLE_FILE] [cifar10] [TRAIN_DATASET_PATH]
# enter script dir, evaluate SimCLR
sh run_standalone_eval_ascend.sh [cifar10] [DEVICE_ID] [SIMCLR_MODEL_PATH] [TRAIN_DATASET_PATH] [EVAL_DATASET_PATH]
```
## [Script Description](#contents)

### [Script and Sample Code](#contents)

```bash
├── cv
    ├── SimCLR
        ├── README.md                          // descriptions about SimCLR
        ├── requirements.txt                   // packages needed
        ├── scripts
        │   ├── run_distribution_ascend.sh     // distributed training on Ascend
        │   ├── run_standalone_train_ascend.sh // standalone training on Ascend
        │   ├── run_standalone_eval_ascend.sh  // evaluation on Ascend
        ├── src
        │   ├── dataset.py                     // creating dataset
        │   ├── lr_generator.py                // generating learning rate
        │   ├── nt_xent.py                     // contrastive cross entropy loss
        │   ├── optimizer.py                   // generating optimizer
        │   ├── reporter.py                    // logging and checkpoint saving
        │   ├── resnet.py                      // base encoder network
        │   ├── simclr_model.py                // simclr architecture
        ├── train.py                           // training script
        ├── linear_eval.py                     // linear evaluation script
        ├── export.py                          // export model for inference
```
### [Script Parameters](#contents)

```python
Major parameters in train.py are as follows:

--device_target: Device target. Currently only Ascend is supported.
--run_cloudbrain: Whether it is running on the CloudBrain platform.
--run_distribute: Run distributed training.
--device_num: Device number.
--device_id: Device id, default is 0.
--dataset_name: Dataset. Currently only cifar10 is supported.
--train_url: CloudBrain location of training outputs. This parameter needs to be set when running on the CloudBrain platform.
--data_url: CloudBrain location of data. This parameter needs to be set when running on the CloudBrain platform.
--train_dataset_path: Dataset path for training the encoder. This parameter needs to be set when running on the host.
--train_output_path: Location of checkpoints and logs. This parameter needs to be set when running on the host.
--batch_size: Batch size, default is 128.
--epoch_size: Epoch size for training, default is 100.
--projection_dimension: Projection output dimensionality, default is 128.
--width_multiplier: Width multiplier for ResNet50, default is 1.
--temperature: Temperature for the contrastive cross entropy loss.
--pre_trained_path: Pretrained checkpoint path.
--pretrain_epoch_size: Epochs already covered by the pretrained checkpoint; training continues for real_epoch_size = epoch_size - pretrain_epoch_size epochs.
--save_checkpoint_epochs: Save a checkpoint every this many epochs, default is 1.
--save_graphs: Whether to save graphs, default is False.
--optimizer: Optimizer. Currently only Adam is supported.
--weight_decay: Weight decay.
--warmup_epochs: Warmup epochs.

Major parameters in linear_eval.py are as follows:

--device_target: Device target. Currently only Ascend is supported.
--run_cloudbrain: Whether it is running on the CloudBrain platform.
--run_distribute: Run distributed training.
--device_num: Device number.
--device_id: Device id, default is 0.
--dataset_name: Dataset. Currently only cifar10 is supported.
--train_url: CloudBrain location of training outputs. This parameter needs to be set when running on the CloudBrain platform.
--data_url: CloudBrain location of data. This parameter needs to be set when running on the CloudBrain platform.
--train_dataset_path: Dataset path for training the classifier. This parameter needs to be set when running on the host.
--eval_dataset_path: Dataset path for evaluating the classifier. This parameter needs to be set when running on the host.
--train_output_path: Location of checkpoints and logs. This parameter needs to be set when running on the host.
--class_num: Number of dataset classes, default is 10 for cifar10.
--batch_size: Batch size, default is 128.
--epoch_size: Epoch size for training, default is 100.
--projection_dimension: Projection output dimensionality, default is 128.
--width_multiplier: Width multiplier for ResNet50, default is 1.
--pre_classifier_checkpoint_path: Classifier checkpoint file path.
--encoder_checkpoint_path: Encoder checkpoint file path.
--save_checkpoint_epochs: Save a checkpoint every this many epochs, default is 10.
--print_iter: Log print interval, default is 100.
--save_graphs: Whether to save graphs, default is False.
```
### [Training Process](#contents)

#### Training

- running on Ascend

```bash
sh run_distribution_ascend.sh [DEVICENUM] [RANK_TABLE_FILE] [cifar10] [TRAIN_DATASET_PATH]
```

After training, loss values will be printed in the logs as follows:

```bash
# grep "loss is " log
epoch: 1 step: 48, loss is 9.5758915
epoch time: 253236.075 ms, per step time: 5275.752 ms
epoch: 1 step: 48, loss is 9.363186
epoch time: 253739.376 ms, per step time: 5286.237 ms
epoch: 1 step: 48, loss is 9.36029
epoch time: 253711.625 ms, per step time: 5285.659 ms
...
epoch: 100 step: 48, loss is 7.453776
epoch time: 12341.851 ms, per step time: 257.122 ms
epoch: 100 step: 48, loss is 7.499168
epoch time: 12420.060 ms, per step time: 258.751 ms
epoch: 100 step: 48, loss is 7.442362
epoch time: 12725.863 ms, per step time: 265.122 ms
...
```

The model checkpoints will be saved in the outputs directory.
### [Evaluation Process](#contents)

#### Evaluation

Before running the command below, please check the checkpoint path used for evaluation.

- running on Ascend

```bash
sh run_standalone_eval_ascend.sh [cifar10] [DEVICE_ID] [SIMCLR_MODEL_PATH] [TRAIN_DATASET_PATH] [EVAL_DATASET_PATH]
```

You can view the results in the file "eval_log". The accuracy on the test dataset will be reported as follows:

```bash
# grep "Average accuracy" eval_log
Average loss ..., Average accuracy 0.84505
```
## [Model Description](#contents)

### [Performance](#contents)

#### Evaluation Performance

| Parameters          | Ascend                                                                                          |
| ------------------- | ----------------------------------------------------------------------------------------------- |
| Resource            | Ascend 910; CPU 2.60 GHz, 192 cores; memory 755 GB                                              |
| Uploaded Date       | 30/03/2021 (day/month/year)                                                                     |
| MindSpore Version   | 1.1.1                                                                                           |
| Dataset             | CIFAR-10                                                                                        |
| Training Parameters | epoch=100, batch_size=128, device_num=8                                                         |
| Optimizer           | Adam                                                                                            |
| Loss Function       | NT-Xent Loss                                                                                    |
| Linear eval         | 84.505%                                                                                         |
| Total time          | 25m04s                                                                                          |
| Scripts             | [SimCLR Script](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv/simclr) |
## [Description of Random Situation](#contents)

We set the seed inside dataset.py. We also use a random seed in train.py.

## [ModelZoo Homepage](#contents)

Please check the official [homepage](https://gitee.com/mindspore/mindspore/tree/master/model_zoo).
@@ -0,0 +1,62 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""
##############export checkpoint file into air, onnx, mindir models#################
python export.py
"""
import argparse
import numpy as np
import mindspore as ms
from mindspore import context, Tensor, load_checkpoint, load_param_into_net, export
from src.simclr_model import SimCLR
from src.resnet import resnet50 as resnet

parser = argparse.ArgumentParser(description='SimCLR')
parser.add_argument("--device_id", type=int, default=0, help="Device id")
parser.add_argument("--batch_size", type=int, default=128, help="batch size")
parser.add_argument('--dataset_name', type=str, default='cifar10', choices=['cifar10'],
                    help='Dataset, currently only cifar10 is supported.')
parser.add_argument('--device_target', type=str, default="Ascend",
                    choices=['Ascend'],
                    help='Device target, currently only Ascend is supported.')
parser.add_argument("--ckpt_file", type=str, required=True, help="Checkpoint file path.")
parser.add_argument("--file_name", type=str, default="simclr", help="output file name.")
parser.add_argument("--file_format", type=str, choices=["AIR", "ONNX", "MINDIR"], default="AIR", help="file format")
args_opt = parser.parse_args()

context.set_context(mode=context.GRAPH_MODE, device_target=args_opt.device_target)
if args_opt.device_target == "Ascend":
    context.set_context(device_id=args_opt.device_id)

if __name__ == '__main__':
    if args_opt.dataset_name == 'cifar10':
        width_multiplier = 1
        cifar_stem = True
        projection_dimension = 128
        image_height = 32
        image_width = 32
    else:
        raise ValueError("dataset is not supported.")
    base_net = resnet(1, width_multiplier=width_multiplier, cifar_stem=cifar_stem)
    net = SimCLR(base_net, projection_dimension, base_net.end_point.in_channels)
    param_dict = load_checkpoint(args_opt.ckpt_file)
    load_param_into_net(net, param_dict)
    input_arr = Tensor(np.zeros([args_opt.batch_size, 3, image_height, image_width]), ms.float32)
    export(net, input_arr, file_name=args_opt.file_name, file_format=args_opt.file_format)
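# A hypothetical example invocation (the checkpoint path is an assumption):
#   python export.py --ckpt_file ./outputs/checkpoint/simclr_100.ckpt --file_format MINDIR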
@@ -0,0 +1,215 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""
######################## eval SimCLR example ########################
eval SimCLR according to model file:
python linear_eval.py --encoder_checkpoint_path Your.ckpt --train_dataset_path /YourDataPath1
    --eval_dataset_path /YourDataPath2
"""
import ast
import os
import argparse
import numpy as np
import mindspore.common.dtype as mstype
from mindspore import nn
from mindspore import ops
from mindspore import context
from mindspore.common.initializer import TruncatedNormal
from mindspore.train.serialization import load_checkpoint, load_param_into_net
from mindspore.common import set_seed
from mindspore.context import ParallelMode
from mindspore.communication.management import init, get_rank
from src.dataset import create_dataset
from src.simclr_model import SimCLR
from src.resnet import resnet50 as resnet
from src.reporter import Reporter
from src.optimizer import get_eval_optimizer as get_optimizer

parser = argparse.ArgumentParser(description='Linear Evaluation Protocol')
parser.add_argument('--device_target', type=str, default='Ascend',
                    help='Device target, currently only Ascend is supported.')
parser.add_argument('--run_distribute', type=ast.literal_eval, default=False, help='Running distributed evaluation.')
parser.add_argument('--run_cloudbrain', type=ast.literal_eval, default=True,
                    help='Whether it is running on the CloudBrain platform.')
parser.add_argument('--device_num', type=int, default=1, help='Device num.')
parser.add_argument('--device_id', type=int, default=0, help='device id, default is 0.')
parser.add_argument('--dataset_name', type=str, default='cifar10', help='Dataset, currently only cifar10 is supported.')
parser.add_argument('--train_url', default=None, help='CloudBrain location of training outputs. \
                    This parameter needs to be set when running on the CloudBrain platform.')
parser.add_argument('--data_url', default=None, help='CloudBrain location of data. \
                    This parameter needs to be set when running on the CloudBrain platform.')
parser.add_argument('--train_dataset_path', type=str, default='./cifar/train',
                    help='Dataset path for training the classifier. \
                    This parameter needs to be set when running on the host.')
parser.add_argument('--eval_dataset_path', type=str, default='./cifar/eval',
                    help='Dataset path for evaluating the classifier. \
                    This parameter needs to be set when running on the host.')
parser.add_argument('--train_output_path', type=str, default='./outputs', help='Location of ckpt and log. \
                    This parameter needs to be set when running on the host.')
parser.add_argument('--class_num', type=int, default=10, help='dataset classification number, default is 10.')
parser.add_argument('--batch_size', type=int, default=128, help='batch_size for training classifier, default is 128.')
parser.add_argument('--epoch_size', type=int, default=100, help='epoch size for training classifier, default is 100.')
parser.add_argument('--projection_dimension', type=int, default=128,
                    help='Projection output dimensionality, default is 128.')
parser.add_argument('--width_multiplier', type=int, default=1,
                    help='Width multiplier for ResNet50 (e.g. 4 gives ResNet50x4), default is 1.')
parser.add_argument('--pre_classifier_checkpoint_path', type=str, default=None, help='Classifier checkpoint file path.')
parser.add_argument('--encoder_checkpoint_path', type=str, help='Encoder checkpoint file path.')
parser.add_argument('--save_checkpoint_epochs', type=int, default=10, help='Save checkpoint epochs, default is 10.')
parser.add_argument('--print_iter', type=int, default=100, help='log print interval, default is 100.')
parser.add_argument('--save_graphs', type=ast.literal_eval, default=False,
                    help='whether to save graphs, default is False.')
parser.add_argument('--use_norm', type=ast.literal_eval, default=False, help='Dataset normalization.')
args = parser.parse_args()

set_seed(1)
local_data_url = './cache/data'
local_train_url = './cache/train'
_local_train_url = local_train_url
if args.device_target != "Ascend":
    raise ValueError("Unsupported device target.")
if args.run_distribute:
    device_id = os.getenv("DEVICE_ID", default=None)
    if device_id is None:
        raise ValueError("Unsupported device id.")
    args.device_id = int(device_id)
    rank_size = os.getenv("RANK_SIZE", default=None)
    if rank_size is None:
        raise ValueError("Unsupported rank size.")
    # Fall back to the RANK_SIZE environment value when device_num is unset or too large.
    if args.device_num > int(rank_size) or args.device_num == 1:
        args.device_num = int(rank_size)
    context.set_context(device_id=args.device_id)
    context.set_context(mode=context.GRAPH_MODE, device_target=args.device_target, save_graphs=args.save_graphs)
    context.reset_auto_parallel_context()
    context.set_auto_parallel_context(parallel_mode=ParallelMode.DATA_PARALLEL,
                                      gradients_mean=True, device_num=args.device_num)
    init()
    args.rank = get_rank()
    local_data_url = os.path.join(local_data_url, str(args.device_id))
    local_train_url = os.path.join(local_train_url, str(args.device_id))
    args.train_output_path = os.path.join(args.train_output_path, str(args.device_id))
else:
    context.set_context(mode=context.GRAPH_MODE, device_target=args.device_target,
                        save_graphs=args.save_graphs, device_id=args.device_id)
    args.rank = 0
    args.device_num = 1
if args.run_cloudbrain:
    import moxing as mox
    args.train_dataset_path = os.path.join(local_data_url, "train")
    args.eval_dataset_path = os.path.join(local_data_url, "val")
    args.train_output_path = local_train_url
    mox.file.copy_parallel(src_url=args.data_url, dst_url=local_data_url)
class LogisticRegression(nn.Cell):
    """
    Logistic regression classifier: a single dense layer on top of frozen features.
    """
    def __init__(self, n_features, n_classes):
        super(LogisticRegression, self).__init__()
        self.model = nn.Dense(n_features, n_classes, TruncatedNormal(0.02), TruncatedNormal(0.02))

    def construct(self, x):
        x = self.model(x)
        return x


class Linear_Eval():
    """
    Linear classifier evaluation: returns loss and accuracy for one batch.
    """
    def __init__(self, net, loss):
        super(Linear_Eval, self).__init__()
        self.net = net
        self.softmax = nn.Softmax()
        self.loss = loss

    def __call__(self, x, y):
        x = self.net(x)
        loss = self.loss(x, y)
        x = self.softmax(x)
        predicts = ops.Argmax(output_type=mstype.int32)(x)
        acc = np.sum(predicts.asnumpy() == y.asnumpy()) / len(y.asnumpy())
        return loss.asnumpy(), acc


class Linear_Train(nn.Cell):
    """
    Train the linear classifier for one step.
    """
    def __init__(self, net, loss, opt):
        super(Linear_Train, self).__init__()
        self.netwithloss = nn.WithLossCell(net, loss)
        self.train_net = nn.TrainOneStepCell(self.netwithloss, opt)
        self.train_net.set_train()

    def construct(self, x, y):
        return self.train_net(x, y)
if __name__ == "__main__":
    base_net = resnet(1, args.width_multiplier, cifar_stem=args.dataset_name == "cifar10")
    simclr_model = SimCLR(base_net, args.projection_dimension, base_net.end_point.in_channels)
    if args.run_cloudbrain:
        mox.file.copy_parallel(src_url=args.encoder_checkpoint_path, dst_url=local_data_url + '/encoder.ckpt')
        simclr_param = load_checkpoint(local_data_url + '/encoder.ckpt')
    else:
        simclr_param = load_checkpoint(args.encoder_checkpoint_path)
    load_param_into_net(simclr_model.encoder, simclr_param)
    classifier = LogisticRegression(simclr_model.n_features, args.class_num)
    dataset = create_dataset(args, dataset_mode="train_classifier")
    optimizer = get_optimizer(classifier, dataset.get_dataset_size(), args)
    criterion = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction='mean')
    net_Train = Linear_Train(net=classifier, loss=criterion, opt=optimizer)
    reporter = Reporter(args, linear_eval=True)
    reporter.dataset_size = dataset.get_dataset_size()
    reporter.linear_eval = True
    if args.pre_classifier_checkpoint_path:
        if args.run_cloudbrain:
            mox.file.copy_parallel(src_url=args.pre_classifier_checkpoint_path,
                                   dst_url=local_data_url + '/pre_classifier.ckpt')
            classifier_param = load_checkpoint(local_data_url + '/pre_classifier.ckpt')
        else:
            classifier_param = load_checkpoint(args.pre_classifier_checkpoint_path)
        load_param_into_net(classifier, classifier_param)
    else:
        # Pre-compute the frozen encoder features once, then train the classifier on them.
        dataset_train = []
        for _, data in enumerate(dataset, start=1):
            _, images, labels = data
            features = simclr_model.inference(images)
            dataset_train.append([features, labels])
        reporter.info('==========start training linear classifier===============')
        # Train.
        for _ in range(args.epoch_size):
            reporter.epoch_start()
            for idx, data in enumerate(dataset_train, start=1):
                features, labels = data
                out = net_Train(features, labels)
                reporter.step_end(out)
            reporter.epoch_end(classifier)
        reporter.info('==========end training linear classifier===============')
    dataset = create_dataset(args, dataset_mode="eval_classifier")
    reporter.dataset_size = dataset.get_dataset_size()
    net_Eval = Linear_Eval(net=classifier, loss=criterion)
    # Eval.
    reporter.info('==========start evaluating linear classifier===============')
    reporter.start_predict()
    for idx, data in enumerate(dataset, start=1):
        _, images, labels = data
        features = simclr_model.inference(images)
        batch_loss, batch_acc = net_Eval(features, labels)
        reporter.predict_step_end(batch_loss, batch_acc)
    reporter.end_predict()
    reporter.info('==========end evaluating linear classifier===============')
    if args.run_cloudbrain:
        mox.file.copy_parallel(src_url=_local_train_url, dst_url=args.train_url)
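# A hypothetical host-side invocation (the paths are assumptions). Note that
# --run_cloudbrain defaults to True, so it must be disabled explicitly on the host:
#   python linear_eval.py --run_cloudbrain=False --encoder_checkpoint_path=./simclr.ckpt \
#       --train_dataset_path=./cifar-10-batches-bin --eval_dataset_path=./cifar-10-verify-bin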
@@ -0,0 +1,64 @@
#!/bin/bash
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
# A simple tutorial follows; more parameters can be set.
if [ $# != 4 ]
then
    echo "Usage: sh run_distribution_ascend.sh [DEVICENUM] [RANK_TABLE_FILE] [cifar10] [TRAIN_DATASET_PATH]"
    exit 1
fi

get_real_path(){
    if [ "${1:0:1}" == "/" ]; then
        echo "$1"
    else
        echo "$(realpath -m $PWD/$1)"
    fi
}

if [ ! -f $2 ]
then
    echo "error: RANK_TABLE_FILE=$2 is not a file"
    exit 1
fi

ulimit -u unlimited
export DEVICE_NUM=$1
export RANK_SIZE=$1
RANK_TABLE_FILE=$(get_real_path $2)
export RANK_TABLE_FILE
export DATASET_NAME=$3
export TRAIN_DATASET_PATH=$(get_real_path $4)
echo "RANK_TABLE_FILE=${RANK_TABLE_FILE}"

export SERVER_ID=0
rank_start=$((DEVICE_NUM * SERVER_ID))
for((i=0; i<${DEVICE_NUM}; i++))
do
    export DEVICE_ID=$i
    export RANK_ID=$((rank_start + i))
    rm -rf ./train_parallel$i
    mkdir ./train_parallel$i
    cp -r ../src ./train_parallel$i
    cp ../train.py ./train_parallel$i
    echo "start training for rank $RANK_ID, device $DEVICE_ID"
    cd ./train_parallel$i || exit
    env > env.log
    python train.py --device_id=$i --dataset_name=$DATASET_NAME --train_dataset_path=$TRAIN_DATASET_PATH \
        --run_cloudbrain=False --run_distribute=True > log 2>&1 &
    cd ..
done
@@ -0,0 +1,37 @@
#!/bin/bash
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
# A simple tutorial follows; more parameters can be set.
if [ $# != 5 ]
then
    echo "Usage: sh run_standalone_eval_ascend.sh [cifar10] [DEVICE_ID] [SIMCLR_MODEL_PATH] [TRAIN_DATASET_PATH] [EVAL_DATASET_PATH]"
    exit 1
fi

script_self=$(readlink -f "$0")
self_path=$(dirname "${script_self}")
export DATASET_NAME=$1
export DEVICE_ID=$2
export SIMCLR_MODEL_PATH=$3
export TRAIN_DATASET_PATH=$4
export EVAL_DATASET_PATH=$5
python ${self_path}/../linear_eval.py --dataset_name=$DATASET_NAME \
    --encoder_checkpoint_path=$SIMCLR_MODEL_PATH \
    --train_dataset_path=$TRAIN_DATASET_PATH \
    --eval_dataset_path=$EVAL_DATASET_PATH \
    --device_id=$DEVICE_ID --device_target="Ascend" \
    --run_distribute=False --run_cloudbrain=False > eval_log 2>&1 &
@@ -0,0 +1,31 @@
#!/bin/bash
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
# A simple tutorial follows; more parameters can be set.
if [ $# != 3 ]
then
    echo "Usage: sh run_standalone_train_ascend.sh [cifar10] [TRAIN_DATASET_PATH] [DEVICE_ID]"
    exit 1
fi

script_self=$(readlink -f "$0")
self_path=$(dirname "${script_self}")
export DATASET_NAME=$1
export TRAIN_DATASET_PATH=$2
export DEVICE_ID=$3
python ${self_path}/../train.py --dataset_name=$DATASET_NAME --train_dataset_path=$TRAIN_DATASET_PATH \
    --device_id=$DEVICE_ID --device_target="Ascend" \
    --run_cloudbrain=False --run_distribute=False > log 2>&1 &
@@ -0,0 +1,94 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""
create train or eval dataset.
"""
import mindspore.common.dtype as mstype
import mindspore.dataset as ds
import mindspore.dataset.vision.c_transforms as C
import mindspore.dataset.transforms.c_transforms as C2
import mindspore.dataset.vision.py_transforms as py_vision
from mindspore.dataset.vision import Inter
import cv2
import numpy as np

ds.config.set_seed(0)


def gaussian_blur(im):
    # Kernel size is roughly one tenth of the image width, forced to be odd;
    # with sigma=0, OpenCV derives sigma from the kernel size.
    sigma = 0
    _, w = im.shape[:2]
    kernel_size = int(w // 10)
    if kernel_size % 2 == 0:
        kernel_size -= 1
    return np.array(cv2.GaussianBlur(im, (kernel_size, kernel_size), sigma))


def copy_column(x, y):
    # Duplicate the image column so the two augmented views start from the same image.
    return x, x, y


def create_dataset(args, dataset_mode, repeat_num=1):
    """
    create a train or evaluate cifar10 dataset for SimCLR
    """
    if args.dataset_name != "cifar10":
        raise ValueError("Unsupported dataset.")
    # "train_endcoder" (sic) is the mode string used for encoder pre-training.
    if dataset_mode in ("train_endcoder", "train_classifier"):
        dataset_path = args.train_dataset_path
    else:
        dataset_path = args.eval_dataset_path
    if args.run_distribute and args.device_target == "Ascend":
        data_set = ds.Cifar10Dataset(dataset_path, num_parallel_workers=8, shuffle=True,
                                     num_shards=args.device_num, shard_id=args.device_id)
    else:
        data_set = ds.Cifar10Dataset(dataset_path, num_parallel_workers=8, shuffle=True)
    # define map operations
    trans = []
    if dataset_mode == "train_endcoder":
        if args.use_crop:
            trans += [C.Resize(256, interpolation=Inter.BICUBIC)]
            trans += [C.RandomResizedCrop(size=(32, 32), scale=(0.31, 1),
                                          interpolation=Inter.BICUBIC, max_attempts=100)]
        if args.use_flip:
            trans += [C.RandomHorizontalFlip(prob=0.5)]
        if args.use_color_jitter:
            scale = 0.6
            color_jitter = C.RandomColorAdjust(0.8 * scale, 0.8 * scale, 0.8 * scale, 0.2 * scale)
            trans += [C2.RandomApply([color_jitter], prob=0.8)]
        if args.use_color_gray:
            trans += [py_vision.ToPIL(),
                      py_vision.RandomGrayscale(prob=0.2),
                      np.array]  # need to convert PIL image to a NumPy array to pass it to C++ operation
        if args.use_blur:
            trans += [C2.RandomApply([gaussian_blur], prob=0.8)]
        if args.use_norm:
            trans += [C.Normalize([0.4914, 0.4822, 0.4465], [0.2023, 0.1994, 0.2010])]
        trans += [C2.TypeCast(mstype.float32), C.HWC2CHW()]
    else:
        trans += [C.Resize(32)]
        trans += [C2.TypeCast(mstype.float32)]
        if args.use_norm:
            trans += [C.Normalize([0.4914, 0.4822, 0.4465], [0.2023, 0.1994, 0.2010])]
        trans += [C.HWC2CHW()]
    type_cast_op = C2.TypeCast(mstype.int32)
    data_set = data_set.map(operations=type_cast_op, input_columns="label", num_parallel_workers=8)
    data_set = data_set.map(operations=copy_column, input_columns=["image", "label"],
                            output_columns=["image1", "image2", "label"],
                            column_order=["image1", "image2", "label"], num_parallel_workers=8)
    data_set = data_set.map(operations=trans, input_columns=["image1"], num_parallel_workers=8)
    data_set = data_set.map(operations=trans, input_columns=["image2"], num_parallel_workers=8)
    # apply batch operations
    data_set = data_set.batch(args.batch_size, drop_remainder=True)
    # apply dataset repeat operation
    data_set = data_set.repeat(repeat_num)
    return data_set
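
if __name__ == "__main__":
    # Minimal smoke test (an assumption, not part of the training pipeline): the
    # namespace mirrors the argparse flags read by create_dataset, and the dataset
    # paths follow the directory layout described in the README.
    from types import SimpleNamespace
    demo_args = SimpleNamespace(
        dataset_name="cifar10",
        train_dataset_path="./cifar-10-batches-bin",
        eval_dataset_path="./cifar-10-verify-bin",
        run_distribute=False, device_target="Ascend", device_num=1, device_id=0,
        use_crop=True, use_flip=True, use_color_jitter=True,
        use_color_gray=True, use_blur=True, use_norm=False,
        batch_size=128)
    # On a 32x32 image gaussian_blur uses a 3x3 kernel (32 // 10 == 3, already odd).
    print(gaussian_blur(np.zeros((32, 32, 3), dtype=np.uint8)).shape)
    demo_set = create_dataset(demo_args, dataset_mode="train_endcoder")
    print("steps per epoch:", demo_set.get_dataset_size())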
@@ -0,0 +1,198 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""learning rate generator"""
import math
import numpy as np


def _generate_steps_lr(lr_init, lr_max, total_steps, warmup_steps):
    """
    Applies three-step decay to generate the learning rate values.
    """
    decay_epoch_index = [0.3 * total_steps, 0.6 * total_steps, 0.8 * total_steps]
    lr_each_step = []
    for i in range(total_steps):
        if i < warmup_steps:
            lr = lr_init + (lr_max - lr_init) * i / warmup_steps
        else:
            if i < decay_epoch_index[0]:
                lr = lr_max
            elif i < decay_epoch_index[1]:
                lr = lr_max * 0.1
            elif i < decay_epoch_index[2]:
                lr = lr_max * 0.01
            else:
                lr = lr_max * 0.001
        lr_each_step.append(lr)
    return lr_each_step


def _generate_poly_lr(lr_init, lr_end, lr_max, total_steps, warmup_steps):
    """
    Applies polynomial decay to generate the learning rate values.

    Args:
        lr_init(float): init learning rate.
        lr_end(float): end learning rate.
        lr_max(float): max learning rate.
        total_steps(int): all steps in training.
        warmup_steps(int): all steps in warmup epochs.

    Returns:
        list, learning rate for each step.
    """
    lr_each_step = []
    if warmup_steps != 0:
        inc_each_step = (float(lr_max) - float(lr_init)) / float(warmup_steps)
    else:
        inc_each_step = 0
    for i in range(total_steps):
        if i < warmup_steps:
            lr = float(lr_init) + inc_each_step * float(i)
        else:
            base = (1.0 - (float(i) - float(warmup_steps)) / (float(total_steps) - float(warmup_steps)))
            lr = float(lr_max) * base * base
            if lr < 0.0:
                lr = 0.0
        lr_each_step.append(lr)
    return lr_each_step


def _generate_cosine_lr(lr_init, lr_end, lr_max, total_steps, warmup_steps):
    """
    Applies cosine decay to generate the learning rate values.

    Args:
        lr_init(float): init learning rate.
        lr_end(float): end learning rate.
        lr_max(float): max learning rate.
        total_steps(int): all steps in training.
        warmup_steps(int): all steps in warmup epochs.

    Returns:
        list, learning rate for each step.
    """
    decay_steps = total_steps - warmup_steps
    lr_each_step = []
    for i in range(total_steps):
        if i < warmup_steps:
            lr_inc = (float(lr_max) - float(lr_init)) / float(warmup_steps)
            lr = float(lr_init) + lr_inc * (i + 1)
        else:
            linear_decay = (total_steps - i) / decay_steps
            cosine_decay = 0.5 * (1 + math.cos(math.pi * 2 * 0.47 * i / decay_steps))
            decayed = linear_decay * cosine_decay + 0.00001
            lr = lr_max * decayed
        lr_each_step.append(lr)
    return lr_each_step


def _generate_linear_lr(lr_init, lr_end, lr_max, total_steps, warmup_steps):
    """
    Applies linear decay to generate the learning rate values.

    Args:
        lr_init(float): init learning rate.
        lr_end(float): end learning rate.
        lr_max(float): max learning rate.
        total_steps(int): all steps in training.
        warmup_steps(int): all steps in warmup epochs.

    Returns:
        list, learning rate for each step.
    """
    lr_each_step = []
    for i in range(total_steps):
        if i < warmup_steps:
            lr = lr_init + (lr_max - lr_init) * i / warmup_steps
        else:
            lr = lr_max - (lr_max - lr_end) * (i - warmup_steps) / (total_steps - warmup_steps)
        lr_each_step.append(lr)
    return lr_each_step


def get_lr(lr_init, lr_end, lr_max, warmup_epochs, total_epochs, steps_per_epoch, lr_decay_mode):
    """
    generate learning rate array

    Args:
        lr_init(float): init learning rate
        lr_end(float): end learning rate
        lr_max(float): max learning rate
        warmup_epochs(int): number of warmup epochs
        total_epochs(int): total epochs of training
        steps_per_epoch(int): steps of one epoch
        lr_decay_mode(string): learning rate decay mode, including steps, poly, cosine or linear (default)

    Returns:
        np.array, learning rate array
    """
    lr_each_step = []
    total_steps = steps_per_epoch * total_epochs
    warmup_steps = steps_per_epoch * warmup_epochs
    if lr_decay_mode == 'steps':
        lr_each_step = _generate_steps_lr(lr_init, lr_max, total_steps, warmup_steps)
    elif lr_decay_mode == 'poly':
        lr_each_step = _generate_poly_lr(lr_init, lr_end, lr_max, total_steps, warmup_steps)
    elif lr_decay_mode == 'cosine':
        lr_each_step = _generate_cosine_lr(lr_init, lr_end, lr_max, total_steps, warmup_steps)
    else:
        lr_each_step = _generate_linear_lr(lr_init, lr_end, lr_max, total_steps, warmup_steps)
    lr_each_step = np.array(lr_each_step).astype(np.float32)
    return lr_each_step


def linear_warmup_lr(current_step, warmup_steps, base_lr, init_lr):
    lr_inc = (float(base_lr) - float(init_lr)) / float(warmup_steps)
    lr = float(init_lr) + lr_inc * current_step
    return lr


def warmup_cosine_annealing_lr(lr, steps_per_epoch, warmup_epochs, max_epoch=120, global_step=0):
    """
    generate learning rate array with cosine annealing

    Args:
        lr(float): base learning rate
        steps_per_epoch(int): steps of one epoch
        warmup_epochs(int): number of warmup epochs
        max_epoch(int): total epochs of training
        global_step(int): the current start index of the lr array

    Returns:
        np.array, learning rate array
    """
    base_lr = lr
    warmup_init_lr = 0
    total_steps = int(max_epoch * steps_per_epoch)
    warmup_steps = int(warmup_epochs * steps_per_epoch)
    decay_steps = total_steps - warmup_steps
    lr_each_step = []
    for i in range(total_steps):
        if i < warmup_steps:
            lr = linear_warmup_lr(i + 1, warmup_steps, base_lr, warmup_init_lr)
        else:
            linear_decay = (total_steps - i) / decay_steps
            cosine_decay = 0.5 * (1 + math.cos(math.pi * 2 * 0.47 * i / decay_steps))
            decayed = linear_decay * cosine_decay + 0.00001
            lr = base_lr * decayed
        lr_each_step.append(lr)
    lr_each_step = np.array(lr_each_step).astype(np.float32)
    learning_rate = lr_each_step[global_step:]
    return learning_rate
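
if __name__ == "__main__":
    # Sanity check with toy numbers (an illustration, not used by training):
    # 2 warmup epochs followed by linear decay over 10 epochs of 5 steps each.
    demo_lr = get_lr(lr_init=1e-4, lr_end=1e-6, lr_max=9e-4,
                     warmup_epochs=2, total_epochs=10, steps_per_epoch=5,
                     lr_decay_mode="linear")
    print(demo_lr.shape)                         # (50,) == total_epochs * steps_per_epoch
    print(demo_lr[0], demo_lr[10], demo_lr[-1])  # lr_init -> lr_max -> decaying toward lr_end
    demo_cos = warmup_cosine_annealing_lr(0.1, steps_per_epoch=5, warmup_epochs=2, max_epoch=10)
    print(demo_cos[9], demo_cos[-1])             # end of warmup (base lr), final annealed value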
@@ -0,0 +1,91 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""SimCLR Loss class."""
from mindspore import Tensor
from mindspore import ops as P
from mindspore.common import dtype as mstype
import mindspore.nn as nn


class CrossEntropyLoss(nn.Cell):
    """
    Cross Entropy Loss for one-hot labels.
    """
    def __init__(self, reduction="mean"):
        super(CrossEntropyLoss, self).__init__()
        self.cross_entropy = P.SoftmaxCrossEntropyWithLogits()
        if reduction == "sum":
            self.reduction = P.ReduceSum()
        if reduction == "mean":
            self.reduction = P.ReduceMean()
        self.one_hot = P.OneHot()
        self.one = Tensor(1.0, mstype.float32)
        self.zero = Tensor(0.0, mstype.float32)

    def construct(self, logits, label):
        loss = self.cross_entropy(logits, label)[0]
        loss = self.reduction(loss, (-1,))
        return loss


class NT_Xent_Loss(nn.Cell):
    """
    NT-Xent (normalized temperature-scaled cross entropy) loss for SimCLR.
    """
    def __init__(self, batch_size, temperature=1, world_size=1):
        super(NT_Xent_Loss, self).__init__()
        # Parameters.
        self.LARGE_NUM = 1e9
        self.batch_size = batch_size
        self.temperature = temperature
        self.world_size = world_size
        self.N = 2 * self.batch_size * self.world_size
        # Tail_Loss.
        self.criterion = CrossEntropyLoss(reduction="mean")
        self.norm = P.L2Normalize(axis=1)
        self.one_hot = P.OneHot()
        self.range = nn.Range(0, self.batch_size)
        self.one = Tensor(1.0, mstype.float32)
        self.zero = Tensor(0.0, mstype.float32)
        self.transpose = P.Transpose()
        self.matmul = nn.MatMul()
        # Operations.
        self.ones = P.Ones()
        self.zeros = P.Zeros()
        self.cat1 = P.Concat(axis=1)

    def construct(self, z_i, z_j):
        """
        Forward.
        """
        hidden1 = self.norm(z_i)
        hidden2 = self.norm(z_j)
        hidden1_large = hidden1
        hidden2_large = hidden2
        ones_mask = self.range()
        zeros_mask = self.zeros((self.batch_size, self.batch_size), mstype.float32)
        # One-hot mask marking each sample's own index; used both as the positive-pair
        # labels and to exclude self-similarity from the logits via LARGE_NUM.
        masks = self.one_hot(ones_mask, self.batch_size, self.one, self.zero)
        labels = self.cat1((masks, zeros_mask))
        logits_aa = self.matmul(hidden1, self.transpose(hidden1_large, (1, 0))) / self.temperature
        logits_aa = logits_aa - masks * self.LARGE_NUM
        logits_bb = self.matmul(hidden2, self.transpose(hidden2_large, (1, 0))) / self.temperature
        logits_bb = logits_bb - masks * self.LARGE_NUM
        logits_ab = self.matmul(hidden1, self.transpose(hidden2_large, (1, 0))) / self.temperature
        logits_ba = self.matmul(hidden2, self.transpose(hidden1_large, (1, 0))) / self.temperature
        loss_a = self.criterion(self.cat1((logits_ab, logits_aa)), labels)
        loss_b = self.criterion(self.cat1((logits_ba, logits_bb)), labels)
        loss = loss_a + loss_b
        return loss
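
if __name__ == "__main__":
    # Smoke test on random projections (an illustration; the batch size is arbitrary).
    import numpy as np
    demo_batch, demo_dim = 8, 128
    loss_fn = NT_Xent_Loss(demo_batch, temperature=0.5)
    z_a = Tensor(np.random.randn(demo_batch, demo_dim), mstype.float32)
    z_b = Tensor(np.random.randn(demo_batch, demo_dim), mstype.float32)
    print(loss_fn(z_a, z_b))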
@@ -0,0 +1,52 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""optimizer generator"""
from mindspore import nn, Tensor
from .lr_generator import get_lr


def get_train_optimizer(net, steps_per_epoch, args):
    """
    generate optimizer for updating the weights.
    """
    if args.optimizer == "Adam":
        lr = get_lr(lr_init=1e-4, lr_end=1e-6, lr_max=9e-4,
                    warmup_epochs=args.warmup_epochs, total_epochs=args.epoch_size,
                    steps_per_epoch=steps_per_epoch,
                    lr_decay_mode="linear")
        lr = Tensor(lr)
        # Exclude BatchNorm parameters (beta/gamma) and biases from weight decay.
        decayed_params = []
        no_decayed_params = []
        for param in net.trainable_params():
            if 'beta' not in param.name and 'gamma' not in param.name and 'bias' not in param.name:
                decayed_params.append(param)
            else:
                no_decayed_params.append(param)
        group_params = [{'params': decayed_params, 'weight_decay': args.weight_decay},
                        {'params': no_decayed_params},
                        {'order_params': net.trainable_params()}]
        optimizer = nn.Adam(params=group_params, learning_rate=lr)
    else:
        raise ValueError("Unsupported optimizer.")
    return optimizer


def get_eval_optimizer(net, steps_per_epoch, args):
    """
    generate optimizer for training the linear classifier.
    """
    lr = get_lr(lr_init=1e-3, lr_end=6e-6, lr_max=1e-2,
                warmup_epochs=5, total_epochs=args.epoch_size,
                steps_per_epoch=steps_per_epoch,
                lr_decay_mode="linear")
    lr = Tensor(lr)
    optimizer = nn.Adam(params=net.trainable_params(), learning_rate=lr)
    return optimizer
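
if __name__ == "__main__":
    # Hypothetical wiring check: a single Dense layer stands in for the SimCLR
    # network, and the weight_decay / epoch numbers here are assumed values.
    from types import SimpleNamespace
    demo_args = SimpleNamespace(optimizer="Adam", warmup_epochs=10,
                                epoch_size=100, weight_decay=1e-6)
    demo_net = nn.Dense(128, 10)
    demo_opt = get_train_optimizer(demo_net, steps_per_epoch=48, args=demo_args)
    print(demo_opt)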
@@ -0,0 +1,135 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""Reporter class."""
import logging
import os
import time
from datetime import datetime
from mindspore.train.serialization import save_checkpoint


class Reporter(logging.Logger):
    """
    This class includes several functions that can save checkpoints and print/save logging information.
    """
    def __init__(self, args, linear_eval):
        super(Reporter, self).__init__("clean")
        self.log_dir = os.path.join(args.train_output_path, 'log')
        if not os.path.exists(self.log_dir):
            os.makedirs(self.log_dir, exist_ok=True)
        # The checkpoint dir is needed by both the encoder and the linear-eval runs.
        self.ckpts_dir = os.path.join(args.train_output_path, "checkpoint")
        if not os.path.exists(self.ckpts_dir):
            os.makedirs(self.ckpts_dir, exist_ok=True)
        self.rank = args.rank
        self.save_checkpoint_epochs = args.save_checkpoint_epochs
        formatter = logging.Formatter('%(message)s')
        # console handler
        console = logging.StreamHandler()
        console.setLevel(logging.INFO)
        console.setFormatter(formatter)
        self.addHandler(console)
        # file handler
        log_name = datetime.now().strftime('%Y-%m-%d_time_%H_%M_%S') + '_rank_{}.log'.format(self.rank)
        self.log_fn = os.path.join(self.log_dir, log_name)
        fh = logging.FileHandler(self.log_fn)
        fh.setLevel(logging.INFO)
        fh.setFormatter(formatter)
        self.addHandler(fh)
        if args:
            self.save_args(args)
        self.step = 0
        self.epoch = 0
        self.dataset_size = 0
        self.print_iter = args.print_iter
        self.contrastive_loss = []
        self.linear_eval = linear_eval
        self.Loss = 0
        self.Acc = 0

    def info(self, msg, *args, **kwargs):
        if self.isEnabledFor(logging.INFO):
            self._log(logging.INFO, msg, args, **kwargs)

    def save_args(self, args):
        self.info('Args:')
        args_dict = vars(args)
        for key in args_dict.keys():
            self.info('--> %s: %s', key, args_dict[key])
        self.info('')

    def important_info(self, msg, *args, **kwargs):
        if self.isEnabledFor(logging.INFO) and self.rank == 0:
            line_width = 2
            important_msg = '\n'
            important_msg += ('*' * 70 + '\n') * line_width
            important_msg += ('*' * line_width + '\n') * 2
            important_msg += '*' * line_width + ' ' * 8 + msg + '\n'
            important_msg += ('*' * line_width + '\n') * 2
            important_msg += ('*' * 70 + '\n') * line_width
            self.info(important_msg, *args, **kwargs)

    def epoch_start(self):
        self.step_start_time = time.time()
        self.epoch_start_time = time.time()
        self.step = 0
        self.epoch += 1
        self.contrastive_loss = []

    def step_end(self, loss):
        """print log when step end."""
        self.step += 1
        self.contrastive_loss.append(loss.asnumpy())
        if self.step % self.print_iter == 0:
            step_cost = (time.time() - self.step_start_time) * 1000 / self.print_iter
            self.info("Epoch[{}] [{}/{}] step cost: {:.2f} ms, loss: {}".format(
                self.epoch, self.step, self.dataset_size, step_cost, loss))
            self.step_start_time = time.time()

    def epoch_end(self, net):
        """print log and save checkpoints when epoch end."""
        epoch_cost = (time.time() - self.epoch_start_time) * 1000
        per_step_time = epoch_cost / self.dataset_size
        mean_loss = sum(self.contrastive_loss) / self.dataset_size
        self.info("Epoch [{}] total cost: {:.2f} ms, per step: {:.2f} ms, mean_loss: {:.2f}"
                  .format(self.epoch, epoch_cost, per_step_time, mean_loss))
        if self.epoch % self.save_checkpoint_epochs == 0:
            if self.linear_eval:
                save_checkpoint(net, os.path.join(self.ckpts_dir, f"linearClassifier_{self.epoch}.ckpt"))
            else:
                save_checkpoint(net, os.path.join(self.ckpts_dir, f"simclr_{self.epoch}.ckpt"))

    def start_predict(self):
        self.predict_start_time = time.time()
        self.step = 0
        self.info('==========start predict===============')

    def end_predict(self):
        avg_loss = self.Loss / self.step
        avg_acc = self.Acc / self.step
        self.info('Average loss {:.5f}, Average accuracy {:.5f}'.format(avg_loss, avg_acc))
        self.info('==========end predict===============\n')

    def predict_step_end(self, loss, acc):
        self.step += 1
        self.Loss = self.Loss + loss
        self.Acc = self.Acc + acc
        if self.step % self.print_iter == 0:
            current_loss = self.Loss / self.step
            current_acc = self.Acc / self.step
            self.info('[{}/{}] Current total loss {:.5f}, Current total accuracy {:.5f}'
                      .format(self.step, self.dataset_size, current_loss, current_acc))
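
if __name__ == "__main__":
    # Hypothetical smoke test: construct a Reporter with the minimal attributes it reads.
    from types import SimpleNamespace
    demo_args = SimpleNamespace(train_output_path="./outputs", rank=0,
                                save_checkpoint_epochs=1, print_iter=1)
    demo_reporter = Reporter(demo_args, linear_eval=True)
    demo_reporter.info("reporter ready, logging to %s", demo_reporter.log_fn)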
| @@ -0,0 +1,485 @@ | |||||
| # Copyright 2021 Huawei Technologies Co., Ltd | |||||
| # | |||||
| # Licensed under the Apache License, Version 2.0 (the "License"); | |||||
| # you may not use this file except in compliance with the License. | |||||
| # You may obtain a copy of the License at | |||||
| # | |||||
| # http://www.apache.org/licenses/LICENSE-2.0 | |||||
| # | |||||
| # Unless required by applicable law or agreed to in writing, software | |||||
| # distributed under the License is distributed on an "AS IS" BASIS, | |||||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | |||||
| # See the License for the specific language governing permissions and | |||||
| # limitations under the License. | |||||
| # ============================================================================ | |||||
| """SimCLR ResNet.""" | |||||
| import math | |||||
| import numpy as np | |||||
| import mindspore.nn as nn | |||||
| import mindspore.common.dtype as mstype | |||||
| from mindspore.ops import operations as P | |||||
| from mindspore.ops import functional as F | |||||
| from mindspore.common.tensor import Tensor | |||||
| from scipy.stats import truncnorm | |||||
| def _conv_variance_scaling_initializer(in_channel, out_channel, kernel_size): | |||||
| fan_in = in_channel * kernel_size * kernel_size | |||||
| scale = 1.0 | |||||
| scale /= max(1., fan_in) | |||||
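    # 0.87962566103423978 is the stddev of a unit Gaussian truncated to (-2, 2);
    # dividing by it restores the intended stddev of sqrt(scale) after truncation.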
| stddev = (scale ** 0.5) / .87962566103423978 | |||||
| mu, sigma = 0, stddev | |||||
| weight = truncnorm(-2, 2, loc=mu, scale=sigma).rvs(out_channel * in_channel * kernel_size * kernel_size) | |||||
| weight = np.reshape(weight, (out_channel, in_channel, kernel_size, kernel_size)) | |||||
| return Tensor(weight, dtype=mstype.float32) | |||||
| def _weight_variable(shape, factor=0.01): | |||||
| init_value = np.random.randn(*shape).astype(np.float32) * factor | |||||
| return Tensor(init_value) | |||||
| def calculate_gain(nonlinearity, param=None): | |||||
| """calculate_gain""" | |||||
| linear_fns = ['linear', 'conv1d', 'conv2d', 'conv3d', 'conv_transpose1d', 'conv_transpose2d', 'conv_transpose3d'] | |||||
| res = 0 | |||||
| if nonlinearity in linear_fns or nonlinearity == 'sigmoid': | |||||
| res = 1 | |||||
| elif nonlinearity == 'tanh': | |||||
| res = 5.0 / 3 | |||||
| elif nonlinearity == 'relu': | |||||
| res = math.sqrt(2.0) | |||||
| elif nonlinearity == 'leaky_relu': | |||||
| if param is None: | |||||
| negative_slope = 0.01 | |||||
| elif not isinstance(param, bool) and isinstance(param, int) or isinstance(param, float): | |||||
| # True/False are instances of int, hence check above | |||||
| negative_slope = param | |||||
| else: | |||||
| raise ValueError("negative_slope {} not a valid number".format(param)) | |||||
| res = math.sqrt(2.0 / (1 + negative_slope ** 2)) | |||||
| else: | |||||
| raise ValueError("Unsupported nonlinearity {}".format(nonlinearity)) | |||||
| return res | |||||
| def _calculate_fan_in_and_fan_out(tensor): | |||||
| """_calculate_fan_in_and_fan_out""" | |||||
| dimensions = len(tensor) | |||||
| if dimensions < 2: | |||||
| raise ValueError("Fan in and fan out can not be computed for tensor with fewer than 2 dimensions") | |||||
| if dimensions == 2: # Linear | |||||
| fan_in = tensor[1] | |||||
| fan_out = tensor[0] | |||||
| else: | |||||
| num_input_fmaps = tensor[1] | |||||
| num_output_fmaps = tensor[0] | |||||
| receptive_field_size = 1 | |||||
| if dimensions > 2: | |||||
| receptive_field_size = tensor[2] * tensor[3] | |||||
| fan_in = num_input_fmaps * receptive_field_size | |||||
| fan_out = num_output_fmaps * receptive_field_size | |||||
| return fan_in, fan_out | |||||
| def _calculate_correct_fan(tensor, mode): | |||||
| mode = mode.lower() | |||||
| valid_modes = ['fan_in', 'fan_out'] | |||||
| if mode not in valid_modes: | |||||
| raise ValueError("Mode {} not supported, please use one of {}".format(mode, valid_modes)) | |||||
| fan_in, fan_out = _calculate_fan_in_and_fan_out(tensor) | |||||
| return fan_in if mode == 'fan_in' else fan_out | |||||
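# He (Kaiming) initializers: std = gain / sqrt(fan); the uniform variant uses
# bound = sqrt(3) * std so both distributions have the same variance.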
| def kaiming_normal(inputs_shape, a=0, mode='fan_in', nonlinearity='leaky_relu'): | |||||
| fan = _calculate_correct_fan(inputs_shape, mode) | |||||
| gain = calculate_gain(nonlinearity, a) | |||||
| std = gain / math.sqrt(fan) | |||||
| return np.random.normal(0, std, size=inputs_shape).astype(np.float32) | |||||
| def kaiming_uniform(inputs_shape, a=0., mode='fan_in', nonlinearity='leaky_relu'): | |||||
| fan = _calculate_correct_fan(inputs_shape, mode) | |||||
| gain = calculate_gain(nonlinearity, a) | |||||
| std = gain / math.sqrt(fan) | |||||
| bound = math.sqrt(3.0) * std # Calculate uniform bounds from standard deviation | |||||
| return np.random.uniform(-bound, bound, size=inputs_shape).astype(np.float32) | |||||
| def _conv3x3(in_channel, out_channel, stride=1, use_se=False): | |||||
| if use_se: | |||||
| weight = _conv_variance_scaling_initializer(in_channel, out_channel, kernel_size=3) | |||||
| else: | |||||
| weight_shape = (out_channel, in_channel, 3, 3) | |||||
| weight = Tensor(kaiming_normal(weight_shape, mode="fan_out", nonlinearity='relu')) | |||||
| return nn.Conv2d(in_channel, out_channel, | |||||
| kernel_size=3, stride=stride, padding=0, pad_mode='same', weight_init=weight) | |||||
| def _conv1x1(in_channel, out_channel, stride=1, use_se=False): | |||||
| if use_se: | |||||
| weight = _conv_variance_scaling_initializer(in_channel, out_channel, kernel_size=1) | |||||
| else: | |||||
| weight_shape = (out_channel, in_channel, 1, 1) | |||||
| weight = Tensor(kaiming_normal(weight_shape, mode="fan_out", nonlinearity='relu')) | |||||
| return nn.Conv2d(in_channel, out_channel, | |||||
| kernel_size=1, stride=stride, padding=0, pad_mode='same', weight_init=weight) | |||||
| def _conv7x7(in_channel, out_channel, stride=1, use_se=False): | |||||
| if use_se: | |||||
| weight = _conv_variance_scaling_initializer(in_channel, out_channel, kernel_size=7) | |||||
| else: | |||||
| weight_shape = (out_channel, in_channel, 7, 7) | |||||
| weight = Tensor(kaiming_normal(weight_shape, mode="fan_out", nonlinearity='relu')) | |||||
| return nn.Conv2d(in_channel, out_channel, | |||||
| kernel_size=7, stride=stride, padding=0, pad_mode='same', weight_init=weight) | |||||
| def _bn(channel): | |||||
| return nn.BatchNorm2d(channel, eps=1e-4, momentum=0.9, | |||||
| gamma_init=1, beta_init=0, moving_mean_init=0, moving_var_init=1) | |||||
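# _bn_last zero-initializes gamma so each residual block's final BN starts as a
# zero mapping, letting blocks behave like identity at the start of training.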
| def _bn_last(channel): | |||||
| return nn.BatchNorm2d(channel, eps=1e-4, momentum=0.9, | |||||
| gamma_init=0, beta_init=0, moving_mean_init=0, moving_var_init=1) | |||||
| def _fc(in_channel, out_channel, use_se=False): | |||||
| if use_se: | |||||
| weight = np.random.normal(loc=0, scale=0.01, size=out_channel*in_channel) | |||||
| weight = Tensor(np.reshape(weight, (out_channel, in_channel)), dtype=mstype.float32) | |||||
| else: | |||||
| weight_shape = (out_channel, in_channel) | |||||
| weight = Tensor(kaiming_uniform(weight_shape, a=math.sqrt(5))) | |||||
| return nn.Dense(in_channel, out_channel, has_bias=True, weight_init=weight, bias_init=0) | |||||
| class ResidualBlock(nn.Cell): | |||||
| """ | |||||
| ResNet V1 residual block definition. | |||||
| Args: | |||||
| in_channel (int): Input channel. | |||||
| out_channel (int): Output channel. | |||||
| stride (int): Stride size for the first convolutional layer. Default: 1. | |||||
| use_se (bool): enable SE-ResNet50 net. Default: False. | |||||
| se_block(bool): use se block in SE-ResNet50 net. Default: False. | |||||
| Returns: | |||||
| Tensor, output tensor. | |||||
| Examples: | |||||
| >>> ResidualBlock(3, 256, stride=2) | |||||
| """ | |||||
| expansion = 4 | |||||
| def __init__(self, | |||||
| in_channel, | |||||
| out_channel, | |||||
| stride=1, | |||||
| use_se=False, se_block=False): | |||||
| super(ResidualBlock, self).__init__() | |||||
| self.stride = stride | |||||
| self.use_se = use_se | |||||
| self.se_block = se_block | |||||
| channel = out_channel // self.expansion | |||||
| self.conv1 = _conv1x1(in_channel, channel, stride=1, use_se=self.use_se) | |||||
| self.bn1 = _bn(channel) | |||||
| if self.use_se and self.stride != 1: | |||||
| self.e2 = nn.SequentialCell([_conv3x3(channel, channel, stride=1, use_se=True), _bn(channel), | |||||
| nn.ReLU(), nn.MaxPool2d(kernel_size=2, stride=2, pad_mode='same')]) | |||||
| else: | |||||
| self.conv2 = _conv3x3(channel, channel, stride=stride, use_se=self.use_se) | |||||
| self.bn2 = _bn(channel) | |||||
| self.conv3 = _conv1x1(channel, out_channel, stride=1, use_se=self.use_se) | |||||
| self.bn3 = _bn_last(out_channel) | |||||
| if self.se_block: | |||||
| self.se_global_pool = P.ReduceMean(keep_dims=False) | |||||
| self.se_dense_0 = _fc(out_channel, int(out_channel/4), use_se=self.use_se) | |||||
| self.se_dense_1 = _fc(int(out_channel/4), out_channel, use_se=self.use_se) | |||||
| self.se_sigmoid = nn.Sigmoid() | |||||
| self.se_mul = P.Mul() | |||||
| self.relu = nn.ReLU() | |||||
| self.down_sample = False | |||||
| if stride != 1 or in_channel != out_channel: | |||||
| self.down_sample = True | |||||
| self.down_sample_layer = None | |||||
| if self.down_sample: | |||||
| if self.use_se: | |||||
| if stride == 1: | |||||
| self.down_sample_layer = nn.SequentialCell([_conv1x1(in_channel, out_channel, | |||||
| stride, use_se=self.use_se), _bn(out_channel)]) | |||||
| else: | |||||
| self.down_sample_layer = nn.SequentialCell([nn.MaxPool2d(kernel_size=2, stride=2, pad_mode='same'), | |||||
| _conv1x1(in_channel, out_channel, 1, | |||||
| use_se=self.use_se), _bn(out_channel)]) | |||||
| else: | |||||
| self.down_sample_layer = nn.SequentialCell([_conv1x1(in_channel, out_channel, stride, | |||||
| use_se=self.use_se), _bn(out_channel)]) | |||||
| self.add = F.tensor_add | |||||
| def construct(self, x): | |||||
| """ | |||||
| Forward. | |||||
| """ | |||||
| identity = x | |||||
| out = self.conv1(x) | |||||
| out = self.bn1(out) | |||||
| out = self.relu(out) | |||||
| if self.use_se and self.stride != 1: | |||||
| out = self.e2(out) | |||||
| else: | |||||
| out = self.conv2(out) | |||||
| out = self.bn2(out) | |||||
| out = self.relu(out) | |||||
| out = self.conv3(out) | |||||
| out = self.bn3(out) | |||||
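        # Squeeze-and-Excitation: pool to per-channel statistics, squeeze
        # through a bottleneck MLP, then rescale the feature map channel-wise.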
| if self.se_block: | |||||
| out_se = out | |||||
| out = self.se_global_pool(out, (2, 3)) | |||||
| out = self.se_dense_0(out) | |||||
| out = self.relu(out) | |||||
| out = self.se_dense_1(out) | |||||
| out = self.se_sigmoid(out) | |||||
| out = F.reshape(out, F.shape(out) + (1, 1)) | |||||
| out = self.se_mul(out, out_se) | |||||
| if self.down_sample: | |||||
| identity = self.down_sample_layer(identity) | |||||
| out = self.add(out, identity) | |||||
| out = self.relu(out) | |||||
| return out | |||||
class Identity(nn.Cell):
    """Pass-through cell that returns its input unchanged."""
    def construct(self, x):
        return x
| class ResNet(nn.Cell): | |||||
| """ | |||||
| ResNet architecture. | |||||
| Args: | |||||
| block (Cell): Block for network. | |||||
| layer_nums (list): Numbers of block in different layers. | |||||
| in_channels (list): Input channel in each layer. | |||||
| out_channels (list): Output channel in each layer. | |||||
| strides (list): Stride size in each layer. | |||||
| num_classes (int): The number of classes that the training images are belonging to. | |||||
| use_se (bool): enable SE-ResNet50 net. Default: False. | |||||
| se_block(bool): use se block in SE-ResNet50 net in layer 3 and layer 4. Default: False. | |||||
| Returns: | |||||
| Tensor, output tensor. | |||||
| Examples: | |||||
| >>> ResNet(ResidualBlock, | |||||
| >>> [3, 4, 6, 3], | |||||
| >>> [64, 256, 512, 1024], | |||||
| >>> [256, 512, 1024, 2048], | |||||
| >>> [1, 2, 2, 2], | |||||
| >>> 10) | |||||
| """ | |||||
| def __init__(self, | |||||
| block, | |||||
| layer_nums, | |||||
| in_channels, | |||||
| out_channels, | |||||
| strides, | |||||
| num_classes, | |||||
| width_multiplier, | |||||
| cifar_stem, | |||||
| use_se=False): | |||||
| super(ResNet, self).__init__() | |||||
| if not len(layer_nums) == len(in_channels) == len(out_channels) == 4: | |||||
| raise ValueError("the length of layer_num, in_channels, out_channels list must be 4!") | |||||
| self.use_se = use_se | |||||
| self.se_block = False | |||||
| if self.use_se: | |||||
| self.se_block = True | |||||
| if self.use_se: | |||||
| self.conv1_0 = _conv3x3(3, 32, stride=2, use_se=self.use_se) | |||||
| self.bn1_0 = _bn(32) | |||||
| self.conv1_1 = _conv3x3(32, 32, stride=1, use_se=self.use_se) | |||||
| self.bn1_1 = _bn(32) | |||||
| self.conv1_2 = _conv3x3(32, 64, stride=1, use_se=self.use_se) | |||||
| else: | |||||
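            # CIFAR stem: a single 3x3 stride-1 conv (with the max-pool below
            # disabled) replaces the ImageNet 7x7 stride-2 stem, as in the
            # SimCLR paper's CIFAR-10 setup.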
| if cifar_stem: | |||||
| self.conv1 = _conv3x3(3, 64 * width_multiplier, stride=1) # cifar | |||||
| else: | |||||
| self.conv1 = _conv7x7(3, 64 * width_multiplier, stride=2) | |||||
| self.bn1 = _bn(64 * width_multiplier) | |||||
| self.relu = P.ReLU() | |||||
| if cifar_stem: | |||||
| self.maxpool = Identity() # cifar | |||||
| else: | |||||
| self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, pad_mode="same") | |||||
| in_channels = [i * width_multiplier for i in in_channels] | |||||
| out_channels = [i * width_multiplier for i in out_channels] | |||||
| self.layer1 = self._make_layer(block, | |||||
| layer_nums[0], | |||||
| in_channel=in_channels[0], | |||||
| out_channel=out_channels[0], | |||||
| stride=strides[0], | |||||
| use_se=self.use_se) | |||||
| self.layer2 = self._make_layer(block, | |||||
| layer_nums[1], | |||||
| in_channel=in_channels[1], | |||||
| out_channel=out_channels[1], | |||||
| stride=strides[1], | |||||
| use_se=self.use_se) | |||||
| self.layer3 = self._make_layer(block, | |||||
| layer_nums[2], | |||||
| in_channel=in_channels[2], | |||||
| out_channel=out_channels[2], | |||||
| stride=strides[2], | |||||
| use_se=self.use_se, | |||||
| se_block=self.se_block) | |||||
| self.layer4 = self._make_layer(block, | |||||
| layer_nums[3], | |||||
| in_channel=in_channels[3], | |||||
| out_channel=out_channels[3], | |||||
| stride=strides[3], | |||||
| use_se=self.use_se, | |||||
| se_block=self.se_block) | |||||
| self.mean = P.ReduceMean(keep_dims=True) | |||||
| self.flatten = nn.Flatten() | |||||
| self.end_point = _fc(out_channels[3], num_classes, use_se=self.use_se) | |||||
| def _make_layer(self, block, layer_num, in_channel, out_channel, stride, use_se=False, se_block=False): | |||||
| """ | |||||
| Make stage network of ResNet. | |||||
| Args: | |||||
| block (Cell): Resnet block. | |||||
| layer_num (int): Layer number. | |||||
| in_channel (int): Input channel. | |||||
| out_channel (int): Output channel. | |||||
| stride (int): Stride size for the first convolutional layer. | |||||
| se_block(bool): use se block in SE-ResNet50 net. Default: False. | |||||
| Returns: | |||||
| SequentialCell, the output layer. | |||||
| Examples: | |||||
| >>> _make_layer(ResidualBlock, 3, 128, 256, 2) | |||||
| """ | |||||
| layers = [] | |||||
| resnet_block = block(in_channel, out_channel, stride=stride, use_se=use_se) | |||||
| layers.append(resnet_block) | |||||
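        # With se_block set, only the final block of the stage gets the SE module.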
| if se_block: | |||||
| for _ in range(1, layer_num - 1): | |||||
| resnet_block = block(out_channel, out_channel, stride=1, use_se=use_se) | |||||
| layers.append(resnet_block) | |||||
| resnet_block = block(out_channel, out_channel, stride=1, use_se=use_se, se_block=se_block) | |||||
| layers.append(resnet_block) | |||||
| else: | |||||
| for _ in range(1, layer_num): | |||||
| resnet_block = block(out_channel, out_channel, stride=1, use_se=use_se) | |||||
| layers.append(resnet_block) | |||||
| return nn.SequentialCell(layers) | |||||
| def construct(self, x): | |||||
| """ | |||||
| Forward. | |||||
| """ | |||||
| if self.use_se: | |||||
| x = self.conv1_0(x) | |||||
| x = self.bn1_0(x) | |||||
| x = self.relu(x) | |||||
| x = self.conv1_1(x) | |||||
| x = self.bn1_1(x) | |||||
| x = self.relu(x) | |||||
| x = self.conv1_2(x) | |||||
| else: | |||||
| x = self.conv1(x) | |||||
| x = self.bn1(x) | |||||
| x = self.relu(x) | |||||
| c1 = self.maxpool(x) | |||||
| c2 = self.layer1(c1) | |||||
| c3 = self.layer2(c2) | |||||
| c4 = self.layer3(c3) | |||||
| c5 = self.layer4(c4) | |||||
| out = self.mean(c5, (2, 3)) | |||||
| out = self.flatten(out) | |||||
| out = self.end_point(out) | |||||
| return out | |||||
| def resnet50(class_num=10, width_multiplier=1, cifar_stem=True): | |||||
| """ | |||||
| Get ResNet50 neural network. | |||||
    Args:
        class_num (int): Class number.
        width_multiplier (int): Channel width multiplier for all stages.
        cifar_stem (bool): If True, use the 3x3 stride-1 stem suited to 32x32 inputs.
| Returns: | |||||
| Cell, cell instance of ResNet50 neural network. | |||||
| Examples: | |||||
| >>> net = resnet50(10) | |||||
| """ | |||||
| return ResNet(ResidualBlock, | |||||
| [3, 4, 6, 3], | |||||
| [64, 256, 512, 1024], | |||||
| [256, 512, 1024, 2048], | |||||
| [1, 2, 2, 2], | |||||
| class_num, | |||||
| width_multiplier, | |||||
| cifar_stem) | |||||
| def se_resnet50(class_num=1001, width_multiplier=1): | |||||
| """ | |||||
| Get SE-ResNet50 neural network. | |||||
    Args:
        class_num (int): Class number.
        width_multiplier (int): Channel width multiplier for all stages.
| Returns: | |||||
| Cell, cell instance of SE-ResNet50 neural network. | |||||
| Examples: | |||||
        >>> net = se_resnet50(1001)
| """ | |||||
| return ResNet(ResidualBlock, | |||||
| [3, 4, 6, 3], | |||||
| [64, 256, 512, 1024], | |||||
| [256, 512, 1024, 2048], | |||||
| [1, 2, 2, 2], | |||||
| class_num, | |||||
| width_multiplier, | |||||
| use_se=True) | |||||
| def resnet101(class_num=1001, width_multiplier=1): | |||||
| """ | |||||
| Get ResNet101 neural network. | |||||
    Args:
        class_num (int): Class number.
        width_multiplier (int): Channel width multiplier for all stages.
| Returns: | |||||
| Cell, cell instance of ResNet101 neural network. | |||||
| Examples: | |||||
| >>> net = resnet101(1001) | |||||
| """ | |||||
| return ResNet(ResidualBlock, | |||||
| [3, 4, 23, 3], | |||||
| [64, 256, 512, 1024], | |||||
| [256, 512, 1024, 2048], | |||||
| [1, 2, 2, 2], | |||||
| class_num, | |||||
| width_multiplier) | |||||
| @@ -0,0 +1,53 @@ | |||||
| # Copyright 2021 Huawei Technologies Co., Ltd | |||||
| # | |||||
| # Licensed under the Apache License, Version 2.0 (the "License"); | |||||
| # you may not use this file except in compliance with the License. | |||||
| # You may obtain a copy of the License at | |||||
| # | |||||
| # http://www.apache.org/licenses/LICENSE-2.0 | |||||
| # | |||||
| # Unless required by applicable law or agreed to in writing, software | |||||
| # distributed under the License is distributed on an "AS IS" BASIS, | |||||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | |||||
| # See the License for the specific language governing permissions and | |||||
| # limitations under the License. | |||||
| # ============================================================================ | |||||
| """SimCLR Model class.""" | |||||
| from mindspore import nn | |||||
| from .resnet import _fc | |||||
class Identity(nn.Cell):
    """Pass-through cell that returns its input unchanged."""
    def construct(self, x):
        return x
| class SimCLR(nn.Cell): | |||||
| """ | |||||
| SimCLR Model. | |||||
| """ | |||||
| def __init__(self, encoder, project_dim, n_features): | |||||
| super(SimCLR, self).__init__() | |||||
| self.encoder = encoder | |||||
| self.n_features = n_features | |||||
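        # Replace the encoder's classification head with an identity so the
        # encoder outputs the representation h; the MLP below projects h -> z.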
| self.encoder.end_point = Identity() | |||||
| self.dense1 = _fc(self.n_features, self.n_features) | |||||
| self.relu = nn.ReLU() | |||||
| self.end_point = _fc(self.n_features, project_dim) | |||||
| # Projector MLP. | |||||
| def projector(self, x): | |||||
| out = self.dense1(x) | |||||
| out = self.relu(out) | |||||
| out = self.end_point(out) | |||||
| return out | |||||
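    # Encode and project both augmented views: the contrastive loss consumes
    # (z_i, z_j), while (h_i, h_j) are kept for downstream evaluation.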
| def construct(self, x_i, x_j): | |||||
| h_i = self.encoder(x_i) | |||||
| z_i = self.projector(h_i) | |||||
| h_j = self.encoder(x_j) | |||||
| z_j = self.projector(h_j) | |||||
| return h_i, h_j, z_i, z_j | |||||
| def inference(self, x): | |||||
| h = self.encoder(x) | |||||
| return h | |||||
| @@ -0,0 +1,164 @@ | |||||
| # Copyright 2021 Huawei Technologies Co., Ltd | |||||
| # | |||||
| # Licensed under the Apache License, Version 2.0 (the "License"); | |||||
| # you may not use this file except in compliance with the License. | |||||
| # You may obtain a copy of the License at | |||||
| # | |||||
| # http://www.apache.org/licenses/LICENSE-2.0 | |||||
| # | |||||
| # Unless required by applicable law or agreed to in writing, software | |||||
| # distributed under the License is distributed on an "AS IS" BASIS, | |||||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | |||||
| # See the License for the specific language governing permissions and | |||||
| # limitations under the License. | |||||
| # ============================================================================ | |||||
| """ | |||||
| ######################## train SimCLR example ######################## | |||||
| train simclr and get network model files(.ckpt) : | |||||
| python train.py --train_dataset_path /YourDataPath | |||||
| """ | |||||
import argparse
import ast
import os
from mindspore import nn
from mindspore import context
from mindspore.train import Model
from mindspore.train.callback import ModelCheckpoint, CheckpointConfig, LossMonitor, TimeMonitor
from mindspore.common import initializer as weight_init
from mindspore.common import set_seed
from mindspore.context import ParallelMode
from mindspore.communication.management import init, get_rank
from mindspore.train.serialization import load_checkpoint, load_param_into_net
from src.nt_xent import NT_Xent_Loss
from src.optimizer import get_train_optimizer as get_optimizer
from src.dataset import create_dataset
from src.simclr_model import SimCLR
from src.resnet import resnet50 as resnet
| parser = argparse.ArgumentParser(description='MindSpore SimCLR') | |||||
| parser.add_argument('--device_target', type=str, default='Ascend', | |||||
                    help='Device target; currently only Ascend is supported.')
| parser.add_argument('--run_cloudbrain', type=ast.literal_eval, default=True, | |||||
| help='Whether it is running on CloudBrain platform.') | |||||
| parser.add_argument('--run_distribute', type=ast.literal_eval, default=True, help='Run distributed training.') | |||||
| parser.add_argument('--device_num', type=int, default=1, help='Device num.') | |||||
| parser.add_argument('--device_id', type=int, default=0, help='device id, default is 0.') | |||||
parser.add_argument('--dataset_name', type=str, default='cifar10', help='Dataset; currently only cifar10 is supported.')
parser.add_argument('--train_url', default=None, help='CloudBrain location of training outputs.\
                    This parameter needs to be set when running on the CloudBrain platform.')
parser.add_argument('--data_url', default=None, help='CloudBrain location of data.\
                    This parameter needs to be set when running on the CloudBrain platform.')
| parser.add_argument('--train_dataset_path', type=str, default='./cifar/train', | |||||
| help='Dataset path for training classifier. ' | |||||
| 'This parameter needs to be set when running on the host.') | |||||
| parser.add_argument('--train_output_path', type=str, default='./outputs', help='Location of ckpt and log.\ | |||||
| This parameter needs to be set when running on the host.') | |||||
| parser.add_argument('--batch_size', type=int, default=128, help='batch_size, default is 128.') | |||||
| parser.add_argument('--epoch_size', type=int, default=100, help='epoch size for training, default is 100.') | |||||
| parser.add_argument('--projection_dimension', type=int, default=128, | |||||
| help='Projection output dimensionality, default is 128.') | |||||
| parser.add_argument('--width_multiplier', type=int, default=1, help='width_multiplier for ResNet50') | |||||
| parser.add_argument('--temperature', type=float, default=0.5, help='temperature for loss') | |||||
| parser.add_argument('--pre_trained_path', type=str, default=None, help='Pretrained checkpoint path') | |||||
| parser.add_argument('--pretrain_epoch_size', type=int, default=0, | |||||
| help='real_epoch_size = epoch_size - pretrain_epoch_size.') | |||||
| parser.add_argument('--save_checkpoint_epochs', type=int, default=1, help='Save checkpoint epochs, default is 1.') | |||||
| parser.add_argument('--save_graphs', type=ast.literal_eval, default=False, | |||||
| help='whether save graphs, default is False.') | |||||
parser.add_argument('--optimizer', type=str, default='Adam', help='Optimizer; currently only Adam is supported.')
| parser.add_argument('--weight_decay', type=float, default=3e-4, help='weight decay') | |||||
| parser.add_argument('--warmup_epochs', type=int, default=15, help='warmup epochs.') | |||||
| parser.add_argument('--use_crop', type=ast.literal_eval, default=True, help='RandomResizedCrop') | |||||
| parser.add_argument('--use_flip', type=ast.literal_eval, default=True, help='RandomHorizontalFlip') | |||||
| parser.add_argument('--use_color_jitter', type=ast.literal_eval, default=True, help='RandomColorAdjust') | |||||
| parser.add_argument('--use_color_gray', type=ast.literal_eval, default=True, help='RandomGrayscale') | |||||
| parser.add_argument('--use_blur', type=ast.literal_eval, default=False, help='GaussianBlur') | |||||
| parser.add_argument('--use_norm', type=ast.literal_eval, default=False, help='Normalize') | |||||
| args = parser.parse_args() | |||||
| local_data_url = './cache/data' | |||||
| local_train_url = './cache/train' | |||||
| _local_train_url = local_train_url | |||||
| if args.device_target != "Ascend": | |||||
| raise ValueError("Unsupported device target.") | |||||
| if args.run_distribute: | |||||
| device_id = os.getenv("DEVICE_ID", default=None) | |||||
| if device_id is None: | |||||
| raise ValueError("Unsupported device id.") | |||||
| args.device_id = int(device_id) | |||||
| rank_size = os.getenv("RANK_SIZE", default=None) | |||||
| if rank_size is None: | |||||
| raise ValueError("Unsupported rank size.") | |||||
| if args.device_num > int(rank_size) or args.device_num == 1: | |||||
| args.device_num = int(rank_size) | |||||
| context.set_context(device_id=args.device_id) | |||||
| context.set_context(mode=context.GRAPH_MODE, device_target=args.device_target, save_graphs=args.save_graphs) | |||||
| context.reset_auto_parallel_context() | |||||
| context.set_auto_parallel_context(parallel_mode=ParallelMode.DATA_PARALLEL, | |||||
| gradients_mean=True, device_num=args.device_num) | |||||
| init() | |||||
| args.rank = get_rank() | |||||
| local_data_url = os.path.join(local_data_url, str(args.device_id)) | |||||
| local_train_url = os.path.join(local_train_url, str(args.device_id)) | |||||
| args.train_output_path = os.path.join(args.train_output_path, str(args.device_id)) | |||||
| else: | |||||
| context.set_context(mode=context.GRAPH_MODE, device_target=args.device_target, | |||||
| save_graphs=args.save_graphs, device_id=args.device_id) | |||||
| args.rank = 0 | |||||
| args.device_num = 1 | |||||
| if args.run_cloudbrain: | |||||
| import moxing as mox | |||||
| args.train_dataset_path = os.path.join(local_data_url, "train") | |||||
| args.train_output_path = local_train_url | |||||
| mox.file.copy_parallel(src_url=args.data_url, dst_url=local_data_url) | |||||
| set_seed(1) | |||||
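# Training wrapper: the SimCLR backbone returns (h_i, h_j, z_i, z_j) and only
# the projections (z_i, z_j) feed the NT-Xent contrastive loss.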
| class NetWithLossCell(nn.Cell): | |||||
| def __init__(self, backbone, loss_fn): | |||||
| super(NetWithLossCell, self).__init__(auto_prefix=False) | |||||
| self._backbone = backbone | |||||
| self._loss_fn = loss_fn | |||||
| def construct(self, data_x, data_y, label): | |||||
| _, _, x_pred, y_pred = self._backbone(data_x, data_y) | |||||
| return self._loss_fn(x_pred, y_pred) | |||||
| if __name__ == "__main__": | |||||
| dataset = create_dataset(args, dataset_mode="train_endcoder") | |||||
| # Net. | |||||
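    # class_num=1 is a placeholder: SimCLR replaces the encoder's end_point
    # (the classifier) with an Identity, so that layer is never used.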
| base_net = resnet(1, args.width_multiplier, cifar_stem=args.dataset_name == "cifar10") | |||||
| net = SimCLR(base_net, args.projection_dimension, base_net.end_point.in_channels) | |||||
| # init weight | |||||
| if args.pre_trained_path: | |||||
| if args.run_cloudbrain: | |||||
| mox.file.copy_parallel(src_url=args.pre_trained_path, dst_url=local_data_url+'/pre_train.ckpt') | |||||
| param_dict = load_checkpoint(local_data_url+'/pre_train.ckpt') | |||||
| else: | |||||
| param_dict = load_checkpoint(args.pre_trained_path) | |||||
| load_param_into_net(net, param_dict) | |||||
| else: | |||||
| for _, cell in net.cells_and_names(): | |||||
| if isinstance(cell, nn.Conv2d): | |||||
| cell.weight.set_data(weight_init.initializer(weight_init.XavierUniform(), | |||||
| cell.weight.shape, | |||||
| cell.weight.dtype)) | |||||
| if isinstance(cell, nn.Dense): | |||||
| cell.weight.set_data(weight_init.initializer(weight_init.TruncatedNormal(), | |||||
| cell.weight.shape, | |||||
| cell.weight.dtype)) | |||||
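    # Assemble training: optimizer built from the parsed args, NT-Xent loss on
    # the projections, and a TrainOneStepCell fusing forward, loss, and update.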
| optimizer = get_optimizer(net, dataset.get_dataset_size(), args) | |||||
| loss = NT_Xent_Loss(args.batch_size, args.temperature) | |||||
| net_loss = NetWithLossCell(net, loss) | |||||
| train_net = nn.TrainOneStepCell(net_loss, optimizer) | |||||
| model = Model(train_net) | |||||
| time_cb = TimeMonitor(data_size=dataset.get_dataset_size()) | |||||
    # Convert the checkpoint-saving interval from epochs to steps.
    config_ck = CheckpointConfig(save_checkpoint_steps=args.save_checkpoint_epochs * dataset.get_dataset_size())
| ckpts_dir = os.path.join(args.train_output_path, "checkpoint") | |||||
| ckpoint_cb = ModelCheckpoint(prefix="checkpoint_simclr", directory=ckpts_dir, config=config_ck) | |||||
| print("============== Starting Training ==============") | |||||
| model.train(args.epoch_size, dataset, callbacks=[time_cb, ckpoint_cb, LossMonitor()]) | |||||
| if args.run_cloudbrain and args.device_id == 0: | |||||
| mox.file.copy_parallel(src_url=_local_train_url, dst_url=args.train_url) | |||||