Ziyan
98e2ee90de
fix optimizer parallel problems
5 years ago
ougongchang
1dafb2c6f5
Move graph and dataset graph collection to the step end stage
Previously, the graph and dataset graph were collected in the begin stage.
If graph compilation failed on GPU, they were still written to the
summary dir, which confused users.
Now they are collected in the step end stage, so if graph compilation
fails, the graph and dataset graph are not collected.
5 years ago
shenwei41
051c290d8b
Modify patches and alerts
5 years ago
gengdongjie
00f7a936bf
add multi-node training support for resnet50
5 years ago
zhouyaqiang
b0004a1791
support multi-node training and remove code
5 years ago
mindspore-ci-bot
387dac5832
!3651 change num_samples definition
Merge pull request !3651 from jiangzhiwen/dataset/change_num_samples
5 years ago
hanjun996
20ccf83826
modify tdt
5 years ago
mindspore-ci-bot
a3e7c4c754
!3625 Optimize tensor data
Merge pull request !3625 from hewei/optimize_tensor_data
5 years ago
mindspore-ci-bot
fe514bd1cc
!3644 [MD] fix minddataset core dump when file list size is greater than 1000.
Merge pull request !3644 from liyong126/fix_mindrecord_bug
5 years ago
hexia
3100824703
fix input
5 years ago
mindspore-ci-bot
a337a02732
!3638 fix codex and support akg op profiling
Merge pull request !3638 from geekun/yjk_master
5 years ago
yangzhenzhang
9aa84b3d14
add strided slice op
5 years ago
wandongdong
b39d524d44
set out format to nhwc4
5 years ago
meixiaowei
8950952fe3
modify readme
5 years ago
limingqi107
af39ca8252
modify the wrong word
5 years ago
wilfChen
9cad0fec1d
gpu broadcast to
5 years ago
mindspore-ci-bot
1b69923472
!3643 Throw exception if different communication ops which are divided into the same segment share the same input
Merge pull request !3643 from huanghui/communication-op-fusion
5 years ago
mindspore-ci-bot
d4b52ac59f
!3489 use kernelruntime::mem_manager to reduce rtMalloc and rtFree time in trans data format
Merge pull request !3489 from lvchangquan/master
5 years ago
panfengfeng
4644085e92
add epoch_num
5 years ago
wanghua
7dd5e78fde
add tinybert scripts
5 years ago
jiangzhiwen
1eda0ef071
change num_samples definition
5 years ago
mindspore-ci-bot
fcdad59ce6
!3594 fix batchnorm issue under mix precision in pynative mode
Merge pull request !3594 from wangqiuliang/fix-batchnorm-under-mix-precision-in-pynative
5 years ago
GuoMengHao
2309e7369a
add_python_distribute_pretrain_script
Signed-off-by: GuoMengHao <guomenghao@huawei.com>
5 years ago
mindspore-ci-bot
12a150bb5d
!3630 not reuse refnode input's memory
Merge pull request !3630 from laiyongqiang/refnode_input_fix
5 years ago
mindspore-ci-bot
c57ad1528f
!3635 fix dataset & train gil lock of gpu process master
Merge pull request !3635 from panfengfeng/fix_dataset_train_gil_of_gpu_master
5 years ago
mindspore-ci-bot
44e739ae31
!3627 fix: device occupied tdt hung
Merge pull request !3627 from guozhijian/fix_device_occupied_tdt_hung
5 years ago
geekun
17d71280b8
fix codex and support akg op profiling
5 years ago
mindspore-ci-bot
9ccc6889eb
!3624 fix GeneratorDataset time out
Merge pull request !3624 from yanghaitao/yht_generator_timeout
5 years ago
mindspore-ci-bot
4834a6b347
!3574 Rename AnfNode::user_data related functions to follow naming rule
Merge pull request !3574 from hewei/rename_user_data_func
5 years ago
liyong
ed70de8070
fix core dump when the file list size is greater than 1000.
5 years ago
huanghui
311d8ea1f9
add exception when different communication ops in one segment share the same input
5 years ago
mindspore-ci-bot
e4a7ca7f08
!3637 Lowering value checking threshold to support training with very small steps or
Merge pull request !3637 from thlinh/dev_Jul28_lower_checking_threshold
5 years ago
lvchangquan
fdbe4c19ba
use kernel_runtime::mem_manager to reduce rtMalloc and rtFree time in trans data format
5 years ago
He Wei
db6aa862d5
Optimize tensor data
Replace std::vector<T> with std::unique_ptr<T[]> for tensor data storage
to prevent unintended data initialization when data is lazily allocated.
5 years ago
mindspore-ci-bot
d06da1d270
!3603 Check that the number of columns in names and defaults matches
Merge pull request !3603 from jiangzhiwen/fix_column_names_exceeded
5 years ago
kingfo
fab9fac109
fix batchnorm under mix precision in pynative mode
5 years ago
Hoai Linh Tran
b4c57295f7
Lowering value checking threshold to support training with very small steps
5 years ago
mindspore-ci-bot
b75943f220
!3620 add mindspore lite
Merge pull request !3620 from 张学同/to_merge
5 years ago
panfengfeng
48ab208148
fix dataset train gil of gpu
5 years ago
mindspore-ci-bot
800b9dc596
!3270 New optimization pass to remove redundant Select ops
Merge pull request !3270 from thlinh/dev_Jul17_removeSelect
5 years ago
mindspore-ci-bot
f48ef43647
!3611 dataset: repair problem in vgg cifar(version)
Merge pull request !3611 from ms_yan/vgg_repair
5 years ago
mindspore-ci-bot
fa3a1f4a16
!3556 add desc about sink_size
Merge pull request !3556 from jinyaohui/master
5 years ago
mindspore-ci-bot
2f956d7cc2
!3612 modify case
Merge pull request !3612 from changzherui/mod_case
5 years ago
mindspore-ci-bot
cf6e13cc48
!3563 fix a bug that causes failure when running multi-p from the original dataset, not from MR
Merge pull request !3563 from zhouyuanshen/master
5 years ago
mindspore-ci-bot
c2385e2ede
!3615 Move nn/distribution to nn/probability/distribution
Merge pull request !3615 from XunDeng/pp_poc_v3
5 years ago
jonyguo
b9d855cbca
fix: device occupied tdt hung
5 years ago
mindspore-ci-bot
9e1244934c
!3614 Update Convert Switch to use PCNode
Merge pull request !3614 from Giancarlo/update_convert_sw
5 years ago
mindspore-ci-bot
980b67d1c4
!3578 fix maskrcnn dataset rescale bug
Merge pull request !3578 from meixiaowei/master
5 years ago
mindspore-ci-bot
f96efbfe19
!3606 [bug][auto_mixed_precision]fix amp doc and eval network build
Merge pull request !3606 from vlne-v1/amp_doc
5 years ago
jiangzhiwen
2cc6b5cb52
fix mismatched number of columns
5 years ago