chenzomi
d471d32e87
[ME] change `check_integer` to format `check_positive_int` and `check_integeter`
5 years ago
chenzomi
d4e8e94981
[ME] delete check_bool and replace with Validate.check_bool
5 years ago
mindspore-ci-bot
7f390467e9
!6781 Change prefix for server ckpt callback
Merge pull request !6781 from ZPaC/master-change-prefix-for-server-ckpt
5 years ago
ZPaC
28c57f3f29
Change prefix for server ckpt callback
5 years ago
caozhou
5221041490
fix bug where int in ckpt name was changed to float
5 years ago
李鸿章
548b931f9d
flush summary when appropriate
5 years ago
ougongchang
e93365c664
Add a note that summary only supports Linux systems
5 years ago
nhussain
3bac9d3713
switch input columns and operation
change ImagefolderDV2 name
change ds.transforms.vision to ds.vision
change batch api to match map api more closely
compose op changes
test_pylint
remove compose op from vision, move to transform module, refactor map and batch to use column_order
5 years ago
ZPaC
87bf2a7dcd
Add PS context.
5 years ago
Li Hongzhang
066950f69e
GPU dataset_sink_mode collect inputs
5 years ago
mindspore-ci-bot
4ec343961e
!5482 modify save_checkpoint
Merge pull request !5482 from liuyang/md_save_checkpoint
5 years ago
liuyang_655
4683de3443
modify save_checkpoint
5 years ago
Li Hongzhang
f95d3f21fb
fix assertion: Tensor(0) is falsy
5 years ago
mindspore-ci-bot
5b738794d2
!5389 Copy the default specified data when collect_specified_data is None
Merge pull request !5389 from ougongchang/fix_summarycollector
5 years ago
Li Hongzhang
9050f2ad64
forkserver multiprocessing context
5 years ago
ougongchang
458c69a22c
Copy the default specified data when collect_specified_data is None
5 years ago
ZPaC
830172201a
Fix multi server precision error.
5 years ago
wanyiming
3d354d76fd
mod_callback
5 years ago
Li Hongzhang
de43c11e2e
fix several issues
- handle collection for multiple trains
- how many tensors to collect when sunk
- change loglevel for get_learning_rate
- update calculation of `max_file_size`
- fix how collect_tensor_freq is counted
5 years ago
changzherui
fe9371b7e7
modify ckpt param check
5 years ago
mindspore-ci-bot
3dcea81721
!3907 modify ckpt func check parameter
Merge pull request !3907 from changzherui/mod_ckpt_func_param
5 years ago
changzherui
e60cb7fdf8
modify ckpt func check parameter
5 years ago
Li Hongzhang
fd03ed8341
fix not-exit issue and docs issue
- fix writer pool not exiting when max_file_size is too small
- fix API docs to illustrate `collect_tensor_freq` and `max_file_size`
5 years ago
ougongchang
1dafb2c6f5
Move collection of graph and dataset graph to the step end stage
Previously we collected the graph and dataset graph in the begin stage.
If graph compilation failed on GPU, the graph and dataset graph were
still written to the summary dir, which confused users.
We now collect the graph and dataset graph in the step end stage,
so if graph compilation fails, neither of them is collected.
5 years ago
Li Hongzhang
cd4776fc32
change at-most collected tensor from 50 to 20
When `collect_tensor_freq` is specified as `None`,
the `collect_tensor_freq` would be auto calculated.
The previous behavior was to collect at most 50 steps;
this changes it to 20
5 years ago
changzherui
070e63e1ee
modify timemonitor
5 years ago
Li Hongzhang
879409ef97
restore the ability to collect network graph
5 years ago
ougongchang
336fca14bc
Fix failure to collect the BERT network name in MindInsight lineage.
1. collect the origin network in model, and set it to cb_params
2. collect the origin network name in SummaryCollector
3. Update the SummaryCollector API Doc
5 years ago
Li Hongzhang
88dcd90889
limit summary to avoid exhausting the disk
5 years ago
changzherui
99a2ab4b2e
fix async save checkpoint bug
5 years ago
d00455729
d45abc5f54
Asynchronous save checkpoint
5 years ago
chenzomi
25969b5d8f
add loss monitor to lenet
5 years ago
ougongchang
9062ea4bdd
Fix an error in the SummaryCollector example.
The usage was wrong when collecting custom lineage data.
5 years ago
ougongchang
0ee568b733
Update the API documentation of SummaryCollector and SummaryRecord.
Add more detailed notes for SummaryCollector and SummaryRecord,
since using them incorrectly can cause problems.
5 years ago
chenzomi
9b7a426c6b
bug fix in auto create quant graph in master
5 years ago
mindspore-ci-bot
f1a9a7ceb1
!2718 fix quantization aware training auto create graph bug
Merge pull request !2718 from chenzhongming/master
5 years ago
ougongchang
cd868aea52
fix get loss error and NoneType error caused by _proceesor_specified_data
fix get loss error when the loss is not a scalar, and fix processing of
specified data failing when the action is False and the
collect_specified_data parameter is not None
5 years ago
mindspore-ci-bot
2fadbb1d04
!2743 add more ut and st for SummaryCollector
Merge pull request !2743 from ougongchang/summary_collector_ut
5 years ago
chenzomi
1089c908a9
cherry-pick r0.5 to master for quantization aware training
5 years ago
ougongchang
3dc6f6f2d9
add more ut and st for SummaryCollector
Also fixes the error when collecting the optimizer in eval mode
5 years ago
luopengting
0e1c21d6b3
fix batch_size, collected batch_num before
5 years ago
Li Hongzhang
97d8673018
warn when values duplicate and set mode to 'eval' to avoid extra recording
5 years ago
ougongchang
33b5cda1da
Decide whether to collect data by dataset sink mode and current step in SummaryCollector.
Previously, we decided whether to collect data only by the current step,
which did not work well in dataset sink mode, so we now also check
whether dataset sink mode is on when deciding whether to collect data.
5 years ago
mindspore-ci-bot
087779b7d4
!2517 checkpoint add model_type
Merge pull request !2517 from chenzhongming/quant
5 years ago
chenzomi
d3f9b80066
checkpoint add model_type
5 years ago
ougongchang
108dd7a4a2
Make sure to record the first step data in SummaryCollector, and catch the ValueError when the loss is not a scalar.
5 years ago
mindspore-ci-bot
ebf30051ff
!2452 Change the dataset attribute in SummaryCollector
Merge pull request !2452 from ougongchang/fix_dataset_bug
5 years ago
Li Hongzhang
4c0d12fd63
enhance callback module and strictly check whether callbacks is a list
5 years ago
ougongchang
d3ada15673
Change the attribute to children, because the attribute has been changed in dataset
5 years ago
chenzomi
a834a6308e
change some comment name in the whole project
5 years ago