jiangshuqiang
066068f9a4
fix queue.empty block for summary
5 years ago
mindspore-ci-bot
444ff97206
!13505 typo fix
From: @wudenggang
Reviewed-by: @ouwenchang,@Hanshize,@kingxian
Signed-off-by: @kingxian
5 years ago
mindspore-ci-bot
373a3a199c
!13415 fix the examples to pass the ci doctest
From: @jiang-shuqiang
Reviewed-by:
Signed-off-by:
5 years ago
jiangshuqiang
517e4697ab
fix the examples to pass the doctest
5 years ago
wudenggang
b17a558af4
typo fix
5 years ago
yingchen
d582883c03
fix api bugs
5 years ago
ougongchang
8637acdcb8
Get a string path when the summary path is a list
If device target is GPU, the device number may be set to 1.
5 years ago
yepei6
908b590d8e
update comments
5 years ago
ougongchang
abd2e978a0
Prevent child processes from entering a sleep state
Note: With the sys.exit method, when the main process is killed, the summary process enters a sleep state and cannot execute the exit operation
5 years ago
ougongchang
d576bdd713
Delete the SummaryCollector feature that collects images in dataset sink mode
5 years ago
ougongchang
4309e20b1b
Add heartbeat check in summary and delete test cases that do not exit
5 years ago
zhangyi
e4000470cf
fix error format of docstrings for some api comments.
5 years ago
jiangshuqiang
b21a98b062
fix function description for SummaryCollector and SummaryRecord
5 years ago
ougongchang
3240b2d8e1
Add device id to summary file name
To prevent data write conflicts in multi-card scenarios, the file name on each card includes its device_id
5 years ago
jiangshuqiang
0bb80995bc
fix param check for unexpected_format
5 years ago
ougongchang
c6e4b0c85f
Add more logging when collecting the graph and using summary operators
Fix failure to collect input data when the batch size is 1 and the total step number is 1
Fix spelling errors
5 years ago
jiangshuqiang
7d79376bec
remove max_file_size limitation for export files
5 years ago
jiangshuqiang
ab5cc10250
add the tensor collection feature when recording summary
5 years ago
mindspore-ci-bot
1f06cd63f3
!10436 Support to control whether to throw runtime exceptions in SummaryRecord
From: @ouwenchang
Reviewed-by:
Signed-off-by:
5 years ago
ougongchang
06be546b52
Support controlling whether SummaryRecord raises a RuntimeError exception
1. Support the explainer raising a RuntimeError exception
2. Fix the UT of SummaryRecord
5 years ago
ougongchang
e5529230bf
Prevent the multiprocess from capturing KeyboardInterrupt
5 years ago
ougongchang
4f46adf702
Place control of the pThread into the Summary process
5 years ago
ougongchang
99c06b9801
Specify a maximum number of OpenBLAS threads
Environment variables are used to specify a maximum number of OpenBLAS threads.
In the Ubuntu (GPU) environment, NumPy uses too many threads for computation,
which may affect the startup of the summary process.
Note: Tests so far show that a thread count of 2 is a suitable setting.
If it needs to be adjusted, test it against the target scenario first.
5 years ago
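The thread cap described in the commit above could be applied roughly as follows. This is a minimal sketch, assuming the cap is set through OpenBLAS's `OPENBLAS_NUM_THREADS` environment variable before NumPy is imported; the helper name `limit_openblas_threads` and the choice to keep a value the user already set are illustrative assumptions, not taken from the commit.

```python
import os


def limit_openblas_threads(max_threads=2, env=os.environ):
    """Cap OpenBLAS threads via an environment variable (hypothetical helper).

    The default of 2 follows the commit message. An existing user-provided
    value is kept; that policy is an assumption of this sketch.
    """
    env.setdefault("OPENBLAS_NUM_THREADS", str(max_threads))
    return env["OPENBLAS_NUM_THREADS"]


# The variable must be set before NumPy (and thus OpenBLAS) is first imported
# in the process that should be limited.
limit_openblas_threads()
```

Because OpenBLAS reads the variable when the library is loaded, calling the helper after NumPy has been imported would have no effect on that process.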
ougongchang
bc23af20d6
fix the docstring of SummaryCollector and SummaryRecord
5 years ago
unknown
64f27b66f1
add uncertainty, 1 ch saliency, separated datafile
bugfix
undo unexpected changes
modify summary format
bugfix
fix CI problem
enhance comment
fix comment typo
use mindspore.log for errors
fix pylint issues
5 years ago
Jiaqi
7492e59fe4
modify Returns
5 years ago
ougongchang
6072b25a07
SummaryRecord supports recording MindExplain data
The SummaryRecord.add_value() method is extended to record MindExplain data.
5 years ago
chenzomi
acadb694aa
[ME] delete redundant function in check_parameter
5 years ago
ougongchang
e93365c664
Add a note that summary only supports Linux systems
5 years ago
Li Hongzhang
869ca261bc
rm unused params for SummaryRecord
5 years ago
Li Hongzhang
9050f2ad64
forkserver multiprocessing context
5 years ago
wanyiming
3d354d76fd
mod_callback
5 years ago
Li Hongzhang
de43c11e2e
fix several issues
- handle collection for multiple trains
- how many tensors to collect when sunk
- change loglevel for get_learning_rate
- update calculation of `max_file_size`
- fix how `collect_tensor_freq` is counted
5 years ago
Li Hongzhang
fd03ed8341
fix not-exit issue and docs issue
- fix writer pool not exiting when `max_file_size` is too small
- fix API docs to illustrate `collect_tensor_freq` and `max_file_size`
5 years ago
Li Hongzhang
05dd17687a
max_file_size include metadata length and drop last step
5 years ago
Li Hongzhang
88dcd90889
limit summary to avoid exhausting the disk
5 years ago
Li Hongzhang
89462e9c3b
check disk space before writing and remove unused mode value
5 years ago
ougongchang
0ee568b733
Update the API documentation of SummaryCollector and SummaryRecord.
Add more detailed notes for SummaryCollector and SummaryRecord,
because using them incorrectly can cause problems.
5 years ago
mindspore-ci-bot
bc42685436
!2770 Capture the time before handing over to the process pool to avoid time flips
Merge pull request !2770 from LiHongzhang/capture_time
5 years ago
Li Hongzhang
22dea2fc18
SummaryRecord register close atexit
5 years ago
Li Hongzhang
299469babb
address the importance of closing the SummaryRecord and illustrate how
5 years ago
Li Hongzhang
f9c6d12bc4
capture the time before handing over to the process pool to ensure time order
5 years ago
Li Hongzhang
97d8673018
warn when values duplicate and set mode to 'eval' to avoid extra recording
5 years ago
chenzomi
a834a6308e
change some comment name in the whole project
5 years ago
ougongchang
939cd29d7e
Add a callback named SummaryCollector and delete SummaryStep callback
I added a SummaryCollector to help users automatically collect information
such as the network, loss, and learning rate.
It can also collect train lineage and eval lineage information, which is
collected by the TrainLineage and EvalLineage callbacks in
MindInsight.
I also added some UT for SummaryCollector to keep the code correct.
5 years ago
Li Hongzhang
0921c1e538
enhance the SummaryRecord with set_mode and add_value
5 years ago
mindspore-ci-bot
373832d030
!2193 fix log level too high: step has no summary record is normal
Merge pull request !2193 from wenkai/wk1_log_level_0617
5 years ago
wenkai
a2bad5c72d
fix log level too high: step has no record is normal.
5 years ago
Li Hongzhang
d31e14f593
fix having too many processes and no attribute of '_closed'
1. When initializing SummaryRecord, if a check fails, self._closed is not set,
which later leads to the AttributeError
"'SummaryRecord' object has no attribute '_closed'".
2. There may be too many processes for handling summary adapting.
See issue #I1K6K7
5 years ago
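The `_closed` problem in the commit above can be illustrated with a small stand-in class. This is a sketch under stated assumptions, not the actual SummaryRecord code: the class name `Recorder`, its `log_dir` check, and the fix of assigning `_closed` before validation are all hypothetical.

```python
class Recorder:
    """Hypothetical stand-in for a record writer that must be closed."""

    def __init__(self, log_dir):
        # Assign the flag first, so close() and __del__ can always read it,
        # even when the validation below raises.
        self._closed = True
        if not isinstance(log_dir, str) or not log_dir:
            raise ValueError("log_dir must be a non-empty string")
        self.log_dir = log_dir
        self._closed = False

    def close(self):
        """Close the recorder; calling it more than once is harmless."""
        if self._closed:
            return
        self._closed = True

    def __del__(self):
        # Without the early assignment, this would hit an AttributeError
        # on objects whose __init__ failed part-way through.
        self.close()
```

With this ordering, an instance whose constructor failed is simply treated as already closed.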
Li Hongzhang
ccf49b7c0e
shape is a tuple, not an integer
5 years ago