@@ -75,9 +75,9 @@ class Callback:
     """
     Abstract base class used to build a callback class. Callbacks are context managers
     which will be entered and exited when passing into the Model.
-    You can leverage this mechanism to init and release resources automatically.
-    Callback function will execution some operating to the current step or epoch.
+    You can use this mechanism to initialize and release resources automatically.
+    Callback functions will execute some operations in the current step or epoch.

     Examples:
         >>> class Print_info(Callback):
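The context-manager mechanism the docstring describes can be sketched in plain Python. This is a minimal illustration, not the MindSpore implementation; the `train` helper and the `run_context` dict keys are assumptions for the sketch.

```python
class Callback:
    """Minimal sketch of a callback used as a context manager."""

    def __enter__(self):
        # acquire resources when the callback is passed into the Model
        return self

    def __exit__(self, *exc):
        # release resources when training ends
        return False

    def step_end(self, run_context):
        """Hook executed for the current step."""


class PrintInfo(Callback):
    def __init__(self):
        self.seen_steps = []

    def step_end(self, run_context):
        self.seen_steps.append(run_context["cur_step_num"])


def train(steps, callback):
    # the Model enters the callback, runs the loop, then exits it,
    # so setup and teardown happen automatically
    with callback:
        for step in range(1, steps + 1):
            callback.step_end({"cur_step_num": step})
```

The `with` statement is what guarantees `__exit__` (resource release) runs even if the training loop raises.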
@@ -229,11 +229,11 @@ class RunContext:
     """
     Provides information about the model.
-    Run call being made. Provides information about original request to model function.
-    callback objects can stop the loop by calling request_stop() of run_context.
+    Provides information about the original request to the model function.
+    Callback objects can stop the loop by calling request_stop() of run_context.

     Args:
-        original_args (dict): Holding the related information of model etc.
+        original_args (dict): Holds the related information of the model.
     """

     def __init__(self, original_args):
         if not isinstance(original_args, dict):
@@ -246,13 +246,13 @@ class RunContext:
         Get the _original_args object.

         Returns:
-            Dict, a object holding the original arguments of model.
+            Dict, an object that holds the original arguments of the model.
         """
         return self._original_args

     def request_stop(self):
         """
-        Sets stop requested during training.
+        Set a stop request during training.

         Callbacks can use this function to request stop of iterations.
         model.train() checks whether this is called or not.
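The stop-request handshake between RunContext and the training loop can be sketched as follows. The `train_loop` helper and the `get_stop_requested` accessor name are assumptions for illustration; only the `original_args` dict and `request_stop()` come from the docstring above.

```python
class RunContext:
    """Sketch of RunContext: wraps original_args and carries a stop flag."""

    def __init__(self, original_args):
        if not isinstance(original_args, dict):
            raise TypeError("original_args should be a dict")
        self._original_args = original_args
        self._stop_requested = False

    def original_args(self):
        return self._original_args

    def request_stop(self):
        # callbacks call this to ask the training loop to stop
        self._stop_requested = True

    def get_stop_requested(self):
        return self._stop_requested


def train_loop(max_steps, stop_at, run_context):
    """Run until max_steps or until a callback requests a stop."""
    done = 0
    for step in range(1, max_steps + 1):
        done = step
        if step == stop_at:                   # a callback would do this
            run_context.request_stop()
        if run_context.get_stop_requested():  # model.train() checks the flag
            break
    return done
```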
@@ -70,23 +70,24 @@ def _chg_ckpt_file_name_if_same_exist(directory, prefix):
 class CheckpointConfig:
     """
-    The config for model checkpoint.
+    The configuration of model checkpoint.

     Note:
         During the training process, if dataset is transmitted through the data channel,
-        suggest set save_checkpoint_steps be an integer multiple of loop_size.
-        Otherwise there may be deviation in the timing of saving checkpoint.
+        it is suggested to set 'save_checkpoint_steps' to an integer multiple of loop_size.
+        Otherwise, the time to save the checkpoint may be biased.

     Args:
         save_checkpoint_steps (int): Steps to save checkpoint. Default: 1.
         save_checkpoint_seconds (int): Seconds to save checkpoint. Default: 0.
             Can't be used with save_checkpoint_steps at the same time.
-        keep_checkpoint_max (int): Maximum step to save checkpoint. Default: 5.
+        keep_checkpoint_max (int): Maximum number of checkpoint files that can be saved. Default: 5.
         keep_checkpoint_per_n_minutes (int): Keep one checkpoint every n minutes. Default: 0.
             Can't be used with keep_checkpoint_max at the same time.
-        integrated_save (bool): Whether to intergrated save in automatic model parallel scene. Default: True.
-            Integrated save function is only supported in automatic parallel scene, not supported in manual parallel.
-        async_save (bool): Whether asynchronous execute save checkpoint into file. Default: False
+        integrated_save (bool): Whether to perform integrated save in automatic model parallel scene.
+            Default: True. Integrated save is only supported in automatic parallel scene, not supported
+            in manual parallel.
+        async_save (bool): Whether to save the checkpoint to a file asynchronously. Default: False.

     Raises:
         ValueError: If the input_param is None or 0.
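The `keep_checkpoint_max` behavior amounts to a simple retention policy: keep only the newest N checkpoint files. A sketch, with a hypothetical helper name (`retain_checkpoints` is not a MindSpore API):

```python
def retain_checkpoints(existing, new_file, keep_checkpoint_max=5):
    """Append new_file and keep only the newest keep_checkpoint_max files.

    `existing` is ordered oldest to newest; files dropped from the front
    are the ones a real implementation would delete from disk.
    """
    files = list(existing) + [new_file]
    removed = files[:-keep_checkpoint_max] if len(files) > keep_checkpoint_max else []
    kept = files[len(removed):]
    return kept, removed
```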
@@ -180,9 +181,9 @@ class ModelCheckpoint(Callback):
     It is called to combine with train process and save the model and network parameters after training.

     Args:
-        prefix (str): Checkpoint files names prefix. Default: "CKP".
-        directory (str): Folder path into which checkpoint files will be saved. Default: None.
-        config (CheckpointConfig): Checkpoint strategy config. Default: None.
+        prefix (str): The prefix name of checkpoint files. Default: "CKP".
+        directory (str): The path of the folder in which the checkpoint files will be saved. Default: None.
+        config (CheckpointConfig): Checkpoint strategy configuration. Default: None.

     Raises:
         ValueError: If the prefix is invalid.
@@ -27,13 +27,13 @@ class LossMonitor(Callback):
     If the loss is NAN or INF, it will terminate training.

     Note:
-        If per_print_times is 0 do not print loss.
+        If per_print_times is 0, do not print loss.

     Args:
-        per_print_times (int): Print loss every times. Default: 1.
+        per_print_times (int): Print the loss every per_print_times steps. Default: 1.

     Raises:
-        ValueError: If print_step is not int or less than zero.
+        ValueError: If print_step is not an integer or is less than zero.
     """

     def __init__(self, per_print_times=1):
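The LossMonitor behavior described here (validate `per_print_times`, print periodically, terminate on NAN/INF) can be sketched in a few lines. This is an illustration, not the MindSpore class; the `step_end(step, loss)` signature and the `printed_steps` list are assumptions.

```python
import math


class LossMonitor:
    """Sketch: print the loss every per_print_times steps,
    terminate training when the loss is NAN or INF."""

    def __init__(self, per_print_times=1):
        if not isinstance(per_print_times, int) or per_print_times < 0:
            raise ValueError("print_step must be an int and >= 0.")
        self._per_print_times = per_print_times
        self.printed_steps = []

    def step_end(self, step, loss):
        if math.isnan(loss) or math.isinf(loss):
            raise ValueError(f"Invalid loss {loss} at step {step}, terminate training.")
        if self._per_print_times != 0 and step % self._per_print_times == 0:
            self.printed_steps.append(step)  # a real monitor would print here
```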
@@ -62,7 +62,7 @@ class SummaryCollector(Callback):
     SummaryCollector can help you to collect some common information.
     It can help you to collect loss, learning rate, computational graph and so on.
-    SummaryCollector also persists data collected by the summary operator into a summary file.
+    SummaryCollector also records the data collected by the summary operator into a summary file.

     Note:
         1. Multiple SummaryCollector instances in callback list are not allowed.
@@ -74,51 +74,51 @@ class SummaryCollector(Callback):
             If the directory does not exist, it will be created automatically.
         collect_freq (int): Set the frequency of data collection, it should be greater than zero,
             and the unit is `step`. Default: 10. If a frequency is set, we will collect data
-            at (current steps % freq) == 0, and the first step will be collected at any time.
+            when (current steps % freq) equals 0, and the first step will be collected at any time.
             It is important to note that if the data sink mode is used, the unit will become the `epoch`.
             It is not recommended to collect data too frequently, which can affect performance.
         collect_specified_data (Union[None, dict]): Perform custom operations on the collected data. Default: None.
             By default, if set to None, all data is collected as the default behavior.
-            If you want to customize the data collected, you can do so with a dictionary.
-            Examples,you can set {'collect_metric': False} to control not collecting metrics.
+            You can customize the collected data with a dictionary.
+            For example, you can set {'collect_metric': False} to control not collecting metrics.
             The data that supports control is shown below.

-            - collect_metric: Whether to collect training metrics, currently only loss is collected.
-              The first output will be treated as loss, and it will be averaged.
+            - collect_metric: Whether to collect training metrics, currently only the loss is collected.
+              The first output will be treated as the loss and it will be averaged.
              Optional: True/False. Default: True.
-            - collect_graph: Whether to collect computational graph, currently only
+            - collect_graph: Whether to collect the computational graph. Currently, only the
              training computational graph is collected. Optional: True/False. Default: True.
            - collect_train_lineage: Whether to collect lineage data for the training phase,
              this field will be displayed on the lineage page of MindInsight. Optional: True/False. Default: True.
-            - collect_eval_lineage: Whether to collect lineage data for the eval phase,
+            - collect_eval_lineage: Whether to collect lineage data for the evaluation phase,
              this field will be displayed on the lineage page of MindInsight. Optional: True/False. Default: True.
            - collect_input_data: Whether to collect dataset for each training. Currently only image data is supported.
              Optional: True/False. Default: True.
            - collect_dataset_graph: Whether to collect dataset graph for the training phase.
              Optional: True/False. Default: True.
-            - histogram_regular: Collect weight and bias for parameter distribution page display in MindInsight.
+            - histogram_regular: Collect the weights and biases for the parameter distribution page and display them in MindInsight.
              This field allows regular strings to control which parameters to collect.
              Default: None, it means only the first five parameters are collected.
              It is not recommended to collect too many parameters at once, as it can affect performance.
              Note that if you collect too many parameters and run out of memory, the training will fail.
        keep_default_action (bool): This field affects the collection behavior of the 'collect_specified_data' field.
            Optional: True/False, Default: True.
-            True: means that after specified data is set, non-specified data is collected as the default behavior.
-            False: means that after specified data is set, only the specified data is collected,
+            True: it means that after specified data is set, non-specified data is collected as the default behavior.
+            False: it means that after specified data is set, only the specified data is collected,
            and the others are not collected.
        custom_lineage_data (Union[dict, None]): Allows you to customize the data and present it on the MindInsight
-            lineage page. In the custom data, the key type support str, and the value type support str/int/float.
-            Default: None, it means there is no custom data.
-        collect_tensor_freq (Optional[int]): Same semantic as the `collect_freq`, but controls TensorSummary only.
-            Because TensorSummary data is too large compared to other summary data, this parameter is used to reduce
-            its collection. By default, TensorSummary data will be collected at most 20 steps, but not more than how
-            many steps other summary data will be collected.
+            lineage page. In the custom data, the type of the key supports str, and the type of the value supports
+            str, int and float. Default: None, it means there is no custom data.
+        collect_tensor_freq (Optional[int]): The same semantics as `collect_freq`, but controls TensorSummary only.
+            Because TensorSummary data is usually too large compared to other summary data, this parameter is used
+            to reduce its collection. By default, the maximum number of steps for collecting TensorSummary data is 21,
+            but it will not exceed the number of steps for collecting other summary data.
            Default: None, which means to follow the behavior as described above. For example, given `collect_freq=10`,
            when the total steps is 600, TensorSummary will be collected 20 steps, while other summary data 61 steps,
            but when the total steps is 20, both TensorSummary and other summary will be collected 3 steps.
            Also note that when in parallel mode, the total steps will be split evenly, which will
-            affect how many steps TensorSummary will be collected.
-        max_file_size (Optional[int]): The maximum size in bytes each file can be written to the disk.
+            affect the number of steps TensorSummary will be collected.
+        max_file_size (Optional[int]): The maximum size in bytes of each file that can be written to the disk.
            Default: None, which means no limit. For example, to write not larger than 4GB,
            specify `max_file_size=4 * 1024**3`.
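The default `collect_tensor_freq` rule described above (cap TensorSummary at roughly 20 collections, never collect it more often than other summary data) can be approximated by a one-line frequency calculation. This is a reconstruction under those stated assumptions, not MindSpore's actual formula, and the function name is hypothetical:

```python
def default_tensor_freq(total_steps, collect_freq, max_collections=20):
    """Choose an effective TensorSummary frequency: never finer than
    collect_freq, and coarse enough that about max_collections
    collections happen over the whole run."""
    return max(collect_freq, total_steps // max_collections)
```

With `collect_freq=10` and 600 total steps this yields an effective frequency of 30 (about 20 collections), and with 20 total steps it falls back to 10 (the same schedule as other summary data), consistent with the examples in the docstring.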
@@ -41,7 +41,7 @@ class FixedLossScaleManager(LossScaleManager):
     Args:
         loss_scale (float): Loss scale. Default: 128.0.
-        drop_overflow_update (bool): whether to do optimizer if there is overflow. Default: True.
+        drop_overflow_update (bool): Whether to execute the optimizer when there is an overflow. Default: True.

     Examples:
         >>> loss_scale_manager = FixedLossScaleManager()
@@ -59,7 +59,7 @@ class FixedLossScaleManager(LossScaleManager):
         return self._loss_scale

     def get_drop_overflow_update(self):
-        """Get the flag whether to drop optimizer update when there is overflow happened"""
+        """Get the flag of whether to drop the optimizer update when an overflow happens."""
         return self._drop_overflow_update

     def update_loss_scale(self, overflow):
@@ -82,7 +82,7 @@ class DynamicLossScaleManager(LossScaleManager):
     Dynamic loss-scale manager.

     Args:
-        init_loss_scale (float): Init loss scale. Default: 2**24.
+        init_loss_scale (float): Initial loss scale. Default: 2**24.
         scale_factor (int): Coefficient of increase and decrease. Default: 2.
         scale_window (int): Maximum continuous normal steps when there is no overflow. Default: 2000.
@@ -135,7 +135,7 @@ class DynamicLossScaleManager(LossScaleManager):
         self.cur_iter += 1

     def get_drop_overflow_update(self):
-        """Get the flag whether to drop optimizer update when there is overflow happened"""
+        """Get the flag of whether to drop the optimizer update when an overflow happens."""
         return True

     def get_update_cell(self):
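The interplay of `init_loss_scale`, `scale_factor`, and `scale_window` can be sketched as follows: shrink the scale on overflow, grow it after `scale_window` consecutive overflow-free steps. This is a generic dynamic-loss-scaling sketch; details such as the minimum scale of 1.0 are assumptions, not MindSpore's exact implementation.

```python
class DynamicLossScaleSketch:
    """Sketch of dynamic loss scaling as described by the Args above."""

    def __init__(self, init_loss_scale=2.0**24, scale_factor=2, scale_window=2000):
        self.loss_scale = init_loss_scale
        self.scale_factor = scale_factor
        self.scale_window = scale_window
        self.good_steps = 0

    def update_loss_scale(self, overflow):
        if overflow:
            # overflow: back off, and restart the window of normal steps
            self.loss_scale = max(self.loss_scale / self.scale_factor, 1.0)
            self.good_steps = 0
        else:
            # no overflow: after scale_window normal steps, grow the scale
            self.good_steps += 1
            if self.good_steps >= self.scale_window:
                self.loss_scale *= self.scale_factor
                self.good_steps = 0
```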
@@ -13,11 +13,11 @@
 # limitations under the License.
 # ============================================================================
 """
-quantization.
+Quantization.

 User can use quantization aware to train a model. MindSpore supports quantization aware training,
 which models quantization errors in both the forward and backward passes using fake-quantization
-ops. Note that the entire computation is carried out in floating point. At the end of quantization
+operations. Note that the entire computation is carried out in floating point. At the end of quantization
 aware training, MindSpore provides conversion functions to convert the trained model into lower precision.
 """
@@ -474,8 +474,8 @@ def export(network, *inputs, file_name, mean=127.5, std_dev=127.5, file_format='
 def convert_quant_network(network,
-                          bn_fold=False,
-                          freeze_bn=10000,
+                          bn_fold=True,
+                          freeze_bn=10000000,
                           quant_delay=(0, 0),
                           num_bits=(8, 8),
                           per_channel=(False, False),
@@ -487,21 +487,20 @@ def convert_quant_network(network,
     Args:
         network (Cell): Obtain a pipeline through network for saving graph summary.
-        bn_fold (bool): Flag to used bn fold ops for simulation inference operation. Default: False.
-        freeze_bn (int): Number of steps after which BatchNorm OP parameters used total mean and variance. Default: 0.
+        bn_fold (bool): Whether to use bn fold ops for simulation inference. Default: True.
+        freeze_bn (int): Number of steps after which the BatchNorm OP parameters use the accumulated mean
+            and variance. Default: 1e7.
         quant_delay (int, list or tuple): Number of steps after which weights and activations are quantized during
             eval. The first element represents weights and the second element represents data flow. Default: (0, 0)
-        num_bits (int, list or tuple): Number of bits to use for quantizing weights and activations. The first
+        num_bits (int, list or tuple): The number of bits used to quantize weights and activations. The first
             element represents weights and the second element represents data flow. Default: (8, 8)
         per_channel (bool, list or tuple): Quantization granularity based on layer or on channel. If `True`
             then base on per channel otherwise base on per layer. The first element represents weights
             and the second element represents data flow. Default: (False, False)
-        symmetric (bool, list or tuple): Quantization algorithm use symmetric or not. If `True` then base on
+        symmetric (bool, list or tuple): Whether the quantization algorithm is symmetric or not. If `True` then base on
             symmetric otherwise base on asymmetric. The first element represents weights and the second
             element represents data flow. Default: (False, False)
-        narrow_range (bool, list or tuple): Quantization algorithm use narrow range or not. If `True` then base
-            on narrow range otherwise base on off narrow range. The first element represent weights and
-            second element represent data flow. Default: (False, False)
+        narrow_range (bool, list or tuple): Whether the quantization algorithm uses narrow range or not.
+            The first element represents weights and the second element represents data flow. Default: (False, False)

     Returns:
         Cell, network which has been changed to a quantization aware training network cell.
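The fake-quantization idea behind these parameters (`num_bits`, `narrow_range`) can be illustrated with a plain-Python quantize-then-dequantize pass: the forward computation stays in floating point but is perturbed by quantization error. This is a generic symmetric per-layer sketch, not MindSpore's fake-quant operator:

```python
def fake_quant(values, num_bits=8, narrow_range=False):
    """Quantize each value to a num_bits integer grid, then dequantize,
    so downstream computation sees quantization error in floating point."""
    qmax = 2 ** (num_bits - 1) - 1
    qmin = -qmax if narrow_range else -qmax - 1
    max_abs = max((abs(v) for v in values), default=0.0) or 1.0
    scale = max_abs / qmax  # one scale for the whole layer (per-layer granularity)
    return [min(max(round(v / scale), qmin), qmax) * scale for v in values]
```

With `per_channel=True`, a real implementation would compute one `scale` per output channel instead of one per layer.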
@@ -144,10 +144,10 @@ def save_checkpoint(parameter_list, ckpt_file_name, async_save=False):
     Saves checkpoint info to a specified file.

     Args:
-        parameter_list (list): Parameters list, each element is a dict
+        parameter_list (list): Parameters list, each element is a dictionary
             like {"name":xx, "type":xx, "shape":xx, "data":xx}.
         ckpt_file_name (str): Checkpoint file name.
-        async_save (bool): Whether asynchronous execute save checkpoint into file. Default: False
+        async_save (bool): Whether to save the checkpoint to a file asynchronously. Default: False.

     Raises:
         RuntimeError: Failed to save the Checkpoint file.
@@ -270,10 +270,10 @@ def load_param_into_net(net, parameter_dict):
     Args:
         net (Cell): Cell network.
-        parameter_dict (dict): Parameter dict.
+        parameter_dict (dict): Parameter dictionary.

     Raises:
-        TypeError: Argument is not a Cell, or parameter_dict is not a Parameter dict.
+        TypeError: Argument is not a Cell, or parameter_dict is not a Parameter dictionary.
     """
     if not isinstance(net, nn.Cell):
         logger.error("Failed to combine the net and the parameters.")
@@ -447,12 +447,12 @@ def _fill_param_into_net(net, parameter_list):
 def export(net, *inputs, file_name, file_format='AIR'):
     """
-    Exports MindSpore predict model to file in specified format.
+    Export the MindSpore prediction model to a file in the specified format.

     Args:
         net (Cell): MindSpore network.
         inputs (Tensor): Inputs of the `net`.
-        file_name (str): File name of model to export.
+        file_name (str): File name of the model to be exported.
         file_format (str): MindSpore currently supports 'AIR', 'ONNX' and 'MINDIR' format for exported model.

             - AIR: Ascend Intermediate Representation. An intermediate representation format of Ascend model.
@@ -507,7 +507,7 @@ def parse_print(print_file_name):
     Loads Print data from a specified file.

     Args:
-        print_file_name (str): The file name of save print data.
+        print_file_name (str): The file name of the saved print data.

     Returns:
         List, each element of the list is a Tensor.
@@ -64,29 +64,29 @@ class SummaryRecord:
     SummaryRecord is used to record the summary data and lineage data.

     The API will create a summary file and lineage files lazily in a given directory and writes data to them.
-    It writes the data to files by executing the 'record' method. In addition to record the data bubbled up from
+    It writes the data to files by executing the 'record' method. In addition to recording the data bubbled up from
     the network by defining the summary operators, SummaryRecord also supports recording extra data which
     can be added by calling add_value.

     Note:
-        1. Make sure to close the SummaryRecord at the end, or the process will not exit.
-           Please see the Example section below on how to properly close with two ways.
-        2. The SummaryRecord instance can only allow one at a time, otherwise it will cause problems with data writes.
+        1. Make sure to close the SummaryRecord at the end, otherwise the process will not exit.
+           Please see the Example section below to learn how to close properly in two ways.
+        2. Only one SummaryRecord instance is allowed at a time, otherwise it will cause data writing problems.

     Args:
         log_dir (str): The log_dir is a directory location to save the summary.
         queue_max_size (int): Deprecated. The capacity of the event queue (reserved). Default: 0.
-        flush_time (int): Deprecated. Frequency to flush the summaries to disk, the unit is second. Default: 120.
+        flush_time (int): Deprecated. Frequency of flushing the summary file to disk. The unit is second. Default: 120.
         file_prefix (str): The prefix of file. Default: "events".
         file_suffix (str): The suffix of file. Default: "_MS".
         network (Cell): Obtain a pipeline through network for saving graph summary. Default: None.
-        max_file_size (Optional[int]): The maximum size in bytes each file can be written to the disk. \
+        max_file_size (Optional[int]): The maximum size of each file that can be written to disk (in bytes). \
             Unlimited by default. For example, to write not larger than 4GB, specify `max_file_size=4 * 1024**3`.

     Raises:
-        TypeError: If `max_file_size`, `queue_max_size` or `flush_time` is not int, \
-            or `file_prefix` and `file_suffix` is not str.
-        RuntimeError: If the log_dir can not be resolved to a canonicalized absolute pathname.
+        TypeError: If the data type of `max_file_size`, `queue_max_size` or `flush_time` is not int, \
+            or the data type of `file_prefix` and `file_suffix` is not str.
+        RuntimeError: If the log_dir is not a normalized absolute path name.

     Examples:
         >>> # use in with statement to auto close
@@ -171,10 +171,10 @@ class SummaryRecord:
     def set_mode(self, mode):
         """
-        Set the mode for the recorder to be aware. The mode is set 'train' by default.
+        Set the mode for the recorder to be aware of. The mode is set to 'train' by default.

         Args:
-            mode (str): The mode to set, which should be 'train' or 'eval'.
+            mode (str): The mode to be set, which should be 'train' or 'eval'.

         Raises:
             ValueError: When the mode is not recognized.
@@ -190,29 +190,30 @@ class SummaryRecord:
     def add_value(self, plugin, name, value):
         """
-        Add value to be record later on.
+        Add value to be recorded later.

         When the plugin is 'tensor', 'scalar', 'image' or 'histogram',
         the name should be the tag name, and the value should be a Tensor.

-        When the plugin plugin is 'graph', the value should be a GraphProto.
+        When the plugin is 'graph', the value should be a GraphProto.

-        When the plugin 'dataset_graph', 'train_lineage', 'eval_lineage',
+        When the plugin is 'dataset_graph', 'train_lineage', 'eval_lineage',
         or 'custom_lineage_data', the value should be a proto message.

         Args:
-            plugin (str): The plugin for the value.
-            name (str): The name for the value.
+            plugin (str): The plugin to which the value belongs.
+            name (str): The name of the value.
             value (Union[Tensor, GraphProto, TrainLineage, EvaluationLineage, DatasetGraph, UserDefinedInfo]): \
                 The value to store.

-                - GraphProto: The 'value' should be a serialized string this type when the plugin is 'graph'.
-                - Tensor: The 'value' should be this type when the plugin is 'scalar', 'image', 'tensor' or 'histogram'.
-                - TrainLineage: The 'value' should be this type when the plugin is 'train_lineage'.
-                - EvaluationLineage: The 'value' should be this type when the plugin is 'eval_lineage'.
-                - DatasetGraph: The 'value' should be this type when the plugin is 'dataset_graph'.
-                - UserDefinedInfo: The 'value' should be this type when the plugin is 'custom_lineage_data'.
+                - The data type of value should be 'GraphProto' when the plugin is 'graph'.
+                - The data type of value should be 'Tensor' when the plugin is 'scalar', 'image', 'tensor'
+                  or 'histogram'.
+                - The data type of value should be 'TrainLineage' when the plugin is 'train_lineage'.
+                - The data type of value should be 'EvaluationLineage' when the plugin is 'eval_lineage'.
+                - The data type of value should be 'DatasetGraph' when the plugin is 'dataset_graph'.
+                - The data type of value should be 'UserDefinedInfo' when the plugin is 'custom_lineage_data'.

         Raises:
             ValueError: When the name is not valid.
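The plugin-to-type mapping that `add_value` documents amounts to a small dispatch table plus validation. A sketch with hypothetical stand-in classes (the real `Tensor`, `GraphProto`, and `TrainLineage` are MindSpore/proto types, and only a subset of plugins is shown):

```python
# hypothetical stand-ins for the real tensor/proto classes
class Tensor: ...
class GraphProto: ...
class TrainLineage: ...

EXPECTED_TYPE = {
    "scalar": Tensor, "image": Tensor, "tensor": Tensor, "histogram": Tensor,
    "graph": GraphProto,
    "train_lineage": TrainLineage,
}

def check_value(plugin, value):
    """Raise ValueError for an unknown plugin, TypeError for a mismatched value type."""
    if plugin not in EXPECTED_TYPE:
        raise ValueError(f"plugin {plugin!r} is not valid")
    if not isinstance(value, EXPECTED_TYPE[plugin]):
        raise TypeError(f"plugin {plugin!r} expects {EXPECTED_TYPE[plugin].__name__}")
```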
@@ -248,9 +249,9 @@ class SummaryRecord:
         Args:
             step (int): Represents training step number.
-            train_network (Cell): The network that called the callback.
+            train_network (Cell): The network that calls the callback.
             plugin_filter (Optional[Callable[[str], bool]]): The filter function, \
-                which is used to filter out plugins from being written by return False.
+                which is used to filter out plugins from being written by returning False.

         Returns:
             bool, whether the record process is successful or not.
@@ -342,7 +343,7 @@ class SummaryRecord:
     def close(self):
         """
-        Flush all events and close summary records. Please use with statement to autoclose.
+        Flush all events and close summary records. Please use the `with` statement to auto close.

         Examples:
             >>> try: