WO2021128293A1 - Model training method, device, storage medium and program product - Google Patents

Model training method, device, storage medium and program product

Info

Publication number
WO2021128293A1
WO2021128293A1 PCT/CN2019/129265 CN2019129265W WO2021128293A1 WO 2021128293 A1 WO2021128293 A1 WO 2021128293A1 CN 2019129265 W CN2019129265 W CN 2019129265W WO 2021128293 A1 WO2021128293 A1 WO 2021128293A1
Authority
WO
WIPO (PCT)
Prior art keywords
value
network layer
training
activation
neural network
Prior art date
Application number
PCT/CN2019/129265
Other languages
English (en)
French (fr)
Inventor
李慧霞
纪荣嵘
吕宏亮
杨帆
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to PCT/CN2019/129265
Priority to CN201980102629.8A (CN114730367A)
Publication of WO2021128293A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks

Definitions

  • This application relates to the field of data processing technology, in particular to a model training method, device, storage medium and program product.
  • A neural network model is a network system formed by a large number of simple processing units (called neurons) that are widely connected to each other. It can be applied to scenarios such as image classification, image detection, and single image super resolution (SISR) tasks.
  • the training process of the neural network model can include a forward propagation process and a back propagation process.
  • In the forward propagation process, the sample data is input into the neural network model and processed according to the weights in the neural network model to obtain output data.
  • In the back propagation process, the weights in the neural network model are adjusted according to the loss value between the output data and the sample label.
  • An intermediate result produced while the neural network model processes data is called an activation value.
  • The activation values in the neural network model generally use high-precision data formats. To reduce the storage space occupied by the neural network model, reduce its hardware bandwidth and cache usage during computation, and improve its operating efficiency, the activation values are often quantized during the forward propagation process.
  • This application provides a model training method, device, storage medium, and program product, which can solve the problem of poor performance of neural network models trained in related technologies.
  • the technical solution is as follows:
  • In a first aspect, a model training method is provided.
  • In this method, training samples are used to train the neural network model over multiple iterations.
  • One iteration of the multiple training iterations may proceed as follows: in the forward propagation process, the sample data in the training sample is processed according to the weights in the neural network model and the current cutoff value of the network layer to obtain output data, where the cutoff value of the network layer is used to quantize the activation values of the network layer; in the back propagation process, the weights in the neural network model are adjusted according to the loss value between the output data and the sample label in the training sample, and the cutoff value of the network layer is adjusted according to the loss value and the current cutoff value and activation values of the network layer.
  • The cutoff value in the neural network model is thus obtained through training; that is, the upper and lower limits used when quantizing the activation values can be adaptively adjusted during model training, thereby reducing the quantization error and improving the performance of the neural network model.
  • the training samples can be set in advance, and the training samples can include sample data and sample labels.
  • For example, the training sample may include an image (sample data) and a label of the image (sample label), where the label may be the type, identity, etc. of the object contained in the image; or the training sample may include a low resolution (LR) image (sample data) and a high resolution (HR) image (sample label) corresponding to the LR image.
  • the network layer may include m parts, each part may share a cutoff value, and m is a positive integer.
  • When m is 1, the whole network layer shares one cutoff value, that is, all activation values in the network layer are quantized according to this cutoff value; when m is an integer greater than or equal to 2, the network layer includes multiple parts and each part shares one cutoff value, that is, the activation values of each part are quantized according to the corresponding cutoff value.
  • the network layer includes m parts means that the input of the network layer can be defined as m parts according to the number of output neurons or the number of output channels of the network layer. Specifically, when the network layer has m output neurons or m output channels, the input of the network layer can be divided into m parts corresponding to the m output neurons or m output channels one-to-one.
  • That is, the m parts of the network layer are m groups of input neurons in one-to-one correspondence with the m output neurons of the network layer, or m groups of input channels in one-to-one correspondence with the m output channels of the network layer.
  • The operation of adjusting the cutoff value of the network layer may be: determining a first adjustment degree according to the loss value and the inverse quantization value of the network layer; determining a second adjustment degree according to the magnitude relationship between the current cutoff value and the activation value of the network layer; multiplying the first adjustment degree by the second adjustment degree to obtain a target adjustment degree; and subtracting the product of the learning rate and the target adjustment degree from the current cutoff value of the network layer to obtain the adjusted cutoff value of the network layer.
  • the key to the operation of adjusting the cutoff value of the network layer according to the loss value is to obtain the partial derivative of the loss function of the neural network model with respect to the cutoff value (referred to as the target adjustment degree in this application).
  • the partial derivative of the loss function with respect to the cutoff value is obtained according to the loss value, the current cutoff value and activation value of the network layer.
  • The partial derivative of the loss function with respect to the cutoff value is defined as the product of the partial derivative of the loss function with respect to the inverse quantization value of the network layer (referred to as the first adjustment degree in this application) and the partial derivative of the quantization function of the network layer with respect to the cutoff value of the network layer (referred to as the second adjustment degree in this application).
  • In practice, the partial derivative of the quantization function with respect to the cutoff value is approximated by the partial derivative of the cutoff function with respect to the cutoff value.
  • the partial derivative of the cutoff function with respect to the cutoff value depends on the magnitude relationship between the current cutoff value of the network layer and the activation value of the network layer.
  • The operation of determining the second adjustment degree may be: when the activation value of the network layer is less than or equal to the negative of the current cutoff value of the network layer, the second adjustment degree is determined to be -1; when the activation value is greater than the negative of the current cutoff value and less than the current cutoff value, the second adjustment degree is determined to be 0; when the activation value is greater than or equal to the current cutoff value, the second adjustment degree is determined to be 1.
  • Alternatively, when the activation value of the network layer is less than the negative of the current cutoff value of the network layer, the second adjustment degree is determined to be -1; when the activation value is greater than or equal to the negative of the current cutoff value and less than or equal to the current cutoff value, the second adjustment degree is determined to be 0; when the activation value is greater than the current cutoff value, the second adjustment degree is determined to be 1. Other similar piecewise conditions may also be used and are not detailed here.
  • the cutoff value in the neural network model may be initialized first. That is, before using the training samples to train the neural network model for multiple iterations, the cutoff value in the neural network model can be initialized.
  • The operation of initializing the cutoff value in the neural network model may be: using the training samples to train the neural network model for t iterations, and determining the initial cutoff value of the network layer according to the activation values of the m parts of the network layer in the t training iterations.
  • t can be set in advance, and t can be a positive integer.
  • the cutoff value is initialized according to the statistical characteristics of the activation value in the neural network model, so that the stability of the model can be improved and the convergence can be accelerated.
  • The operation of determining the initial cutoff value of the network layer according to the activation values of the m parts of the network layer in the t training iterations may be: in the first of the t iterations, obtaining the maximum activation value among the activation values of each of the m parts of the network layer, and using the average of the m maximum activation values as the first cutoff value; in the i-th of the t iterations, obtaining the maximum activation value among the activation values of each of the m parts of the network layer, and computing a weighted average of the average of the m maximum activation values and the (i-1)-th cutoff value to obtain the i-th cutoff value, where i is an integer greater than or equal to 2 and less than or equal to t; and using the t-th cutoff value as the initial cutoff value corresponding to each of the m parts of the network layer.
  • In a second aspect, a model training device is provided, and the model training device has the function of implementing the behavior of the model training method in the first aspect.
  • the model training device includes at least one module, and the at least one module is used to implement the model training method provided in the above-mentioned first aspect.
  • In a third aspect, a model training device is provided. The model training device includes a processor and a memory, and the memory is used to store a program that supports the model training device in executing the model training method provided in the first aspect, and to store the data involved in implementing the model training method described in the first aspect.
  • the processor is configured to execute a program stored in the memory.
  • the model training device may further include a communication bus for establishing a connection between the processor and the memory.
  • In a fourth aspect, a computer-readable storage medium is provided. Instructions are stored in the computer-readable storage medium, and when the instructions are run on a computer, they cause the computer to execute the model training method described in the first aspect.
  • In a fifth aspect, a computer program product containing instructions is provided. When the computer program product is run on a computer, it causes the computer to execute the model training method described in the first aspect.
  • The cutoff value in the neural network model in this application is obtained through training; that is, the upper and lower limits used when quantizing the activation values can be adaptively adjusted during model training, thereby reducing the quantization error and improving the performance of the finally trained neural network model.
  • Fig. 1 is a schematic structural diagram of a computer device provided by an embodiment of the present application.
  • Fig. 2 is a flowchart of a model training method provided by an embodiment of the present application.
  • Fig. 3 is a flowchart of an iterative training operation provided by an embodiment of the present application.
  • Fig. 4 is a schematic structural diagram of a model training device provided by an embodiment of the present application.
  • Fig. 1 is a schematic structural diagram of a computer device provided by an embodiment of the present application.
  • the computer device includes at least one processor 101, a communication bus 102, a memory 103, and at least one communication interface 104.
  • The processor 101 may be a microprocessor (including a central processing unit (CPU), etc.), an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of the program of the solution of this application.
  • the communication bus 102 may include a path for transferring information between the aforementioned components.
  • The memory 103 may be a read-only memory (ROM), a random access memory (RAM), an electrically erasable programmable read-only memory (EEPROM), an optical disc (including a compact disc read-only memory (CD-ROM), a compact disc, a laser disc, a digital versatile disc, a Blu-ray disc, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
  • the memory 103 may exist independently and is connected to the processor 101 through the communication bus 102.
  • the memory 103 may also be integrated with the processor 101.
  • the communication interface 104 uses any device such as a transceiver to communicate with other devices or communication networks, such as Ethernet, radio access network (RAN), wireless local area network (WLAN), and so on.
  • the processor 101 may include one or more CPUs, such as CPU0 and CPU1 as shown in FIG. 1.
  • the computer device may include multiple processors, such as the processor 101 and the processor 105 as shown in FIG. 1. Each of these processors can be a single-core processor or a multi-core processor.
  • the processor here may refer to one or more devices, circuits, and/or processing cores for processing data (such as computer program instructions).
  • the computer device may further include an output device 106 and an input device 107.
  • the output device 106 communicates with the processor 101 and can display information in a variety of ways.
  • the output device 106 may be a liquid crystal display (LCD), a light emitting diode (LED) display device, a cathode ray tube (CRT) display device, or a projector, etc.
  • the input device 107 communicates with the processor 101 and can receive user input in a variety of ways.
  • the input device 107 may be a mouse, a keyboard, a touch screen device, a sensor device, or the like.
  • the above-mentioned computer equipment may be a general-purpose computer equipment or a special-purpose computer equipment.
  • the computer device may be a desktop computer, a portable computer, a network server, a palmtop computer, a mobile phone, a tablet computer, a wireless terminal device, a communication device, or an embedded device.
  • the embodiment of the application does not limit the type of the computer device.
  • the memory 103 is used to store the program code 110 for executing the solution of the present application, and the processor 101 is used to execute the program code 110 stored in the memory 103.
  • the computer device can implement the model training method provided in the embodiment of FIG. 2 below through the processor 101 and the program code 110 in the memory 103.
  • Fig. 2 is a flowchart of a model training method provided by an embodiment of the present application. Referring to Figure 2, the method includes:
  • Step 201: Use training samples to train the neural network model over multiple iterations.
  • training samples can be set in advance, and the training samples can include sample data and sample labels.
  • the neural network model can be a network system formed by a large number of simple processing units (called neurons) widely connected to each other.
  • the neural network model may include multiple network layers, and the multiple network layers include an input layer, a hidden layer, and an output layer.
  • the input layer is responsible for receiving sample data; the output layer is responsible for outputting the processed data; the hidden layer is located between the input layer and the output layer and is responsible for processing data, and the hidden layer is invisible to the outside.
  • The neural network model may be a deep neural network, for example a convolutional neural network.
  • the neural network model trained in the embodiments of this application can be applied to various scenarios, for example, it can be applied to scenarios such as image classification, image detection, and SISR tasks.
  • the goal of the SISR task is to reconstruct the corresponding HR image from the LR image.
  • When applied to an image classification scenario or an image detection scenario, the training sample may include an image (sample data) and a label of the image (sample label), and the label of the image may be the type, identity, etc. of the object contained in the image.
  • When applied to a SISR task scenario, the training sample may include an LR image (sample data) and an HR image (sample label) corresponding to the LR image.
  • Each iteration of the multiple training iterations includes at least a forward propagation process, in which the sample data is processed to obtain output data.
  • If the loss value between the output data and the sample label is within the specified range, the iterative training ends and a neural network model that meets the requirements is obtained; if the loss value between the output data and the sample label exceeds the specified range, the back propagation process is performed to adjust the parameters in the neural network model. After the back propagation process is completed, the next training iteration can begin.
  • One iteration of the multiple training iterations may include the following steps 2011 to 2014.
  • Step 2011: In the forward propagation process, the sample data in the training sample is processed according to the weights in the neural network model and the current cutoff value of the network layer to obtain output data.
  • The cutoff value of the network layer is used to quantize the activation values of the network layer.
  • the intermediate result in the processing of the neural network model can be called the activation value.
  • The sample data can be used directly as the activation value of the input layer; for any network layer other than the output layer, the activation value of that network layer is processed to obtain the activation value of the next network layer.
  • The activation values in the neural network model generally adopt a high-precision data format (such as FP32, a data representation format defined by the IEEE 754 standard).
  • To reduce storage and computation cost, the activation values are often quantized during the forward propagation process.
  • a possible quantization technique is to use cut-off symmetric linear quantization, which can be implemented according to the following quantization function:
  • x is the activation value
  • n is the number of quantization bits, which can be set in advance
  • a is the cutoff value, a is a positive number
  • s(n) is the quantization unit, and ⟨·⟩ denotes rounding to the nearest integer.
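  • The quantization function itself is not reproduced in this text. As a minimal sketch only, clipped symmetric linear quantization can be written as below, under the assumption that the quantization unit is s(n) = a / (2^(n-1) - 1); that expression and the helper names `quantization_unit`, `clip`, `quantize` and `dequantize` are illustrative and are not taken from the application.

```python
def quantization_unit(a: float, n: int) -> float:
    """Assumed step size s(n): [-a, a] spans 2**(n-1) - 1 steps on each side of zero."""
    return a / (2 ** (n - 1) - 1)


def clip(x: float, a: float) -> float:
    """Cutoff function: limit x to the interval [-a, a]."""
    return max(-a, min(a, x))


def quantize(x: float, a: float, n: int) -> int:
    """Clip x to [-a, a], divide by s(n), and round to the nearest integer.

    Python's round() resolves exact .5 ties to the even integer.
    """
    return round(clip(x, a) / quantization_unit(a, n))


def dequantize(q: int, a: float, n: int) -> float:
    """Inverse quantization: multiply the integer code by s(n)."""
    return q * quantization_unit(a, n)
```

  • With a = 3.0 and n = 3 the assumed unit is 1.0, so activations outside [-3, 3] saturate at the codes -3 and 3.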
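  • Quantizing the activation values during the forward propagation process may specifically be: for a network layer other than the output layer in the neural network model, the activation value of the network layer is quantized according to the current cutoff value of the network layer to obtain the quantized value of the network layer; the quantized value is processed to obtain a processed quantized value; and the processed quantized value is inverse-quantized to obtain the inverse quantization value of the network layer.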
  • the network layer may include m parts, each part may share a cutoff value, and m is a positive integer.
  • When m is 1, the whole network layer shares one cutoff value, that is, all activation values in the network layer are quantized according to this cutoff value; when m is an integer greater than or equal to 2, the network layer includes multiple parts and each part shares one cutoff value, that is, the activation values of each part are quantized according to the corresponding cutoff value.
  • the network layer includes m parts means that the input of the network layer can be defined as m parts according to the number of output neurons or the number of output channels of the network layer. Specifically, when the network layer has m output neurons or m output channels, the input of the network layer can be divided into m parts corresponding to the m output neurons or m output channels one-to-one.
  • That is, the m parts of the network layer are m groups of input neurons in one-to-one correspondence with the m output neurons of the network layer, or m groups of input channels in one-to-one correspondence with the m output channels of the network layer. Each group of input neurons may include one or more input neurons, and each group of input channels may include one or more input channels.
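  • The per-part cutoff values described above can be sketched as follows. This is an illustration, not the application's implementation: the function name `quantize_parts`, the nested-list data layout, and the quantization unit a / (2^(n-1) - 1) are all assumptions.

```python
def quantize_parts(parts, cutoffs, n):
    """Quantize m groups of activations, each with its own cutoff value.

    parts   -- list of m lists of activation values (one list per part)
    cutoffs -- list of m positive cutoff values, one per part
    n       -- number of quantization bits
    """
    quantized = []
    for values, a in zip(parts, cutoffs):
        s = a / (2 ** (n - 1) - 1)  # assumed quantization unit for this part
        quantized.append([round(max(-a, min(a, x)) / s) for x in values])
    return quantized
```

  • Passing a single part and a single cutoff corresponds to the m = 1 case, in which the whole layer shares one cutoff value.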
  • The operation of quantizing the activation value of the network layer according to the current cutoff value of the network layer to obtain the quantized value of the network layer can be implemented according to the quantization function of the network layer, in which the number of quantization bits and the quantization unit have been preset.
  • Specifically, the current cutoff value and activation value of the network layer can be substituted into the quantization function to obtain the quantized value of the network layer.
  • The operation of processing the quantized value of the network layer to obtain the processed quantized value varies with the type of the network layer. For specific operations, reference may be made to related technologies, which are not elaborated in the embodiments of this application. For example, when the network layer has weights and an activation function, the quantized value of the network layer can first be processed according to the weights in the network layer to obtain a first processing result, and the first processing result is then processed by the activation function in the network layer to obtain a second processing result, which serves as the processed quantized value.
  • The operation of inverse-quantizing the processed quantized value to obtain the inverse quantization value of the network layer can be implemented according to the quantization function of the network layer, in which the number of quantization bits and the quantization unit have been preset. Specifically, the processed quantized value can be multiplied by s(n) to obtain the inverse quantization value of the network layer.
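  • The quantize, process, inverse-quantize sequence for one layer can be sketched as below. The fully connected weights, the ReLU activation, the quantization unit a / (2^(n-1) - 1) and the name `layer_forward` are assumptions chosen for illustration; the application leaves the layer type open.

```python
def layer_forward(x, weights, a, n):
    """Forward pass of one layer on quantized activations.

    x       -- list of input activation values
    weights -- list of rows, one row of weights per output neuron
    a       -- current cutoff value of the layer (positive)
    n       -- number of quantization bits
    """
    s = a / (2 ** (n - 1) - 1)                            # assumed quantization unit
    q = [round(max(-a, min(a, v)) / s) for v in x]        # quantized values
    z = [sum(w * qv for w, qv in zip(row, q)) for row in weights]  # first processing result
    h = [max(0.0, v) for v in z]                          # second result (assumed ReLU)
    return [s * v for v in h]                             # inverse quantization values
```

  • Multiplying by s(n) after the layer is valid in this sketch because both the weighted sum and ReLU commute with scaling by a positive constant; a layer with a different activation function would need different handling.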
  • Step 2012: Determine whether the loss value between the output data and the sample label in the training sample exceeds a prescribed range. If not, perform the following step 2013; if yes, perform the following step 2014.
  • Step 2013: End the iterative training and obtain a neural network model that meets the requirements.
  • Step 2014: In the back propagation process, adjust the weights in the neural network model according to the loss value between the output data and the sample label in the training sample, and adjust the cutoff value of the network layer according to the loss value and the current cutoff value and activation value of the network layer.
  • After step 2014 is completed, the process may return to step 2011 to perform the next training iteration.
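  • The control flow of steps 2011 to 2014 can be sketched on a toy scalar model. Everything here is an illustrative assumption: the model y = w * clip(x, -a, a), the squared-error loss, the single training sample, and the names `ToyModel` and `train`. Rounding is deliberately omitted so that the loop stays easy to follow.

```python
class ToyModel:
    """Scalar model y = w * clip(x, -a, a) with a trainable weight w and cutoff a."""

    def __init__(self, w, a, lr):
        self.w, self.a, self.lr = w, a, lr

    def forward(self, x):
        return self.w * max(-self.a, min(self.a, x))

    def backward(self, x, y, label):
        dl_dy = 2.0 * (y - label)                 # derivative of the squared error
        clipped = max(-self.a, min(self.a, x))
        # Derivative of clip(x, -a, a) with respect to a: -1, 0 or 1.
        sign = -1 if x <= -self.a else (1 if x >= self.a else 0)
        grad_w = dl_dy * clipped
        grad_a = dl_dy * self.w * sign            # first degree times second degree
        self.w -= self.lr * grad_w                # adjust the weight
        self.a -= self.lr * grad_a                # adjust the cutoff value


def train(model, x, label, loss_range, max_iters=1000):
    """Iterate until the loss is within loss_range or max_iters is reached."""
    loss = None
    for iteration in range(1, max_iters + 1):
        y = model.forward(x)                      # step 2011
        loss = (y - label) ** 2
        if loss <= loss_range:                    # step 2012
            return iteration, loss                # step 2013
        model.backward(x, y, label)               # step 2014, then back to step 2011
    return max_iters, loss
```

  • When the input is clipped (for example x = 2.0 with a = 1.0) and a larger output would lower the loss, the backward step increases a, which is the adaptive behaviour the method relies on.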
  • In the related art, the cutoff value in the neural network model is kept unchanged, and only the weights in the neural network model are adjusted.
  • In the embodiments of this application, the cutoff value in the neural network model can also be adjusted. In this way, the cutoff value in the neural network model is obtained through training; that is, the upper and lower limits used when quantizing the activation values can be adaptively adjusted during model training, thereby reducing the quantization error and improving the performance of the neural network model.
  • both the weight value and the cutoff value in the neural network model can be referred to as parameters in the neural network model. That is, the embodiment of the present application actually adjusts the parameters in the neural network model according to the loss value between the output data of the neural network model and the sample label of the training sample.
  • the loss value between the output data and the sample label of the training sample can be obtained through the loss function of the neural network model.
  • the loss function may be a general loss function, such as a cross entropy loss function, a mean square error loss function, and so on.
  • the loss function may be a regularized loss function, and the regularized loss function is the sum of a general loss function and a regular function.
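  • A regularized loss of the kind mentioned above can be sketched as the sum of a general loss and a regular function. The choice of mean square error, an L2 penalty on the weights, the coefficient `lam` and the function names are assumptions for illustration; the application does not specify the regular function.

```python
def mse(outputs, labels):
    """Mean square error between outputs and labels."""
    return sum((o - t) ** 2 for o, t in zip(outputs, labels)) / len(outputs)


def regularized_loss(outputs, labels, weights, lam=1e-4):
    """General loss plus an assumed L2 regular function weighted by lam."""
    return mse(outputs, labels) + lam * sum(w * w for w in weights)
```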
  • the operation of adjusting the weight value in the neural network model can refer to related technologies, which will not be described in detail in the embodiment of the present application.
  • For example, for any weight in the neural network model, the partial derivative of the loss function of the neural network model with respect to this weight can be obtained according to the loss value and this weight; the product of the learning rate and this partial derivative is then subtracted from the weight to obtain the adjusted weight.
  • the learning rate can be set in advance.
  • the learning rate can be 0.001, 0.000001, and so on.
  • the key to the operation of adjusting the cutoff value of the network layer according to the loss value is to obtain the partial derivative of the loss function of the neural network model with respect to the cutoff value (referred to as the target adjustment degree in the embodiment of the present application).
  • the partial derivative of the loss function with respect to the cut-off value is obtained according to the loss value, the current cut-off value and the activation value of the network layer.
  • The partial derivative of the loss function with respect to the cutoff value is defined as the product of the partial derivative of the loss function with respect to the inverse quantization value of the network layer (referred to as the first adjustment degree in the embodiment of this application) and the partial derivative of the quantization function of the network layer with respect to the cutoff value of the network layer (referred to as the second adjustment degree in the embodiment of this application).
  • The operation of adjusting the cutoff value of the network layer may be: determining the first adjustment degree according to the loss value and the inverse quantization value of the network layer; determining the second adjustment degree according to the magnitude relationship between the current cutoff value and the activation value of the network layer; multiplying the first adjustment degree by the second adjustment degree to obtain the target adjustment degree; and subtracting the product of the learning rate and the target adjustment degree from the current cutoff value of the network layer to obtain the adjusted cutoff value of the network layer.
  • the learning rate may be set in advance, and the learning rate may be the same as the learning rate when adjusting the weights in the neural network model, or may be different from the learning rate when adjusting the weights in the neural network model.
  • the learning rate can be 0.001, 0.000001, and so on.
  • Obtaining the partial derivative of the loss function with respect to the inverse quantization value of the network layer corresponds to determining the first adjustment degree according to the loss value and the inverse quantization value of the network layer.
  • That is, the partial derivative of the loss function with respect to the inverse quantization value is used as the first adjustment degree.
  • The partial derivative of the cutoff function with respect to a is taken as the partial derivative of x_q with respect to a.
  • The partial derivative of the cutoff function with respect to a depends on the magnitude relationship between a (the current cutoff value of the network layer) and x (the activation value of the network layer).
  • Obtaining the partial derivative of the quantization function with respect to the cutoff value corresponds to determining the second adjustment degree according to the magnitude relationship between the current cutoff value and the activation value of the network layer.
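  • The relationships described above can be restated compactly. The notation is assumed for this restatement only: L is the loss function, \hat{x} is the inverse quantization value, x_q is the quantized value, clip is the cutoff function, \eta is the learning rate, and a' is the adjusted cutoff value.

```latex
\frac{\partial L}{\partial a}
  = \frac{\partial L}{\partial \hat{x}} \cdot \frac{\partial x_q}{\partial a}
  \approx \frac{\partial L}{\partial \hat{x}} \cdot
          \frac{\partial\, \mathrm{clip}(x, -a, a)}{\partial a},
\qquad
a' = a - \eta \, \frac{\partial L}{\partial a}
```

  • The first factor is the first adjustment degree, the second factor is the second adjustment degree, and their product is the target adjustment degree.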
  • Specifically, when the activation value of the network layer is less than or equal to the negative of the current cutoff value of the network layer, the second adjustment degree is determined to be -1; when the activation value is greater than the negative of the current cutoff value and less than the current cutoff value, the second adjustment degree is determined to be 0; when the activation value is greater than or equal to the current cutoff value, the second adjustment degree is determined to be 1.
  • Alternatively, when the activation value of the network layer is less than the negative of the current cutoff value of the network layer, the second adjustment degree is determined to be -1; when the activation value is greater than or equal to the negative of the current cutoff value and less than or equal to the current cutoff value, the second adjustment degree is determined to be 0; when the activation value is greater than the current cutoff value, the second adjustment degree is determined to be 1. Other similar piecewise conditions may also be used and are not detailed here.
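  • The two piecewise rules above can be sketched as a single function. The name `second_adjustment_degree` and the flag `closed_outer`, which selects where the boundary cases x = a and x = -a are assigned, are hypothetical.

```python
def second_adjustment_degree(x, a, closed_outer=True):
    """Derivative of clip(x, -a, a) with respect to the cutoff value a.

    closed_outer=True  -- x == -a gives -1 and x == a gives 1 (first rule)
    closed_outer=False -- x == -a and x == a both give 0 (alternative rule)
    """
    if closed_outer:
        if x <= -a:
            return -1
        if x >= a:
            return 1
        return 0
    if x < -a:
        return -1
    if x > a:
        return 1
    return 0
```

  • The two rules differ only when an activation value lies exactly on a boundary, which is why the text treats them as interchangeable.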
  • For any one of the m parts of the network layer, the first adjustment degree corresponding to this part can be determined according to the loss value and the inverse quantization value of this part; for any activation value among all the activation values of this part, the second adjustment degree corresponding to this activation value is determined according to the magnitude relationship between the current cutoff value corresponding to this part and the activation value; the average of the second adjustment degrees corresponding to all activation values of this part is used as the second adjustment degree corresponding to this part; the product of the first adjustment degree and the second adjustment degree corresponding to this part is used as the target adjustment degree corresponding to this part; and the product of the learning rate and the target adjustment degree corresponding to this part is subtracted from the current cutoff value corresponding to this part to obtain the adjusted cutoff value corresponding to this part.
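  • The per-part update above can be sketched as follows. The first adjustment degree is treated here as a scalar that has already been computed elsewhere (for example by back propagation); that simplification and the name `update_part_cutoff` are assumptions.

```python
def update_part_cutoff(a, activations, first_degree, lr):
    """Return the adjusted cutoff value for one part of a network layer.

    a            -- current cutoff value of this part
    activations  -- all activation values of this part
    first_degree -- assumed precomputed derivative of the loss with respect
                    to the inverse quantization value of this part
    lr           -- learning rate
    """
    # Second adjustment degree of each activation value: -1, 0 or 1.
    degrees = [-1 if x <= -a else (1 if x >= a else 0) for x in activations]
    second_degree = sum(degrees) / len(degrees)   # average over the part
    target_degree = first_degree * second_degree
    return a - lr * target_degree
```

  • For instance, with a = 1.0, activations [2.0, 0.5, 3.0, -0.2], a first adjustment degree of -0.4 and a learning rate of 0.1, half of the activations are clipped from above, the target adjustment degree is -0.2, and the cutoff value rises to about 1.02.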
  • the cutoff value in the neural network model may be initialized first. That is, before step 201, the cutoff value in the neural network model can be initialized.
  • the operation of initializing the cutoff value in the neural network model may be: using the training sample to train the neural network model for t iterations, and then determining the initial cutoff value of the network layer according to the activation values of the m parts of the network layer in the t iterations of training.
  • t can be set in advance, and t can be a positive integer.
  • the cutoff value is initialized according to the statistical characteristics of the activation value in the neural network model, so that the stability of the model can be improved and the convergence can be accelerated.
  • each iteration training in the t iteration training may be: in the forward propagation process, processing the sample data in the training sample according to the weight in the neural network model to obtain the output data; In the back propagation process, the weight value in the neural network model is adjusted according to the loss value between the output data and the sample label in the training sample.
  • the operation of determining the initial cutoff value of the network layer according to the activation values of the m parts of the network layer in the t iterations of training may be: in the first of the t iterations, obtaining the maximum activation value among the activation values of each of the m parts of the network layer, and using the average of the m maximum activation values obtained as the first cutoff value; in the i-th of the t iterations, obtaining the maximum activation value among the activation values of each of the m parts of the network layer, and taking a weighted average of the average of the m maximum activation values obtained and the (i-1)-th cutoff value to obtain the i-th cutoff value, where i is an integer greater than or equal to 2 and less than or equal to t; the t-th cutoff value is used as the initial cutoff value corresponding to each of the m parts of the network layer.
  • the weight of the average value of the m maximum activation values and the weight of the i-1th cutoff value can be preset, and the sum of these two weights is 1.
  • the weight of the i-1th cutoff value can be set to 0.9997.
  • training samples are used to train the neural network model for multiple iterations.
  • the sample data in the training sample is processed according to the weight value in the neural network model and the current cutoff value of the network layer to obtain output data.
  • in the back propagation process, the weights in the neural network model are adjusted according to the loss value between the output data and the sample label in the training sample, and the cutoff value of the network layer is adjusted according to the loss value, the current cutoff value of the network layer and the activation values.
  • the cutoff value in the neural network model is obtained through training, that is, the upper and lower limits used when quantizing the activation values can be adaptively adjusted during the model training process, thereby reducing the quantization error and improving the performance of the finally trained neural network model.
  • the neural network model obtained by the training can be applied, for example, the neural network model can be used for image classification, image detection, SISR tasks, etc. Among them, the weights and cutoffs in the neural network model are all obtained by training.
  • the low-resolution image to be reconstructed can be input into the neural network model to obtain the corresponding high-resolution image. Since the cutoff value in the neural network model is obtained through training, the neural network model has a smaller quantization error and better performance, so the high-resolution image reconstructed by the neural network model has a higher quality.
  • Fig. 4 is a schematic structural diagram of a model training device provided by an embodiment of the present application.
  • the model training device can be implemented as part or all of a computer device by software, hardware or a combination of the two; the computer device may be the computer device shown in Fig. 1.
  • the device includes: a first training module 401.
  • the first training module 401 is configured to perform step 201 in the embodiment of FIG. 2 above;
  • the first training module 401 includes:
  • the processing unit 4011 is configured to execute step 2011 in the embodiment of FIG. 2 above;
  • the adjustment unit 4012 is configured to perform step 2014 in the embodiment of FIG. 2 above.
  • the adjustment unit 4012 is used to:
  • subtract the product of the learning rate and the target adjustment degree from the current cutoff value of the network layer to obtain the adjusted cutoff value of the network layer.
  • the adjustment unit 4012 is used to:
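  • when the activation value of the network layer is less than or equal to the negative of the current cutoff value of the network layer, determine the second adjustment degree to be -1;
  • when the activation value of the network layer is greater than the negative of the current cutoff value of the network layer and less than the current cutoff value of the network layer, determine the second adjustment degree to be 0;
  • when the activation value of the network layer is greater than or equal to the current cutoff value of the network layer, determine the second adjustment degree to be 1.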
  • the device further includes:
  • the second training module is used to train the neural network model for t iterations using training samples, where t is a positive integer;
  • the determining module is used to determine the initial cutoff value of the network layer according to the activation values of the m parts of the network layer in t iterations of training, where m is a positive integer.
  • the m parts of the network layer are m groups of input neurons in one-to-one correspondence with the m output neurons of the network layer, or the m parts of the network layer are m groups of input channels in one-to-one correspondence with the m output channels of the network layer.
  • the sample data is a low-resolution image, and the sample label is the high-resolution image corresponding to the low-resolution image.
  • training samples are used to train the neural network model for multiple iterations.
  • the sample data in the training sample is processed according to the weight value in the neural network model and the current cutoff value of the network layer to obtain output data.
  • in the back propagation process, the weights in the neural network model are adjusted according to the loss value between the output data and the sample label in the training sample, and the cutoff value of the network layer is adjusted according to the loss value, the current cutoff value of the network layer and the activation values.
  • the cutoff value in the neural network model is obtained through training, that is, the upper and lower limits used when quantizing the activation values can be adaptively adjusted during the model training process, thereby reducing the quantization error and improving the performance of the finally trained neural network model.
  • when the model training device provided in the above embodiment performs model training, the division into the above functional modules is used only as an example.
  • in practical applications, the above functions can be allocated to different functional modules as needed, that is, the internal structure of the device is divided into different functional modules to complete all or part of the functions described above.
  • model training device provided in the foregoing embodiment and the model training method embodiment belong to the same concept, and the specific implementation process is detailed in the method embodiment, which will not be repeated here.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium.
  • the computer instructions may be transmitted from a website, computer, server, or data center.
  • the computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or a data center integrated with one or more available media.
  • the usable medium may be a magnetic medium (for example: floppy disk, hard disk, tape), an optical medium (for example: Digital Versatile Disc (DVD)) or a semiconductor medium (for example: Solid State Disk (SSD)), etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

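This application discloses a model training method, device, storage medium and program product, belonging to the field of data processing technology. The method includes: training a neural network model for multiple iterations using training samples. One iteration of the multiple iterations includes: in the forward propagation process, processing sample data according to the weights in the neural network model and the current cutoff value of a network layer to obtain output data; in the back propagation process, adjusting the weights in the neural network model according to the loss value between the output data and the sample label, and adjusting the cutoff value of the network layer according to the loss value, the current cutoff value of the network layer and the activation values. In this application the cutoff value in the neural network model is obtained through training, that is, the upper and lower limits used when quantizing activation values can be adjusted adaptively during model training, which reduces the quantization error and improves the performance of the neural network model.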

Description

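Model training method, device, storage medium and program product
Technical Field
This application relates to the field of data processing technology, and in particular to a model training method, device, storage medium and program product.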
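Background
A neural network model is a network system formed by a large number of simple processing units (called neurons) that are widely interconnected. It can be applied to scenarios such as image classification, image detection and single image super resolution (SISR) tasks. The training process of a neural network model may include a forward propagation process and a back propagation process.
In the forward propagation process, sample data is input into the neural network model and processed according to the weights in the model to obtain output data. In the back propagation process, the weights in the model are adjusted according to the loss value between the output data and the sample label.
Intermediate results in the processing of a neural network model may be called activation values. Activation values generally use a high-precision data format. To reduce the storage space occupied by the model, to reduce its use of hardware bandwidth and cache during computation, and to improve running efficiency, quantization is often applied to the activation values in the forward propagation process.
At present, when the activation values in a neural network model are quantized, a fixed cutoff value is set for a network layer before model training, and during model training the activation values of the network layer are quantized according to that cutoff value. However, because uncertainty in the sample data leads to uncertainty in the activation values, quantizing the activation values in this way may introduce a large quantization error, which affects the performance of the finally trained neural network model.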
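Summary
This application provides a model training method, device, storage medium and program product, which can solve the problem in the related art that the trained neural network model performs poorly. The technical solutions are as follows:
In a first aspect, a model training method is provided. In this method, a neural network model is trained for multiple iterations using training samples. One iteration of the multiple iterations may be as follows: in the forward propagation process, the sample data in the training sample is processed according to the weights in the neural network model and the current cutoff value of a network layer to obtain output data, where the cutoff value of the network layer is used to quantize the activation values of the network layer; in the back propagation process, the weights in the neural network model are adjusted according to the loss value between the output data and the sample label in the training sample, and the cutoff value of the network layer is adjusted according to the loss value, the current cutoff value of the network layer and the activation values.
In this application, the cutoff value in the neural network model is obtained through training, that is, the upper and lower limits used when quantizing activation values can be adjusted adaptively during model training, which reduces the quantization error and improves the performance of the neural network model.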
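It should be noted that the training samples can be set in advance, and a training sample may include sample data and a sample label. For example, the training sample may include an image (sample data) and a label of the image (sample label), where the label may be the type, identity, etc. of an object contained in the image; or the training sample may include a low resolution (LR) image (sample data) and the high resolution (HR) image corresponding to the LR image (sample label).
In addition, the network layer may include m parts, each of which may share one cutoff value, where m is a positive integer. When m is 1, the network layer shares one cutoff value, that is, all activation values in the network layer are quantized according to this one cutoff value; when m is an integer greater than or equal to 2, the network layer includes multiple parts, each sharing one cutoff value, that is, the activation values of each part are quantized according to the corresponding cutoff value.
In a feasible implementation, the network layer including m parts means that the input of the network layer can be defined as m parts according to the number of output neurons or output channels of the network layer. Specifically, when the network layer has m output neurons or m output channels, its input can be divided into m parts in one-to-one correspondence with the m output neurons or m output channels. In other words, the m parts of the network layer are m groups of input neurons in one-to-one correspondence with the m output neurons of the network layer, or m groups of input channels in one-to-one correspondence with the m output channels of the network layer.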
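The operation of adjusting the cutoff value of the network layer according to the loss value, the current cutoff value of the network layer and the activation values may be: determining a first adjustment degree according to the loss value and the dequantized value of the network layer; determining a second adjustment degree according to the size relationship between the current cutoff value of the network layer and the activation value; multiplying the first adjustment degree by the second adjustment degree to obtain a target adjustment degree; and subtracting the product of the learning rate and the target adjustment degree from the current cutoff value of the network layer to obtain the adjusted cutoff value of the network layer.
It should be noted that the key to adjusting the cutoff value of the network layer according to the loss value is to obtain the partial derivative of the loss function of the neural network model with respect to the cutoff value (called the target adjustment degree in this application).
In this application, the partial derivative of the loss function with respect to the cutoff value is obtained according to the loss value, the current cutoff value of the network layer and the activation values. Specifically, it is defined as the product of the partial derivative of the loss function with respect to the dequantized value of the network layer (called the first adjustment degree in this application) and the partial derivative of the quantization function of the network layer with respect to the cutoff value of the network layer (called the second adjustment degree in this application).
When the partial derivative of the quantization function with respect to the cutoff value is obtained, this application in effect approximates it by the partial derivative of the truncation function with respect to the cutoff value. The latter depends on the size relationship between the current cutoff value of the network layer and the activation value of the network layer.
Specifically, the operation of determining the second adjustment degree according to the size relationship between the current cutoff value of the network layer and the activation value of the network layer may be: when the activation value of the network layer is less than or equal to the negative of the current cutoff value of the network layer, the second adjustment degree is determined to be -1; when the activation value is greater than the negative of the current cutoff value and less than the current cutoff value, the second adjustment degree is determined to be 0; when the activation value is greater than or equal to the current cutoff value, the second adjustment degree is determined to be 1.
It should be understood that this implementation may also be: when the activation value of the network layer is less than the negative of the current cutoff value of the network layer, the second adjustment degree is determined to be -1; when the activation value is greater than or equal to the negative of the current cutoff value and less than or equal to the current cutoff value, the second adjustment degree is determined to be 0; when the activation value is greater than the current cutoff value, the second adjustment degree is determined to be 1. Other similar piecewise conditions are also possible and are not described further.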
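Further, before the cutoff value in the neural network model is adjusted according to the loss value of the neural network model, the cutoff value may first be initialized. That is, before the neural network model is trained for multiple iterations using the training samples, the cutoff value in the neural network model may first be initialized.
Specifically, the operation of initializing the cutoff value in the neural network model may be: training the neural network model for t iterations using the training samples, and then determining the initial cutoff value of the network layer according to the activation values of the m parts of the network layer in the t iterations of training. Here t can be set in advance and may be a positive integer.
In this application, the cutoff value is initialized according to the statistical characteristics of the activation values in the neural network model, which can improve model stability and accelerate convergence.
The operation of determining the initial cutoff value of the network layer according to the activation values of the m parts of the network layer in the t iterations of training may be: in the first of the t iterations, obtaining the maximum activation value among the activation values of each of the m parts of the network layer, and using the average of the m maximum activation values obtained as the first cutoff value; in the i-th of the t iterations, obtaining the maximum activation value among the activation values of each of the m parts of the network layer, and taking a weighted average of the average of the m maximum activation values obtained and the (i-1)-th cutoff value to obtain the i-th cutoff value, where i is an integer greater than or equal to 2 and less than or equal to t; and using the t-th cutoff value as the initial cutoff value corresponding to each of the m parts of the network layer.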
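In a second aspect, a model training device is provided, which has the function of implementing the behavior of the model training method in the first aspect. The model training device includes at least one module, and the at least one module is used to implement the model training method provided in the first aspect.
In a third aspect, a model training device is provided, whose structure includes a processor and a memory. The memory is used to store a program that supports the model training device in executing the model training method provided in the first aspect, and to store the data involved in implementing the model training method described in the first aspect. The processor is configured to execute the program stored in the memory. The model training device may further include a communication bus, which is used to establish a connection between the processor and the memory.
In a fourth aspect, a computer-readable storage medium is provided, in which instructions are stored; when the instructions run on a computer, the computer is caused to execute the model training method described in the first aspect.
In a fifth aspect, a computer program product containing instructions is provided; when it runs on a computer, the computer is caused to execute the model training method described in the first aspect.
The technical effects obtained by the second, third, fourth and fifth aspects are similar to those obtained by the corresponding technical means in the first aspect, and are not repeated here.
The technical solutions provided in this application can bring at least the following beneficial effects:
The cutoff value in the neural network model in this application is obtained through training, that is, the upper and lower limits used when quantizing activation values can be adjusted adaptively during model training, which reduces the quantization error and improves the performance of the finally trained neural network model.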
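Brief Description of the Drawings
Fig. 1 is a schematic structural diagram of a computer device provided by an embodiment of this application;
Fig. 2 is a flowchart of a model training method provided by an embodiment of this application;
Fig. 3 is a flowchart of an iterative training operation provided by an embodiment of this application;
Fig. 4 is a schematic structural diagram of a model training device provided by an embodiment of this application.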
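Detailed Description
To make the objectives, technical solutions and advantages of this application clearer, the implementations of this application are described in further detail below with reference to the accompanying drawings.
Fig. 1 is a schematic structural diagram of a computer device provided by an embodiment of this application. Referring to Fig. 1, the computer device includes at least one processor 101, a communication bus 102, a memory 103 and at least one communication interface 104.
The processor 101 may be a microprocessor (including a central processing unit (CPU), etc.), an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of the programs of the solution of this application.
The communication bus 102 may include a path for transferring information between the above components.
The memory 103 may be a read-only memory (ROM), a random access memory (RAM), an electrically erasable programmable read-only memory (EEPROM), an optical disc (including a compact disc read-only memory (CD-ROM), compact disc, laser disc, digital versatile disc, Blu-ray disc, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory 103 may exist independently and be connected to the processor 101 through the communication bus 102. The memory 103 may also be integrated with the processor 101.
The communication interface 104 uses any transceiver-type device to communicate with other devices or communication networks, such as Ethernet, a radio access network (RAN), a wireless local area network (WLAN), etc.
In a specific implementation, as an embodiment, the processor 101 may include one or more CPUs, such as CPU0 and CPU1 shown in Fig. 1.
In a specific implementation, as an embodiment, the computer device may include multiple processors, such as the processor 101 and the processor 105 shown in Fig. 1. Each of these processors may be a single-core processor or a multi-core processor. A processor here may refer to one or more devices, circuits, and/or processing cores for processing data (such as computer program instructions).
In a specific implementation, as an embodiment, the computer device may further include an output device 106 and an input device 107. The output device 106 communicates with the processor 101 and can display information in a variety of ways. For example, the output device 106 may be a liquid crystal display (LCD), a light emitting diode (LED) display device, a cathode ray tube (CRT) display device, a projector, etc. The input device 107 communicates with the processor 101 and can receive user input in a variety of ways. For example, the input device 107 may be a mouse, a keyboard, a touch screen device, a sensing device, etc.
The above computer device may be a general-purpose computer device or a special-purpose computer device. In a specific implementation, the computer device may be a desktop computer, a portable computer, a network server, a palmtop computer, a mobile phone, a tablet computer, a wireless terminal device, a communication device or an embedded device. The embodiments of this application do not limit the type of the computer device.
The memory 103 is used to store program code 110 for executing the solution of this application, and the processor 101 is used to execute the program code 110 stored in the memory 103. Through the processor 101 and the program code 110 in the memory 103, the computer device can implement the model training method provided in the embodiment of Fig. 2 below.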
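Fig. 2 is a flowchart of a model training method provided by an embodiment of this application. Referring to Fig. 2, the method includes:
Step 201: Train the neural network model for multiple iterations using training samples.
It should be noted that the training samples can be set in advance, and a training sample may include sample data and a sample label.
In addition, the neural network model may be a network system formed by a large number of simple processing units (called neurons) that are widely interconnected. The neural network model may include multiple network layers, including an input layer, hidden layers and an output layer. The input layer receives the sample data; the output layer outputs the processed data; the hidden layers lie between the input layer and the output layer, process the data, and are not visible from outside. For example, the neural network model may be a deep neural network, and in particular a convolutional neural network.
It is worth noting that the neural network model trained in the embodiments of this application can be applied to various scenarios, such as image classification, image detection and SISR tasks. The goal of an SISR task is to reconstruct the corresponding HR image from an LR image.
When applied to an image classification or image detection scenario, the training sample may include an image (sample data) and a label of the image (sample label), where the label may be the type, identity, etc. of an object contained in the image. When applied to an SISR task scenario, the training sample may include an LR image (sample data) and the HR image corresponding to the LR image (sample label).
Each iteration of the multiple iterations includes at least a forward propagation process, in which the sample data is processed to obtain output data. After the forward propagation process is completed, if the loss value between the current output data of the neural network model and the sample label does not exceed a prescribed range, the iterative training ends and a neural network model that meets the requirements is obtained; if the loss value exceeds the prescribed range, the back propagation process is carried out to adjust the parameters in the neural network model, and after the back propagation process is completed, the next iteration can proceed.
Specifically, referring to Fig. 3, one iteration of the multiple iterations may include the following steps 2011 to 2014.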
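Step 2011: In the forward propagation process, process the sample data in the training sample according to the weights in the neural network model and the current cutoff value of the network layer to obtain output data.
It should be noted that the cutoff value of the network layer is used to quantize the activation values of the network layer.
In the forward propagation process, intermediate results in the processing of the neural network model may be called activation values. Specifically, for the input layer, the sample data can be used directly as the activation values of the input layer; for any network layer other than the output layer, the activation values of this network layer can be processed to obtain the activation values of the next network layer.
Activation values in a neural network model generally use a high-precision data format (such as FP32, a data representation format defined by IEEE 754). To reduce the storage space occupied by the model, to reduce its use of hardware bandwidth and cache during computation, and to improve running efficiency, quantization is often applied to the activation values in the forward propagation process.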
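To obtain good neural network acceleration performance, one possible quantization technique is symmetric linear quantization with a cutoff value, which can be implemented according to the following quantization function:
Quantization function: [equation image PCTCN2019129265-appb-000001]
where x is the activation value; n is the number of quantization bits, which can be preset; f(x) is the truncation function, f(x) = max(min(x, a), -a), which limits x to [-a, a], that is, x is truncated to a when x is greater than a and to -a when x is less than -a; a is the cutoff value and is a positive number; s(n) is the quantization unit [equation images PCTCN2019129265-appb-000002 and PCTCN2019129265-appb-000003]; and <> denotes rounding to the nearest integer.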
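It is worth noting that, in the embodiments of this application, applying quantization to the activation values in the forward propagation process may specifically be: for a network layer in the neural network model other than the output layer, quantizing the activation values of the network layer according to the current cutoff value of the network layer to obtain the quantized values of the network layer; processing the quantized values of the network layer to obtain processed quantized values; and dequantizing the processed quantized values to obtain the dequantized values of the network layer, which serve as the activation values of the next network layer.
It should be noted that the network layer may include m parts, each of which may share one cutoff value, where m is a positive integer. When m is 1, the network layer shares one cutoff value, that is, all activation values in the network layer are quantized according to this one cutoff value; when m is an integer greater than or equal to 2, the network layer includes multiple parts, each sharing one cutoff value, that is, the activation values of each part are quantized according to the corresponding cutoff value.
In a feasible implementation, the network layer including m parts means that the input of the network layer can be defined as m parts according to the number of output neurons or output channels of the network layer. Specifically, when the network layer has m output neurons or m output channels, its input can be divided into m parts in one-to-one correspondence with the m output neurons or m output channels. In other words, the m parts of the network layer are m groups of input neurons in one-to-one correspondence with the m output neurons of the network layer, or m groups of input channels in one-to-one correspondence with the m output channels of the network layer. Each group of input neurons may include one or more input neurons, and each group of input channels may include one or more input channels.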
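The operation of quantizing the activation values of the network layer according to the current cutoff value of the network layer to obtain the quantized values of the network layer can be implemented according to the quantization function of the network layer, in which the number of quantization bits and the quantization unit have been preset. Specifically, the current cutoff value and the activation values of the network layer can both be substituted into [equation image PCTCN2019129265-appb-000004] to obtain the quantized values of the network layer.
The operation of processing the quantized values of the network layer to obtain the processed quantized values may differ according to the type of the network layer. For the specific operation, reference may be made to the related art, and it is not elaborated in the embodiments of this application. For example, when the network layer has weights and an activation function, the quantized values of the network layer can first be processed according to the weights in the network layer to obtain a first processing result, and the first processing result can then be processed according to the activation function in the network layer to obtain a second processing result, which serves as the processed quantized values.
The operation of dequantizing the processed quantized values to obtain the dequantized values of the network layer can be implemented according to the quantization function of the network layer, in which the number of quantization bits and the quantization unit have been preset. Specifically, the processed quantized values can be multiplied by s(n) to obtain the dequantized values of the network layer.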
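The quantize and dequantize steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the equation images are not reproduced in this text, so the quantization unit is assumed here to be s(n) = a / (2^(n-1) - 1), the usual step size of an n-bit symmetric quantizer, and ties in rounding are assumed to go away from zero. The function names are hypothetical, and the layer's own processing between the two steps is omitted.

```python
import math


def truncate(x, a):
    """Truncation function f(x) = max(min(x, a), -a) from the text."""
    return max(min(x, a), -a)


def quant_unit(a, n):
    """ASSUMPTION: quantization unit s(n) = a / (2**(n-1) - 1)."""
    return a / (2 ** (n - 1) - 1)


def round_nearest(v):
    """Round to the nearest integer; ties away from zero (an assumption)."""
    sign = 1 if v >= 0 else -1
    return sign * int(math.floor(abs(v) + 0.5))


def quantize(x, a, n):
    """Quantized value: the truncated activation in units of s(n), rounded."""
    return round_nearest(truncate(x, a) / quant_unit(a, n))


def dequantize(q, a, n):
    """Dequantized value: the quantized value multiplied by s(n)."""
    return q * quant_unit(a, n)


# With a = 6.0 and n = 3 the step is 2.0, so 3.1 maps to level 2, i.e. 4.0.
level = quantize(3.1, a=6.0, n=3)
restored = dequantize(level, a=6.0, n=3)
```

Activations outside [-a, a] saturate at the extreme levels, which is why the choice of a directly controls the quantization error.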
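Step 2012: Determine whether the loss value between the output data and the sample label in the training sample exceeds the prescribed range. If not, perform step 2013 below; if so, perform step 2014 below.
Step 2013: End the iterative training and obtain a neural network model that meets the requirements.
Step 2014: In the back propagation process, adjust the weights in the neural network model according to the loss value between the output data and the sample label in the training sample, and adjust the cutoff value of the network layer according to the loss value, the current cutoff value of the network layer and the activation values.
It should be noted that after step 2014 is performed, the procedure can return to step 2011 for the next iteration.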
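The control flow of steps 2011 to 2014 can be sketched as a simple loop. The callables `forward`, `loss_fn` and `backward`, the threshold `loss_limit` and the safety cap `max_iters` are hypothetical names introduced for illustration; the text only states that training stops once the loss no longer exceeds a prescribed range. The toy model at the bottom has a single weight and no cutoff value, so it exercises the loop only.

```python
def train(forward, loss_fn, backward, samples, loss_limit, max_iters=10_000):
    """Run steps 2011-2014 until the loss is within loss_limit.

    Returns (iterations_run, last_loss). max_iters is a safety cap that
    the text does not mention.
    """
    loss = None
    for it in range(max_iters):
        data, label = samples[it % len(samples)]
        output = forward(data)            # step 2011: forward propagation
        loss = loss_fn(output, label)
        if loss <= loss_limit:            # step 2012: loss within range?
            return it + 1, loss           # step 2013: end training
        backward(data, label)             # step 2014: adjust parameters
    return max_iters, loss


# Toy usage: a one-weight model y = w * x trained toward y = 2 * x.
state = {"w": 0.0}


def forward(x):
    return state["w"] * x


def loss_fn(y, label):
    return (y - label) ** 2


def backward(x, label):
    grad = 2.0 * (state["w"] * x - label) * x   # d(loss)/dw
    state["w"] -= 0.1 * grad                    # gradient step


iters, final_loss = train(forward, loss_fn, backward, [(1.0, 2.0)], 1e-6)
```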
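It is worth noting that, in the prior art, the cutoff value in the neural network model is kept unchanged in the back propagation process and only the weights in the neural network model are adjusted. In the embodiments of this application, the cutoff value in the neural network model can also be adjusted in the back propagation process while the weights are adjusted. In this way, the cutoff value in the neural network model is obtained through training, that is, the upper and lower limits used when quantizing activation values can be adjusted adaptively during model training, which reduces the quantization error and improves the performance of the neural network model.
It should be noted that, in the embodiments of this application, both the weights and the cutoff values in the neural network model can be called parameters of the neural network model. That is, the embodiments of this application in effect adjust the parameters of the neural network model according to the loss value between the output data of the neural network model and the sample label of the training sample.
In addition, the loss value between the output data and the sample label of the training sample can be obtained through the loss function of the neural network model. The loss function may be a general loss function, such as a cross-entropy loss function or a mean square error loss function. Alternatively, the loss function may be a regularized loss function, which is the sum of a general loss function and a regularization function.
For the operation of adjusting the weights in the neural network model according to the loss value between the output data and the sample label in the training sample, reference may be made to the related art, and it is not elaborated in the embodiments of this application.
For example, for any weight in the neural network model, the partial derivative of the loss function of the neural network model with respect to this weight can be obtained according to the loss value and this weight; the product of the learning rate and this partial derivative is then subtracted from this weight to obtain the adjusted weight. It should be noted that the learning rate can be set in advance. For example, the learning rate may be 0.001, 0.000001, etc.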
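The key to adjusting the cutoff value of the network layer according to the loss value is to obtain the partial derivative of the loss function of the neural network model with respect to the cutoff value (called the target adjustment degree in the embodiments of this application).
In the embodiments of this application, the partial derivative of the loss function with respect to the cutoff value is obtained according to the loss value, the current cutoff value of the network layer and the activation values. Specifically, it is defined as the product of the partial derivative of the loss function with respect to the dequantized value of the network layer (called the first adjustment degree in the embodiments of this application) and the partial derivative of the quantization function of the network layer with respect to the cutoff value of the network layer (called the second adjustment degree in the embodiments of this application).
Specifically, the operation of adjusting the cutoff value of the network layer according to the loss value, the current cutoff value of the network layer and the activation values may be: determining the first adjustment degree according to the loss value and the dequantized value of the network layer; determining the second adjustment degree according to the size relationship between the current cutoff value of the network layer and the activation value; multiplying the first adjustment degree by the second adjustment degree to obtain the target adjustment degree; and subtracting the product of the learning rate and the target adjustment degree from the current cutoff value of the network layer to obtain the adjusted cutoff value of the network layer.
It should be noted that the learning rate can be set in advance, and it may be the same as or different from the learning rate used when adjusting the weights in the neural network model. For example, the learning rate may be 0.001, 0.000001, etc.
In the embodiments of this application, obtaining the partial derivative of the loss function with respect to the dequantized value of the network layer means determining the first adjustment degree according to the loss value and the dequantized value of the network layer. In other words, the partial derivative of the loss function with respect to the dequantized value is obtained according to the loss value and the dequantized value of the network layer and is used as the first adjustment degree.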
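As a small worked instance of this update, the sketch below applies one step to a single cutoff value. The function and argument names are hypothetical, and the two adjustment degrees are taken as given inputs here rather than computed from a real network.

```python
def adjust_cutoff(cutoff, first_degree, second_degree, learning_rate):
    """Return the adjusted cutoff value after one update step."""
    # Target adjustment degree: product of the first and second degrees.
    target_degree = first_degree * second_degree
    # New cutoff: current cutoff minus learning rate times target degree.
    return cutoff - learning_rate * target_degree


# An activation at or above the cutoff (second degree 1) with a positive
# loss gradient shrinks the cutoff slightly.
new_cutoff = adjust_cutoff(cutoff=1.0, first_degree=0.5,
                           second_degree=1, learning_rate=0.001)
```

When the second adjustment degree is 0, the cutoff value is left unchanged by that activation, so only saturated activations move it.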
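It is worth noting that, for the quantization function [equation image PCTCN2019129265-appb-000005], let [equation image PCTCN2019129265-appb-000006], where [equation image PCTCN2019129265-appb-000007] is the result of truncating the activation value. The quantization function can then be rewritten as [equation image PCTCN2019129265-appb-000008] or as [equation image PCTCN2019129265-appb-000009].
When the partial derivative of the quantization function with respect to the cutoff value is obtained, that is, when [equation image PCTCN2019129265-appb-000010] is obtained, one may let [equation image PCTCN2019129265-appb-000011]. [Equation image PCTCN2019129265-appb-000012] is not differentiable, but it can be approximated as 1 by the straight-through estimator method. Therefore [equation image PCTCN2019129265-appb-000013] can be approximated as [equation image PCTCN2019129265-appb-000014]. In effect, the partial derivative of [equation image PCTCN2019129265-appb-000015] with respect to a is used as the partial derivative of x_q with respect to a. The partial derivative of [equation image PCTCN2019129265-appb-000016] with respect to a depends on the size relationship between a (the current cutoff value of the network layer) and x (the activation value of the network layer).
That is, in the embodiments of this application, obtaining the partial derivative of the quantization function with respect to the cutoff value means determining the second adjustment degree according to the size relationship between the current cutoff value of the network layer and the activation value.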
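Because the equation images are not reproduced in this text, the relation described in words can be summarized as follows. This is a restatement of the surrounding prose rather than the filed formulas; the symbol L for the loss function is an assumption, while x_q, f(x) and a are used as in the text.

```latex
\frac{\partial L}{\partial a}
  = \underbrace{\frac{\partial L}{\partial x_q}}_{\text{first adjustment degree}}
    \cdot
    \underbrace{\frac{\partial x_q}{\partial a}}_{\text{second adjustment degree}},
\qquad
\frac{\partial x_q}{\partial a} \approx \frac{\partial f(x)}{\partial a},
\qquad
f(x) = \max(\min(x, a), -a).
```

The approximation in the middle is where the straight-through estimator enters: the rounding step is treated as having derivative 1, so only the truncation contributes to the derivative with respect to a.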
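Specifically, when the activation value of the network layer is less than or equal to the negative of the current cutoff value of the network layer, the second adjustment degree is determined to be -1; when the activation value is greater than the negative of the current cutoff value and less than the current cutoff value, the second adjustment degree is determined to be 0; when the activation value is greater than or equal to the current cutoff value, the second adjustment degree is determined to be 1.
It should be understood that this implementation may also be: when the activation value of the network layer is less than the negative of the current cutoff value of the network layer, the second adjustment degree is determined to be -1; when the activation value is greater than or equal to the negative of the current cutoff value and less than or equal to the current cutoff value, the second adjustment degree is determined to be 0; when the activation value is greater than the current cutoff value, the second adjustment degree is determined to be 1. Other similar piecewise conditions are also possible and are not described further.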
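The two piecewise variants differ only in how the boundary points x = a and x = -a are treated. A minimal sketch, with a hypothetical function name and a hypothetical flag `inclusive_bounds` selecting between them:

```python
def second_degree(x, a, inclusive_bounds=True):
    """Second adjustment degree for activation x and cutoff a (a > 0).

    inclusive_bounds=True  : -1 if x <= -a, 1 if x >= a, else 0
    inclusive_bounds=False : -1 if x <  -a, 1 if x >  a, else 0
    """
    if inclusive_bounds:
        if x <= -a:
            return -1
        if x >= a:
            return 1
        return 0
    if x < -a:
        return -1
    if x > a:
        return 1
    return 0
```

For example, an activation exactly equal to the cutoff yields 1 under the first variant and 0 under the second; everywhere else the two agree.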
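It is worth noting that, when the network layer includes m parts and each part shares one cutoff value, then for any one of the m parts, the first adjustment degree corresponding to this part can be determined according to the loss value and the dequantized value of this part; for any activation value among all the activation values of this part, the second adjustment degree corresponding to this activation value is determined according to the size relationship between the current cutoff value corresponding to this part and this activation value; the average of the second adjustment degrees corresponding to all activation values of this part is taken as the second adjustment degree corresponding to this part; the product of the first adjustment degree and the second adjustment degree corresponding to this part is taken as the target adjustment degree corresponding to this part; and the product of the learning rate and the target adjustment degree corresponding to this part is subtracted from the current cutoff value corresponding to this part to obtain the adjusted cutoff value corresponding to this part.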
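The per-part procedure can be sketched as below. The names are hypothetical, and the first adjustment degree of each part is passed in as a single number, which is an assumption: the text does not spell out how the derivative with respect to a part's dequantized values is reduced to one value.

```python
def adjust_part_cutoffs(cutoffs, part_activations, first_degrees, learning_rate):
    """Update one cutoff value per part of a network layer.

    cutoffs          : current cutoff value of each of the m parts
    part_activations : list of m lists of activation values
    first_degrees    : first adjustment degree of each part (assumed scalar)
    """
    adjusted = []
    for a, xs, g1 in zip(cutoffs, part_activations, first_degrees):
        # Second adjustment degree of every activation in this part.
        degrees = [-1 if x <= -a else 1 if x >= a else 0 for x in xs]
        # The part's second adjustment degree is their average.
        g2 = sum(degrees) / len(degrees)
        # Target degree is g1 * g2; subtract learning rate times it.
        adjusted.append(a - learning_rate * g1 * g2)
    return adjusted


result = adjust_part_cutoffs(
    cutoffs=[1.0, 2.0],
    part_activations=[[-1.5, 0.2, 1.0, 3.0], [0.5, 1.0]],
    first_degrees=[0.4, 0.7],
    learning_rate=0.5,
)
```

In this example the first part has per-activation degrees -1, 0, 1, 1 (average 0.25), so its cutoff moves from 1.0 to about 0.95, while the second part has no saturated activations and keeps its cutoff of 2.0.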
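Further, before the cutoff value in the neural network model is adjusted according to the loss value of the neural network model, the cutoff value may first be initialized. That is, before step 201, the cutoff value in the neural network model may first be initialized.
Specifically, the operation of initializing the cutoff value in the neural network model may be: training the neural network model for t iterations using the training samples, and then determining the initial cutoff value of the network layer according to the activation values of the m parts of the network layer in the t iterations of training. Here t can be set in advance and may be a positive integer.
It is worth noting that, in the embodiments of this application, the cutoff value is initialized according to the statistical characteristics of the activation values in the neural network model, which can improve model stability and accelerate convergence.
Each of the t iterations of training may be as follows: in the forward propagation process, the sample data in the training sample is processed according to the weights in the neural network model to obtain output data; in the back propagation process, the weights in the neural network model are adjusted according to the loss value between the output data and the sample label in the training sample.
The operation of determining the initial cutoff value of the network layer according to the activation values of the m parts of the network layer in the t iterations of training may be: in the first of the t iterations, obtaining the maximum activation value among the activation values of each of the m parts of the network layer, and using the average of the m maximum activation values obtained as the first cutoff value; in the i-th of the t iterations, obtaining the maximum activation value among the activation values of each of the m parts of the network layer, and taking a weighted average of the average of the m maximum activation values obtained and the (i-1)-th cutoff value to obtain the i-th cutoff value, where i is an integer greater than or equal to 2 and less than or equal to t; and using the t-th cutoff value as the initial cutoff value corresponding to each of the m parts of the network layer.
It should be noted that the weight of the average of the m maximum activation values and the weight of the (i-1)-th cutoff value can be preset, and the sum of these two weights is 1. For example, the weight of the (i-1)-th cutoff value can be set to 0.9997. The average of the m maximum activation values is then multiplied by its weight to obtain a first value; the (i-1)-th cutoff value is multiplied by its weight to obtain a second value; and the first value and the second value are added to obtain the i-th cutoff value.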
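This initialization is a running weighted average of the per-iteration mean of the per-part maxima. A minimal sketch, with hypothetical names; `prev_weight` is the weight of the (i-1)-th cutoff value, and the maximum is taken over the raw activation values as the text states:

```python
def init_cutoff(activations_per_iteration, prev_weight=0.9997):
    """Initial cutoff value from t warm-up iterations.

    activations_per_iteration: list of t items, each a list of m lists of
    activation values (one inner list per part of the network layer).
    """
    cutoff = None
    for parts in activations_per_iteration:
        # Average of the m per-part maximum activation values.
        mean_max = sum(max(part) for part in parts) / len(parts)
        if cutoff is None:
            cutoff = mean_max                      # first cutoff value
        else:
            # i-th cutoff: weighted average with the (i-1)-th cutoff.
            cutoff = prev_weight * cutoff + (1.0 - prev_weight) * mean_max
    return cutoff


warmup = [
    [[1.0, 3.0], [2.0, 5.0]],   # iteration 1: maxima 3 and 5, mean 4
    [[2.0, 2.0], [4.0, 8.0]],   # iteration 2: maxima 2 and 8, mean 5
]
initial = init_cutoff(warmup, prev_weight=0.5)
```

With a weight such as 0.9997 on the previous value, each new iteration shifts the estimate only slightly, so the result reflects activation statistics accumulated over many iterations rather than any single batch.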
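In the embodiments of this application, the neural network model is trained for multiple iterations using training samples. For one iteration of the multiple iterations, in the forward propagation process, the sample data in the training sample is processed according to the weights in the neural network model and the current cutoff value of the network layer to obtain output data. In the back propagation process, the weights in the neural network model are adjusted according to the loss value between the output data and the sample label in the training sample, and the cutoff value of the network layer is adjusted according to the loss value, the current cutoff value of the network layer and the activation values. In this way, the cutoff value in the neural network model is obtained through training, that is, the upper and lower limits used when quantizing activation values can be adjusted adaptively during model training, which reduces the quantization error and improves the performance of the finally trained neural network model.
It is worth noting that, after model training is completed by the above model training method, the trained neural network model can be applied, for example to image classification, image detection and SISR tasks. Both the weights and the cutoff values in the neural network model are obtained through training.
For example, in an SISR scenario, the low-resolution image to be reconstructed can be input into the neural network model to obtain the corresponding high-resolution image. Because the cutoff value in the neural network model is obtained through training, the model has a smaller quantization error and better performance, so the high-resolution image it reconstructs is of higher quality.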
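Fig. 4 is a schematic structural diagram of a model training device provided by an embodiment of this application. The model training device can be implemented as part or all of a computer device by software, hardware or a combination of the two, and the computer device may be the computer device shown in Fig. 1. Referring to Fig. 4, the device includes a first training module 401.
The first training module 401 is configured to perform step 201 in the embodiment of Fig. 2 above;
The first training module 401 includes:
a processing unit 4011, configured to perform step 2011 in the embodiment of Fig. 2 above;
an adjustment unit 4012, configured to perform step 2014 in the embodiment of Fig. 2 above.
Optionally, the adjustment unit 4012 is configured to:
determine the first adjustment degree according to the loss value and the dequantized value of the network layer;
determine the second adjustment degree according to the size relationship between the current cutoff value of the network layer and the activation value;
multiply the first adjustment degree by the second adjustment degree to obtain the target adjustment degree;
subtract the product of the learning rate and the target adjustment degree from the current cutoff value of the network layer to obtain the adjusted cutoff value of the network layer.
Optionally, the adjustment unit 4012 is configured to:
determine the second adjustment degree to be -1 when the activation value of the network layer is less than or equal to the negative of the current cutoff value of the network layer;
determine the second adjustment degree to be 0 when the activation value of the network layer is greater than the negative of the current cutoff value of the network layer and less than the current cutoff value of the network layer;
determine the second adjustment degree to be 1 when the activation value of the network layer is greater than or equal to the current cutoff value of the network layer.
Optionally, the device further includes:
a second training module, configured to train the neural network model for t iterations using the training samples, where t is a positive integer;
a determining module, configured to determine the initial cutoff value of the network layer according to the activation values of the m parts of the network layer in the t iterations of training, where m is a positive integer.
Optionally, the m parts of the network layer are m groups of input neurons in one-to-one correspondence with the m output neurons of the network layer, or the m parts of the network layer are m groups of input channels in one-to-one correspondence with the m output channels of the network layer.
Optionally, the sample data is a low-resolution image, and the sample label is the high-resolution image corresponding to the low-resolution image.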
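In the embodiments of this application, the neural network model is trained for multiple iterations using training samples. For one iteration of the multiple iterations, in the forward propagation process, the sample data in the training sample is processed according to the weights in the neural network model and the current cutoff value of the network layer to obtain output data. In the back propagation process, the weights in the neural network model are adjusted according to the loss value between the output data and the sample label in the training sample, and the cutoff value of the network layer is adjusted according to the loss value, the current cutoff value of the network layer and the activation values. In this way, the cutoff value in the neural network model is obtained through training, that is, the upper and lower limits used when quantizing activation values can be adjusted adaptively during model training, which reduces the quantization error and improves the performance of the finally trained neural network model.
It should be noted that, when the model training device provided in the above embodiment performs model training, the division into the above functional modules is used only as an example. In practical applications, the above functions can be allocated to different functional modules as needed, that is, the internal structure of the device is divided into different functional modules to complete all or part of the functions described above. In addition, the model training device provided in the above embodiment and the model training method embodiment belong to the same concept; for the specific implementation process, see the method embodiment, which is not repeated here.
The above embodiments may be implemented in whole or in part by software, hardware, firmware or any combination thereof. When implemented by software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of this application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network or another programmable device. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server or data center to another website, computer, server or data center in a wired manner (for example: coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or a wireless manner (for example: infrared, radio, microwave). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (for example: floppy disk, hard disk, magnetic tape), an optical medium (for example: Digital Versatile Disc (DVD)) or a semiconductor medium (for example: Solid State Disk (SSD)), etc.
The above are embodiments provided by this application and are not intended to limit this application. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of this application shall fall within the protection scope of this application.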

Claims (14)

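  1. A model training method, characterized in that the method comprises:
    training a neural network model for multiple iterations using training samples;
    wherein one iteration of the multiple iterations comprises:
    in the forward propagation process, processing the sample data in the training sample according to the weights in the neural network model and the current cutoff value of a network layer to obtain output data, wherein the cutoff value of the network layer is used to quantize the activation values of the network layer;
    in the back propagation process, adjusting the weights in the neural network model according to the loss value between the output data and the sample label in the training sample, and adjusting the cutoff value of the network layer according to the loss value, the current cutoff value of the network layer and the activation values.
  2. The method according to claim 1, characterized in that adjusting the cutoff value of the network layer according to the loss value, the current cutoff value of the network layer and the activation values comprises:
    determining a first adjustment degree according to the loss value and the dequantized value of the network layer;
    determining a second adjustment degree according to the size relationship between the current cutoff value of the network layer and the activation value;
    multiplying the first adjustment degree by the second adjustment degree to obtain a target adjustment degree;
    subtracting the product of the learning rate and the target adjustment degree from the current cutoff value of the network layer to obtain the adjusted cutoff value of the network layer.
  3. The method according to claim 2, characterized in that determining the second adjustment degree according to the size relationship between the current cutoff value of the network layer and the activation value comprises:
    when the activation value of the network layer is less than or equal to the negative of the current cutoff value of the network layer, determining the second adjustment degree to be -1;
    when the activation value of the network layer is greater than the negative of the current cutoff value of the network layer and less than the current cutoff value of the network layer, determining the second adjustment degree to be 0;
    when the activation value of the network layer is greater than or equal to the current cutoff value of the network layer, determining the second adjustment degree to be 1.
  4. The method according to claim 1, characterized in that, before training the neural network model for multiple iterations using the training samples, the method further comprises:
    training the neural network model for t iterations using the training samples, where t is a positive integer;
    determining the initial cutoff value of the network layer according to the activation values of the m parts of the network layer in the t iterations of training, where m is a positive integer.
  5. The method according to claim 4, characterized in that the m parts of the network layer are m groups of input neurons in one-to-one correspondence with the m output neurons of the network layer, or the m parts of the network layer are m groups of input channels in one-to-one correspondence with the m output channels of the network layer.
  6. The method according to any one of claims 1 to 5, characterized in that the sample data is a low-resolution image, and the sample label is the high-resolution image corresponding to the low-resolution image.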
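  7. A model training device, characterized in that the device comprises:
    a first training module, configured to train a neural network model for multiple iterations using training samples;
    wherein the first training module comprises:
    a processing unit, configured to, in the forward propagation process, process the sample data in the training sample according to the weights in the neural network model and the current cutoff value of a network layer to obtain output data, wherein the cutoff value of the network layer is used to quantize the activation values of the network layer;
    an adjustment unit, configured to, in the back propagation process, adjust the weights in the neural network model according to the loss value between the output data and the sample label in the training sample, and adjust the cutoff value of the network layer according to the loss value, the current cutoff value of the network layer and the activation values.
  8. The device according to claim 7, characterized in that the adjustment unit is configured to:
    determine a first adjustment degree according to the loss value and the dequantized value of the network layer;
    determine a second adjustment degree according to the size relationship between the current cutoff value of the network layer and the activation value;
    multiply the first adjustment degree by the second adjustment degree to obtain a target adjustment degree;
    subtract the product of the learning rate and the target adjustment degree from the current cutoff value of the network layer to obtain the adjusted cutoff value of the network layer.
  9. The device according to claim 8, characterized in that the adjustment unit is configured to:
    determine the second adjustment degree to be -1 when the activation value of the network layer is less than or equal to the negative of the current cutoff value of the network layer;
    determine the second adjustment degree to be 0 when the activation value of the network layer is greater than the negative of the current cutoff value of the network layer and less than the current cutoff value of the network layer;
    determine the second adjustment degree to be 1 when the activation value of the network layer is greater than or equal to the current cutoff value of the network layer.
  10. The device according to claim 7, characterized in that the device further comprises:
    a second training module, configured to train the neural network model for t iterations using the training samples, where t is a positive integer;
    a determining module, configured to determine the initial cutoff value of the network layer according to the activation values of the m parts of the network layer in the t iterations of training, where m is a positive integer.
  11. The device according to claim 10, characterized in that the m parts of the network layer are m groups of input neurons in one-to-one correspondence with the m output neurons of the network layer, or the m parts of the network layer are m groups of input channels in one-to-one correspondence with the m output channels of the network layer.
  12. The device according to any one of claims 7 to 11, characterized in that the sample data is a low-resolution image, and the sample label is the high-resolution image corresponding to the low-resolution image.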
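  13. A computer-readable storage medium in which instructions are stored, wherein when the instructions run on a computer, the computer is caused to execute the method according to any one of claims 1 to 6.
  14. A computer program product containing instructions, wherein when it runs on a computer, the computer is caused to execute the method according to any one of claims 1 to 6.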
PCT/CN2019/129265 2019-12-27 2019-12-27 Model training method, device, storage medium and program product WO2021128293A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2019/129265 WO2021128293A1 (zh) 2019-12-27 2019-12-27 Model training method, device, storage medium and program product
CN201980102629.8A CN114730367A (zh) 2019-12-27 2019-12-27 Model training method, device, storage medium and program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/129265 WO2021128293A1 (zh) 2019-12-27 2019-12-27 Model training method, device, storage medium and program product

Publications (1)

Publication Number Publication Date
WO2021128293A1 true WO2021128293A1 (zh) 2021-07-01

Family

ID=76573515

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/129265 WO2021128293A1 (zh) 2019-12-27 2019-12-27 Model training method, device, storage medium and program product

Country Status (2)

Country Link
CN (1) CN114730367A (zh)
WO (1) WO2021128293A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
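CN115271366A (zh) * 2022-07-01 2022-11-01 中铁二十局集团有限公司 Method, device, equipment and medium for training a plateau tunnel surrounding rock classification model
CN117035123A (zh) * 2023-10-09 2023-11-10 之江实验室 Node communication method in parallel training, storage medium and device
CN117058525A (zh) * 2023-10-08 2023-11-14 之江实验室 Model training method and device, storage medium and electronic device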

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
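CN109871976A (zh) * 2018-12-20 2019-06-11 浙江工业大学 Power quality prediction method for distribution networks with distributed generation based on clustering and neural networks
CN109902745A (zh) * 2019-03-01 2019-06-18 成都康乔电子有限责任公司 CNN-based low-precision training and 8-bit integer quantized inference method
CN110413255A (zh) * 2018-04-28 2019-11-05 北京深鉴智能科技有限公司 Artificial neural network adjustment method and device
CN110414679A (zh) * 2019-08-02 2019-11-05 厦门美图之家科技有限公司 Model training method, device, electronic device and computer-readable storage medium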
US10510003B1 (en) * 2019-02-14 2019-12-17 Capital One Services, Llc Stochastic gradient boosting for deep neural networks

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
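CN110413255A (zh) * 2018-04-28 2019-11-05 北京深鉴智能科技有限公司 Artificial neural network adjustment method and device
CN109871976A (zh) * 2018-12-20 2019-06-11 浙江工业大学 Power quality prediction method for distribution networks with distributed generation based on clustering and neural networks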
US10510003B1 (en) * 2019-02-14 2019-12-17 Capital One Services, Llc Stochastic gradient boosting for deep neural networks
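CN109902745A (zh) * 2019-03-01 2019-06-18 成都康乔电子有限责任公司 CNN-based low-precision training and 8-bit integer quantized inference method
CN110414679A (zh) * 2019-08-02 2019-11-05 厦门美图之家科技有限公司 Model training method, device, electronic device and computer-readable storage medium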

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
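CN115271366A (zh) * 2022-07-01 2022-11-01 中铁二十局集团有限公司 Method, device, equipment and medium for training a plateau tunnel surrounding rock classification model
CN117058525A (zh) * 2023-10-08 2023-11-14 之江实验室 Model training method and device, storage medium and electronic device
CN117058525B (zh) * 2023-10-08 2024-02-06 之江实验室 Model training method and device, storage medium and electronic device
CN117035123A (zh) * 2023-10-09 2023-11-10 之江实验室 Node communication method in parallel training, storage medium and device
CN117035123B (zh) * 2023-10-09 2024-01-09 之江实验室 Node communication method in parallel training, storage medium and device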

Also Published As

Publication number Publication date
CN114730367A (zh) 2022-07-08


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19957811

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19957811

Country of ref document: EP

Kind code of ref document: A1