WO2022222649A1 - Neural network model training method, apparatus, device and storage medium - Google Patents

Neural network model training method, apparatus, device and storage medium Download PDF

Info

Publication number
WO2022222649A1
WO2022222649A1 PCT/CN2022/081023 CN2022081023W WO2022222649A1 WO 2022222649 A1 WO2022222649 A1 WO 2022222649A1 CN 2022081023 W CN2022081023 W CN 2022081023W WO 2022222649 A1 WO2022222649 A1 WO 2022222649A1
Authority
WO
WIPO (PCT)
Prior art keywords
weight parameter
quantization
neural network
network model
bits
Prior art date
Application number
PCT/CN2022/081023
Other languages
English (en)
French (fr)
Inventor
赵娟萍
Original Assignee
Oppo广东移动通信有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Oppo广东移动通信有限公司 filed Critical Oppo广东移动通信有限公司
Publication of WO2022222649A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/04Inference or reasoning models

Definitions

  • the embodiments of the present application relate to the technical field of artificial intelligence, and in particular, to a method, apparatus, device, and storage medium for training a neural network model.
  • For example, network models based on computer vision are applied to image recognition and image processing, and network models based on natural language processing are applied to semantic recognition, automatic question answering, and so on.
  • Embodiments of the present application provide a training method, apparatus, device, and storage medium for a neural network model.
  • the technical solution is as follows:
  • an embodiment of the present application provides a method for training a neural network model, where the method is executed by a computer device and includes: generating corresponding first weight parameters for the network layers in the neural network model, where different network layers in the neural network model correspond to different weight parameters; performing quantization loss simulation on the first weight parameter based on a number of quantization bits to obtain a second weight parameter; inputting a training sample into the neural network model that uses the second weight parameter to obtain a sample inference result; and determining an inference loss of the neural network model based on the sample inference result and adjusting the first weight parameter based on the inference loss.
  • an embodiment of the present application provides a training device for a neural network model, the device comprising:
  • a weight generation module, configured to generate corresponding first weight parameters for the network layers in the neural network model, where different network layers in the neural network model correspond to different weight parameters;
  • a loss simulation module configured to perform quantization loss simulation on the first weight parameter based on the number of quantized bits to obtain a second weight parameter
  • an inference module configured to input the training sample into the neural network model using the second weight parameter to obtain a sample inference result
  • An adjustment module configured to determine an inference loss of the neural network model based on the sample inference result, and adjust the first weight parameter based on the inference loss.
  • an embodiment of the present application provides a computer device, where the computer device includes a processor and a memory, and the processor includes a central processing unit (Central Processing Unit, CPU) and a neural network processor (Neural-network Processing Unit, NPU);
  • the memory stores at least one instruction for execution by the processor to implement the following steps:
  • generating corresponding first weight parameters for the network layers in the neural network model, where different network layers in the neural network model correspond to different weight parameters; performing quantization loss simulation on the first weight parameter based on a number of quantization bits to obtain a second weight parameter; inputting a training sample into the neural network model that uses the second weight parameter to obtain a sample inference result; and determining an inference loss of the neural network model based on the sample inference result and adjusting the first weight parameter based on the inference loss.
  • an embodiment of the present application further provides a computer-readable storage medium, where the computer-readable storage medium stores at least one instruction, and the at least one instruction is configured to be executed by a processor to implement the neural network model training method provided by the above aspects.
  • an embodiment of the present application provides a computer program product, where the computer program product includes computer instructions, and the computer instructions are stored in a computer-readable storage medium.
  • the processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device executes the neural network model training method provided by the above aspects.
  • FIG. 1 is a comparison diagram of the neural network model training process in the related art and in an embodiment of the present application;
  • FIG. 2 shows a flowchart of a training method for a neural network model provided by an exemplary embodiment of the present application
  • FIG. 3 is a schematic diagram of a weight generator provided by an exemplary embodiment of the present application.
  • FIG. 4 shows a flowchart of a training method for a neural network model provided by another exemplary embodiment of the present application
  • FIG. 5 is a schematic diagram of a weight generator provided by another exemplary embodiment of the present application.
  • FIG. 6 shows a flowchart of a training method for a neural network model provided by another exemplary embodiment of the present application.
  • FIG. 7 is a schematic diagram of a weight generator provided by another exemplary embodiment of the present application.
  • FIG. 8 shows a structural block diagram of a training apparatus for a neural network model provided by an embodiment of the present application
  • FIG. 9 shows a structural block diagram of a computer device provided by an exemplary embodiment of the present application.
  • As used herein, "plural" refers to two or more.
  • "And/or" describes the association relationship between associated objects and indicates that three relationships are possible; for example, "A and/or B" can mean that A exists alone, A and B exist at the same time, or B exists alone. The character "/" generally indicates an "or" relationship between the associated objects.
  • Quantization processing: a model compression technique that converts floating-point (float) storage and operations into integer (int) storage and operations (that is, converts floating-point numbers into fixed-point numbers). Because floating-point operations usually consume more energy than integer operations and integer operations run faster than floating-point operations, quantizing a neural network model reduces its memory footprint, improves its inference (running) speed, and lowers the power consumption during inference.
  • In a possible implementation, linear quantization is usually used when quantizing the neural network model; for floating-point data, the linear quantization can be expressed as r = round(S × (q − Z)), where r is the quantized integer data, q is the floating-point data, Z (zero point) is the offset of the floating-point data, and S (scale) is the scaling factor of the floating-point data.
  • Inverse quantization processing: the inverse of quantization processing, used to convert integer storage and operations back into floating-point storage and operations. Usually, when the quantized neural network model is used for inference, the inference results need to be dequantized, that is, the integer inference results are converted into floating-point inference results.
  • In the related art, after model training is completed, the neural network model is quantized and the weight parameters in the model are converted from floating-point to integer; for example, 32-bit floating-point data is converted into 8-bit integer data, and the quantized neural network model is then deployed in a terminal. When the quantized neural network model is subsequently used for inference, it is only necessary to quantize the floating-point input data at the input and to dequantize the integer output data at the output; all other operations can be completed with integer arithmetic, which both reduces the memory occupied by the neural network model and improves its inference speed.
  • Schematically, when the data range of the weight parameters in the neural network model is (float_min, float_max), the scaling factor scale used in the quantization process is computed from 2 ** bit_num and this range, where ** denotes exponentiation and bit_num is the number of quantization bits (such as 8 bits). Correspondingly, the quantized weight parameter can be expressed as quant_data = floor(scale * float_data), where quant_data is the integer data after quantization, float_data is the floating-point data before quantization, and floor denotes rounding down.
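  • A minimal Python sketch of the quantization and inverse quantization described above; the floor-based mapping follows quant_data = floor(scale * float_data), while the scale formula appears in the filing only as a figure, so the max-absolute-value form used here is an assumption.

```python
# Hedged sketch of linear quantization / inverse quantization (symmetric scheme assumed).
import math

def make_scale(float_min: float, float_max: float, bit_num: int = 8) -> float:
    # Map the largest magnitude onto the largest representable integer,
    # e.g. 127 for 8-bit symmetric quantization (assumed scale definition).
    max_abs = max(abs(float_min), abs(float_max))
    return (2 ** (bit_num - 1) - 1) / max_abs

def quantize(float_data: float, scale: float) -> int:
    # quant_data = floor(scale * float_data), as in the expression above.
    return math.floor(scale * float_data)

def dequantize(quant_data: int, scale: float) -> float:
    # Inverse quantization: convert the fixed-point value back to floating point.
    return quant_data / scale

scale = make_scale(-0.42, 0.37, bit_num=8)
q = quantize(0.25, scale)    # integer storage value
r = dequantize(q, scale)     # reconstructed float; differs from 0.25 by the quantization error
```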
  • However, when the neural network model is quantized in this post-training manner, the accuracy loss of the neural network model is relatively large, so the approach cannot be applied to models with high accuracy requirements. To improve the accuracy of the quantized neural network model, taking a neural network model composed of multiple serially connected "convolution layer + normalization layer + activation layer" blocks as an example, as shown in FIG. 1, the related art directly applies the weight parameters to the convolution layers during training and adjusts the weight parameters of each convolution layer based on the sample inference results corresponding to the training samples. In contrast, in the solution provided by the embodiments of the present application, a weight generator is configured for each convolution layer; the weight generator generates the weight parameters required by the convolution layer and performs quantization loss simulation on the weight parameters based on the number of quantization bits used in the subsequent quantization processing, so that the weight parameters that have undergone quantization loss simulation are applied to the convolution layer. After the sample inference result is obtained, the weight parameters are adjusted by adjusting the weight generator, thereby training the neural network model. Because the quantization loss produced by the quantization process is simulated when the weight parameters are generated, subsequently quantizing the trained neural network model based on the number of quantization bits reduces the quantization loss caused by quantization and improves the accuracy of the quantized neural network model. At the same time, the weight generator replaces the weight parameters of the original network layer, so developers do not need to implement a quantization layer for the network layer (the weight generator only needs to be connected to the neural network model), which simplifies the preparation for model training and improves the efficiency of model training. Illustrative embodiments are used for description below.
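  • A self-contained PyTorch sketch of the training flow compared in FIG. 1, for a single convolution layer; the tiny one-subunit generator, the layer sizes, and the straight-through gradient trick are illustrative assumptions rather than details fixed by the filing.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def fake_quantize(w: torch.Tensor, bit_num: int = 8) -> torch.Tensor:
    # Quantize then dequantize to simulate the quantization loss; the
    # straight-through estimator (an assumption) keeps gradients flowing to the generator.
    qmax = 2 ** (bit_num - 1) - 1
    scale = qmax / w.abs().max().clamp(min=1e-8)
    w_q = torch.floor(w * scale).clamp(-qmax, qmax) / scale
    return w + (w_q - w).detach()

class TinyWeightGenerator(nn.Module):
    # A single dense + activation subunit driven by a constant input
    # (a fuller n-subunit sketch appears later in the document).
    def __init__(self, num_weights: int):
        super().__init__()
        self.fc = nn.Linear(1, num_weights)

    def forward(self) -> torch.Tensor:
        const = torch.ones(1)              # preset floating-point constant, e.g. 1.0
        return torch.tanh(self.fc(const))  # first weight parameter (floating point)

class QuantAwareConv(nn.Module):
    def __init__(self, in_c=3, out_c=8, k=3, bit_num=8):
        super().__init__()
        self.shape, self.bit_num = (out_c, in_c, k, k), bit_num
        self.generator = TinyWeightGenerator(out_c * in_c * k * k)

    def forward(self, x):
        w1 = self.generator()                    # generated first weight parameter
        w2 = fake_quantize(w1, self.bit_num)     # second weight parameter (loss simulated)
        return F.conv2d(x, w2.view(self.shape), padding=1)

model = QuantAwareConv()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)   # adjusts the generator only
x, target = torch.randn(4, 3, 16, 16), torch.randn(4, 8, 16, 16)
loss = F.mse_loss(model(x), target)                         # inference loss
loss.backward()
optimizer.step()                                            # adjust weights via the generator
```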
  • FIG. 2 shows a flowchart of a training method of a neural network model provided by an exemplary embodiment of the present application. This embodiment is described by taking the method applied to a computer device as an example, and the method may include the following steps.
  • Step 201: Generate corresponding first weight parameters for the network layers in the neural network model, where different network layers in the neural network model correspond to different weight parameters.
  • The neural network model is applied to technical fields such as computer vision (Computer Vision, CV), speech technology (Speech Technology), and natural language processing (Natural Language Processing, NLP), and the neural network model can be a convolutional neural network (Convolutional Neural Networks, CNN) model, a dense network (DenseNet) model, or the like; the embodiments of the present application do not limit the specific use and type of the neural network model.
  • In a possible implementation, the developer connects a weight generator to each network layer in the neural network model in advance, and the weight generator is used to generate the required first weight parameter (such as a floating-point weight parameter) for that network layer. As shown in FIG. 1, developers set a corresponding weight generator for each convolutional layer in the convolutional neural network model.
  • The network layer is a network layer that needs to use weight parameters during forward inference. Optionally, when the network layer is a convolutional layer in a convolutional neural network model, the weight generator is used to generate the parameters of the convolution kernels in the convolutional layer; when the network layer is a dense layer in a dense network model, the weight generator is used to generate the parameters of a dense block in the dense layer. The embodiments of the present application do not limit the specific types of the network layer and the first weight parameter.
  • Step 202: Perform quantization loss simulation on the first weight parameter based on the number of quantization bits to obtain a second weight parameter.
  • In order to reduce the quantization loss of subsequent model quantization, in the embodiments of the present application, before the generated weight parameters are applied to the network layers of the neural network model, quantization loss simulation also needs to be performed on the weight parameters based on the subsequent quantization target (that is, the number of quantization bits), where the quantization loss simulation is used to simulate the parameter loss caused by the quantization and inverse quantization processes.
  • In a possible implementation, in addition to generating the first weight parameter, the weight generator is also configured to perform quantization loss simulation on the first weight parameter according to the number of quantization bits to obtain the second weight parameter. The second weight parameter is the weight parameter actually applied in the neural network model.
  • Optionally, the number of quantization bits may be preset by the developer and input into the weight generator as a parameter, and the numbers of quantization bits corresponding to different weight generators may be the same or different.
  • In some embodiments, the number of quantization bits may be a common value such as 1 bit, 2 bits, 4 bits, or 8 bits, or another bit width; this embodiment does not limit the specific number of quantization bits. In an illustrative example, when the number of quantization bits is 8 bits, the floating-point weight parameter needs to be quantized into an integer weight parameter within the range (-127, 127).
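  • A small illustrative helper computing the symmetric integer range that a given number of quantization bits can represent, assuming one code is reserved so that the range is symmetric, e.g. (-127, 127) for 8 bits.

```python
def symmetric_range(bit_num: int) -> tuple:
    # Largest magnitude representable with bit_num bits under a symmetric scheme.
    q = 2 ** (bit_num - 1) - 1
    return (-q, q)

assert symmetric_range(8) == (-127, 127)
```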
  • Step 203: Input the training sample into the neural network model using the second weight parameter, and obtain the sample inference result output by the neural network model.
  • After setting weight parameters for each network layer in the neural network model, the computer device uses the training samples to perform a round of training on the neural network model. In each round of training, the computer device inputs the training samples into the neural network model, and the neural network model performs forward inference on the training samples to obtain the sample inference results.
  • For neural network models that implement different tasks, the sample inference results may differ. For example, when the neural network model is used for an image classification task, the sample inference result is an image classification result; when the neural network model is used for a semantic recognition task, the sample inference result is a semantic recognition result. The embodiments of the present application do not limit the specific content of the sample inference results.
  • In a possible implementation, the computer device divides the training samples in the sample training set into several batches; in each round of training, a batch of training samples is input into the neural network model so as to obtain the sample inference results corresponding to that batch of training samples.
  • Step 204: Determine the inference loss of the neural network model based on the sample inference result, and adjust the first weight parameter based on the inference loss.
  • the computer device uses the annotation corresponding to the training sample as supervision, and determines the difference between the sample annotation and the sample inference result as the inference loss of the neural network model.
  • Unlike the related art, in which the weight parameters of the network layers are directly adjusted based on the inference loss by algorithms such as backpropagation or gradient descent, in the embodiments of the present application the computer device adjusts, based on the inference loss, the first weight parameter that has not undergone quantization loss simulation.
  • Optionally, the computer device adjusts the first weight parameter by adjusting the weight generator; correspondingly, the adjusted first weight parameter still needs to undergo quantization loss simulation before it can actually be applied to the network layers of the neural network model.
  • Optionally, after the weight parameter adjustment is completed, the computer device uses the training samples to perform the next round of iterative training on the neural network model; that is, the above steps are performed cyclically until the training completion condition is met, at which point training stops and the weight parameters adopted by the network layers when the training completion condition is met are determined as the target weight parameters of the neural network model. The training completion condition includes convergence of the inference loss or reaching a preset number of iterations, which is not limited in this embodiment.
  • To sum up, in the embodiments of the present application, after weight parameters are generated for the network layers in the neural network model, quantization loss simulation is performed on the weight parameters, and the weight parameters that have undergone quantization loss simulation are applied to the neural network model; the neural network model then performs inference on the training samples to obtain sample inference results, the inference loss is determined based on the sample inference results, and the weight parameters in the neural network model are adjusted accordingly, completing the training of the neural network model. Because the quantization loss corresponding to the number of quantization bits is simulated before the weight parameters are applied to the neural network model, the quantization loss incurred when the trained neural network model is subsequently quantized can be reduced, which helps improve the accuracy of the quantized neural network model.
  • In a possible implementation, when the training completion condition is satisfied, the computer device performs quantization processing on the weight parameters of each network layer in the neural network model based on the number of quantization bits to obtain a quantized neural network model, reducing the data volume of the model. When the model is subsequently deployed, the quantized neural network model is deployed in devices such as terminals or servers. When the deployed quantized neural network model is used for inference, the input data is first quantized based on the number of quantization bits, the quantized fixed-point data is then input into the quantized neural network model, and each network layer processes the fixed-point data to obtain the output data; because the output data is fixed-point data, the output layer needs to perform inverse quantization processing on the output data and finally outputs floating-point output data.
  • In an illustrative example, when the weight parameters of the trained convolutional neural network model are 32-bit floating-point data, the weight parameters of the quantized convolutional neural network model after 8-bit quantization are 8-bit integer data. When the quantized convolutional neural network model is used for inference, the 32-bit floating-point input data is first converted into 8-bit integer input data, the 8-bit integer input data is then input into the quantized convolutional neural network model, and the convolution layers perform convolution on the 8-bit data to obtain 8-bit integer output data. Further, the output layer dequantizes the 8-bit integer output data into 32-bit floating-point output data and outputs it.
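  • A NumPy sketch of inference with the deployed quantized model as described above: the floating-point input is quantized, the layer computes with integers, and the output layer dequantizes back to 32-bit float; the calibration scales and the use of a plain matrix multiply in place of a convolution are simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
weight_scale = 127.0                                                  # assumed: float weights in [-1, 1]
weight_int8 = rng.integers(-127, 128, size=(8, 8)).astype(np.int8)   # stored 8-bit weights
input_scale = 127.0                                                   # assumed calibration value for inputs in [0, 1)

def infer(x_float32: np.ndarray) -> np.ndarray:
    # Quantize the floating-point input data to 8-bit integers.
    x_int8 = np.clip(np.floor(x_float32 * input_scale), -127, 127).astype(np.int8)
    # Integer-only compute, accumulating in int32 as integer hardware typically does.
    acc_int32 = x_int8.astype(np.int32) @ weight_int8.astype(np.int32)
    # Output layer: inverse quantization back to 32-bit floating-point output data.
    return (acc_int32 / (input_scale * weight_scale)).astype(np.float32)

y = infer(rng.random((4, 8)).astype(np.float32))
```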
  • Regarding the specific manner of generating the weight parameters, in a possible implementation, as shown in FIG. 3, the weight generator 31 is composed of a weight parameter generator 311 and a simulated quantizer 312, where the weight parameter generator 311 is used to generate the first weight parameter for the network layer, and the simulated quantizer 312 is configured to perform quantization loss simulation on the first weight parameter generated by the weight parameter generator 311 based on the number of quantization bits, so as to provide the second weight parameter that has undergone quantization loss simulation to the corresponding network layer 32. The specific process of generating the weight parameters is described below using an exemplary embodiment.
  • FIG. 4 shows a flowchart of a training method for a neural network model provided by another exemplary embodiment of the present application. This embodiment is described by taking the method applied to a computer device as an example, and the method may include the following steps.
  • Step 401: Generate a first weight parameter for the network layer through the weight parameter generator.
  • In a possible implementation, the weight parameter generator is composed of n dense layers (consisting of dense blocks) and n activation layers, where n is an integer greater than or equal to 2, and the n dense layers are used to generate the weight parameters for the network layer.
  • the computer device first generates a first weight parameter through the weight parameter generator, where the first weight parameter is floating-point data.
  • the process of generating the first weight parameter may include the following sub-steps.
  • the weight parameter generator is composed of n subunits, and each subunit is composed of a dense layer and an activation layer.
  • For the mth subunit, the computer device obtains the activation result output by the (m-1)th subunit, performs fully connected (Fully Connected, FC) processing on the input data through the mth dense layer in the mth subunit, and inputs the fully connected result into the mth activation layer of the mth subunit for activation processing.
  • the activation layer may use a ReLU function, a Sigmoid function, a Tanh function, or other activation functions to activate the fully connected result, which is not limited in this embodiment.
  • The input data of the first dense layer in the first subunit is a preset floating-point constant; for example, the preset floating-point constant is 1.0, and its specific value is not limited in this embodiment.
  • the mth activation layer inputs the activation result into the m+1th subunit, and the m+1th dense layer and the m+1th activation layer in the m+1th subunit perform full connection and activation processing.
  • Schematically, as shown in FIG. 5, the weight parameter generator 51 includes n subunits (the contents in the dashed box); after the preset floating-point constant is input, the dense layer 511 performs fully connected processing, the activation layer 512 then performs activation processing on the fully connected result, and the activation result is input into the next subunit.
  • After the nth activation layer completes its activation processing (activating the fully connected result output by the nth dense layer), the computer device determines the activation result output by the nth activation layer as the first weight parameter.
  • the computer device acquires the floating-point first weight parameter output by the nth subunit.
  • The data amount of the first weight parameter is determined by the network layer. In an illustrative example, when the network layer is a convolutional layer, the convolution kernel size of the convolutional layer is k_h × k_w, the number of channels of the input feature map is in_c, and the number of channels of the output feature map is out_c, the data volume of the first weight parameter is: number = k_h × k_w × in_c × out_c.
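  • A PyTorch sketch of the weight parameter generator described above: n subunits, each a dense (fully connected) layer followed by an activation layer, driven by a preset floating-point constant and producing k_h × k_w × in_c × out_c values; the hidden width and the uniform choice of ReLU are assumptions (the filing also allows Sigmoid, Tanh, or other activations).

```python
import torch
import torch.nn as nn

class WeightParameterGenerator(nn.Module):
    def __init__(self, k_h: int, k_w: int, in_c: int, out_c: int,
                 n: int = 3, hidden: int = 64):
        super().__init__()
        number = k_h * k_w * in_c * out_c          # data volume of the first weight parameter
        dims = [1] + [hidden] * (n - 1) + [number]
        # n subunits, each: dense (fully connected) layer + activation layer.
        self.subunits = nn.Sequential(*[
            nn.Sequential(nn.Linear(dims[i], dims[i + 1]), nn.ReLU())
            for i in range(n)
        ])
        self.out_shape = (out_c, in_c, k_h, k_w)

    def forward(self) -> torch.Tensor:
        const = torch.ones(1)                      # preset floating-point constant (e.g. 1.0)
        w = self.subunits(const)                   # first weight parameter, floating point
        return w.view(self.out_shape)

gen = WeightParameterGenerator(k_h=3, k_w=3, in_c=16, out_c=32, n=3)
first_weight = gen()                               # shape (32, 16, 3, 3): 3*3*16*32 values
```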
  • Step 402: Based on the number of quantization bits, perform quantization loss simulation on the first weight parameter through the simulated quantizer to obtain a second weight parameter.
  • Because loss is produced when floating-point data is converted into integer data and when integer data is converted back into floating-point data, in this embodiment the computer device uses the simulated quantizer during model training to simulate the processes of converting floating-point data into integer data and converting integer data back into floating-point data, so that the quantization loss is integrated into model training and the quantization loss caused by subsequent model quantization is reduced.
  • Optionally, when the simulated quantizer is used to perform quantization loss simulation on the first weight parameter, the following sub-steps may be included.
  • In a possible implementation, the computer device determines the parameter range of the first weight parameter and then performs linear quantization on the first weight parameter based on this parameter range and the number of quantization bits set by the developer, obtaining the quantized weight parameter (a fixed-point number) corresponding to the first weight parameter. The parameter range may be determined according to the absolute values of the minimum first weight parameter and the maximum first weight parameter. In an illustrative example, when the first weight parameter is 32-bit floating-point data and the number of quantization bits is 8 bits, the first weight parameter is quantized into integer data within the range (-127, 127).
  • Further, the computer device performs inverse quantization processing on the quantized weight parameter based on the number of quantization bits, simulating the process of converting integer data into floating-point data, and obtains the second weight parameter. Because quantization error exists in the quantization and inverse quantization processes, the second weight parameter output by the simulated quantizer differs from the input first weight parameter, thereby achieving the effect of simulating the quantization loss.
  • Schematically, the process by which the computer device performs quantization loss simulation on the first weight parameter can be expressed by a formula in which data_1 is the weight parameter before quantization loss simulation, data_2 is the weight parameter after quantization loss simulation, bitnum is the number of quantization bits, and a is the parameter range of the weight parameter before quantization loss simulation.
  • Schematically, as shown in FIG. 5, the simulated quantizer 52 first performs quantization processing on the first weight parameter to obtain an integer quantized weight parameter, and then performs inverse quantization processing on the quantized weight parameter to obtain the floating-point second weight parameter.
  • In a possible implementation, before performing quantization loss simulation, the computer device detects whether the number of quantization bits is less than a preset number of bits; if it is less than the preset number of bits, the quantization loss is simulated, and if it is equal to the preset number of bits, quantization loss simulation is unnecessary (that is, the simulated quantizer is bypassed). Optionally, in that case the computer device directly determines the first weight parameter as the second weight parameter. The preset number of bits is the number of bits occupied by a floating-point number. For example, when the set number of quantization bits is less than 32 bits, the computer device simulates the quantization loss through the simulated quantizer, and when the set number of quantization bits is equal to 32 bits, the computer device does not need to simulate the quantization loss through the simulated quantizer.
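  • A PyTorch sketch of the simulated quantizer with the bypass described above; the symmetric, max-absolute-value parameter range is an assumption, since the exact expression is given in the filing only as a figure.

```python
import torch

FLOAT_BITS = 32   # preset number of bits occupied by a floating-point number

def simulated_quantizer(w1: torch.Tensor, bit_num: int) -> torch.Tensor:
    if bit_num >= FLOAT_BITS:
        return w1                                  # quantizer bypassed: second weight parameter = first
    a = w1.abs().max().clamp(min=1e-8)             # parameter range of the first weight parameter (assumed)
    qmax = 2 ** (bit_num - 1) - 1                  # e.g. 127 for 8-bit quantization
    quant = torch.floor(w1 / a * qmax).clamp(-qmax, qmax)   # quantized (fixed-point) weight parameter
    return quant / qmax * a                        # inverse quantization -> second weight parameter

w1 = torch.randn(64)
w2 = simulated_quantizer(w1, bit_num=8)
quant_error = (w2 - w1).abs().max()                # nonzero: the simulated quantization loss
```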
  • Step 403: Input the training sample into the neural network model using the second weight parameter, and obtain the sample inference result output by the neural network model.
  • For the implementation of this step, reference may be made to step 202, and details are not described herein again in this embodiment.
  • Step 404: Determine the inference loss of the neural network model based on the sample inference result, and adjust the weight parameter generator based on the inference loss.
  • In a possible implementation, when the weight parameter generator in the weight generator is composed of dense layers and activation layers, the computer device adjusts the parameters of the dense layers in each weight generator based on the inference loss, thereby achieving the effect of adjusting the network layer weights.
  • In this embodiment, the computer device generates the weight parameters required by the network layer through multiple dense layers and activation layers, and performs quantization and inverse quantization processing on the generated weight parameters based on the number of quantization bits, simulating the loss produced when the weight parameters are converted between floating-point and integer representations. Making the quantization loss part of model training helps reduce the quantization loss caused by post-training quantization and improves the accuracy of the quantized model.
  • Because the distribution of the weight parameters is relatively scattered (it does not satisfy a symmetric distribution) while quantization needs to be performed over a symmetric fixed-point range, the quantization loss of the weight parameters is relatively large. To further reduce the quantization loss and improve model accuracy, in a possible implementation the weight generator further includes a normalization layer. The weight parameters are normalized through this normalization layer so that they satisfy a symmetric distribution, thereby reducing the loss in the quantization process.
  • FIG. 6 shows a flowchart of a training method for a neural network model provided by another exemplary embodiment of the present application. This embodiment is described by taking the method applied to a computer device as an example, and the method may include the following steps.
  • Step 601: Generate a first weight parameter for the network layer through the weight parameter generator.
  • For the implementation of this step, reference may be made to step 401, and details are not described herein again in this embodiment.
  • Step 602: Perform normalization processing on the first weight parameter to obtain a normalized weight parameter, where the mean of the normalized weight parameter is 0.
  • In a possible implementation, the computer device performs standardization (a special normalization processing method) on the first weight parameter through the normalization layer of the weight generator, changing the data distribution of the first weight parameter so that the mean of the normalized weight parameter obtained after processing is 0, that is, the data of the normalized weight parameter is symmetric.
  • Optionally, the normalization layer may perform normalization using any one of layer normalization (Layer Norm), batch normalization (Batch Norm), or group normalization (Group Norm), which is not limited in this embodiment.
  • Schematically, building on FIG. 5, as shown in FIG. 7, the computer device normalizes the floating-point first weight parameter output by the weight parameter generator 51 through the normalization layer 53 to obtain a normalized weight parameter with a mean of 0 (still floating-point).
  • Step 603: Based on the number of quantization bits, perform quantization loss simulation on the normalized weight parameter through the simulated quantizer to obtain a second weight parameter.
  • Optionally, the computer device performs quantization loss simulation on the normalized weight parameter through the simulated quantizer to obtain the second weight parameter. Because the normalized weight parameter conforms to a symmetric distribution, the quantization loss of the weight parameter is smaller when the normalized weight parameter is quantized based on a symmetric fixed-point range.
  • Schematically, as shown in FIG. 7, the computer device performs quantization and inverse quantization processing on the normalized weight parameter through the simulated quantizer 52, and finally outputs the second weight parameter.
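  • A short sketch of the normalization step followed by quantization loss simulation; plain standardization (zero mean, unit variance) is assumed here, whereas the filing only requires that the normalized weight parameter have a mean of 0.

```python
import torch

def normalize_then_simulate(w1: torch.Tensor, bit_num: int = 8) -> torch.Tensor:
    # Normalization layer: normalized weight parameter with mean 0 (unit variance is an assumption).
    w_norm = (w1 - w1.mean()) / (w1.std() + 1e-6)
    # The symmetric distribution now fits the symmetric fixed-point grid with less loss.
    a = w_norm.abs().max()
    qmax = 2 ** (bit_num - 1) - 1
    quant = torch.floor(w_norm / a * qmax).clamp(-qmax, qmax)
    return quant / qmax * a                         # second weight parameter

w2 = normalize_then_simulate(torch.randn(128))
```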
  • Step 604: Input the training sample into the neural network model using the second weight parameter, and obtain the sample inference result output by the neural network model.
  • Step 605: Determine the inference loss of the neural network model based on the sample inference result, and adjust the weight parameter generator based on the inference loss.
  • For the implementation of steps 604 to 605, reference may be made to steps 403 to 404, and details are not described herein again in this embodiment.
  • In this embodiment, normalized weight parameters with a symmetric distribution are obtained by normalizing the generated weight parameters, and quantization loss simulation is then performed on the normalized weight parameters, which reduces the quantization loss caused by asymmetry of the weight parameters and further reduces the quantization loss of subsequent model quantization; in addition, normalization constrains the parameter range of the weight parameters, which helps speed up model training.
  • It should be noted that the above weight generator may also introduce a mask parameter, so that the mask parameter is used to sparsify or prune the generated weight parameters, further reducing the data volume of the neural network model and the amount of computation in the subsequent inference process.
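  • A sketch of how such a mask parameter could sparsify the generated weight parameters; magnitude-based masking and the keep ratio are assumptions, since the filing does not specify how the mask is obtained.

```python
import torch

def apply_mask(w: torch.Tensor, keep_ratio: float = 0.5) -> torch.Tensor:
    k = max(1, int(w.numel() * keep_ratio))                   # number of weights to keep (assumed policy)
    threshold = w.abs().flatten().kthvalue(w.numel() - k + 1).values
    mask = (w.abs() >= threshold).to(w.dtype)                 # mask parameter: 1 keeps, 0 prunes
    return w * mask

sparse_w = apply_mask(torch.randn(256), keep_ratio=0.25)
```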
  • FIG. 8 shows a structural block diagram of an apparatus for training a neural network model provided by an embodiment of the present application.
  • the apparatus can be implemented by software, hardware or a combination of the two to become all or a part of computer equipment.
  • the device includes:
  • a weight generation module 801 configured to generate a corresponding first weight parameter for a network layer in the neural network model, wherein different network layers in the neural network model correspond to different weight parameters;
  • a loss simulation module 802 configured to perform quantization loss simulation on the first weight parameter based on the number of quantized bits to obtain a second weight parameter
  • an inference module 803, configured to input the training sample into the neural network model using the second weight parameter to obtain a sample inference result
  • An adjustment module 804 configured to determine an inference loss of the neural network model based on the sample inference result, and adjust the first weight parameter based on the inference loss.
  • the loss simulation module 802 is used for:
  • performing quantization processing on the first weight parameter based on the number of quantization bits and the parameter range of the first weight parameter to obtain a quantized weight parameter, where the quantized weight parameter is a fixed-point number and the number of bits it occupies is the number of quantization bits; and
  • performing inverse quantization processing on the quantized weight parameter based on the number of quantization bits to obtain the second weight parameter.
  • the weight generation module 801 is used for:
  • performing fully connected processing on input data through the mth dense layer and inputting the processing result into the mth activation layer, where the input data of the first dense layer is a preset floating-point constant; inputting the activation result of the mth activation layer into the (m+1)th dense layer; and determining the activation result output by the nth activation layer as the first weight parameter, where m is a positive integer smaller than n, and n is an integer greater than or equal to 2.
  • the adjustment module 804 is used for:
  • adjusting the parameters of each dense layer based on the inference loss.
  • the device further includes:
  • a normalization module configured to perform normalization processing on the first weight parameter to obtain a normalized weight parameter, and the average value of the normalized weight parameter is 0;
  • the loss simulation module 802 is used for:
  • performing quantization loss simulation on the normalized weight parameter based on the number of quantization bits to obtain the second weight parameter.
  • the loss simulation module 802 is further configured to:
  • performing, in response to the number of quantization bits being less than a preset number of bits, quantization loss simulation on the first weight parameter based on the number of quantization bits to obtain the second weight parameter, where the preset number of bits is the number of bits occupied by a floating-point number.
  • the device further includes:
  • a quantization module configured to perform quantization processing on the weight parameters of each of the network layers in the neural network model based on the number of quantized bits, in response to satisfying the training completion condition, to obtain a quantized neural network model.
  • To sum up, in the embodiments of the present application, after weight parameters are generated for the network layers in the neural network model, quantization loss simulation is performed on the weight parameters, and the weight parameters that have undergone quantization loss simulation are applied to the neural network model; the neural network model then performs inference on the training samples to obtain sample inference results, the inference loss is determined based on the sample inference results, and the weight parameters in the neural network model are adjusted accordingly, completing the training of the neural network model. Because the quantization loss corresponding to the number of quantization bits is simulated before the weight parameters are applied to the neural network model, the quantization loss incurred when the trained neural network model is subsequently quantized can be reduced, which helps improve the accuracy of the quantized neural network model.
  • FIG. 9 shows a structural block diagram of a computer device provided by an exemplary embodiment of the present application.
  • a computer device in this application may include one or more of the following components: a processor 910 and a memory 920 .
  • Processor 910 may include one or more processing cores.
  • The processor 910 uses various interfaces and lines to connect the various parts of the entire device, and performs the various functions of the computer device and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 920 and by calling data stored in the memory 920.
  • Optionally, the processor 910 may be implemented in at least one hardware form of digital signal processing (Digital Signal Processing, DSP), field-programmable gate array (Field-Programmable Gate Array, FPGA), or programmable logic array (Programmable Logic Array, PLA).
  • the processor 910 may integrate one or a combination of a CPU, a graphics processor (Graphics Processing Unit, GPU), an NPU, a modem, and the like.
  • the CPU mainly handles the operating system, user interface and applications;
  • the GPU is used to render and draw the content that needs to be displayed on the touch screen;
  • the NPU is used to implement artificial intelligence (AI) functions;
  • the modem is used to handle wireless communication. It can be understood that the above-mentioned modem may alternatively not be integrated into the processor 910 and may be implemented by a separate chip.
  • the memory 920 may include a random access memory (Random Access Memory, RAM), or may include a read-only memory (Read-Only Memory, ROM).
  • the memory 920 includes a non-transitory computer-readable storage medium.
  • Memory 920 may be used to store instructions, programs, codes, sets of codes, or sets of instructions.
  • the memory 920 may include a program storage area and a data storage area, where the program storage area may store instructions for implementing an operating system, instructions for at least one function, instructions for implementing the following method embodiments, and the like, and the data storage area may store data created according to the use of the computer device, and the like.
  • Embodiments of the present application further provide a computer-readable storage medium, where the storage medium stores at least one instruction, where the at least one instruction is used to be executed by a processor to implement the method for training a neural network model according to the foregoing embodiments.
  • Embodiments of the present application provide a computer program product, where the computer program product includes computer instructions, and the computer instructions are stored in a computer-readable storage medium.
  • the processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device executes the neural network model training method provided by the above embodiments.
  • Computer-readable media includes both computer storage media and communication media including any medium that facilitates transfer of a computer program from one place to another.
  • a storage medium can be any available medium that can be accessed by a general purpose or special purpose computer.
  • quantization loss simulation is performed on the first weight parameter based on the number of quantized bits to obtain the second weight parameter, including:
  • the first weight parameter is quantized to obtain a quantized weight parameter, where the quantized weight parameter is a fixed-point number, and the number of bits occupied by the quantized weight parameter is the number of quantized bits;
  • inverse quantization processing is performed on the quantized weight parameter to obtain the second weight parameter.
  • Optionally, generating corresponding first weight parameters for the network layers in the neural network model includes:
  • performing fully connected processing on input data through the mth dense layer and inputting the processing result into the mth activation layer, where the input data of the first dense layer is a preset floating-point constant; inputting the activation result of the mth activation layer into the (m+1)th dense layer; and determining the activation result output by the nth activation layer as the first weight parameter, where m is a positive integer less than n, and n is an integer greater than or equal to 2.
  • Optionally, adjusting the first weight parameter based on the inference loss includes: adjusting the parameters of each dense layer based on the inference loss.
  • the method before obtaining the second weight parameter by performing quantization loss simulation on the first weight parameter based on the number of quantized bits, the method further includes:
  • the first weight parameter is normalized to obtain a normalized weight parameter, and the average of the normalized weight parameter is 0;
  • Perform quantization loss simulation on the first weight parameter based on the number of quantized bits to obtain the second weight parameter including:
  • quantization loss simulation is performed on the normalized weight parameter to obtain the second weight parameter.
  • Optionally, performing quantization loss simulation on the first weight parameter based on the number of quantization bits to obtain the second weight parameter further includes:
  • performing, in response to the number of quantization bits being less than a preset number of bits, quantization loss simulation on the first weight parameter based on the number of quantization bits to obtain the second weight parameter, where the preset number of bits is the number of bits occupied by a floating-point number.
  • Optionally, the method further includes:
  • in response to the training completion condition being satisfied, performing quantization processing on the weight parameters of each network layer in the neural network model based on the number of quantization bits to obtain a quantized neural network model.
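  • A sketch of this final quantization step: once the training completion condition is met, each network layer's floating-point weights are converted to fixed-point storage together with a per-layer scale; the symmetric scheme is an assumption.

```python
import torch

def quantize_trained_model(layer_weights: dict, bit_num: int = 8) -> dict:
    qmax = 2 ** (bit_num - 1) - 1
    quantized = {}
    for name, w in layer_weights.items():
        scale = qmax / w.abs().max().clamp(min=1e-8)
        q = torch.floor(scale * w).clamp(-qmax, qmax).to(torch.int8)
        quantized[name] = (q, scale)        # int8 weights plus the scale needed at inference time
    return quantized

trained = {"conv1.weight": torch.randn(8, 3, 3, 3), "conv2.weight": torch.randn(16, 8, 3, 3)}
int8_model = quantize_trained_model(trained)
```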

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A neural network model training method, apparatus, device, and storage medium, belonging to the field of artificial intelligence. The method includes: generating corresponding first weight parameters for the network layers in a neural network model (201); performing quantization loss simulation on the first weight parameters based on a number of quantization bits to obtain second weight parameters (202); inputting training samples into the neural network model that uses the second weight parameters to obtain sample inference results (203); and determining an inference loss of the neural network model based on the sample inference results and adjusting the first weight parameters based on the inference loss (204). With the solutions of the embodiments of the present application, the quantization loss incurred when the trained neural network model is subsequently quantized can be reduced, which helps improve the accuracy of the quantized neural network model.

Description

神经网络模型的训练方法、装置、设备及存储介质
本申请要求于2021年4月23日提交的申请号为202110442606.0、发明名称为“神经网络模型的训练方法、装置、设备及存储介质”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请实施例涉及人工智能技术领域,特别涉及一种神经网络模型的训练方法、装置、设备及存储介质。
背景技术
随着人工智能技术的不断发展,越来越多的领域开始应用基于深度学习的网络模型。比如,将基于计算机视觉的网络模型应用于图像识别和图像处理,将基于自然语言处理的网络模型应用于语义识别和自动问答等等。
然而,随着网络模型精度的不断提高,网络模型的深度也不断增加,导致网络模型的数据量也不断增加。为了能够在损失少量精度的前提下对网络模型进行压缩,使复杂的网络模型能够智能手机等嵌入式终端中,并提高模型的运行速度,模型量化(model quantization)技术也应运而生。相关技术中,通常采用后量化方式对训练完成的网络模型的权重参数进行量化。
发明内容
本申请实施例提供了一种神经网络模型的训练方法、装置、设备及存储介质。所述技术方案如下:
一方面,本申请实施例提供了一种神经网络模型的训练方法,所述方法由计算机设备执行,所述方法包括:
为神经网络模型中的网络层生成对应的第一权重参数,其中,神经网络模型中不同网络层对应不同权重参数;
基于量化比特数对所述第一权重参数进行量化损失模拟,得到第二权重参数;
将训练样本输入采用所述第二权重参数的所述神经网络模型,得到样本推理结果;以及,
基于所述样本推理结果确定所述神经网络模型的推理损失,并基于所述推理损失调整所述第一权重参数。
另一方面,本申请实施例提供了一种神经网络模型的训练装置,所述装置包括:
权重生成模块,用于为神经网络模型中的网络层生成对应的第一权重参数,其中,神经网络模型中不同网络层对应不同权重参数;
损失模拟模块,用于基于量化比特数对所述第一权重参数进行量化损失模拟,得到第二权重参数;
推理模块,用于将训练样本输入采用所述第二权重参数的所述神经网络模型,得到样本推理结果;以及,
调整模块,用于基于所述样本推理结果确定所述神经网络模型的推理损失,并基于所述推理损失调整所述第一权重参数。
另一方面,本申请实施例提供了一种计算机设备,所述计算机设备包括处理器和存储器,所述处理器包括中央处理器(Central Processing Unit,CPU)和神经网络处理器(Neural-network Processing Unit,NPU);
所述存储器存储有至少一条指令,所述至少一条指令用于被所述处理器执行以实现以下步骤:
为神经网络模型中的网络层生成对应的第一权重参数,其中,神经网络模型中不同网络层对应不同权重参数;
基于量化比特数对所述第一权重参数进行量化损失模拟,得到第二权重参数;
将训练样本输入采用所述第二权重参数的所述神经网络模型,得到样本推理结果;以及,
基于所述样本推理结果确定所述神经网络模型的推理损失,并基于所述推理损失调整所述第一权重参数。
另一方面,本申请实施例还提供了一种计算机可读存储介质,所述计算机可读存储介质存储有至少一条指令,所述至少一条指令用于被处理器执行以实现如上述方面所述的神经网络模型的训练方法。
另一方面,本申请实施例提供了一种计算机程序产品,该计算机程序产品包括计算机指令,该计算机指令存储在计算机可读存储介质中。计算机设备的处理器从计算机可读存储介质读取该计算机指令,处理器执行该计算机指令,使得该计算机设备执行上述方面提供的神经网络模型的训练方法。
附图说明
图1是相关技术与本申请实施例中神经网络模型训练过程的对比图;
图2示出了本申请一个示例性实施例提供的神经网络模型的训练方法的流程图;
图3是本申请一个示例性实施例提供的权重生成器的示意图;
图4示出了本申请另一个示例性实施例提供的神经网络模型的训练方法的流程图;
图5是本申请另一个示例性实施例提供的权重生成器的示意图;
图6示出了本申请另一个示例性实施例提供的神经网络模型的训练方法的流程图;
图7是本申请另一个示例性实施例提供的权重生成器的示意图;
图8示出了本申请一个实施例提供的神经网络模型的训练装置的结构框图;
图9示出了本申请一个示例性实施例提供的计算机设备的结构方框图。
具体实施方式
为使本申请的目的、技术方案和优点更加清楚,下面将结合附图对本申请实施方式作进一步地详细描述。
在本文中提及的“多个”是指两个或两个以上。“和/或”,描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况。字符“/”一般表示前后关联对象是一种“或”的关系。
为了方便理解,下面对本申请实施例中涉及的名词进行说明。
量化处理:一种将浮点(float)存储运算转换为整型(int)存储运算的模型压缩技术(即将浮点数转换为定点数)。由于通常情况下浮点运算的能耗高于整型运算的能耗,且整型运算的速度高于浮点运算的速度,因此对神经网络模型进行量化处理后,能够降低神经网络模型的内存占用,提高神经网络模型的推理(运行)速度,并降低神经网络模型推理过程中的功耗。
在一种可能的实施方式中,对神经网络模型进行量化处理时通常采用线性量化。对浮点型数据进行线性量化的数学表达式为:r=round(S×(q-Z)),其中,r为量化后的整型数据,q为浮点型数据,Z(Zero Point)为浮点型数据的偏移量,S(Scale)为浮点型数据的缩放因子。
反量化处理:量化处理的逆过程,用于将整型存储运算转换为浮点存储运算。通常情况下,利用量化后的神经网络模型进行推理过程,需要对推理结果进行反量化处理,即将整型推理结果转换为浮点型推理结果。
相关技术中,开发人员完成对神经网络模型的模型训练后,对神经网络模型进行量化处理,将模型中的权重参数由浮点型转化为整型,比如将32位的浮点型数据转化为8位的整型数据,从而将量化后的神经网络模型部署在终端中。后续利用量化后的神经网络模型进行推理时,只需要在输入时对浮点型的输入数据进行量化处理,在输出时对整型的输出数据进行反量化处理,其余运算均可以使用整型运算完成,既降低了神经网络模型占用的内存,又提高了神经网络模型的推理速度。
示意性的,当神经网络模型中权重参数的数据范围为(float min,float max)时,量化过程中采用的缩放因子可以表示为:
Figure PCTCN2022081023-appb-000001
其中,**为乘方,bit num为量化比特数(比如8bit)。
相应的,量化后的权重参数可以表示为:
quant data=floor(scale*float data)
其中,quant data为量化后的整型数据,float data为量化前的浮点数据,floor表示向下取整。
然而,采用这种后量化方式对神经网络模型进行量化时,神经网络模型的精度损失较大,无法应用于对精度要求较高的模型。
为了提高量化后神经网络模型的精度,以神经网络模型由多个串联的“卷积层+归一化层+激活层”构成为例,如图1所示,不同于相关技术中,在训练过程中直接将权重参数应用于卷积层,并在模型训练过程中基于训练样本对应的样本推理结果对各个卷积层的权重参数进行调整,采用本申请实施例提供的方案,通过为各个卷积层配置权重生成器,利用该权重生成器生成为卷积层生成所需的权重参数,并基于后续量化处理采用的量化比特数对权重参数进行量化损失模拟,从而将经过量化损失模拟的权重参数应用于卷积层,后续得到样本推理结果后,即通过调整权重生成器实现对权重参数的调整,实现对神经网络模型的训练。
模型训练过程中,由于在生成权重参数时模拟出量化过程产生的量化损失, 因此后续基于量化比特数对训练得到的神经网络模型进行量化时,能够降低量化所带来的量化损失,提高量化后神经网络模型的精度;同时,利用权重生成器替换原先网络层的权重参数,使得开发者无需为网络层实现量化层(只需要将权重生成器接入神经网络模型即可),简化了模型训练的前期准备流程,提高了模型训练的效率。下面采用示意性的实施例进行说明。
请参考图2,其示出了本申请一个示例性实施例提供的神经网络模型的训练方法的流程图,本实施例以该方法应用于计算机设备为例进行说明,该方法可以包括如下步骤。
步骤201,为神经网络模型中的网络层生成对应的第一权重参数,其中,神经网络模型中不同网络层对应不同权重参数。
其中,该神经网络模型应用于计算机视觉(Computer Vision,CV)、语音技术(Speech Technology)、自然语言处理(Nature Language Processing,NLP)等技术领域,且该神经网络模型可以为卷积神经网络(Convolutional Neural Networks,CNN)模型或稠密网络(DenseNet)模型等等,本申请实施例并不对神经网络模型的具体用途以及类型进行限定。
在一种可能的实施方式中,开发人员预先将权重生成器接入神经网络模型中的网络层,利用权重生成器为网络层生成所需的第一权重参数(比如浮点型权重参数)。如图1所示,开发人员为卷积神经网络模型中的各个卷积层设置对应的权重生成器。
其中,该网络层为前向推理过程中需要使用权重参数的网络层。可选的,当该网络层为卷积神经网络模型中的卷积层时,权重生成器用于生成卷积层中卷积核的参数,当该网络层稠密网络模型中的稠密层时,该权重生成器用于生成稠密层中稠密块(dense block)的参数,本申请实施例并不对网络层以及第一权重参数的具体类型进行限定。
步骤202,基于量化比特数对第一权重参数进行量化损失模拟,得到第二权重参数。
为了降低后续模型量化过程中的量化损失,本申请实施例中,将生成的权重参数应用于神经网络模型的网络层之前,还需要基于后续量化目标(即量化比特数)对权重参数进行模拟量化损失,其中,该量化损失模拟即用于模拟量化以及反量化过程造成的参数损失。
在一种可能的实施方式中,权重生成器除了用于生成第一权重参数外,还用于根据量化比特数,对第一权重参数进行量化损失模拟,得到第二权重参数。该第二权重参数即神经网络模型实际应用的权重参数。
可选的,该量化比特数可以由开发人员预先设置,并作为参数输入权重生成器,且不同权重生成模型对应量化比特数可以相同,也可以不同。
在一些实施例中,量化比特数可以是常见的1bit,2bit、4bit、8bit等等,也可以是上述比特外其他常见的比特,本实施例对具体的量化比特数不作限定。在一个示意性的例子中,当量化比特数为8bit时,表示需要将浮点型权重参数量化为(-127,127)这一范围内的整型权重参数。
步骤203,将训练样本输入采用第二权重参数的神经网络模型,得到神经网络模型输出的样本推理结果。
为神经网络模型中各个网络层设置权重参数后,计算机设备利用训练样本对神经网络模型进行一轮训练。每一轮训练过程中,计算机设备将训练样本输入神经网络模型,由神经网络模型对训练样本进行前向推理,从而得到样本推理结果。
针对实现不同任务的神经网络模型,该样本推理结果可能不同。比如,当神经网络模型用于实现图像分类任务时,该样本推理结果为图像分类结果;当神经网络模型用于实现语义识别任务时,该样本推理结果为语义识别结果,本申请实施例并不对样本推理结果的具体内容进行限定。
在一种可能的实施方式中,计算机设备将样本训练集中的训练样本划分为若干批(batch),每一轮训练时,即将一批训练样本输入神经网络模型,从而得到该批次训练样本对应的样本推理结果。
步骤204,基于样本推理结果确定神经网络模型的推理损失,并基于推理损失调整第一权重参数。
在一种可能的实施方式中,计算机设备以训练样本对应的标注为监督,将样本标注与样本推理结果之间的差异确定为神经网络模型的推理损失。
不同于相关技术中,基于神经网络模型的推理损失,采用反向传播或梯度下降等算法直接对网络层的权重参数进行调整,本申请实施例中,计算机设备基于推理损失,对未经过量化损失模拟的第一权重参数。
可选的,计算机设备通过调整权重生成器,实现对第一权重参数的调整, 相应的,调整后的第一权重参数仍旧需要进行量化损失模拟后,才能被实际应用于神经网络模型的网络层。
可选的,完成权重参数调整后,计算机设备利用训练样本对神经网络模型进行下一轮迭代训练,即循环执行上述步骤,直至满足训练完成条件时停止训练,并将满足训练完成条件时网络层采用的权重参数确定为神经网络模型的目标权重参数。其中,训练完成条件包括推理损失收敛或达到迭代次数,本实施例对此不作限定。
综上所述,本申请实施例中,为神经网络模型中的网络层生成权重参数后,对权重参数进行量化损失模拟,并将经过量化损失模拟的权重参数应用于神经网络模型,从而通过该神经网络模型对训练样本进行推理,得到样本推理结果,进而基于该样本推理结果确定推理损失,实现对神经网络模型中权重参数的调整,完成神经网络模型的训练;由于将权重参数应用于神经网络模型前,模拟出量化比特数的量化损失,因此能够降低后续对训练得到神经网络模型进行量化时的量化损失,有助于提高量化后神经网络模型的精度。在一种可能的实施方式中,当满足训练完成条件时,计算机设备基于量化比特数,对神经网络模型中各个网络层的权重参数进行量化处理,得到量化神经网络模型,降低模型的数据量。后续进行模型部署时,即将量化神经网络模型部署在终端或服务器等设备中。
利用部署的量化神经网络模型进行推理时,首先基于量化比特数对输入数据进行量化处理,然后将量化处理后的定点数据输入量化神经网络模型,由各个网络层对定点数据进行处理,得到输出数据。由于输出数据为定点数据,因此输出层需要对输出数据进行反量化处理,最终输出浮点型的输出数据。
在一个示意性的例子中,当训练得到的卷积神经网络模型的权重参数为32bit浮点数据,经过8bit量化处理后,量化卷积神经网络模型的权重参数为8bit整型数据。利用量化卷积神经网络模型对数据进行推理时,首先将32bit浮点型输入数据转化为8bit整型输入数据,然后将8bit整型输入数据输入量化卷积神经网络模型,由卷积层对8bit数据进行卷积处理,得到8bit整型输出数据。进一步的,输出层将8bit整型输出数据反量化为32bit浮点型输出数据后输出。
关于利用生成权重参数的具体方式,在一种可能的实施方式中,如图3所示,权重生成器31由权重参数生成器311以及模拟量化器312构成,其中,权 重参数生成器311用于为网络层生成第一权重参数,模拟量化器312用于基于量化比特数,对权重参数生成器311生成的第一权重参数进行模拟量化损失,从而将经过量化损失模拟的第二权重参数提供给对应的网络层32。下面采用示例性的实施例对权重参数的具体生成过程进行说明。
请参考图4,其示出了本申请另一个示例性实施例提供的神经网络模型的训练方法的流程图,本实施例以该方法应用于计算机设备为例进行说明,该方法可以包括如下步骤。
步骤401,通过权重参数生成器,为网络层生成第一权重参数。
在一种可能的实施方式中,权重参数生成器由n层稠密层(由稠密块构成)以及n层激活层构成,n为大于等于2的整数,其中,n层稠密层用于为网络层生成权重参数。相应的,计算机设备首先通过权重参数生成器生成第一权重参数,该第一权重参数为浮点型数据。可选的,生成第一权重参数的过程可以包括如下子步骤。
1、通过第m稠密层对输入数据进行全连接处理,并将处理结果输入第m激活层,m为小于n的正整数,其中,第一层稠密层的输入数据为预设浮点型常量。
可选的,权重参数生成器由n个子单元构成,每个子单元由一层稠密层和一层激活层构成。对于第m个子单元,计算机设备获取第m-1个子单元输出的激活结果,通过第m个子单元中的第m稠密层对输入数据进行全连接(Full Connected,FC)处理,并将全连接结果输入第m个子单元的第m激活层进行激活处理。
可选的,激活层可以采用ReLU函数、Sigmoid函数、Tanh函数或其他激活函数对全连接结果进行激活处理,本实施例对此不作限定。
其中,第一个子单元中第一稠密层的输入数据为预设浮点型常量,比如,该预设浮点型常量为1.0,本实施例对预设浮点型常量的具体数值不作限定。
2、将第m激活层的激活结果输入第m+1稠密层。
进一步的,第m激活层将激活结果输入第m+1个子单元,由第m+1个子单元中的第m+1层稠密层和第m+1层激活层进行全连接和激活处理。
示意性的,如图5所示,权重参数生成器51中包含n个子单元(虚线框中内容),获取到输入的预设浮点型常量后,通过稠密层511进行全连接处理, 然后通过激活层512对全连接结果进行激活处理,从将激活结果输入下一个子单元。
3、将第n层激活层输出的激活结果确定为第一权重参数。
当第n层激活层完成激活处理后(对第n层稠密层输出的全连接结果进行激活),计算机设备将第n层激活层输出的激活结果确定为第一权重参数。
示意性的,如图5所示,计算机设备获取第n个子单元输出的浮点型第一权重参数。
其中,第一权重参数的数据量由网络层决定。在一个示意性的例子中,当网络层为卷积层,且卷积层的卷积核大小为k h×k w,输入特征图的通道数为in c,输出特征图的通道数为out c时,第一权重参数的数据量为:
number=k h×k w×in c×out c
步骤402,基于量化比特数,通过模拟量化器对第一权重参数进行量化损失模拟,得到第二权重参数。
由于浮点型数据转换为整型数据,整型数据转换为浮点型数据过程中会产生损失,因此本实施例中,计算机设备在模型训练过程中,利用模拟量化器模拟浮点型数据转换为整型数据,以及整型数据转换为浮点型数据的过程,从而将量化损失融入模型训练,以此降低后续进行模型量化造成的量化损失。
可选的,利用模拟量化器对第一权重参数进行模拟量化损失时,可以包括如下子步骤。
1、基于量化比特数以及第一权重参数的参数范围,对第一权重参数进行量化处理,得到量化权重参数,量化权重参数为定点数,且量化权重参数占用的比特数为量化比特数。
在一种可能的实施方式中,计算机设备确定第一权重参数的参数范围,从而基于该参数范围以及开发人员设置的量化比特数,对第一权重参数进行线性量化处理,得到第一权重参数对应的量化权重参数(定点数)。其中,该参数范围可以根据最小第一权重参数以及最大第一权重参数的绝对值确定。
在一个示意性的例子中,当第一权重参数为32bit浮点型数据时,且量化比特数为8bit时,第一权重参数即被量化为(-127,127)这一范围内的整型数据。
2、基于量化比特数,对量化权重参数进行反量化处理,得到第二权重参数。
进一步的,计算机设备基于量化比特数,对量化权重参数进行反量化处理, 模拟出整型数据转化为浮点型数据的过程,得到第二权重参数。由于量化以及反量化过程中存在量化误差,因此模拟量化器输出的第二权重参数不同于输入的第一权重参数,从而达到模拟量化损失的效果。
示意性的,计算机设备对第一权重参数进行量化损失模拟的过程可以采用如下公式表示:
Figure PCTCN2022081023-appb-000002
其中,data 1为未经过量化损失模拟的权重参数,data 2为经过量化损失模拟的权重参数,bitnum为量化比特数,a为未经过量化损失模拟的权重参数的参数范围。
示意性的,如图5所示,模拟量化器52首先对第一权重参数进行量化处理,得到整型的量化权重参数,然后对量化权重参数进行反量化处理,重新得到浮点型的第二权重参数。
在一种可能的实施方式中,在进行模拟量化损失前,计算机设备检测量化比特数是否小于预设比特数,若小于,则进行模拟量化损失。若等于,则无需进行模拟量化损失(即模拟量化器失效),可选的,计算机设备直接将第一权重参数确定为第二权重参数。其中,预设比特数为浮点数占用的比特数。
比如,当设置的量化比特数小于32bit时,计算机设备通过模拟量化器模拟量化损失,当设置的量化比特数等于32bit时,计算机设备则无需通过模拟量化器模拟量化损失。
步骤403,将训练样本输入采用第二权重参数的神经网络模型,得到神经网络模型输出的样本推理结果。
本步骤的实施方式可以参考步骤202,本实施例在此不再赘述。
步骤404,基于样本推理结果确定神经网络模型的推理损失,并基于推理损失调整权重参数生成器。
在一种可能的实施方式中,当权重生成器中的权重参数生成器由稠密层和激活层构成时,计算机设备基于推理损失,对各个权重生成器中稠密层的参数进行调整,从而达到调整网络层权重的效果。
本实施例中,计算机设备通过多个稠密层和激活层产生网络层所需的权重参数,并基于量化比特数对产生的权重参数进行量化和反量化处理,模拟出权重参数在浮点型和整型之间转换时产生的损失,将量化损失作为模型训练的一 部分,有助于降低后量化造成的量化损失,提高量化后模型的精度。
由于权重参数的分布较为分散(不满足对称分布),而进行量化处理时需要基于对称的定点范围,因此权重参数的量化损失较大。为了进一步降低量化损失,提高模型精度,在一种可能的实施方式中,权重生成器中还包含归一化(normalization)层。通过该归一化层对权重参数进行归一化处理,使权重参数满足对称分布,从而降低量化过程中的损失。
请参考图6,其示出了本申请另一个示例性实施例提供的神经网络模型的训练方法的流程图,本实施例以该方法应用于计算机设备为例进行说明,该方法可以包括如下步骤。
步骤601,通过权重参数生成器,为网络层生成第一权重参数。
本步骤的实施方式可以参考步骤401,本实施例在此不再赘述。
步骤602,对第一权重参数进行归一化处理,得到归一化权重参数,归一化权重参数的均值为0。
在一种可能的实施方式中,计算机设备通过权重生成器的归一化层对第一权重参数进行标准化处理(一种特殊的归一化处理方式),改变第一权重参数的数据分布,使处理后得到的归一化权重参数的均值为0,即归一化权重参数的数据对称。
可选的,归一化层可以采用层归一化(Layer Norm)、批归一化(Batch Norm)或组归一化(Group Norm)中的任一种方式进行归一化处理,本实施例对此不作限定。
示意性的,在图5的基础上,如图7所示,计算机设备通过归一化层53对权重参数生成器51输出的浮点型第一权重参数进行归一化处理,得到均值为0的归一化权重参数(仍旧为浮点型)。
步骤603,基于量化比特数,通过模拟量化器对归一化权重参数进行量化损失模拟,得到第二权重参数。
可选的,计算机设备通过模拟量化器对归一化权重参数进行量化损失模拟,得到第二权重参数。由于归一化权重参数符合对称分布,因此基于对称的定点范围对归一化权重参数进行量化处理时,权重参数的量化损失更小。
示意性的,如图7所示,计算机设备通过模拟量化器52对归一化权重参数进行量化和反量化处理,最终输出第二权重参数。
步骤604,将训练样本输入采用第二权重参数的神经网络模型,得到神经网络模型输出的样本推理结果。
步骤605,基于样本推理结果确定神经网络模型的推理损失,并基于推理损失调整权重参数生成器。
步骤604至605的实施方式可以参考步骤403至404,本实施例在此不再赘述。
本实施例中,通过对生成的权重参数进行归一化处理,得到对称分布的归一化权重参数,进而通过对归一化权重参数进行模拟量化损失,降低因权重参数不对称导致的量化损失,进一步降低后续模型量化的量化损失;并且,通过归一化处理约束了权重参数的参数范围,有助于加快模型的训练速度。
需要说明的是,上述权重生成器还可以引入掩膜(mask)参数,从而利用mask参数对产生的权重参数进行稀疏化或剪枝处理,进一步降低神经网络模型的数据量以及后续推理过程的计算量。
请参考图8,其示出了本申请一个实施例提供的神经网络模型的训练装置的结构框图。该装置可以通过软件、硬件或者两者的结合实现成为计算机设备的全部或一部分。该装置包括:
权重生成模块801,用于为神经网络模型中的网络层生成对应的第一权重参数,其中,神经网络模型中不同网络层对应不同权重参数;
损失模拟模块802,用于基于量化比特数对所述第一权重参数进行量化损失模拟,得到第二权重参数;
推理模块803,用于将训练样本输入采用所述第二权重参数的所述神经网络模型,得到样本推理结果;以及,
调整模块804,用于基于所述样本推理结果确定所述神经网络模型的推理损失,并基于所述推理损失调整所述第一权重参数。
可选的,所述损失模拟模块802,用于:
基于所述量化比特数以及所述第一权重参数的参数范围,对所述第一权重参数进行量化处理,得到量化权重参数,所述量化权重参数为定点数,且所述量化权重参数占用的比特数为所述量化比特数;以及,
基于所述量化比特数,对所述量化权重参数进行反量化处理,得到所述第 二权重参数。
可选的,所述权重生成模块801,用于:
通过第m稠密层对输入数据进行全连接处理,并将处理结果输入第m激活层,其中,第一层稠密层的输入数据为预设浮点数常量;
将所述第m激活层的激活结果输入第m+1稠密层;以及,
将第n层激活层输出的激活结果确定为所述第一权重参数,m为小于n的正整数,n为大于等于2的整数。
可选的,所述调整模块804,用于:
基于所述推理损失,对各层所述稠密层的参数进行调整。
可选的,所述装置还包括:
归一化模块,用于对所述第一权重参数进行归一化处理,得到归一化权重参数,所述归一化权重参数的均值为0;
所述损失模拟模块802,用于:
基于所述量化比特数,对所述归一化权重参数进行量化损失模拟,得到所述第二权重参数。
可选的,所述损失模拟模块802,还用于:
响应于所述量化比特数小于预设比特数,基于所述量化比特数对所述第一权重参数进行量化损失模拟,得到所述第二权重参数,所述预设比特数为浮点数占用的比特数。
可选的,所述装置还包括:
量化模块,用于响应于满足训练完成条件,基于所述量化比特数对所述神经网络模型中各个所述网络层的权重参数进行量化处理,得到量化神经网络模型。
综上所述,本申请实施例中,为神经网络模型中的网络层生成权重参数后,对权重参数进行量化损失模拟,并将经过量化损失模拟的权重参数应用于神经网络模型,从而通过该神经网络模型对训练样本进行推理,得到样本推理结果,进而基于该样本推理结果确定推理损失,实现对神经网络模型中权重参数的调整,完成神经网络模型的训练;由于将权重参数应用于神经网络模型前,模拟出量化比特数的量化损失,因此能够降低后续对训练得到神经网络模型进行量化时的量化损失,有助于提高量化后神经网络模型的精度。
请参考图9,其示出了本申请一个示例性实施例提供的计算机设备的结构方框图。本申请中的计算机设备可以包括一个或多个如下部件:处理器910和存储器920。
处理器910可以包括一个或者多个处理核心。处理器910利用各种接口和线路连接整个设备内的各个部分,通过运行或执行存储在存储器920内的指令、程序、代码集或指令集,以及调用存储在存储器920内的数据,执行计算机设备的各种功能和处理数据。可选地,处理器910可以采用数字信号处理(Digital Signal Processing,DSP)、现场可编程门阵列(Field-Programmable Gate Array,FPGA)、可编程逻辑阵列(Programmable Logic Array,PLA)中的至少一种硬件形式来实现。处理器910可集成CPU、图像处理器(Graphics Processing Unit,GPU)、NPU和调制解调器等中的一种或几种的组合。其中,CPU主要处理操作系统、用户界面和应用程序等;GPU用于负责触摸显示屏所需要显示的内容的渲染和绘制;NPU用于实现人工智能(Artificial Intelligence,AI)功能;调制解调器用于处理无线通信。可以理解的是,上述调制解调器也可以不集成到处理器910中,单独通过一块芯片进行实现。
存储器920可以包括随机存储器(Random Access Memory,RAM),也可以包括只读存储器(Read-Only Memory,ROM)。可选地,该存储器920包括非瞬时性计算机可读介质(non-transitory computer-readable storage medium)。存储器920可用于存储指令、程序、代码、代码集或指令集。存储器920可包括存储程序区和存储数据区,其中,存储程序区可存储用于实现操作系统的指令、用于至少一个功能的指令、用于实现下述各个方法实施例的指令等;存储数据区可存储根据计算机设备的使用所创建的数据等。
除此之外,本领域技术人员可以理解,上述附图所示出的计算机设备的结构并不构成对计算机设备的限定,计算机设备可以包括比图示更多或更少的部件,或者组合某些部件,或者不同的部件布置,本实施例在此不再赘述。
本申请实施例还提供了一种计算机可读存储介质,该存储介质存储有至少一条指令,至少一条指令用于被处理器执行以实现如上述实施例所述的神经网络模型的训练方法。
本申请实施例提供了一种计算机程序产品,该计算机程序产品包括计算机指令,该计算机指令存储在计算机可读存储介质中。计算机设备的处理器从计 算机可读存储介质读取该计算机指令,处理器执行该计算机指令,使得该计算机设备执行上述实施例提供的神经网络模型的训练方法。
本领域技术人员应该可以意识到,在上述一个或多个示例中,本申请实施例所描述的功能可以用硬件、软件、固件或它们的任意组合来实现。当使用软件实现时,可以将这些功能存储在计算机可读介质中或者作为计算机可读介质上的一个或多个指令或代码进行传输。计算机可读介质包括计算机存储介质和通信介质,其中通信介质包括便于从一个地方向另一个地方传送计算机程序的任何介质。存储介质可以是通用或专用计算机能够存取的任何可用介质。
可选的,基于量化比特数对第一权重参数进行量化损失模拟,得到第二权重参数,包括:
基于量化比特数以及第一权重参数的参数范围,对第一权重参数进行量化处理,得到量化权重参数,量化权重参数为定点数,且量化权重参数占用的比特数为量化比特数;以及,
基于量化比特数,对量化权重参数进行反量化处理,得到第二权重参数。
可选的,为神经网络模型中的网络层生成对应的第一权重参数,包括:
通过第m稠密层对输入数据进行全连接处理,并将处理结果输入第m激活层,其中,第一层稠密层的输入数据为预设浮点数常量;
将第m激活层的激活结果输入第m+1稠密层;以及,
将第n层激活层输出的激活结果确定为第一权重参数,m为小于n的正整数,n为大于等于2的整数。
可选的,基于推理损失调整第一权重参数,包括:
基于推理损失,对各层稠密层的参数进行调整。
可选的,基于量化比特数对第一权重参数进行量化损失模拟,得到第二权重参数之前,该方法还包括:
对第一权重参数进行归一化处理,得到归一化权重参数,归一化权重参数的均值为0;
基于量化比特数对第一权重参数进行量化损失模拟,得到第二权重参数,包括:
基于量化比特数,对归一化权重参数进行量化损失模拟,得到第二权重参数。
可选的,基于量化比特数对第一权重参数进行量化损失模拟,得到第二权重参数,还包括:
响应于量化比特数小于预设比特数,基于量化比特数对第一权重参数进行量化损失模拟,得到第二权重参数,预设比特数为浮点数占用的比特数。
可选的,该方法还包括:
响应于满足训练完成条件,基于量化比特数对神经网络模型中各个网络层的权重参数进行量化处理,得到量化神经网络模型。
以上所述仅为本申请的可选实施例,并不用以限制本申请,凡在本申请的精神和原则之内,所作的任何修改、等同替换、改进等,均应包含在本申请的保护范围之内。

Claims (17)

  1. 一种神经网络模型的训练方法,所述方法包括:
    为神经网络模型中的网络层生成对应的第一权重参数,其中,神经网络模型中不同网络层对应不同权重参数;
    基于量化比特数对所述第一权重参数进行量化损失模拟,得到第二权重参数;
    将训练样本输入采用所述第二权重参数的所述神经网络模型,得到样本推理结果;以及,
    基于所述样本推理结果确定所述神经网络模型的推理损失,并基于所述推理损失调整所述第一权重参数。
  2. 根据权利要求1所述的方法,其中,所述基于量化比特数对所述第一权重参数进行量化损失模拟,得到第二权重参数,包括:
    基于所述量化比特数以及所述第一权重参数的参数范围,对所述第一权重参数进行量化处理,得到量化权重参数,所述量化权重参数为定点数,且所述量化权重参数占用的比特数为所述量化比特数;以及,
    基于所述量化比特数,对所述量化权重参数进行反量化处理,得到所述第二权重参数。
  3. 根据权利要求1所述的方法,其中,所述为神经网络模型中的网络层生成对应的第一权重参数,包括:
    通过第m稠密层对输入数据进行全连接处理,并将处理结果输入第m激活层,其中,第一层稠密层的输入数据为预设浮点数常量;
    将所述第m激活层的激活结果输入第m+1稠密层;以及,
    将第n层激活层输出的激活结果确定为所述第一权重参数,m为小于n的正整数,n为大于等于2的整数。
  4. 根据权利要求3所述的方法,其中,所述基于所述推理损失调整所述第一权重参数,包括:
    基于所述推理损失,对各层所述稠密层的参数进行调整。
  5. 根据权利要求1所述的方法,其中,所述基于量化比特数对所述第一权重参数进行量化损失模拟,得到第二权重参数之前,所述方法还包括:
    对所述第一权重参数进行归一化处理,得到归一化权重参数,所述归一化权重参数的均值为0;
    所述基于量化比特数对所述第一权重参数进行量化损失模拟,得到第二权重参数,包括:
    基于所述量化比特数,对所述归一化权重参数进行量化损失模拟,得到所述第二权重参数。
  6. 根据权利要求1至5任一所述的方法,其中,所述基于量化比特数对所述第一权重参数进行量化损失模拟,得到第二权重参数,还包括:
    响应于所述量化比特数小于预设比特数,基于所述量化比特数对所述第一权重参数进行量化损失模拟,得到所述第二权重参数,所述预设比特数为浮点数占用的比特数。
  7. 根据权利要求1至5任一所述的方法,其中,所述方法还包括:
    响应于满足训练完成条件,基于所述量化比特数对所述神经网络模型中各个所述网络层的权重参数进行量化处理,得到量化神经网络模型。
  8. 一种神经网络模型的训练装置,所述装置包括:
    权重生成模块,用于为神经网络模型中的网络层生成对应的第一权重参数,其中,神经网络模型中不同网络层对应不同权重参数;
    损失模拟模块,用于基于量化比特数对所述第一权重参数进行量化损失模拟,得到第二权重参数;
    推理模块,用于将训练样本输入采用所述第二权重参数的所述神经网络模型,得到样本推理结果;以及,
    调整模块,用于基于所述样本推理结果确定所述神经网络模型的推理损失,并基于所述推理损失调整所述第一权重参数。
  9. 根据权利要求8所述的装置,其中,所述损失模拟模块,用于:
    基于所述量化比特数以及所述第一权重参数的参数范围,对所述第一权重参数进行量化处理,得到量化权重参数,所述量化权重参数为定点数,且所述量化权重参数占用的比特数为所述量化比特数;以及,
    基于所述量化比特数,对所述量化权重参数进行反量化处理,得到所述第二权重参数。
  10. 根据权利要求8所述的装置,其中,所述权重生成模块,用于:
    通过第m稠密层对输入数据进行全连接处理,并将处理结果输入第m激活层,其中,第一层稠密层的输入数据为预设浮点数常量;
    将所述第m激活层的激活结果输入第m+1稠密层;以及,
    将第n层激活层输出的激活结果确定为所述第一权重参数,m为小于n的正整数,n为大于等于2的整数。
  11. 根据权利要求10所述的装置,其中,所述调整模块,用于:
    基于所述推理损失,对各层所述稠密层的参数进行调整。
  12. 根据权利要求8所述的装置,其中,所述装置还包括:
    归一化模块,用于对所述第一权重参数进行归一化处理,得到归一化权重参数,所述归一化权重参数的均值为0;
    所述损失模拟模块,用于:
    基于所述量化比特数,对所述归一化权重参数进行量化损失模拟,得到所述第二权重参数。
  13. 根据权利要求8至12任一所述的装置,其中,所述损失模拟模块,还用于:
    响应于所述量化比特数小于预设比特数,基于所述量化比特数对所述第一权重参数进行量化损失模拟,得到所述第二权重参数,所述预设比特数为浮点数占用的比特数。
  14. 根据权利要求8至12任一所述的装置,其中,所述装置还包括:
    量化模块,用于响应于满足训练完成条件,基于所述量化比特数对所述神经网络模型中各个所述网络层的权重参数进行量化处理,得到量化神经网络模型。
  15. 一种计算机设备,所述计算机设备包括处理器和存储器,所述处理器包括CPU和NPU;
    所述存储器存储有至少一条指令,所述至少一条指令用于被所述处理器执行以实现以下步骤:
    为神经网络模型中的网络层生成对应的第一权重参数,其中,神经网络模型中不同网络层对应不同的权重参数;
    基于量化比特数对所述第一权重参数进行量化损失模拟,得到第二权重参数;
    将训练样本输入采用所述第二权重参数的所述神经网络模型,得到样本推理结果;以及,
    基于所述样本推理结果确定所述神经网络模型的推理损失,并基于所述推理损失调整所述第一权重参数。
  16. 一种计算机可读存储介质,所述计算机可读存储介质存储有至少一条指令,所述至少一条指令用于被处理器执行以实现如权利要求1至7任一所述的神经网络模型的训练方法。
  17. 一种计算机程序产品,所述计算机程序产品包括计算机指令,所述计算机指令存储在计算机可读存储介质中;可穿戴式设备的处理器从所述计算机可读存储介质读取所述计算机指令,所述处理器执行所述计算机指令,使得所述终端执行如权利要求1至7任一所述的神经网络模型的训练方法。
PCT/CN2022/081023 2021-04-23 2022-03-15 神经网络模型的训练方法、装置、设备及存储介质 WO2022222649A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110442606.0 2021-04-23
CN202110442606.0A CN115238883A (zh) 2021-04-23 2021-04-23 神经网络模型的训练方法、装置、设备及存储介质

Publications (1)

Publication Number Publication Date
WO2022222649A1 true WO2022222649A1 (zh) 2022-10-27

Family

ID=83666885

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/081023 WO2022222649A1 (zh) 2021-04-23 2022-03-15 神经网络模型的训练方法、装置、设备及存储介质

Country Status (2)

Country Link
CN (1) CN115238883A (zh)
WO (1) WO2022222649A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116894189B (zh) * 2023-09-11 2024-01-05 中移(苏州)软件技术有限公司 一种模型训练方法、装置、设备及可读存储介质
CN117077740B (zh) * 2023-09-25 2024-03-12 荣耀终端有限公司 模型量化方法和设备

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190347550A1 (en) * 2018-05-14 2019-11-14 Samsung Electronics Co., Ltd. Method and apparatus with neural network parameter quantization
CN110555508A (zh) * 2018-05-31 2019-12-10 北京深鉴智能科技有限公司 人工神经网络调整方法和装置
CN110929837A (zh) * 2018-09-19 2020-03-27 北京搜狗科技发展有限公司 神经网络模型压缩方法及装置
CN110969251A (zh) * 2019-11-28 2020-04-07 中国科学院自动化研究所 基于无标签数据的神经网络模型量化方法及装置
CN111783957A (zh) * 2020-07-02 2020-10-16 厦门美图之家科技有限公司 模型量化训练方法、装置、机器可读存储介质及电子设备
CN112101524A (zh) * 2020-09-07 2020-12-18 上海交通大学 可在线切换比特位宽的量化神经网络的方法及系统


Also Published As

Publication number Publication date
CN115238883A (zh) 2022-10-25

Similar Documents

Publication Publication Date Title
CN108510067B (zh) 基于工程化实现的卷积神经网络量化方法
WO2022222649A1 (zh) 神经网络模型的训练方法、装置、设备及存储介质
CN112257858B (zh) 一种模型压缩方法及装置
WO2020233130A1 (zh) 一种深度神经网络压缩方法及相关设备
CN111368993B (zh) 一种数据处理方法及相关设备
EP4131020A1 (en) Data processing method and device
CN109934336B (zh) 基于最优结构搜索的神经网络动态加速平台设计方法及神经网络动态加速平台
CN107909147A (zh) 一种数据处理方法及装置
WO2020237904A1 (zh) 一种基于幂指数量化的神经网络压缩方法
TWI744724B (zh) 處理卷積神經網路的方法
CN111401550A (zh) 神经网络模型量化方法、装置及电子设备
CN111126602A (zh) 一种基于卷积核相似性剪枝的循环神经网络模型压缩方法
WO2023020613A1 (zh) 一种模型蒸馏方法及相关设备
WO2023098544A1 (zh) 基于局部稀疏约束的结构化剪枝方法和装置
CN112598129A (zh) 基于ReRAM神经网络加速器的可调硬件感知的剪枝和映射框架
CN112884146A (zh) 一种训练基于数据量化与硬件加速的模型的方法及系统
TWI738048B (zh) 算數框架系統及操作浮點至定點算數框架的方法
CN116126354A (zh) 模型部署方法、装置、电子设备以及存储介质
WO2022163861A1 (ja) ニューラルネットワーク生成装置、ニューラルネットワーク演算装置、エッジデバイス、ニューラルネットワーク制御方法およびソフトウェア生成プログラム
Lai et al. Rethinking machine learning development and deployment for edge devices
WO2022246986A1 (zh) 数据处理方法、装置、设备及计算机可读存储介质
CN114444476A (zh) 信息处理方法、装置和计算机可读存储介质
CN117151178A (zh) 一种面向fpga的cnn定制网络量化加速方法
CN115688868B (zh) 一种模型训练方法及计算设备
CN112183744A (zh) 一种神经网络剪枝方法及装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22790758

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22790758

Country of ref document: EP

Kind code of ref document: A1