CN115705486A - Method and device for training a quantization model, electronic equipment and readable storage medium

Method and device for training a quantization model, electronic equipment and readable storage medium

Info

Publication number
CN115705486A
Authority
CN
China
Prior art keywords
quantization
data
floating point
range
model
Legal status
Pending
Application number
CN202110896946.0A
Other languages
Chinese (zh)
Inventor
杨金霄
Current Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN202110896946.0A
Publication of CN115705486A

Landscapes

  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The application relates to a training method and apparatus for a quantization model, a computer device, and a storage medium. The method comprises the following steps: acquiring sample data; performing first-stage training on a quantization model through the sample data to obtain a floating point weight range of the quantization model; performing second-stage training on the quantization model trained in the first stage based on the floating point weight range and the sample data, and obtaining a floating point activation range of the trained target quantization model in the second-stage training; the floating point weight range and the floating point activation range in the target quantization model are used for performing quantization processing on data to be quantized. By adopting the method, the precision of the quantization model can be improved.

Description

Method and device for training a quantization model, electronic equipment and readable storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method and an apparatus for training a quantization model, an electronic device, and a computer-readable storage medium.
Background
With the development of computer technology, data quantization processing technology has emerged; for example, floating point data can be converted into integer data by performing quantization processing on the floating point data through a quantization model, so as to reduce the amount of computation. However, the quantization accuracy of conventional quantization models is not high.
Disclosure of Invention
The embodiment of the application provides a training method and device of a quantization model, electronic equipment and a computer readable storage medium, which can improve the quantization precision of the quantization model.
A method of training a quantization model, comprising:
acquiring sample data;
training a quantization model through the sample data at a first stage to obtain a floating point weight range of the quantization model;
based on the floating point weight range and the sample data, performing second-stage training on the quantization model trained in the first stage, and obtaining a floating point activation range of the trained target quantization model in the second-stage training;
and the floating point weight range and the floating point activation range in the target quantization model are used for performing quantization processing on data to be quantized.
A training apparatus for a quantization model, comprising:
the sample acquisition module is used for acquiring sample data;
the first training module is used for carrying out first-stage training on a quantization model through the sample data to obtain a floating point weight range of the quantization model;
the second training module is used for carrying out second-stage training on the quantization model trained in the first stage based on the floating point weight range and the sample data, and obtaining a floating point activation range of the trained target quantization model in the second-stage training;
and the floating point weight range and the floating point activation range in the target quantization model are used for performing quantization processing on data to be quantized.
A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:
acquiring sample data;
training a quantization model through the sample data at a first stage to obtain a floating point weight range of the quantization model;
based on the floating point weight range and the sample data, performing second-stage training on the quantization model trained in the first stage, and obtaining a floating point activation range of the trained target quantization model in the second-stage training;
the floating point weight range and the floating point activation range in the target quantization model are used for performing quantization processing on data to be quantized.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:
acquiring sample data;
training a quantization model through the sample data at a first stage to obtain a floating point weight range of the quantization model;
based on the floating point weight range and the sample data, performing second-stage training on the quantization model which is trained in the first stage, and obtaining a floating point activation range of the trained target quantization model in the second-stage training;
the floating point weight range and the floating point activation range in the target quantization model are used for performing quantization processing on data to be quantized.
According to the training method and apparatus for a quantization model, the electronic device, and the computer-readable storage medium in this embodiment, sample data is acquired, first-stage training is performed on the quantization model through the sample data to obtain the floating point weight range of the quantization model, second-stage training is performed on the first-stage-trained quantization model based on the floating point weight range and the sample data, and the floating point activation range of the trained target quantization model is obtained in the second-stage training, so that training in separate stages enables the quantization model to achieve higher processing precision. The floating point weight range and the floating point activation range of the target quantization model can be used to quantize data to be quantized, converting floating point data into integer data, which reduces the amount of computation and increases the quantization processing speed. Quantizing floating point data into integer data also reduces memory occupation.
A quantization processing method, comprising:
the method comprises the steps of obtaining a quantization type corresponding to data to be quantized, and obtaining a fixed point weight range and a fixed point activation range corresponding to the quantization type;
determining a weight quantization parameter corresponding to the data to be quantized based on the fixed point weight range and a floating point weight range of a trained quantization model;
determining an activation quantization parameter corresponding to the data to be quantized based on the fixed point activation range and the floating point activation range of the trained quantization model; the trained quantization model is obtained by training the floating point weight range in a first stage and the floating point activation range in a second stage based on sample data;
quantizing the data to be quantized into target data under the quantization type based on the weight quantization parameter and the activation quantization parameter.
A quantitative processing device comprising:
the type acquisition module is used for acquiring a quantization type corresponding to data to be quantized and acquiring a fixed point weight range and a fixed point activation range corresponding to the quantization type;
the weight determining module is used for determining a weight quantization parameter corresponding to the data to be quantized based on the fixed point weight range and a floating point weight range of a trained quantization model;
the activation determining module is used for determining an activation quantization parameter corresponding to the data to be quantized based on the fixed point activation range and the floating point activation range of the trained quantization model; the trained quantization model is obtained by training the floating point weight range in a first stage and the floating point activation range in a second stage based on sample data;
and the quantization module is used for quantizing the data to be quantized into target data under the quantization type based on the weight quantization parameter and the activation quantization parameter.
A computer device comprising a memory storing a computer program and a processor implementing the following steps when the computer program is executed:
the method comprises the steps of obtaining a quantization type corresponding to data to be quantized, and obtaining a fixed point weight range and a fixed point activation range corresponding to the quantization type;
determining a weight quantization parameter corresponding to the data to be quantized based on the fixed point weight range and a floating point weight range of a trained quantization model;
determining an activation quantization parameter corresponding to the data to be quantized based on the fixed point activation range and the floating point activation range of the trained quantization model; the trained quantization model is obtained by training the floating point weight range in a first stage and the floating point activation range in a second stage based on sample data;
quantizing the data to be quantized into target data under the quantization type based on the weight quantization parameter and the activation quantization parameter.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:
the method comprises the steps of obtaining a quantization type corresponding to data to be quantized, and obtaining a fixed point weight range and a fixed point activation range corresponding to the quantization type;
determining a weight quantization parameter corresponding to the data to be quantized based on the fixed point weight range and a floating point weight range of a trained quantization model;
determining an activation quantization parameter corresponding to the data to be quantized based on the fixed point activation range and the floating point activation range of the trained quantization model; the trained quantization model is obtained by training the floating point weight range in a first stage and the floating point activation range in a second stage based on sample data;
quantizing the data to be quantized into target data under the quantization type based on the weight quantization parameter and the activation quantization parameter.
According to the quantization processing method and apparatus, the electronic device, and the computer-readable storage medium, the quantization type corresponding to the data to be quantized is obtained, and the fixed point weight range and the fixed point activation range corresponding to the quantization type are obtained; the weight quantization parameter corresponding to the data to be quantized is determined based on the fixed point weight range and the floating point weight range of the trained quantization model; the activation quantization parameter corresponding to the data to be quantized is determined based on the fixed point activation range and the floating point activation range of the trained quantization model, where the trained quantization model is obtained by training the floating point weight range in a first stage and the floating point activation range in a second stage based on sample data. Based on the weight quantization parameter and the activation quantization parameter, the data to be quantized can be accurately quantized from floating point data into integer target data under the quantization type.
When the method is applied to image processing, the operation is carried out based on the integer data, so that the operation amount can be reduced, the operation speed can be increased, and the subsequent image processing speed can be increased.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the embodiments or the prior art descriptions will be briefly described below. It is obvious that the drawings in the following description are only some embodiments of the present application, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 is a diagram of an application environment of a method for training a quantization model in one embodiment;
FIG. 2 is a flow diagram of a method for training a quantization model in one embodiment;
FIG. 3 is a diagram illustrating a method for training a quantization model in one embodiment;
FIG. 4 is a schematic diagram of a pseudo-quantization node in one embodiment;
FIG. 5 is a flowchart of a method of training a quantization model in another embodiment;
FIG. 6 is a flow diagram of a quantization process in one embodiment;
FIG. 7 is a block diagram showing an example of a structure of a training apparatus for a quantization model;
FIG. 8 is a block diagram showing the structure of a quantization processing apparatus according to an embodiment;
FIG. 9 is a diagram illustrating an internal structure of an electronic device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
FIG. 1 is a diagram illustrating an application environment of a method for training a quantization model in an embodiment. As shown in FIG. 1, the application environment includes an electronic device 110 and a server 120. In an embodiment, the electronic device 110 and the server 120 may each separately perform the method for training the quantization model, and the electronic device 110 and the server 120 may also cooperatively perform the method for training the quantization model. When the electronic device 110 and the server 120 cooperatively execute the method for training the quantization model, the electronic device 110 may obtain sample data and send the sample data to the server 120. The server 120 performs the first-stage training on the quantization model through the sample data to obtain the floating point weight range of the quantization model. The server 120 performs the second-stage training on the quantization model trained in the first stage based on the floating point weight range and the sample data, and obtains the floating point activation range of the trained target quantization model in the second-stage training. The floating point weight range and the floating point activation range in the target quantization model are used for performing quantization processing on data to be quantized.
Wherein the electronic device 110 communicates with the server 120 over a network. The electronic device 110 may be, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices, and the server 120 may be implemented by a stand-alone server or a server cluster composed of a plurality of servers.
FIG. 2 is a flow diagram of a method for training a quantization model in one embodiment. The method for training the quantization model in this embodiment is described by taking the method being executed on the electronic device in FIG. 1 as an example. As shown in FIG. 2, the training method of the quantization model includes:
step 202, sample data is obtained.
The sample data refers to floating point data and is used for training a quantization model, and the sample data can be multimedia data. The multimedia data may specifically be image data, text data, audio data, video data, and the like, but is not limited thereto. Floating point data, i.e., floating point type data, e.g., 0.13, 5.789, etc.
Specifically, the electronic device may obtain sample data from a local or other device or network. For example, the electronic device may obtain multimedia data from a local or other device or network. The multimedia data may be at least one of image data, text data, audio data, and video data, and the at least one of the image data, the text data, the audio data, and the video data is used as sample data.
In one embodiment, the electronic device may obtain multimedia data from a local or other device or a network, and convert content information of the multimedia data into numerical information to obtain numerical information corresponding to the sample multimedia data.
For example, the electronic device may obtain a sample image from a local or other device or a network, may obtain the sample image by shooting, or may extract a video frame from a video as the sample image, and converts the image information of the sample image into data information to obtain the image data of the sample image. Image data is the set of gray-scale values of each pixel expressed as numerical values. The image data is used as sample data for training the quantization model, and the trained quantization model is used for performing quantization processing on the image data of an image to be processed.
Similarly, the electronic device may obtain a sample text, convert the sample text into corresponding data information, and obtain text data. The text data is used as sample data for training a quantization model, and the trained quantization model is used for performing quantization processing on the text data of the text to be processed.
Similarly, the electronic device may obtain the sample audio and the sample video, convert the sample audio and the sample video into corresponding data information, and obtain audio data and video data. The audio data and the video data are used as sample data for training a quantization model, and the trained quantization model is used for performing quantization processing on the audio data and the video data of the audio to be processed and the video to be processed.
Step 204, training the quantization model in the first stage through the sample data to obtain a floating point weight range of the quantization model.
Specifically, the floating point weight range is a weight range corresponding to floating point data. The electronic device inputs the sample data into the quantization model, the quantization model uses the sample data for the first-stage training, the weight range of the quantization model is adjusted during the first-stage training, and the floating point weight range of the quantization model is obtained when the first-stage training stops.
And step 206, based on the floating point weight range and the sample data, performing second-stage training on the quantization model trained in the first stage, and obtaining a floating point activation range of the trained target quantization model in the second-stage training.
The floating point weight range and the floating point activation range in the target quantization model are used for performing quantization processing on data to be quantized.
Specifically, the floating point activation range is an activation range corresponding to floating point data, and the floating point data can be mapped into a specific range. After the training in the first stage is stopped, the floating point weight range of the quantization model is obtained, and in the training in the second stage, the floating point weight range is fixed, namely the floating point weight range is kept unchanged in the training in the second stage. And performing second-stage training on the quantization model trained in the first stage based on the floating point weight range and the sample data, and adjusting the activation range of the quantization model in the second-stage training until the second-stage training is stopped to obtain the floating point activation range in the quantization model.
It can be understood that, by training the quantization model through the image data, the floating point weight range of the image and the floating point activation range of the image in the target quantization model can be obtained. The floating point weight range of the image and the floating point activation range of the image are used for carrying out quantization processing on the image data of the image to be processed so as to quantize the image data in a floating point mode into integer image data. The quantization of the text data, the audio data, and the video data is similar to the quantization of the image data, and is not described herein again.
In the training method of the quantization model in this embodiment, sample data is obtained, the quantization model is trained in the first stage through the sample data to obtain the floating point weight range of the quantization model, the quantization model trained in the first stage is trained in the second stage based on the floating point weight range and the sample data, and the floating point activation range of the trained target quantization model is obtained in the second-stage training, so that training in separate stages enables the quantization model to achieve higher processing precision. The floating point weight range and the floating point activation range of the target quantization model can be used to quantize data to be quantized, converting floating point data into integer data, which reduces the amount of computation and increases the quantization processing speed. Quantizing floating point data into integer data also reduces memory occupation.
In one embodiment, the training of the quantization model in the first stage through the sample data to obtain the floating point weight range of the quantization model includes:
for the training of the first stage, carrying out forward propagation processing on the sample data through the quantization model to obtain a first intermediate result; determining a first loss error of the quantization model through the forward propagation processing, and counting an initial weight range of the quantization model through the forward propagation processing; and in the process of carrying out back propagation processing on the first intermediate result, adjusting the initial weight range based on the first loss error until the quantization model meets a first stop condition, and obtaining the floating point weight range of the quantization model.
The forward propagation process refers to the process from the input to obtaining the loss error, and the back propagation process is the process in which parameters are continuously adjusted according to the loss error. The initial weight range is the initial weight range corresponding to floating point data.
Specifically, the electronic device inputs sample data into the quantization model, and the sample data is used for training the quantization model in the first stage. In the training of the first stage, the quantization model performs forward propagation processing on sample data, and a first intermediate result is obtained through convolution processing, quantization processing and inverse quantization processing on the sample data in the forward propagation processing. And in the forward propagation processing, the initial weight range of the quantization model is counted through convolution processing, quantization processing and inverse quantization processing of the sample data, and the loss error of the quantization model, namely the first loss error, is determined.
In the process of performing back propagation processing on the first intermediate result, the initial weight range is adjusted based on the first loss error. And after the initial weight range is adjusted, continuing training the quantitative model until the quantitative model meets a first stop condition, and obtaining the floating point weight range of the quantitative model.
In one embodiment, the first stop condition may be that the first loss error is smaller than an error threshold, or that the number of training iterations of the first stage reaches a preset number of iterations, or the like.
In one embodiment, the electronic device may determine a first loss error of the quantization model from the first intermediate result and the sample data.
In one embodiment, the initial activation range of the quantization model is also counted through the forward propagation processing.
In this embodiment, for the first-stage training, forward propagation processing is performed on the sample data through the quantization model to obtain a first intermediate result, a first loss error of the quantization model is determined through the forward propagation processing, and the initial weight range of the quantization model is counted through the forward propagation processing. In the process of performing back propagation processing on the first intermediate result, the initial weight range is adjusted based on the first loss error so that the weight range of the quantization model approaches the optimum, until the quantization model meets the first stop condition. The final floating point weight range of the quantization model can thus be accurately obtained through the first-stage training.
In one embodiment, the forward propagation processing is performed on the sample data through a quantization model to obtain a first intermediate result, and the method includes:
carrying out quantization processing on the features output by the current convolution operator in the quantization model through convolution processing to obtain quantization features; the output characteristic of the first convolution operator is obtained by performing convolution processing on sample data; and performing inverse quantization processing on the quantized features to obtain inverse quantized features, taking the inverse quantized features as input of a next convolution operator, taking a next convolution operator as a current convolution operator, returning to the step of performing quantization processing on the features output by the current convolution operator through convolution processing, and continuing to execute the steps until a first intermediate result is obtained after the last inverse quantization processing.
Specifically, in the first stage of training, sample data is input to a first convolution operator of the quantization model, and the first convolution operator performs convolution processing on the sample data to obtain output characteristics. And quantizing the output features to obtain quantized features, and performing inverse quantization on the quantized features to obtain inverse quantized features.
And taking the dequantization characteristics as the input of a second convolution operator, performing convolution processing on the input dequantization characteristics through the second convolution operator, performing quantization processing on the characteristics obtained through convolution processing, and performing dequantization processing on the quantization characteristics obtained through quantization processing to obtain dequantization characteristics.
It can be understood that, in the training of the first stage, the quantization processing is performed on the features output by the current convolution operator through the convolution processing to obtain quantization features, the inverse quantization processing is performed on the quantization features to obtain inverse quantization features, and the inverse quantization features are used as the input of the next convolution operator. And taking the next convolution operator as the current convolution operator, returning to the step of performing quantization processing on the features output by the current convolution operator through convolution processing, and continuing to execute the step until the last inverse quantization processing is performed to obtain a first intermediate result.
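As a rough, non-authoritative sketch of the chained forward propagation described above (the function names, the use of numpy, and the representation of the activation range are assumptions rather than the patent's implementation), the convolution, quantization, and inverse quantization chain might look as follows:

```python
import numpy as np

def quantize_dequantize(feature, a_min, a_max, n_levels=256):
    # Map the feature onto the fixed point grid defined by the activation
    # range [a_min, a_max], then map it back to floating point, so the
    # returned feature carries the simulated quantization error.
    scale = (a_max - a_min) / (n_levels - 1)
    zero = np.round(-a_min / scale)
    quantized = np.clip(np.round(feature / scale) + zero, 0, n_levels - 1)
    return (quantized - zero) * scale

def forward_propagation(sample, conv_kernels, activation_ranges):
    # Each convolution operator's output feature is quantized and then
    # inverse-quantized before being fed to the next convolution operator;
    # the feature after the last inverse quantization is the first
    # intermediate result.
    feature = sample
    for kernel, (a_min, a_max) in zip(conv_kernels, activation_ranges):
        feature = feature @ kernel                      # convolution as matrix multiplication
        feature = quantize_dequantize(feature, a_min, a_max)
    return feature                                      # first intermediate result
```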
In one embodiment, the convolution kernel size of the convolution operator is consistent with the width and height of the input feature, and the convolution is a matrix multiplication of the input feature and the convolution kernel. The convolution operator is processed according to the following formulas:
output_float = Matmul(input_float, kernel_float)
output_float = output_scale * (output_quant - output_zero)
input_float = input_scale * (input_quant - input_zero)
kernel_float = kernel_scale * (kernel_quant - kernel_zero)
where output_float represents the output floating point data, input_float represents the input floating point data, kernel_float represents the floating point data corresponding to the convolution kernel, and Matmul(input_float, kernel_float) represents the matrix multiplication of input_float and kernel_float.
output_scale represents the fixed point quantization scaling factor, output_quant represents the output fixed point data, and output_zero represents the fixed point quantization offset. input_scale represents the floating point quantization scaling factor, input_quant represents the input fixed point data, and input_zero represents the floating point quantization offset. kernel_scale represents the quantization scaling factor corresponding to the convolution kernel, kernel_quant represents the fixed point data corresponding to the convolution kernel, and kernel_zero represents the quantization offset corresponding to the convolution kernel.
From the above formulas, the quantized calculation formula of the convolution operator and the fixed point data output_quant it outputs can be obtained.
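Substituting the last three relations into the first gives output_quant = output_zero + (input_scale * kernel_scale / output_scale) * Matmul(input_quant - input_zero, kernel_quant - kernel_zero). The sketch below illustrates this integer-only evaluation (the function name, the numpy implementation, and the int32/uint8 dtypes are assumptions, not the patent's implementation):

```python
import numpy as np

def quantized_matmul(input_quant, input_zero, input_scale,
                     kernel_quant, kernel_zero, kernel_scale,
                     output_zero, output_scale, n_levels=256):
    # Matrix multiplication over the zero-point-corrected integer tensors.
    acc = (input_quant.astype(np.int32) - input_zero) @ \
          (kernel_quant.astype(np.int32) - kernel_zero)
    # Rescale the accumulator into the output quantization grid:
    # output_quant = output_zero + (input_scale * kernel_scale / output_scale) * acc
    multiplier = input_scale * kernel_scale / output_scale
    output_quant = np.round(acc * multiplier) + output_zero
    return np.clip(output_quant, 0, n_levels - 1).astype(np.uint8)
```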
In this embodiment, the feature output by the current convolution operator of the quantization model through convolution processing is quantized to obtain a quantization feature, and the quantization feature is inverse-quantized to obtain an inverse quantization feature, so that the error produced by quantizing and inverse-quantizing the data can be determined. The inverse quantization feature is used as the input of the next convolution operator, the next convolution operator is taken as the current convolution operator, and the step of quantizing the feature output by the current convolution operator through convolution processing is repeated until the first intermediate result is obtained after the last inverse quantization processing. In this way, the loss error produced during processing can be accurately obtained to adjust the weight range of the quantization model.
In one embodiment, the second stage of training is performed on the quantization model trained in the first stage based on the floating point weight range and the sample data, and the floating point activation range of the trained target quantization model is obtained in the second stage of training, which includes:
for the training of the second stage, carrying out forward propagation processing on the sample data through a floating point weight range in the quantization model after the training of the first stage to obtain a second intermediate result; and determining a second loss error of the quantization model based on forward propagation processing, and adjusting the initial activation range of the quantization model based on the second loss error in the process of performing backward propagation processing on the second intermediate result until the quantization model meets a second stop condition, so as to obtain the floating point activation range of the trained target quantization model.
Wherein, the initial activation range is obtained by forward propagation processing statistics in the training of the first stage.
Specifically, in the first-stage training, the initial activation range of the quantization model is counted through the forward propagation processing. The second-stage training is performed on the quantization model after the first-stage training through the sample data and the floating point weight range, so as to adjust the initial activation range in the second-stage training.
The quantization model trained in the first stage performs forward propagation processing on the sample data, convolution processing is performed on the sample data based on the floating point weight range in the forward propagation processing, and quantization processing and inverse quantization processing are performed on the data after the convolution processing based on the initial activation range to obtain a second intermediate result. And performing convolution processing on the sample data based on the floating point weight range, performing quantization processing and inverse quantization processing on the data after the convolution processing based on the initial activation range, and determining a loss error of the quantization model, namely a second loss error.
In the backward propagation processing, the second intermediate result is taken as an input, the processing in the reverse process to the forward propagation processing is performed, and in the backward propagation processing of the second intermediate result, the initial activation range is adjusted based on the second loss error. And continuing training the quantization model based on the adjusted initial activation range until the quantization model meets a second stop condition, so as to obtain the floating point activation range of the quantization model.
In one embodiment, the second stop condition may be that the second loss error is smaller than an error threshold, or that the number of training iterations of the second stage reaches a preset number of iterations, or the like.
In one embodiment, the electronic device may determine a second loss error of the quantization model from the second intermediate result and the sample data.
In this embodiment, for the training of the second stage, forward propagation processing is performed on the sample data through the floating point weight range in the quantization model after the training of the first stage to obtain a second intermediate result, a second loss error of the quantization model is determined based on the forward propagation processing, and in the process of performing backward propagation processing on the second intermediate result, the initial activation range of the quantization model is adjusted based on the second loss error until the quantization model meets a second stop condition, so that the target quantization model and the floating point activation range in the target quantization model can be accurately obtained through the training of the second stage.
Fig. 3 is a schematic diagram of a training method of a quantization model in an embodiment.
The quantization model used for training includes convolution operators (i.e., Dense operators) and pseudo-quantization nodes (i.e., FakeQuant). The quantization model is trained in two stages. In the first stage, the weight range of the quantization model is set as the trainable parameter, namely param_trainable = dense's weights. The first-stage training involves forward propagation processing and back propagation processing. In the forward propagation processing, sample data is input into the first convolution operator for convolution processing so as to count the initial weight range corresponding to the first convolution operator. The feature output by the first convolution operator is taken as the input of the first pseudo-quantization node, the input feature is quantized and inverse-quantized by the pseudo-quantization node to obtain the inverse quantization feature output by the first pseudo-quantization node, and the initial activation range corresponding to the first pseudo-quantization node is counted. The output of the first pseudo-quantization node is taken as the input of the next convolution operator, and the convolution processing, quantization processing and inverse quantization processing are performed in turn until the inverse quantization feature output by the last pseudo-quantization node, namely the first intermediate result, is obtained.
A first loss error of the quantization model is calculated through the first intermediate result and the sample data, and the gradient of the quantization model loss with respect to the weight range is calculated according to the first loss error.
The pseudo quantization node has a structure as shown in fig. 4, and includes a quantization process for an input feature, and an inverse quantization process for a quantization feature obtained after the quantization process, so as to obtain an inverse quantization feature.
The pseudo-quantization node can be regarded as custom operator implementation logic: the input data is quantized and then inverse-quantized so as to simulate the error introduced by quantization. The quantization process is as follows:
x_int = round(x / Δ) + z
x_Q = clamp(0, N_levels - 1, x_int)
where x is the input floating point data, x_Q is the quantized data, Δ is the quantization parameter scale, and z is the quantization zero-point.
N_levels is the quantization range; for example, for 8-bit quantization, N_levels = 2^8 = 256.
The inverse quantization process is as follows:
x_float = (x_Q - z)Δ
x_float is the floating point data obtained after inverse quantization, and the quantization error is diff = x - x_float.
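A minimal sketch of the pseudo-quantization node built directly from the three formulas above (the function name and the numpy implementation are assumptions; only Δ, z and N_levels come from the text):

```python
import numpy as np

def pseudo_quantization_node(x, delta, z, n_levels=256):
    # Quantize then inverse-quantize x, so the returned floating point data
    # carries the simulated quantization error diff = x - x_float.
    x_int = np.round(x / delta) + z              # x_int = round(x / delta) + z
    x_q = np.clip(x_int, 0, n_levels - 1)        # x_Q = clamp(0, N_levels - 1, x_int)
    x_float = (x_q - z) * delta                  # x_float = (x_Q - z) * delta
    return x_float
```

For 8-bit quantization, n_levels = 2^8 = 256, matching N_levels above.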
In the back propagation processing, the last pseudo quantization node in the forward propagation processing is used as the first quantization node in the back propagation processing, and the last convolution operator is used as the first convolution operator in the back propagation processing. And taking the first intermediate result as the input of the first pseudo-quantization node, and carrying out inverse quantization processing and quantization processing on the first intermediate result through the first pseudo-quantization node to obtain the characteristic of processing output of the first pseudo-quantization node. And adjusting the initial weight range of the first convolution operator based on the gradient of the loss of the quantization model, taking the output of the first pseudo-quantization node as the input of the first convolution operator, and performing convolution processing on the input characteristics through the adjusted weight range of the first convolution operator. And taking the output of the first convolution operator as the input of the next pseudo-quantization node, and sequentially executing the inverse quantization processing, the quantization processing and the convolution processing until the characteristics of the output of the last convolution operator are obtained, so that the adjustment of the initial weight range of each convolution operator is completed, and the updated quantization model is obtained.
In the back propagation processing, after the initial weight range of each convolution operator is adjusted, the loss error of the updated quantization model is calculated according to the input features of the back propagation processing and the features output by the last convolution operator, so as to judge whether the updated quantization model reaches a convergence state, namely, a first stop condition is met.
And when the convergence state is not reached, performing the training of the first stage on the updated quantization model again until the obtained quantization model reaches the convergence state, thereby obtaining the floating point weight range corresponding to each convolution operator in the quantization model.
The quantization model trained in the first stage is then trained in the second stage. In the second-stage training, the floating point weight range is set as an untrainable parameter and the activation range is set as the trainable parameter, namely param_trainable = activations' range. The second-stage training includes a forward propagation process and a backward propagation process. The sample data is input into the first convolution operator, and convolution processing is performed on the sample data through the floating point weight range corresponding to the first convolution operator. The feature output by the first convolution operator is taken as the input of the first pseudo-quantization node, and the input feature is quantized and inverse-quantized by the pseudo-quantization node to obtain the inverse quantization feature output by the first pseudo-quantization node. The output of the first pseudo-quantization node is taken as the input of the next convolution operator, and the convolution processing, quantization processing and inverse quantization processing are performed in turn until the inverse quantization feature output by the last pseudo-quantization node, namely the second intermediate result, is obtained.
A second loss error of the quantization model is calculated according to the second intermediate result and the sample data, and the gradient of the quantization model loss with respect to the activation range is calculated according to the second loss error.
In the back propagation process, the last pseudo quantization node in the forward propagation process is used as the first quantization node in the back propagation process, and the last convolution operator is used as the first convolution operator in the back propagation process. And adjusting the initial activation range of the first convolution operator based on the gradient of the quantization model loss, taking the second intermediate result as the input of the first pseudo-quantization node, and performing inverse quantization processing and quantization processing on the second intermediate result through the initial activation range of the first pseudo-quantization node to obtain the characteristic of the processing output of the first pseudo-quantization node. And taking the output of the first pseudo-quantization node as the input of a first convolution operator, and performing convolution processing on the input characteristics through the adjusted activation range of the first convolution operator. And taking the output of the first convolution operator as the input of the next pseudo-quantization node, and sequentially executing the inverse quantization processing, the quantization processing and the convolution processing until the characteristics of the output of the last convolution operator are obtained, so that the adjustment of the initial activation range of each convolution operator is completed, and the updated quantization model is obtained.
In the back propagation processing, after the initial activation range of each convolution operator is adjusted, the loss error of the updated quantization model is calculated according to the input features of the back propagation processing and the features output by the last convolution operator, so as to judge whether the updated quantization model reaches a convergence state, namely, a second stop condition is met.
And when the convergence state is not reached, performing the second stage of training on the updated quantization model again until the obtained quantization model reaches the convergence state, thereby obtaining the floating point activation range corresponding to each convolution operator in the quantization model.
It can be understood that the pseudo quantization nodes are used for training, and the trained target quantization model includes convolution operators and does not include the pseudo quantization nodes.
FIG. 5 is a flow diagram illustrating a method for training a quantization model in one embodiment.
Step 510, inserting a pseudo quantization node into the floating point model to obtain a quantization model.
Step 520, performing a first stage of training on the quantization model, wherein the first stage of training comprises steps 521-525:
step 521, acquiring sample data and inputting the sample data into the quantization model.
Step 522, forward propagation processing is performed based on the sample data, and the initial weight range and the initial activation range are counted.
Step 523, calculate the first loss error of the quantization model through the forward propagation process, and execute step 524.
And step 524, performing back propagation processing based on the first intermediate result and the first loss error obtained by the forward propagation processing, and adjusting the initial weight range in the back propagation processing to obtain an updated quantization model.
Step 525, determining whether the updated quantization model has reached a convergence state; if not, returning to step 521; if yes, the first-stage training is completed and step 530 is executed.
Step 530, after the first-stage training is completed, the floating point weight range in the quantization model trained in the first stage is set as an untrainable parameter, and the initial activation range is set as a trainable parameter. In addition, the activation range and the weight range are no longer counted in the forward propagation of the second stage.
Step 540, performing the second-stage training on the quantization model subjected to the first-stage training, wherein the second-stage training comprises steps 541-545:
step 541, sample data is obtained and input into the quantization model trained in the first stage.
Step 542, performing forward propagation processing based on the sample data to obtain a second intermediate result.
In step 543, a second loss error of the quantization model is calculated through a forward propagation process, and step 544 is performed.
Step 544, performing back propagation processing based on the second intermediate result and the second loss error obtained by the forward propagation processing, and adjusting the initial activation range in the back propagation processing to obtain an updated quantization model.
Step 545, judging whether the updated quantization model has reached a convergence state; if not, returning to step 541; if so, the second-stage training ends and the trained target quantization model is obtained.
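The two-stage flow of steps 510-545 can be condensed into the following sketch (a hedged, PyTorch-style illustration: the parameter names weight_range and activation_range, the optimizer, and the fixed epoch counts standing in for the convergence checks of steps 525 and 545 are all assumptions, not the patent's implementation):

```python
import torch

def train_two_stages(quant_model, data_loader, loss_fn, epochs=(10, 10)):
    # Stage one trains the weight ranges; stage two freezes them and trains
    # the activation ranges, yielding the trained target quantization model.
    weight_ranges = [p for n, p in quant_model.named_parameters() if "weight_range" in n]
    activation_ranges = [p for n, p in quant_model.named_parameters() if "activation_range" in n]

    def run_stage(trainable, frozen, num_epochs):
        for p in trainable:
            p.requires_grad_(True)     # ranges adjusted in this stage
        for p in frozen:
            p.requires_grad_(False)    # ranges kept fixed in this stage
        optimizer = torch.optim.SGD(trainable, lr=1e-3)
        for _ in range(num_epochs):    # stands in for the convergence check
            for sample in data_loader:
                intermediate = quant_model(sample)     # forward propagation
                loss = loss_fn(intermediate, sample)   # loss error from the intermediate result and the sample data
                optimizer.zero_grad()
                loss.backward()                        # back propagation
                optimizer.step()                       # adjust the trainable ranges

    run_stage(weight_ranges, activation_ranges, epochs[0])   # first-stage training (steps 521-525)
    run_stage(activation_ranges, weight_ranges, epochs[1])   # second-stage training (steps 541-545)
    return quant_model
```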
In one embodiment, the method further comprises:
acquiring a preset weight range and a preset activation range corresponding to a preset quantization type; determining a quantization parameter of the target quantization model according to the preset weight range, the preset activation range, the floating point weight range and the floating point activation range; the quantization parameter is used for quantizing the data to be quantized into the data corresponding to the preset quantization type.
The preset quantization type refers to the fixed point type into which the floating point data is to be quantized, such as 8 bit, 12 bit, and the like. Different preset quantization types have different preset weight ranges and preset activation ranges; the preset weight range is the weight range of the fixed point data, and the preset activation range is the activation range of the fixed point data.
Specifically, the electronic device obtains a preset weight range and a preset activation range corresponding to a preset quantization type, and calculates a quantization parameter of the target quantization model according to the preset weight range, the preset activation range, the floating point weight range and the floating point activation range. When the data to be quantized needs to be quantized, the data to be quantized is input into the target quantization model, and the target quantization model quantizes the data to be quantized through the quantization parameters to obtain the data corresponding to the preset quantization type.
In one embodiment, the data to be quantized is floating point data, the preset quantization type is a fixed point quantization type, the floating point data is quantized through a quantization parameter in the target quantization model, the floating point data can be quantized into the fixed point data, and the fixed point data is in the specific quantization type.
In one embodiment, the electronic device may obtain a preset weight range and a preset activation range corresponding to a plurality of preset quantization types, respectively, and determine a quantization parameter corresponding to the same preset quantization type according to the preset weight range and the preset activation range corresponding to the same preset quantization type, and the floating point weight range and the floating point activation range, so as to retain the quantization parameter in the target quantization model. And obtaining the quantization parameters corresponding to each preset quantization type respectively according to the same processing mode. When the data to be quantized needs to be quantized, determining which preset quantization type the data to be quantized needs to be quantized, and performing quantization processing on the data to be quantized through the corresponding quantization parameter in the target quantization model to obtain fixed point data under the corresponding preset quantization type.
In the embodiment, a preset weight range and a preset activation range corresponding to a preset quantization type are obtained; and determining quantization parameters of the target quantization model according to the preset weight range, the preset activation range, the floating point weight range and the floating point activation range, and accurately quantizing the data to be quantized into data corresponding to the preset quantization type through the quantization parameters, so that the floating point data can be accurately mapped into fixed point data of a specific type.
FIG. 6 is a flow diagram of a quantization process in one embodiment. The quantization processing method in this embodiment is described by taking the electronic device in fig. 1 as an example. As shown in fig. 6, the quantization processing method includes:
step 602, obtaining a quantization type corresponding to data to be quantized, and obtaining a fixed point weight range and a fixed point activation range corresponding to the quantization type.
The data to be quantized refers to floating point data which needs to be quantized. The quantization type refers to the fixed point type into which the floating point data is to be quantized, for example, 8 bit, 12 bit, and the like. Different quantization types have different fixed point weight ranges and fixed point activation ranges.
Specifically, the electronic device may obtain data to be quantized, determine a quantization type of the data to be quantized, and thereby obtain a fixed point weight range and a fixed point activation range corresponding to the quantization type.
And step 604, determining a weight quantization parameter corresponding to the data to be quantized based on the fixed point weight range and the floating point weight range of the trained quantization model.
Specifically, the electronic device may determine the weight quantization parameter corresponding to the data to be quantized according to the fixed-point weight range and the floating-point weight range of the trained quantization model.
In one embodiment, the electronic device can determine a fixed point weight maximum and a fixed point weight minimum in a fixed point weight range and a floating point weight maximum and a floating point weight minimum in a floating point weight range. And calculating a weight quantization parameter corresponding to the data to be quantized according to the fixed point weight maximum value, the fixed point weight minimum value, the floating point weight maximum value and the floating point weight minimum value.
Step 606, determining an activation quantization parameter corresponding to the data to be quantized based on the fixed point activation range and the floating point activation range of the trained quantization model; the trained quantization model is obtained by training the floating point weight range in a first stage and the floating point activation range in a second stage based on sample data.
Specifically, the electronic device may determine an activated quantization parameter corresponding to the data to be quantized according to the fixed-point activation range and the floating-point activation range of the trained quantization model.
In one embodiment, the electronic device may determine a fixed point activation maximum and a fixed point activation minimum in a fixed point activation range, and a floating point activation maximum and a floating point activation minimum in a floating point activation range. And calculating an activation quantization parameter corresponding to the data to be quantized according to the fixed point activation maximum value, the fixed point activation minimum value, the floating point activation maximum value and the floating point activation minimum value.
And step 608, quantizing the data to be quantized into target data under the quantization type based on the weight quantization parameter and the activation quantization parameter.
The target data is integer data under a quantization type.
Specifically, the electronic device performs quantization processing on the data to be quantized according to the weight quantization parameter and the activation quantization parameter to obtain target data in a quantization type.
In one embodiment, the electronic device may obtain a quantization type corresponding to data to be quantized, and obtain a fixed point weight range and a fixed point activation range corresponding to the quantization type. And inputting the data to be quantized, the fixed point weight range and the fixed point activation range into the trained quantization model. The trained quantization model comprises a floating point weight range and a floating point activation range, and the trained quantization model determines a weight quantization parameter corresponding to the data to be quantized according to the fixed point weight range and the floating point weight range of the trained quantization model. And determining an activated quantization parameter corresponding to the data to be quantized according to the fixed point activation range and the floating point activation range of the trained quantization model. And quantizing the data to be quantized through the weight quantization parameter and the activation quantization parameter, and outputting the target data under the quantization type.
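A hedged sketch of this quantization flow (function and variable names are assumptions; because this section does not spell out exactly how the weight and the activation quantization parameters are each applied to the data, the sketch simply derives both parameter pairs and applies the activation parameters to the data to be quantized):

```python
import numpy as np

def quant_params(float_min, float_max, fixed_min, fixed_max):
    # Quantization factor and offset derived from a floating point range and
    # the fixed point range of the chosen quantization type.
    scale = (float_max - float_min) / (fixed_max - fixed_min)
    zero = fixed_min - float_min / scale
    return scale, zero

def quantize_to_type(data, bit_width, weight_range_float, activation_range_float):
    # Quantize floating point data into integer target data under the given
    # quantization type (for example 8 bit or 12 bit).
    fixed_min, fixed_max = 0, 2 ** bit_width - 1
    w_scale, w_zero = quant_params(*weight_range_float, fixed_min, fixed_max)
    a_scale, a_zero = quant_params(*activation_range_float, fixed_min, fixed_max)
    target = np.clip(np.round(data / a_scale) + a_zero, fixed_min, fixed_max)
    return target.astype(np.int64), (w_scale, w_zero), (a_scale, a_zero)
```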
In this embodiment, the quantization type corresponding to the data to be quantized is obtained, and the fixed point weight range and the fixed point activation range corresponding to the quantization type are obtained; the weight quantization parameter corresponding to the data to be quantized is determined based on the fixed point weight range and the floating point weight range of the trained quantization model; the activation quantization parameter corresponding to the data to be quantized is determined based on the fixed point activation range and the floating point activation range of the trained quantization model, where the trained quantization model is obtained by training the floating point weight range in a first stage and the floating point activation range in a second stage based on sample data. Based on the weight quantization parameter and the activation quantization parameter, the data to be quantized can be accurately quantized from floating point data into integer target data under the quantization type. The computation is then carried out on the integer data, so that the amount of computation can be reduced, the computation speed can be increased, and the subsequent image processing speed can be increased.
In one embodiment, determining a weight quantization parameter corresponding to data to be quantized based on a floating point weight range and a fixed point weight range includes:
determining a weight quantization factor corresponding to data to be quantized based on a floating point weight maximum value and a floating point weight minimum value in a floating point weight range and a fixed point weight maximum value and a fixed point weight minimum value in a fixed point weight range; and determining the weight quantization offset corresponding to the data to be quantized according to the weight quantization factor, the floating point weight minimum and the fixed point weight minimum.
The weight quantization factor is the weight scaling factor used when quantizing floating point data into fixed point data. The weight quantization offset, also called the weight quantization zero point (zero-point), is the fixed point value to which the zero of the floating point data is mapped when the floating point data is quantized into fixed point data.
Specifically, the weight quantization parameter includes a weight quantization factor and a weight quantization offset. The electronic device can determine a fixed point weight maximum and a fixed point weight minimum in a fixed point weight range and a floating point weight maximum and a floating point weight minimum in a floating point weight range. And calculating a first difference value between the maximum floating point weight value and the minimum floating point weight value and a second difference value between the maximum fixed point weight value and the minimum fixed point weight value, and determining a weight quantization factor corresponding to the data to be quantized according to the first difference value and the second difference value. Further, the ratio of the first difference value and the second difference value is used as a weight quantization factor corresponding to the data to be quantized.
And calculating the ratio between the floating point weight minimum and the weight quantization factor, and determining the weight quantization offset corresponding to the data to be quantized according to the fixed point weight minimum and the ratio. Further, the difference between the minimum value of the fixed point weight and the ratio is used as the weight quantization offset corresponding to the data to be quantized.
For example, the weight quantization factor and the weight quantization offset may be calculated according to the following formulas:
scale = (r_max - r_min) / (q_max - q_min)
zero = q_min - r_min / scale
wherein scale is the weight quantization factor, r_max is the floating point weight maximum, r_min is the floating point weight minimum, q_max is the fixed point weight maximum, q_min is the fixed point weight minimum, and zero is the weight quantization offset.
In this embodiment, based on the floating point weight maximum and the floating point weight minimum in the floating point weight range and the fixed point weight maximum and the fixed point weight minimum in the fixed point weight range, the weight quantization factor corresponding to the data to be quantized can be accurately determined. And accurately determining the weight quantization offset corresponding to the data to be quantized according to the weight quantization factor, the floating point weight minimum and the fixed point weight minimum.
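As an illustration of the formulas above, the following Python sketch derives the weight quantization factor and the weight quantization offset from a floating point weight range and a fixed point weight range. The function name and the rounding of the offset onto the integer grid are assumptions made for the example and are not prescribed by this embodiment; the same computation applies analogously to the activation factor and offset described in the following embodiment.

```python
# Illustrative sketch (assumed helper, not prescribed by this embodiment):
# derive the quantization factor (scale) and quantization offset (zero point)
# from a floating point range [r_min, r_max] and a fixed point range [q_min, q_max].
def compute_quant_params(r_min, r_max, q_min, q_max):
    scale = (r_max - r_min) / (q_max - q_min)   # first difference / second difference
    zero = round(q_min - r_min / scale)         # offset, rounded onto the integer grid
    return scale, zero

# Example: a floating point weight range of [-0.5, 1.5] mapped onto the int8 range [-128, 127]
scale, zero = compute_quant_params(-0.5, 1.5, -128, 127)
print(scale, zero)  # scale ≈ 0.00784, zero = -64
```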
In one embodiment, the active quantization parameter includes an active quantization factor and an active quantization offset; determining an activation quantization parameter corresponding to data to be quantized based on the floating point activation range and the fixed point activation range, including: determining an activation quantization factor corresponding to data to be quantized based on a floating point activation maximum value and a floating point activation minimum value in a floating point activation range and a fixed point activation maximum value and a fixed point activation minimum value in a fixed point activation range; and determining the activation quantization offset corresponding to the data to be quantized according to the activation quantization factor, the floating point activation minimum value and the fixed point activation minimum value.
The activation quantization factor is the activation scaling factor used when quantizing floating point data into fixed point data. The activation quantization offset, also called the activation zero point, is the fixed point value to which the zero of the floating point data is mapped when the floating point data is quantized into fixed point data.
Specifically, the active quantization parameter includes an active quantization factor and an active quantization offset. The electronic device may determine a fixed point activation maximum and a fixed point activation minimum in a fixed point activation range, and a floating point activation maximum and a floating point activation minimum in a floating point activation range. And calculating a third difference value between the floating point activation maximum value and the floating point activation minimum value and a fourth difference value between the fixed point activation maximum value and the fixed point activation minimum value, and determining an activation quantization factor corresponding to the data to be quantized according to the third difference value and the fourth difference value. And further, taking the ratio of the third difference value to the fourth difference value as an activated quantization factor corresponding to the data to be quantized.
And calculating the ratio between the floating point activation minimum value and the activation quantization factor, and determining the activation quantization offset corresponding to the data to be quantized according to the fixed point activation minimum value and the ratio. Further, the difference between the fixed point activation minimum and the ratio is used as the activation quantization offset corresponding to the data to be quantized.
In one embodiment, quantizing the data to be quantized into target data under a quantization type based on the weight quantization parameter and the activation quantization parameter includes:
and performing convolution processing on the data to be quantized based on the weight quantization parameter, and performing activation processing on the result after the convolution processing by activating the quantization parameter to obtain target data under the quantization type.
Specifically, the electronic device performs convolution processing on the data to be quantized based on the weight quantization parameter to obtain a convolution processing result. And activating the result after the convolution processing based on the activated quantization parameter to obtain target data under the quantization type.
In this embodiment, the influence of the weight quantization parameter and the activation quantization parameter on data quantization is considered, convolution processing is performed on the data to be quantized based on the weight quantization parameter, and activation processing is performed on a result after the convolution processing by activating the quantization parameter, so that the obtained target data under the quantization type is more accurate.
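To make the mapping from floating point data to integer target data concrete, here is a small sketch of the element-wise quantization and inverse quantization steps using a quantization factor and offset obtained as above. Standard uniform affine quantization is assumed as the concrete form; the convolution and activation stages of this embodiment would apply their respective weight and activation parameters in this element-wise manner. The int8 output type and the example values are illustrative assumptions.

```python
import numpy as np

# Element-wise quantization/dequantization with a given scale and zero point
# (uniform affine quantization, used here as an illustrative assumption).
def quantize(x, scale, zero, q_min=-128, q_max=127):
    q = np.round(x / scale) + zero                   # map floats onto the integer grid
    return np.clip(q, q_min, q_max).astype(np.int8)  # clamp to the fixed point range (int8 in this example)

def dequantize(q, scale, zero):
    return (q.astype(np.float32) - zero) * scale     # approximate recovery of the floating point values

x = np.array([-0.5, 0.0, 0.73, 1.5], dtype=np.float32)
q = quantize(x, scale=2 / 255, zero=-64)
print(q)                          # [-128  -64   29  127]
print(dequantize(q, 2 / 255, -64))
```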
In one embodiment, the data to be quantized is multimedia data to be quantized; the trained quantization model is a quantization model obtained by training a floating point weight range of multimedia in a first stage based on sample multimedia data and training a floating point activation range of the multimedia in a second stage; the floating point weight range of the multimedia and the floating point activation range of the multimedia in the trained quantization model are used for performing quantization processing on the multimedia data to be quantized, and the target data is the quantized target multimedia data.
Specifically, the electronic device obtains sample multimedia, converts multimedia information of the sample multimedia into data information, and obtains sample multimedia data of the sample multimedia. And training the quantization model in the first stage through the sample multimedia data to obtain the floating point weight range of the multimedia in the quantization model. And training the quantization model trained in the first stage in the second stage based on the floating point weight range of the multimedia and the sample multimedia data, and acquiring the floating point activation range of the multimedia in the trained target quantization model in the training in the second stage.
The electronic equipment acquires the multimedia to be quantized and the corresponding quantization type, inputs the multimedia to be quantized and the corresponding quantization type into a target quantization model, and extracts data information of the multimedia to be quantized by the target quantization model to obtain the multimedia data to be quantized. And the target quantization model acquires a fixed point weight range and a fixed point activation range corresponding to the quantization type, and determines a weight quantization parameter corresponding to the multimedia data to be quantized based on the fixed point weight range and the floating point weight range of the multimedia. And determining an activated quantization parameter corresponding to the multimedia data to be quantized based on the fixed point activation range and the floating point activation range of the multimedia quantization model. And quantizing the multimedia data to be quantized into the target multimedia data under the quantization type based on the weight quantization parameter and the activation quantization parameter.
In one embodiment, multimedia recognition, multimedia classification, multimedia segmentation, and the like processes may be performed based on the target multimedia data, but are not limited thereto.
It can be understood that the procedure for determining the weight quantization parameter corresponding to the multimedia data to be quantized based on the fixed point weight range and the floating point weight range of the multimedia, and the procedure for determining the activation quantization parameter corresponding to the multimedia data to be quantized based on the fixed point activation range and the floating point activation range of the multimedia quantization model, follow the procedures described above for obtaining the quantization type corresponding to the data to be quantized, obtaining the fixed point weight range and the fixed point activation range corresponding to the quantization type, and determining the weight quantization parameter corresponding to the data to be quantized based on the fixed point weight range and the floating point weight range of the trained quantization model.
It is understood that, when the multimedia data to be quantized is image data to be quantized, the floating point weight range of the multimedia refers to the floating point weight range of the image, the floating point activation range of the multimedia refers to the floating point activation range of the image, and the target multimedia data refers to the target image data. When the multimedia data to be quantized is text data to be quantized, the floating point weight range of the multimedia refers to the floating point weight range of the text, the floating point activation range of the multimedia refers to the floating point activation range of the text, and the target multimedia data refers to the target text data.
In this embodiment, the quantization processing method is applied to quantization processing of multimedia data, two-stage training is performed on a quantization model through sample multimedia data, a floating point weight range of multimedia in the quantization model is determined through the first-stage training, and a floating point activation range of the multimedia in the quantization model is determined based on the sample multimedia data and the weight range of the multimedia in the second-stage training of the quantization model after the first-stage training, so that a trained target quantization model for performing multimedia data quantization is obtained, thereby improving quantization precision of the target quantization model and enabling quantization of the multimedia data to be more accurate.
The quantization processing is carried out on the multimedia data to be quantized through the target quantization model, and the data such as floating-point images, texts, audios and videos can be accurately quantized into integer data. The operation is carried out based on the integer data, so that the operation amount can be reduced, the operation speed can be increased, and the related processing speed of images, texts, audios and videos can be increased.
In one embodiment, a training method of a quantification model is provided, comprising:
the electronic equipment acquires sample data, and carries out quantization processing on the features output by the current convolution operator in the quantization model through convolution processing to obtain quantization features; the output characteristic of the first convolution operator is obtained by performing convolution processing on sample data.
And then, the electronic equipment performs inverse quantization processing on the quantized features to obtain inverse quantized features, the inverse quantized features are used as input of a next convolution operator, a next convolution operator is used as a current convolution operator, the step of performing quantization processing on the features output by the current convolution operator through convolution processing is returned, and the step is continuously executed until a first intermediate result is obtained after the last inverse quantization processing.
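A minimal sketch of this forward pass is given below, assuming PyTorch-style convolution operators: each operator's output is quantized and then inverse quantized ("fake quantization"), so the first intermediate result already carries the rounding error that quantization would introduce. The function and variable names are illustrative, and a single shared scale and zero point are used for brevity; in practice each operator may track its own range.

```python
import torch

def fake_quantize(x, scale, zero, q_min=-128.0, q_max=127.0):
    q = torch.clamp(torch.round(x / scale) + zero, q_min, q_max)  # quantization of the feature
    return (q - zero) * scale                                     # inverse quantization of the feature

def forward_with_fake_quant(sample, conv_ops, scale, zero):
    # The first convolution operator consumes the sample data; the inverse
    # quantized feature of each operator feeds the next operator, and the
    # output after the last inverse quantization is the first intermediate result.
    x = sample
    for conv in conv_ops:
        x = fake_quantize(conv(x), scale, zero)
    return x
```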
Next, the electronic device determines a first loss error of the quantization model through the forward propagation process, and collects statistics of the initial weight range and the initial activation range of the quantization model during the forward propagation process.
Further, in the process of performing back propagation processing on the first intermediate result, the electronic device adjusts the initial weight range based on the first loss error until the quantization model meets the first stop condition, so as to obtain a floating point weight range of the quantization model.
Then, for the training of the second stage, the electronic device performs forward propagation processing on the sample data through the floating point weight range in the quantization model after the training of the first stage to obtain a second intermediate result.
Further, the electronic device determines a second loss error of the quantization model based on forward propagation processing, and adjusts an initial activation range of the quantization model based on the second loss error in a process of performing backward propagation processing on a second intermediate result until the quantization model meets a second stop condition, so as to obtain a floating point activation range of the trained target quantization model.
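Putting the two stages together, the following pseudo-structured Python sketch outlines the training flow. The model methods (forward, observe_activation_range, adjust_weight_range, adjust_activation_range, freeze_weight_range) and stop-condition callables are placeholders assumed for illustration; they are not names defined by this embodiment.

```python
# Assumed pseudo-structure of the two-stage training flow (illustrative only).
def train_quantization_model(model, sample_data, loss_fn, first_stop, second_stop):
    # Stage 1: adjust the weight range; collect statistics for the initial activation range.
    while not first_stop(model):
        output = model.forward(sample_data)            # forward propagation with fake quantization
        first_loss = loss_fn(output)                   # first loss error
        model.observe_activation_range(output)         # statistics of the initial activation range
        model.adjust_weight_range(first_loss)          # back propagation adjusts the weight range
    model.freeze_weight_range()                        # floating point weight range is now fixed

    # Stage 2: adjust only the activation range, with the weight range fixed.
    while not second_stop(model):
        output = model.forward(sample_data)            # second intermediate result
        second_loss = loss_fn(output)                  # second loss error
        model.adjust_activation_range(second_loss)     # back propagation adjusts the activation range
    return model                                       # trained target quantization model
```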
The method for quantizing the data to be quantized through the target quantization model comprises the following steps:
and acquiring a quantization type corresponding to the data to be quantized, and inputting the data to be quantized and the corresponding quantization type into a target quantization model.
The target quantization model obtains a fixed point weight range and a fixed point activation range corresponding to the quantization type, and determines a weight quantization factor corresponding to the data to be quantized based on a floating point weight maximum value and a floating point weight minimum value in the floating point weight range and a fixed point weight maximum value and a fixed point weight minimum value in the fixed point weight range.
And then, determining a weight quantization offset corresponding to the data to be quantized according to the weight quantization factor, the floating point weight minimum and the fixed point weight minimum.
Further, an activation quantization factor corresponding to the data to be quantized is determined based on the floating point activation maximum value and the floating point activation minimum value in the floating point activation range and the fixed point activation maximum value and the fixed point activation minimum value in the fixed point activation range.
And then, determining an activated quantization offset corresponding to the data to be quantized according to the activated quantization factor, the floating point activated minimum value and the fixed point activated minimum value.
And further, performing convolution processing on the data to be quantized based on the weight quantization parameter, activating the result after the convolution processing by activating the quantization parameter, and outputting the target data under the quantization type.
In this embodiment, the quantization model is trained in the first stage on the sample data to obtain the floating point weight range of the quantization model. With the weight range fixed after the first stage, the activation range parameters are then trained: the quantization model trained in the first stage is trained in the second stage based on the floating point weight range and the sample data, and the floating point activation range of the trained target quantization model is obtained in this second stage. Training in separate stages makes the quantization processing of the quantization model more precise and further improves its performance.
The floating point weight range and the floating point activation range of the target quantization model can quantize data to be quantized so as to quantize the floating point data into integer data, reduce the operand of the data and improve the quantization processing speed. Floating point data is quantized into integer data, and occupation of a memory can be reduced.
It should be understood that although the steps in the flowcharts of fig. 2 to 6 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise, the steps are not strictly limited in order and may be performed in other orders. Moreover, at least some of the steps in fig. 2 to 6 may include multiple sub-steps or stages that are not necessarily completed at the same time and may be performed at different times; these sub-steps or stages are not necessarily performed sequentially, and may be performed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
Fig. 7 is a block diagram illustrating a structure of a training apparatus for a quantization model according to an embodiment. As shown in fig. 7, the apparatus includes:
a sample obtaining module 702, configured to obtain sample data.
The first training module 704 is configured to perform a first-stage training on the quantization model through sample data to obtain a floating point weight range of the quantization model.
The second training module 706 is configured to perform second-stage training on the quantization model trained in the first stage based on the floating point weight range and the sample data, and obtain a floating point activation range of the trained target quantization model in the second-stage training; the floating point weight range and the floating point activation range in the target quantization model are used for performing quantization processing on data to be quantized.
In this embodiment, sample data is acquired, the quantization model is trained in the first stage through the sample data to obtain a floating point weight range of the quantization model, the quantization model trained in the first stage is trained in the second stage based on the floating point weight range and the sample data, and a floating point activation range of the trained target quantization model is obtained in the training in the second stage, so that the processing precision of the quantization model can be higher through training in different stages. The floating point weight range and the floating point activation range of the target quantization model can quantize data to be quantized to convert the floating point data into integer data, so that the operation amount of the data can be reduced, and the quantization processing speed is increased. Floating point data is quantized into integer data, and occupation of a memory can be reduced.
In an embodiment, the first training module 704 is further configured to, for the training in the first stage, perform forward propagation processing on the sample data through the quantization model to obtain a first intermediate result; determine a first loss error of the quantization model through the forward propagation processing, and count an initial weight range of the quantization model through the forward propagation processing; and, in the process of carrying out back propagation processing on the first intermediate result, adjust the initial weight range based on the first loss error until the quantization model meets a first stop condition, so as to obtain the floating point weight range of the quantization model.
In this embodiment, for the training of the first stage, forward propagation processing is performed on the sample data through the quantization model to obtain a first intermediate result, a first loss error of the quantization model is determined through the forward propagation processing, and the initial weight range of the quantization model is counted through the forward propagation processing. In the process of performing back propagation processing on the first intermediate result, the initial weight range is adjusted based on the first loss error so that the weight range of the quantization model approaches an optimum, and the adjustment stops once the quantization model meets the first stop condition, so that the final floating point weight range of the quantization model is accurately obtained through the first-stage training.
In an embodiment, the first training module 704 is further configured to perform quantization processing on a feature output by a current convolution operator in the quantization model through convolution processing, so as to obtain a quantization feature; the output characteristic of the first convolution operator is obtained by performing convolution processing on sample data; and performing inverse quantization processing on the quantized features to obtain inverse quantized features, taking the inverse quantized features as input of a next convolution operator, taking a next convolution operator as a current convolution operator, returning to the step of performing quantization processing on the features output by the current convolution operator through convolution processing, and continuing to perform the step until a first intermediate result is obtained after the last inverse quantization processing.
In this embodiment, the feature output by the current convolution operator in the quantization model through convolution processing is quantized to obtain a quantization feature, and the quantization feature is inverse quantized to obtain an inverse quantization feature, so that the error introduced by quantizing and inverse quantizing the data can be determined. The inverse quantization feature is used as the input of the next convolution operator, the next convolution operator is taken as the current convolution operator, and the quantization step is repeated until a first intermediate result is obtained after the last inverse quantization processing, so that the loss error generated in this process can be accurately obtained and used to adjust the weight range of the quantization model.
In an embodiment, the second training module 706 is further configured to, for the training at the second stage, perform forward propagation processing on the sample data through a floating point weight range in the quantization model after the training at the first stage to obtain a second intermediate result; determining a second loss error of the quantization model based on forward propagation processing, and adjusting the initial activation range of the quantization model based on the second loss error in the process of performing backward propagation processing on a second intermediate result until the quantization model meets a second stop condition, so as to obtain a floating point activation range of the trained target quantization model; wherein, the initial activation range is obtained by forward propagation processing statistics in the training of the first stage.
In this embodiment, for the training of the second stage, forward propagation processing is performed on the sample data through the floating point weight range in the quantization model after the training of the first stage to obtain a second intermediate result, a second loss error of the quantization model is determined based on the forward propagation processing, and in the process of performing backward propagation processing on the second intermediate result, the initial activation range of the quantization model is adjusted based on the second loss error until the quantization model meets a second stop condition, so that the target quantization model and the floating point activation range in the target quantization model can be accurately obtained through the training of the second stage.
In one embodiment, the apparatus further comprises: a quantization parameter determination module; the quantization parameter determining module is used for acquiring a preset weight range and a preset activation range corresponding to a preset quantization type; determining a quantization parameter of the target quantization model according to the preset weight range, the preset activation range, the floating point weight range and the floating point activation range; the quantization parameter is used for quantizing the data to be quantized into the data corresponding to the preset quantization type.
In the embodiment, a preset weight range and a preset activation range corresponding to a preset quantization type are obtained; and determining quantization parameters of the target quantization model according to the preset weight range, the preset activation range, the floating point weight range and the floating point activation range, and accurately quantizing the data to be quantized into the data corresponding to the preset quantization type through the quantization parameters, so that the floating point data can be accurately mapped into the fixed point data of a specific type.
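For instance, the sketch below shows how a preset quantization type could be mapped to a preset weight/activation range and combined with the learned floating point ranges to produce the quantization parameters. The type names and numeric ranges are common conventions assumed for the example, not values specified by this embodiment.

```python
# Assumed mapping from preset quantization types to preset fixed point ranges.
PRESET_RANGES = {
    "int8": (-128, 127),
    "uint8": (0, 255),
    "int16": (-32768, 32767),
}

def quant_params_for_type(quant_type, float_min, float_max):
    q_min, q_max = PRESET_RANGES[quant_type]
    scale = (float_max - float_min) / (q_max - q_min)  # quantization factor
    zero = round(q_min - float_min / scale)            # quantization offset
    return scale, zero

# Weight parameters from the learned floating point weight range,
# activation parameters from the learned floating point activation range.
w_scale, w_zero = quant_params_for_type("int8", -0.5, 1.5)
a_scale, a_zero = quant_params_for_type("uint8", 0.0, 6.0)
```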
Fig. 8 is a block diagram of a quantization processing apparatus according to an embodiment. As shown in fig. 8, the apparatus includes:
the type obtaining module 802 is configured to obtain a quantization type corresponding to data to be quantized, and obtain a fixed point weight range and a fixed point activation range corresponding to the quantization type.
The weight determining module 804 is configured to determine a weight quantization parameter corresponding to the data to be quantized based on the fixed-point weight range and the floating-point weight range of the trained quantization model.
An activation determining module 806, configured to determine, based on the fixed-point activation range and the floating-point activation range of the trained quantization model, an activation quantization parameter corresponding to data to be quantized; the trained quantization model is obtained by training a first stage floating point weight range and a second stage floating point activation range based on sample data.
And a quantization module 808, configured to quantize the data to be quantized into target data in a quantization type based on the weighted quantization parameter and the activated quantization parameter.
In the embodiment, a quantization type corresponding to data to be quantized is obtained, and a fixed point weight range and a fixed point activation range corresponding to the quantization type are obtained; determining a weight quantization parameter corresponding to the data to be quantized based on the fixed point weight range and the floating point weight range of the trained quantization model; determining an activation quantization parameter corresponding to data to be quantized based on the fixed point activation range and the floating point activation range of the trained quantization model; the trained quantization model is obtained by training a first stage floating point weight range and a second stage floating point activation range based on sample data; based on the weight quantization parameter and the activation quantization parameter, the data to be quantized can be accurately quantized from a floating point type to integer target data in a quantization type. The operation is carried out based on the integer data, so that the operation amount can be reduced, the operation speed is improved, and the subsequent image processing speed is improved.
In an embodiment, the weight determining module 804 is further configured to determine a weight quantization factor corresponding to the data to be quantized based on a floating point weight maximum value and a floating point weight minimum value in a floating point weight range, and a fixed point weight maximum value and a fixed point weight minimum value in a fixed point weight range; and determining the weight quantization offset corresponding to the data to be quantized according to the weight quantization factor, the floating point weight minimum and the fixed point weight minimum.
In this embodiment, based on the floating point weight maximum value and the floating point weight minimum value in the floating point weight range and the fixed point weight maximum value and the fixed point weight minimum value in the fixed point weight range, the weight quantization factor corresponding to the data to be quantized can be accurately determined. And accurately determining the weight quantization offset corresponding to the data to be quantized according to the weight quantization factor, the floating point weight minimum and the fixed point weight minimum.
In an embodiment, the quantization module 808 is further configured to perform convolution processing on the data to be quantized based on the weight quantization parameter, and perform activation processing on a result after the convolution processing by activating the quantization parameter, so as to obtain target data in the quantization type.
In this embodiment, the influence of the weight quantization parameter and the activation quantization parameter on data quantization is considered, convolution processing is performed on the data to be quantized based on the weight quantization parameter, and activation processing is performed on a result after the convolution processing by activating the quantization parameter, so that the obtained target data under the quantization type is more accurate.
In one embodiment, the data to be quantized is multimedia data to be quantized; the trained quantization model is a quantization model obtained by training a floating point weight range of multimedia in a first stage based on sample multimedia data and training a floating point activation range of the multimedia in a second stage; the floating point weight range of the multimedia and the floating point activation range of the multimedia in the trained quantization model are used for performing quantization processing on the multimedia data to be quantized, and the target data is the quantized target multimedia data.
In this embodiment, the quantization processing method is applied to quantization processing of multimedia data, a two-stage training is performed on a quantization model through sample multimedia data, a floating point weight range of multimedia in the quantization model is determined through a first-stage training, and a floating point activation range of the multimedia in the quantization model is determined based on the sample multimedia data and the weight range of the multimedia in a second-stage training of the quantization model trained in the first stage, so that a trained target quantization model for performing multimedia data quantization is obtained, thereby improving quantization precision of the target quantization model and enabling quantization of the multimedia data to be more accurate.
The target quantization model is used for quantizing the multimedia data to be quantized, so that the floating-point multimedia data can be accurately quantized into integer data. The operation is performed based on the integer data, so that the operation amount can be reduced, the operation speed can be increased, and the multimedia processing speed can be increased.
The division of the training device and the quantization processing device of the quantization model into various modules is merely used for illustration, and in other embodiments, the training device and the quantization processing device of the quantization model may be divided into different modules as needed to complete all or part of the functions of the training device and the quantization processing device of the quantization model.
For the specific limitations of the training device and the quantization processing device of the quantization model, reference may be made to the above limitations of the training method and the quantization processing method of the quantization model, which are not described herein again. The modules in the training device and the quantization processing device of the quantization model can be wholly or partially realized by software, hardware and a combination thereof. The modules can be embedded in a hardware form or independent of a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
Fig. 9 is a schematic diagram of an internal structure of an electronic device in one embodiment. The electronic device may be any terminal device such as a mobile phone, a tablet computer, a notebook computer, a desktop computer, a PDA (Personal Digital Assistant), a POS (Point of Sale) terminal, a vehicle-mounted computer, or a wearable device. The electronic device includes a processor and a memory connected by a system bus. The processor may include one or more processing units. The processor may be a CPU (Central Processing Unit), a DSP (Digital Signal Processor), or the like. The memory may include a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The computer program can be executed by the processor to implement the training method and the quantization processing method of a quantization model provided in the embodiments of the present application. The internal memory provides a cached execution environment for the operating system and the computer program in the non-volatile storage medium.
The implementation of each module in the training apparatus and the quantization processing apparatus of the quantization model provided in the embodiments of the present application may be in the form of a computer program. The computer program may be run on a terminal or a server. Program modules constituted by such a computer program may be stored in the memory of the electronic device. When the computer program is executed by a processor, the steps of the methods described in the embodiments of the present application are performed.
The embodiment of the application also provides a computer readable storage medium. One or more non-transitory computer-readable storage media containing computer-executable instructions that, when executed by one or more processors, cause the processors to perform the steps of a method of training a quantization model or a method of quantization processing.
Embodiments of the present application further provide a computer program product containing instructions, which when executed on a computer, cause the computer to perform a method for training a quantization model or a method for quantization processing.
Any reference to memory, storage, database, or other medium used herein may include non-volatile and/or volatile memory. The non-volatile memory may include a ROM (Read-Only Memory), a PROM (Programmable Read-Only Memory), an EPROM (Erasable Programmable Read-Only Memory), an EEPROM (Electrically Erasable Programmable Read-Only Memory), or a flash memory. Volatile memory can include RAM (Random Access Memory), which acts as external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as SRAM (Static Random Access Memory), DRAM (Dynamic Random Access Memory), SDRAM (Synchronous Dynamic Random Access Memory), DDR SDRAM (Double Data Rate Synchronous Dynamic Random Access Memory), ESDRAM (Enhanced Synchronous Dynamic Random Access Memory), SLDRAM (Synchronous Link Dynamic Random Access Memory), RDRAM (Rambus Dynamic Random Access Memory), and DRDRAM (Direct Rambus Dynamic Random Access Memory).
The above-mentioned embodiments only express several implementations of the present application, and the description thereof is specific and detailed, but it should not therefore be construed as limiting the scope of the present application. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, and these all fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (13)

1. A training method of a quantization model is characterized by comprising the following steps:
acquiring sample data;
training a quantization model through the sample data at a first stage to obtain a floating point weight range of the quantization model;
based on the floating point weight range and the sample data, performing second-stage training on the quantization model trained in the first stage, and obtaining a floating point activation range of the trained target quantization model in the second-stage training;
the floating point weight range and the floating point activation range in the target quantization model are used for performing quantization processing on data to be quantized.
2. The method of claim 1, wherein the training a quantization model through the sample data in a first stage to obtain a floating point weight range of the quantization model comprises:
for the training of the first stage, carrying out forward propagation processing on the sample data through a quantization model to obtain a first intermediate result;
determining a first loss error of the quantization model through the forward propagation process, and counting an initial weight range of the quantization model through the forward propagation process;
and in the process of performing back propagation processing on the first intermediate result, adjusting the initial weight range based on the first loss error until the quantization model meets a first stop condition, and obtaining a floating point weight range of the quantization model.
3. The method of claim 2, wherein said performing a forward propagation process on said sample data through a quantization model to obtain a first intermediate result comprises:
carrying out quantization processing on the features output by the current convolution operator in the quantization model through convolution processing to obtain quantization features; the output characteristic of the first convolution operator is obtained by performing convolution processing on the sample data;
and performing inverse quantization processing on the quantized features to obtain inverse quantized features, taking the inverse quantized features as input of a next convolution operator, taking the next convolution operator as a current convolution operator, returning to the step of performing quantization processing on the features output by the current convolution operator through convolution processing, and continuing to execute the steps until a first intermediate result is obtained after the last inverse quantization processing.
4. The method according to claim 1, wherein the performing a second stage of training on the quantization model trained in the first stage based on the floating point weight range and the sample data, and obtaining a floating point activation range of the trained target quantization model in the second stage of training comprises:
for the training of the second stage, performing forward propagation processing on the sample data through the floating point weight range in the quantization model after the training of the first stage to obtain a second intermediate result;
determining a second loss error of the quantization model based on the forward propagation processing, and adjusting an initial activation range of the quantization model based on the second loss error in the process of performing backward propagation processing on the second intermediate result until the quantization model meets a second stop condition, so as to obtain a floating point activation range of the trained target quantization model;
wherein the initial activation range is statistically derived by the forward propagation process in the first stage of training.
5. The method of claim 1, further comprising:
acquiring a preset weight range and a preset activation range corresponding to a preset quantization type;
determining a quantization parameter of the target quantization model according to the preset weight range, the preset activation range, the floating point weight range and the floating point activation range; the quantization parameter is used for quantizing the data to be quantized into the data corresponding to the preset quantization type.
6. A quantization processing method is characterized by comprising the following steps:
the method comprises the steps of obtaining a quantization type corresponding to data to be quantized, and obtaining a fixed point weight range and a fixed point activation range corresponding to the quantization type;
determining a weight quantization parameter corresponding to the data to be quantized based on the fixed point weight range and a floating point weight range of a trained quantization model;
determining an activated quantization parameter corresponding to the data to be quantized based on the fixed point activation range and the floating point activation range of the trained quantization model; the trained quantization model is obtained by training a first stage floating point weight range and a second stage floating point activation range based on sample data;
quantizing the data to be quantized into target data under the quantization type based on the weight quantization parameter and the activation quantization parameter.
7. The method according to claim 6, wherein the determining a weight quantization parameter corresponding to the data to be quantized based on the floating point weight range and the fixed point weight range comprises:
determining a weight quantization factor corresponding to the data to be quantized based on a floating point weight maximum value and a floating point weight minimum value in the floating point weight range and a fixed point weight maximum value and a fixed point weight minimum value in the fixed point weight range;
and determining a weight quantization offset corresponding to the data to be quantized according to the weight quantization factor, the floating point weight minimum and the fixed point weight minimum.
8. The method according to claim 6, wherein the quantizing the data to be quantized into the target data under the quantization type based on the weighting quantization parameter and the activation quantization parameter comprises:
and performing convolution processing on the data to be quantized based on the weight quantization parameter, and performing activation processing on the result after the convolution processing through the activation quantization parameter to obtain target data under the quantization type.
9. The method of claim 6, wherein the data to be quantized is multimedia data to be quantized; the trained quantization model is a quantization model obtained by training a floating point weight range of multimedia in a first stage based on sample multimedia data and training a floating point activation range of the multimedia in a second stage; the floating point weight range of the multimedia and the floating point activation range of the multimedia in the trained quantization model are used for carrying out quantization processing on the multimedia data to be quantized; the target data is quantized target multimedia data.
10. A training apparatus for a quantitative model, comprising:
the sample acquisition module is used for acquiring sample data;
the first training module is used for carrying out first-stage training on a quantization model through the sample data to obtain a floating point weight range of the quantization model;
the second training module is used for carrying out second-stage training on the quantization model trained in the first stage based on the floating point weight range and the sample data, and obtaining a floating point activation range of the trained target quantization model in the second-stage training;
and the floating point weight range and the floating point activation range in the target quantization model are used for performing quantization processing on data to be quantized.
11. A quantization processing apparatus characterized by comprising:
the type acquisition module is used for acquiring a quantization type corresponding to data to be quantized and acquiring a fixed point weight range and a fixed point activation range corresponding to the quantization type;
the weight determining module is used for determining a weight quantization parameter corresponding to the data to be quantized based on the fixed point weight range and a floating point weight range of a trained quantization model;
the activation determining module is used for determining an activation quantization parameter corresponding to the data to be quantized based on the fixed point activation range and the floating point activation range of the trained quantization model; the trained quantization model is obtained by training a first stage floating point weight range and a second stage floating point activation range based on sample data;
and the quantization module is used for quantizing the data to be quantized into target data under the quantization type based on the weight quantization parameter and the activation quantization parameter.
12. An electronic device comprising a memory and a processor, the memory having stored thereon a computer program, wherein the computer program, when executed by the processor, causes the processor to perform the steps of the method according to any of claims 1 to 9.
13. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 9.
CN202110896946.0A 2021-08-05 2021-08-05 Method and device for training quantitative model, electronic equipment and readable storage medium Pending CN115705486A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110896946.0A CN115705486A (en) 2021-08-05 2021-08-05 Method and device for training quantitative model, electronic equipment and readable storage medium

Publications (1)

Publication Number Publication Date
CN115705486A (en) 2023-02-17

Family

ID=85178843

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110896946.0A Pending CN115705486A (en) 2021-08-05 2021-08-05 Method and device for training quantitative model, electronic equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN115705486A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117788983A (en) * 2024-02-28 2024-03-29 青岛海尔科技有限公司 Image data processing method and device based on large model and storage medium
CN117788983B (en) * 2024-02-28 2024-05-24 青岛海尔科技有限公司 Image data processing method and device based on large model and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination