CN114511069A - Method and system for improving performance of low bit quantization model - Google Patents

Method and system for improving performance of low bit quantization model

Info

Publication number
CN114511069A
CN114511069A
Authority
CN
China
Prior art keywords
weight
model
quantization
data
result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210400848.8A
Other languages
Chinese (zh)
Inventor
杜力
郭若凡
杜源
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University filed Critical Nanjing University
Priority to CN202210400848.8A priority Critical patent/CN114511069A/en
Publication of CN114511069A publication Critical patent/CN114511069A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Abstract

The application relates to the field of neural network quantization, in particular to a method for improving the performance of a low-bit quantization model, which comprises the following steps: performing iterative computation on a preset image matrix and a preset first weight by adopting a preset algorithm to obtain an image vector; quantizing the first weight to obtain a second weight; calculating the first weight and the second weight to obtain quantization error regularization term data of the first weight; regarding the quantization error regularization term data as a constraint on the data distribution of the first weight and obtaining a constraint result; obtaining a total loss function model; performing backward gradient propagation on the total loss function model to obtain an optimized data result of the first weight, and iterating until the neural network converges. In practical application, the quantization error regularization term is used, which avoids the unstable training of quantization-aware training; the quantization error regularization term can be added directly in the fine-tuning stage of the model, and it requires less computation than quantization-aware training.

Description

Method and system for improving performance of low bit quantization model
Technical Field
The present application relates to the field of neural network quantization, and in particular, to a method and system for improving performance of a low bit quantization model.
Background
Deep neural network models are widely applied to machine vision tasks such as image classification and object detection, as well as natural language processing tasks, and have achieved great success. However, a deep neural network model is difficult to deploy effectively on mobile terminals or embedded devices because of limited storage and computing resources, so compressing and lightweighting deep neural networks is an urgent problem. In recent years much research effort has gone into compressing deep neural networks, and quantization is one of the main compression methods.
A common quantized neural network model uses parameters expressed with low-bit-precision numbers to carry out computations such as convolution, activation and batch normalization. In the inference stage the deep neural network only needs to perform forward propagation once, using the low-bit-precision numbers for all calculations. Accordingly, the network parameters are expressed with int16 values occupying 2 bytes or int8 values occupying 1 byte, referred to as int16 (16-bit integer) quantization and int8 quantization respectively; the quantized model greatly reduces memory consumption and computation, and can also be deployed on hardware that only supports integer operations.
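As an illustration only, the following minimal sketch shows one common way such integer quantization of a weight tensor can be written; the symmetric per-tensor scheme, the tensor shape and the use of PyTorch are assumptions of the example and are not prescribed by this application.

    import torch

    def quantize_int8(w):
        # Symmetric per-tensor int8 quantization: map floats onto [-127, 127].
        scale = w.abs().max().clamp(min=1e-8) / 127.0
        q = torch.clamp(torch.round(w / scale), -127, 127).to(torch.int8)
        return q, scale

    def dequantize(q, scale):
        # Map the integer codes back to floating point.
        return q.float() * scale

    w = torch.randn(64, 64)                 # a hypothetical float32 weight matrix
    q, scale = quantize_int8(w)
    w_hat = dequantize(q, scale)
    print((w - w_hat).abs().max().item())   # worst-case rounding error of this tensor

The int8 codes occupy a quarter of the memory of the original float32 weights, which is where the storage saving described above comes from.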
Common quantization methods cause obvious errors at low bit precision, and the lower the bit precision, the larger the error. To compensate for the error caused by direct quantization, the quantization-aware training method introduces quantization into the model training process and uses the quantized values for inference and back-propagation. However, quantization requires rounding the values of the network weights and outputs, and the rounding operation is not differentiable, so quantization-aware training widely uses the straight-through estimator, which sets the derivative at the input of the rounding function equal to the derivative at its output and limits the range of that derivative. This training method makes network training unstable, so the convergence rate during training becomes slow, the amount of computation is large, and the result deteriorates.
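To make the criticized mechanism concrete, the following sketch shows how a straight-through estimator is commonly written in PyTorch; the clipping bound of 127 and the exact formulation are illustrative assumptions, not part of this application.

    import torch

    class RoundSTE(torch.autograd.Function):
        # Straight-through estimator: round in the forward pass; in the backward
        # pass pretend the derivative of round() is 1 inside a clipped range.
        @staticmethod
        def forward(ctx, x):
            ctx.save_for_backward(x)
            return torch.round(x)

        @staticmethod
        def backward(ctx, grad_output):
            (x,) = ctx.saved_tensors
            return grad_output * (x.abs() <= 127).float()

    x = torch.randn(8, requires_grad=True)
    y = RoundSTE.apply(x * 10)    # quantize-like rounding in the forward pass
    y.sum().backward()            # gradients flow "through" the rounding
    print(x.grad)

Because the backward pass pretends the rounding has derivative 1, the gradients used to update the weights do not exactly match the rounded forward computation, which is the source of the instability described above.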
Disclosure of Invention
In order to solve the problems in the prior art that the quantization-aware training method makes network training unstable, slows the convergence during training, requires a large amount of computation and degrades the result, the application provides a method for improving the performance of a low-bit quantization model, which is characterized by comprising the following steps:
performing iterative computation on a preset image matrix and a preset first weight by adopting a preset algorithm to obtain an image vector, wherein the image vector is the category prediction of the image matrix;
establishing a loss data model according to the image vector and a preset class label, wherein the loss data model is used for representing a loss function of the image matrix;
quantizing the first weight to obtain a second weight;
calculating the first weight and the second weight by adopting a quantization error regularization term computation model to obtain quantization error regularization term data of the first weight;
regarding the quantization error regularization term data as a constraint on the data distribution of the first weight, and obtaining a constraint result, wherein the constraint result is the lowest quantization error value of the first weight;
carrying out weighted summation on the constraint result and the loss function of the image matrix to obtain a total loss function model;
performing backward gradient propagation on the total loss function model to obtain an optimized data result of the first weight, wherein the optimized data result is the result of subtracting the gradient value obtained in the backward gradient propagation from the first weight; and taking the optimized data result of the first weight as the preset first weight, and repeating the iteration until the neural network converges.
Further, the quantization error regularization term computation model includes:
$$ R_q = \sum_{i=1}^{L} \frac{\lVert W_i - Q(W_i) \rVert_2^2}{n_i} $$
wherein R_q is the quantization error regularization term, L is the total number of weights, W_i is the i-th weight in the model, Q(W_i) is the value of the model weight W_i after quantization, and n_i is the number of parameters of the model weight W_i.
Further, the total loss function model includes:
$$ L_{total} = L_{task} + \lambda R_q $$
wherein L_total is the total loss function, L_task is the loss data, R_q is the quantization error regularization term, and λ is a coefficient.
Further, the loss data model includes:
$$ L_{task} = -\frac{1}{M} \sum_{j=1}^{M} y_j \cdot \log \hat{y}_j $$
wherein L_task is the loss data, M is the number of input pictures in this iteration, x_j is the j-th picture, y_j is the true category vector of the j-th picture, and \hat{y}_j is the category vector predicted by the model for the j-th picture x_j.
A system for improving performance of a low bit quantization model comprising:
a first module for iterating a preset image matrix and a preset first weight; performing iterative computation on the image matrix and the first weight by adopting a preset algorithm to obtain an image vector, wherein the image vector is the category prediction of the image matrix;
a second module, configured to establish a loss data model according to the image vector and a preset category label, where the loss data model is used to represent a loss function of the image matrix;
a third module that quantizes the first weight to obtain a second weight, and calculates the first weight and the second weight by adopting a quantization error regularization term computation model to obtain quantization error regularization term data of the first weight;
a fourth module that regards the quantization error regularization term data as a constraint on the data distribution of the first weight; obtains a constraint result, wherein the constraint result is the lowest quantization error value of the first weight; carries out weighted summation of the constraint result with the loss data model to obtain a total loss function model; performs backward gradient propagation on the total loss function model to obtain an optimized data result of the first weight, wherein the optimized data result is the result of subtracting the gradient value obtained in the backward gradient propagation from the first weight; and takes the optimized data result of the first weight as the preset first weight, repeating the iteration of the four modules until the neural network converges.
Further, the quantization error regularization term computation model includes:
$$ R_q = \sum_{i=1}^{L} \frac{\lVert W_i - Q(W_i) \rVert_2^2}{n_i} $$
wherein R_q is the quantization error regularization term, L is the total number of weights, W_i is the i-th weight in the model, Q(W_i) is the value of the model weight W_i after quantization, and n_i is the number of parameters of the model weight W_i.
Further, the total loss function model includes:
$$ L_{total} = L_{task} + \lambda R_q $$
wherein L_total is the total loss function, L_task is the loss data, R_q is the quantization error regularization term, and λ is a coefficient.
Further, the loss data model includes:
$$ L_{task} = -\frac{1}{M} \sum_{j=1}^{M} y_j \cdot \log \hat{y}_j $$
wherein L_task is the loss data, M is the number of input pictures in this iteration, x_j is the j-th picture, y_j is the true category vector of the j-th picture, and \hat{y}_j is the category vector predicted by the model for the j-th picture x_j.
According to the technical scheme, the method for improving the performance of the low bit quantization model comprises the following steps: performing iterative computation on a preset image matrix and a preset first weight by adopting a preset algorithm to obtain an image vector, wherein the image vector is the category prediction of the image matrix; establishing a loss data model according to the image vector and a preset class label, wherein the loss data model is used for representing the loss function of the image matrix; quantizing the first weight to obtain a second weight; calculating the first weight and the second weight by adopting a quantization error regularization term computation model to obtain quantization error regularization term data of the first weight; regarding the quantization error regularization term data as a constraint on the data distribution of the first weight, and obtaining a constraint result, wherein the constraint result is the lowest quantization error value of the first weight; carrying out weighted summation on the constraint result and the loss function of the image matrix to obtain a total loss function model; performing backward gradient propagation on the total loss function model to obtain an optimized data result of the first weight, wherein the optimized data result is the result of subtracting the gradient value obtained in the backward gradient propagation from the first weight; and taking the optimized data result of the first weight as the preset first weight, repeating the iteration until the neural network converges.
In practical application, the method for improving the performance of the low-bit quantization model uses the quantization error regularization term, which avoids the unstable training and difficult convergence of quantization-aware training; meanwhile, the quantization error regularization term can be added directly in the fine-tuning stage of the model and requires less computation than quantization-aware training; finally, the quantization error regularization term is optimized only with respect to the model weights, does not conflict with quantization methods for the intermediate-layer outputs, and further improves the performance of the quantized model.
Drawings
In order to more clearly explain the technical solution of the present application, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious to those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a flowchart illustrating a method for improving performance of a low bit quantization model according to an embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some but not all embodiments of the present application. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations.
In the description of the present application, it is also to be noted that, unless explicitly stated or limited otherwise, the term "connected" is to be understood in a broad sense, e.g. electrically, but also communicatively, connected. The specific meaning of the above terms in the present application can be understood in a specific case by those of ordinary skill in the art.
Furthermore, the terms "comprises" and "comprising," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a product or device that comprises a list of elements is not necessarily limited to those elements explicitly listed, but may include other elements not expressly listed or inherent to such product or device.
The method aims to solve the problems that the quantization-aware training method in the prior art makes network training unstable, slows the convergence during training, requires a large amount of computation and degrades the result. Referring to fig. 1, which is a schematic flowchart of a method for improving the performance of a low bit quantization model according to an embodiment of the present application: in a first aspect, an embodiment of the present application provides a method for improving the performance of a low bit quantization model, including performing iterative computation on a preset image matrix and a preset first weight by adopting a preset algorithm to obtain an image vector, wherein the image vector is the category prediction of the image matrix.
In some embodiments of the present application, the image matrix is represented as a tensor whose dimensions are written as (N, 3, h, w), the numbers being the size of each dimension: N is the number of pictures, 3 indicates that each picture has three RGB channels, and h and w are the height and width of the picture respectively. For an image classification task each picture has one class, expressed as a number from 1 to 1000, and the class labels are represented as an N × 1000 matrix in which the 1000-dimensional vector of a picture encodes its class: if a picture belongs to class c, the c-th dimension of its label vector is 1 and all other dimensions are 0; for example, a picture of class 1 has the label vector (1, 0, 0, ..., 0). The data of the first weight is represented as a set {W_1, W_2, ..., W_L}, where W_i is a matrix holding the weight data of the i-th operation; for example, if W_i is the weight of a convolution layer, its size is c_out × c_in × k × k, where c_out is the number of output channels, c_in is the number of input channels, and k is the size of the convolution kernel.
In some embodiments of the present application, a preset algorithm is adopted to perform iterative computation on the initial data and the first weight to obtain an image vector, where the image vector is the category prediction of the initial data; the preset algorithm is convolution multiplication or matrix multiplication. Specifically, the initial data and the first weight are combined by convolution or matrix multiplication to obtain output data, the output data is passed through an activation function, the result is taken as the new input and the iterative computation continues with the next weight, and finally a 1000-dimensional image vector is output as the category prediction of the initial data.
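A minimal sketch of this forward computation is given below; the network depth, channel counts and the use of PyTorch are illustrative assumptions only and are not part of this application.

    import torch
    import torch.nn as nn

    class TinyClassifier(nn.Module):
        # A stand-in for the "convolution, activation, repeat" computation above.
        def __init__(self, num_classes=1000):
            super().__init__()
            self.conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)  # a weight W_1
            self.fc = nn.Linear(16, num_classes)                     # a weight W_2

        def forward(self, x):              # x: image matrix of shape (N, 3, h, w)
            x = torch.relu(self.conv(x))   # convolution followed by activation
            x = x.mean(dim=(2, 3))         # pool the feature map to (N, 16)
            return self.fc(x)              # (N, 1000) category prediction vector

    logits = TinyClassifier()(torch.randn(2, 3, 224, 224))
    print(logits.shape)                    # torch.Size([2, 1000])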
In some embodiments of the present application, a loss data model is established according to the image vector and a preset class label, where the loss data model is used to represent the loss function of the image matrix. Specifically, the image vector represents the probabilities predicted by the model that a picture belongs to each of the 1000 categories, consistent with the ordinary training process. In the present application a regularization term is added after the classification task loss is calculated, together with a scaling coefficient that adjusts the influence of the regularization term on training; during training this constrains the distribution of the weight parameters so that, under this distribution, they have a lower quantization error, i.e. the parameters are closer to their quantized values. The regularization term can be added directly in the fine-tuning stage of the model, and it requires less computation than quantization-aware training.
In some embodiments of the application, the quantization error regularization term added during training has little influence on the performance of the full-precision floating-point model, so it can be added during the fine-tuning performed for a specific application task (a fine-tuning step that is part of full-precision model deployment and does not itself consider quantization), and no separate stage needs to be set up for quantization. Moreover, because the regularization term is calculated only once per iteration rather than once for each input sample (each iteration may feed dozens to hundreds of inputs at the same time), its computational cost is low.
In some embodiments of the application, a loss data model is established according to the image vector and a preset category label, and the quantization error regularization term is added to it to obtain the total loss function; the gradient of the total loss function with respect to each weight is calculated according to the chain rule of differentiation, and the weights are then updated along the gradient so that the total loss function decreases. Updating based on the task loss moves the model weights towards more accurate classification, while updating based on the quantization error loss term gradually moves the model weights closer to their quantized values, reducing the quantization error of the model and the performance loss caused by quantization. A sketch of one such fine-tuning iteration follows.
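The sketch below shows one such iteration, assuming PyTorch; the helper quantization_error_reg is sketched after the next paragraph, the labels are taken to be class indices, and the coefficient lam is an illustrative value rather than one fixed by this application.

    import torch
    import torch.nn.functional as F

    def finetune_step(model, images, labels, optimizer, lam=1e-2):
        # One fine-tuning iteration: task loss plus the quantization error term.
        logits = model(images)                        # (N, 1000) category predictions
        task_loss = F.cross_entropy(logits, labels)   # the loss data L_task
        total = task_loss + lam * quantization_error_reg(model)
        optimizer.zero_grad()
        total.backward()                              # chain-rule gradients of the total loss
        optimizer.step()                              # W <- W - lr * gradient
        return total.item()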
In some embodiments of the present application, after the weight quantization algorithm is determined and the quantization and dequantization code is implemented, the first weight is quantized to obtain the second weight; the first weight and the second weight are then fed into the quantization error regularization term computation model to obtain the quantization error regularization term data of the first weight, namely the L2 norm of the change of the model weights before and after quantization. Specifically, the back-propagation process is consistent with that of a floating-point model: only the quantization error regularization term is added to the loss function, and all the operations involved are differentiable.
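The following sketch shows one way the quantize-dequantize step and the regularization term can be written, assuming symmetric uniform low-bit weight quantization; this application does not fix a particular weight quantization algorithm, so the quantizer here is only an example.

    import torch

    def quantize_dequantize(w, bits=4):
        # Symmetric uniform b-bit quantization followed by dequantization.
        qmax = 2 ** (bits - 1) - 1
        scale = w.abs().max().clamp(min=1e-8) / qmax
        return torch.clamp(torch.round(w / scale), -qmax, qmax) * scale

    def quantization_error_reg(model):
        # R_q: for each weight tensor, the squared L2 norm of the change before
        # and after quantization, normalized by its parameter count, then summed.
        reg = 0.0
        for w in model.parameters():
            w_hat = quantize_dequantize(w.detach())   # second weight, kept out of the graph
            reg = reg + (w - w_hat).pow(2).sum() / w.numel()
        return reg

Because the quantized copy is detached, the regularization term is differentiable with respect to the floating-point weights only, so no straight-through estimator is needed.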
In some embodiments of the present application, the quantization error regularization term data is regarded as a constraint on the data distribution of the first weight, and a constraint result is obtained, the constraint result being the lowest quantization error value of the first weight; the constraint result and the loss function of the image matrix are weighted and summed to obtain the total loss function model; and backward gradient propagation is performed on the total loss function model to obtain the optimized data result of the first weight, the optimized data result being the result of subtracting the gradient value obtained in the backward gradient propagation from the first weight.
In some embodiments of the present application, image classification is performed on the preset initial data. Six models are used in total, including Deit-tiny, Deit-base, Swin-tiny and ViT-base, all of which are image deep neural network models based on the Transformer structure. The data set used for model fine-tuning is ImageNet, with 1000 classes and about one million pictures, and the test is performed on the ImageNet data set, as shown in Table 1:
TABLE 1 Performance of the models (the table is provided as an image in the original publication; its accuracy figures are not reproduced here)
Specifically, in this embodiment the performance of the six models is compared in Table 1. Except for the floating-point precision column (the performance of the model before quantization), the weights, computations and intermediate values of all models are quantized: the weights are quantized to 4-bit precision (int4), the attention weights (important components of the Transformer structure, belonging to the intermediate values) are quantized to 4-bit precision, and everything else is quantized to 8-bit precision. The numbers in the table are the image classification accuracy of the models on the ImageNet data set, and "4-bit fine-tuning" denotes the performance of the 4-bit quantized model after fine-tuning with the proposed quantization error regularization term. It can be seen that direct 4-bit quantization of the full-precision model causes a large loss of model performance, whereas quantizing a model that has been fine-tuned with the quantization error regularization term noticeably improves the performance of the quantized model. Meanwhile, the quantization error regularization term only optimizes the model weights and does not conflict with the many quantization methods that target the outputs of the intermediate layers of the model, so it can be used together with them to further improve the performance of the quantized model.
As can be seen from the above technical solutions, the method for improving the performance of the low bit quantization model provided by the present application includes: performing iterative computation on a preset image matrix and a preset first weight by adopting a preset algorithm to obtain an image vector, wherein the image vector is the category prediction of the image matrix; establishing a loss data model according to the image vector and a preset class label, wherein the loss data model is used for representing the loss function of the image matrix; quantizing the first weight to obtain a second weight; calculating the first weight and the second weight by adopting a quantization error regularization term computation model to obtain quantization error regularization term data of the first weight; regarding the quantization error regularization term data as a constraint on the data distribution of the first weight, and obtaining a constraint result, wherein the constraint result is the lowest quantization error value of the first weight; carrying out weighted summation on the constraint result and the loss function of the image matrix to obtain a total loss function model; performing backward gradient propagation on the total loss function model to obtain an optimized data result of the first weight, wherein the optimized data result is the result of subtracting the gradient value obtained in the backward gradient propagation from the first weight; and using the optimized data result as the preset first weight mentioned in the first step, repeating the optimization process until the neural network converges.
In practical application, the added quantization error regularization term computation model is suitable for various quantization methods and avoids the unstable training and difficult convergence of quantization-aware training. In the actual quantization process, the quantized weight value is only used as a leaf node in the computation graph: it does not participate in forward propagation and is not changed during the gradient descent of back-propagation, so the problem of back-propagating gradients through the non-differentiable operations present in various quantization schemes does not arise, and the inaccurate gradients of the straight-through estimator are avoided. In addition, the scaling coefficient in front of the regularization term can adjust the influence of the regularization term on the training process, allowing the user to balance effectiveness and stability. A short sketch of this detachment is given below.
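The leaf-node behaviour can be illustrated with a few lines of PyTorch; the rounding scheme shown is an arbitrary stand-in for a real quantizer.

    import torch

    w = torch.randn(10, requires_grad=True)          # first weight (trainable, floating point)
    with torch.no_grad():
        w_hat = torch.round(w * 4) / 4               # second weight: a detached leaf value
    loss = (w - w_hat).pow(2).mean()                 # quantization error term
    loss.backward()                                  # the gradient reaches w only; rounding
                                                     # never appears on the gradient path
    print(w.grad is not None, w_hat.requires_grad)   # True False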
In some embodiments of the present application, the quantization error regularization term computation model includes:
$$ R_q = \sum_{i=1}^{L} \frac{\lVert W_i - Q(W_i) \rVert_2^2}{n_i} $$
wherein R_q is the quantization error regularization term, L is the total number of weights, W_i is the i-th weight in the model, Q(W_i) is the value of the model weight W_i after quantization, and n_i is the number of parameters of the model weight W_i; the lower R_q is, the closer the model weights are to their quantized values and the smaller the quantization error.
In some embodiments of the present application, the total loss function model includes:
$$ L_{total} = L_{task} + \lambda R_q $$
wherein L_total is the total loss function, L_task is the loss data, R_q is the quantization error regularization term, and λ is a coefficient used to balance the orders of magnitude of L_task and R_q. In practical application the training process continues for 5-15 rounds until the model converges, i.e. the total loss function is lowest.
In some embodiments of the present application, the loss data model includes:
$$ L_{task} = -\frac{1}{M} \sum_{j=1}^{M} y_j \cdot \log \hat{y}_j $$
wherein L_task is the loss data, M is the number of input pictures in this iteration, x_j is the j-th picture, y_j is the true category vector of the j-th picture, and \hat{y}_j is the category vector predicted by the model for the j-th picture x_j.
In order to implement the practical application of the method, a second aspect of the embodiments of the present application further provides a system for improving the performance of a low bit quantization model, including: a first module for iterating a preset image matrix and a preset first weight, performing iterative computation on the image matrix and the first weight by adopting a preset algorithm to obtain an image vector, wherein the image vector is the category prediction of the image matrix; a second module, configured to establish a loss data model according to the image vector and a preset category label, wherein the loss data model is used to represent the loss function of the image matrix; a third module that quantizes the first weight to obtain a second weight and calculates the first weight and the second weight by adopting a quantization error regularization term computation model to obtain quantization error regularization term data of the first weight; and a fourth module that regards the quantization error regularization term data as a constraint on the data distribution of the first weight, obtains a constraint result, wherein the constraint result is the lowest quantization error value of the first weight, carries out weighted summation of the constraint result with the loss data model to obtain a total loss function model, performs backward gradient propagation on the total loss function model to obtain an optimized data result of the first weight, wherein the optimized data result is the result of subtracting the gradient value obtained in the backward gradient propagation from the first weight, and uses the optimized data result as the preset first weight mentioned in the first step, repeating the optimization process until the neural network converges.
In some embodiments of the present application, the quantization error regularization term computation model includes:
$$ R_q = \sum_{i=1}^{L} \frac{\lVert W_i - Q(W_i) \rVert_2^2}{n_i} $$
wherein R_q is the quantization error regularization term, L is the total number of weights, W_i is the i-th weight in the model, Q(W_i) is the value of the model weight W_i after quantization, and n_i is the number of parameters of the model weight W_i.
In some embodiments of the present application, the total loss function model includes:
$$ L_{total} = L_{task} + \lambda R_q $$
wherein L_total is the total loss function, L_task is the loss data, R_q is the quantization error regularization term, and λ is a coefficient used to balance their orders of magnitude.
In some embodiments of the present application, the loss data model includes:
$$ L_{task} = -\frac{1}{M} \sum_{j=1}^{M} y_j \cdot \log \hat{y}_j $$
wherein L_task is the loss data, M is the number of input pictures in this iteration, x_j is the j-th picture, y_j is the true category vector of the j-th picture, and \hat{y}_j is the category vector predicted by the model for the j-th picture x_j.
According to the technical scheme, in the method for improving the performance of the low bit quantization model, a preset algorithm is adopted to perform iterative computation on a preset image matrix and a preset first weight to obtain an image vector, the image vector being the category prediction of the image matrix; a loss data model is established according to the image vector and a preset class label, the loss data model being used to represent the loss function of the image matrix; the first weight is quantized to obtain a second weight; the first weight and the second weight are calculated by adopting a quantization error regularization term computation model to obtain quantization error regularization term data of the first weight; the quantization error regularization term data is regarded as a constraint on the data distribution of the first weight, and a constraint result is obtained, the constraint result being the lowest quantization error value of the first weight; the constraint result and the loss function of the image matrix are weighted and summed to obtain a total loss function model; and backward gradient propagation is performed on the total loss function model to obtain an optimized data result of the first weight, the optimized data result being the result of subtracting the gradient value obtained in the backward gradient propagation from the first weight.
In practical application, the method for improving the performance of the low-bit quantization model uses the quantization error regularization term, which avoids the unstable training and difficult convergence of quantization-aware training; meanwhile, the quantization error regularization term can be added directly in the fine-tuning stage of the model and requires less computation than quantization-aware training; finally, the quantization error regularization term is optimized only with respect to the model weights, does not conflict with quantization methods for the intermediate-layer outputs, and further improves the performance of the quantized model.
The present application has been described in detail with reference to specific embodiments and illustrative examples, but the description is not intended to limit the application. Those skilled in the art will appreciate that various equivalent substitutions, modifications or improvements may be made to the presently disclosed embodiments and implementations thereof without departing from the spirit and scope of the present disclosure, and these fall within the scope of the present disclosure.

Claims (8)

1. A method for improving performance of a low bit quantization model, comprising:
performing iterative computation on a preset image matrix and a preset first weight by adopting a preset algorithm to obtain an image vector, wherein the image vector is the category prediction of the image matrix;
establishing a loss data model according to the image vector and a preset class label, wherein the loss data model is used for representing a loss function of the image matrix;
quantizing the first weight to obtain a second weight;
calculating the first weight and the second weight by adopting a quantization error regularization term computation model to obtain quantization error regularization term data of the first weight;
regarding the quantization error regularization term data as a constraint on the data distribution of the first weight; obtaining a constraint result, wherein the constraint result is the lowest quantization error value of the first weight;
carrying out weighted summation on the constraint result and the loss function of the image matrix to obtain a total loss function model;
performing backward gradient propagation on the total loss function model to obtain an optimized data result of the first weight, wherein the optimized data result is the result of subtracting the gradient value obtained in the backward gradient propagation from the first weight;
and taking the optimized data result of the first weight as a preset first weight, and repeating iteration until the neural network converges.
2. A method for improving the performance of a low bit quantization model according to claim 1, wherein the quantization error regularization term computation model comprises:
$$ R_q = \sum_{i=1}^{L} \frac{\lVert W_i - Q(W_i) \rVert_2^2}{n_i} $$
wherein R_q is the quantization error regularization term, L is the total number of weights, W_i is the i-th weight in the model, Q(W_i) is the value of the model weight W_i after quantization, and n_i is the number of parameters of the model weight W_i.
3. The method of claim 1, wherein the total loss function model comprises:
$$ L_{total} = L_{task} + \lambda R_q $$
wherein L_total is the total loss function, L_task is the loss data, R_q is the quantization error regularization term, and λ is a coefficient.
4. The method of claim 1, wherein the loss data model comprises:
$$ L_{task} = -\frac{1}{M} \sum_{j=1}^{M} y_j \cdot \log \hat{y}_j $$
wherein L_task is the loss data, M is the number of input pictures in this iteration, x_j is the j-th picture, y_j is the true category vector of the j-th picture, and \hat{y}_j is the category vector predicted by the model for the j-th picture x_j.
5. A system for improving performance of a low bit quantization model, comprising:
a first module for iterating a preset image matrix and a preset first weight; performing iterative computation on the image matrix and the first weight by adopting a preset algorithm to obtain an image vector, wherein the image vector is the category prediction of the image matrix;
a second module, configured to establish a loss data model according to the image vector and a preset category label, where the loss data model is used to represent a loss function of the image matrix;
a third module that quantizes the first weight to obtain a second weight, and calculates the first weight and the second weight by adopting a quantization error regularization term computation model to obtain quantization error regularization term data of the first weight;
a fourth module that regards the quantization error regularization term data as a constraint on the data distribution of the first weight; obtains a constraint result, wherein the constraint result is the lowest quantization error value of the first weight; carries out weighted summation of the constraint result with the loss data model to obtain a total loss function model; performs backward gradient propagation on the total loss function model to obtain an optimized data result of the first weight, wherein the optimized data result is the result of subtracting the gradient value obtained in the backward gradient propagation from the first weight; and takes the optimized data result of the first weight as the preset first weight, repeating the iteration until the neural network converges.
6. A system for improving the performance of a low bit quantization model according to claim 5, wherein the quantization error regularization term computation model comprises:
$$ R_q = \sum_{i=1}^{L} \frac{\lVert W_i - Q(W_i) \rVert_2^2}{n_i} $$
wherein R_q is the quantization error regularization term, L is the total number of weights, W_i is the i-th weight in the model, Q(W_i) is the value of the model weight W_i after quantization, and n_i is the number of parameters of the model weight W_i.
7. A system for improving the performance of a low bit quantization model according to claim 5, wherein the total loss function model comprises:
$$ L_{total} = L_{task} + \lambda R_q $$
wherein L_total is the total loss function, L_task is the loss data, R_q is the quantization error regularization term, and λ is a coefficient used to balance their orders of magnitude.
8. The system of claim 5, wherein the loss data model comprises:
$$ L_{task} = -\frac{1}{M} \sum_{j=1}^{M} y_j \cdot \log \hat{y}_j $$
wherein L_task is the loss data, M is the number of input pictures in this iteration, x_j is the j-th picture, y_j is the true category vector of the j-th picture, and \hat{y}_j is the category vector predicted by the model for the j-th picture x_j.
CN202210400848.8A 2022-04-18 2022-04-18 Method and system for improving performance of low bit quantization model Pending CN114511069A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210400848.8A CN114511069A (en) 2022-04-18 2022-04-18 Method and system for improving performance of low bit quantization model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210400848.8A CN114511069A (en) 2022-04-18 2022-04-18 Method and system for improving performance of low bit quantization model

Publications (1)

Publication Number Publication Date
CN114511069A true CN114511069A (en) 2022-05-17

Family

ID=81554833

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210400848.8A Pending CN114511069A (en) 2022-04-18 2022-04-18 Method and system for improving performance of low bit quantization model

Country Status (1)

Country Link
CN (1) CN114511069A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117689044A (en) * 2024-02-01 2024-03-12 厦门大学 Quantification method suitable for vision self-attention model

Similar Documents

Publication Publication Date Title
CN108510067B (en) Convolutional neural network quantification method based on engineering realization
CN110969251B (en) Neural network model quantification method and device based on label-free data
US20230004813A1 (en) Jointly pruning and quantizing deep neural networks
CN111489364A (en) Medical image segmentation method based on lightweight full convolution neural network
CN113610227B (en) Deep convolutional neural network pruning method for image classification
US11531884B2 (en) Separate quantization method of forming combination of 4-bit and 8-bit data of neural network
CN114580281A (en) Model quantization method, apparatus, device, storage medium, and program product
US20210294874A1 (en) Quantization method based on hardware of in-memory computing and system thereof
CN114511069A (en) Method and system for improving performance of low bit quantization model
CN114756517A (en) Visual Transformer compression method and system based on micro-quantization training
CN111937011A (en) Method and equipment for determining weight parameters of neural network model
CN110288002B (en) Image classification method based on sparse orthogonal neural network
CN112686384A (en) Bit-width-adaptive neural network quantization method and device
CN112766492A (en) Model processing method and device, electronic equipment and storage medium
CN116634162A (en) Post-training quantization method for rate-distortion optimized image compression neural network
CN114830137A (en) Method and system for generating a predictive model
US20200372363A1 (en) Method of Training Artificial Neural Network Using Sparse Connectivity Learning
CN112766537A (en) Short-term electric load prediction method
CN114139678A (en) Convolutional neural network quantization method and device, electronic equipment and storage medium
CN112488291A (en) Neural network 8-bit quantization compression method
CN112508194B (en) Model compression method, system and computing equipment
US20230385600A1 (en) Optimizing method and computing apparatus for deep learning network and computer-readable storage medium
WO2024060727A1 (en) Method and apparatus for training neural network model, and device and system
Zhen et al. A Secure and Effective Energy-Aware Fixed-Point Quantization Scheme for Asynchronous Federated Learning.
CN113627595B (en) Probability-based MobileNet V1 network channel pruning method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20220517