CN114511069A - Method and system for improving performance of low bit quantization model - Google Patents

Method and system for improving performance of low bit quantization model

Info

Publication number
CN114511069A
CN114511069A
Authority
CN
China
Prior art keywords
weight
model
quantization
data
result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210400848.8A
Other languages
Chinese (zh)
Inventor
杜力
郭若凡
杜源
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University filed Critical Nanjing University
Priority to CN202210400848.8A priority Critical patent/CN114511069A/en
Publication of CN114511069A publication Critical patent/CN114511069A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Abstract

The application relates to the field of neural network quantization, in particular to a method for improving the performance of a low-bit quantization model, which comprises the following steps: performing iterative computation on a preset image matrix and a preset first weight by adopting a preset algorithm to obtain an image vector; quantizing the first weight to obtain a second weight; calculating the first weight and the second weight to obtain quantization error regularization term data of the first weight; regarding the quantization error regularization term data as a constraint on the data distribution of the first weight and obtaining a constraint result; obtaining a total loss function model; performing backward gradient propagation on the total loss function model to obtain an optimized data result of the first weight, and iterating until the neural network converges. In practical application, the quantization error regularization term is used, which avoids the unstable training of quantization-aware training; the quantization error regularization term can be added directly in the fine-tuning stage of the model, and it requires less computation than quantization-aware training.

Description

Method and system for improving performance of low bit quantization model
Technical Field
The present application relates to the field of neural network quantization, and in particular, to a method and system for improving performance of a low bit quantization model.
Background
Deep neural network models are widely applied to machine vision tasks such as image classification and object detection, as well as natural language processing tasks, and have achieved great success. However, a deep neural network model is difficult to deploy effectively on mobile terminals or embedded devices because of limited storage and computing resources, so compressing and lightweighting deep neural networks is an urgent problem. In recent years much research effort has gone into compressing deep neural networks, and quantization is one of the main compression methods.
A common quantized neural network model uses parameters expressed with low-bit-precision numbers to carry out computations such as convolution, activation and batch normalization. In the inference stage the deep neural network only needs to perform forward propagation once, using the low-bit-precision numbers for all calculations. Accordingly, the network parameters are expressed with int16 values occupying 2 bytes or int8 values occupying 1 byte, referred to as int16 (16-bit integer) quantization and int8 quantization respectively; the quantized model greatly reduces memory consumption and computation, and can also be deployed on hardware that only supports integer operations.
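As an illustration only, the following minimal sketch shows one common way such integer quantization of a weight tensor can be written; the symmetric per-tensor scheme, the tensor shape and the use of PyTorch are assumptions of the example and are not prescribed by this application.

    import torch

    def quantize_int8(w):
        # Symmetric per-tensor int8 quantization: map floats onto [-127, 127].
        scale = w.abs().max().clamp(min=1e-8) / 127.0
        q = torch.clamp(torch.round(w / scale), -127, 127).to(torch.int8)
        return q, scale

    def dequantize(q, scale):
        # Map the integer codes back to floating point.
        return q.float() * scale

    w = torch.randn(64, 64)                 # a hypothetical float32 weight matrix
    q, scale = quantize_int8(w)
    w_hat = dequantize(q, scale)
    print((w - w_hat).abs().max().item())   # worst-case rounding error of this tensor

The int8 codes occupy a quarter of the memory of the original float32 weights, which is where the storage saving described above comes from.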
Common quantization methods cause obvious errors at low bit precision, and the lower the bit precision, the larger the error. To compensate for the error caused by direct quantization, the quantization-aware training method introduces quantization into the model training process and uses the quantized values for inference and back-propagation. However, quantization requires rounding the values of the network weights and outputs, and the rounding operation is not differentiable, so quantization-aware training widely uses the straight-through estimator, which sets the derivative at the input of the rounding function equal to the derivative at its output and limits the range of that derivative. This training method makes network training unstable, so the convergence rate during training becomes slow, the amount of computation is large, and the result deteriorates.
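To make the criticized mechanism concrete, the following sketch shows how a straight-through estimator is commonly written in PyTorch; the clipping bound of 127 and the exact formulation are illustrative assumptions, not part of this application.

    import torch

    class RoundSTE(torch.autograd.Function):
        # Straight-through estimator: round in the forward pass; in the backward
        # pass pretend the derivative of round() is 1 inside a clipped range.
        @staticmethod
        def forward(ctx, x):
            ctx.save_for_backward(x)
            return torch.round(x)

        @staticmethod
        def backward(ctx, grad_output):
            (x,) = ctx.saved_tensors
            return grad_output * (x.abs() <= 127).float()

    x = torch.randn(8, requires_grad=True)
    y = RoundSTE.apply(x * 10)    # quantize-like rounding in the forward pass
    y.sum().backward()            # gradients flow "through" the rounding
    print(x.grad)

Because the backward pass pretends the rounding has derivative 1, the gradients used to update the weights do not exactly match the rounded forward computation, which is the source of the instability described above.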
Disclosure of Invention
In order to solve the problems in the prior art that the quantization-aware training method makes network training unstable, slows the convergence during training, requires a large amount of computation and degrades the result, the application provides a method for improving the performance of a low-bit quantization model, which is characterized by comprising the following steps:
performing iterative computation on a preset image matrix and a preset first weight by adopting a preset algorithm to obtain an image vector, wherein the image vector is the category prediction of the image matrix;
establishing a loss data model according to the image vector and a preset class label, wherein the loss data model is used for representing a loss function of the image matrix;
quantizing the first weight to obtain a second weight;
calculating the first weight and the second weight by adopting a quantization error regularization term computation model to obtain quantization error regularization term data of the first weight;
regarding the quantization error regularization term data as a constraint on the data distribution of the first weight, and obtaining a constraint result, wherein the constraint result is the lowest quantization error value of the first weight;
carrying out weighted summation on the constraint result and the loss function of the image matrix to obtain a total loss function model;
performing backward gradient propagation on the total loss function model to obtain an optimized data result of the first weight, wherein the optimized data result is the result of subtracting the gradient value obtained in the backward gradient propagation from the first weight; and taking the optimized data result of the first weight as the preset first weight, and repeating the iteration until the neural network converges.
Further, the quantization error regularization term computation model includes:
$$ R_q = \sum_{i=1}^{L} \frac{\lVert W_i - Q(W_i) \rVert_2^2}{n_i} $$
wherein R_q is the quantization error regularization term, L is the total number of weights, W_i is the i-th weight in the model, Q(W_i) is the value of the model weight W_i after quantization, and n_i is the number of parameters of the model weight W_i.
Further, the total loss function model includes:
$$ L_{total} = L_{task} + \lambda R_q $$
wherein L_total is the total loss function, L_task is the loss data, R_q is the quantization error regularization term, and λ is a coefficient.
Further, the loss data model includes:
$$ L_{task} = -\frac{1}{M} \sum_{j=1}^{M} y_j \cdot \log \hat{y}_j $$
wherein L_task is the loss data, M is the number of input pictures in this iteration, x_j is the j-th picture, y_j is the true category vector of the j-th picture, and \hat{y}_j is the category vector predicted by the model for the j-th picture x_j.
A system for improving performance of a low bit quantization model comprising:
a first module for iterating a preset image matrix and a preset first weight; performing iterative computation on the image matrix and the first weight by adopting a preset algorithm to obtain an image vector, wherein the image vector is the category prediction of the image matrix;
a second module, configured to establish a loss data model according to the image vector and a preset category label, where the loss data model is used to represent a loss function of the image matrix;
a third module that quantizes the first weight to obtain a second weight, and calculates the first weight and the second weight by adopting a quantization error regularization term computation model to obtain quantization error regularization term data of the first weight;
a fourth module that regards the quantization error regularization term data as a constraint on the data distribution of the first weight; obtains a constraint result, wherein the constraint result is the lowest quantization error value of the first weight; carries out weighted summation of the constraint result with the loss data model to obtain a total loss function model; performs backward gradient propagation on the total loss function model to obtain an optimized data result of the first weight, wherein the optimized data result is the result of subtracting the gradient value obtained in the backward gradient propagation from the first weight; and takes the optimized data result of the first weight as the preset first weight, repeating the iteration of the four modules until the neural network converges.
Further, the quantization error regularization term computation model includes:
$$ R_q = \sum_{i=1}^{L} \frac{\lVert W_i - Q(W_i) \rVert_2^2}{n_i} $$
wherein R_q is the quantization error regularization term, L is the total number of weights, W_i is the i-th weight in the model, Q(W_i) is the value of the model weight W_i after quantization, and n_i is the number of parameters of the model weight W_i.
Further, the total loss function model includes:
$$ L_{total} = L_{task} + \lambda R_q $$
wherein L_total is the total loss function, L_task is the loss data, R_q is the quantization error regularization term, and λ is a coefficient.
Further, the loss data model includes:
$$ L_{task} = -\frac{1}{M} \sum_{j=1}^{M} y_j \cdot \log \hat{y}_j $$
wherein L_task is the loss data, M is the number of input pictures in this iteration, x_j is the j-th picture, y_j is the true category vector of the j-th picture, and \hat{y}_j is the category vector predicted by the model for the j-th picture x_j.
According to the technical scheme, the method for improving the performance of the low bit quantization model comprises the following steps: performing iterative computation on a preset image matrix and a preset first weight by adopting a preset algorithm to obtain an image vector, wherein the image vector is the category prediction of the image matrix; establishing a loss data model according to the image vector and a preset class label, wherein the loss data model is used for representing the loss function of the image matrix; quantizing the first weight to obtain a second weight; calculating the first weight and the second weight by adopting a quantization error regularization term computation model to obtain quantization error regularization term data of the first weight; regarding the quantization error regularization term data as a constraint on the data distribution of the first weight, and obtaining a constraint result, wherein the constraint result is the lowest quantization error value of the first weight; carrying out weighted summation on the constraint result and the loss function of the image matrix to obtain a total loss function model; performing backward gradient propagation on the total loss function model to obtain an optimized data result of the first weight, wherein the optimized data result is the result of subtracting the gradient value obtained in the backward gradient propagation from the first weight; and taking the optimized data result of the first weight as the preset first weight, repeating the iteration until the neural network converges.
In practical application, the method for improving the performance of the low-bit quantization model uses the quantization error regularization term, which avoids the unstable training and difficult convergence of quantization-aware training; meanwhile, the quantization error regularization term can be added directly in the fine-tuning stage of the model and requires less computation than quantization-aware training; finally, the quantization error regularization term is optimized only with respect to the model weights, does not conflict with quantization methods for the intermediate-layer outputs, and further improves the performance of the quantized model.
Drawings
In order to more clearly explain the technical solution of the present application, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious to those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a flowchart illustrating a method for improving performance of a low bit quantization model according to an embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some but not all embodiments of the present application. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations.
In the description of the present application, it is also to be noted that, unless explicitly stated or limited otherwise, the term "connected" is to be understood in a broad sense, e.g. electrically, but also communicatively, connected. The specific meaning of the above terms in the present application can be understood in a specific case by those of ordinary skill in the art.
Furthermore, the terms "comprises" and "comprising," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a product or device that comprises a list of elements is not necessarily limited to those elements explicitly listed, but may include other elements not expressly listed or inherent to such product or device.
The method aims to solve the problems that the quantization-aware training method in the prior art makes network training unstable, slows the convergence during training, requires a large amount of computation and degrades the result. Referring to fig. 1, which is a schematic flowchart of a method for improving the performance of a low bit quantization model according to an embodiment of the present application: in a first aspect, an embodiment of the present application provides a method for improving the performance of a low bit quantization model, including performing iterative computation on a preset image matrix and a preset first weight by adopting a preset algorithm to obtain an image vector, wherein the image vector is the category prediction of the image matrix.
In some embodiments of the present application, the image matrix is represented as a tensor whose dimensions are written as (N, 3, h, w), the numbers being the size of each dimension: N is the number of pictures, 3 indicates that each picture has three RGB channels, and h and w are the height and width of the picture respectively. For an image classification task each picture has one class, expressed as a number from 1 to 1000, and the class labels are represented as an N × 1000 matrix in which the 1000-dimensional vector of a picture encodes its class: if a picture belongs to class c, the c-th dimension of its label vector is 1 and all other dimensions are 0; for example, a picture of class 1 has the label vector (1, 0, 0, ..., 0). The data of the first weight is represented as a set {W_1, W_2, ..., W_L}, where W_i is a matrix holding the weight data of the i-th operation; for example, if W_i is the weight of a convolution layer, its size is c_out × c_in × k × k, where c_out is the number of output channels, c_in is the number of input channels, and k is the size of the convolution kernel.
In some embodiments of the present application, a preset algorithm is adopted to perform iterative computation on the initial data and the first weight to obtain an image vector, where the image vector is the category prediction of the initial data; the preset algorithm is convolution multiplication or matrix multiplication. Specifically, the initial data and the first weight are combined by convolution or matrix multiplication to obtain output data, the output data is passed through an activation function, the result is taken as the new input and the iterative computation continues with the next weight, and finally a 1000-dimensional image vector is output as the category prediction of the initial data.
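A minimal sketch of this forward computation is given below; the network depth, channel counts and the use of PyTorch are illustrative assumptions only and are not part of this application.

    import torch
    import torch.nn as nn

    class TinyClassifier(nn.Module):
        # A stand-in for the "convolution, activation, repeat" computation above.
        def __init__(self, num_classes=1000):
            super().__init__()
            self.conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)  # a weight W_1
            self.fc = nn.Linear(16, num_classes)                     # a weight W_2

        def forward(self, x):              # x: image matrix of shape (N, 3, h, w)
            x = torch.relu(self.conv(x))   # convolution followed by activation
            x = x.mean(dim=(2, 3))         # pool the feature map to (N, 16)
            return self.fc(x)              # (N, 1000) category prediction vector

    logits = TinyClassifier()(torch.randn(2, 3, 224, 224))
    print(logits.shape)                    # torch.Size([2, 1000])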
In some embodiments of the present application, a loss data model is established according to the image vector and a preset class label, where the loss data model is used to represent the loss function of the image matrix. Specifically, the image vector represents the probabilities predicted by the model that a picture belongs to each of the 1000 categories, consistent with the ordinary training process. In the present application a regularization term is added after the classification task loss is calculated, together with a scaling coefficient that adjusts the influence of the regularization term on training; during training this constrains the distribution of the weight parameters so that, under this distribution, they have a lower quantization error, i.e. the parameters are closer to their quantized values. The regularization term can be added directly in the fine-tuning stage of the model, and it requires less computation than quantization-aware training.
In some embodiments of the application, the quantization error regularization term added during training has little influence on the performance of the full-precision floating-point model, so it can be added during the fine-tuning performed for a specific application task (a fine-tuning step that is part of full-precision model deployment and does not itself consider quantization), and no separate stage needs to be set up for quantization. Moreover, because the regularization term is calculated only once per iteration rather than once for each input sample (each iteration may feed dozens to hundreds of inputs at the same time), its computational cost is low.
In some embodiments of the application, a loss data model is established according to the image vector and a preset category label, and the quantization error regularization term is added to it to obtain the total loss function; the gradient of the total loss function with respect to each weight is calculated according to the chain rule of differentiation, and the weights are then updated along the gradient so that the total loss function decreases. Updating based on the task loss moves the model weights towards more accurate classification, while updating based on the quantization error loss term gradually moves the model weights closer to their quantized values, reducing the quantization error of the model and the performance loss caused by quantization. A sketch of one such fine-tuning iteration follows.
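The sketch below shows one such iteration, assuming PyTorch; the helper quantization_error_reg is sketched after the next paragraph, the labels are taken to be class indices, and the coefficient lam is an illustrative value rather than one fixed by this application.

    import torch
    import torch.nn.functional as F

    def finetune_step(model, images, labels, optimizer, lam=1e-2):
        # One fine-tuning iteration: task loss plus the quantization error term.
        logits = model(images)                        # (N, 1000) category predictions
        task_loss = F.cross_entropy(logits, labels)   # the loss data L_task
        total = task_loss + lam * quantization_error_reg(model)
        optimizer.zero_grad()
        total.backward()                              # chain-rule gradients of the total loss
        optimizer.step()                              # W <- W - lr * gradient
        return total.item()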
In some embodiments of the present application, after the weight quantization algorithm is determined and the quantization and dequantization code is implemented, the first weight is quantized to obtain the second weight; the first weight and the second weight are then fed into the quantization error regularization term computation model to obtain the quantization error regularization term data of the first weight, namely the L2 norm of the change of the model weights before and after quantization. Specifically, the back-propagation process is consistent with that of a floating-point model: only the quantization error regularization term is added to the loss function, and all the operations involved are differentiable.
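The following sketch shows one way the quantize-dequantize step and the regularization term can be written, assuming symmetric uniform low-bit weight quantization; this application does not fix a particular weight quantization algorithm, so the quantizer here is only an example.

    import torch

    def quantize_dequantize(w, bits=4):
        # Symmetric uniform b-bit quantization followed by dequantization.
        qmax = 2 ** (bits - 1) - 1
        scale = w.abs().max().clamp(min=1e-8) / qmax
        return torch.clamp(torch.round(w / scale), -qmax, qmax) * scale

    def quantization_error_reg(model):
        # R_q: for each weight tensor, the squared L2 norm of the change before
        # and after quantization, normalized by its parameter count, then summed.
        reg = 0.0
        for w in model.parameters():
            w_hat = quantize_dequantize(w.detach())   # second weight, kept out of the graph
            reg = reg + (w - w_hat).pow(2).sum() / w.numel()
        return reg

Because the quantized copy is detached, the regularization term is differentiable with respect to the floating-point weights only, so no straight-through estimator is needed.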
In some embodiments of the present application, the quantization error regularization term data is regarded as a constraint on the data distribution of the first weight, and a constraint result is obtained, the constraint result being the lowest quantization error value of the first weight; the constraint result and the loss function of the image matrix are weighted and summed to obtain the total loss function model; and backward gradient propagation is performed on the total loss function model to obtain the optimized data result of the first weight, the optimized data result being the result of subtracting the gradient value obtained in the backward gradient propagation from the first weight.
In some embodiments of the present application, image classification is performed on the preset initial data. Six models are used in total, including Deit-tiny, Deit-base, Swin-tiny and ViT-base, all of which are image deep neural network models based on the Transformer structure. The data set used for model fine-tuning is ImageNet, with 1000 classes and about one million pictures, and the test is performed on the ImageNet data set, as shown in Table 1:
TABLE 1 Performance of the models (the table is provided as an image in the original publication; its accuracy figures are not reproduced here)
Specifically, in this embodiment the performance of the six models is compared in Table 1. Except for the floating-point precision column (the performance of the model before quantization), the weights, computations and intermediate values of all models are quantized: the weights are quantized to 4-bit precision (int4), the attention weights (important components of the Transformer structure, belonging to the intermediate values) are quantized to 4-bit precision, and everything else is quantized to 8-bit precision. The numbers in the table are the image classification accuracy of the models on the ImageNet data set, and "4-bit fine-tuning" denotes the performance of the 4-bit quantized model after fine-tuning with the proposed quantization error regularization term. It can be seen that direct 4-bit quantization of the full-precision model causes a large loss of model performance, whereas quantizing a model that has been fine-tuned with the quantization error regularization term noticeably improves the performance of the quantized model. Meanwhile, the quantization error regularization term only optimizes the model weights and does not conflict with the many quantization methods that target the outputs of the intermediate layers of the model, so it can be used together with them to further improve the performance of the quantized model.
As can be seen from the above technical solutions, the method for improving the performance of the low bit quantization model provided by the present application includes: performing iterative computation on a preset image matrix and a preset first weight by adopting a preset algorithm to obtain an image vector, wherein the image vector is the category prediction of the image matrix; establishing a loss data model according to the image vector and a preset class label, wherein the loss data model is used for representing the loss function of the image matrix; quantizing the first weight to obtain a second weight; calculating the first weight and the second weight by adopting a quantization error regularization term computation model to obtain quantization error regularization term data of the first weight; regarding the quantization error regularization term data as a constraint on the data distribution of the first weight, and obtaining a constraint result, wherein the constraint result is the lowest quantization error value of the first weight; carrying out weighted summation on the constraint result and the loss function of the image matrix to obtain a total loss function model; performing backward gradient propagation on the total loss function model to obtain an optimized data result of the first weight, wherein the optimized data result is the result of subtracting the gradient value obtained in the backward gradient propagation from the first weight; and using the optimized data result as the preset first weight mentioned in the first step, repeating the optimization process until the neural network converges.
In practical application, the added quantization error regularization term computation model is suitable for various quantization methods and avoids the unstable training and difficult convergence of quantization-aware training. In the actual quantization process, the quantized weight value is only used as a leaf node in the computation graph: it does not participate in forward propagation and is not changed during the gradient descent of back-propagation, so the problem of back-propagating gradients through the non-differentiable operations present in various quantization schemes does not arise, and the inaccurate gradients of the straight-through estimator are avoided. In addition, the scaling coefficient in front of the regularization term can adjust the influence of the regularization term on the training process, allowing the user to balance effectiveness and stability. A short sketch of this detachment is given below.
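The leaf-node behaviour can be illustrated with a few lines of PyTorch; the rounding scheme shown is an arbitrary stand-in for a real quantizer.

    import torch

    w = torch.randn(10, requires_grad=True)          # first weight (trainable, floating point)
    with torch.no_grad():
        w_hat = torch.round(w * 4) / 4               # second weight: a detached leaf value
    loss = (w - w_hat).pow(2).mean()                 # quantization error term
    loss.backward()                                  # the gradient reaches w only; rounding
                                                     # never appears on the gradient path
    print(w.grad is not None, w_hat.requires_grad)   # True False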
In some embodiments of the present application, the quantization error regularization term computation model includes:
$$ R_q = \sum_{i=1}^{L} \frac{\lVert W_i - Q(W_i) \rVert_2^2}{n_i} $$
wherein R_q is the quantization error regularization term, L is the total number of weights, W_i is the i-th weight in the model, Q(W_i) is the value of the model weight W_i after quantization, and n_i is the number of parameters of the model weight W_i; the lower R_q is, the closer the model weights are to their quantized values and the smaller the quantization error.
In some embodiments of the present application, the total loss function model includes:
$$ L_{total} = L_{task} + \lambda R_q $$
wherein L_total is the total loss function, L_task is the loss data, R_q is the quantization error regularization term, and λ is a coefficient used to balance the orders of magnitude of L_task and R_q. In practical application the training process continues for 5-15 rounds until the model converges, i.e. the total loss function is lowest.
In some embodiments of the present application, the loss data model includes:
$$ L_{task} = -\frac{1}{M} \sum_{j=1}^{M} y_j \cdot \log \hat{y}_j $$
wherein L_task is the loss data, M is the number of input pictures in this iteration, x_j is the j-th picture, y_j is the true category vector of the j-th picture, and \hat{y}_j is the category vector predicted by the model for the j-th picture x_j.
In order to implement the practical application of the method, a second aspect of the embodiments of the present application further provides a system for improving the performance of a low bit quantization model, including: a first module for iterating a preset image matrix and a preset first weight, performing iterative computation on the image matrix and the first weight by adopting a preset algorithm to obtain an image vector, wherein the image vector is the category prediction of the image matrix; a second module, configured to establish a loss data model according to the image vector and a preset category label, wherein the loss data model is used to represent the loss function of the image matrix; a third module that quantizes the first weight to obtain a second weight and calculates the first weight and the second weight by adopting a quantization error regularization term computation model to obtain quantization error regularization term data of the first weight; and a fourth module that regards the quantization error regularization term data as a constraint on the data distribution of the first weight, obtains a constraint result, wherein the constraint result is the lowest quantization error value of the first weight, carries out weighted summation of the constraint result with the loss data model to obtain a total loss function model, performs backward gradient propagation on the total loss function model to obtain an optimized data result of the first weight, wherein the optimized data result is the result of subtracting the gradient value obtained in the backward gradient propagation from the first weight, and uses the optimized data result as the preset first weight mentioned in the first step, repeating the optimization process until the neural network converges.
In some embodiments of the present application, the quantization error regularization term computation model includes:
$$ R_q = \sum_{i=1}^{L} \frac{\lVert W_i - Q(W_i) \rVert_2^2}{n_i} $$
wherein R_q is the quantization error regularization term, L is the total number of weights, W_i is the i-th weight in the model, Q(W_i) is the value of the model weight W_i after quantization, and n_i is the number of parameters of the model weight W_i.
In some embodiments of the present application, the total loss function model includes:
$$ L_{total} = L_{task} + \lambda R_q $$
wherein L_total is the total loss function, L_task is the loss data, R_q is the quantization error regularization term, and λ is a coefficient used to balance their orders of magnitude.
In some embodiments of the present application, the loss data model includes:
$$ L_{task} = -\frac{1}{M} \sum_{j=1}^{M} y_j \cdot \log \hat{y}_j $$
wherein L_task is the loss data, M is the number of input pictures in this iteration, x_j is the j-th picture, y_j is the true category vector of the j-th picture, and \hat{y}_j is the category vector predicted by the model for the j-th picture x_j.
According to the technical scheme, in the method for improving the performance of the low bit quantization model, a preset algorithm is adopted to perform iterative computation on a preset image matrix and a preset first weight to obtain an image vector, the image vector being the category prediction of the image matrix; a loss data model is established according to the image vector and a preset class label, the loss data model being used to represent the loss function of the image matrix; the first weight is quantized to obtain a second weight; the first weight and the second weight are calculated by adopting a quantization error regularization term computation model to obtain quantization error regularization term data of the first weight; the quantization error regularization term data is regarded as a constraint on the data distribution of the first weight, and a constraint result is obtained, the constraint result being the lowest quantization error value of the first weight; the constraint result and the loss function of the image matrix are weighted and summed to obtain a total loss function model; and backward gradient propagation is performed on the total loss function model to obtain an optimized data result of the first weight, the optimized data result being the result of subtracting the gradient value obtained in the backward gradient propagation from the first weight.
In practical application, the method for improving the performance of the low-bit quantization model uses the quantization error regularization term, which avoids the unstable training and difficult convergence of quantization-aware training; meanwhile, the quantization error regularization term can be added directly in the fine-tuning stage of the model and requires less computation than quantization-aware training; finally, the quantization error regularization term is optimized only with respect to the model weights, does not conflict with quantization methods for the intermediate-layer outputs, and further improves the performance of the quantized model.
The present application has been described in detail with reference to specific embodiments and illustrative examples, but the description is not intended to limit the application. Those skilled in the art will appreciate that various equivalent substitutions, modifications or improvements may be made to the presently disclosed embodiments and implementations thereof without departing from the spirit and scope of the present disclosure, and these fall within the scope of the present disclosure.

Claims (8)

1. A method for improving performance of a low bit quantization model, comprising:
performing iterative computation on a preset image matrix and a preset first weight by adopting a preset algorithm to obtain an image vector, wherein the image vector is the category prediction of the image matrix;
establishing a loss data model according to the image vector and a preset class label, wherein the loss data model is used for representing a loss function of the image matrix;
quantizing the first weight to obtain a second weight;
calculating the first weight and the second weight by adopting a quantization error regularization term computation model to obtain quantization error regularization term data of the first weight;
regarding the quantization error regularization term data as a constraint on the data distribution of the first weight; obtaining a constraint result, wherein the constraint result is the lowest quantization error value of the first weight;
carrying out weighted summation on the constraint result and the loss function of the image matrix to obtain a total loss function model;
performing backward gradient propagation on the total loss function model to obtain an optimized data result of the first weight, wherein the optimized data result is the result of subtracting the gradient value obtained in the backward gradient propagation from the first weight;
and taking the optimized data result of the first weight as a preset first weight, and repeating iteration until the neural network converges.
2. A method for improving the performance of a low bit quantization model according to claim 1, wherein the quantization error regularization term computation model comprises:
$$ R_q = \sum_{i=1}^{L} \frac{\lVert W_i - Q(W_i) \rVert_2^2}{n_i} $$
wherein R_q is the quantization error regularization term, L is the total number of weights, W_i is the i-th weight in the model, Q(W_i) is the value of the model weight W_i after quantization, and n_i is the number of parameters of the model weight W_i.
3. The method of claim 1, wherein the total loss function model comprises:
$$ L_{total} = L_{task} + \lambda R_q $$
wherein L_total is the total loss function, L_task is the loss data, R_q is the quantization error regularization term, and λ is a coefficient.
4. The method of claim 1, wherein the loss data model comprises:
$$ L_{task} = -\frac{1}{M} \sum_{j=1}^{M} y_j \cdot \log \hat{y}_j $$
wherein L_task is the loss data, M is the number of input pictures in this iteration, x_j is the j-th picture, y_j is the true category vector of the j-th picture, and \hat{y}_j is the category vector predicted by the model for the j-th picture x_j.
5. A system for improving performance of a low bit quantization model, comprising:
a first module for iterating a preset image matrix and a preset first weight; performing iterative computation on the image matrix and the first weight by adopting a preset algorithm to obtain an image vector, wherein the image vector is the category prediction of the image matrix;
a second module, configured to establish a loss data model according to the image vector and a preset category label, where the loss data model is used to represent a loss function of the image matrix;
a third module that quantizes the first weight to obtain a second weight, and calculates the first weight and the second weight by adopting a quantization error regularization term computation model to obtain quantization error regularization term data of the first weight;
a fourth module that regards the quantization error regularization term data as a constraint on the data distribution of the first weight; obtains a constraint result, wherein the constraint result is the lowest quantization error value of the first weight; carries out weighted summation of the constraint result with the loss data model to obtain a total loss function model; performs backward gradient propagation on the total loss function model to obtain an optimized data result of the first weight, wherein the optimized data result is the result of subtracting the gradient value obtained in the backward gradient propagation from the first weight; and takes the optimized data result of the first weight as the preset first weight, repeating the iteration until the neural network converges.
6. A system for improving the performance of a low bit quantization model according to claim 5, wherein the quantization error regularization term computation model comprises:
$$ R_q = \sum_{i=1}^{L} \frac{\lVert W_i - Q(W_i) \rVert_2^2}{n_i} $$
wherein R_q is the quantization error regularization term, L is the total number of weights, W_i is the i-th weight in the model, Q(W_i) is the value of the model weight W_i after quantization, and n_i is the number of parameters of the model weight W_i.
7. A system for improving the performance of a low bit quantization model according to claim 5, wherein the total loss function model comprises:
$$ L_{total} = L_{task} + \lambda R_q $$
wherein L_total is the total loss function, L_task is the loss data, R_q is the quantization error regularization term, and λ is a coefficient used to balance their orders of magnitude.
8. The system of claim 5, wherein the loss data model comprises:
$$ L_{task} = -\frac{1}{M} \sum_{j=1}^{M} y_j \cdot \log \hat{y}_j $$
wherein L_task is the loss data, M is the number of input pictures in this iteration, x_j is the j-th picture, y_j is the true category vector of the j-th picture, and \hat{y}_j is the category vector predicted by the model for the j-th picture x_j.
CN202210400848.8A 2022-04-18 2022-04-18 Method and system for improving performance of low bit quantization model Pending CN114511069A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210400848.8A CN114511069A (en) 2022-04-18 2022-04-18 Method and system for improving performance of low bit quantization model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210400848.8A CN114511069A (en) 2022-04-18 2022-04-18 Method and system for improving performance of low bit quantization model

Publications (1)

Publication Number Publication Date
CN114511069A true CN114511069A (en) 2022-05-17

Family

ID=81554833

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210400848.8A Pending CN114511069A (en) 2022-04-18 2022-04-18 Method and system for improving performance of low bit quantization model

Country Status (1)

Country Link
CN (1) CN114511069A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117689044A (en) * 2024-02-01 2024-03-12 厦门大学 Quantification method suitable for vision self-attention model

Similar Documents

Publication Publication Date Title
CN108510067B (en) Convolutional neural network quantification method based on engineering realization
CN110969251B (en) Neural network model quantification method and device based on label-free data
US20230004813A1 (en) Jointly pruning and quantizing deep neural networks
CN111489364A (en) Medical image segmentation method based on lightweight full convolution neural network
CN113610227B (en) Deep convolutional neural network pruning method for image classification
US11531884B2 (en) Separate quantization method of forming combination of 4-bit and 8-bit data of neural network
CN114580281A (en) Model quantization method, apparatus, device, storage medium, and program product
US20210294874A1 (en) Quantization method based on hardware of in-memory computing and system thereof
CN114511069A (en) Method and system for improving performance of low bit quantization model
CN114756517A (en) Visual Transformer compression method and system based on micro-quantization training
CN111937011A (en) Method and equipment for determining weight parameters of neural network model
CN110288002B (en) Image classification method based on sparse orthogonal neural network
CN112686384A (en) Bit-width-adaptive neural network quantization method and device
CN112766492A (en) Model processing method and device, electronic equipment and storage medium
CN116634162A (en) Post-training quantization method for rate-distortion optimized image compression neural network
CN114830137A (en) Method and system for generating a predictive model
US20200372363A1 (en) Method of Training Artificial Neural Network Using Sparse Connectivity Learning
CN112766537A (en) Short-term electric load prediction method
CN114139678A (en) Convolutional neural network quantization method and device, electronic equipment and storage medium
CN112488291A (en) Neural network 8-bit quantization compression method
CN112508194B (en) Model compression method, system and computing equipment
US20230385600A1 (en) Optimizing method and computing apparatus for deep learning network and computer-readable storage medium
WO2024060727A1 (en) Method and apparatus for training neural network model, and device and system
Zhen et al. A Secure and Effective Energy-Aware Fixed-Point Quantization Scheme for Asynchronous Federated Learning.
CN113627595B (en) Probability-based MobileNet V1 network channel pruning method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20220517