CN114511069A - Method and system for improving performance of low bit quantization model - Google Patents
Method and system for improving performance of low bit quantization model
- Publication number
- CN114511069A (application number CN202210400848.8A)
- Authority
- CN
- China
- Prior art keywords
- weight
- model
- quantization
- data
- result
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000013139 quantization Methods 0.000 title claims abstract description 134
- 238000000034 method Methods 0.000 title claims abstract description 46
- 239000011159 matrix material Substances 0.000 claims abstract description 42
- 238000004364 calculation method Methods 0.000 claims abstract description 23
- 238000013528 artificial neural network Methods 0.000 claims abstract description 14
- 238000004422 calculation algorithm Methods 0.000 claims abstract description 13
- 238000013499 data model Methods 0.000 claims description 28
- 230000006870 function Effects 0.000 abstract description 43
- 238000012549 training Methods 0.000 abstract description 30
- 230000008569 process Effects 0.000 abstract description 17
- 230000008447 perception Effects 0.000 abstract description 12
- 230000007547 defect Effects 0.000 abstract description 4
- 230000000694 effects Effects 0.000 description 5
- 238000003062 neural network model Methods 0.000 description 4
- 238000005457 optimization Methods 0.000 description 4
- 230000004913 activation Effects 0.000 description 2
- 230000006835 compression Effects 0.000 description 2
- 238000007906 compression Methods 0.000 description 2
- 238000002620 method output Methods 0.000 description 2
- 230000008094 contradictory effect Effects 0.000 description 1
- 238000009795 derivation Methods 0.000 description 1
- 238000001514 detection method Methods 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000003058 natural language processing Methods 0.000 description 1
- 238000010606 normalization Methods 0.000 description 1
- 238000011160 research Methods 0.000 description 1
- 238000006467 substitution reaction Methods 0.000 description 1
- 238000012360 testing method Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
Abstract
The application relates to the field of neural network quantization, and in particular to a method for improving the performance of a low bit quantization model, comprising the following steps: performing iterative computation on a preset image matrix and a preset first weight by adopting a preset algorithm to obtain an image vector; quantizing the first weight to obtain a second weight; calculating the first weight and the second weight to obtain quantization error regularization term data of the first weight; regarding the quantization error regularization term data as a constraint on the data distribution of the first weight and obtaining a constraint result; obtaining a total loss function model; performing inverse gradient propagation on the total loss function model to obtain an optimized data result of the first weight, and iterating until the neural network converges. In practical application, the quantization error regularization term avoids the unstable training of quantization-aware training; it can be added directly in the fine-tuning stage of the model and requires less computation than quantization-aware training.
Description
Technical Field
The present application relates to the field of neural network quantization, and in particular, to a method and system for improving performance of a low bit quantization model.
Background
Deep neural network models are widely applied to machine vision tasks such as image classification and object detection as well as to natural language processing tasks, and have achieved great success. However, limited storage and computing resources prevent deep neural network models from performing well on mobile terminals or embedded devices, so the compression and lightweighting of deep neural networks is an urgent problem. In recent years much research has gone into compressing deep neural networks, and quantization is one of these compression methods.
A typical quantized neural network model uses parameters expressed as low-precision numbers to perform computations such as convolution, activation and batch normalization; in the inference stage the deep neural network only needs to perform forward propagation once, using low-precision numbers throughout. Network parameters are therefore expressed as int16 values occupying 2 bytes or int8 values occupying 1 byte, called int16 (16-bit integer) quantization and int8 quantization respectively; a quantized model greatly reduces memory consumption and computation and can also be deployed on hardware that only supports integer operations.
Common quantization methods cause obvious errors at low bit precision, and the lower the bit precision, the larger the error. To compensate for the errors caused by direct quantization, quantization-aware training introduces quantization into the model training process and uses the quantized values for inference and back-propagation. However, quantization requires rounding the weights and the outputs of the network, and the rounding operation is not differentiable, so quantization-aware training widely uses a straight-through estimator, which sets the derivative at the input of the rounding function equal to the derivative at its output and limits the range of the output derivative. This training method makes network training unstable, so convergence during training becomes slow, the amount of computation is large, and the result deteriorates.
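For background, a minimal PyTorch sketch of the straight-through estimator described above (the class name RoundSTE and the example usage are illustrative assumptions, not from the patent):

```python
import torch

class RoundSTE(torch.autograd.Function):
    """Straight-through estimator for rounding: the forward pass rounds,
    while the backward pass copies the gradient through as if rounding
    were the identity function."""
    @staticmethod
    def forward(ctx, x):
        return torch.round(x)

    @staticmethod
    def backward(ctx, grad_output):
        # The derivative of rounding (zero almost everywhere) is approximated
        # as 1; implementations often also clip it to the quantization range.
        return grad_output

x = torch.randn(4, requires_grad=True)
y = RoundSTE.apply(x).sum()
y.backward()   # x.grad is all ones, despite round() being non-differentiable
```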
Disclosure of Invention
In order to solve the problems of the prior art that quantization-aware training makes network training unstable, slows convergence during training, requires much computation and degrades the result, the application provides a method for improving the performance of a low bit quantization model, characterized by comprising the following steps:
performing iterative computation on a preset image matrix and a preset first weight by adopting a preset algorithm to obtain an image vector, wherein the image vector is the category prediction of the image matrix;
establishing a loss data model according to the image vector and a preset class label, wherein the loss data model is used for representing a loss function of the image matrix;
quantizing the first weight to obtain a second weight;
calculating the first weight and the second weight by adopting a quantization error regularization term computation model to obtain quantization error regularization term data of the first weight;
regarding the quantization error regularization term data as a constraint on the data distribution of the first weight and obtaining a constraint result, wherein the constraint result is the lowest quantization error value of the first weight;
carrying out weighted summation on the constraint result and the loss function of the image matrix to obtain a total loss function model;
performing inverse gradient propagation on the total loss function model to obtain an optimized data result of the first weight, wherein the optimized data result is a result obtained by subtracting a gradient value obtained in the inverse gradient propagation from the first weight; and taking the optimized data result of the first weight as a preset first weight, and repeating iteration until the neural network converges.
Further, the quantization error regularization term computation model includes:
$$L_{reg}=\sum_{i=1}^{n}\frac{\left\|W_i-Q(W_i)\right\|_2^2}{m_i}$$

wherein $L_{reg}$ is the quantization error regularization term, $n$ is the total number of weight tensors in the model, $W_i$ is the $i$th weight in the model, $Q(W_i)$ is the value of the model weight $W_i$ after quantization, and $m_i$ is the number of parameters of the model weight $W_i$.
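As a concrete illustration, the following is a minimal PyTorch sketch of this regularization term; the function name quant_error_reg and the quantize_fn argument are assumptions for illustration, since the patent does not fix a particular quantization function:

```python
import torch

def quant_error_reg(weights, quantize_fn):
    """Quantization error regularization term: for each weight tensor W_i,
    the squared L2 distance to its quantize-dequantized value Q(W_i),
    normalized by the parameter count m_i and summed over all tensors."""
    reg = 0.0
    for w in weights:
        w_q = quantize_fn(w).detach()  # Q(W_i) is treated as a constant target
        reg = reg + (w - w_q).pow(2).sum() / w.numel()
    return reg
```

Because Q(W_i) is detached, the gradient of the term with respect to W_i is simply 2(W_i - Q(W_i))/m_i, which pulls each weight toward its quantized value.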
Further, the total loss function model includes:
$$L_{total}=L_{task}+\lambda L_{reg}$$

wherein $L_{total}$ is the total loss function, $L_{task}$ is the loss data, $L_{reg}$ is the quantization error regularization term, and $\lambda$ is a coefficient.
Further, the loss data model includes:
$$L_{task}=-\frac{1}{N}\sum_{j=1}^{N} y_j^{\top}\log\hat{y}_j$$

wherein $L_{task}$ is the loss data, $N$ is the number of input pictures for this iteration, $x_j$ is the $j$th picture, $y_j$ is the true category vector of the $j$th picture, and $\hat{y}_j$ is the category vector predicted by the model for the $j$th picture (i.e. for input $x_j$).
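Assuming the loss data model above is the usual cross-entropy over one-hot category vectors (an assumption consistent with the variable definitions), a short PyTorch sketch:

```python
import torch

def loss_data(logits, y_true):
    """Mean classification loss over the N pictures of this iteration.
    logits: (N, 1000) model outputs; y_true: (N, 1000) one-hot true vectors."""
    log_probs = torch.log_softmax(logits, dim=1)    # log of predicted category vector
    return -(y_true * log_probs).sum(dim=1).mean()  # -(1/N) sum_j y_j . log y_hat_j
```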
A system for improving performance of a low bit quantization model comprising:
a first module for iterating a preset image matrix and a preset first weight; performing iterative computation on the image matrix and the first weight by adopting a preset algorithm to obtain an image vector, wherein the image vector is the category prediction of the image matrix;
a second module, configured to establish a loss data model according to the image vector and a preset category label, where the loss data model is used to represent a loss function of the image matrix;
a third module, configured to quantize the first weight to obtain a second weight, and to calculate the first weight and the second weight by adopting a quantization error regularization term computation model to obtain quantization error regularization term data of the first weight;
a fourth module, configured to regard the quantization error regularization term data as a constraint on the data distribution of the first weight and obtain a constraint result, wherein the constraint result is the lowest quantization error value of the first weight; to carry out weighted summation of the constraint result and the loss data model to obtain a total loss function model; to perform inverse gradient propagation on the total loss function model to obtain an optimized data result of the first weight, wherein the optimized data result is the result of subtracting the gradient value obtained in the inverse gradient propagation from the first weight; and to take the optimized data result of the first weight as the preset first weight and iterate the four modules repeatedly until the neural network converges.
Further, the quantization error regularization term computation model includes:
$$L_{reg}=\sum_{i=1}^{n}\frac{\left\|W_i-Q(W_i)\right\|_2^2}{m_i}$$

wherein $L_{reg}$ is the quantization error regularization term, $n$ is the total number of weight tensors in the model, $W_i$ is the $i$th weight in the model, $Q(W_i)$ is the value of the model weight $W_i$ after quantization, and $m_i$ is the number of parameters of the model weight $W_i$.
Further, the total loss function model includes:
$$L_{total}=L_{task}+\lambda L_{reg}$$

wherein $L_{total}$ is the total loss function, $L_{task}$ is the loss data, $L_{reg}$ is the quantization error regularization term, and $\lambda$ is a coefficient used to balance the orders of magnitude of $L_{task}$ and $L_{reg}$.
Further, the loss data model includes:
$$L_{task}=-\frac{1}{N}\sum_{j=1}^{N} y_j^{\top}\log\hat{y}_j$$

wherein $L_{task}$ is the loss data, $N$ is the number of input pictures for this iteration, $x_j$ is the $j$th picture, $y_j$ is the true category vector of the $j$th picture, and $\hat{y}_j$ is the category vector predicted by the model for the $j$th picture.
According to the above technical scheme, the method for improving the performance of the low bit quantization model comprises the following steps: performing iterative computation on a preset image matrix and a preset first weight by adopting a preset algorithm to obtain an image vector, wherein the image vector is the category prediction of the image matrix; establishing a loss data model according to the image vector and a preset class label, wherein the loss data model is used for representing a loss function of the image matrix; quantizing the first weight to obtain a second weight; calculating the first weight and the second weight by adopting a quantization error regularization term computation model to obtain quantization error regularization term data of the first weight; regarding the quantization error regularization term data as a constraint on the data distribution of the first weight and obtaining a constraint result, wherein the constraint result is the lowest quantization error value of the first weight; carrying out weighted summation of the constraint result and the loss function of the image matrix to obtain a total loss function model; performing inverse gradient propagation on the total loss function model to obtain an optimized data result of the first weight, wherein the optimized data result is the result of subtracting the gradient value obtained in the inverse gradient propagation from the first weight; and taking the optimized data result of the first weight as the preset first weight and repeating the iteration until the neural network converges.
In practical application, the method for improving the performance of the low bit quantization model uses the quantization error regularization term and thus avoids the unstable training and difficult convergence of quantization-aware training; meanwhile, the quantization error regularization term can be added directly in the fine-tuning stage of the model and requires less computation than quantization-aware training; finally, the quantization error regularization term optimizes only the model weights, does not conflict with quantization methods for intermediate-layer outputs, and further improves the performance of the quantized model.
Drawings
In order to explain the technical solution of the present application more clearly, the drawings needed in the embodiments are briefly described below; those skilled in the art can obtain other drawings from these drawings without creative effort.
Fig. 1 is a flowchart illustrating a method for improving performance of a low bit quantization model according to an embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some but not all embodiments of the present application. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations.
In the description of the present application, it is also to be noted that, unless explicitly stated or limited otherwise, the term "connected" is to be understood in a broad sense, e.g. electrically, but also communicatively, connected. The specific meaning of the above terms in the present application can be understood in a specific case by those of ordinary skill in the art.
Furthermore, the terms "comprises" and "comprising," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a product or device that comprises a list of elements is not necessarily limited to those elements explicitly listed, but may include other elements not expressly listed or inherent to such product or device.
This application aims to solve the problems of the prior art that quantization-aware training makes network training unstable, slows convergence during training, requires much computation and degrades the result. Referring to fig. 1, which is a schematic flowchart of a method for improving the performance of a low bit quantization model according to an embodiment of the present application: in a first aspect, an embodiment of the present application provides a method for improving the performance of a low bit quantization model, including: performing iterative computation on a preset image matrix and a preset first weight by adopting a preset algorithm to obtain an image vector, wherein the image vector is the category prediction of the image matrix.
In some embodiments of the present application, the image matrix is represented as a tensor of dimensions $N \times 3 \times h \times w$, where $N$ is the number of pictures, 3 indicates that each picture has three RGB channels, and $h$ and $w$ are the height and width of each picture respectively. For an image classification task, each picture has a class expressed as a number from 1 to 1000, and the class label is expressed as an $N \times 1000$ matrix whose rows are one-hot vectors: if a picture belongs to class 1, the first dimension of its label vector is 1 and the rest are 0. The first weight is represented as a set $W=\{W_1, W_2, \ldots, W_n\}$, where $W_i$ is a tensor holding the weight data of the $i$th operation; for example, the weight of a convolution layer has size $c_{out} \times c_{in} \times k \times k$, where $c_{out}$ is the number of output channels, $c_{in}$ is the number of input channels, and $k$ is the size of the convolution kernel.
In some embodiments of the present application, a preset algorithm is adopted to perform iterative computation on the initial data and the first weight to obtain an image vector, where the image vector is the category prediction of the initial data; the preset algorithm is convolution multiplication or matrix multiplication. Specifically, the initial data and the first weight are multiplied by convolution or matrix multiplication to obtain output data, the output data is passed through an activation function and serves as the new input, iterative computation continues with the remaining first weights, and finally a 1000-dimensional image vector is output as the category prediction of the initial data.
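A minimal sketch of this forward computation in PyTorch (the helper name forward and the global-average-pool step before the final matrix multiplication are illustrative assumptions; the patent only specifies convolution or matrix multiplication followed by an activation):

```python
import torch
import torch.nn.functional as F

def forward(x, conv_weights, fc_weight):
    """x: (N, 3, h, w) image matrix; conv_weights: list of (c_out, c_in, k, k)
    tensors whose channel counts chain together; fc_weight: (c_last, 1000).
    Returns the 1000-dimensional category-prediction vector per image."""
    out = x
    for w in conv_weights:
        out = F.relu(F.conv2d(out, w, padding=w.shape[-1] // 2))  # conv + activation
    out = out.mean(dim=(2, 3))   # collapse spatial dims (assumed pooling step)
    return out @ fc_weight       # final matrix multiplication -> (N, 1000)
```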
In some embodiments of the present application, a loss data model is established according to the image vector and a preset class label, where the loss data model is used to represent a loss function of the image matrix; specifically, the image vector represents the model's predicted probability that a picture belongs to each of the 1000 categories, consistent with the ordinary training process. In this method, a regularization term is added after the classification task loss is calculated, and a scaling coefficient is added to adjust the influence of the regularization term on training; the distribution of the weight parameters can thus be constrained during training so that they have a lower quantization error under that distribution, i.e. the parameters are closer to their quantized values. The regularization term can be added directly in the fine-tuning stage of the model and requires less computation than quantization-aware training.
In some embodiments of the application, the quantization error regularization term added during training has little influence on the performance of the full-precision floating-point model, so it can be added during fine-tuning for a specific application task (a fine-tuning step already performed in full-precision model deployment, without regard to quantization), and no separate stage needs to be set up for quantization. Moreover, because the regularization term is computed once per iteration rather than once per input sample (each iteration may feed in tens to hundreds of inputs), its computational cost is low.
In some embodiments of the application, a loss data model is established according to the image vector and a preset category label, and the quantization error regularization term is added to it to obtain a total loss function; the gradient of the total loss function with respect to each weight is calculated according to the chain rule, and the weights are then updated along the gradient to reduce the total loss function. Updating with the task loss moves the model weights toward more accurate classification, while updating with the quantization error loss term gradually moves the model weights closer to their quantized values, reducing the quantization error of the model and the performance loss caused by quantization.
In some embodiments of the present application, after a weight quantization algorithm is chosen and quantization and dequantization code is implemented, the first weight is quantized to obtain a second weight; the first weight and the second weight are then fed into the quantization error regularization term computation model to obtain the quantization error regularization term data of the first weight, i.e. the L2 norm of the change in the weights before and after quantization. Specifically, the back-propagation process is the same as for a floating-point model: only the quantization error regularization term is added to the loss function, and all operations involved are differentiable.
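For illustration, a hedged sketch of one possible quantize-dequantize routine (uniform symmetric per-tensor quantization; the patent does not prescribe a specific scheme, so the bit width, the max-based scale and the function name are assumptions):

```python
import torch

def quantize_dequantize(w, num_bits=4):
    """Quantize w to signed num_bits integers and map back to floats, so the
    change (w - w_q) can be measured with an L2 norm in the same space as w."""
    qmax = 2 ** (num_bits - 1) - 1                # e.g. 7 for int4
    scale = w.abs().max().clamp(min=1e-8) / qmax  # per-tensor step size
    w_int = torch.round(w / scale).clamp(-qmax - 1, qmax)
    return w_int * scale                          # dequantized second weight
```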
In some embodiments of the present application, the quantization error regularization term data is regarded as a constraint on the data distribution of the first weight; a constraint result is obtained, wherein the constraint result is the lowest quantization error value of the first weight; the constraint result and the loss data are weighted and summed to obtain a total loss function model; and inverse gradient propagation is performed on the total loss function model to obtain an optimized data result of the first weight, wherein the optimized data result is the result of subtracting the gradient value obtained in the inverse gradient propagation from the first weight.
In some embodiments of the present application, image classification is performed on the preset initial data with 6 models in total, including DeiT-tiny, DeiT-base, Swin-tiny and ViT-base, all of which are image deep neural network models based on the Transformer structure. The data set used for model fine-tuning is ImageNet, with 1000 classes and about 1,000,000 pictures in total; testing is performed on the ImageNet data set, as shown in table 1:
Table 1: Performance of the models
Specifically, in this embodiment table 1 compares the performance of the 6 models. Except for the floating-point precision column (the performance of the model before quantization), the weights, computations and intermediate values of all models are quantized: the weights are quantized to 4-bit precision (int4), the attention weights (important components in the Transformer structure, belonging to the intermediate values) are quantized to 4-bit precision, and everything else is quantized to 8-bit precision. The numbers in the table are the image classification accuracy of the models on the ImageNet data set, and "4-bit fine-tuning" denotes the performance of 4-bit quantization after fine-tuning with the proposed quantization error regularization term. It can be seen that direct 4-bit quantization of the full-precision model causes a large loss of model performance, while quantizing a model fine-tuned with the quantization error regularization term clearly improves the performance of the quantized model. Meanwhile, the quantization error regularization term optimizes only the model weights and does not conflict with the many quantization methods aimed at the outputs of the model's intermediate layers, so they can be used simultaneously to further improve the performance of the quantized model.
As can be seen from the above technical solutions, the method for improving the performance of the low bit quantization model provided by the present application includes: performing iterative computation on a preset image matrix and a preset first weight by adopting a preset algorithm to obtain an image vector, wherein the image vector is the category prediction of the image matrix; establishing a loss data model according to the image vector and a preset class label, wherein the loss data model is used for representing a loss function of the image matrix; quantizing the first weight to obtain a second weight; calculating the first weight and the second weight by adopting a quantization error regularization term computation model to obtain quantization error regularization term data of the first weight; regarding the quantization error regularization term data as a constraint on the data distribution of the first weight and obtaining a constraint result, wherein the constraint result is the lowest quantization error value of the first weight; carrying out weighted summation of the constraint result and the loss function of the image matrix to obtain a total loss function model; performing inverse gradient propagation on the total loss function model to obtain an optimized data result of the first weight, wherein the optimized data result is the result of subtracting the gradient value obtained in the inverse gradient propagation from the first weight; and using the optimized data result as the preset first weight mentioned in the first step, iterating the optimization process repeatedly until the neural network converges.
In practical application, the added quantization error regularization term computation model is suitable for various quantization methods and avoids the unstable training and difficult convergence of quantization-aware training. In the actual quantization process, the quantized weight value acts only as a leaf node in the computation graph: it does not participate in forward propagation and is not changed during the gradient descent of back-propagation. Therefore the problem of back-propagating gradients through non-differentiable operations in the various quantization schemes does not arise, and no inaccurate straight-through-estimator gradients are introduced. Moreover, the scaling coefficient in front of the regularization term adjusts its influence on the training process, letting the user balance effect and stability.
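A small sketch of this point, reusing the quantize_dequantize helper sketched earlier (shapes and names are illustrative): the quantized value sits behind .detach(), so no gradient ever flows through the rounding operation and no straight-through estimator is needed.

```python
import torch

w = torch.randn(64, 64, 3, 3, requires_grad=True)  # a model weight W_i
w_q = quantize_dequantize(w).detach()               # Q(W_i): constant leaf in the graph
reg = (w - w_q).pow(2).sum() / w.numel()            # per-tensor regularization term
reg.backward()                                      # d(reg)/dW_i = 2 (W_i - Q(W_i)) / m_i
```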
In some embodiments of the present application, the quantization error regularization term computation model includes:
$$L_{reg}=\sum_{i=1}^{n}\frac{\left\|W_i-Q(W_i)\right\|_2^2}{m_i}$$

wherein $L_{reg}$ is the quantization error regularization term, $n$ is the total number of weight tensors in the model, $W_i$ is the $i$th weight in the model, $Q(W_i)$ is the value of the model weight $W_i$ after quantization, and $m_i$ is the number of parameters of the model weight $W_i$; the lower $L_{reg}$ is, the closer the model weights are to their quantized values and the smaller the quantization error.
In some embodiments of the present application, the total loss function model includes:
$$L_{total}=L_{task}+\lambda L_{reg}$$

wherein $L_{total}$ is the total loss function, $L_{task}$ is the loss data, $L_{reg}$ is the quantization error regularization term, and $\lambda$ is a coefficient used to balance the orders of magnitude of $L_{task}$ and $L_{reg}$. In practical application, the training process continues for 5 to 15 epochs until the model converges, i.e., until the total loss function is lowest.
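Putting the pieces together, a hedged sketch of the fine-tuning loop; the coefficient value lam, the optimizer, the learning rate and the use of integer class labels with CrossEntropyLoss are illustrative assumptions, and quant_error_reg / quantize_dequantize are the helpers sketched above:

```python
import torch

def finetune(model, loader, lam=1e-2, epochs=10):
    """Minimize L_total = L_task + lam * L_reg for 5-15 epochs,
    updating each weight by subtracting the back-propagated gradient."""
    opt = torch.optim.SGD(model.parameters(), lr=1e-3)
    task_loss = torch.nn.CrossEntropyLoss()
    weights = [p for p in model.parameters() if p.dim() > 1]  # weight tensors W_i
    for _ in range(epochs):
        for x, y in loader:                       # y: integer class labels
            loss = task_loss(model(x), y)
            loss = loss + lam * quant_error_reg(weights, quantize_dequantize)
            opt.zero_grad()
            loss.backward()                       # inverse gradient propagation
            opt.step()                            # W <- W - lr * gradient
```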
In some embodiments of the present application, the loss data model includes:
$$L_{task}=-\frac{1}{N}\sum_{j=1}^{N} y_j^{\top}\log\hat{y}_j$$

wherein $L_{task}$ is the loss data, $N$ is the number of input pictures for this iteration, $x_j$ is the $j$th picture, $y_j$ is the true category vector of the $j$th picture, and $\hat{y}_j$ is the category vector predicted by the model for the $j$th picture.
In order to implement the practical application of the method, a second aspect of the embodiments of the present application further provides a system for improving the performance of a low bit quantization model, including: a first module, configured to iterate a preset image matrix and a preset first weight, performing iterative computation on the image matrix and the first weight by adopting a preset algorithm to obtain an image vector, wherein the image vector is the category prediction of the image matrix; a second module, configured to establish a loss data model according to the image vector and a preset category label, where the loss data model is used to represent a loss function of the image matrix; a third module, configured to quantize the first weight to obtain a second weight and to calculate the first weight and the second weight by adopting a quantization error regularization term computation model to obtain quantization error regularization term data of the first weight; and a fourth module, configured to regard the quantization error regularization term data as a constraint on the data distribution of the first weight and obtain a constraint result, wherein the constraint result is the lowest quantization error value of the first weight; to carry out weighted summation of the constraint result and the loss data model to obtain a total loss function model; to perform inverse gradient propagation on the total loss function model to obtain an optimized data result of the first weight, wherein the optimized data result is the result of subtracting the gradient value obtained in the inverse gradient propagation from the first weight; and to use the optimized data result as the preset first weight mentioned in the first step, iterating the optimization process repeatedly until the neural network converges.
In some embodiments of the present application, the quantization error regularization term computation model includes:
$$L_{reg}=\sum_{i=1}^{n}\frac{\left\|W_i-Q(W_i)\right\|_2^2}{m_i}$$

wherein $L_{reg}$ is the quantization error regularization term, $n$ is the total number of weight tensors in the model, $W_i$ is the $i$th weight in the model, $Q(W_i)$ is the value of the model weight $W_i$ after quantization, and $m_i$ is the number of parameters of the model weight $W_i$.
In some embodiments of the present application, the total loss function model includes:
$$L_{total}=L_{task}+\lambda L_{reg}$$

wherein $L_{total}$ is the total loss function, $L_{task}$ is the loss data, $L_{reg}$ is the quantization error regularization term, and $\lambda$ is a coefficient used to balance the orders of magnitude of $L_{task}$ and $L_{reg}$.
In some embodiments of the present application, the loss data model includes:
$$L_{task}=-\frac{1}{N}\sum_{j=1}^{N} y_j^{\top}\log\hat{y}_j$$

wherein $L_{task}$ is the loss data, $N$ is the number of input pictures for this iteration, $x_j$ is the $j$th picture, $y_j$ is the true category vector of the $j$th picture, and $\hat{y}_j$ is the category vector predicted by the model for the $j$th picture.
According to the above technical scheme, the method for improving the performance of the low bit quantization model performs iterative computation on a preset image matrix and a preset first weight by adopting a preset algorithm to obtain an image vector, the image vector being the category prediction of the image matrix; establishes a loss data model according to the image vector and a preset class label, the loss data model being used to represent a loss function of the image matrix; quantizes the first weight to obtain a second weight; calculates the first weight and the second weight by adopting a quantization error regularization term computation model to obtain quantization error regularization term data of the first weight; regards the quantization error regularization term data as a constraint on the data distribution of the first weight and obtains a constraint result, the constraint result being the lowest quantization error value of the first weight; carries out weighted summation of the constraint result and the loss function of the image matrix to obtain a total loss function model; and performs inverse gradient propagation on the total loss function model to obtain an optimized data result of the first weight, the optimized data result being the result of subtracting the gradient value obtained in the inverse gradient propagation from the first weight.
In practical application, the method for improving the performance of the low bit quantization model uses the quantization error regularization term and thus avoids the unstable training and difficult convergence of quantization-aware training; meanwhile, the quantization error regularization term can be added directly in the fine-tuning stage of the model and requires less computation than quantization-aware training; finally, the quantization error regularization term optimizes only the model weights, does not conflict with quantization methods for intermediate-layer outputs, and further improves the performance of the quantized model.
The present application has been described in detail with reference to specific embodiments and illustrative examples, but the description is not intended to limit the application. Those skilled in the art will appreciate that various equivalent substitutions, modifications or improvements may be made to the presently disclosed embodiments and implementations thereof without departing from the spirit and scope of the present disclosure, and these fall within the scope of the present disclosure.
Claims (8)
1. A method for improving performance of a low bit quantization model, comprising:
performing iterative computation on a preset image matrix and a preset first weight by adopting a preset algorithm to obtain an image vector, wherein the image vector is the category prediction of the image matrix;
establishing a loss data model according to the image vector and a preset class label, wherein the loss data model is used for representing a loss function of the image matrix;
quantizing the first weight to obtain a second weight;
calculating the first weight and the second weight by adopting a quantization error regularization term computation model to obtain quantization error regularization term data of the first weight;
regarding the quantization error regularization term data as a constraint on the data distribution of the first weight; obtaining a constraint result, wherein the constraint result is the lowest quantization error value of the first weight;
carrying out weighted summation on the constraint result and the loss function of the image matrix to obtain a total loss function model;
performing inverse gradient propagation on the total loss function model to obtain an optimized data result of the first weight, wherein the optimized data result is a result obtained by subtracting a gradient value obtained in the inverse gradient propagation from the first weight;
and taking the optimized data result of the first weight as a preset first weight, and repeating iteration until the neural network converges.
2. A method for improving the performance of a low bit quantization model according to claim 1, wherein the quantization error regularization term computation model comprises:
$$L_{reg}=\sum_{i=1}^{n}\frac{\left\|W_i-Q(W_i)\right\|_2^2}{m_i}$$

wherein $L_{reg}$ is the quantization error regularization term, $n$ is the total number of weight tensors in the model, $W_i$ is the $i$th weight in the model, $Q(W_i)$ is the value of the model weight $W_i$ after quantization, and $m_i$ is the number of parameters of the model weight $W_i$.
4. The method of claim 1, wherein the loss data model comprises:
$$L_{task}=-\frac{1}{N}\sum_{j=1}^{N} y_j^{\top}\log\hat{y}_j$$

wherein $L_{task}$ is the loss data, $N$ is the number of input pictures for this iteration, $x_j$ is the $j$th picture, $y_j$ is the true category vector of the $j$th picture, and $\hat{y}_j$ is the category vector predicted by the model for the $j$th picture.
5. A system for improving performance of a low bit quantization model, comprising:
a first module for iterating a preset image matrix and a preset first weight; performing iterative computation on the image matrix and the first weight by adopting a preset algorithm to obtain an image vector, wherein the image vector is the category prediction of the image matrix;
a second module, configured to establish a loss data model according to the image vector and a preset category label, where the loss data model is used to represent a loss function of the image matrix;
a third module that quantizes the first weight to obtain a second weight and calculates the first weight and the second weight by adopting a quantization error regularization term computation model to obtain quantization error regularization term data of the first weight;
a fourth module that treats the quantization error regularization term data as a constraint result for the data distribution of the first weight; obtaining a constraint result, wherein the constraint result is the lowest quantization error value of the first weight; carrying out weighted summation on the constraint result by using the loss data model to obtain a total loss function model; performing inverse gradient propagation on the total loss function model to obtain an optimized data result of the first weight, wherein the optimized data result is a result obtained by subtracting a gradient value obtained in the inverse gradient propagation from the first weight; and taking the optimized data result of the first weight as a preset first weight, and repeating iteration until the neural network converges.
6. A system for improving the performance of a low bit quantization model according to claim 5, wherein the quantization error regularization term computation model comprises:
$$L_{reg}=\sum_{i=1}^{n}\frac{\left\|W_i-Q(W_i)\right\|_2^2}{m_i}$$

wherein $L_{reg}$ is the quantization error regularization term, $n$ is the total number of weight tensors in the model, $W_i$ is the $i$th weight in the model, $Q(W_i)$ is the value of the model weight $W_i$ after quantization, and $m_i$ is the number of parameters of the model weight $W_i$.
7. A system for improving performance of a low bit quantization model according to claim 5, wherein the total loss function model comprises:

$$L_{total}=L_{task}+\lambda L_{reg}$$

wherein $L_{total}$ is the total loss function, $L_{task}$ is the loss data, $L_{reg}$ is the quantization error regularization term, and $\lambda$ is a coefficient used to balance the orders of magnitude of $L_{task}$ and $L_{reg}$.
8. The system of claim 5, wherein the loss data model comprises:
$$L_{task}=-\frac{1}{N}\sum_{j=1}^{N} y_j^{\top}\log\hat{y}_j$$

wherein $L_{task}$ is the loss data, $N$ is the number of input pictures for this iteration, $x_j$ is the $j$th picture, $y_j$ is the true category vector of the $j$th picture, and $\hat{y}_j$ is the category vector predicted by the model for the $j$th picture.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210400848.8A CN114511069A (en) | 2022-04-18 | 2022-04-18 | Method and system for improving performance of low bit quantization model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210400848.8A CN114511069A (en) | 2022-04-18 | 2022-04-18 | Method and system for improving performance of low bit quantization model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114511069A (en) | 2022-05-17
Family
ID=81554833
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210400848.8A Pending CN114511069A (en) | 2022-04-18 | 2022-04-18 | Method and system for improving performance of low bit quantization model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114511069A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117689044A (en) * | 2024-02-01 | 2024-03-12 | 厦门大学 | Quantification method suitable for vision self-attention model |
- 2022-04-18 CN CN202210400848.8A patent/CN114511069A/en active Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108510067B (en) | Convolutional neural network quantification method based on engineering realization | |
CN110969251B (en) | Neural network model quantification method and device based on label-free data | |
US20230004813A1 (en) | Jointly pruning and quantizing deep neural networks | |
CN111489364A (en) | Medical image segmentation method based on lightweight full convolution neural network | |
CN113610227B (en) | Deep convolutional neural network pruning method for image classification | |
US11531884B2 (en) | Separate quantization method of forming combination of 4-bit and 8-bit data of neural network | |
CN114580281A (en) | Model quantization method, apparatus, device, storage medium, and program product | |
US20210294874A1 (en) | Quantization method based on hardware of in-memory computing and system thereof | |
CN114511069A (en) | Method and system for improving performance of low bit quantization model | |
CN114756517A (en) | Visual Transformer compression method and system based on micro-quantization training | |
CN111937011A (en) | Method and equipment for determining weight parameters of neural network model | |
CN110288002B (en) | Image classification method based on sparse orthogonal neural network | |
CN112686384A (en) | Bit-width-adaptive neural network quantization method and device | |
CN112766492A (en) | Model processing method and device, electronic equipment and storage medium | |
CN116634162A (en) | Post-training quantization method for rate-distortion optimized image compression neural network | |
CN114830137A (en) | Method and system for generating a predictive model | |
US20200372363A1 (en) | Method of Training Artificial Neural Network Using Sparse Connectivity Learning | |
CN112766537A (en) | Short-term electric load prediction method | |
CN114139678A (en) | Convolutional neural network quantization method and device, electronic equipment and storage medium | |
CN112488291A (en) | Neural network 8-bit quantization compression method | |
CN112508194B (en) | Model compression method, system and computing equipment | |
US20230385600A1 (en) | Optimizing method and computing apparatus for deep learning network and computer-readable storage medium | |
WO2024060727A1 (en) | Method and apparatus for training neural network model, and device and system | |
Zhen et al. | A Secure and Effective Energy-Aware Fixed-Point Quantization Scheme for Asynchronous Federated Learning. | |
CN113627595B (en) | Probability-based MobileNet V1 network channel pruning method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||
RJ01 | Rejection of invention patent application after publication ||
Application publication date: 20220517 |