CN110276451A - Weight-normalization-based deep neural network compression method - Google Patents
Weight-normalization-based deep neural network compression method Download PDF Info
- Publication number
- CN110276451A (application CN201910575103.3A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/082—Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
Abstract
The invention discloses a weight-normalization-based deep neural network compression method. The quantization of the weights is decomposed into three steps: the weights are first normalized, then quantized to extremely low bit widths based on minimum quantization error, and finally the quantized weights are denormalized; the forward propagation of the neural network is carried out with the quantized weights. To train the quantized network, the derivative of the step-shaped quantization function is approximated so that the gradients of the network can be back-propagated, and the gradients are accumulated in the floating-point weights. The weight-normalization-based quantization mitigates the long-tail distribution of the weights to a certain extent, which reduces the quantization error and thereby improves model performance. The invention can compress the weights of mainstream neural networks to 2 or 3 bits while keeping the loss in model performance small.
Description
Technical field
The invention provides a weight-normalization-based deep neural network compression method. It relates to compressing (quantizing) the parameters of neural networks: the weights of a model can be quantized to extremely low bit widths (2 or 3 bits). It is applicable to compressing the parameters of various mainstream neural networks, such as ResNet and MobileNet, so that the models can be deployed on mobile devices.
Background art
With the development of deep learning, deep neural networks have become the dominant models in machine learning. However, deep learning models have a large number of parameters and require great computational cost, which hinders their deployment on mobile and embedded devices. According to existing research, deep neural networks contain substantial redundancy, so the parameters of a model can be greatly compressed while the performance of the model does not decline significantly.
Weight quantization is a principal method of model compression. Although existing methods can quantize the parameters of a model to 8 bits with almost no performance loss, quantizing the weights to extremely low bit widths often brings a large loss of performance.
A mainstream weight quantization approach quantizes the weights by minimizing the quantization error, but this approach is affected by the long-tail distribution of the weights, which leads to a large relative quantization error and in turn a loss of quantized-model performance.
Summary of the invention
Object of the invention: current weight quantization methods based on minimum quantization error are affected by the long-tail distribution of the weights and can incur a large relative quantization error. In view of this problem, the invention provides a weight-normalization-based deep neural network compression method. The weights are first normalized, then quantized based on minimum quantization error, and finally the quantized weights are denormalized. At the same time, the derivative of the step-shaped quantization function is approximated so that back-propagation through the network can proceed normally. Because the invention normalizes the weights by the maximum-magnitude element, that element receives a gradient different from the other elements, so in every iteration the maximum-magnitude element moves quickly toward zero. After many iterations the long-tail distribution of the weights is therefore weakened, a smaller relative quantization error is obtained, and the performance loss of the quantized model is reduced.
Technical solution: a weight-normalization-based deep neural network compression method. During forward propagation of the neural network, the weights are first normalized, then quantized based on minimum quantization error, then the quantized weights are denormalized, and forward propagation is carried out with the quantized weights. During back-propagation, the derivative of the step-shaped quantization function is approximated so that gradients can be back-propagated, enabling end-to-end training; the gradients are accumulated in the floating-point weights.
The forward propagation first normalizes the weights, then quantizes them based on minimum quantization error, and then denormalizes the quantized weights. The specific steps are as follows:
Step 100: obtain the parameters of a pre-trained full-precision model; for each filter of every convolutional layer and fully connected layer, vectorize the parameters to obtain w ∈ R^M.
Step 101: normalize the parameters w, using the maximum-magnitude element of w to normalize each element of w into [-1, 1], i.e. ŵ = w / max(|w|).
Step 102: solve for the optimal quantization basis α by minimizing the quantization error with respect to ŵ, obtaining the corresponding set of quantized values V(α).
Step 103: quantize the normalized weights ŵ to obtain ŵ_q, i.e. ŵ_q = Π_{V(α)}(ŵ); the projection function (quantization function) Π_{V(α)}(·) projects each element of ŵ onto the set of quantized values V(α).
Step 104: denormalize the quantized normalized weights ŵ_q to obtain the quantized weights w_q, so that the quantized weights have the same magnitude as the original parameters w, i.e. w_q = ŵ_q · detach(max(|w|)); the detach(·) operation treats its argument as a constant.
Step 105: convolve (or, for a fully connected layer, multiply) the obtained quantized weights w_q with the input x of this layer of the neural network to obtain the layer output y.
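The three-step forward pass (steps 100–105) can be sketched in NumPy. This is an illustrative reconstruction, not the patent's implementation: the function and variable names are ours, and the projection Π_{V(α)} is implemented as a nearest-value search over V(α).

```python
import numpy as np

def quantize_weights(w, alpha):
    """Forward quantization sketch: normalize (step 101), project onto
    V(alpha) (step 103), denormalize (step 104). `alpha` is the
    quantization basis, one scalar per bit."""
    scale = np.max(np.abs(w))                 # maximum-magnitude element of w
    w_hat = w / scale                         # step 101: normalize into [-1, 1]
    K = len(alpha)
    # enumerate all 2^K binary codes e_l in {-1, 1}^K
    codes = np.array([[1.0 if (l >> k) & 1 else -1.0 for k in range(K)]
                      for l in range(2 ** K)])
    values = codes @ alpha                    # quantized-value set V(alpha)
    # step 103: project each normalized weight to its nearest value in V(alpha)
    idx = np.argmin(np.abs(w_hat[:, None] - values[None, :]), axis=1)
    w_hat_q = values[idx]
    # step 104: denormalize so w_q has the same magnitude as w (scale detached)
    return w_hat_q * scale

w = np.array([0.9, -0.45, 0.1, -0.02])
wq = quantize_weights(w, alpha=np.array([0.5]))   # K = 1, V(alpha) = {-0.5, 0.5}
# -> [0.45, -0.45, 0.45, -0.45]
```

With K = 1 the projection reduces to a scaled sign function; larger K enumerates richer value sets such as {-1.0, -0.5, 0.5, 1.0}.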
The back-propagation approximates the derivative of the step-shaped quantization function so that the neural network can be back-propagated. The specific steps are as follows:
Step 200: through back-propagation of the gradient, obtain the gradient ∂L/∂w_q of the neural network loss function L with respect to the quantized weights w_q.
Step 201: through back-propagation of the gradient, obtain the gradient with respect to the quantized normalized weights ŵ_q.
Step 202: approximate the gradient of the step function Π(·), i.e. ∂Π/∂ŵ ≈ 1, obtaining the gradient with respect to the normalized weights ŵ.
Step 203: through back-propagation of the gradient, obtain the gradient with respect to the original floating-point parameters w, where w_i is the i-th element of w.
Step 204: update the floating-point weights w using the gradient ∂L/∂w.
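Steps 200–204 can be sketched as follows. This is a plausible NumPy reconstruction under the straight-through estimator (∂Π/∂ŵ ≈ 1) with the denormalization scale detached; the extra gradient term for the maximum-magnitude element follows from differentiating ŵ = w / max(|w|) and is consistent with the summary's claim that this element is pulled toward zero, but the patent's exact step-203 formula (an image in the original) may differ.

```python
import numpy as np

def ste_backward(grad_wq, w):
    """Backward sketch (steps 200-204): straight-through estimator.
    The projection Pi gets derivative ~1 (step 202) and the
    denormalization scale detach(max|w|) is treated as a constant.
    Differentiating the normalization w_hat = w / max(|w|) gives the
    maximum-magnitude element an extra gradient term (our plausible
    reconstruction of step 203)."""
    s = np.max(np.abs(w))
    m = np.argmax(np.abs(w))          # index of the maximum-magnitude element
    grad_w_hat = grad_wq * s          # steps 201-202: through w_q = s * w_hat_q, dPi ~ 1
    grad_w = grad_w_hat / s           # step 203: d(w_i / s) / d(w_i) = 1 / s
    # extra term for w_m: d(w_j / s) / d(w_m) = -sign(w_m) * w_j / s^2
    grad_w[m] -= np.sign(w[m]) * np.sum(w * grad_w_hat) / s ** 2
    return grad_w                     # step 204: w <- w - lr * grad_w
```

The extra term couples every element's gradient into the maximum-magnitude element, which is the mechanism the summary credits with weakening the long-tail distribution.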
The optimization objective for solving the optimal quantization basis α by minimizing the quantization error with respect to ŵ is:
min_{α,B} ||ŵ − Bα||₂²
where M is the dimension of the weight vector, K is the bit width of the quantization, B ∈ {−1, 1}^{M×K} is the binary coding of ŵ, and α ∈ R^K is the quantization basis. α defines the set V(α) of all quantized values, V(α) = {α^T e_l | 1 ≤ l ≤ 2^K}, where the vectors e_l ∈ {−1, 1}^K enumerate all K-bit binary codes. The objective is solved by alternating optimization: first fix B and compute the optimal α, at which point the problem reduces to a regression problem, yielding a new α; then fix α and compute the optimal B, at which point the problem becomes a projection problem, i.e. each element of ŵ is projected onto the set V(α) and its binary code becomes the new B. This process is iterated until convergence.
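The alternating optimization can be sketched as follows. This is an illustrative NumPy version: the initialization heuristic and the round count are our choices, and the α-update uses a least-squares solve as the "regression problem".

```python
import numpy as np

def solve_basis(w_hat, K=2, rounds=20):
    """Alternating-optimization sketch for min_{alpha,B} ||w_hat - B @ alpha||^2
    with B in {-1,1}^{MxK}. Fixing alpha, B is a projection onto V(alpha);
    fixing B, alpha has a closed-form least-squares solution."""
    codes = np.array([[1.0 if (l >> k) & 1 else -1.0 for k in range(K)]
                      for l in range(2 ** K)])        # all 2^K codes e_l
    alpha = np.array([np.mean(np.abs(w_hat)) / 2 ** k for k in range(K)])
    for _ in range(rounds):
        values = codes @ alpha                        # current V(alpha)
        # fix alpha: project each element to its nearest value -> new B
        idx = np.argmin(np.abs(w_hat[:, None] - values[None, :]), axis=1)
        B = codes[idx]
        # fix B: least-squares ("regression") solve -> new alpha
        alpha, *_ = np.linalg.lstsq(B, w_hat, rcond=None)
    return alpha, B
```

In practice a convergence check on α (rather than a fixed round count) matches the "iterate until convergence" wording more closely.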
The overall flow of model training is as follows: for each filter parameter w in the convolutional and fully connected layers of the neural network, compute the quantized parameters w_q and use w_q for the forward propagation of the network; compute the loss function L from the forward pass, then back-propagate the gradients, approximating the derivative of the step function Π(·), to finally obtain the gradient with respect to the floating-point parameters w and update w. Training requires many iterations until convergence. The final model only needs to store the binary coding corresponding to the quantized normalized weights ŵ_q and the quantization basis for prediction; the floating-point parameters w need not be stored.
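A toy single-layer training iteration under this flow might look like the following sketch. All of the assumptions here are ours, not the patent's: a single linear layer, squared loss, a 1-bit stand-in quantizer, and the max-element gradient term omitted so that the normalization scales cancel in the straight-through estimator.

```python
import numpy as np

def train_step(w, x, target, lr=0.01):
    """One toy training iteration: quantize, forward with w_q,
    back-propagate through the simplified STE, and update the
    FLOATING-POINT weights (gradients accumulate in floats)."""
    s = np.max(np.abs(w))
    w_hat = w / s                                  # normalize
    w_hat_q = np.where(w_hat >= 0, 0.5, -0.5)      # 1-bit stand-in projection
    wq = w_hat_q * s                               # denormalize (s detached)
    y = x @ wq                                     # forward with quantized weights
    loss = 0.5 * (y - target) ** 2                 # squared loss
    grad_wq = (y - target) * x                     # dL/dw_q
    grad_w = grad_wq                               # simplified STE: scales cancel
    return w - lr * grad_w, loss                   # updated float weights
```

Repeating this step until convergence mirrors the flow above; only the binary codes and the quantization basis would be kept for deployment.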
Beneficial effects: compared with the prior art, the weight-normalization-based neural network weight quantization method provided by the invention can quantize the weights of a neural network to extremely low bit widths while keeping the performance of the model from degrading significantly. Because the normalization uses the maximum-magnitude element, that element receives a gradient different from the other elements, so in every iteration the maximum-magnitude element moves quickly toward zero; after many iterations the long-tail distribution of the weights is weakened, a smaller quantization error is obtained, and the performance loss of the quantized model is reduced.
Brief description of the drawings
Fig. 1 is a schematic diagram of the forward and backward propagation of the quantization process of the invention;
Fig. 2 is the overall training flow chart of the quantized-weight neural network of the invention;
Fig. 3 is the solution procedure for the quantization basis α of the invention;
Fig. 4 is the prediction process using the trained quantized model;
Fig. 5 compares the floating-point weight distributions of the method of the invention and of "High-accuracy compact neural networks based on learnable quantization functions".
Specific embodiment
Combined with specific embodiments below, the present invention is furture elucidated, it should be understood that these embodiments are merely to illustrate the present invention
Rather than limit the scope of the invention, after the present invention has been read, those skilled in the art are to various equivalences of the invention
The modification of form falls within the application range as defined in the appended claims.
In the weight-normalization-based deep neural network compression method, the forward and backward propagation of the quantization process are as shown in Fig. 1. The forward pass of the quantization function: each filter weight w is first normalized to obtain ŵ; the quantization basis α is then solved based on minimum quantization error, the set of quantized values V(α) is obtained from α, and the normalized weights ŵ are quantized according to V(α), the projection function Π_{V(α)}(·) projecting each element of ŵ onto the set V(α) to obtain ŵ_q. Finally, denormalization yields the quantized weights w_q; denormalization guarantees that the magnitude of the quantized weights is consistent with that of the floating-point weights. The quantized weights w_q are used for the forward propagation of the network. The backward pass of the quantization process approximates the gradient of the step-shaped quantization function Π(·); the derivatives of the normalization and denormalization operations can be obtained directly without approximation. The gradient with respect to the floating-point weights w is finally obtained (see step 204) and accumulated in the floating-point weights, enabling end-to-end training.
In the weight-normalization-based deep neural network compression method, the overall training process is shown in Fig. 2. First a pre-trained full-precision model is obtained; then, following the forward pass of the quantizer in Fig. 1, the floating-point weights w of each filter in all convolutional and fully connected layers of the network are quantized, and the quantized weights w_q are used to forward-propagate the input training data and compute the network loss function L. The gradient with respect to the quantized weights w_q is then obtained through back-propagation of the network, the gradient with respect to the floating-point weights w is obtained following the backward pass of the quantizer in Fig. 1, and the floating-point weights w are updated; the updates are repeated until convergence. The floating-point weights w and the quantization basis α are saved, and can be further converted into the compressed (low-precision) format before model prediction.
The solution procedure for the quantization basis α is shown in Fig. 3. Alternating optimization is used: the quantization basis α is first initialized and the set V(α) of all quantized values is computed; each element of ŵ is then projected onto V(α) to obtain its binary code, which becomes the new B; from the current B and ŵ, the new closed-form optimal solution of the quantization basis α is computed. These steps are repeated for a certain number of rounds.
The process of prediction with the trained quantized model is shown in Fig. 4. The binary coding B of ŵ is first computed from the floating-point weights w obtained by training and the quantization basis α; the quantization basis after denormalization corresponding to the quantized weights w_q, α′ = α × max(|w|), is then computed. The final model only needs to store the quantization basis α′ and the binary coding B, which greatly reduces the storage overhead of the model. The inner products in the convolution and fully connected operations can then be computed as follows: since B_ij is binary with values in {−1, 1}, computing B_ij · x_i only requires taking x_i with a positive or negative sign according to the value of B_ij, without any multiplication. A large number of multiplications in the inner products can therefore be eliminated, which can accelerate inference on dedicated hardware. Moreover, the method can be combined with existing methods that also quantize the input x to low bit widths, greatly increasing the inference speed of the network.
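The multiplication-free inner product described above can be sketched as follows (illustrative only; `np.where` emulates the sign-select that dedicated hardware would perform with adds and subtracts).

```python
import numpy as np

def binary_inner_product(B, alpha_prime, x):
    """Sketch of the multiplication-free inner product w_q^T x with
    w_q = B @ alpha_prime and B in {-1,1}^(MxK). Since B[i,k]*x[i] is
    just +x[i] or -x[i], the M*K products reduce to sign-selects and
    additions; only K multiplications with the basis remain."""
    acc = np.zeros(B.shape[1])
    for k in range(B.shape[1]):
        # take +x_i or -x_i according to B[i,k] -- no products with B
        acc[k] = np.sum(np.where(B[:, k] > 0, x, -x))
    return acc @ alpha_prime

B = np.array([[1, -1], [-1, 1], [1, 1]])
x = np.array([1.0, 2.0, 3.0])
alpha_prime = np.array([0.5, 0.25])
y = binary_inner_product(B, alpha_prime, x)   # equals (B @ alpha_prime) @ x
# -> 2.0
```

The result matches the dense computation `(B @ alpha_prime) @ x`, but the per-element products with B have been replaced by sign flips.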
The invention was tested on two datasets, comparing the method of the invention with the existing state-of-the-art method, "High-accuracy compact neural networks based on learnable quantization functions", proposed by Dongqing Zhang et al. at the 2018 European Conference on Computer Vision (ECCV).
The first dataset is CIFAR-100, which contains 60K 32x32 RGB images in 100 classes with 600 images per class, split into 50K training images and 10K test images. The experimental results are shown in Table 1; the evaluation metric is Top-1 classification accuracy and the network used is ResNet20. It can be seen that the method of the invention improves considerably on the existing state of the art, and with 4-bit quantized weights can even exceed the accuracy of the full-precision model.
Table 1: Top-1 classification accuracy (%) of the invention with ResNet20 on the CIFAR-100 dataset.
Table 2: Top-1/Top-5 classification accuracy (%) of the invention with MobileNetV1 on the ImageNet dataset.
The second dataset is ImageNet, which contains 1.28M training images and 50K test images in 1000 classes. The experimental results are shown in Table 2; the evaluation metrics are Top-1 and Top-5 classification accuracy and the network used is MobileNetV1. It can be seen that the method of the invention improves considerably on the existing state of the art.
The weight distributions obtained by the weight-normalization-based deep neural network compression method are shown in Fig. 5, using 2-bit quantized weights. The upper four plots are the floating-point weight distributions obtained by the invention, corresponding to four selected convolutional/fully connected layers; the lower four plots are the floating-point weight distributions of the corresponding layers obtained by "High-accuracy compact neural networks based on learnable quantization functions". The points on the x-axis represent the average of the quantized values over all filters of the layer, and "mse" denotes the average relative quantization error over the filters of the layer. Comparing the upper and lower weight distributions shows that the invention avoids the long-tail distribution of the weights and at the same time obtains a smaller relative quantization error "mse", so the quantized model achieves better performance.
Claims (5)
1. A weight-normalization-based deep neural network compression method, characterized in that: during forward propagation of the neural network, the weights are first normalized, then quantized based on minimum quantization error, then the quantized weights are denormalized, and forward propagation is carried out with the quantized weights; during back-propagation, the derivative of the step-shaped quantization function is approximated so that gradients can be back-propagated, enabling end-to-end training, with the gradients accumulated in the floating-point weights.
2. The weight-normalization-based deep neural network compression method according to claim 1, characterized in that, during forward propagation, the weights are first normalized, then quantized based on minimum quantization error, and then the quantized weights are obtained by denormalization, with the following specific steps:
Step 100: obtain the parameters of a pre-trained full-precision model; for each filter of every convolutional layer and fully connected layer, vectorize the parameters to obtain w ∈ R^M;
Step 101: normalize the parameters w, using the maximum-magnitude element of w to normalize each element of w into [-1, 1], i.e. ŵ = w / max(|w|);
Step 102: solve for the optimal quantization basis α by minimizing the quantization error with respect to ŵ, obtaining the corresponding set of quantized values V(α);
Step 103: quantize the normalized weights ŵ to obtain ŵ_q, i.e. ŵ_q = Π_{V(α)}(ŵ), where the projection function (quantization function) Π_{V(α)}(·) projects each element of ŵ onto the set of quantized values V(α);
Step 104: denormalize the quantized normalized weights ŵ_q to obtain the quantized weights w_q, so that the quantized weights have the same magnitude as the original parameters w, i.e. w_q = ŵ_q · detach(max(|w|)), where the detach(·) operation treats its argument as a constant;
Step 105: convolve (or, for a fully connected layer, multiply) the obtained quantized weights w_q with the input x of this layer of the neural network to obtain the layer output y.
3. The weight-normalization-based deep neural network compression method according to claim 1, characterized in that, during back-propagation, the derivative of the step-shaped quantization function is approximated so that the neural network can be back-propagated, with the following specific steps:
Step 200: through back-propagation of the gradient, obtain the gradient ∂L/∂w_q of the neural network loss function L with respect to the quantized weights w_q;
Step 201: through back-propagation of the gradient, obtain the gradient with respect to the quantized normalized weights ŵ_q;
Step 202: approximate the gradient of the step function Π(·), i.e. ∂Π/∂ŵ ≈ 1, obtaining the gradient with respect to the normalized weights ŵ;
Step 203: through back-propagation of the gradient, obtain the gradient with respect to the original floating-point parameters w;
Step 204: update the floating-point weights w using the gradient ∂L/∂w.
4. The weight-normalization-based deep neural network compression method according to claim 1, characterized in that the optimization objective for solving the optimal quantization basis α by minimizing the quantization error with respect to ŵ is min_{α,B} ||ŵ − Bα||₂², where M is the dimension of the weight vector, K is the bit width of the quantization, B ∈ {−1, 1}^{M×K} is the binary coding of ŵ, and α ∈ R^K is the quantization basis; α defines the set V(α) of all quantized values, V(α) = {α^T e_l | 1 ≤ l ≤ 2^K}, where the vectors e_l ∈ {−1, 1}^K enumerate all K-bit binary codes; the objective is solved by alternating optimization: first fix B and compute the optimal α, at which point the problem reduces to a regression problem, yielding a new α; then fix α and compute the optimal B, at which point the problem becomes a projection problem, i.e. each element of ŵ is projected onto the set V(α) and its binary code becomes the new B; this process is iterated until convergence.
5. The weight-normalization-based deep neural network compression method according to claim 1, wherein the overall flow of model training is: for each filter parameter w in all convolutional and fully connected layers of the neural network, compute the quantized parameters w_q and use w_q for the forward propagation of the network; compute the loss function L from the forward pass, then back-propagate the gradients, approximating the derivative of the step function Π(·), to finally obtain the gradient with respect to the floating-point parameters w and update w; training requires many iterations until convergence; the final model only needs to store the binary coding corresponding to the quantized normalized weights ŵ_q and the quantization basis for prediction, without storing the floating-point parameters w.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910575103.3A CN110276451A (en) | 2019-06-28 | 2019-06-28 | Weight-normalization-based deep neural network compression method
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910575103.3A CN110276451A (en) | 2019-06-28 | 2019-06-28 | Weight-normalization-based deep neural network compression method
Publications (1)
Publication Number | Publication Date |
---|---|
CN110276451A true CN110276451A (en) | 2019-09-24 |
Family
ID=67962561
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910575103.3A Pending CN110276451A (en) Weight-normalization-based deep neural network compression method
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110276451A (en) |
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB2581546A (en) * | 2019-08-22 | 2020-08-26 | Imagination Tech Ltd | Methods and systems for converting weights of a deep neural network from a first number format to a second number format |
CN112418391A (en) * | 2019-08-22 | 2021-02-26 | 畅想科技有限公司 | Method and system for transforming weights of deep neural network |
GB2581546B (en) * | 2019-08-22 | 2021-03-31 | Imagination Tech Ltd | Methods and systems for converting weights of a deep neural network from a first number format to a second number format |
CN112418391B (en) * | 2019-08-22 | 2022-07-08 | 畅想科技有限公司 | Method and system for transforming weights of deep neural network |
US11188817B2 (en) | 2019-08-22 | 2021-11-30 | Imagination Technologies Limited | Methods and systems for converting weights of a deep neural network from a first number format to a second number format |
CN110851563A (en) * | 2019-10-08 | 2020-02-28 | 杭州电子科技大学 | Neighbor document searching method based on coding navigable stretch chart |
CN110851563B (en) * | 2019-10-08 | 2021-11-09 | 杭州电子科技大学 | Neighbor document searching method based on coding navigable stretch chart |
CN113795869B (en) * | 2019-11-22 | 2023-08-18 | 腾讯美国有限责任公司 | Neural network model processing method, device and medium |
CN113795869A (en) * | 2019-11-22 | 2021-12-14 | 腾讯美国有限责任公司 | Method and apparatus for quantization, adaptive block partitioning and codebook coding and decoding for neural network model compression |
CN110969251A (en) * | 2019-11-28 | 2020-04-07 | 中国科学院自动化研究所 | Neural network model quantification method and device based on label-free data |
CN110969251B (en) * | 2019-11-28 | 2023-10-31 | 中国科学院自动化研究所 | Neural network model quantification method and device based on label-free data |
WO2022135174A1 (en) * | 2020-12-24 | 2022-06-30 | 北京有竹居网络技术有限公司 | Quantization method and apparatus for text feature extraction model, and device and storage medium |
CN112686031B (en) * | 2020-12-24 | 2023-09-08 | 北京有竹居网络技术有限公司 | Quantization method, device, equipment and storage medium of text feature extraction model |
CN112686031A (en) * | 2020-12-24 | 2021-04-20 | 北京有竹居网络技术有限公司 | Text feature extraction model quantification method, device, equipment and storage medium |
CN113688990A (en) * | 2021-09-09 | 2021-11-23 | 贵州电网有限责任公司 | No-data quantitative training method for power edge calculation classification neural network |
CN113610232B (en) * | 2021-09-28 | 2022-02-22 | 苏州浪潮智能科技有限公司 | Network model quantization method and device, computer equipment and storage medium |
WO2023050707A1 (en) * | 2021-09-28 | 2023-04-06 | 苏州浪潮智能科技有限公司 | Network model quantization method and apparatus, and computer device and storage medium |
CN113610232A (en) * | 2021-09-28 | 2021-11-05 | 苏州浪潮智能科技有限公司 | Network model quantization method and device, computer equipment and storage medium |
CN114925829A (en) * | 2022-07-18 | 2022-08-19 | 山东海量信息技术研究院 | Neural network training method and device, electronic equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110276451A (en) | Weight-normalization-based deep neural network compression method | |
WO2022141754A1 (en) | Automatic pruning method and platform for general compression architecture of convolutional neural network | |
WO2020238237A1 (en) | Power exponent quantization-based neural network compression method | |
CN111612147A (en) | Quantization method of deep convolutional network | |
CN113159173A (en) | Convolutional neural network model compression method combining pruning and knowledge distillation | |
CN113595993B (en) | Vehicle-mounted sensing equipment joint learning method for model structure optimization under edge calculation | |
CN111931906A (en) | Deep neural network mixing precision quantification method based on structure search | |
CN114239861A (en) | Model compression method and system based on multi-teacher combined guidance quantification | |
CN108268950A (en) | Iterative neural network quantization method and system based on vector quantization | |
CN114626550A (en) | Distributed model collaborative training method and system | |
Wang et al. | Global aligned structured sparsity learning for efficient image super-resolution | |
Zhang et al. | ACP: adaptive channel pruning for efficient neural networks | |
CN112686384A (en) | Bit-width-adaptive neural network quantization method and device | |
Chen et al. | DNN gradient lossless compression: Can GenNorm be the answer? | |
Liu et al. | Flexi-compression: a flexible model compression method for autonomous driving | |
CN114154626B (en) | Filter pruning method for image classification task | |
CN115564987A (en) | Training method and application of image classification model based on meta-learning | |
CN116051861A (en) | Non-anchor frame target detection method based on heavy parameterization | |
CN113033653B (en) | Edge-cloud cooperative deep neural network model training method | |
CN112990336B (en) | Deep three-dimensional point cloud classification network construction method based on competitive attention fusion | |
WO2022141189A1 (en) | Automatic search method and apparatus for precision and decomposition rank of recurrent neural network | |
CN114648123A (en) | Convolutional neural network hierarchical reasoning time prediction method and device | |
CN113159318A (en) | Neural network quantification method and device, electronic equipment and storage medium | |
CN112488291B (en) | 8-Bit quantization compression method for neural network | |
CN115496200B (en) | Neural network quantization model training method, device and equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication
SE01 | Entry into force of request for substantive examination
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190924