CN108805257A - A kind of neural network quantization method based on parameter norm - Google Patents
A kind of neural network quantization method based on parameter norm
- Publication number
- CN108805257A (application number CN201810387893.8A)
- Authority
- CN
- China
- Prior art keywords
- quantization
- parameter
- neural network
- loss
- center
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Image Analysis (AREA)
Abstract
The present invention provides a neural network quantization method based on a parameter norm. The method includes: for a given pre-trained neural network parameter model, counting the values of the parameters of the layers to be quantized and dividing quantization centers accordingly; calculating, from the selected quantization centers, the quantization loss of the parameters of each layer to be quantized; adding the quantization loss to the classification error loss of the pre-trained neural network parameter model to obtain a total loss, and performing back-propagation optimization, during which the quantization centers are also updated; and, once training ends, performing the quantization operation on the corresponding layers according to the quantization centers to obtain the quantized compact model. The method can divide the weight centers automatically and, by imposing a simple quantization loss and using the same optimizer as conventional training, quantize the neural network model into a compact version of the original model, reducing network storage volume and computational complexity.
Description
Technical Field
The invention relates to the field of neural networks, in particular to a neural network quantification method based on a parameter norm.
Background
As early as the end of the last century, Yann LeCun et al. successfully used neural networks to recognize handwritten zip codes on mail. In recent years, new neural network structures have emerged one after another and achieved results far beyond those of traditional algorithms, making major breakthroughs in fields such as computer vision, speech processing and recommendation systems, and they are now widely applied in industries such as the internet, intelligent devices and security equipment.
In order to achieve better results, the network parameters are optimized and learned in a supervised manner on large-scale labeled data sets during training. Meanwhile, to learn the data more thoroughly, the corresponding network structures keep developing towards higher capacity and higher complexity. However, as the number of layers and parameters grows, both computation time and storage cost increase sharply, so that training and deployment of existing neural networks must rely on large-scale server clusters. This is impractical for mobile devices and wearable devices in the mobile-internet setting.
Several effective algorithms have been proposed for the problem of neural network compression. One well-known family of methods is quantization, which converts high-precision floating-point parameters into a low-precision representation or retains only a small number of high-precision quantization centers. However, most of these algorithms rely on conventional uniform quantization or clustering-based quantization; for example, the invention patent with grant number CN105184362B obtains quantization codebooks through K-means clustering. Such methods are not optimized jointly with the characteristics of the neural network and perform quantization purely from a mathematical and statistical point of view. As a result, the quantized model often suffers a large drop in accuracy relative to the original model and is difficult to use in practical application scenarios. How to integrate the quantization operation into the training process of the neural network has therefore become a new research direction.
Disclosure of Invention
In order to solve the above problems, a neural network quantization algorithm based on a parameter norm is provided, which combines neural network training with quantization and overcomes the problem of low accuracy after quantization.
The invention provides a neural network quantification method based on a parameter norm, which comprises the following steps:
for a given pre-training neural network parameter model, dividing a quantization center by counting the values of the parameters of the required quantization layer;
calculating a quantization loss of the parameter for each quantization layer based on the selected quantization center;
adding the quantization loss and the classification error loss of the pre-trained neural network parameter model to obtain a total loss, and performing back-propagation optimization;
judging whether the training requirement is met, if so, entering the next step, and if not, updating the quantization center;
and carrying out quantization operation on the corresponding layer according to the quantization center to obtain a quantized compression model.
By automatically dividing the quantization centers, the burden of manual analysis is reduced as far as possible and different network parameters are better accommodated. By calculating the quantization loss, the gap between the current network parameters and the quantization targets is obtained and serves as one of the terms for evaluating and optimizing the degree of quantization. By combining the quantization loss with the training loss, the neural network can maintain high accuracy while being quantized. During training, the quantization centers are continuously updated as the parameters change, which reduces the error introduced during training. After quantization training ends, the network weight parameters are quantized using the obtained quantization regions and centers, achieving the goal of model compression.
Optionally, the dividing of the quantization centers comprises the steps of:
for a neural network training model, a quantization level parameter n_l, i.e. the number of quantization centers, is given for each layer l to be quantized;
the weight parameters w_l of layer l are counted to obtain the corresponding maximum value max(w_l) and minimum value min(w_l);
different quantization centers and regions are then obtained from the maximum and minimum values.
Optionally, the different quantization centers and regions are obtained by linear division between the minimum and the maximum value, where l denotes the l-th network weight parameter layer, d_l denotes the spacing of adjacent quantization centers or regions, r_i^l denotes the end points of the different quantization regions, R_i^l denotes the range of the i-th quantization region, and c_i^l denotes the value of the i-th quantization center, i being an integer.
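The linear division can be pictured with a short sketch. The following NumPy fragment is only an illustrative reading of this step, not the claimed formula: it assumes n_l regions of equal width spanning [min(w_l), max(w_l)], with the quantization center of each region taken at its midpoint; the function name divide_centers is likewise just a label used here.

```python
import numpy as np

def divide_centers(w, n):
    """Linearly divide [min(w), max(w)] into n quantization regions.

    Returns the region end points r_0..r_n (n + 1 values) and the region
    centers c_1..c_n (n values).  Equal-width regions with mid-point
    centers are an assumption of this sketch.
    """
    w = np.asarray(w).ravel()
    w_min, w_max = w.min(), w.max()
    d = (w_max - w_min) / n                     # spacing d_l of adjacent regions
    ends = w_min + d * np.arange(n + 1)         # region end points
    centers = w_min + d * (np.arange(n) + 0.5)  # one center per region
    return ends, centers
```

Under this reading, each weight belongs to exactly one region and the spacing of adjacent centers equals the region width d_l.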
Optionally, the calculation of the loss comprises the steps of:
for the parameters w_l of layer l, the parameter norm loss is calculated according to the quantization region and the quantization center in which each parameter lies.
Optionally, w_j^l being the j-th weight parameter in w_l, the quantization region R_i^l in which it lies is found by comparing it with the end points of the quantization regions;
the L1 loss |w_j^l − c_i^l| or the L2 loss (w_j^l − c_i^l)^2 of w_j^l is computed;
summing over the weights of the layer, where m is the number of all weights in that layer, gives the L1 loss or L2 loss of the l-th layer.
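As an illustrative sketch (continuing the NumPy fragment above, with the same equal-width region assumption), the layer quantization loss could be computed as follows; normalising the sum by the number of weights m is an assumption of this sketch.

```python
import numpy as np

def quantization_loss(w, ends, centers, norm=2):
    """L1 or L2 parameter-norm loss of the layer weights w against the
    centers of the quantization regions in which they fall."""
    w = np.asarray(w).ravel()
    # index of the region containing each weight (edge values clipped into range)
    idx = np.clip(np.searchsorted(ends, w, side="right") - 1, 0, len(centers) - 1)
    diff = w - centers[idx]
    loss = np.abs(diff).sum() if norm == 1 else (diff ** 2).sum()
    return loss / w.size  # divided by m, the number of weights in the layer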
Optionally, the optimization operation comprises the steps of:
t samples {x^(1), x^(2), …, x^(t)} are selected from the neural network training set for learning and optimization of the network, where the target corresponding to x^(i) is y^(i);
the classification error loss L_c of the neural network is calculated, the total quantization loss L_q is calculated, and the two are added in a certain proportion to obtain the total loss L:
where θ is the weight parameter of the neural network model, f(x^(i); θ) is the output of the neural network, L_CE is the cross-entropy loss function, and α denotes the quantization loss scale factor;
the gradient g is calculated using the chain rule:
updating the weight parameters in sequence:
θ=θ-εg
where ε is the learning rate of the optimizer.
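To make the combination concrete, the sketch below computes the scalar L whose gradient g is then taken by the chain rule, after which θ = θ − εg is applied. It reuses quantization_loss from the sketch above; averaging the cross-entropy over the t samples and the exact weighting by α are assumptions of this sketch, not the patent's precise formula.

```python
import numpy as np

def total_loss(logits, labels, layer_weights, layer_ends, layer_centers, alpha):
    """Total loss L = classification error L_c + alpha * total quantization loss L_q.

    logits: (t, classes) network outputs f(x^(i); theta) for the t samples;
    labels: (t,) integer targets y^(i); layer_weights/ends/centers: per-layer lists.
    """
    # cross-entropy classification error L_c, averaged over the mini-batch
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    l_c = -np.log(p[np.arange(len(labels)), labels]).mean()
    # total quantization loss L_q, summed over the layers to be quantized
    l_q = sum(quantization_loss(w, e, c)
              for w, e, c in zip(layer_weights, layer_ends, layer_centers))
    return l_c + alpha * l_q
```

In practice the gradient of this scalar with respect to θ is obtained by automatic differentiation, and the update θ = θ − εg is then taken by the optimizer.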
Optionally, the quantization center update comprises the following steps:
for the parameters w_l of the l-th layer, the mean of the parameters is calculated within the quantization region in which each parameter lies;
after one round of optimization training, the mean u_i^l of the i-th quantization region is the average of the weight parameters lying in that region;
where m_i^l denotes the number of weight parameters in the i-th quantization region, r_i^l the end points of the different quantization regions, and c_i^l the value of the i-th quantization center, i being an integer.
Optionally, the i-th quantization center is updated by combining its previous value with the region mean u_i^l, where β is the quantization center update rate parameter.
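One common way to realise such an update is an exponential-moving-average step, sketched below; the exact form c_i ← (1 − β)·c_i + β·mean_i is an assumption of this sketch, as is leaving regions that currently contain no weights unchanged.

```python
import numpy as np

def update_centers(w, ends, centers, beta):
    """Move each quantization center towards the mean of the weights that
    currently fall in its region (assumed form: c_i <- (1-beta)*c_i + beta*mean_i)."""
    w = np.asarray(w).ravel()
    idx = np.clip(np.searchsorted(ends, w, side="right") - 1, 0, len(centers) - 1)
    new_centers = np.array(centers, dtype=float)
    for i in range(len(centers)):
        members = w[idx == i]
        if members.size:                      # regions with no weights keep their old center
            new_centers[i] = (1 - beta) * centers[i] + beta * members.mean()
    return new_centers
```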
Optionally, the quantization operation comprises the steps of:
the parameters of each layer are quantized in sequence according to the quantization region and the quantization center in which each parameter lies.
Optionally, in the quantization operation each parameter is replaced by the value of the quantization center of the region in which it lies, where q_l denotes the quantization result of the parameters of the l-th layer, c_i^l the value of the i-th quantization center, w_j^l the j-th weight parameter of w_l, r_i^l the end points of the different quantization regions, and R_i^l the range of the i-th quantization region.
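The final quantization step itself can be sketched directly from this description (same region conventions assumed as in the fragments above): every weight is replaced by the center of the region in which it lies.

```python
import numpy as np

def quantize_layer(w, ends, centers):
    """Replace every weight by the value of the quantization center of the
    region in which it lies, giving the quantized layer q_l."""
    w = np.asarray(w)
    idx = np.clip(np.searchsorted(ends, w.ravel(), side="right") - 1, 0, len(centers) - 1)
    return np.asarray(centers)[idx].reshape(w.shape)
```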
The invention has the advantages that:
By adding a layer quantization loss based on the parameter norm to the network, the neural network can train and optimize its parameters during the quantization process, reducing the accuracy loss caused by quantization. Specifically: by automatically dividing the quantization centers, the burden of manual analysis is reduced as far as possible and different network parameters are better accommodated. By calculating the quantization loss, the gap between the current network parameters and the quantization targets is obtained and serves as one of the terms for evaluating and optimizing the degree of quantization. By combining the quantization loss with the training loss, the neural network can maintain high accuracy while being quantized. During training, the quantization centers are continuously updated as the parameters change, which reduces the error introduced during training. After quantization training ends, the network weight parameters are quantized using the obtained quantization regions and centers, achieving the goal of model compression. In summary, the method of the present invention can reduce the drop in accuracy while quantizing the network.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the specific embodiments. The drawings are only for purposes of illustrating the particular embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 illustrates an exemplary flow diagram of a neural network quantization algorithm in accordance with an embodiment of the present invention;
fig. 2 is a network structure diagram illustrating a neural network quantization algorithm according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
The preferred embodiments of the present invention will be described below with reference to the accompanying drawings of the specification, it being understood that the preferred embodiments described herein are merely for illustrating and explaining the present invention, and are not intended to limit the present invention, and that the embodiments and features of the embodiments in the present invention may be combined with each other without conflict.
The embodiment of the invention provides a neural network quantization algorithm based on a parameter norm; the flow of quantization compression of the neural network provided by the embodiment is shown in fig. 1. By automatically dividing the quantization centers, the burden of manual analysis is reduced as far as possible and different network parameters are better accommodated. By calculating the quantization loss, the gap between the current network parameters and the quantization targets is obtained and serves as one of the terms for evaluating and optimizing the degree of quantization. By combining the quantization loss with the training loss, the neural network can maintain high accuracy while being quantized. During training, the quantization centers are continuously updated as the parameters change, which reduces the error introduced during training. After quantization training ends, the network weight parameters are quantized using the obtained quantization regions and centers, achieving the goal of model compression.
The neural network quantization method in the embodiment of the present invention is described in detail below.
As shown in fig. 1, an exemplary flowchart of a neural network quantization method in an embodiment of the present invention is shown, where the method includes the following steps:
step S101: according to the input neural network initial model, a quantization series parameter n is given to the layer l to be quantized in sequencelI.e. the number of quantization centers. In the embodiment of the present invention, the tested network model is Alexnet, and the structure thereof is shown in fig. 2. Weighting parameter w of l layerlMaking statistics to obtain corresponding maximum value max(wl) And minimum value min (w)l). According to the maximum value and the minimum value, linear division is carried out to obtain different quantization centersAnd a quantization region
The quantities involved are as follows: l denotes the l-th network weight parameter layer, d_l denotes the spacing of adjacent quantization centers or regions, r_i^l denotes the end points of the different quantization regions, R_i^l denotes the range of the i-th quantization region, and c_i^l denotes the value of the i-th quantization center, i being an integer.
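For instance, if min(w_l) = −0.2, max(w_l) = 0.2 and n_l = 4, then d_l = 0.1 and the region end points are −0.2, −0.1, 0, 0.1 and 0.2; taking each center at the midpoint of its region (one possible reading of the linear division), the centers are −0.15, −0.05, 0.05 and 0.15.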
Step S102: parameter w for l layerslAnd calculating the parameter norm loss according to the quantization region and the quantization center where the parameter norm loss is located.Is wlIn (1)The jth weight parameter is found by comparing the rangesIn the quantization region i, calculatingL of1Loss or L2Loss, after summing, to obtain L of the L-th layer1Loss or L2And (4) loss. The classification error is obtained by calculating the cross entropy loss of the label of the network output and input data.
Specifically, the L1 loss |w_j^l − c_i^l| or the L2 loss (w_j^l − c_i^l)^2 of w_j^l is computed; after summing over the weights, the L1 loss or L2 loss of the l-th layer is obtained, where m is the number of all weights in that layer.
Step S103: and updating and optimizing network parameters by using a back propagation algorithm in the field of neural networks according to the calculated classification error loss and layer quantization loss. And then, according to the conditions of whether the requirement of quantization precision is met or not, whether the preset training times are met or not and the like, judging whether the training is continued or is finished, and respectively skipping to the steps S201 and S104. If the training requirement is not met, the process goes to step S201. If the training requirement is met, the process jumps to step S104.
The optimization operation comprises the following steps:
t samples {x^(1), x^(2), …, x^(t)} are selected from the neural network training set for learning and optimization of the network, where the target corresponding to x^(i) is y^(i);
the classification error loss L_c of the neural network is calculated, the total quantization loss L_q is calculated, and the two are added in a certain proportion to obtain the total loss L:
where θ is the weight parameter of the neural network model, f(x^(i); θ) is the output of the neural network, L_CE is the cross-entropy loss function, and α denotes the quantization loss scale factor;
the gradient g is calculated using the chain rule:
updating the weight parameters in sequence:
θ=θ-εg
where ε is the learning rate of the optimizer.
Step S104: and after the quantization is finished, obtaining the training optimization parameters of the quantization layer and the quantization centers updated for multiple times. And carrying out quantization operation in sequence according to the quantization region and the quantization center of the parameter, so that the quantization value of the parameter is equal to the quantization center of the quantization region of the parameter.
The quantization operation comprises the steps of:
q_l is the quantization result of the parameters w_l of the l-th layer; the quantization operation is performed in sequence according to the quantization region and quantization center:
i.e. the parameter quantization value is equal to the quantization center value of the quantization region in which the parameter value is located.
Step S201: parameter w for the l-th layerlAccording to the quantization area where the network weight parameter is located, average value calculation of the parameter is carried out to obtain the average value of each quantization area after the network weight parameter is updatedAnd updating the ith quantization center by combining the quantization center value before updating. After that, the process goes to step S102.
The quantization center update comprises the following steps:
for the parameters w_l of the l-th layer, the mean of the parameters is calculated within the quantization region in which each parameter lies;
after one round of optimization training, the mean u_i^l of the i-th quantization region is the average of the weight parameters lying in that region;
where m_i^l denotes the number of weight parameters in the i-th quantization region, r_i^l the end points of the different quantization regions, and c_i^l the value of the i-th quantization center, i being an integer;
the i-th quantization center is then updated by combining its previous value with the region mean u_i^l, where β is the quantization center update rate parameter.
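Putting the steps of fig. 1 together, the following PyTorch sketch walks once through the whole flow on a toy linear layer; the library, the toy data, the mean-squared quantization loss, the moving-average center update and all hyper-parameter values (n_l, α, β, ε) are illustrative assumptions, not part of the claimed method.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
model = torch.nn.Linear(20, 5)                      # toy stand-in for one layer to be quantized
x, y = torch.randn(256, 20), torch.randint(0, 5, (256,))
n_l, alpha, beta, lr = 8, 1e-3, 0.5, 0.1

# S101: linear division of the weight range into n_l regions and centers
w0 = model.weight.detach()
d = (w0.max() - w0.min()) / n_l
ends = w0.min() + d * torch.arange(n_l + 1)
centers = w0.min() + d * (torch.arange(n_l) + 0.5)

opt = torch.optim.SGD(model.parameters(), lr=lr)
for _ in range(200):
    # S102: quantization loss of the weights against their region centers
    idx = torch.clamp(torch.bucketize(model.weight.detach(), ends) - 1, 0, n_l - 1)
    loss_q = ((model.weight - centers[idx]) ** 2).mean()
    # S103: total loss = classification error + alpha * quantization loss, then back-propagation
    loss = F.cross_entropy(model(x), y) + alpha * loss_q
    opt.zero_grad()
    loss.backward()
    opt.step()
    # S201: move each center towards the mean of the weights currently in its region
    with torch.no_grad():
        w = model.weight.flatten()
        idx = torch.clamp(torch.bucketize(w, ends) - 1, 0, n_l - 1)
        for i in range(n_l):
            members = w[idx == i]
            if members.numel():
                centers[i] = (1 - beta) * centers[i] + beta * members.mean()

# S104: final quantization -- every weight is replaced by its region's center
with torch.no_grad():
    idx = torch.clamp(torch.bucketize(model.weight, ends) - 1, 0, n_l - 1)
    model.weight.copy_(centers[idx])
```

The region end points are kept fixed in this sketch while only the centers move; in a fuller implementation each quantized layer would carry its own n_l, end points and centers.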
According to the invention, by adding a network layer quantization loss based on the parameter norm, the neural network can train and optimize its parameters during the quantization process, reducing the accuracy loss caused by quantization. Specifically: by automatically dividing the quantization centers, the burden of manual analysis is reduced as far as possible and different network parameters are better accommodated. By calculating the quantization loss, the gap between the current network parameters and the quantization targets is obtained and serves as one of the terms for evaluating and optimizing the degree of quantization. By combining the quantization loss with the training loss, the neural network can maintain high accuracy while being quantized. During training, the quantization centers are continuously updated as the parameters change, which reduces the error introduced during training. After quantization training ends, the network weight parameters are quantized using the obtained quantization regions and centers, achieving the goal of model compression. In summary, the method of the present invention can reduce the drop in accuracy while quantizing the network.
The steps of the embodiments of the present invention have now been described. The method provided by the embodiments can thus combine training optimization with parameter quantization and improve the accuracy of the quantized network.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.
Claims (10)
1. A method for neural network quantization based on parameter norms, the method comprising:
for a given pre-training neural network parameter model, dividing a quantization center by counting the values of the parameters of the required quantization layer;
calculating a quantization loss of the parameter for each quantization layer based on the selected quantization center;
adding the quantization loss and the classification error loss of the pre-training neural network parameter model to obtain a total loss, and performing back propagation optimization;
judging whether the training requirement is met, if so, entering the next step, and if not, updating the quantization center;
and carrying out quantization operation on the corresponding layer according to the quantization center to obtain a quantized compression model.
2. The neural network quantization method of claim 1, wherein said dividing the quantization centers comprises the steps of:
for a neural network model, sequentially giving a quantization progression parameter, namely a quantization center number, to each corresponding layer to be quantized;
counting the weight parameters of each corresponding layer to obtain the maximum value and the minimum value of the weight parameters;
and obtaining different quantization centers and regions according to the maximum value and the minimum value.
3. The neural network quantization method of claim 2, wherein the different quantization centers and regions are specified by the following formula:
wherein l represents the l-th network weight parameter layer, w_l denotes a weight parameter, max(w_l) denotes the maximum value, min(w_l) denotes the minimum value, d_l represents the spacing of adjacent quantization centers or regions, r_i^l represents the end points of the different quantization regions, R_i^l indicates the range of the i-th quantization region, and c_i^l represents the value of the i-th quantization center, where i is an integer.
4. The neural network quantization method of claim 1, wherein said calculating a quantization loss for the parameters of each quantization layer comprises the steps of:
and for the parameters of the layer needing to be quantized, calculating the parameter norm loss according to the quantization region and the quantization center where the parameters are located.
5. The neural network quantization method of claim 3 or 4, wherein w_j^l is the j-th weight parameter of w_l, and the quantization region in which w_j^l lies is found by comparing it with the ranges of the quantization regions;
the L1 loss |w_j^l − c_i^l| or the L2 loss (w_j^l − c_i^l)^2 of w_j^l is computed;
after summing, the L1 loss or L2 loss of the l-th layer is obtained;
where m is the number of all weights in that layer.
6. The neural network quantization method of claim 1, wherein said back propagation optimization operation comprises the steps of:
selecting a plurality of samples from the data set of the pre-trained neural network parameter model;
calculating the classification error loss of the pre-training neural network parameter model, calculating the total quantization loss, and adding the classification error loss and the total quantization loss according to a certain proportion to obtain the total loss;
calculating the gradient using a chain rule;
and updating the weight parameters in sequence.
7. A neural network quantization method according to claim 1 or 3, wherein said quantization centre update comprises the steps of:
calculating the mean value of the parameters of the corresponding quantization layer according to the quantization region where the parameters are located;
after one round of optimization training, the mean u_i^l of the i-th quantization region is obtained as the average of the weight parameters lying in that region, where m_i^l is the number of weight parameters in the i-th quantization region.
8. The neural network quantization method of claim 7,
the i-th quantization center is updated by combining u_i^l, the mean value of the i-th quantization region, with c_i^l, the value of the i-th quantization center, where i is an integer and β is a quantization center update rate parameter.
9. The neural network quantization method of claim 1, wherein said quantization operation comprises the steps of:
the parameters of each layer are quantized in sequence according to the quantization region and the quantization center in which each parameter lies.
10. The neural network quantization method of claim 9, wherein the quantization operation is formulated as follows:
wherein q_l is the quantization result of the parameters of the l-th layer, c_i^l represents the value of the i-th quantization center, w_j^l is the j-th weight parameter of w_l, r_i^l represents the end points of the different quantization regions, and R_i^l indicates the range of the i-th quantization region.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810387893.8A CN108805257A (en) | 2018-04-26 | 2018-04-26 | A kind of neural network quantization method based on parameter norm |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108805257A true CN108805257A (en) | 2018-11-13 |
Family
ID=64094082
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810387893.8A Pending CN108805257A (en) | 2018-04-26 | 2018-04-26 | A kind of neural network quantization method based on parameter norm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108805257A (en) |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111406263A (en) * | 2018-11-28 | 2020-07-10 | 深圳市大疆创新科技有限公司 | Method and device for searching neural network architecture |
CN109993296B (en) * | 2019-04-01 | 2020-12-29 | 安徽寒武纪信息科技有限公司 | Quantitative implementation method and related product |
CN109993296A (en) * | 2019-04-01 | 2019-07-09 | 北京中科寒武纪科技有限公司 | Quantify implementation method and Related product |
CN110059822A (en) * | 2019-04-24 | 2019-07-26 | 苏州浪潮智能科技有限公司 | One kind compressing quantization method based on channel packet low bit neural network parameter |
WO2020258071A1 (en) * | 2019-06-26 | 2020-12-30 | Intel Corporation | Universal loss-error-aware quantization for deep neural networks with flexible ultra-low-bit weights and activations |
CN112532251A (en) * | 2019-09-17 | 2021-03-19 | 华为技术有限公司 | Data processing method and device |
CN113361678A (en) * | 2020-03-04 | 2021-09-07 | 北京百度网讯科技有限公司 | Training method and device of neural network model |
CN111738419B (en) * | 2020-06-19 | 2024-01-12 | 北京百度网讯科技有限公司 | Quantification method and device for neural network model |
CN111738419A (en) * | 2020-06-19 | 2020-10-02 | 北京百度网讯科技有限公司 | Quantification method and device of neural network model |
CN112329923A (en) * | 2020-11-24 | 2021-02-05 | 杭州海康威视数字技术股份有限公司 | Model compression method and device, electronic equipment and readable storage medium |
CN112329923B (en) * | 2020-11-24 | 2024-05-28 | 杭州海康威视数字技术股份有限公司 | Model compression method and device, electronic equipment and readable storage medium |
CN114692865A (en) * | 2020-12-31 | 2022-07-01 | 安徽寒武纪信息科技有限公司 | Neural network quantitative training method and device and related products |
CN114363631B (en) * | 2021-12-09 | 2022-08-05 | 慧之安信息技术股份有限公司 | Deep learning-based audio and video processing method and device |
CN114363631A (en) * | 2021-12-09 | 2022-04-15 | 慧之安信息技术股份有限公司 | Deep learning-based audio and video processing method and device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108805257A (en) | A kind of neural network quantization method based on parameter norm | |
CN109408731B (en) | Multi-target recommendation method, multi-target recommendation model generation method and device | |
Wu et al. | Easyquant: Post-training quantization via scale optimization | |
CN110969251B (en) | Neural network model quantification method and device based on label-free data | |
Pomerat et al. | On neural network activation functions and optimizers in relation to polynomial regression | |
CN109002889B (en) | Adaptive iterative convolution neural network model compression method | |
CN110809772A (en) | System and method for improving optimization of machine learning models | |
JP6950756B2 (en) | Neural network rank optimizer and optimization method | |
CN110826692B (en) | Automatic model compression method, device, equipment and storage medium | |
Mo et al. | Neural architecture search for keyword spotting | |
CN111898689A (en) | Image classification method based on neural network architecture search | |
CN109034175B (en) | Image processing method, device and equipment | |
CN115774854B (en) | Text classification method and device, electronic equipment and storage medium | |
CN110717103A (en) | Improved collaborative filtering method based on stack noise reduction encoder | |
CN113961765A (en) | Searching method, device, equipment and medium based on neural network model | |
Li et al. | Using feature entropy to guide filter pruning for efficient convolutional networks | |
Zhao et al. | An investigation on different underlying quantization schemes for pre-trained language models | |
CN112598078B (en) | Hybrid precision training method and device, electronic equipment and storage medium | |
Kwak et al. | Quantization aware training with order strategy for CNN | |
CN113656707A (en) | Financing product recommendation method, system, storage medium and equipment | |
US11507782B2 (en) | Method, device, and program product for determining model compression rate | |
US20200372363A1 (en) | Method of Training Artificial Neural Network Using Sparse Connectivity Learning | |
CN116976461A (en) | Federal learning method, apparatus, device and medium | |
Yamada et al. | Weight Features for Predicting Future Model Performance of Deep Neural Networks. | |
US11195094B2 (en) | Neural network connection reduction |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20181113 |