CN115936099A - Weight compression and integration standard pruning method for neural network - Google Patents

Weight compression and integration standard pruning method for neural network

Info

Publication number
CN115936099A
Authority
CN
China
Prior art keywords
pruning
map
neural network
standard
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211590899.8A
Other languages
Chinese (zh)
Inventor
陈小柏
孙杰克
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications filed Critical Nanjing University of Posts and Telecommunications
Priority to CN202211590899.8A priority Critical patent/CN115936099A/en
Publication of CN115936099A publication Critical patent/CN115936099A/en
Pending legal-status Critical Current

Abstract

A weight compression and integration standard pruning method for a neural network is provided. For unstructured pruning, to address the drawback that a compressed network obtained by traditional one-time pruning is difficult to recover in accuracy when the pruning threshold is set too large, a weight compression pruning method is proposed: a lower threshold is set and the weights are repeatedly compressed and then pruned, so that a higher pruning ratio is obtained than with the traditional scheme while accuracy is effectively preserved. For structured pruning, to address the drawback that a single convolution kernel importance measurement standard may have a one-sided influence on a specific task, a pruning scheme with an integrated standard is proposed, which combines multiple importance measurement standards into a final integrated standard for pruning.

Description

Weight compression and integration standard pruning method for neural network
Technical Field
The invention relates to the field of compression of deep neural networks, in particular to a pruning compression method of a convolutional neural network.
Background
At present, with the development of deep convolutional neural networks, the computational cost and scale of models have grown enormously, which greatly restricts their deployment on resource-limited terminal devices. Structured and unstructured pruning are two categories of pruning that differ in pruning granularity. Structured pruning has a relatively coarse granularity and prunes in units of convolution kernels, so it is also called channel pruning; unstructured pruning has a finer granularity and prunes in units of individual weights.
In traditional structured pruning, a certain standard must be set to measure the importance of a channel. The paper "Learning Efficient Convolutional Networks through Network Slimming" selects the scaling factor γ of the BN layer (batch normalization layer) as the importance measurement standard; the sum of the absolute values of the weights of a convolution kernel, i.e. the L1 norm of the convolution kernel, is also used to delete unimportant convolution kernels, i.e. channels, after which the network is fine-tuned by training. However, selecting a single standard may have a one-sided impact on different tasks and affect the final pruning result.
In traditional unstructured pruning, a pruning threshold is set, the weights below the threshold are pruned, and the network is then fine-tuned by training again to achieve the purpose of pruning. The paper "Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding" uses this method, which usually obtains the final network through a single pruning pass. However, if the threshold is set too large, some important weights may be deleted, and if it is set too small, the network cannot reach a good pruning ratio, so the best result can only be found by repeatedly adjusting the threshold.
Disclosure of Invention
The invention aims to improve unstructured pruning and structured pruning respectively, overcoming the defects of traditional neural network pruning methods, to obtain a weight compression and integration standard pruning method for a neural network. For unstructured pruning, to address the drawback that the accuracy of a compressed network obtained by traditional one-time pruning is difficult to recover when the pruning threshold is set too large, a weight compression pruning method is proposed: by setting a lower threshold, the weights are repeatedly compressed and then pruned, so that a higher pruning ratio is obtained than with the traditional scheme while accuracy is effectively preserved. For structured pruning, to address the drawback that a single convolution kernel importance measurement standard may have a one-sided influence on specific tasks, a pruning scheme with an integrated standard is proposed, which combines multiple importance measurement standards into a final integrated standard for pruning.
A weight compression and integration standard pruning method of a neural network comprises a weight compression pruning method for unstructured pruning and an integration standard pruning method for structured pruning;
the weight compression pruning method comprises the following steps:
step 1): sparsely training an original neural network, and setting a pruning threshold;
step 2): pruning is carried out according to a threshold value;
step 3): fine-tuning the network by training;
step 4): compressing the weights by 10%;
step 5): repeating steps 2 to 4 until an ideal target network is reached;
an integrated standard pruning method comprising the steps of:
step 1): measuring the original detection accuracy mAP_0 of the neural network;
step 2): based on the two measurement standards of the sum L1 of the absolute values of the weights of the convolution kernels and the scaling factor γ, pruning and fine-tuning training are respectively carried out to obtain the corresponding detection accuracies, which are processed to obtain the corresponding importance standards ΔmAP_L1 and ΔmAP_γ and their proportion values λ_L1 and λ_γ;
Step 3): sparsely training an original neural network;
step 4): calculating the sum L1 of absolute values of convolution kernels in all layers to be pruned, and then calculating the average value aveL1;
step 5): collecting the scaling factor gamma of each channel;
step 6): calculating the importance measurement standard Final from λ_L1, λ_γ, aveL1 and γ;
step 7): judging the importance of the channels by using the Final importance measurement standard, and pruning the network;
step 8): and fine-tuning the training network to obtain a final pruning network.
Compared with the prior art, the technical scheme adopted by the invention has the following technical effects:
(1) Compared with the one-time pruning method, weight compression pruning increases the pruning ratio slowly by compressing the weights a small amount each time; at the same or a higher pruning ratio than one-time pruning, the accuracy mAP of the model obtained by weight compression pruning is higher, by about 1% on average.
(2) At the same pruning ratio, accuracy can recover to a higher level after network fine-tuning when the integrated standard pruning method is used; at equal accuracy, the pruning ratio of the integrated standard pruning method is higher.
(3) In conclusion, the experimental results show that, compared with traditional pruning methods, the proposed pruning methods achieve a larger pruning ratio and higher model accuracy recovery after pruning.
Drawings
Fig. 1 is a flowchart of weight compression pruning in an embodiment of the present invention.
Fig. 2 is a schematic diagram of the computed Final metric in the integrated standard pruning of an embodiment of the present invention in which the convolution layer has 4 convolution kernels, each convolution kernel having a size of 3 x 3.
Fig. 3 is a flow chart of integrating standard pruning in an embodiment of the present invention.
FIG. 4 is a table of experimental results of one-time pruning and weight-compression pruning of YOLOv3 and YOLOv3-tiny on COCO data sets in an embodiment of the present invention.
Fig. 5 is a table of the results of various pruning criteria experiments on the VOC data set for YOLOv3 in an example of the present invention.
FIG. 6 is a table of the results of various pruning criteria for YOLOv3 on the Oxford Hand data set according to an embodiment of the present invention.
Detailed Description
The technical scheme of the invention is further explained in detail by combining the drawings in the specification.
Referring to fig. 1, the present embodiment provides an unstructured pruning method for neural network weight compression multiple pruning, which includes the following specific steps in combination with the accompanying drawings:
Step 1): The original neural network is sparsely trained, and a pruning threshold is set.
Wherein the sparse training consists of adding to the loss function an L2 regularization term over the weights w_i (the sum of the squared weights, Σ_i w_i², scaled by a regularization coefficient) and pre-training on a data set. Most of the weights in the trained network are biased toward values near 0, so the network has sparsity, hence the name sparse training.
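As an illustration, a minimal sketch of this sparse training loss in PyTorch follows; it is a sketch under stated assumptions, and the coefficient name reg_lambda is a hypothetical hyperparameter not specified in the description.

```python
# Minimal sketch of the sparse training loss, assuming PyTorch;
# reg_lambda is a hypothetical regularization coefficient.
import torch

def sparse_loss(task_loss: torch.Tensor, model: torch.nn.Module,
                reg_lambda: float = 1e-4) -> torch.Tensor:
    # L2 regularization term: the sum of squared weights over all
    # parameters, which biases most trained weights toward values near 0.
    l2_term = sum((w ** 2).sum() for w in model.parameters())
    return task_loss + reg_lambda * l2_term
```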
The pruning threshold is set to 0.01, i.e. weights with absolute values less than 0.01 are to be pruned.
Step 2): pruning is carried out according to a threshold value.
Step 3): Fine-tuning training is carried out on the network.
The fine-tuning training is training on the original data set, performed again to restore the model accuracy lost to pruning.
Step 4): The weights of the neural network are all compressed by 10%.
Step 5): Steps 2 to 4 are repeated until the ideal target network is reached.
The ideal target is defined as follows: after the latest compression pruning, if the model accuracy is difficult to recover to the initial value (more than 1% below the original accuracy), pruning can be stopped, and the model before this last pruning round is the ideal target network.
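For illustration only, a minimal sketch of the weight compression pruning loop (steps 2 to 5) is given below, assuming PyTorch; finetune and evaluate_map are hypothetical helpers standing in for the fine-tuning and mAP measurement steps, and the sketch omits the mask bookkeeping that a real implementation would use to keep pruned weights at zero during fine-tuning.

```python
# Sketch of the weight compression pruning loop, assuming PyTorch.
import torch

THRESHOLD = 0.01  # weights with absolute value below 0.01 are pruned
SHRINK = 0.9      # step 4: compress all weights by 10%
MAX_DROP = 1.0    # stop once accuracy falls more than 1% below baseline

def weight_compression_pruning(model, finetune, evaluate_map, base_map):
    # Snapshot of the last acceptable model (the "ideal target network").
    previous = {k: v.clone() for k, v in model.state_dict().items()}
    while True:
        with torch.no_grad():
            for w in model.parameters():       # step 2: threshold pruning
                w.mul_((w.abs() >= THRESHOLD).to(w.dtype))
        finetune(model)                        # step 3: fine-tune on the original data set
        if base_map - evaluate_map(model) > MAX_DROP:
            model.load_state_dict(previous)    # roll back the last round
            return model
        previous = {k: v.clone() for k, v in model.state_dict().items()}
        with torch.no_grad():
            for w in model.parameters():       # step 4: compress weights by 10%
                w.mul_(SHRINK)
```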
Referring to fig. 2 and fig. 3, the present embodiment provides a structured pruning method for integrated metrics of a neural network, which includes the following specific steps in combination with the accompanying drawings:
Step 1): The original detection accuracy mAP_0 of the neural network is measured.
The mAP is an evaluation metric for the accuracy of an object detection neural network. In object detection, the network is given a picture and must enclose each object in the picture with a rectangular bounding box and classify it; if the picture contains multiple objects, multiple boxes are needed. The value of mAP therefore depends both on localization accuracy (whether the object is enclosed by a box of appropriate size and position) and on recognition accuracy (whether the object in the box is correctly identified). The mAP ranges from 0% to 100%, and a higher value indicates a more accurate neural network.
Step 2): for two metrics, namely the sum L1 of the absolute values of the weights of the convolution kernels and a scaling factor gamma, the following operations are carried out:
Step 2.1): Using the measurement standard of the sum L1 of the absolute values of the weights of the convolution kernels, pruning 50% of the neural network, carrying out fine-tuning training for 100 generations, and measuring the detection accuracy mAP_L1 of the neural network at this point.
This standard takes the sum of the absolute values of all the weights in each convolution kernel; a convolution kernel with a larger value is more important, and one with a smaller value is relatively unimportant. The 50% of convolution kernels with the smallest values are then pruned.
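A minimal sketch of this ranking, assuming a PyTorch Conv2d layer, is shown below; keep_by_l1 is a hypothetical helper that returns the indices of the kernels kept under the L1 standard.

```python
# Sketch of the L1 importance ranking of step 2.1, assuming PyTorch.
import torch

def keep_by_l1(conv: torch.nn.Conv2d, keep_ratio: float = 0.5) -> torch.Tensor:
    # Sum of the absolute values of the weights of each output kernel.
    l1 = conv.weight.detach().abs().sum(dim=(1, 2, 3))
    n_keep = max(1, int(keep_ratio * l1.numel()))
    return torch.topk(l1, n_keep).indices  # larger L1 sum = more important
```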
Step 2.2): using a scaling factor gamma measurement standard, pruning 50% of the neural network, carrying out fine tuning training for 100 generations, and measuring the detection precision mAP of the neural network at the moment γ
Like the L1 standard of the convolution kernel, γ is a learned parameter: the convolution kernel with the larger γ is more important, the one with the smaller γ less so, and the 50% of convolution kernels with the smallest γ are cut out.
Step 2.3): let initial precision mAP 0 Respectively making quotient with the precision after 2 kinds of pruning to obtain delta mAP L1 And Δ mAP γ As shown in equations 1 and 2.
ΔmAP_L1 = mAP_0 / mAP_L1    (1)

ΔmAP_γ = mAP_0 / mAP_γ    (2)
Step 2.4): according to Δ mAP L1 And Δ mAP γ The two importance criteria are calculated to be finally importantProportion of sexual standard Final
These proportion values λ_L1 and λ_γ are given by equations 3 and 4.

λ_L1 = ΔmAP_L1 / (ΔmAP_L1 + ΔmAP_γ)    (3)

λ_γ = ΔmAP_γ / (ΔmAP_L1 + ΔmAP_γ)    (4)
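As a worked example of equations 1 to 4 (the mAP figures below are illustrative only, not taken from the experiments):

```python
# Worked example of equations 1-4 with made-up mAP values.
map_0, map_l1, map_gamma = 55.0, 48.0, 50.0  # illustrative accuracies in %

d_map_l1 = map_0 / map_l1           # equation 1
d_map_gamma = map_0 / map_gamma     # equation 2

denom = d_map_l1 + d_map_gamma
lambda_l1 = d_map_l1 / denom        # equation 3
lambda_gamma = d_map_gamma / denom  # equation 4
# lambda_l1 + lambda_gamma == 1; the standard whose pruning hurt accuracy
# more (larger delta mAP) receives the larger weight in the Final standard.
```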
Step 3): The original neural network is sparsely trained (similar to step 1 in the unstructured pruning above).
Step 4): the sum L1 of the absolute values of the convolution kernels in all the layers to be pruned is calculated, and then the average value aveL1 (sum of absolute values of convolution kernels L1/number of weights in convolution kernels) is calculated.
The sum of the absolute values of the weights in each convolution kernel has already been found in step 2.1, where the average value is obtained by directly dividing by the number of weights.
Step 5): the scaling factor gamma for each channel is collected.
Step 6): by passing
The calculation is shown in equation 5.

Final = λ_L1 · aveL1 + λ_γ · γ    (5)
Step 7): and judging the importance of the channel by using Final importance measurement standard, and pruning the network.
The channel (convolution kernel) with the larger Final value is more important. A pruning threshold is set, and channels whose Final value is below the threshold are pruned. The threshold is set according to the actual situation: in general, several thresholds are tried in separate experiments, and the run whose accuracy recovers best under fine-tuning training at the largest pruning rate is taken.
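For illustration, a minimal sketch of steps 4 to 7 follows, assuming each pruned convolution layer is a PyTorch Conv2d followed by a BatchNorm2d; taking the absolute value of γ and the threshold value are assumptions made for this sketch.

```python
# Sketch of the Final standard (equation 5) and threshold pruning,
# assuming PyTorch Conv2d + BatchNorm2d layer pairs.
import torch

def final_scores(conv: torch.nn.Conv2d, bn: torch.nn.BatchNorm2d,
                 lambda_l1: float, lambda_gamma: float) -> torch.Tensor:
    w = conv.weight.detach()
    l1 = w.abs().sum(dim=(1, 2, 3))     # step 4: per-kernel L1 sum
    ave_l1 = l1 / w[0].numel()          # aveL1 = L1 sum / weights per kernel
    gamma = bn.weight.detach().abs()    # step 5: BN scaling factor |gamma| per channel
    return lambda_l1 * ave_l1 + lambda_gamma * gamma  # equation 5

def channels_to_prune(scores: torch.Tensor, threshold: float) -> torch.Tensor:
    # step 7: channels whose Final score falls below the threshold are pruned.
    return (scores < threshold).nonzero(as_tuple=True)[0]
```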
Step 8): and (4) fine-tuning the training network (similar to the step 3 in the unstructured pruning process) to obtain a final pruning network.
Fig. 4 shows the experimental results of one-time pruning and weight compression pruning of the two networks YOLOv3 and YOLOv3-tiny on the COCO data set. The data show that, compared with the one-time pruning method, weight compression pruning increases the pruning ratio gradually through small-scale weight compression each time, and at the same or a higher pruning ratio than one-time pruning, the accuracy mAP of the model obtained by weight compression pruning is higher by about 1% on average.
Fig. 5 and Fig. 6 show the results of experiments with the YOLOv3 network on the VOC and Oxford Hand data sets to validate the integrated standard pruning method. Analyzing the data in Fig. 5, the results of groups 2, 3 and 5 show that, at the same pruning ratio, accuracy recovers to a higher level after network fine-tuning when the integrated standard pruning method is used; the results of groups 3 and 6 show that, at equal accuracy, the pruning ratio of the integrated standard pruning method is 1.5% higher. The data in Fig. 6 lead to the same conclusion.
In conclusion, the experimental results show that, compared with traditional pruning methods, the proposed pruning methods achieve a larger pruning ratio and higher model accuracy recovery after pruning.
The above description is only a preferred embodiment of the present invention, and the scope of the present invention is not limited to the above embodiment, but equivalent modifications or changes made by those skilled in the art according to the present disclosure should be included in the scope of the present invention as set forth in the appended claims.

Claims (7)

1. A weight compression and integrated standard pruning method for a neural network is characterized by comprising the following steps:
the method comprises a weight compression pruning method for unstructured pruning and an integrated standard pruning method for structured pruning;
the weight compression pruning method comprises the following steps:
step 1): sparsely training an original neural network, and setting a pruning threshold;
step 2): pruning is carried out according to a threshold value;
step 3): fine-tuning the network by training;
step 4): compressing the weights by 10%;
step 5): repeating steps 2 to 4 until an ideal target network is reached;
an integrated standard pruning method comprising the steps of:
step 1): measuring the original detection accuracy mAP_0 of the neural network;
step 2): based on the two measurement standards of the sum L1 of the absolute values of the weights of the convolution kernels and the scaling factor γ, pruning and fine-tuning training are respectively carried out to obtain the corresponding detection accuracies, which are processed to obtain the corresponding importance standards ΔmAP_L1 and ΔmAP_γ and their proportion values λ_L1 and λ_γ;
Step 3): sparsely training an original neural network;
step 4): calculating the sum L1 of absolute values of convolution kernels in all layers to be pruned, and then calculating the average value aveL1;
step 5): collecting the scaling factor gamma of each channel;
step 6): calculating the importance measurement standard Final from λ_L1, λ_γ, aveL1 and γ;
step 7): judging the importance of the channel by using a Final importance measurement standard, and pruning the network;
step 8): and fine-tuning the training network to obtain a final pruning network.
2. The weight compression and integration standard pruning method for the neural network according to claim 1, characterized in that: in step 1) of the weight compression pruning method, the sparse training is to add to the loss function an L2 regularization term over the weights w_i (the sum of the squared weights, Σ_i w_i², scaled by a regularization coefficient) and to pre-train on a data set, so that the weights in the trained network are biased toward values near 0 and the network has sparsity; the pruning threshold is set to 0.01, i.e. weights with absolute values less than 0.01 are to be pruned.
3. The weight compression and integration standard pruning method for the neural network according to claim 1, characterized in that: in step 5) of the weight compression pruning method, after the latest compression pruning, if the difference between the model accuracy and the original accuracy exceeds 1%, the pruning is stopped, and the model not subjected to this last pruning round is the ideal target network.
4. The weight compression and integration standard pruning method for the neural network according to claim 1, characterized in that: step 2) of the integrated standard pruning method comprises the following steps:
step 2.1): using the measurement standard of the sum L1 of the absolute values of the weights of the convolution kernels, pruning 50% of the neural network, carrying out fine-tuning training for 100 generations, and measuring the detection accuracy mAP_L1 of the neural network at this point;
step 2.2): using the scaling factor γ measurement standard, pruning 50% of the neural network, carrying out fine-tuning training for 100 generations, and measuring the detection accuracy mAP_γ of the neural network at this point;
Step 2.3): let initial precision mAP 0 Respectively making quotient with the precision after 2 kinds of pruning to obtain delta mAP L1 And Δ mAP γ
Step 2.4): according to Δ mAP L1 And Δ mAP γ Calculating the ratio of the two importance criteria in the Final importance criteria Final
Figure FDA0003994272690000022
And λ γ
5. The weight compression and integration standard pruning method for the neural network according to claim 4, characterized in that: in step 2.3), ΔmAP_L1 and ΔmAP_γ are calculated as shown in equation 1 and equation 2:

ΔmAP_L1 = mAP_0 / mAP_L1    (1)

ΔmAP_γ = mAP_0 / mAP_γ    (2).
6. the weight compression and integration standard pruning method for the neural network according to claim 5, wherein: in step 2.4), the proportion of the two importance standards in the Final importance standard Final is calculated
Figure FDA0003994272690000031
And λ γ As shown in equations 3 and 4:
Figure FDA0003994272690000032
λ γ =ΔmAP γ /(ΔmAP L1 +ΔmAP γ ) (4)。
7. the weight compression and integration standard pruning method for the neural network according to claim 6, wherein: in step 6) of the integration standard pruning method, the importance measure Final is calculated as shown in equation 5:
Figure FDA0003994272690000033
/>
CN202211590899.8A 2022-12-12 2022-12-12 Weight compression and integration standard pruning method for neural network Pending CN115936099A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211590899.8A CN115936099A (en) 2022-12-12 2022-12-12 Weight compression and integration standard pruning method for neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211590899.8A CN115936099A (en) 2022-12-12 2022-12-12 Weight compression and integration standard pruning method for neural network

Publications (1)

Publication Number Publication Date
CN115936099A true CN115936099A (en) 2023-04-07

Family

ID=86652004

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211590899.8A Pending CN115936099A (en) 2022-12-12 2022-12-12 Weight compression and integration standard pruning method for neural network

Country Status (1)

Country Link
CN (1) CN115936099A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117058525A (en) * 2023-10-08 2023-11-14 之江实验室 Model training method and device, storage medium and electronic equipment
CN117058525B (en) * 2023-10-08 2024-02-06 之江实验室 Model training method and device, storage medium and electronic equipment

Similar Documents

Publication Publication Date Title
CN110619385A (en) Structured network model compression acceleration method based on multi-stage pruning
CN109002889B (en) Adaptive iterative convolution neural network model compression method
CN111325342A (en) Model compression method and device, target detection equipment and storage medium
CN112016674A (en) Knowledge distillation-based convolutional neural network quantification method
CN115936099A (en) Weight compression and integration standard pruning method for neural network
CN111105035A (en) Neural network pruning method based on combination of sparse learning and genetic algorithm
CN110162290B (en) Compression method for DeMURA data of OLED screen
CN115512166B (en) Intelligent preparation method and system of lens
CN112308316A (en) Crime number prediction method based on linear regression algorithm
CN111489364A (en) Medical image segmentation method based on lightweight full convolution neural network
CN113610227B (en) Deep convolutional neural network pruning method for image classification
CN112561041A (en) Neural network model acceleration method and platform based on filter distribution
CN111985825A (en) Crystal face quality evaluation method for roller mill orientation instrument
CN110689065A (en) Hyperspectral image classification method based on flat mixed convolution neural network
CN112926533A (en) Optical remote sensing image ground feature classification method and system based on bidirectional feature fusion
CN112465140A (en) Convolutional neural network model compression method based on packet channel fusion
CN115329880A (en) Meteorological feature extraction method and device, computer equipment and storage medium
CN105139373B (en) Non-reference picture assessment method for encoding quality based on independence subspace analysis
CN113112482A (en) PCB defect detection method based on attention mechanism network
CN113222920A (en) Suction pipe defect detection method based on pruning Yolov3
CN112288744A (en) SAR image change detection method based on integer reasoning quantification CNN
CN114220024B (en) Static satellite sand storm identification method based on deep learning
CN116910506A (en) Load dimension reduction clustering method based on space-time network variation self-encoder algorithm
CN115564043B (en) Image classification model pruning method and device, electronic equipment and storage medium
CN111340098A (en) STA-Net age prediction method based on shoe print image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination