CN115936099A - Weight compression and integration standard pruning method for neural network - Google Patents

Weight compression and integration standard pruning method for neural network

Info

Publication number
CN115936099A
Authority
CN
China
Prior art keywords
pruning
map
neural network
standard
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211590899.8A
Other languages
Chinese (zh)
Inventor
陈小柏
孙杰克
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications filed Critical Nanjing University of Posts and Telecommunications
Priority to CN202211590899.8A priority Critical patent/CN115936099A/en
Publication of CN115936099A publication Critical patent/CN115936099A/en
Pending legal-status Critical Current

Abstract

A weight compression and integration standard pruning method for a neural network is provided. For unstructured pruning, to address the drawback that a compressed network obtained by traditional one-time pruning is difficult to recover in accuracy when the pruning threshold is set too large, a weight compression pruning method is proposed: a lower threshold is set and the weights are repeatedly compressed and then pruned, so that a higher pruning ratio is obtained than with the traditional scheme while accuracy is effectively preserved. For structured pruning, to address the drawback that a single convolution kernel importance measurement standard may have a one-sided influence on a specific task, a pruning scheme with an integrated standard is proposed, which combines multiple importance measurement standards into a final integrated standard for pruning.

Description

Weight compression and integration standard pruning method for neural network
Technical Field
The invention relates to the field of compression of deep neural networks, in particular to a pruning compression method of a convolutional neural network.
Background
At present, with the development of deep convolutional neural networks, the computational cost and scale of models have grown enormously, which greatly restricts their deployment on resource-limited terminal devices. Structured and unstructured pruning are two categories of pruning that differ in pruning granularity. Structured pruning has a relatively coarse granularity and prunes in units of convolution kernels, so it is also called channel pruning; unstructured pruning has a finer granularity and prunes in units of individual weights.
In traditional structured pruning, a certain standard must be set to measure the importance of a channel. The paper "Learning Efficient Convolutional Networks through Network Slimming" selects the scaling factor γ of the BN layer (batch normalization layer) as the importance measurement standard; the sum of the absolute values of the weights of a convolution kernel, i.e. the L1 norm of the convolution kernel, is also used to delete unimportant convolution kernels, i.e. channels, after which the network is fine-tuned by training. However, selecting a single standard may have a one-sided impact on different tasks and affect the final pruning result.
In traditional unstructured pruning, a pruning threshold is set, the weights below the threshold are pruned, and the network is then fine-tuned by training again to achieve the purpose of pruning. The paper "Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding" uses this method, which usually obtains the final network through a single pruning pass. However, if the threshold is set too large, some important weights may be deleted, and if it is set too small, the network cannot reach a good pruning ratio, so the best result can only be found by repeatedly adjusting the threshold.
Disclosure of Invention
The invention aims to improve unstructured pruning and structured pruning respectively, overcoming the defects of traditional neural network pruning methods, to obtain a weight compression and integration standard pruning method for a neural network. For unstructured pruning, to address the drawback that the accuracy of a compressed network obtained by traditional one-time pruning is difficult to recover when the pruning threshold is set too large, a weight compression pruning method is proposed: by setting a lower threshold, the weights are repeatedly compressed and then pruned, so that a higher pruning ratio is obtained than with the traditional scheme while accuracy is effectively preserved. For structured pruning, to address the drawback that a single convolution kernel importance measurement standard may have a one-sided influence on specific tasks, a pruning scheme with an integrated standard is proposed, which combines multiple importance measurement standards into a final integrated standard for pruning.
A weight compression and integration standard pruning method of a neural network comprises a weight compression pruning method for unstructured pruning and an integration standard pruning method for structured pruning;
the weight compression pruning method comprises the following steps:
step 1): sparsely training an original neural network, and setting a pruning threshold;
step 2): pruning is carried out according to a threshold value;
step 3): fine-tuning the network by training;
step 4): compressing the weights by 10%;
step 5): repeating steps 2 to 4 until an ideal target network is reached;
an integrated standard pruning method comprising the steps of:
step 1): measuring the original detection accuracy mAP_0 of the neural network;
step 2): based on the two measurement standards of the sum L1 of the absolute values of the weights of the convolution kernels and the scaling factor γ, pruning and fine-tuning training are respectively carried out to obtain the corresponding detection accuracies, which are processed to obtain the corresponding importance standards ΔmAP_L1 and ΔmAP_γ and their proportion values λ_L1 and λ_γ;
Step 3): sparsely training an original neural network;
step 4): calculating the sum L1 of absolute values of convolution kernels in all layers to be pruned, and then calculating the average value aveL1;
step 5): collecting the scaling factor gamma of each channel;
step 6): calculating the importance measurement standard Final from λ_L1, λ_γ, aveL1 and γ;
step 7): judging the importance of the channels by using the Final importance measurement standard, and pruning the network;
step 8): and fine-tuning the training network to obtain a final pruning network.
Compared with the prior art, the technical scheme adopted by the invention has the following technical effects:
(1) Compared with the one-time pruning method, weight compression pruning increases the pruning ratio slowly by compressing the weights a small amount each time; at the same or a higher pruning ratio than one-time pruning, the accuracy mAP of the model obtained by weight compression pruning is higher, by about 1% on average.
(2) At the same pruning ratio, accuracy can recover to a higher level after network fine-tuning when the integrated standard pruning method is used; at equal accuracy, the pruning ratio of the integrated standard pruning method is higher.
(3) In conclusion, the experimental results show that, compared with traditional pruning methods, the proposed pruning methods achieve a larger pruning ratio and higher model accuracy recovery after pruning.
Drawings
Fig. 1 is a flowchart of weight compression pruning in an embodiment of the present invention.
Fig. 2 is a schematic diagram of the computed Final metric in the integrated standard pruning of an embodiment of the present invention in which the convolution layer has 4 convolution kernels, each convolution kernel having a size of 3 x 3.
Fig. 3 is a flow chart of integrating standard pruning in an embodiment of the present invention.
FIG. 4 is a table of experimental results of one-time pruning and weight-compression pruning of YOLOv3 and YOLOv3-tiny on COCO data sets in an embodiment of the present invention.
Fig. 5 is a table of the results of various pruning criteria experiments on the VOC data set for YOLOv3 in an example of the present invention.
FIG. 6 is a table of the results of various pruning criteria for YOLOv3 on the Oxford Hand data set according to an embodiment of the present invention.
Detailed Description
The technical scheme of the invention is further explained in detail by combining the drawings in the specification.
Referring to fig. 1, the present embodiment provides an unstructured pruning method for neural network weight compression multiple pruning, which includes the following specific steps in combination with the accompanying drawings:
Step 1): The original neural network is sparsely trained, and a pruning threshold is set.
Wherein the sparse training consists of adding to the loss function an L2 regularization term over the weights w_i (the sum of the squared weights, Σ_i w_i², scaled by a regularization coefficient) and pre-training on a data set. Most of the weights in the trained network are biased toward values near 0, so the network has sparsity, hence the name sparse training.
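As an illustration, a minimal sketch of this sparse training loss in PyTorch follows; it is a sketch under stated assumptions, and the coefficient name reg_lambda is a hypothetical hyperparameter not specified in the description.

```python
# Minimal sketch of the sparse training loss, assuming PyTorch;
# reg_lambda is a hypothetical regularization coefficient.
import torch

def sparse_loss(task_loss: torch.Tensor, model: torch.nn.Module,
                reg_lambda: float = 1e-4) -> torch.Tensor:
    # L2 regularization term: the sum of squared weights over all
    # parameters, which biases most trained weights toward values near 0.
    l2_term = sum((w ** 2).sum() for w in model.parameters())
    return task_loss + reg_lambda * l2_term
```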
The pruning threshold is set to 0.01, i.e. weights with absolute values less than 0.01 are to be pruned.
Step 2): pruning is carried out according to a threshold value.
Step 3): Fine-tuning training is carried out on the network.
The fine-tuning training is training on the original data set, performed again to restore the model accuracy lost to pruning.
Step 4): The weights of the neural network are all compressed by 10%.
Step 5): Steps 2 to 4 are repeated until the ideal target network is reached.
The ideal target is defined as follows: after the latest compression pruning, if the model accuracy is difficult to recover to the initial value (more than 1% below the original accuracy), pruning can be stopped, and the model before this last pruning round is the ideal target network.
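For illustration only, a minimal sketch of the weight compression pruning loop (steps 2 to 5) is given below, assuming PyTorch; finetune and evaluate_map are hypothetical helpers standing in for the fine-tuning and mAP measurement steps, and the sketch omits the mask bookkeeping that a real implementation would use to keep pruned weights at zero during fine-tuning.

```python
# Sketch of the weight compression pruning loop, assuming PyTorch.
import torch

THRESHOLD = 0.01  # weights with absolute value below 0.01 are pruned
SHRINK = 0.9      # step 4: compress all weights by 10%
MAX_DROP = 1.0    # stop once accuracy falls more than 1% below baseline

def weight_compression_pruning(model, finetune, evaluate_map, base_map):
    # Snapshot of the last acceptable model (the "ideal target network").
    previous = {k: v.clone() for k, v in model.state_dict().items()}
    while True:
        with torch.no_grad():
            for w in model.parameters():       # step 2: threshold pruning
                w.mul_((w.abs() >= THRESHOLD).to(w.dtype))
        finetune(model)                        # step 3: fine-tune on the original data set
        if base_map - evaluate_map(model) > MAX_DROP:
            model.load_state_dict(previous)    # roll back the last round
            return model
        previous = {k: v.clone() for k, v in model.state_dict().items()}
        with torch.no_grad():
            for w in model.parameters():       # step 4: compress weights by 10%
                w.mul_(SHRINK)
```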
Referring to fig. 2 and fig. 3, the present embodiment provides a structured pruning method for integrated metrics of a neural network, which includes the following specific steps in combination with the accompanying drawings:
Step 1): The original detection accuracy mAP_0 of the neural network is measured.
The mAP is an evaluation metric for the accuracy of an object detection neural network. In object detection, the network is given a picture and must enclose each object in the picture with a rectangular bounding box and classify it; if the picture contains multiple objects, multiple boxes are needed. The value of mAP therefore depends both on localization accuracy (whether the object is enclosed by a box of appropriate size and position) and on recognition accuracy (whether the object in the box is correctly identified). The mAP ranges from 0% to 100%, and a higher value indicates a more accurate neural network.
Step 2): for two metrics, namely the sum L1 of the absolute values of the weights of the convolution kernels and a scaling factor gamma, the following operations are carried out:
Step 2.1): Using the measurement standard of the sum L1 of the absolute values of the weights of the convolution kernels, pruning 50% of the neural network, carrying out fine-tuning training for 100 generations, and measuring the detection accuracy mAP_L1 of the neural network at this point.
This standard takes the sum of the absolute values of all the weights in each convolution kernel; a convolution kernel with a larger value is more important, and one with a smaller value is relatively unimportant. The 50% of convolution kernels with the smallest values are then pruned.
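A minimal sketch of this ranking, assuming a PyTorch Conv2d layer, is shown below; keep_by_l1 is a hypothetical helper that returns the indices of the kernels kept under the L1 standard.

```python
# Sketch of the L1 importance ranking of step 2.1, assuming PyTorch.
import torch

def keep_by_l1(conv: torch.nn.Conv2d, keep_ratio: float = 0.5) -> torch.Tensor:
    # Sum of the absolute values of the weights of each output kernel.
    l1 = conv.weight.detach().abs().sum(dim=(1, 2, 3))
    n_keep = max(1, int(keep_ratio * l1.numel()))
    return torch.topk(l1, n_keep).indices  # larger L1 sum = more important
```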
Step 2.2): using a scaling factor gamma measurement standard, pruning 50% of the neural network, carrying out fine tuning training for 100 generations, and measuring the detection precision mAP of the neural network at the moment γ
Like the L1 standard of the convolution kernel, γ is a learned parameter: the convolution kernel with the larger γ is more important, the one with the smaller γ less so, and the 50% of convolution kernels with the smallest γ are cut out.
Step 2.3): let initial precision mAP 0 Respectively making quotient with the precision after 2 kinds of pruning to obtain delta mAP L1 And Δ mAP γ As shown in equations 1 and 2.
ΔmAP_L1 = mAP_0 / mAP_L1    (1)

ΔmAP_γ = mAP_0 / mAP_γ    (2)
Step 2.4): according to Δ mAP L1 And Δ mAP γ The two importance criteria are calculated to be finally importantProportion of sexual standard Final
These proportion values λ_L1 and λ_γ are given by equations 3 and 4.

λ_L1 = ΔmAP_L1 / (ΔmAP_L1 + ΔmAP_γ)    (3)

λ_γ = ΔmAP_γ / (ΔmAP_L1 + ΔmAP_γ)    (4)
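As a worked example of equations 1 to 4 (the mAP figures below are illustrative only, not taken from the experiments):

```python
# Worked example of equations 1-4 with made-up mAP values.
map_0, map_l1, map_gamma = 55.0, 48.0, 50.0  # illustrative accuracies in %

d_map_l1 = map_0 / map_l1           # equation 1
d_map_gamma = map_0 / map_gamma     # equation 2

denom = d_map_l1 + d_map_gamma
lambda_l1 = d_map_l1 / denom        # equation 3
lambda_gamma = d_map_gamma / denom  # equation 4
# lambda_l1 + lambda_gamma == 1; the standard whose pruning hurt accuracy
# more (larger delta mAP) receives the larger weight in the Final standard.
```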
Step 3): The original neural network is sparsely trained (similar to step 1 in the unstructured pruning above).
Step 4): the sum L1 of the absolute values of the convolution kernels in all the layers to be pruned is calculated, and then the average value aveL1 (sum of absolute values of convolution kernels L1/number of weights in convolution kernels) is calculated.
The sum of the absolute values of the weights in each convolution kernel has already been found in step 2.1, where the average value is obtained by directly dividing by the number of weights.
Step 5): the scaling factor gamma for each channel is collected.
Step 6): by passing
The calculation is shown in equation 5.

Final = λ_L1 · aveL1 + λ_γ · γ    (5)
Step 7): and judging the importance of the channel by using Final importance measurement standard, and pruning the network.
The channel (convolution kernel) with the larger Final value is more important. A pruning threshold is set, and channels whose Final value is below the threshold are pruned. The threshold is set according to the actual situation: in general, several thresholds are tried in separate experiments, and the run whose accuracy recovers best under fine-tuning training at the largest pruning rate is taken.
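For illustration, a minimal sketch of steps 4 to 7 follows, assuming each pruned convolution layer is a PyTorch Conv2d followed by a BatchNorm2d; taking the absolute value of γ and the threshold value are assumptions made for this sketch.

```python
# Sketch of the Final standard (equation 5) and threshold pruning,
# assuming PyTorch Conv2d + BatchNorm2d layer pairs.
import torch

def final_scores(conv: torch.nn.Conv2d, bn: torch.nn.BatchNorm2d,
                 lambda_l1: float, lambda_gamma: float) -> torch.Tensor:
    w = conv.weight.detach()
    l1 = w.abs().sum(dim=(1, 2, 3))     # step 4: per-kernel L1 sum
    ave_l1 = l1 / w[0].numel()          # aveL1 = L1 sum / weights per kernel
    gamma = bn.weight.detach().abs()    # step 5: BN scaling factor |gamma| per channel
    return lambda_l1 * ave_l1 + lambda_gamma * gamma  # equation 5

def channels_to_prune(scores: torch.Tensor, threshold: float) -> torch.Tensor:
    # step 7: channels whose Final score falls below the threshold are pruned.
    return (scores < threshold).nonzero(as_tuple=True)[0]
```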
Step 8): and (4) fine-tuning the training network (similar to the step 3 in the unstructured pruning process) to obtain a final pruning network.
Fig. 4 shows the experimental results of one-time pruning and weight compression pruning of the two networks YOLOv3 and YOLOv3-tiny on the COCO data set. The data show that, compared with the one-time pruning method, weight compression pruning increases the pruning ratio gradually through small-scale weight compression each time, and at the same or a higher pruning ratio than one-time pruning, the accuracy mAP of the model obtained by weight compression pruning is higher by about 1% on average.
Fig. 5 and Fig. 6 show the results of experiments with the YOLOv3 network on the VOC and Oxford Hand data sets to validate the integrated standard pruning method. Analyzing the data in Fig. 5, the results of groups 2, 3 and 5 show that, at the same pruning ratio, accuracy recovers to a higher level after network fine-tuning when the integrated standard pruning method is used; the results of groups 3 and 6 show that, at equal accuracy, the pruning ratio of the integrated standard pruning method is 1.5% higher. The data in Fig. 6 lead to the same conclusion.
In conclusion, the experimental results show that, compared with traditional pruning methods, the proposed pruning methods achieve a larger pruning ratio and higher model accuracy recovery after pruning.
The above description is only a preferred embodiment of the present invention, and the scope of the present invention is not limited to the above embodiment, but equivalent modifications or changes made by those skilled in the art according to the present disclosure should be included in the scope of the present invention as set forth in the appended claims.

Claims (7)

1. A weight compression and integrated standard pruning method for a neural network is characterized by comprising the following steps:
the method comprises a weight compression pruning method for unstructured pruning and an integrated standard pruning method for structured pruning;
the weight compression pruning method comprises the following steps:
step 1): sparsely training an original neural network, and setting a pruning threshold;
step 2): pruning is carried out according to a threshold value;
step 3): fine-tuning the network by training;
step 4): compressing the weights by 10%;
step 5): repeating steps 2 to 4 until an ideal target network is reached;
an integrated standard pruning method comprising the steps of:
step 1): measuring the original detection accuracy mAP_0 of the neural network;
step 2): based on the two measurement standards of the sum L1 of the absolute values of the weights of the convolution kernels and the scaling factor γ, pruning and fine-tuning training are respectively carried out to obtain the corresponding detection accuracies, which are processed to obtain the corresponding importance standards ΔmAP_L1 and ΔmAP_γ and their proportion values λ_L1 and λ_γ;
Step 3): sparsely training an original neural network;
step 4): calculating the sum L1 of absolute values of convolution kernels in all layers to be pruned, and then calculating the average value aveL1;
step 5): collecting the scaling factor gamma of each channel;
step 6): calculating the importance measurement standard Final from λ_L1, λ_γ, aveL1 and γ;
step 7): judging the importance of the channel by using a Final importance measurement standard, and pruning the network;
step 8): and fine-tuning the training network to obtain a final pruning network.
2. The weight compression and integration standard pruning method for the neural network according to claim 1, characterized in that: in step 1) of the weight compression pruning method, the sparse training is to add to the loss function an L2 regularization term over the weights w_i (the sum of the squared weights, Σ_i w_i², scaled by a regularization coefficient) and to pre-train on a data set, so that the weights in the trained network are biased toward values near 0 and the network has sparsity; the pruning threshold is set to 0.01, i.e. weights with absolute values less than 0.01 are to be pruned.
3. The weight compression and integration standard pruning method for the neural network according to claim 1, characterized in that: in step 5) of the weight compression pruning method, after the latest compression pruning, if the difference between the model accuracy and the original accuracy exceeds 1%, the pruning is stopped, and the model not subjected to this last pruning round is the ideal target network.
4. The weight compression and integration standard pruning method for the neural network according to claim 1, characterized in that: step 2) of the integrated standard pruning method comprises the following steps:
step 2.1): using the measurement standard of the sum L1 of the absolute values of the weights of the convolution kernels, pruning 50% of the neural network, carrying out fine-tuning training for 100 generations, and measuring the detection accuracy mAP_L1 of the neural network at this point;
step 2.2): using the scaling factor γ measurement standard, pruning 50% of the neural network, carrying out fine-tuning training for 100 generations, and measuring the detection accuracy mAP_γ of the neural network at this point;
Step 2.3): let initial precision mAP 0 Respectively making quotient with the precision after 2 kinds of pruning to obtain delta mAP L1 And Δ mAP γ
Step 2.4): according to Δ mAP L1 And Δ mAP γ Calculating the ratio of the two importance criteria in the Final importance criteria Final
Figure FDA0003994272690000022
And λ γ
5. The weight compression and integration standard pruning method for the neural network according to claim 4, characterized in that: in step 2.3), ΔmAP_L1 and ΔmAP_γ are calculated as shown in equation 1 and equation 2:

ΔmAP_L1 = mAP_0 / mAP_L1    (1)

ΔmAP_γ = mAP_0 / mAP_γ    (2).
6. the weight compression and integration standard pruning method for the neural network according to claim 5, wherein: in step 2.4), the proportion of the two importance standards in the Final importance standard Final is calculated
Figure FDA0003994272690000031
And λ γ As shown in equations 3 and 4:
Figure FDA0003994272690000032
λ γ =ΔmAP γ /(ΔmAP L1 +ΔmAP γ ) (4)。
7. the weight compression and integration standard pruning method for the neural network according to claim 6, wherein: in step 6) of the integration standard pruning method, the importance measure Final is calculated as shown in equation 5:
Figure FDA0003994272690000033
/>
CN202211590899.8A 2022-12-12 2022-12-12 Weight compression and integration standard pruning method for neural network Pending CN115936099A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211590899.8A CN115936099A (en) 2022-12-12 2022-12-12 Weight compression and integration standard pruning method for neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211590899.8A CN115936099A (en) 2022-12-12 2022-12-12 Weight compression and integration standard pruning method for neural network

Publications (1)

Publication Number Publication Date
CN115936099A true CN115936099A (en) 2023-04-07

Family

ID=86652004

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211590899.8A Pending CN115936099A (en) 2022-12-12 2022-12-12 Weight compression and integration standard pruning method for neural network

Country Status (1)

Country Link
CN (1) CN115936099A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117058525A (en) * 2023-10-08 2023-11-14 之江实验室 Model training method and device, storage medium and electronic equipment
CN117058525B (en) * 2023-10-08 2024-02-06 之江实验室 Model training method and device, storage medium and electronic equipment

Similar Documents

Publication Publication Date Title
CN110619385A (en) Structured network model compression acceleration method based on multi-stage pruning
CN109002889B (en) Adaptive iterative convolution neural network model compression method
CN111325342A (en) Model compression method and device, target detection equipment and storage medium
CN112016674A (en) Knowledge distillation-based convolutional neural network quantification method
CN115936099A (en) Weight compression and integration standard pruning method for neural network
CN111105035A (en) Neural network pruning method based on combination of sparse learning and genetic algorithm
CN110162290B (en) Compression method for DeMURA data of OLED screen
CN115512166B (en) Intelligent preparation method and system of lens
CN112308316A (en) Crime number prediction method based on linear regression algorithm
CN111489364A (en) Medical image segmentation method based on lightweight full convolution neural network
CN113610227B (en) Deep convolutional neural network pruning method for image classification
CN112561041A (en) Neural network model acceleration method and platform based on filter distribution
CN111985825A (en) Crystal face quality evaluation method for roller mill orientation instrument
CN110689065A (en) Hyperspectral image classification method based on flat mixed convolution neural network
CN112926533A (en) Optical remote sensing image ground feature classification method and system based on bidirectional feature fusion
CN112465140A (en) Convolutional neural network model compression method based on packet channel fusion
CN115329880A (en) Meteorological feature extraction method and device, computer equipment and storage medium
CN105139373B (en) Non-reference picture assessment method for encoding quality based on independence subspace analysis
CN113112482A (en) PCB defect detection method based on attention mechanism network
CN113222920A (en) Suction pipe defect detection method based on pruning Yolov3
CN112288744A (en) SAR image change detection method based on integer reasoning quantification CNN
CN114220024B (en) Static satellite sand storm identification method based on deep learning
CN116910506A (en) Load dimension reduction clustering method based on space-time network variation self-encoder algorithm
CN115564043B (en) Image classification model pruning method and device, electronic equipment and storage medium
CN111340098A (en) STA-Net age prediction method based on shoe print image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination