CN110674939A - Deep neural network model compression method based on pruning threshold automatic search - Google Patents
- Publication number
- CN110674939A (application CN201910820043.7A)
- Authority
- CN
- China
- Prior art keywords
- pruning
- threshold
- model
- network model
- value
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/082—Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
The invention discloses a deep neural network model compression method based on automatic pruning-threshold search, belonging to the field of deep neural network model compression. The method comprises the following steps: training a model to obtain an initial model for pruning; performing an adaptive grid search over the model parameters to obtain a first pruning threshold; narrowing the threshold interval corresponding to the first pruning threshold with a binary search to find a better threshold, yielding a second pruning threshold; iteratively pruning the initial model based on the second pruning threshold; and sparsely storing the pruned model to obtain a usable compressed network model. The method can compress mainstream deep neural network models, solves the technical problem that deep neural network models cannot be deployed on embedded devices because of their large size, and expands the application range of such models.
Description
Technical Field
The invention belongs to the field of deep neural network model compression, and particularly relates to a deep neural network model compression method based on pruning threshold automatic search.
Background
The development of deep learning has made deep neural networks increasingly applied to computer vision tasks such as image recognition, detection and tracking, and network models increasingly tend to be designed wider and deeper. The success of deep learning depends largely on models with large numbers of parameters and on powerful computing devices. However, deep neural networks are difficult to deploy on low-storage, low-power hardware platforms (such as mobile devices) because of their huge memory requirements and computational cost, which greatly limits their application. Therefore, how to effectively compress a neural network model while keeping the performance of the existing deep neural network model unchanged is an important problem to be solved.
The model pruning method has become one of the most representative techniques among model compression methods owing to its simplicity and effectiveness. Model pruning mainly achieves compression by finding an effective means of judging parameter importance and clipping unimportant parameters. However, most existing mainstream methods clip by defining the pruning rate in advance and then recover the model accuracy by retraining. This raises two problems: first, the pruning rate is specified manually rather than searched automatically by the model, so a better pruning threshold may exist; second, excessive pruning may make the model accuracy difficult to recover, so a good balance between model accuracy and model compression ratio is hard to achieve. Therefore, a new approach is needed to address this need.
Disclosure of Invention
The invention aims to address the above problems by providing a method that balances the relation between model compression ratio and accuracy and adaptively searches the pruning threshold.
The invention discloses a deep neural network model compression method based on pruning threshold automatic search, which comprises the following steps:
s1: carrying out model training on an original network model to be compressed to obtain an initial model for pruning;
s2: searching an interval threshold, and performing self-adaptive grid searching on the model parameters to obtain a first pruning threshold;
s3: carrying out pruning threshold search optimization, further reducing a threshold interval corresponding to the first pruning threshold by combining a binary search method, and searching for a more optimal threshold to obtain a second pruning threshold;
wherein the threshold interval corresponding to the first pruning threshold is [V1*, V1* + σ), V1* representing the first pruning threshold and σ representing the search step (i.e., the interval value) of the adaptive grid search;
s4: performing iterative pruning processing on the initial model based on the second pruning threshold value, and performing retraining on the network model after each pruning;
each pruning pass sets to zero every parameter weight smaller than the second pruning threshold, yielding a sparse network model; since pruning reduces the accuracy of the network model to some extent, the sparse network model obtained after each pruning pass is retrained to recover the lost accuracy;
then the retrained network model is pruned again: its parameter weights smaller than the second pruning threshold are set to zero and the model is retrained, and so on, until the final network model after iterative pruning is obtained;
s5: and sparsely storing the network model subjected to iterative pruning. Namely, the final network model after the iterative pruning processing is sparsely stored, so that a usable compressed network model is obtained.
Wherein, step S2 includes the following steps:
s21: setting the accuracy drop threshold theta of the modelaThe model is smaller than a given threshold value (theta) in an accurate descending rangea) Pruning is carried out under the condition of (1);
s22: obtaining all parameter weights W of the model, and calculating the maximum absolute value | W of the parameter weightsmax| and minimum | Wmin|;
S23: setting the size N of the threshold interval, and dividing the absolute value of the parameter weight between the maximum value and the minimum value at equal intervals to obtain the value N of the threshold interval0:
s25: testing the model accuracy corresponding to each test threshold, and when the accuracy drop range does not exceed a given threshold thetaaUnder the condition of (1), obtaining the optimal pruning threshold value V through grid searchthresholdAnd pruning all parameter weights in the original network model, which are smaller than the optimal pruning threshold value;
wherein M represents a model parameter mask corresponding to the parameter weight W, mask values of 0 or 1 respectively represent pruning or retaining the parameter weight W, W ⊙ MnExpressing the parameter value after pruning, wherein A (-) is an accuracy function of the network model under the given parameter;
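For illustration only, the S21–S25 search can be sketched in Python as follows. The function name and interfaces (`grid_search_threshold`, `accuracy_fn`) are assumptions for the sketch — the patent specifies the procedure, not an implementation — and the accuracy function in the usage is a toy stand-in for A(·):

```python
def grid_search_threshold(weights, accuracy_fn, base_acc, theta_a, n_intervals):
    """Grid search (S21-S25): return the largest test threshold V_n whose
    accuracy drop stays within theta_a, together with the interval value N0."""
    mags = sorted(abs(w) for w in weights)
    w_min, w_max = mags[0], mags[-1]          # |W|min and |W|max
    step = (w_max - w_min) / n_intervals      # interval value N0
    best = w_min
    for n in range(n_intervals + 1):
        v = w_min + n * step                  # test threshold V_n
        pruned = [0.0 if abs(w) < v else w for w in weights]  # W after mask M_n
        if base_acc - accuracy_fn(pruned) <= theta_a:
            best = v                          # still admissible: keep the larger V_n
        else:
            break                             # the drop only grows as v grows
    return best, step


# Toy usage: "accuracy" is just the fraction of total weight magnitude retained.
weights = [1.0, 2.0, 5.0, 9.0]
total = sum(abs(w) for w in weights)
accuracy = lambda p: sum(abs(x) for x in p) / total
v1, sigma = grid_search_threshold(weights, accuracy, 1.0, 0.2, 4)
```

With these toy numbers the search settles on V1* = 5.0 with step σ = 2.0: thresholds 3 and 5 drop the "accuracy" by about 0.18, while 7 would drop it by 0.47 and is rejected.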
wherein, step S3 includes the following steps:
s31: taking the median of the threshold interval [V1*, V1* + σ) corresponding to the first pruning threshold as the initial temporary second pruning threshold;
s32: pruning the initial model based on the current temporary second pruning threshold, and judging whether the accuracy drop of the pruned network model does not exceed the given threshold; if so, executing step S34; otherwise, executing step S33;
s33: judging whether the width of the current threshold interval is larger than the preset binary error value; if so, taking the lower half of the current interval as the new threshold interval, using its median as the new temporary second pruning threshold, and performing step S32; otherwise, taking the left boundary of the current interval as the final second pruning threshold;
s34: judging whether the width of the current threshold interval is larger than the preset binary error value; if so, taking the upper half of the current interval as the new threshold interval, using its median as the new temporary second pruning threshold, and performing step S32; otherwise, taking the left boundary of the current interval as the final second pruning threshold;
In summary, due to the adoption of the above technical scheme, the invention has the following beneficial effects: the deep neural network model compression method can compress a model by adaptively searching the model pruning threshold without reducing the model accuracy, effectively balancing the relation between model accuracy and compression ratio. The method adapts well to different deep neural network models, removes the need to manually set the pruning rate, achieves a better model compression effect, and provides a feasible technique for deploying deep neural network models on resource-limited embedded devices.
Drawings
FIG. 1: the invention is a general framework schematic diagram.
FIG. 2: the model parameter updating schematic diagram of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the following embodiments and accompanying drawings.
Referring to fig. 1, the method for compressing a deep neural network model based on pruning threshold automatic search of the present invention includes the following steps:
s1: carrying out model training on an original network model to be compressed to obtain an initial model for pruning;
s2: searching an interval threshold, and performing self-adaptive grid searching on the model parameters to obtain a first pruning threshold;
s3: carrying out pruning threshold search optimization, further reducing a threshold interval corresponding to the first pruning threshold by combining a binary search method, and searching for a more optimal threshold to obtain a second pruning threshold;
wherein the threshold interval corresponding to the first pruning threshold is [V1*, V1* + σ), V1* representing the first pruning threshold and σ representing the search step of the adaptive grid search;
s4: performing iterative pruning treatment on the original network model based on a second pruning threshold value:
setting the weight of the parameter smaller than the second pruning threshold value to zero to obtain the current pruned network model;
retraining the network model after the current pruning;
judging whether any parameter weight smaller than the second pruning threshold still exists; if so, continuing to set the parameter weights smaller than the second pruning threshold to zero and then retraining; otherwise, the iterative pruning processing is finished.
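The S4 loop above can be sketched as follows; `retrain` is a stand-in interface for the retraining step (a real implementation would run training epochs on the sparse model), so this is an assumed illustration rather than the patent's implementation:

```python
def iterative_prune(weights, threshold, retrain, max_rounds=10):
    """Iterative pruning (per steps S41-S43): zero out weights below the
    threshold, retrain, and repeat until no surviving weight falls below it."""
    mask = [1] * len(weights)
    for _ in range(max_rounds):
        below = [i for i, w in enumerate(weights)
                 if mask[i] and abs(w) < threshold]   # any surviving weight still below?
        if not below:
            break                                     # iterative pruning finished
        for i in below:                               # prune: zero weight, clear mask
            mask[i] = 0
            weights[i] = 0.0
        weights = retrain(weights, mask)              # retrain to recover lost accuracy
    return weights, mask


# Toy usage with an identity "retrain": one pass prunes the 0.05 weight.
w, m = iterative_prune([0.05, 0.2, 0.3], 0.1, lambda ws, mk: ws)
```

This returns weights [0.0, 0.2, 0.3] with mask [0, 1, 1]; already-pruned positions (mask 0) are skipped in the check, so the loop terminates.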
S5: and sparse storage, wherein the model subjected to iterative pruning is subjected to sparse storage to obtain a usable compression network model.
In the invention, model training can be realized either by training a deep neural network model from scratch for different tasks, or by transferring a model trained on a large database such as ImageNet to a specific task through fine-tuning. Model training is the basis of model pruning; since the model pruning in the invention is carried out based on the accuracy of the pre-trained model, it is necessary to obtain a basic model with good accuracy.
In the invention, the deep neural network model is compressed mainly by providing a method that balances the relation between model compression ratio and accuracy and adaptively searches the pruning threshold. Specifically, first, the accuracy-drop threshold θa of the model is defined to ensure that model compression is carried out within the allowable range of accuracy reduction. Unlike existing mainstream methods, the accuracy is guaranteed to remain within the expected range throughout pruning, which also avoids the long retraining processes that other methods require to recover lost accuracy. Then, the invention acquires all parameter weights W of the model, computes the maximum |W|max and minimum |W|min of their absolute values, sets the number N of threshold intervals, and divides the absolute values of the parameter weights between the maximum and the minimum at equal intervals to obtain the threshold interval value:
N0 = (|W|max − |W|min) / N
after obtaining the threshold interval value, taking the threshold interval value as an interval unit, the invention obtains a plurality of test thresholds within the range of model parameters as follows:
The model accuracy corresponding to each test threshold is then measured. On the condition that the accuracy drop does not exceed the given threshold θa, the optimal pruning threshold Vthreshold is obtained through grid search, and all parameters whose weights are smaller than the threshold are pruned from the model, where the threshold search formula is:
Vthreshold = max{ Vn | A(W) − A(W ⊙ Mn) ≤ θa }
wherein M is the model parameter mask corresponding to W, a mask value of 0 or 1 respectively indicating that the parameter is pruned or retained, W ⊙ Mn representing the parameter values after pruning, and A(·) being the accuracy function of the model under the given parameters.
In the invention, pruning optimization is carried out based on the threshold interval obtained in the interval-threshold pruning process. Suppose that during the interval-threshold search, at threshold Vn the model obtains the maximum compression rate on the premise that the accuracy drop does not exceed the given range, while at threshold Vn+1 the accuracy drop exceeds the given range and the pruning requirement is no longer met. The invention thus obtains a rough pruning threshold interval [Vn, Vn+1), within which a finer threshold may exist that allows the model to be compressed further. Therefore, the invention adopts the idea of binary search and finds a better pruning threshold by continuously narrowing the threshold interval. Here, the invention sets an error value eps: when the threshold interval becomes smaller than this error value, a sufficiently good pruning threshold is considered to have been found, and the left boundary of the interval is taken as the final pruning threshold. The neural network model is then pruned with this threshold, realizing further compression of the model.
In the invention, model retraining only needs to recover a limited amount of accuracy, unlike other methods that require extensive retraining. Since parameters that initially appear unimportant may become important later as the model is updated, and clipping them from the beginning could damage the model irrecoverably, the invention keeps updating all parameters by adding a parameter mask. During retraining, the model parameters are updated, and parameters that have been clipped can also be recovered through the retraining process, as shown in fig. 2.
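The mask-based update of fig. 2 can be illustrated with the following hypothetical sketch: the dense parameters keep receiving gradient updates, while the mask only gates the effective (forward) weights, so a clipped parameter whose underlying value grows can be revived at the next pruning pass. The function name and interface are assumptions for illustration:

```python
def masked_update(weights, grads, mask, lr=0.1):
    """One SGD-style step: every parameter receives a dense gradient update,
    while the mask only gates the effective weights used in the forward pass."""
    dense = [w - lr * g for w, g in zip(weights, grads)]    # all parameters updated
    effective = [w * m for w, m in zip(dense, mask)]        # mask applied afterwards
    return dense, effective


# Toy usage: the clipped first weight (mask 0) still accumulates updates.
dense, eff = masked_update([0.0, 1.0], [-1.0, 0.5], [0, 1])
```

`dense` becomes [0.1, 0.95] while `eff` stays [0.0, 0.95]; if the underlying 0.1 later exceeds the pruning threshold, the mask entry can be flipped back to 1.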
In the invention, sparse storage is a method for efficiently storing the pruned model. Because the model parameters are stored as four-dimensional tensors, they are first reshaped into two-dimensional matrices to improve storage efficiency; these matrices are still sparse, and are then stored effectively using compressed sparse row/column storage. The size of the stored model is far smaller than that of the original model, achieving the effect of model compression.
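As an illustration of the row-compressed storage the paragraph describes (a minimal pure-Python sketch; a real deployment would use a library CSR format):

```python
def to_csr(matrix):
    """Compressed sparse row (CSR) storage of a dense 2-D weight matrix:
    keep only the nonzero values, their column indices, and per-row offsets."""
    values, col_idx, row_ptr = [], [], [0]
    for row in matrix:
        for j, v in enumerate(row):
            if v != 0.0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))   # offset of the next row's first nonzero
    return values, col_idx, row_ptr


# A pruned 3x3 layer with 3 surviving weights out of 9.
vals, cols, ptrs = to_csr([[0.0, 1.5, 0.0],
                           [0.0, 0.0, 0.0],
                           [2.0, 0.0, 3.0]])
```

This stores 3 values plus 3 + 4 index entries instead of 9 dense entries; the saving grows with sparsity.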
In summary, the invention provides a model threshold value searching method based on grid search by balancing the relation between the accuracy and the compression ratio of the deep neural network model, and further optimizes the threshold value range on the basis, thereby achieving better model compression effect. The method can be well applied to some embedded devices with limited resources, and the application range of the deep neural network model is greatly expanded.
Namely, the beneficial technical effects of the invention are as follows:
1. a pruning method based on model accuracy is provided, and the size of the model is greatly reduced by balancing the relation between the model accuracy and the compression ratio under the condition of ensuring that the model accuracy is not reduced.
2. The automatic model threshold searching method based on grid search can adaptively search for a proper pruning threshold aiming at different models, and meanwhile, the method is further optimized by combining a binary search method, so that a better pruning effect is achieved.
3. The method can be combined with methods such as parameter sharing, quantization and low-rank decomposition to further compress the model and improve the compression effect.
While the invention has been described with reference to specific embodiments, any feature disclosed in this specification may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise; all of the disclosed features, or all of the method or process steps, may be combined in any combination, except mutually exclusive features and/or steps.
Claims (3)
1. A deep neural network model compression method based on pruning threshold automatic search is characterized by comprising the following steps:
s1: carrying out model training on an original network model to be compressed to obtain an initial model for pruning;
s2: carrying out self-adaptive grid search on the model parameters to obtain a first pruning threshold;
s3: further reducing the threshold interval corresponding to the first pruning threshold by combining a binary search method, and searching for a more optimal threshold to obtain a second pruning threshold;
wherein the threshold interval corresponding to the first pruning threshold is [V1*, V1* + σ), V1* representing the first pruning threshold and σ representing the search step (i.e., the interval value) of the adaptive grid search;
s4: performing iterative pruning treatment on the initial model based on the second pruning threshold value:
s41: setting the weight of the parameter smaller than the second pruning threshold value to zero to obtain the current pruned network model;
s42: retraining the network model after the current pruning;
s43: judging whether the currently retrained network model has the parameter weight smaller than the second pruning threshold, if so, continuing to execute the step S41; otherwise, obtaining a network model after iterative pruning processing based on the network model after current retraining;
s5: and sparsely storing the network model subjected to iterative pruning.
2. The method of claim 1, wherein the step S2 includes the steps of:
s21: setting the accuracy drop threshold theta of the modela;
S22: obtaining all parameter weights W of the model, and calculating the maximum absolute value | W of the parameter weightsmax| and minimum | Wmin|;
S23: setting the size N of the threshold interval, and dividing the absolute value of the parameter weight between the maximum value and the minimum value at equal intervals to obtain the value N of the threshold interval0:
s25: testing the model accuracy corresponding to each test threshold, and when the accuracy drop range does not exceed a given threshold thetaaUnder the condition of (1), obtaining the optimal pruning threshold value V through grid searchthreshold;
wherein M represents a model parameter mask corresponding to the parameter weight W, mask values of 0 or 1 respectively represent pruning or retaining the parameter weight W, W ⊙ MnRepresenting the value of the parameter after pruning, a (-) is a function of the accuracy of the network model at the given parameter.
3. The method of claim 1, wherein the step S3 includes the steps of:
s31: taking the median of the threshold interval [V1*, V1* + σ) corresponding to the first pruning threshold as the initial temporary second pruning threshold;
s32: pruning the initial model based on the current temporary second pruning threshold, and judging whether the accuracy drop of the pruned network model does not exceed the given threshold; if so, executing step S34; otherwise, executing step S33;
s33: judging whether the width of the current threshold interval is larger than the preset binary error value; if so, taking the lower half of the current interval as the new threshold interval, using its median as the new temporary second pruning threshold, and performing step S32; otherwise, taking the left boundary of the current interval as the final second pruning threshold;
s34: judging whether the width of the current threshold interval is larger than the preset binary error value; if so, taking the upper half of the current interval as the new threshold interval, using its median as the new temporary second pruning threshold, and performing step S32; otherwise, taking the left boundary of the current interval as the final second pruning threshold;
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910820043.7A CN110674939A (en) | 2019-08-31 | 2019-08-31 | Deep neural network model compression method based on pruning threshold automatic search |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910820043.7A CN110674939A (en) | 2019-08-31 | 2019-08-31 | Deep neural network model compression method based on pruning threshold automatic search |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110674939A true CN110674939A (en) | 2020-01-10 |
Family
ID=69076581
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910820043.7A Pending CN110674939A (en) | 2019-08-31 | 2019-08-31 | Deep neural network model compression method based on pruning threshold automatic search |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110674939A (en) |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111310918A (en) * | 2020-02-03 | 2020-06-19 | 腾讯科技(深圳)有限公司 | Data processing method and device, computer equipment and storage medium |
CN111310918B (en) * | 2020-02-03 | 2023-07-14 | 腾讯科技(深圳)有限公司 | Data processing method, device, computer equipment and storage medium |
CN111444760B (en) * | 2020-02-19 | 2022-09-09 | 天津大学 | Traffic sign detection and identification method based on pruning and knowledge distillation |
CN111444760A (en) * | 2020-02-19 | 2020-07-24 | 天津大学 | Traffic sign detection and identification method based on pruning and knowledge distillation |
CN111382839A (en) * | 2020-02-23 | 2020-07-07 | 华为技术有限公司 | Method and device for pruning neural network |
CN111382839B (en) * | 2020-02-23 | 2024-05-07 | 华为技术有限公司 | Method and device for pruning neural network |
CN111814975A (en) * | 2020-07-09 | 2020-10-23 | 广东工业大学 | Pruning-based neural network model construction method and related device |
CN111814975B (en) * | 2020-07-09 | 2023-07-28 | 广东工业大学 | Neural network model construction method and related device based on pruning |
CN112612602B (en) * | 2020-12-11 | 2023-12-01 | 国网浙江省电力有限公司宁波供电公司 | Automatic compression processing method for target detection network model |
CN112612602A (en) * | 2020-12-11 | 2021-04-06 | 国网浙江省电力有限公司宁波供电公司 | Automatic compression processing method for target detection network model |
CN113128664A (en) * | 2021-03-16 | 2021-07-16 | 广东电力信息科技有限公司 | Neural network compression method, device, electronic equipment and storage medium |
CN115271043A (en) * | 2022-07-28 | 2022-11-01 | 小米汽车科技有限公司 | Model tuning method, model tuning device and storage medium |
CN115271043B (en) * | 2022-07-28 | 2023-10-20 | 小米汽车科技有限公司 | Model tuning method, device and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110674939A (en) | Deep neural network model compression method based on pruning threshold automatic search | |
CN108764471B (en) | Neural network cross-layer pruning method based on feature redundancy analysis | |
US11531889B2 (en) | Weight data storage method and neural network processor based on the method | |
WO2022006919A1 (en) | Activation fixed-point fitting-based method and system for post-training quantization of convolutional neural network | |
CN109002889B (en) | Adaptive iterative convolution neural network model compression method | |
CN113610232B (en) | Network model quantization method and device, computer equipment and storage medium | |
CN102567973B (en) | Image denoising method based on improved shape self-adaptive window | |
CN110807529A (en) | Training method, device, equipment and storage medium of machine learning model | |
CN110705708A (en) | Compression method and device of convolutional neural network model and computer storage medium | |
WO2023098544A1 (en) | Structured pruning method and apparatus based on local sparsity constraints | |
CN113657421B (en) | Convolutional neural network compression method and device, and image classification method and device | |
CN108734264A (en) | Deep neural network model compression method and device, storage medium, terminal | |
CN110598848A (en) | Migration learning acceleration method based on channel pruning | |
CN111814448B (en) | Pre-training language model quantization method and device | |
CN113241064A (en) | Voice recognition method, voice recognition device, model training method, model training device, electronic equipment and storage medium | |
CN113963176B (en) | Model distillation method and device, electronic equipment and storage medium | |
CN113128664A (en) | Neural network compression method, device, electronic equipment and storage medium | |
CN112613604A (en) | Neural network quantification method and device | |
CN115170902B (en) | Training method of image processing model | |
CN111860770A (en) | Model compression method and system integrating clipping and quantization | |
CN116384470A (en) | Convolutional neural network model compression method and device combining quantization and pruning | |
CN115953651A (en) | Model training method, device, equipment and medium based on cross-domain equipment | |
CN113762505B (en) | Method for clustering pruning according to L2 norms of channels of convolutional neural network | |
CN113570037A (en) | Neural network compression method and device | |
CN113887709A (en) | Neural network adaptive quantization method, apparatus, device, medium, and product |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||