CN110674939A - Deep neural network model compression method based on pruning threshold automatic search - Google Patents
- Publication number
- CN110674939A (application CN201910820043.7A)
- Authority
- CN
- China
- Prior art keywords
- pruning
- threshold
- model
- network model
- value
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/082—Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
The invention discloses a deep neural network model compression method based on automatic pruning-threshold search, belonging to the field of deep neural network model compression. The method comprises the following steps: training a model to obtain an initial model for pruning; performing an adaptive grid search over the model parameters to obtain a first pruning threshold; narrowing the threshold interval corresponding to the first pruning threshold with a binary search to find a better threshold, yielding a second pruning threshold; iteratively pruning the initial model based on the second pruning threshold; and sparsely storing the pruned model to obtain a usable compressed network model. The method can compress mainstream deep neural network models, solves the technical problem that deep neural network models cannot be deployed on embedded devices because of their large size, and expands the application range of such models.
Description
Technical Field
The invention belongs to the field of deep neural network model compression, and particularly relates to a deep neural network model compression method based on pruning threshold automatic search.
Background
The development of deep learning has made deep neural networks increasingly applied to computer vision tasks such as image recognition, detection and tracking, and network models increasingly tend to be designed wider and deeper. The success of deep learning depends largely on models with large numbers of parameters and on powerful computing devices. However, deep neural networks are difficult to deploy on low-storage, low-power hardware platforms (such as mobile devices) because of their huge memory requirements and computational cost, which greatly limits their application. Therefore, how to effectively compress a neural network model while keeping the performance of the existing deep neural network model unchanged is an important problem to be solved.
The model pruning method has become one of the most representative techniques among model compression methods owing to its simplicity and effectiveness. Model pruning mainly achieves compression by finding an effective means of judging parameter importance and clipping unimportant parameters. However, most existing mainstream methods clip by defining the pruning rate in advance and then recover the model accuracy by retraining. This raises two problems: first, the pruning rate is specified manually rather than searched automatically by the model, so a better pruning threshold may exist; second, excessive pruning may make the model accuracy difficult to recover, so a good balance between model accuracy and model compression ratio is hard to achieve. Therefore, a new approach is needed to address this need.
Disclosure of Invention
The invention aims to address the above problems by providing a method that balances the relation between model compression ratio and accuracy and adaptively searches the pruning threshold.
The invention discloses a deep neural network model compression method based on pruning threshold automatic search, which comprises the following steps:
s1: carrying out model training on an original network model to be compressed to obtain an initial model for pruning;
s2: searching an interval threshold, and performing self-adaptive grid searching on the model parameters to obtain a first pruning threshold;
s3: carrying out pruning threshold search optimization, further reducing a threshold interval corresponding to the first pruning threshold by combining a binary search method, and searching for a more optimal threshold to obtain a second pruning threshold;
wherein the threshold interval corresponding to the first pruning threshold is [V1*, V1* + σ), V1* representing the first pruning threshold and σ representing the search step (i.e., the interval value) of the adaptive grid search;
s4: performing iterative pruning processing on the initial model based on the second pruning threshold value, and performing retraining on the network model after each pruning;
each pruning pass sets to zero every parameter weight smaller than the second pruning threshold, yielding a sparse network model; since pruning reduces the accuracy of the network model to some extent, the sparse network model obtained after each pruning pass is retrained to recover the lost accuracy;
then the retrained network model is pruned again: its parameter weights smaller than the second pruning threshold are set to zero and the model is retrained, and so on, until the final network model after iterative pruning is obtained;
s5: and sparsely storing the network model subjected to iterative pruning. Namely, the final network model after the iterative pruning processing is sparsely stored, so that a usable compressed network model is obtained.
Wherein, step S2 includes the following steps:
s21: setting the accuracy drop threshold theta of the modelaThe model is smaller than a given threshold value (theta) in an accurate descending rangea) Pruning is carried out under the condition of (1);
s22: obtaining all parameter weights W of the model, and calculating the maximum absolute value | W of the parameter weightsmax| and minimum | Wmin|;
S23: setting the size N of the threshold interval, and dividing the absolute value of the parameter weight between the maximum value and the minimum value at equal intervals to obtain the value N of the threshold interval0:
s25: testing the model accuracy corresponding to each test threshold, and when the accuracy drop range does not exceed a given threshold thetaaUnder the condition of (1), obtaining the optimal pruning threshold value V through grid searchthresholdAnd pruning all parameter weights in the original network model, which are smaller than the optimal pruning threshold value;
wherein M represents a model parameter mask corresponding to the parameter weight W, mask values of 0 or 1 respectively represent pruning or retaining the parameter weight W, W ⊙ MnExpressing the parameter value after pruning, wherein A (-) is an accuracy function of the network model under the given parameter;
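For illustration only, the S21–S25 search can be sketched in Python as follows. The function name and interfaces (`grid_search_threshold`, `accuracy_fn`) are assumptions for the sketch — the patent specifies the procedure, not an implementation — and the accuracy function in the usage is a toy stand-in for A(·):

```python
def grid_search_threshold(weights, accuracy_fn, base_acc, theta_a, n_intervals):
    """Grid search (S21-S25): return the largest test threshold V_n whose
    accuracy drop stays within theta_a, together with the interval value N0."""
    mags = sorted(abs(w) for w in weights)
    w_min, w_max = mags[0], mags[-1]          # |W|min and |W|max
    step = (w_max - w_min) / n_intervals      # interval value N0
    best = w_min
    for n in range(n_intervals + 1):
        v = w_min + n * step                  # test threshold V_n
        pruned = [0.0 if abs(w) < v else w for w in weights]  # W after mask M_n
        if base_acc - accuracy_fn(pruned) <= theta_a:
            best = v                          # still admissible: keep the larger V_n
        else:
            break                             # the drop only grows as v grows
    return best, step


# Toy usage: "accuracy" is just the fraction of total weight magnitude retained.
weights = [1.0, 2.0, 5.0, 9.0]
total = sum(abs(w) for w in weights)
accuracy = lambda p: sum(abs(x) for x in p) / total
v1, sigma = grid_search_threshold(weights, accuracy, 1.0, 0.2, 4)
```

With these toy numbers the search settles on V1* = 5.0 with step σ = 2.0: thresholds 3 and 5 drop the "accuracy" by about 0.18, while 7 would drop it by 0.47 and is rejected.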
wherein, step S3 includes the following steps:
s31: taking the median of the threshold interval [V1*, V1* + σ) corresponding to the first pruning threshold as the initial temporary second pruning threshold;
s32: pruning the initial model based on the current temporary second pruning threshold, and judging whether the accuracy drop of the pruned network model does not exceed the given threshold; if so, executing step S34; otherwise, executing step S33;
s33: judging whether the width of the current threshold interval is larger than the preset binary error value; if so, taking the lower half of the current interval as the new threshold interval, using its median as the new temporary second pruning threshold, and performing step S32; otherwise, taking the left boundary of the current interval as the final second pruning threshold;
s34: judging whether the width of the current threshold interval is larger than the preset binary error value; if so, taking the upper half of the current interval as the new threshold interval, using its median as the new temporary second pruning threshold, and performing step S32; otherwise, taking the left boundary of the current interval as the final second pruning threshold;
In summary, due to the adoption of the above technical scheme, the invention has the following beneficial effects: the deep neural network model compression method can compress a model by adaptively searching the model pruning threshold without reducing the model accuracy, effectively balancing the relation between model accuracy and compression ratio. The method adapts well to different deep neural network models, removes the need to manually set the pruning rate, achieves a better model compression effect, and provides a feasible technique for deploying deep neural network models on resource-limited embedded devices.
Drawings
FIG. 1: the invention is a general framework schematic diagram.
FIG. 2: the model parameter updating schematic diagram of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the following embodiments and accompanying drawings.
Referring to fig. 1, the method for compressing a deep neural network model based on pruning threshold automatic search of the present invention includes the following steps:
s1: carrying out model training on an original network model to be compressed to obtain an initial model for pruning;
s2: searching an interval threshold, and performing self-adaptive grid searching on the model parameters to obtain a first pruning threshold;
s3: carrying out pruning threshold search optimization, further reducing a threshold interval corresponding to the first pruning threshold by combining a binary search method, and searching for a more optimal threshold to obtain a second pruning threshold;
wherein the threshold interval corresponding to the first pruning threshold is [V1*, V1* + σ), V1* representing the first pruning threshold and σ representing the search step of the adaptive grid search;
s4: performing iterative pruning treatment on the original network model based on a second pruning threshold value:
setting the weight of the parameter smaller than the second pruning threshold value to zero to obtain the current pruned network model;
retraining the network model after the current pruning;
judging whether any parameter weight smaller than the second pruning threshold still exists; if so, continuing to set the parameter weights smaller than the second pruning threshold to zero and then retraining; otherwise, the iterative pruning processing is finished.
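The S4 loop above can be sketched as follows; `retrain` is a stand-in interface for the retraining step (a real implementation would run training epochs on the sparse model), so this is an assumed illustration rather than the patent's implementation:

```python
def iterative_prune(weights, threshold, retrain, max_rounds=10):
    """Iterative pruning (per steps S41-S43): zero out weights below the
    threshold, retrain, and repeat until no surviving weight falls below it."""
    mask = [1] * len(weights)
    for _ in range(max_rounds):
        below = [i for i, w in enumerate(weights)
                 if mask[i] and abs(w) < threshold]   # any surviving weight still below?
        if not below:
            break                                     # iterative pruning finished
        for i in below:                               # prune: zero weight, clear mask
            mask[i] = 0
            weights[i] = 0.0
        weights = retrain(weights, mask)              # retrain to recover lost accuracy
    return weights, mask


# Toy usage with an identity "retrain": one pass prunes the 0.05 weight.
w, m = iterative_prune([0.05, 0.2, 0.3], 0.1, lambda ws, mk: ws)
```

This returns weights [0.0, 0.2, 0.3] with mask [0, 1, 1]; already-pruned positions (mask 0) are skipped in the check, so the loop terminates.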
S5: and sparse storage, wherein the model subjected to iterative pruning is subjected to sparse storage to obtain a usable compression network model.
In the invention, model training can be realized either by training a deep neural network model from scratch for different tasks, or by transferring a model trained on a large database such as ImageNet to a specific task through fine-tuning. Model training is the basis of model pruning; since the model pruning in the invention is carried out based on the accuracy of the pre-trained model, it is necessary to obtain a basic model with good accuracy.
In the invention, the deep neural network model is compressed mainly by providing a method that balances the relation between model compression ratio and accuracy and adaptively searches the pruning threshold. Specifically, first, the accuracy-drop threshold θa of the model is defined to ensure that model compression is carried out within the allowable range of accuracy reduction. Unlike existing mainstream methods, the accuracy is guaranteed to remain within the expected range throughout pruning, which also avoids the long retraining processes that other methods require to recover lost accuracy. Then, the invention acquires all parameter weights W of the model, computes the maximum |W|max and minimum |W|min of their absolute values, sets the number N of threshold intervals, and divides the absolute values of the parameter weights between the maximum and the minimum at equal intervals to obtain the threshold interval value:
N0 = (|W|max − |W|min) / N
after obtaining the threshold interval value, taking the threshold interval value as an interval unit, the invention obtains a plurality of test thresholds within the range of model parameters as follows:
The model accuracy corresponding to each test threshold is then measured. On the condition that the accuracy drop does not exceed the given threshold θa, the optimal pruning threshold Vthreshold is obtained through grid search, and all parameters whose weights are smaller than the threshold are pruned from the model, where the threshold search formula is:
Vthreshold = max{ Vn | A(W) − A(W ⊙ Mn) ≤ θa }
wherein M is the model parameter mask corresponding to W, a mask value of 0 or 1 respectively indicating that the parameter is pruned or retained, W ⊙ Mn representing the parameter values after pruning, and A(·) being the accuracy function of the model under the given parameters.
In the invention, pruning optimization is carried out based on the threshold interval obtained in the interval-threshold pruning process. Suppose that during the interval-threshold search, at threshold Vn the model obtains the maximum compression rate on the premise that the accuracy drop does not exceed the given range, while at threshold Vn+1 the accuracy drop exceeds the given range and the pruning requirement is no longer met. The invention thus obtains a rough pruning threshold interval [Vn, Vn+1), within which a finer threshold may exist that allows the model to be compressed further. Therefore, the invention adopts the idea of binary search and finds a better pruning threshold by continuously narrowing the threshold interval. Here, the invention sets an error value eps: when the threshold interval becomes smaller than this error value, a sufficiently good pruning threshold is considered to have been found, and the left boundary of the interval is taken as the final pruning threshold. The neural network model is then pruned with this threshold, realizing further compression of the model.
In the invention, model retraining only needs to recover a limited amount of accuracy, unlike other methods that require extensive retraining. Since parameters that initially appear unimportant may become important later as the model is updated, and clipping them from the beginning could damage the model irrecoverably, the invention keeps updating all parameters by adding a parameter mask. During retraining, the model parameters are updated, and parameters that have been clipped can also be recovered through the retraining process, as shown in fig. 2.
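The mask-based update of fig. 2 can be illustrated with the following hypothetical sketch: the dense parameters keep receiving gradient updates, while the mask only gates the effective (forward) weights, so a clipped parameter whose underlying value grows can be revived at the next pruning pass. The function name and interface are assumptions for illustration:

```python
def masked_update(weights, grads, mask, lr=0.1):
    """One SGD-style step: every parameter receives a dense gradient update,
    while the mask only gates the effective weights used in the forward pass."""
    dense = [w - lr * g for w, g in zip(weights, grads)]    # all parameters updated
    effective = [w * m for w, m in zip(dense, mask)]        # mask applied afterwards
    return dense, effective


# Toy usage: the clipped first weight (mask 0) still accumulates updates.
dense, eff = masked_update([0.0, 1.0], [-1.0, 0.5], [0, 1])
```

`dense` becomes [0.1, 0.95] while `eff` stays [0.0, 0.95]; if the underlying 0.1 later exceeds the pruning threshold, the mask entry can be flipped back to 1.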
In the invention, sparse storage is a method for efficiently storing the pruned model. Because the model parameters are stored as four-dimensional tensors, they are first reshaped into two-dimensional matrices to improve storage efficiency; these matrices are still sparse, and are then stored effectively using compressed sparse row/column storage. The size of the stored model is far smaller than that of the original model, achieving the effect of model compression.
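As an illustration of the row-compressed storage the paragraph describes (a minimal pure-Python sketch; a real deployment would use a library CSR format):

```python
def to_csr(matrix):
    """Compressed sparse row (CSR) storage of a dense 2-D weight matrix:
    keep only the nonzero values, their column indices, and per-row offsets."""
    values, col_idx, row_ptr = [], [], [0]
    for row in matrix:
        for j, v in enumerate(row):
            if v != 0.0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))   # offset of the next row's first nonzero
    return values, col_idx, row_ptr


# A pruned 3x3 layer with 3 surviving weights out of 9.
vals, cols, ptrs = to_csr([[0.0, 1.5, 0.0],
                           [0.0, 0.0, 0.0],
                           [2.0, 0.0, 3.0]])
```

This stores 3 values plus 3 + 4 index entries instead of 9 dense entries; the saving grows with sparsity.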
In summary, the invention provides a model threshold value searching method based on grid search by balancing the relation between the accuracy and the compression ratio of the deep neural network model, and further optimizes the threshold value range on the basis, thereby achieving better model compression effect. The method can be well applied to some embedded devices with limited resources, and the application range of the deep neural network model is greatly expanded.
Namely, the beneficial technical effects of the invention are as follows:
1. a pruning method based on model accuracy is provided, and the size of the model is greatly reduced by balancing the relation between the model accuracy and the compression ratio under the condition of ensuring that the model accuracy is not reduced.
2. The automatic model threshold searching method based on grid search can adaptively search for a proper pruning threshold aiming at different models, and meanwhile, the method is further optimized by combining a binary search method, so that a better pruning effect is achieved.
3. The method can be combined with methods such as parameter sharing, quantization and low-rank decomposition to further compress the model and improve the compression effect.
While the invention has been described with reference to specific embodiments, any feature disclosed in this specification may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise; all of the disclosed features, or all of the method or process steps, may be combined in any combination, except mutually exclusive features and/or steps.
Claims (3)
1. A deep neural network model compression method based on pruning threshold automatic search is characterized by comprising the following steps:
s1: carrying out model training on an original network model to be compressed to obtain an initial model for pruning;
s2: carrying out self-adaptive grid search on the model parameters to obtain a first pruning threshold;
s3: further reducing the threshold interval corresponding to the first pruning threshold by combining a binary search method, and searching for a more optimal threshold to obtain a second pruning threshold;
wherein the threshold interval corresponding to the first pruning threshold is [V1*, V1* + σ), V1* representing the first pruning threshold and σ representing the search step (i.e., the interval value) of the adaptive grid search;
s4: performing iterative pruning treatment on the initial model based on the second pruning threshold value:
s41: setting the weight of the parameter smaller than the second pruning threshold value to zero to obtain the current pruned network model;
s42: retraining the network model after the current pruning;
s43: judging whether the currently retrained network model has the parameter weight smaller than the second pruning threshold, if so, continuing to execute the step S41; otherwise, obtaining a network model after iterative pruning processing based on the network model after current retraining;
s5: and sparsely storing the network model subjected to iterative pruning.
2. The method of claim 1, wherein the step S2 includes the steps of:
s21: setting the accuracy drop threshold theta of the modela;
S22: obtaining all parameter weights W of the model, and calculating the maximum absolute value | W of the parameter weightsmax| and minimum | Wmin|;
S23: setting the size N of the threshold interval, and dividing the absolute value of the parameter weight between the maximum value and the minimum value at equal intervals to obtain the value N of the threshold interval0:
s25: testing the model accuracy corresponding to each test threshold, and when the accuracy drop range does not exceed a given threshold thetaaUnder the condition of (1), obtaining the optimal pruning threshold value V through grid searchthreshold;
wherein M represents a model parameter mask corresponding to the parameter weight W, mask values of 0 or 1 respectively represent pruning or retaining the parameter weight W, W ⊙ MnRepresenting the value of the parameter after pruning, a (-) is a function of the accuracy of the network model at the given parameter.
3. The method of claim 1, wherein the step S3 includes the steps of:
s31: taking the median of the threshold interval [V1*, V1* + σ) corresponding to the first pruning threshold as the initial temporary second pruning threshold;
s32: pruning the initial model based on the current temporary second pruning threshold, and judging whether the accuracy drop of the pruned network model does not exceed the given threshold; if so, executing step S34; otherwise, executing step S33;
s33: judging whether the width of the current threshold interval is larger than the preset binary error value; if so, taking the lower half of the current interval as the new threshold interval, using its median as the new temporary second pruning threshold, and performing step S32; otherwise, taking the left boundary of the current interval as the final second pruning threshold;
s34: judging whether the width of the current threshold interval is larger than the preset binary error value; if so, taking the upper half of the current interval as the new threshold interval, using its median as the new temporary second pruning threshold, and performing step S32; otherwise, taking the left boundary of the current interval as the final second pruning threshold;
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910820043.7A CN110674939A (en) | 2019-08-31 | 2019-08-31 | Deep neural network model compression method based on pruning threshold automatic search |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910820043.7A CN110674939A (en) | 2019-08-31 | 2019-08-31 | Deep neural network model compression method based on pruning threshold automatic search |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110674939A true CN110674939A (en) | 2020-01-10 |
Family
ID=69076581
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910820043.7A Pending CN110674939A (en) | 2019-08-31 | 2019-08-31 | Deep neural network model compression method based on pruning threshold automatic search |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110674939A (en) |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111310918A (en) * | 2020-02-03 | 2020-06-19 | 腾讯科技(深圳)有限公司 | Data processing method and device, computer equipment and storage medium |
CN111310918B (en) * | 2020-02-03 | 2023-07-14 | 腾讯科技(深圳)有限公司 | Data processing method, device, computer equipment and storage medium |
CN111444760B (en) * | 2020-02-19 | 2022-09-09 | 天津大学 | Traffic sign detection and identification method based on pruning and knowledge distillation |
CN111444760A (en) * | 2020-02-19 | 2020-07-24 | 天津大学 | Traffic sign detection and identification method based on pruning and knowledge distillation |
CN111382839A (en) * | 2020-02-23 | 2020-07-07 | 华为技术有限公司 | Method and device for pruning neural network |
CN111382839B (en) * | 2020-02-23 | 2024-05-07 | 华为技术有限公司 | Method and device for pruning neural network |
CN111814975A (en) * | 2020-07-09 | 2020-10-23 | 广东工业大学 | Pruning-based neural network model construction method and related device |
CN111814975B (en) * | 2020-07-09 | 2023-07-28 | 广东工业大学 | Neural network model construction method and related device based on pruning |
CN112612602B (en) * | 2020-12-11 | 2023-12-01 | 国网浙江省电力有限公司宁波供电公司 | Automatic compression processing method for target detection network model |
CN112612602A (en) * | 2020-12-11 | 2021-04-06 | 国网浙江省电力有限公司宁波供电公司 | Automatic compression processing method for target detection network model |
CN113128664A (en) * | 2021-03-16 | 2021-07-16 | 广东电力信息科技有限公司 | Neural network compression method, device, electronic equipment and storage medium |
CN115271043A (en) * | 2022-07-28 | 2022-11-01 | 小米汽车科技有限公司 | Model tuning method, model tuning device and storage medium |
CN115271043B (en) * | 2022-07-28 | 2023-10-20 | 小米汽车科技有限公司 | Model tuning method, device and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110674939A (en) | Deep neural network model compression method based on pruning threshold automatic search | |
CN108764471B (en) | Neural network cross-layer pruning method based on feature redundancy analysis | |
US11531889B2 (en) | Weight data storage method and neural network processor based on the method | |
WO2022006919A1 (en) | Activation fixed-point fitting-based method and system for post-training quantization of convolutional neural network | |
CN109002889B (en) | Adaptive iterative convolution neural network model compression method | |
CN113610232B (en) | Network model quantization method and device, computer equipment and storage medium | |
CN102567973B (en) | Image denoising method based on improved shape self-adaptive window | |
CN110807529A (en) | Training method, device, equipment and storage medium of machine learning model | |
CN110705708A (en) | Compression method and device of convolutional neural network model and computer storage medium | |
WO2023098544A1 (en) | Structured pruning method and apparatus based on local sparsity constraints | |
CN113657421B (en) | Convolutional neural network compression method and device, and image classification method and device | |
CN108734264A (en) | Deep neural network model compression method and device, storage medium, terminal | |
CN110598848A (en) | Migration learning acceleration method based on channel pruning | |
CN111814448B (en) | Pre-training language model quantization method and device | |
CN113241064A (en) | Voice recognition method, voice recognition device, model training method, model training device, electronic equipment and storage medium | |
CN113963176B (en) | Model distillation method and device, electronic equipment and storage medium | |
CN113128664A (en) | Neural network compression method, device, electronic equipment and storage medium | |
CN112613604A (en) | Neural network quantification method and device | |
CN115170902B (en) | Training method of image processing model | |
CN111860770A (en) | Model compression method and system integrating clipping and quantization | |
CN116384470A (en) | Convolutional neural network model compression method and device combining quantization and pruning | |
CN115953651A (en) | Model training method, device, equipment and medium based on cross-domain equipment | |
CN113762505B (en) | Method for clustering pruning according to L2 norms of channels of convolutional neural network | |
CN113570037A (en) | Neural network compression method and device | |
CN113887709A (en) | Neural network adaptive quantization method, apparatus, device, medium, and product |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||