CN110674939A - Deep neural network model compression method based on pruning threshold automatic search - Google Patents

Deep neural network model compression method based on pruning threshold automatic search

Info

Publication number
CN110674939A
CN110674939A
Authority
CN
China
Prior art keywords
pruning
threshold
model
network model
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910820043.7A
Other languages
Chinese (zh)
Inventor
刘欣刚
钟鲁豪
朱超
王文涵
吴立帅
代成
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN201910820043.7A priority Critical patent/CN110674939A/en
Publication of CN110674939A publication Critical patent/CN110674939A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention discloses a deep neural network model compression method based on automatic search of the pruning threshold, and belongs to the field of deep neural network model compression. The invention comprises the following steps: model training to obtain an initial model for pruning; carrying out adaptive grid search on the model parameters to obtain a first pruning threshold; further narrowing the threshold interval corresponding to the first pruning threshold by combining a binary search method, and searching for a better threshold to obtain a second pruning threshold; performing iterative pruning on the original network model based on the second pruning threshold; and sparsely storing the pruned model to obtain a usable compressed network model. The method can compress existing mainstream deep neural network models, solves the technical problem that deep neural network models cannot be deployed on embedded devices because of their size, and expands the application range of deep neural network models.

Description

Deep neural network model compression method based on pruning threshold automatic search
Technical Field
The invention belongs to the field of deep neural network model compression, and particularly relates to a deep neural network model compression method based on pruning threshold automatic search.
Background
The development of deep learning has brought deep neural networks into ever wider use in computer vision tasks such as image recognition, detection and tracking, and network models increasingly tend to be designed wider and deeper. The success of deep learning depends largely on models with a large number of parameters and on computing devices with powerful capabilities. However, because of their huge memory requirements and computational cost, deep neural networks are difficult to deploy on low-storage, low-power hardware platforms (such as mobile devices), which greatly limits their application. Therefore, how to effectively compress a neural network model while keeping the performance of the existing deep neural network model unchanged is an important problem to be solved.
The model pruning method has become one of the most representative model compression techniques because it is simple and effective. Model pruning mainly obtains a compressed model by finding an effective means of judging parameter importance and clipping the unimportant parameters. However, most existing mainstream methods clip by defining the pruning rate in advance and then recover the model accuracy by retraining. This raises two problems: first, the pruning rate is specified manually rather than searched automatically by the model, so a better pruning threshold may exist; second, excessive pruning may make the model accuracy difficult to recover, so that the model accuracy and the model compression ratio are hard to balance well. Therefore, a new approach is needed to address this need.
Disclosure of Invention
The invention aims to address the problems above by providing a method that balances the relation between the model compression ratio and the accuracy and adaptively searches for the pruning threshold.
The invention discloses a deep neural network model compression method based on pruning threshold automatic search, which comprises the following steps:
s1: carrying out model training on an original network model to be compressed to obtain an initial model for pruning;
s2: searching an interval threshold, and performing self-adaptive grid searching on the model parameters to obtain a first pruning threshold;
s3: carrying out pruning threshold search optimization, further reducing a threshold interval corresponding to the first pruning threshold by combining a binary search method, and searching for a more optimal threshold to obtain a second pruning threshold;
wherein the threshold interval corresponding to the first pruning threshold is [V1*, V1* + σ), V1* representing the first pruning threshold, and σ representing the search step, namely the interval value, of the adaptive grid search;
s4: performing iterative pruning processing on the initial model based on the second pruning threshold value, and performing retraining on the network model after each pruning;
each pruning pass sets to zero the weights of the parameters smaller than the second pruning threshold, yielding a sparse network model; since pruning reduces the accuracy of the network model to a certain extent, the sparse network model obtained after each pruning pass is retrained, and the lost accuracy is recovered by retraining;
then the parameter weights of the retrained network model that are smaller than the second pruning threshold are again set to zero, followed by retraining, and so on, until the final network model after iterative pruning is obtained;
s5: and sparsely storing the network model subjected to iterative pruning. Namely, the final network model after the iterative pruning processing is sparsely stored, so that a usable compressed network model is obtained.
Wherein, step S2 includes the following steps:
S21: setting an accuracy drop threshold θ_a for the model; pruning is carried out on the condition that the model's accuracy drop remains smaller than the given threshold θ_a;
S22: obtaining all parameter weights W of the model, and calculating the maximum |W_max| and minimum |W_min| of the absolute values of the parameter weights;
S23: setting the number N of threshold intervals, and dividing the range of the absolute values of the parameter weights between the minimum and the maximum at equal intervals to obtain the interval value n0:
n0 = (|W_max| - |W_min|) / N
S24: with n0 as the interval value, a set of test thresholds is derived:
V_n = |W_min| + n · n0, n = 0, 1, …, N
S25: testing the model accuracy corresponding to each test threshold and, on the condition that the accuracy drop does not exceed the given threshold θ_a, obtaining the optimal pruning threshold V_threshold through grid search and pruning all parameter weights in the original network model that are smaller than the optimal pruning threshold;
wherein
V_threshold = max{ V_n : A(W) - A(W ⊙ M_n) ≤ θ_a }
wherein M_n represents the model parameter mask corresponding to the parameter weights W under the test threshold V_n, mask values of 0 and 1 respectively represent pruning and retaining a parameter weight, W ⊙ M_n represents the parameter values after pruning, and A(·) is the accuracy function of the network model under the given parameters;
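Steps S21 to S25 can be sketched as follows. This is a minimal numpy illustration rather than the patent's implementation: the weights are treated as one flat array, and accuracy_fn is a hypothetical stand-in for evaluating the pruned network on a validation set.

```python
import numpy as np

def grid_search_threshold(weights, accuracy_fn, theta_a, N=10):
    """Adaptive grid search for the first pruning threshold (steps S21-S25).
    accuracy_fn is a hypothetical callable: pruned weights -> accuracy."""
    w_abs = np.abs(weights)
    w_min, w_max = w_abs.min(), w_abs.max()
    n0 = (w_max - w_min) / N                        # interval value n0
    base_acc = accuracy_fn(weights)                 # accuracy of the unpruned model
    best = w_min                                    # V_0 keeps every weight
    for n in range(N + 1):
        v = w_min + n * n0                          # test threshold V_n
        mask = (w_abs >= v).astype(weights.dtype)   # mask M_n: 0 = prune, 1 = keep
        if base_acc - accuracy_fn(weights * mask) <= theta_a:
            best = v                                # largest V_n within the drop budget
    return best, n0
```

For example, with weights of magnitudes 1 through 10 and a toy accuracy proxy (fraction of retained weight magnitude), a budget of θ_a = 0.06 selects the threshold 3.0: pruning the two smallest weights removes only 3/55 of the total magnitude, while the next grid threshold would remove more than the budget allows.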
wherein, step S3 includes the following steps:
S31: taking the median of the threshold interval [V1*, V1* + σ) corresponding to the first pruning threshold as the initial temporary second pruning threshold V2 = V1* + σ/2;
S32: pruning the initial model based on the current temporary second pruning threshold V2, and judging whether the drop in accuracy of the pruned network model does not exceed the given threshold; if so, executing step S34; otherwise, executing step S33;
S33: judging whether the width of the lower half [V_low, V2) of the current threshold interval, where V_low denotes its left boundary, is larger than the preset binary-search error value; if so, taking [V_low, V2) as the new threshold interval and its median as the new temporary second pruning threshold V2, and executing step S32;
if not, taking the left boundary V_low as the second pruning threshold obtained at the end;
S34: judging whether the width of the upper half [V2, V_high) of the current threshold interval, where V_high denotes its right boundary, is larger than the preset binary-search error value; if so, taking [V2, V_high) as the new threshold interval and its median as the new temporary second pruning threshold V2, and executing step S32;
if not, taking the current V2 as the resulting second pruning threshold.
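The refinement of steps S31 to S34 can be sketched as a binary search over [V1*, V1* + σ). As above, this is a hedged numpy sketch, with accuracy_fn standing in for evaluating the pruned network: the interval is repeatedly halved, keeping the largest midpoint whose accuracy drop stays within the budget, until the interval is narrower than the error value eps.

```python
import numpy as np

def refine_threshold(v1, sigma, weights, accuracy_fn, theta_a, eps=1e-3):
    """Binary-search refinement of the pruning threshold inside [v1, v1 + sigma)
    (steps S31-S34); accuracy_fn is a hypothetical accuracy-evaluation stand-in."""
    lo, hi = v1, v1 + sigma       # current threshold interval
    base_acc = accuracy_fn(weights)
    best = v1                     # v1 is known to satisfy the accuracy budget
    while hi - lo > eps:          # stop once the interval is narrower than eps
        mid = (lo + hi) / 2       # temporary second pruning threshold (median)
        mask = (np.abs(weights) >= mid).astype(weights.dtype)
        if base_acc - accuracy_fn(weights * mask) <= theta_a:
            best, lo = mid, mid   # S34 case: drop acceptable, search the upper half
        else:
            hi = mid              # S33 case: drop too large, search the lower half
    return best
```

For instance, if a weight of magnitude 3.5 sits inside the grid cell [3, 4) found by the coarse search, the refinement recovers the finer threshold 3.5 that the grid missed.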
In summary, owing to the adoption of the above technical scheme, the invention has the following beneficial effects: the deep neural network model compression method can compress a model by adaptively searching the model pruning threshold without reducing the model accuracy, and effectively balances the relation between model accuracy and compression ratio. It adapts well to different deep neural network models, removes the need to set the pruning rate manually, achieves a better model compression effect, and provides a feasible technique for deploying deep neural network models on resource-limited embedded devices.
Drawings
FIG. 1 is a schematic diagram of the overall framework of the invention.
FIG. 2 is a schematic diagram of the model parameter updating of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the following embodiments and accompanying drawings.
Referring to fig. 1, the method for compressing a deep neural network model based on pruning threshold automatic search of the present invention includes the following steps:
s1: carrying out model training on an original network model to be compressed to obtain an initial model for pruning;
s2: searching an interval threshold, and performing self-adaptive grid searching on the model parameters to obtain a first pruning threshold;
s3: carrying out pruning threshold search optimization, further reducing a threshold interval corresponding to the first pruning threshold by combining a binary search method, and searching for a more optimal threshold to obtain a second pruning threshold;
wherein the threshold interval corresponding to the first pruning threshold is [V1*, V1* + σ), V1* representing the first pruning threshold, and σ representing the search step of the adaptive grid search;
s4: performing iterative pruning treatment on the original network model based on a second pruning threshold value:
setting the weight of the parameter smaller than the second pruning threshold value to zero to obtain the current pruned network model;
retraining the network model after the current pruning;
judging whether parameter weights smaller than the second pruning threshold still exist; if so, setting those parameter weights to zero again and retraining; otherwise, the iterative pruning process is finished.
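The iterative pruning loop above can be sketched as follows. This is a hypothetical numpy sketch: retrain_fn stands in for retraining the pruned network, receiving the pruned weights and the 0/1 mask and returning updated weights.

```python
import numpy as np

def iterative_prune(weights, retrain_fn, v2, max_iters=100):
    """Iterative pruning with the second pruning threshold v2.
    retrain_fn is a hypothetical stand-in: (weights, mask) -> updated weights."""
    w = weights.copy()
    for _ in range(max_iters):
        mask = (np.abs(w) >= v2).astype(w.dtype)
        w = w * mask                    # zero the weights below v2
        w = retrain_fn(w, mask)         # retrain the pruned model
        # stop once no surviving weight has drifted back below v2
        if not np.any((np.abs(w) < v2) & (w != 0)):
            break
    return w
```

With an identity stand-in for retraining, a single pass zeroes the sub-threshold weights and the loop terminates immediately; a real retraining step may move weights back below the threshold and trigger further passes.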
S5: and sparse storage, wherein the model subjected to iterative pruning is subjected to sparse storage to obtain a usable compression network model.
In the invention, model training can be realized by training a well-performing deep neural network model from scratch for a given task, or by transferring a model trained on a large database such as ImageNet to the specific task through fine-tuning. Model training is the basis of model pruning: because the model pruning in the invention is carried out based on the accuracy of the pre-trained model, it is necessary to obtain a base model with good accuracy.
In the invention, the deep neural network model is compressed mainly by providing a method that balances the relation between the model compression ratio and the accuracy and adaptively searches the pruning threshold. Specifically, first, the accuracy drop threshold θ_a of the model is defined, which ensures that model compression is carried out within the allowed range of accuracy reduction. Different from existing main methods, which prune under the premise of not reducing accuracy at all, this keeps the accuracy within the expected range throughout, and at the same time avoids the long retraining process that other methods need to recover lost accuracy. Then, the invention obtains all parameter weights W of the model, calculates the maximum |W_max| and minimum |W_min| of their absolute values, sets the number N of threshold intervals, and divides the range between the minimum and the maximum at equal intervals to obtain the threshold interval value:
n0 = (|W_max| - |W_min|) / N
After obtaining the threshold interval value, taking it as the interval unit, the invention obtains a set of test thresholds within the range of the model parameters:
V_n = |W_min| + n · n0, n = 0, 1, …, N
The model accuracy corresponding to each test threshold is then evaluated. On the condition that the accuracy drop does not exceed the given threshold θ_a, the optimal pruning threshold V_threshold is obtained through grid search, and all parameters in the model whose weights are smaller than this threshold are pruned. The threshold search formula is:
V_threshold = max{ V_n : A(W) - A(W ⊙ M_n) ≤ θ_a }
wherein M_n is the model parameter mask corresponding to W under the test threshold V_n, mask values of 0 and 1 respectively represent pruning and retaining a parameter, W ⊙ M_n represents the parameter values after pruning, and A(·) is the accuracy function of the model under the given parameters.
In the invention, pruning optimization is carried out based on the threshold interval obtained in the interval threshold pruning process. Suppose that, during the interval search, the model obtains the maximum compression rate at threshold V_n on the premise that the accuracy drop does not exceed the given range, while at threshold V_{n+1} the accuracy drop exceeds the given range and the pruning requirement is no longer met. The invention thus obtains a rough pruning threshold interval [V_n, V_{n+1}), within which there may be a finer threshold that allows the model to be compressed further. Therefore, the invention borrows the idea of binary search and finds a better pruning threshold by continuously narrowing the threshold interval. Here, the invention sets an error value eps: when the width of the threshold interval is smaller than eps, a sufficiently good pruning threshold is considered found, and the left boundary of the interval is taken as the final pruning threshold. Pruning the neural network model with this threshold realizes further compression of the model.
In the invention, model retraining only needs to recover a small amount of accuracy, which is different from other methods that require extensive retraining to restore accuracy. Since parameters that initially appear unimportant may become important later as the model is updated, and clipping them permanently from the start could cause unrecoverable damage to the model's performance, the invention keeps updating the parameters by applying a parameter mask instead of deleting them. During this process, retraining updates the model parameters, and parameters that have been clipped can also be recovered through retraining, as shown in fig. 2.
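The mask-based recovery of fig. 2 can be illustrated with a single update step. This is a hypothetical sketch of the idea, not the patent's exact update rule: the dense copy of the weights keeps receiving gradient updates, and the mask is recomputed from the new magnitudes, so a previously pruned weight that grows back above the threshold is retained again.

```python
import numpy as np

def masked_update(w, grad, v, lr=0.1):
    """One gradient step with mask recomputation (cf. fig. 2). Hypothetical
    sketch: the dense weights are updated, then the 0/1 mask is rebuilt, so
    clipped parameters whose magnitude recovers are un-pruned automatically."""
    w = w - lr * grad                          # update the dense copy of the weights
    mask = (np.abs(w) >= v).astype(w.dtype)    # recompute the pruning mask
    return w, mask, w * mask                   # effective sparse weights = w ⊙ mask
```

In the example below, a weight of 0.25 that would be masked out at threshold 0.3 receives a gradient pushing it to 0.35, and the recomputed mask retains it again.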
In the invention, sparse storage is a method for efficiently storing the pruned model. Because the model parameters are stored as four-dimensional tensors, the four-dimensional tensors are first converted into two-dimensional matrices to improve storage efficiency; the resulting matrices are still sparse, and are then stored effectively using compressed sparse row (CSR) or compressed sparse column (CSC) storage. The size of the stored model is far smaller than that of the original model, achieving the effect of model compression.
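The row-compressed storage described above can be sketched as follows, assuming the four-dimensional weight tensor has already been reshaped into a two-dimensional matrix (for example, one row per output channel); the three returned arrays are what would be written to disk.

```python
import numpy as np

def to_csr(mat):
    """Convert a 2-D pruned weight matrix to CSR arrays: non-zero values,
    their column indices, and per-row pointers into the value array."""
    values, col_idx, row_ptr = [], [], [0]
    for row in mat:
        nz = np.nonzero(row)[0]          # columns of the surviving weights
        values.extend(row[nz].tolist())
        col_idx.extend(nz.tolist())
        row_ptr.append(len(values))      # row i spans values[row_ptr[i]:row_ptr[i+1]]
    return values, col_idx, row_ptr
```

Only the non-zero values plus the two index arrays are stored, so a heavily pruned matrix occupies far less space than its dense form.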
In summary, the invention provides a model threshold value searching method based on grid search by balancing the relation between the accuracy and the compression ratio of the deep neural network model, and further optimizes the threshold value range on the basis, thereby achieving better model compression effect. The method can be well applied to some embedded devices with limited resources, and the application range of the deep neural network model is greatly expanded.
Namely, the beneficial technical effects of the invention are as follows:
1. a pruning method based on model accuracy is provided, and the size of the model is greatly reduced by balancing the relation between the model accuracy and the compression ratio under the condition of ensuring that the model accuracy is not reduced.
2. The automatic model threshold searching method based on grid search can adaptively search for a proper pruning threshold aiming at different models, and meanwhile, the method is further optimized by combining a binary search method, so that a better pruning effect is achieved.
3. The method can be combined with methods such as parameter sharing, quantization and low-rank decomposition to further compress the model and improve the compression effect.
While the invention has been described with reference to specific embodiments, any feature disclosed in this specification may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise; all of the disclosed features, or all of the method or process steps, may be combined in any combination, except mutually exclusive features and/or steps.

Claims (3)

1. A deep neural network model compression method based on pruning threshold automatic search is characterized by comprising the following steps:
s1: carrying out model training on an original network model to be compressed to obtain an initial model for pruning;
s2: carrying out self-adaptive grid search on the model parameters to obtain a first pruning threshold;
s3: further reducing the threshold interval corresponding to the first pruning threshold by combining a binary search method, and searching for a more optimal threshold to obtain a second pruning threshold;
wherein the threshold interval corresponding to the first pruning threshold is [V1*, V1* + σ), V1* representing the first pruning threshold, and σ representing the search step, namely the interval value, of the adaptive grid search;
s4: performing iterative pruning treatment on the initial model based on the second pruning threshold value:
s41: setting the weight of the parameter smaller than the second pruning threshold value to zero to obtain the current pruned network model;
s42: retraining the network model after the current pruning;
s43: judging whether the currently retrained network model has the parameter weight smaller than the second pruning threshold, if so, continuing to execute the step S41; otherwise, obtaining a network model after iterative pruning processing based on the network model after current retraining;
s5: and sparsely storing the network model subjected to iterative pruning.
2. The method of claim 1, wherein the step S2 includes the steps of:
S21: setting an accuracy drop threshold θ_a for the model;
S22: obtaining all parameter weights W of the model, and calculating the maximum |W_max| and minimum |W_min| of the absolute values of the parameter weights;
S23: setting the number N of threshold intervals, and dividing the range of the absolute values of the parameter weights between the minimum and the maximum at equal intervals to obtain the interval value n0:
n0 = (|W_max| - |W_min|) / N
S24: with n0 as the interval value, a set of test thresholds is derived:
V_n = |W_min| + n · n0, n = 0, 1, …, N
S25: testing the model accuracy corresponding to each test threshold and, on the condition that the accuracy drop does not exceed the given threshold θ_a, obtaining the optimal pruning threshold V_threshold through grid search;
wherein
V_threshold = max{ V_n : A(W) - A(W ⊙ M_n) ≤ θ_a }
wherein M_n represents the model parameter mask corresponding to the parameter weights W under the test threshold V_n, mask values of 0 and 1 respectively represent pruning and retaining a parameter weight, W ⊙ M_n represents the parameter values after pruning, and A(·) is the accuracy function of the network model under the given parameters.
3. The method of claim 1, wherein the step S3 includes the steps of:
S31: taking the median of the threshold interval [V1*, V1* + σ) corresponding to the first pruning threshold as the initial temporary second pruning threshold V2 = V1* + σ/2;
S32: pruning the initial model based on the current temporary second pruning threshold V2, and judging whether the drop in accuracy of the pruned network model does not exceed the given threshold; if so, executing step S34; otherwise, executing step S33;
S33: judging whether the width of the lower half [V_low, V2) of the current threshold interval, where V_low denotes its left boundary, is larger than the preset binary-search error value; if so, taking [V_low, V2) as the new threshold interval and its median as the new temporary second pruning threshold V2, and executing step S32;
if not, taking the left boundary V_low as the second pruning threshold obtained at the end;
S34: judging whether the width of the upper half [V2, V_high) of the current threshold interval, where V_high denotes its right boundary, is larger than the preset binary-search error value; if so, taking [V2, V_high) as the new threshold interval and its median as the new temporary second pruning threshold V2, and executing step S32;
if not, taking the current V2 as the resulting second pruning threshold.
CN201910820043.7A 2019-08-31 2019-08-31 Deep neural network model compression method based on pruning threshold automatic search Pending CN110674939A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910820043.7A CN110674939A (en) 2019-08-31 2019-08-31 Deep neural network model compression method based on pruning threshold automatic search

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910820043.7A CN110674939A (en) 2019-08-31 2019-08-31 Deep neural network model compression method based on pruning threshold automatic search

Publications (1)

Publication Number Publication Date
CN110674939A true CN110674939A (en) 2020-01-10

Family

ID=69076581

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910820043.7A Pending CN110674939A (en) 2019-08-31 2019-08-31 Deep neural network model compression method based on pruning threshold automatic search

Country Status (1)

Country Link
CN (1) CN110674939A (en)


Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111310918A (en) * 2020-02-03 2020-06-19 腾讯科技(深圳)有限公司 Data processing method and device, computer equipment and storage medium
CN111310918B (en) * 2020-02-03 2023-07-14 腾讯科技(深圳)有限公司 Data processing method, device, computer equipment and storage medium
CN111444760B (en) * 2020-02-19 2022-09-09 天津大学 Traffic sign detection and identification method based on pruning and knowledge distillation
CN111444760A (en) * 2020-02-19 2020-07-24 天津大学 Traffic sign detection and identification method based on pruning and knowledge distillation
CN111382839A (en) * 2020-02-23 2020-07-07 华为技术有限公司 Method and device for pruning neural network
CN111382839B (en) * 2020-02-23 2024-05-07 华为技术有限公司 Method and device for pruning neural network
CN111814975A (en) * 2020-07-09 2020-10-23 广东工业大学 Pruning-based neural network model construction method and related device
CN111814975B (en) * 2020-07-09 2023-07-28 广东工业大学 Neural network model construction method and related device based on pruning
CN112612602B (en) * 2020-12-11 2023-12-01 国网浙江省电力有限公司宁波供电公司 Automatic compression processing method for target detection network model
CN112612602A (en) * 2020-12-11 2021-04-06 国网浙江省电力有限公司宁波供电公司 Automatic compression processing method for target detection network model
CN113128664A (en) * 2021-03-16 2021-07-16 广东电力信息科技有限公司 Neural network compression method, device, electronic equipment and storage medium
CN115271043A (en) * 2022-07-28 2022-11-01 小米汽车科技有限公司 Model tuning method, model tuning device and storage medium
CN115271043B (en) * 2022-07-28 2023-10-20 小米汽车科技有限公司 Model tuning method, device and storage medium

Similar Documents

Publication Publication Date Title
CN110674939A (en) Deep neural network model compression method based on pruning threshold automatic search
CN108764471B (en) Neural network cross-layer pruning method based on feature redundancy analysis
US11531889B2 (en) Weight data storage method and neural network processor based on the method
WO2022006919A1 (en) Activation fixed-point fitting-based method and system for post-training quantization of convolutional neural network
CN109002889B (en) Adaptive iterative convolution neural network model compression method
CN113610232B (en) Network model quantization method and device, computer equipment and storage medium
CN102567973B (en) Image denoising method based on improved shape self-adaptive window
CN110807529A (en) Training method, device, equipment and storage medium of machine learning model
CN110705708A (en) Compression method and device of convolutional neural network model and computer storage medium
WO2023098544A1 (en) Structured pruning method and apparatus based on local sparsity constraints
CN113657421B (en) Convolutional neural network compression method and device, and image classification method and device
CN108734264A (en) Deep neural network model compression method and device, storage medium, terminal
CN110598848A (en) Migration learning acceleration method based on channel pruning
CN111814448B (en) Pre-training language model quantization method and device
CN113241064A (en) Voice recognition method, voice recognition device, model training method, model training device, electronic equipment and storage medium
CN113963176B (en) Model distillation method and device, electronic equipment and storage medium
CN113128664A (en) Neural network compression method, device, electronic equipment and storage medium
CN112613604A (en) Neural network quantification method and device
CN115170902B (en) Training method of image processing model
CN111860770A (en) Model compression method and system integrating clipping and quantization
CN116384470A (en) Convolutional neural network model compression method and device combining quantization and pruning
CN115953651A (en) Model training method, device, equipment and medium based on cross-domain equipment
CN113762505B (en) Method for clustering pruning according to L2 norms of channels of convolutional neural network
CN113570037A (en) Neural network compression method and device
CN113887709A (en) Neural network adaptive quantization method, apparatus, device, medium, and product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination