CN110619385A - Structured network model compression acceleration method based on multi-stage pruning - Google Patents

Structured network model compression acceleration method based on multi-stage pruning Download PDF

Info

Publication number
CN110619385A
CN110619385A (application number CN201910820048.XA)
Authority
CN
China
Prior art keywords
pruning
network model
filter
layer
sensitivity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910820048.XA
Other languages
Chinese (zh)
Other versions
CN110619385B (en)
Inventor
刘欣刚
吴立帅
钟鲁豪
韩硕
王文涵
代成
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN201910820048.XA priority Critical patent/CN110619385B/en
Publication of CN110619385A publication Critical patent/CN110619385A/en
Application granted granted Critical
Publication of CN110619385B publication Critical patent/CN110619385B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a structured network model compression and acceleration method based on multi-stage pruning, belonging to the technical field of model compression and acceleration. The method comprises the following steps: obtaining a pre-trained model by training the network to a complete initial state; measuring the sensitivity of each convolutional layer and obtaining its sensitivity-pruning rate curve by the control-variable method; performing single-layer pruning in order of sensitivity from low to high, then fine-tuning and retraining the network model; selecting samples as a validation set and measuring the information entropy of each filter's output feature map; performing iterative flexible pruning in order of output entropy, then fine-tuning and retraining the network model; and finally performing hard pruning and retraining the network model to restore its performance, obtaining and storing the lightweight model. The invention can compress large-scale convolutional neural networks while preserving the original network performance, reducing local storage occupation, floating-point operations, and runtime GPU memory usage, thereby achieving a lightweight network.

Description

Structured network model compression acceleration method based on multi-stage pruning
Technical Field
The invention relates to the technical field of model compression and acceleration, in particular to a structured network model compression and acceleration method based on multi-stage pruning.
Background
Deep convolutional neural networks have been widely and successfully applied in fields such as computer vision and natural language processing. As attention to convolutional neural networks has grown, networks with ever more layers and increasingly complex structures have emerged in rapid succession, deep convolutional neural networks have been applied to more and more research fields, and higher requirements have been placed on the development of hardware devices.
While deep learning has developed rapidly, hardware capability has not improved at the same pace; the progress of convolutional neural networks today depends on increases in the computing power and storage space of computing devices, in particular the parallel computing power of graphics processors. Running a neural network on mobile embedded devices is very difficult, because it consumes a large amount of storage space and generates enormous numbers of floating-point operations. Taking the classic VGG-16 network as an example, recognizing a single 224 × 224 color image requires over 130 million parameters for the original network alone, occupying more than 520 MB of storage space; one forward pass additionally occupies nearly 13 MB of storage for intermediate feature maps and performs over 30.9 billion floating-point operations. Such enormous cost severely restricts the application of convolutional neural networks on embedded devices.
Much research in recent years has shown that neural networks in fact contain huge numbers of redundant parameters, i.e. they are over-parameterized, leaving large room for optimization in actual deployment and thereby demonstrating the practical feasibility of model compression. Model pruning has been widely studied as an efficient and highly general model compression method, but the compression achieved by existing pruning methods is very limited: many parameter-level pruning algorithms cannot obtain actual storage compression or reduced computation, and many filter-level pruning algorithms struggle to achieve both parameter reduction and real network acceleration. It is therefore important to design an efficient structured network model compression algorithm.
Disclosure of Invention
The invention aims, in view of the above problems, to provide a more efficient model compression and acceleration method with stronger domain adaptability.
The invention relates to a structured network model compression accelerating method based on multi-stage pruning, which comprises the following steps:
s1: acquiring a pre-training model, and training an original network model to be processed on a training data set to obtain a complete network model;
s2: measuring the sensitivity of the convolutional layers of the original network model based on a pre-training model, and obtaining a sensitivity-pruning rate change curve of each convolutional layer by a control variable method;
s3: carrying out sensitivity interlayer iteration pruning, carrying out single-layer pruning on the current network model according to the sensitivity sequence from low to high, and finely adjusting the network model;
s4: measuring the importance index of the filter, selecting a sample as a verification set, and measuring the information entropy of the filter output characteristic diagram of the current network model, namely the output image entropy;
s5: performing iterative flexible pruning on the current network model according to the magnitude sequence of the entropy of the output images, and finely adjusting a retraining model;
s6: and (4) hard pruning, and performing retraining on the current network model to obtain and store the lightweight model.
Wherein, step S1 includes the following steps:
s11: initializing original network parameters of a network model to be processed;
s12: and pre-training on the training set to obtain a complete network model.
Wherein, step S2 includes the following steps:
s21: setting a maximum pruning rate range and a pruning rate increase step;
s22: performing layer-by-layer sensitivity calculation on the convolutional layer by using a control variable method to obtain a sensitivity coefficient S of the ith convolutional layeri
Si≡Acc(L,0)-Acc(L,-i),1≤i≤L
Acc (L,0) represents the recognition rate of the original network model on the test data set, Acc (L, -i) represents the recognition rate on the test data set after the filter of the ith convolution layer is deleted according to a certain ratio and the non-ith convolution layer is kept unchanged;
s23: and establishing a corresponding relation between the sensitivity sequence of each convolutional layer under the current set pruning rate and the pruning rate to obtain a sensitivity-pruning rate change curve of the convolutional layer.
Wherein, step S3 includes the following steps:
s31: calculating the F norm W of each filter of each convolution layeri,j||F
Wherein, wi,jJ-th filter, w, representing the ith convolutional layeri,j(c,k1,k2) (k) th representing a two-dimensional parameter matrix on the c-th channel in the jth filter1,k2) A parameter value;
s32: performing single-layer hard pruning according to a sensitivity order, permanently deleting the filters determined to be deleted from the current network model, deleting the failed output characteristic channels according to the corresponding relation between the filters and the output characteristic channels, and then executing the same operation on the next convolution layer until all the convolution layers are traversed;
s33: and (4) loading the residual network parameters to the model after pruning, and carrying out fine tuning retraining on the training data set.
Wherein, step S4 includes the following steps:
s41: randomly sampling a training data set to construct a verification set;
s42: for the remaining filters (filters of the current network model), one forward propagation on the validation set, the output image entropy of each filter is calculated:
wherein Ei,jEntropy of the output image of the jth filter for the ith convolutional layer, pk,lRepresents a pixel pair with a central pixel of k and a neighborhood pixel of l in the feature map, Hi,j[s][t]And (d) a parameter value representing a (s, t) position in an output characteristic diagram of the jth filter of the ith convolutional layer.
S43: and carrying out logarithmic normalization analysis on the output image entropy, and establishing the corresponding relation between the output image entropy of each filter under the current pruning rate and the pruning rate.
Wherein, step S5 includes the following steps:
s51: according to the entropy sorting of the output images of the filters, the pruning priority order of the filters between single layers is determined: the smaller the entropy of the output image of the filter is, the higher the pruning priority of the filter is;
performing flexible pruning layer by layer on the current network model, temporarily zeroing a filter to be deleted, temporarily zeroing a failed output characteristic channel according to the corresponding relation between the filter and the output characteristic channel, and then executing the same operation on the next convolution layer;
s52: and (4) loading the residual network parameters of the network with sparse filter stages, and performing fine tuning retraining on a training data set.
Wherein, step S6 includes the following steps:
s61: acquiring the sparsity of each filter of the filter-level sparse network;
s62: and sequencing the sparsity of each filter of the current network model, and deleting the filters with corresponding ratios according to the target pruning rate.
S63: and loading the rest network parameters, performing retraining to improve the network performance, and storing the structure and the parameters of the final lightweight network model.
To evaluate the overall performance of the multi-level structured model compression and acceleration algorithm, the number of floating-point operations generated by one forward propagation through the original network and through the pruned network can be counted to assess the acceleration effect, and parameter statistics can be computed for the new network structure to assess the compression effect.
In summary, due to the adoption of the technical scheme, the invention has the beneficial effects that:
the deep convolutional neural network model is compressed by a multi-level structured pruning method, the limitation of the existing neural network on embedded edge equipment is considered, the original network is improved by adopting a filter pruning method, on the basis of keeping the performance of the original network, the storage space occupied by network parameters is reduced to the maximum extent, the video memory occupied by a middle activation layer during the operation of the network is reduced, the floating point operation times in the forward propagation process are reduced, the operation efficiency of the network is improved, and the aim of lightening the network is fulfilled. The invention can effectively reduce the parameter redundancy of large-scale deep convolution, expand the application scene on the neural network edge equipment and reduce the hardware dependence.
Drawings
Fig. 1 is a flowchart of the iterative pruning method according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of first-stage sensitivity-based hard pruning according to an embodiment of the present invention.
Fig. 3 is a schematic diagram of second-stage image-entropy-based flexible pruning according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the following embodiments and accompanying drawings.
Referring to fig. 1, the method for compressing and accelerating the structured network model based on the multi-level pruning, provided by the invention, comprises the following specific implementation steps:
s1: acquiring a pre-training model, and training an original network model to be processed on a training data set to obtain a complete network model;
s2: measuring the sensitivity of the convolutional layers of the original network model based on a pre-training model, and obtaining a sensitivity-pruning rate change curve of each convolutional layer by a control variable method;
s3: carrying out sensitivity interlayer iteration pruning, carrying out single-layer pruning on the current network model according to the sensitivity sequence from low to high, and finely adjusting the network model;
s4: measuring the importance index of the filter, selecting a sample as a verification set, and measuring the information entropy of the filter output characteristic diagram of the current network model, namely the output image entropy;
s5: performing iterative flexible pruning on the current network model according to the magnitude sequence of the entropy of the output images, and finely adjusting a retraining model;
s6: and (4) hard pruning, and performing retraining on the current network model to obtain and store the lightweight model.
The invention improves upon existing pruning algorithms that proceed sequentially, layer by layer, from the first to the last convolutional layer: the pruning sensitivity of each convolutional layer is evaluated, and because a convolutional neural network is a globally interconnected system, the influence of each filter on network performance is considered in a global sense. Starting from the original network, filters are gradually deleted from a single convolutional layer while all other variables are held constant, and the resulting drops in recognition rate on the same data set are compared across convolutional layers; the drop in recognition rate is then taken as the reference for defining the filter importance index (sensitivity). Sensitivity analysis is performed on the convolutional layers one by one using the control-variable method, and the sensitivity of the i-th convolutional layer is defined as:
S_i ≡ Acc(L, 0) − Acc(L, −i), 1 ≤ i ≤ L
where S_i is the sensitivity coefficient of the i-th convolutional layer, Acc(L, 0) is the recognition rate of the original model, and Acc(L, −i) is the recognition rate on the test data set after the filters of the i-th convolutional layer are deleted at a given ratio while all other layers are kept unchanged.
Because the invention follows a greedy pruning strategy, pruning a filter removes the corresponding feature channel of that convolutional layer's output feature map, and the loss of that channel causes the corresponding channel of every filter in the next convolutional layer to become invalid; these invalid parameters can be ignored when evaluating filter importance. The pruning process is therefore always accompanied by a large amount of cross-layer pruning, and the finally achieved pruning rate can be greater than the preset pruning rate.
The higher the sensitivity of a convolutional layer, the greater the performance degradation when it is pruned, indicating that its filters are more important. In the initial iterations a larger pruning rate is chosen and low-sensitivity layers are pruned first, because pruning low-sensitivity convolutional layers damages network performance less and the loss can be recovered quickly in the early iterations; the high-sensitivity layers are then pruned later, on a model that is still equivalent in performance to the original network. This ordering is the preferred choice when both network performance and the iterative pruning rate are taken into account. After the network scale has been gradually reduced, the pruning rate is lowered and pruning proceeds in small increments: filters of higher sensitivity are pruned and performance is restored by longer iterative retraining. The first-stage pruning adopts a hard pruning strategy, because the purpose of the first stage is to approach the target pruning rate quickly, rapidly deleting the most redundant filters in a relatively coarse manner.
Fig. 2 shows a schematic diagram of first-stage sensitivity hard pruning. The process has two phases: hard pruning in order of convolutional-layer sensitivity, and fine-tuning retraining of the model. The filters to be deleted are determined by the sensitivity analysis, the filters and the corresponding convolutional channels are deleted, and if the overall pruning rate of the current network has not reached the initial target pruning rate, the next pruning iteration begins.
Fine pruning is then performed to further increase the pruning rate, slowly deleting the filters with weaker functionality. Fig. 3 shows a schematic diagram of second-stage image-entropy flexible pruning; this second, fine pruning stage is carried out once first-stage pruning reaches a bottleneck in increasing the pruning rate.
The invention provides a more accurate filter importance measure: the entropy (information content) of the output two-dimensional image. Traditional filter importance criteria are usually based on the nuclear norm of the filter or the sparsity of the filter's output feature map; most of these ideas consider the filter's influence on the loss function from a mathematical point of view and pay little attention to the filter's essential function. Inspired by the image information entropy of classical digital image processing, the invention considers the amount of information in the feature map extracted by each filter, which is directly related to the filter's essential function. The larger the entropy value, the better the filter acts as a feature selector, i.e. the lower its pruning priority during pruning. The filter importance index is defined as the two-dimensional image entropy of the output feature map (which reflects both the gray-level distribution and the spatial information among pixels), and the pruning order of the filters in each layer is determined by the normalized entropy. Compared with the F-norm and feature-sparsity distributions, the resulting histograms of output entropy for the filters of each convolutional layer show better discriminability.
The output image entropy of each filter is defined as follows:
E_{i,j} = − Σ_k Σ_l p_{k,l} · log2 p_{k,l}
where E_{i,j} is the output image entropy of the j-th filter of the i-th convolutional layer, p_{k,l} is the frequency of pixel pairs in the feature map whose central pixel value is k and whose neighborhood pixel value is l, and H_{i,j}[s][t] is the value at position (s, t) of the output feature map of the j-th filter of the i-th convolutional layer, from which these pixel pairs are counted.
Finally, the relevant performance indicators of the invention include an acceleration analysis of the new network structure. The total number of floating-point operations of the original network is
Flops = Σ_i 2 · K_i² · N_{i−1} · N_i · W_i · H_i
where Flops denotes the total number of floating-point operations in the original network, counting both floating-point additions and multiplications; K_i is the convolution kernel size; and N_i, W_i and H_i are the number of channels of the i-th intermediate feature map (equal to the number of filters of the convolutional layer that produces it) and its length and width, with subscripts distinguishing the different convolutional layers;
the calculation mode of the total floating point operation times in the pruned lightweight network is as follows:
wherein Flops represents the total floating-point operation times in the pruned lightweight network, including floating-point addition operation and floating-point multiplication operation. PiRepresenting the pruning rate of the final filter stage of the ith convolutional layer. In actual operation, the measurement formula is also suitable for a full connection layer, and only K is required to be 1;
performing parameter level compression ratio analysis on the new network structure:
wherein P represents the compression rate of the parameter level after pruning, and the numerator denominator item respectively represents the number of each convolution layer filter before and after pruning.
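The acceleration and compression statistics above can be computed with a few lines of Python. The layer descriptions and example numbers below are purely illustrative assumptions, and bias, activation and pooling operations are ignored.

def conv_flops(layers):
    """layers: list of (in_channels, out_channels, kernel_size, out_h, out_w).
    The factor 2 counts both floating-point multiplications and additions."""
    return sum(2 * cin * k * k * cout * h * w for cin, cout, k, h, w in layers)

def compression_rate(filters_before, filters_after):
    """Parameter-level compression rate P from per-layer filter counts."""
    return sum(filters_before) / sum(filters_after)

# Example with two hypothetical convolutional layers before and after pruning:
before = [(3, 64, 3, 224, 224), (64, 128, 3, 112, 112)]
after  = [(3, 32, 3, 224, 224), (32, 64, 3, 112, 112)]
print(conv_flops(before) / conv_flops(after))   # floating-point operation acceleration factor
print(compression_rate([64, 128], [32, 64]))    # parameter-level compression rate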
Feasibility tests of the invention were carried out on three widely used convolutional neural networks (LeNet-5, AlexNet and VGG-16). The experimental results show that the proposed multi-level structured pruning scheme can effectively compress the original networks while maintaining their performance. With essentially unchanged recognition rate on the test data set, the multi-level structured pruning method achieves a pruning rate of over 60% and a 5.6× acceleration in floating-point operations on LeNet-5; on AlexNet it achieves a 94% pruning rate and a 117.5× acceleration in floating-point operations, with a convolutional-layer parameter compression ratio of up to 192.3×; on VGG-16 it achieves a 78.6% pruning rate, reduces floating-point operations by 54.5%, and reduces runtime GPU memory usage by 31.9%. The model compression method of the invention can thus quickly recover the original performance of the network; for common large networks it effectively reduces storage space and floating-point operations and lowers the dependence on hardware.
While the invention has been described with reference to specific embodiments, any feature disclosed in this specification may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise; all of the disclosed features, or all of the method or process steps, may be combined in any combination, except mutually exclusive features and/or steps.

Claims (6)

1. The structured network model compression acceleration method based on multi-stage pruning is characterized by comprising the following steps:
s1: obtaining a pre-training model:
training an original network model to be processed on a training data set to obtain a complete network model;
s2: measuring the sensitivity of the convolutional layers of the original network model based on a pre-training model, and obtaining a sensitivity-pruning rate change curve of each convolutional layer by a control variable method;
s3: sensitivity interlayer iterative pruning:
carrying out single-layer pruning on the current network model according to the sensitivity sequence from low to high, and finely adjusting the network model;
s4: measurement filter importance index:
selecting a sample as a verification set, and measuring the information entropy of a filter output characteristic diagram of the current network model, namely the output image entropy;
s5: iterative pruning of image entropy:
performing iterative flexible pruning on the current network model according to the magnitude sequence of the entropy of the output images, and finely adjusting a retraining model;
s6: and (4) hard pruning, and performing retraining on the current network model to obtain and store the lightweight model.
2. The method for compressing and accelerating the structured network model based on multi-level pruning according to claim 1, wherein the step S1 comprises the following steps:
s11: initializing original network parameters of a network model to be processed;
s12: and pre-training on the training set to obtain a complete network model.
3. The method for compressing and accelerating the structured network model based on multi-level pruning according to claim 1, wherein the step S2 comprises the following steps:
s21: setting a maximum pruning rate range and a pruning rate increase step;
s22: performing layer-by-layer sensitivity calculation on the convolutional layer by using a control variable method to obtain a sensitivity coefficient S of the ith convolutional layeri
Si≡Acc(L,0)-Acc(L,-i),1≤i≤L
Acc (L,0) represents the recognition rate of the original network model on the test data set, Acc (L, -i) represents the recognition rate on the test data set after the filter of the ith convolution layer is deleted according to a certain ratio and the non-ith convolution layer is kept unchanged;
s23: and establishing a corresponding relation between the sensitivity sequence of each convolutional layer under the current set pruning rate and the pruning rate to obtain a sensitivity-pruning rate change curve of the convolutional layer.
4. The method for compressing and accelerating the structured network model based on multi-level pruning according to claim 1, wherein the step S3 comprises the following steps:
s31: calculating the F norm W of each filter of each convolution layeri,j||F
Wherein, wi,jJ-th filter, w, representing the ith convolutional layeri,j(c,k1,k2) (k) th representing a two-dimensional parameter matrix on the c-th channel in the jth filter1,k2) A parameter value;
s32: performing single-layer hard pruning according to a sensitivity order, permanently deleting the filters determined to be deleted from the current network model, deleting the failed output characteristic channels according to the corresponding relation between the filters and the output characteristic channels, and then executing the same operation on the next convolution layer until all the convolution layers are traversed;
s33: and (4) loading the residual network parameters to the model after pruning, and carrying out fine tuning retraining on the training data set.
5. The method for compressing and accelerating the structured network model based on multi-level pruning according to claim 1, wherein the step S4 comprises the following steps:
s41: randomly sampling a training data set to construct a verification set;
s42: for the filters of the current network model, carrying out forward propagation on the verification set once, and calculating the output image entropy of each filter:
wherein Ei,jEntropy of the output image of the jth filter for the ith convolutional layer, pk,lRepresents a pixel pair with a central pixel of k and a neighborhood pixel of l in the feature map, Hi,j[s][t]And (d) a parameter value representing a (s, t) position in an output characteristic diagram of the jth filter of the ith convolutional layer.
S43: and carrying out logarithmic normalization analysis on the output image entropy, and establishing the corresponding relation between the output image entropy of each filter under the current pruning rate and the pruning rate.
6. The method for compressing and accelerating the structured network model based on multi-level pruning according to claim 1, wherein the step S5 comprises the following steps:
s51: according to the entropy sorting of the output images of the filters, the pruning priority order of the filters between single layers is determined: the smaller the entropy of the output image of the filter is, the higher the pruning priority of the filter is;
performing flexible pruning layer by layer on the current network model, temporarily zeroing a filter to be deleted, temporarily zeroing a failed output characteristic channel according to the corresponding relation between the filter and the output characteristic channel, and then executing the same operation on the next convolution layer;
s52: and (4) loading the residual network parameters of the network with sparse filter stages, and performing fine tuning retraining on a training data set.
CN201910820048.XA 2019-08-31 2019-08-31 Structured network model compression acceleration method based on multi-stage pruning Expired - Fee Related CN110619385B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910820048.XA CN110619385B (en) 2019-08-31 2019-08-31 Structured network model compression acceleration method based on multi-stage pruning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910820048.XA CN110619385B (en) 2019-08-31 2019-08-31 Structured network model compression acceleration method based on multi-stage pruning

Publications (2)

Publication Number Publication Date
CN110619385A true CN110619385A (en) 2019-12-27
CN110619385B CN110619385B (en) 2022-07-29

Family

ID=68922910

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910820048.XA Expired - Fee Related CN110619385B (en) 2019-08-31 2019-08-31 Structured network model compression acceleration method based on multi-stage pruning

Country Status (1)

Country Link
CN (1) CN110619385B (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180046915A1 (en) * 2016-08-12 2018-02-15 Beijing Deephi Intelligence Technology Co., Ltd. Compression of deep neural networks with proper use of mask
CN109711528A (en) * 2017-10-26 2019-05-03 北京深鉴智能科技有限公司 Based on characteristic pattern variation to the method for convolutional neural networks beta pruning
CN109657780A (en) * 2018-06-15 2019-04-19 清华大学 A kind of model compression method based on beta pruning sequence Active Learning
CN109711532A (en) * 2018-12-06 2019-05-03 东南大学 A kind of accelerated method inferred for hardware realization rarefaction convolutional neural networks
CN109886397A (en) * 2019-03-21 2019-06-14 西安交通大学 A kind of neural network structure beta pruning compression optimization method for convolutional layer
CN110059823A (en) * 2019-04-28 2019-07-26 中国科学技术大学 Deep neural network model compression method and device
CN110097178A (en) * 2019-05-15 2019-08-06 电科瑞达(成都)科技有限公司 It is a kind of paid attention to based on entropy neural network model compression and accelerated method
CN110119811A (en) * 2019-05-15 2019-08-13 电科瑞达(成都)科技有限公司 A kind of convolution kernel method of cutting out based on entropy significance criteria model

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
JIAN-HAO LUO ET AL: "ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression", 《ARXIV》 *
彭冬亮等: "基于GoogLeNet模型的剪枝算法", 《控制与决策》 *
马治楠等: "基于深层卷积神经网络的剪枝优化", 《电子技术应用》 *

Cited By (44)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111079691A (en) * 2019-12-27 2020-04-28 中国科学院重庆绿色智能技术研究院 Pruning method based on double-flow network
CN111382581A (en) * 2020-01-21 2020-07-07 沈阳雅译网络技术有限公司 One-time pruning compression method in machine translation
CN111401516B (en) * 2020-02-21 2024-04-26 华为云计算技术有限公司 Searching method for neural network channel parameters and related equipment
CN111367657A (en) * 2020-02-21 2020-07-03 重庆邮电大学 Computing resource collaborative cooperation method based on deep reinforcement learning
CN111401516A (en) * 2020-02-21 2020-07-10 华为技术有限公司 Neural network channel parameter searching method and related equipment
CN111340225A (en) * 2020-02-28 2020-06-26 中云智慧(北京)科技有限公司 Deep convolution neural network model compression and acceleration method
CN111507224B (en) * 2020-04-09 2022-08-30 河海大学常州校区 CNN facial expression recognition significance analysis method based on network pruning
CN111507224A (en) * 2020-04-09 2020-08-07 河海大学常州校区 CNN facial expression recognition significance analysis method based on network pruning
CN111563455A (en) * 2020-05-08 2020-08-21 南昌工程学院 Damage identification method based on time series signal and compressed convolution neural network
CN111667054A (en) * 2020-06-05 2020-09-15 北京百度网讯科技有限公司 Method and device for generating neural network model, electronic equipment and storage medium
CN111667054B (en) * 2020-06-05 2023-09-01 北京百度网讯科技有限公司 Method, device, electronic equipment and storage medium for generating neural network model
CN111881828A (en) * 2020-07-28 2020-11-03 浙江大学 Obstacle detection method for mobile terminal equipment
CN111881828B (en) * 2020-07-28 2022-05-06 浙江大学 Obstacle detection method for mobile terminal equipment
CN112101547B (en) * 2020-09-14 2024-04-16 中国科学院上海微系统与信息技术研究所 Pruning method and device for network model, electronic equipment and storage medium
CN112101547A (en) * 2020-09-14 2020-12-18 中国科学院上海微系统与信息技术研究所 Pruning method and device for network model, electronic equipment and storage medium
CN112183725A (en) * 2020-09-27 2021-01-05 安徽寒武纪信息科技有限公司 Method of providing neural network, computing device, and computer-readable storage medium
CN112183725B (en) * 2020-09-27 2023-01-17 安徽寒武纪信息科技有限公司 Method of providing neural network, computing device, and computer-readable storage medium
CN112508187A (en) * 2020-10-22 2021-03-16 联想(北京)有限公司 Machine learning model compression method, device and equipment
CN112464810A (en) * 2020-11-25 2021-03-09 创新奇智(合肥)科技有限公司 Smoking behavior detection method and device based on attention map
CN112561054A (en) * 2020-12-03 2021-03-26 中国科学院光电技术研究所 Neural network filter pruning method based on batch characteristic heat map
CN112488297A (en) * 2020-12-03 2021-03-12 深圳信息职业技术学院 Neural network pruning method, model generation method and device
CN112488297B (en) * 2020-12-03 2023-10-13 深圳信息职业技术学院 Neural network pruning method, model generation method and device
CN112733925A (en) * 2021-01-04 2021-04-30 国网山东省电力公司枣庄供电公司 Method and system for constructing light image classification network based on FPCC-GAN
CN112766452A (en) * 2021-01-05 2021-05-07 同济大学 Dual-environment particle swarm optimization method and system
CN112734036B (en) * 2021-01-14 2023-06-02 西安电子科技大学 Target detection method based on pruning convolutional neural network
CN112734036A (en) * 2021-01-14 2021-04-30 西安电子科技大学 Target detection method based on pruning convolutional neural network
CN113128664A (en) * 2021-03-16 2021-07-16 广东电力信息科技有限公司 Neural network compression method, device, electronic equipment and storage medium
CN112884149A (en) * 2021-03-19 2021-06-01 华南理工大学 Deep neural network pruning method and system based on random sensitivity ST-SM
CN112884149B (en) * 2021-03-19 2024-03-22 华南理工大学 Random sensitivity ST-SM-based deep neural network pruning method and system
WO2022217704A1 (en) * 2021-04-12 2022-10-20 平安科技(深圳)有限公司 Model compression method and apparatus, computing device and storage medium
CN112927173A (en) * 2021-04-12 2021-06-08 平安科技(深圳)有限公司 Model compression method and device, computing equipment and storage medium
CN113011588B (en) * 2021-04-21 2023-05-30 华侨大学 Pruning method, device, equipment and medium of convolutional neural network
CN113011588A (en) * 2021-04-21 2021-06-22 华侨大学 Pruning method, device, equipment and medium for convolutional neural network
WO2023030513A1 (en) * 2021-09-05 2023-03-09 汉熵通信有限公司 Internet of things system
CN113837381A (en) * 2021-09-18 2021-12-24 杭州海康威视数字技术股份有限公司 Network pruning method, device, equipment and medium for deep neural network model
CN113837381B (en) * 2021-09-18 2024-01-05 杭州海康威视数字技术股份有限公司 Network pruning method, device, equipment and medium of deep neural network model
CN113837284A (en) * 2021-09-26 2021-12-24 天津大学 Double-branch filter pruning method based on deep learning
CN113837284B (en) * 2021-09-26 2023-09-15 天津大学 Double-branch filter pruning method based on deep learning
CN114049514A (en) * 2021-10-24 2022-02-15 西北工业大学 Image classification network compression method based on parameter reinitialization
CN114049514B (en) * 2021-10-24 2024-03-19 西北工业大学 Image classification network compression method based on parameter reinitialization
CN115099400A (en) * 2022-03-14 2022-09-23 北京石油化工学院 Poisson distribution-based neural network hybrid differential pruning method and device
CN115099400B (en) * 2022-03-14 2024-08-06 北京石油化工学院 Neural network hybrid differential pruning method and pruning device based on poisson distribution
CN114998648A (en) * 2022-05-16 2022-09-02 电子科技大学 Performance prediction compression method based on gradient architecture search
CN118228842A (en) * 2024-05-22 2024-06-21 北京灵汐科技有限公司 Data processing method, data processing device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN110619385B (en) 2022-07-29

Similar Documents

Publication Publication Date Title
CN110619385B (en) Structured network model compression acceleration method based on multi-stage pruning
CN112016674B (en) Knowledge distillation-based convolutional neural network quantification method
CN114037844B (en) Global rank perception neural network model compression method based on filter feature map
CN110909667B (en) Lightweight design method for multi-angle SAR target recognition network
CN109308696B (en) No-reference image quality evaluation method based on hierarchical feature fusion network
CN108960314B (en) Training method and device based on difficult samples and electronic equipment
CN113159173A (en) Convolutional neural network model compression method combining pruning and knowledge distillation
CN112052951B (en) Pruning neural network method, system, equipment and readable storage medium
CN111079899A (en) Neural network model compression method, system, device and medium
CN112668630B (en) Lightweight image classification method, system and equipment based on model pruning
CN113420651A (en) Lightweight method and system of deep convolutional neural network and target detection method
CN112101487B (en) Compression method and device for fine-grained recognition model
CN113255910A (en) Pruning method and device for convolutional neural network, electronic equipment and storage medium
CN113837940A (en) Image super-resolution reconstruction method and system based on dense residual error network
CN110059823A (en) Deep neural network model compression method and device
Huang et al. Compressing multidimensional weather and climate data into neural networks
CN114819061A (en) Sparse SAR target classification method and device based on transfer learning
CN111401140B (en) Offline learning method of intelligent video monitoring system in edge computing environment
CN116453096A (en) Image foreign matter detection method, device, electronic equipment and storage medium
CN114972753A (en) Lightweight semantic segmentation method and system based on context information aggregation and assisted learning
CN114065831A (en) Hyperspectral image classification method based on multi-scale random depth residual error network
CN117421657A (en) Sampling and learning method and system for noisy labels based on oversampling strategy
CN115564043B (en) Image classification model pruning method and device, electronic equipment and storage medium
CN112613604A (en) Neural network quantification method and device
CN115905546B (en) Graph convolution network literature identification device and method based on resistive random access memory

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20220729

CF01 Termination of patent right due to non-payment of annual fee