CN113610227A - Efficient deep convolutional neural network pruning method - Google Patents

Efficient deep convolutional neural network pruning method

Info

Publication number
CN113610227A
CN113610227A
Authority
CN
China
Prior art keywords
neural network
convolutional neural
sub-network
scaling factor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110838976.6A
Other languages
Chinese (zh)
Other versions
CN113610227B (en)
Inventor
Xie Xuemei
Shi Guangming
Yang Jianpeng
Wang Zhenyu
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Provincial Laboratory Of Artificial Intelligence And Digital Economy Guangzhou
Guangzhou Institute of Technology of Xidian University
Original Assignee
Guangdong Provincial Laboratory Of Artificial Intelligence And Digital Economy Guangzhou
Guangzhou Institute of Technology of Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Provincial Laboratory Of Artificial Intelligence And Digital Economy Guangzhou, Guangzhou Institute of Technology of Xidian University filed Critical Guangdong Provincial Laboratory Of Artificial Intelligence And Digital Economy Guangzhou
Priority to CN202110838976.6A priority Critical patent/CN113610227B/en
Publication of CN113610227A publication Critical patent/CN113610227A/en
Application granted granted Critical
Publication of CN113610227B publication Critical patent/CN113610227B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/082 - Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/086 - Learning methods using evolutionary algorithms, e.g. genetic algorithms or genetic programming
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/12 - Computing arrangements based on biological models using genetic models
    • G06N3/126 - Evolutionary algorithms, e.g. genetic algorithms or genetic programming

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • General Physics & Mathematics (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Physiology (AREA)
  • Genetics & Genomics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an efficient deep convolutional neural network pruning method, which mainly addresses the large storage and computation resource consumption of existing deep convolutional neural networks. The implementation scheme is as follows: optimize the scaling factors with a sparse learning method based on the ADMM algorithm while training the deep convolutional neural network, so that the network structure becomes sparse; use a genetic algorithm to search for a clipping rate suited to each layer of the trained deep convolutional neural network, automatically finding the optimal clipping rates under the guidance of a fitness function; and clip each layer of the sparsely trained network at the optimal clipping rates to obtain the most efficient convolutional neural network. The invention greatly reduces the accuracy loss of the pruned convolutional neural network and, by reducing the network's parameter count, greatly reduces its consumption of storage and computation resources. It can be used to compress deep convolutional neural networks.

Description

Efficient deep convolutional neural network pruning method
Technical Field
The invention belongs to the field of computer technology and in particular relates to an efficient pruning method for deep convolutional neural networks, which can be used to compress deep convolutional neural networks.
Background
In recent years, neural network techniques have performed well in scientific research and practical applications, but compared with traditional algorithms, neural network computation consumes large amounts of storage and computation resources. Deployment therefore incurs high power consumption and cost, which limits the use of neural networks on power-constrained mobile devices. Neural network pruning is a method of compressing neural networks: it removes redundant components to reduce storage and computation consumption and thereby lower the power drawn when the network runs.
Current neural network pruning methods fall into two main categories: unstructured pruning and structured pruning.
Unstructured pruning compresses a neural network by removing unimportant weights at arbitrary positions. Although it can achieve a high compression rate, the scattered positions of the removed weights break the network's original data structure, and special hardware is needed to store the clipped weight parameters, which limits the method's application on general-purpose devices.
Structured pruning compresses a neural network by removing unimportant channels or convolution kernels. Since removing whole channels does not break the network's data structure, the method works well on existing computing equipment. However, structured pruning locates the redundant components inaccurately, achieves a low compression rate, and strongly affects network performance.
In addition, with both unstructured and structured methods, most pruned network models must be retrained, which consumes a great deal of time and is inefficient.
Disclosure of Invention
The aim of the invention is to provide an efficient deep convolutional neural network pruning method that overcomes the above shortcomings of the prior art: without retraining the pruned model, it ensures good compression rate and accuracy while improving pruning efficiency.
The technical idea of the invention is as follows: constrain and regulate the training of a convolutional neural network with a sparse learning method based on the ADMM algorithm, so that the network structure becomes sparse; use a genetic algorithm to heuristically search for a clipping rate suited to each layer of the trained convolutional neural network, automatically finding the optimal clipping rates under the guidance of a fitness function; and clip the network at the optimal clipping rates to obtain the optimal convolutional neural network. The implementation steps are as follows:
(1) training a deep convolutional neural network by using an ADMM-based sparse learning method:
(1a) introducing a scaling factor γ_{l,i} for each channel of the deep convolutional neural network;
(1b) adding the ℓ0-norm regularization term of the per-channel scaling factors to the loss function loss of the deep convolutional neural network to obtain a new loss function loss_new;
(1c) downloading a training data set from a public data website, training the parameters of the network other than the scaling factors with the training set and the stochastic gradient descent algorithm, and optimizing the scaling factors γ_{l,i} with the ADMM algorithm until the new loss function of step (1b) converges, obtaining a trained deep convolutional neural network model;
(2) searching for the optimal clipping rates of the sparsely trained convolutional neural network with a genetic algorithm:
(2a) setting the maximum number of iterations of the genetic algorithm, and computing the total parameter count B_0 and computation amount D_0 of the trained convolutional neural network model;
(2b) initializing M groups of clipping rates, each group containing N different clipping rates, N being equal to the number of network layers;
(2c) encoding each group of clipping rates into a binary code, and applying crossover and mutation operations to the codes to generate new binary codes, such that the number of newly generated codes plus the number of original codes is P;
(2d) decoding the P binary codes of (2c), each into a group of clipping rates p_{i,j};
(2e) adjusting the network model according to the clipping rates generated in (2d): in each layer of the convolutional neural network selecting the channels with the smallest scaling-factor values, such that the ratio of the number of selected channels to the total number of channels in that layer equals the clipping rate;
(2f) deleting the channels selected in (2e) from the convolutional neural network to obtain P sub-networks, each corresponding to one group of clipping rates;
(2g) computing the accuracy a_i, parameter count b_i, and computation amount d_i of each sub-network of (2f), and using a_i, b_i, and d_i to compute the fitness f_i of each sub-network;
(2h) from the fitness f_i of each sub-network, computing the probability q_i that each network is selected, and screening R sub-networks out of the P sub-networks of (2f) by roulette-wheel selection to obtain the current clipping rates p_{i,j} corresponding to each sub-network in (2d), 1 ≤ R ≤ P;
(2i) repeating (2b) to (2h); when the iteration count reaches the maximum set in (2a), the search ends, yielding the optimal clipping rates p_best of the convolutional neural network model, corresponding to the best fitness;
(3) in each layer of the convolutional neural network, selecting the ω channels with the smallest scaling-factor values such that the ratio ω/β of the number of selected channels ω to the total number of channels β in that layer equals the optimal clipping rate p_best, completing the clipping of the convolutional neural network model and obtaining the optimal convolutional neural network.
The invention has the following advantages:
1. Low accuracy loss
The invention trains the deep convolutional neural network with the ADMM-based sparse learning method, suppressing the growth of redundant parameter values in the network and reducing their influence on network performance; it then finely screens the clipping rates with the genetic algorithm, further reducing the influence of pruning on network accuracy, so that the accuracy loss caused by pruning the network model is greatly reduced.
2. High pruning efficiency
By combining the ADMM-based sparse learning method with the genetic-algorithm-based heuristic search, the invention performs coarse parameter screening during training and fine screening during pruning. The accuracy loss caused by pruning is thus greatly reduced, and the pruned network performs almost the same as the original network without retraining, saving retraining time and improving pruning efficiency.
3. Adjustable pruning
The invention uses the genetic algorithm to heuristically search for clipping rates suited to the convolutional neural network. By changing how the fitness is computed, the search direction, and hence the final search result, can be changed, so pruning can be adjusted to actual requirements.
Simulation results show that the invention can efficiently compress deep convolutional neural networks with extremely small accuracy loss. On classification tasks based on the CIFAR-10 and ImageNet data sets, the average compression rate of the network parameters reaches 68.0%, the average compression rate of the computation amount reaches 63.4%, and the average compression rate of the channel count reaches 63.2%, while the accuracy equals or even surpasses that before compression.
Drawings
FIG. 1 is a flow chart of an implementation of the present invention;
FIG. 2 is a diagram of simulation results of the present invention.
Detailed Description
Embodiments and effects of the present invention will be described in further detail below with reference to the accompanying drawings.
Referring to fig. 1, the implementation steps of this example are as follows:
Step 1, training a deep convolutional neural network with the ADMM-based sparse learning method.
The deep convolutional neural network is an existing neural network with N convolutional layers. The input of layer l is denoted x_l; each input x_l undergoes convolution and normalization operations, denoted collectively as f(·), and the output of each channel of the deep convolutional neural network is expressed as:

y_{l,i} = f(x_l, w_{l,i}, b_{l,i}),  l = 1,2,…,N;  i = 1,2,…,n_l   <1>

where l indexes the network layers, i indexes the channels, w_{l,i} and b_{l,i} are the weight and bias sets of a channel, n_l is the total number of channels in layer l, f(·) denotes the convolution and normalization operations in the deep neural network, x_l is the input of layer l, and N is the total number of layers of the deep convolutional neural network, whose value is adjusted to the requirements of different tasks.
This step trains the deep convolutional neural network with the ADMM-based sparse learning method, implemented as follows:
(1.1) introducing a scaling factor into each channel of the deep convolutional neural network to obtain the output of each channel:
y_{l,i} = γ_{l,i} · f(x_l, w_{l,i}, b_{l,i}),  l = 1,2,…,N;  i = 1,2,…,n_l   <2>

where γ_{l,i} denotes the scaling factor of the i-th channel of layer l in the deep convolutional neural network; if γ_{l,i} is 0, the output y_{l,i} of the corresponding channel is 0, meaning the channel is inactive and can be safely clipped.
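For illustration, the scaling factor of formula <2> can be realized in PyTorch (the platform used in the simulations below) as a learnable per-channel vector multiplied onto the normalized convolution output; the module and all names here are a sketch, not part of the patent:

```python
import torch
import torch.nn as nn

class ScaledConvBlock(nn.Module):
    """Convolution + normalization with a per-channel scaling factor gamma_{l,i}."""
    def __init__(self, in_channels, out_channels, kernel_size=3):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size,
                              padding=kernel_size // 2)
        self.bn = nn.BatchNorm2d(out_channels)
        # One scaling factor per output channel; gamma_{l,i} = 0 disables channel i.
        self.gamma = nn.Parameter(torch.ones(out_channels))

    def forward(self, x):
        y = self.bn(self.conv(x))                # f(x_l, w_{l,i}, b_{l,i})
        return y * self.gamma.view(1, -1, 1, 1)  # y_{l,i} = gamma_{l,i} * f(...)
```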
(1.2) Based on the definition of the scaling factor γ_{l,i} in (1.1), the channel sparsity ξ_c of the deep convolutional neural network is expressed through the sparsity ξ_s of the scaling factors, converting the channel-sparsification problem into a constrained optimization problem over the scaling factors γ_{l,i}:

min_{W_l, B_l, Γ} E(W_l, B_l, Γ)  s.t.  ‖Γ‖₀ ≤ C(1 − ξ_s)   <3>

where E(·) denotes the loss function loss of the network, Γ denotes the vector of all scaling factors γ_{l,i}, W_l and B_l are the weight and bias sets of layer l, C is the total number of channels, and ‖Γ‖₀ is the ℓ0 norm, i.e., the number of non-zero factors.
(1.3) The ℓ0-norm regularization term of the per-channel scaling factors γ_{l,i} is added to the loss function E(·) of (1.2), converting the constrained optimization problem into the following unconstrained optimization problem over the scaling factors γ_{l,i}:

min_{W_l, B_l, Γ} E(W_l, B_l, Γ) + λ‖Γ‖₀   <4>

and the new loss function loss_new is obtained from this unconstrained objective:

loss_new = E(W_l, B_l, Γ) + λ‖Γ‖₀

where λ is a hyper-parameter, λ > 0.
(1.4) To solve the unconstrained optimization problem of (1.3), auxiliary variables are added and the augmented Lagrange multiplier method in scaled form is applied, converting formula <4> into:

min_{W_l, B_l, Γ, Z} E(W_l, B_l, Γ) + λ‖Z‖₀ + (ρ/2)‖Γ − Z + U‖₂²   <5>

where ρ is a constant, U is the scaled dual variable, and Z is the auxiliary variable; ‖Z‖₀ is the ℓ0 norm, i.e., the number of non-zero auxiliary variables.
(1.5) Formula <5> of (1.4) is decomposed into the following three sub-problems:
(1.5.1) The first sub-problem is expressed as:

Γ^k = argmin_Γ E(W_l, B_l, Γ) + (ρ/2)‖Γ − Z^{k−1} + U^{k−1}‖₂²   <6>

where k is the iteration index, and Z^{k−1} and U^{k−1} are the sets of auxiliary and dual variables at iteration k−1, which can be regarded as constants.
(1.5.2) The second sub-problem is expressed as:

Z^k = argmin_Z λ‖Z‖₀ + (ρ/2)‖Γ^k − Z + U^{k−1}‖₂²   <7>

where Γ^k is the result of the first sub-problem at iteration k and U^{k−1} is the dual variable of iteration k−1. Since Z, Γ^k, and U^{k−1} are all vectors, formula <7> can be converted into the elementwise form:

z_j^k = argmin_{z_j} λ|z_j|₀ + (ρ/2)(γ_j^k − z_j + u_j^{k−1})²,  j = 1,2,…,C   <8>

where z_j, γ_j^k, and u_j^{k−1} are the j-th elements of the vectors Z, Γ^k, and U^{k−1} respectively, and C is the total number of channels of the neural network, equal to the length of these vectors;
(1.5.3) The third sub-problem is expressed as:

U^k = argmin_U (ρ/2)‖Γ^k − Z^k + U^{k−1} − U‖₂²   <9>

where Z^k is the result of the second sub-problem at iteration k.
(1.6) Formula <5> of (1.4) is solved by iteratively solving the three sub-problems of (1.5):
(1.6.1) Solving the first sub-problem, formula <6> of (1.5.1), gives the set Γ^k of scaling factors γ_{l,i} at iteration k.
The first sub-problem consists of two parts: the first part is the initial loss function loss of the convolutional neural network; the second part is convex and differentiable. According to the requirements of the specific task, the corresponding training data set is downloaded from a public data website, and the problem is trained with the training set and the stochastic gradient descent algorithm until it converges, giving the set Γ^k of scaling factors γ_{l,i} at iteration k.
(1.6.2) Solving the second sub-problem, formula <8> of (1.5.2), for each auxiliary variable z_j gives the set Z^k of auxiliary variables at iteration k.
If the auxiliary variable z_j is not 0, then |z_j|₀ = 1 and formula <8> is rewritten as:

λ + (ρ/2)(γ_j^k − z_j + u_j^{k−1})²   <10>

Setting the perfect-square term of formula <10> to zero, i.e., z_j = γ_j^k + u_j^{k−1}, the minimum value of formula <10> is λ, and γ_j^k + u_j^{k−1} is the value of z_j at iteration k.
If the auxiliary variable z_j is 0, the |z_j|₀ term of formula <8> is 0 and only the perfect-square term remains, whose value is (ρ/2)(γ_j^k + u_j^{k−1})².
Combining the two cases, the value of z_j^k is:

z_j^k = γ_j^k + u_j^{k−1}  if (ρ/2)(γ_j^k + u_j^{k−1})² > λ;  otherwise z_j^k = 0   <11>

The result of the second sub-problem is then Z^k = (z_1^k, …, z_C^k), the set of auxiliary variables at iteration k.
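The case analysis behind formula <11> is exactly an elementwise hard-thresholding operator. A minimal NumPy sketch of this Z-update, with function and argument names chosen here for illustration:

```python
import numpy as np

def update_z(gamma_k, u_prev, lam, rho):
    """ADMM Z-update for the l0 sub-problem <8>: elementwise hard thresholding.

    Keeps z_j = gamma_j + u_j when (rho/2) * (gamma_j + u_j)^2 > lambda,
    otherwise sets z_j = 0, as in formula <11>.
    """
    v = gamma_k + u_prev
    keep = 0.5 * rho * v ** 2 > lam
    return np.where(keep, v, 0.0)
```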
(1.6.3) Solving the third sub-problem, formula <9> of (1.5.3), gives the set U^k of dual variables at iteration k.
Since the third sub-problem is a convex quadratic optimization problem, its result is its extreme point, expressed as:

U^k = Γ^k − Z^k + U^{k−1}   <12>

where U^{k−1} is the set of dual variables at iteration k−1, and Γ^k and Z^k are the results of the first and second sub-problems at iteration k;
(1.7) At iteration k+1, the result U^k of the third sub-problem of (1.6.3) at iteration k is fed into the first sub-problem of (1.6.1);
(1.8) The process from (1.6) to (1.7) is repeated until the new loss function loss_new of (1.3) converges. Training of the deep convolutional network with the ADMM-based sparse learning method is then complete, and the trained convolutional neural network is obtained.
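Putting the three sub-steps together, the outer loop of step 1 can be sketched as follows, under the assumption that a helper `train_gamma_subproblem` runs stochastic gradient descent on formula <6>; both helper names are hypothetical:

```python
import torch

def admm_sparse_training(model, loader, lam=1e-4, rho=1e-2, num_rounds=10):
    """Alternate the three ADMM sub-steps until loss_new converges (sketch)."""
    gamma = get_scaling_factors(model)   # hypothetical: all gamma_{l,i} as one vector
    z = gamma.clone()                    # auxiliary variables Z
    u = torch.zeros_like(gamma)          # scaled dual variables U
    for k in range(num_rounds):
        # Sub-problem 1 <6>: SGD on loss + (rho/2) * ||Gamma - Z + U||^2.
        gamma = train_gamma_subproblem(model, loader, z, u, rho)
        # Sub-problem 2 <8>/<11>: elementwise hard thresholding.
        v = gamma + u
        z = torch.where(0.5 * rho * v ** 2 > lam, v, torch.zeros_like(v))
        # Sub-problem 3 <12>: dual update.
        u = gamma - z + u
    return model
```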
Step 2, searching for the optimal clipping rates of the sparsely trained convolutional neural network with a genetic algorithm.
(2.1) Setting the maximum number of iterations and computing the parameter count B_0 and computation amount D_0 of the trained neural network:

B_0 = Σ_{l=1}^{N} n_l · n_{l−1} · k_w · k_h   <13>

D_0 = Σ_{l=1}^{N} n_l · n_{l−1} · k_w · k_h · w_l^out · h_l^out   <14>

where N is the total number of layers of the neural network, n_l is the total number of channels in layer l, k_w and k_h are the width and height of the two-dimensional convolution kernels in a channel, and w_l^out and h_l^out are the width and height of the output feature map of layer l;
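For reference, the two totals of formulas <13> and <14> (as reconstructed above) can be accumulated layer by layer; the per-layer tuple layout below is an assumption of this sketch:

```python
def network_cost(layers):
    """Total parameter count B0 <13> and computation D0 <14> of a conv network.

    Each layer is described by a tuple (n_out, n_in, k_w, k_h, w_out, h_out).
    """
    b0 = sum(n_out * n_in * kw * kh for n_out, n_in, kw, kh, _, _ in layers)
    d0 = sum(n_out * n_in * kw * kh * w * h for n_out, n_in, kw, kh, w, h in layers)
    return b0, d0
```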
(2.2) Initializing M individuals as the population of the genetic algorithm, each individual carrying one group of channel clipping rates, expressed as:

s_i = {p_{i,j} | 0 ≤ p_{i,j} < 1, j = 1,2,…,N},  i = 1,2,…,M   <15>

where s_i is the i-th group of clipping rates, p_{i,j} is the channel clipping rate of the j-th layer in s_i, and M is the number of groups; each group contains N different clipping rates, N being equal to the number of network layers;
(2.3) Generating a binary code for each of the M groups of clipping rates initialized in (2.2), with each bit valued 0 or 1:

c_i = (b_1, b_2, …, b_{m·N}),  b_t ∈ {0, 1}   <16>

where c_i is the i-th code and m is a hyper-parameter that determines the length of the binary code, each clipping rate being encoded with m bits.
On the basis of the binary codes of formula <16>, new binary codes are generated by applying crossover and mutation operations, such that the number of newly generated codes plus the number of original codes is P. Crossover randomly exchanges some bits between two codes; mutation changes some bits within a code (0 to 1 or 1 to 0);
(2.4) Decoding all P codes into P groups of clipping rates by the inverse of the encoding <16> in (2.3):

p_{i,j} = (1/2^m) · Σ_{t=1}^{m} b_{(j−1)·m+t} · 2^{m−t}   <17>

where p_{i,j} is the j-th clipping rate of the i-th group;
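A compact sketch of the m-bit encoding <16>, the decoding <17>, and the crossover and mutation operators described in (2.3), under the bit-layout assumption made in the reconstruction above:

```python
import random

M_BITS = 8  # hyper-parameter m: bits per clipping rate

def encode(rates):
    """Encode a group of clipping rates into one binary string <16>."""
    return ''.join(format(int(p * (1 << M_BITS)), f'0{M_BITS}b') for p in rates)

def decode(code):
    """Decode a binary string back into a group of clipping rates <17>."""
    return [int(code[j:j + M_BITS], 2) / (1 << M_BITS)
            for j in range(0, len(code), M_BITS)]

def crossover(c1, c2):
    """Randomly exchange the tails of two codes after a random cut point."""
    cut = random.randrange(1, len(c1))
    return c1[:cut] + c2[cut:], c2[:cut] + c1[cut:]

def mutate(code, rate=0.01):
    """Flip each bit (0 to 1 or 1 to 0) independently with the given probability."""
    return ''.join(b if random.random() > rate else '10'[int(b)] for b in code)
```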
(2.5) Adjusting the network model according to the clipping rates generated in (2.4): in each layer of the convolutional neural network the channels with the smallest scaling-factor values are selected, such that the ratio of the number of selected channels to the total number of channels in that layer equals the clipping rate p_{i,j};
(2.6) Deleting the channels selected in (2.5) from the neural network to obtain P sub-networks, each corresponding to one group of clipping rates p_{i,j};
(2.7) For each task, downloading the corresponding test data set from a public data website, feeding the test set into each sub-network of (2.6), and computing the accuracy from the network outputs and the labels:

a_i = φ(x_i, g)

where a_i is the accuracy of the i-th sub-network, x_i is the output of the i-th sub-network, and g is the set of test-sample labels;
(2.8) Computing the parameter count b_i and computation amount d_i of each sub-network according to formulas <13> and <14> of (2.1), and computing the fitness f_i of the i-th sub-network:

f_i = θ(1 − b_i/B_0) + (1 − θ)(1 − d_i/D_0)  if a_i ≥ A_0 − ε;  otherwise f_i = 0   <18>

where A_0, B_0, and D_0 are the accuracy, parameter count, and computation amount of the original network, θ is a hyper-parameter balancing the parts of formula <18>, 0 ≤ θ ≤ 1, and ε is the allowed deviation in accuracy;
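Under the reconstructed form of formula <18> given above (the exact form is an assumption consistent with the description), the fitness computation can be sketched as:

```python
def fitness(acc_i, params_i, flops_i, acc0, b0, d0, theta=0.5, eps=0.01):
    """Fitness <18>: reward compression, but only within the accuracy budget.

    A sub-network whose accuracy drops more than eps below the original
    network gets zero fitness and is effectively discarded.
    """
    if acc_i < acc0 - eps:
        return 0.0
    return theta * (1 - params_i / b0) + (1 - theta) * (1 - flops_i / d0)
```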
(2.9) Summing the fitness values f_i obtained in (2.8) gives the total fitness sum(f); the weighted fitness q_i of each sub-network is then computed as:

q_i = f_i / sum(f)   <19>

where sum(f) is the sum of all sub-network fitness values f_i, and q_i is the weighted fitness of each sub-network, indicating the probability that it is selected.
(2.10) Sub-networks are selected according to the probability q_i of each sub-network in (2.9): R sub-networks are screened out by roulette-wheel selection, the clipping rates p_{i,j} corresponding to each selected sub-network of (2.6) are obtained and fed into (2.2) for the next iteration, 1 ≤ R ≤ P.
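Roulette-wheel selection draws sub-networks with probability proportional to fitness, as in formula <19>; a minimal sketch:

```python
import random

def roulette_select(subnets, fitnesses, r):
    """Select r sub-networks with probability q_i = f_i / sum(f) <19>."""
    total = sum(fitnesses)
    weights = [f / total for f in fitnesses]
    return random.choices(subnets, weights=weights, k=r)
```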
(2.11) Repeating (2.2) to (2.10) until the maximum number of iterations is reached; the search then ends, yielding the optimal clipping rates p_best of the convolutional neural network model.
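In outline, the whole search of step 2 alternates variation, evaluation, and selection; the sketch below reuses the helpers above, and `new_codes_by_crossover_and_mutation`, `prune_with_rates`, and `evaluate_fitness` are hypothetical stand-ins for (2.3), (2.5)-(2.6), and (2.7)-(2.8):

```python
import random

def genetic_search(model, test_loader, num_layers, m_groups=20, r_keep=10, max_iter=50):
    """Outline of step 2: search per-layer clipping rates with a genetic algorithm."""
    # (2.2) initialize M groups of random clipping rates in [0, 1)
    population = [[random.random() for _ in range(num_layers)] for _ in range(m_groups)]
    best_rates, best_fit = None, -1.0
    for _ in range(max_iter):
        codes = [encode(s) for s in population]                 # (2.3) encode
        codes += new_codes_by_crossover_and_mutation(codes)     # hypothetical helper
        rates = [decode(c) for c in codes]                      # (2.4) decode
        subnets = [prune_with_rates(model, p) for p in rates]   # hypothetical, (2.5)-(2.6)
        fits = [evaluate_fitness(net, test_loader) for net in subnets]  # hypothetical
        if max(fits) > best_fit:                                # track the best group
            best_fit = max(fits)
            best_rates = rates[fits.index(best_fit)]
        population = roulette_select(rates, fits, r_keep)       # (2.9)-(2.10)
    return best_rates                                           # optimal rates p_best
```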
Step 3, clipping the sparsely trained convolutional neural network at the optimal clipping rates.
In each layer of the convolutional neural network, the ω channels with the smallest scaling-factor values are selected such that the ratio ω/β of the number of selected channels ω to the total number of channels β in that layer equals the optimal clipping rate p_best; the selected ω channels are then deleted. This completes the clipping of the convolutional neural network model and yields the optimal convolutional neural network, providing subsequent deep-learning tasks and practical applications with a convolutional neural network model that computes quickly and occupies little storage.
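A sketch of the per-layer channel selection of step 3, assuming the scaling factors of a layer are available as a tensor (as in the `ScaledConvBlock` illustration earlier); names are illustrative:

```python
import torch

def select_pruned_channels(gamma_layer, p_best_layer):
    """Return indices of the omega channels with the smallest |gamma| in one layer,
    where omega / beta equals that layer's optimal clipping rate p_best."""
    beta = gamma_layer.numel()                # total channels beta in the layer
    omega = int(round(p_best_layer * beta))   # number of channels omega to remove
    order = torch.argsort(gamma_layer.abs())  # ascending by |gamma_{l,i}|
    return order[:omega]                      # these channels are deleted
```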
The effects of the present invention are further illustrated by the following simulations.
1. Simulation experiment conditions:
The operating system of the simulation experiments is Ubuntu 18.04 LTS, the deep-learning software platform is PyTorch, and the GPU is an NVIDIA TITAN Xp.
The data sets used in the simulation experiments are CIFAR-10 and ImageNet.
The neural networks involved in the simulation experiments are VGG-16, ResNet-50, ResNet-56, ResNet-110, GoogLeNet, and DenseNet-40.
2. Simulation experiment contents:
Simulation 1, training experiment for the convolutional neural network:
On the CIFAR-10 data set, the classification network VGG-16 is trained with the traditional training method and with the ADMM-based sparse learning method of the invention; the results are shown in FIG. 2, where FIG. 2(a) is the feature map of the network obtained with traditional training and FIG. 2(b) is the feature map obtained with the ADMM-based sparse learning method of the invention. In FIG. 2, black cells represent invalid features and white textured cells represent valid features.
As FIG. 2 shows, the ADMM-based sparse learning method of the invention yields clearly more invalid feature maps than traditional training, demonstrating that its sparsification of the convolutional neural network is effective.
Simulation 2, pruning simulation experiment of the convolutional neural network:
the method is utilized to train convolutional neural networks Vgg-16, ResNet-56, ResNet-110, GoogleNet and DenseNet-40 on a training set of a CIFAR-10 data set and complete pruning;
the convolutional neural network ResNet-50 is trained and pruned on the training set of the ImageNet data set by the method.
Classifying data from the test set of the CIFAR-10 dataset using the pruned convolutional neural network Vgg-16, ResNet-56 ResNet-110, GoogleNet, DenseNet-40; the data from the test set of the ImageNet dataset was classified using the pruned convolutional neural network ResNet-50, resulting in the effect of the pruned convolutional neural network on classifying the data from both test sets, as shown in table 1.
TABLE 1 Classification Effect of the pruned convolutional neural network on test sets
In Table 1, the unit 'M' for parameter counts denotes 10^6 floating-point numbers, and the unit 'B' for computation denotes 10^9 floating-point operations. 'Original network' denotes the convolutional neural network before pruning, and 'pruned network' denotes the convolutional neural network pruned by the method of the invention.
As Table 1 shows, the highest compression rate of the invention on the parameters of the convolutional neural networks is 87.8%, the lowest is 43.0%, and the average is 68.0%; the highest compression rate on the computation amount is 83.6%, the lowest is 47.9%, and the average is 63.4%; the highest compression rate on the channels is 81.6%, the lowest is 41.4%, and the average is 63.2%.
The results in Table 1 show that the error rates of the pruned networks are essentially consistent with those of the original networks, i.e., the invention compresses convolutional neural networks effectively with little accuracy loss. This compression greatly reduces the storage and computation consumption of the networks and thereby improves their subsequent use.

Claims (5)

1. An efficient deep convolutional neural network pruning method is characterized by comprising the following steps:
(1) training a deep convolutional neural network by using an ADMM-based sparse learning method:
(1a) introducing a scaling factor γ_{l,i} for each channel of the deep convolutional neural network;
(1b) adding the ℓ0-norm regularization term of the per-channel scaling factors to the loss function loss of the deep convolutional neural network to obtain a new loss function loss_new;
(1c) downloading a training data set from a public data website, training the parameters of the network other than the scaling factors with the training set and the stochastic gradient descent algorithm, and optimizing the scaling factors γ_{l,i} with the ADMM algorithm until the new loss function of step (1b) converges, obtaining a trained deep convolutional neural network model;
(2) searching for the optimal clipping rates of the deep convolutional neural network model with a genetic algorithm:
(2a) setting the maximum number of iterations of the genetic algorithm, and computing the total parameter count B_0 and computation amount D_0 of the trained convolutional neural network model;
(2b) initializing M groups of clipping rates, each group containing N different clipping rates, N being equal to the number of network layers;
(2c) encoding each group of clipping rates into a binary code, and applying crossover and mutation operations to the codes to generate new binary codes, such that the number of newly generated codes plus the number of original codes is P;
(2d) decoding the P binary codes of (2c), each into a group of clipping rates p_{i,j};
(2e) adjusting the network model according to the clipping rates generated in (2d): in each layer of the convolutional neural network selecting the channels with the smallest scaling-factor values, such that the ratio of the number of selected channels to the total number of channels in that layer equals the clipping rate;
(2f) deleting the channels selected in (2e) from the convolutional neural network to obtain P sub-networks, each corresponding to one group of clipping rates;
(2g) computing the accuracy a_i, parameter count b_i, and computation amount d_i of each sub-network of (2f), and using a_i, b_i, and d_i to compute the fitness f_i of each sub-network;
(2h) from the fitness f_i of each sub-network, computing the probability q_i that each network is selected, and screening R sub-networks out of the P sub-networks of (2f) by roulette-wheel selection to obtain the current clipping rates p_{i,j} corresponding to each sub-network in (2d), 1 ≤ R ≤ P;
(2i) repeating (2b) to (2h); when the iteration count reaches the maximum set in (2a), the search ends, yielding the optimal clipping rates p_best of the convolutional neural network model, corresponding to the best fitness;
(3) in each layer of the convolutional neural network, selecting the ω channels with the smallest scaling-factor values such that the ratio ω/β of the number of selected channels ω to the total number of channels β in that layer equals the optimal clipping rate p_best, completing the clipping of the convolutional neural network model and obtaining the optimal sub-network of the convolutional neural network.
2. The method of claim 1, wherein the loss function loss_new of (1b) is constructed as follows:
adding the ℓ0-norm regularization term of the scaling factors γ_{l,i} of each channel of the deep convolutional neural network to the original loss function E(·) of the network, converting the constrained optimization problem into the following unconstrained optimization problem over the scaling factors γ_{l,i}:

min_{W_l, B_l, Γ} E(W_l, B_l, Γ) + λ‖Γ‖₀   <1>

and obtaining the new loss function loss_new from this unconstrained objective:

loss_new = E(W_l, B_l, Γ) + λ‖Γ‖₀   <2>

where λ is a hyper-parameter, λ > 0, E(·) denotes the loss function loss of the original network, Γ denotes the vector of all scaling factors γ_{l,i}, W_l and B_l are the weight and bias sets of network layer l, and ‖Γ‖₀ is the ℓ0 norm, i.e., the number of non-zero factors.
3. The method of claim 1, wherein the optimization of the scaling factors with the ADMM algorithm in (1c) is implemented as follows:
(1c1) to solve the unconstrained optimization problem of formula <1> in (1b), auxiliary variables are added, and the augmented Lagrange multiplier method in scaled form is applied, converting formula <1> into:

min_{W_l, B_l, Γ, Z} E(W_l, B_l, Γ) + λ‖Z‖₀ + (ρ/2)‖Γ − Z + U‖₂²   <3>

where Γ denotes the vector of all scaling factors γ_{l,i}, ρ is a constant, U is the scaled dual variable, and Z is the auxiliary variable; ‖Z‖₀ is the ℓ0 norm, i.e., the number of non-zero auxiliary variables;
(1c2) formula <3> of (1c1) is decomposed into three sub-problems that are solved iteratively;
(1c3) the first sub-problem is expressed as:

Γ^k = argmin_Γ E(W_l, B_l, Γ) + (ρ/2)‖Γ − Z^{k−1} + U^{k−1}‖₂²   <4>

where k is the iteration index, and Z^{k−1} and U^{k−1} are the sets of auxiliary and dual variables at iteration k−1, regarded as constants;
(1c4) the first sub-problem of formula <4> is solved to obtain the set Γ^k of scaling factors γ_{l,i} at iteration k:
the training data set of the corresponding task is downloaded from a public data website, and the problem is trained with the training set and the stochastic gradient descent algorithm until it converges, giving the set Γ^k of scaling factors γ_{l,i} at iteration k;
(1c5) the second sub-problem is expressed as:

Z^k = argmin_Z λ‖Z‖₀ + (ρ/2)‖Γ^k − Z + U^{k−1}‖₂²   <5>

where Γ^k is the result of the first sub-problem at iteration k and U^{k−1} is the dual variable of iteration k−1; since Z, Γ^k, and U^{k−1} are all vectors, formula <5> can be converted into the elementwise form:

z_j^k = argmin_{z_j} λ|z_j|₀ + (ρ/2)(γ_j^k − z_j + u_j^{k−1})²,  j = 1,2,…,C   <6>

where z_j, γ_j^k, and u_j^{k−1} are the j-th elements of the vectors Z, Γ^k, and U^{k−1} respectively, and C is the total number of channels of the neural network, equal to the length of these vectors;
(1c6) formula <6> is solved for each auxiliary variable z_j to obtain the set Z^k of auxiliary variables at iteration k:
if the auxiliary variable z_j is not 0, then |z_j|₀ = 1 and formula <6> is rewritten as:

λ + (ρ/2)(γ_j^k − z_j + u_j^{k−1})²   <7>

setting the perfect-square term of formula <7> to zero, i.e., z_j = γ_j^k + u_j^{k−1}, the minimum value of formula <7> is λ, and γ_j^k + u_j^{k−1} is the value of z_j at iteration k;
if the auxiliary variable z_j is 0, the |z_j|₀ term of formula <6> is 0 and only the perfect-square term remains, whose value is (ρ/2)(γ_j^k + u_j^{k−1})²;
combining the two cases, the value of z_j^k is:

z_j^k = γ_j^k + u_j^{k−1}  if (ρ/2)(γ_j^k + u_j^{k−1})² > λ;  otherwise z_j^k = 0   <8>

the result of the second sub-problem, formula <5>, is then Z^k = (z_1^k, …, z_C^k), the set of auxiliary variables at iteration k;
(1c7) the third sub-problem is expressed as:

U^k = argmin_U (ρ/2)‖Γ^k − Z^k + U^{k−1} − U‖₂²   <9>

where Z^k is the result of the second sub-problem at iteration k; this sub-problem is a convex quadratic optimization problem whose result is its extreme point:

U^k = Γ^k − Z^k + U^{k−1}   <10>

where U^{k−1} is the set of dual variables at iteration k−1, and Γ^k and Z^k are the results of the first and second sub-problems at iteration k;
(1c8) at iteration k+1, the result U^k of the third sub-problem of (1c7) at iteration k is fed into the first sub-problem of (1c3);
(1c9) the process from (1c3) to (1c8) is repeated until the new loss function loss_new of (1b) converges; the optimization of the scaling factors with the ADMM-based sparse learning method is then complete, yielding the optimized scaling-factor set Γ.
4. The method of claim 1, wherein the clipping rates of (2b) are initialized as follows:
initializing M individuals as the population of the genetic algorithm, each individual representing one group of channel clipping rates, each group containing N different clipping rates:

s_i = {p_{i,j} | 0 ≤ p_{i,j} < 1, j = 1,2,…,N},  i = 1,2,…,M   <11>

where s_i is the i-th individual, p_{i,j} is the channel clipping rate of the j-th layer in s_i, M is the total number of individuals, and N is the number of layers of the network.
5. The method of claim 1, wherein the fitness f_i of a sub-network in (2g) is computed by the following formula:

f_i = θ(1 − b_i/B_0) + (1 − θ)(1 − d_i/D_0)  if a_i ≥ A_0 − ε;  otherwise f_i = 0   <12>

where a_i, b_i, and d_i are the accuracy, parameter count, and computation amount of each sub-network, A_0, B_0, and D_0 are those quantities for the original network, θ is a hyper-parameter balancing the parts of the formula, 0 ≤ θ ≤ 1, and ε is the allowed deviation in accuracy.
CN202110838976.6A 2021-07-23 2021-07-23 Deep convolutional neural network pruning method for image classification Active CN113610227B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110838976.6A CN113610227B (en) 2021-07-23 2021-07-23 Deep convolutional neural network pruning method for image classification

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110838976.6A CN113610227B (en) 2021-07-23 2021-07-23 Deep convolutional neural network pruning method for image classification

Publications (2)

Publication Number Publication Date
CN113610227A true CN113610227A (en) 2021-11-05
CN113610227B CN113610227B (en) 2023-11-21

Family

ID=78338225

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110838976.6A Active CN113610227B (en) 2021-07-23 2021-07-23 Deep convolutional neural network pruning method for image classification

Country Status (1)

Country Link
CN (1) CN113610227B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113935485A (en) * 2021-12-15 2022-01-14 江苏游隼微电子有限公司 Convolutional neural network clipping method based on adjacent layer weight
CN114330644A (en) * 2021-12-06 2022-04-12 华中光电技术研究所(中国船舶重工集团公司第七一七研究所) Neural network model compression method based on structure search and channel pruning
CN114781604A (en) * 2022-04-13 2022-07-22 广州安凯微电子股份有限公司 Coding method of neural network weight parameter, coder and neural network processor

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109242142A (en) * 2018-07-25 2019-01-18 浙江工业大学 A kind of spatio-temporal segmentation parameter optimization method towards infrastructure networks
CN109683161A (en) * 2018-12-20 2019-04-26 南京航空航天大学 A method of the inverse synthetic aperture radar imaging based on depth ADMM network
CN110276450A (en) * 2019-06-25 2019-09-24 交叉信息核心技术研究院(西安)有限公司 Deep neural network structural sparse system and method based on more granularities
CN110874631A (en) * 2020-01-20 2020-03-10 浙江大学 Convolutional neural network pruning method based on feature map sparsification
CN111105035A (en) * 2019-12-24 2020-05-05 西安电子科技大学 Neural network pruning method based on combination of sparse learning and genetic algorithm
CN111368699A (en) * 2020-02-28 2020-07-03 交叉信息核心技术研究院(西安)有限公司 Convolutional neural network pruning method based on patterns and pattern perception accelerator
US20210192352A1 (en) * 2019-12-19 2021-06-24 Northeastern University Computer-implemented methods and systems for compressing deep neural network models using alternating direction method of multipliers (admm)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109242142A (en) * 2018-07-25 2019-01-18 浙江工业大学 A kind of spatio-temporal segmentation parameter optimization method towards infrastructure networks
CN109683161A (en) * 2018-12-20 2019-04-26 南京航空航天大学 A method of the inverse synthetic aperture radar imaging based on depth ADMM network
CN110276450A (en) * 2019-06-25 2019-09-24 交叉信息核心技术研究院(西安)有限公司 Deep neural network structural sparse system and method based on more granularities
US20210192352A1 (en) * 2019-12-19 2021-06-24 Northeastern University Computer-implemented methods and systems for compressing deep neural network models using alternating direction method of multipliers (admm)
CN111105035A (en) * 2019-12-24 2020-05-05 西安电子科技大学 Neural network pruning method based on combination of sparse learning and genetic algorithm
CN110874631A (en) * 2020-01-20 2020-03-10 浙江大学 Convolutional neural network pruning method based on feature map sparsification
CN111368699A (en) * 2020-02-28 2020-07-03 交叉信息核心技术研究院(西安)有限公司 Convolutional neural network pruning method based on patterns and pattern perception accelerator

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
AO REN et al.: "ADMM-NN: An Algorithm-Hardware Co-Design Framework of DNNs Using Alternating Direction Method of Multipliers", arXiv *
ZHENYU WANG et al.: "Network pruning using sparse learning and genetic algorithm", Neurocomputing, vol. 404 *
LAI Yejing et al.: "Deep neural network model compression methods and progress", Journal of East China Normal University (Natural Science Edition), no. 05 *
GAO Han et al.: "A survey of deep learning model compression and acceleration", Journal of Software, vol. 32, no. 01 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114330644A (en) * 2021-12-06 2022-04-12 华中光电技术研究所(中国船舶重工集团公司第七一七研究所) Neural network model compression method based on structure search and channel pruning
CN113935485A (en) * 2021-12-15 2022-01-14 江苏游隼微电子有限公司 Convolutional neural network clipping method based on adjacent layer weight
CN113935485B (en) * 2021-12-15 2022-03-04 江苏游隼微电子有限公司 Convolutional neural network clipping method based on adjacent layer weight
CN114781604A (en) * 2022-04-13 2022-07-22 广州安凯微电子股份有限公司 Coding method of neural network weight parameter, coder and neural network processor
CN114781604B (en) * 2022-04-13 2024-02-20 广州安凯微电子股份有限公司 Coding method of neural network weight parameters, coder and neural network processor

Also Published As

Publication number Publication date
CN113610227B (en) 2023-11-21

Similar Documents

Publication Publication Date Title
Liu et al. Frequency-domain dynamic pruning for convolutional neural networks
CN113610227A (en) Efficient deep convolutional neural network pruning method
CN111461322B (en) Deep neural network model compression method
Yang et al. Automatic neural network compression by sparsity-quantization joint learning: A constrained optimization-based approach
CN109002889B (en) Adaptive iterative convolution neural network model compression method
CN111860982A (en) Wind power plant short-term wind power prediction method based on VMD-FCM-GRU
CN109445935B (en) Self-adaptive configuration method of high-performance big data analysis system in cloud computing environment
CN111079781A (en) Lightweight convolutional neural network image identification method based on low rank and sparse decomposition
CN111105035A (en) Neural network pruning method based on combination of sparse learning and genetic algorithm
Chen et al. A statistical framework for low-bitwidth training of deep neural networks
CN105718943A (en) Character selection method based on particle swarm optimization algorithm
CN113657421B (en) Convolutional neural network compression method and device, and image classification method and device
CN113343427B (en) Structural topology configuration prediction method based on convolutional neural network
CN111832839B (en) Energy consumption prediction method based on sufficient incremental learning
CN112949610A (en) Improved Elman neural network prediction method based on noise reduction algorithm
Shin et al. Prediction confidence based low complexity gradient computation for accelerating DNN training
CN110263917B (en) Neural network compression method and device
Sambharya et al. End-to-end learning to warm-start for real-time quadratic optimization
CN112036651A (en) Electricity price prediction method based on quantum immune optimization BP neural network algorithm
Zhang et al. Akecp: Adaptive knowledge extraction from feature maps for fast and efficient channel pruning
Yang et al. Non-uniform dnn structured subnets sampling for dynamic inference
CN117035837B (en) Method for predicting electricity purchasing demand of power consumer and customizing retail contract
CN116663745A (en) LSTM drainage basin water flow prediction method based on PCA_DWT
Chen et al. DNN gradient lossless compression: Can GenNorm be the answer?
CN116757255A (en) Method for improving weight reduction of mobile NetV2 distracted driving behavior detection model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant