CN109472352A - Deep neural network model pruning method based on feature map statistical features - Google Patents
- Publication number
- CN109472352A (application CN201811440153.2A)
- Authority
- CN
- China
- Prior art keywords
- layer
- feature
- characteristic
- batch
- neural network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/082—Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
Abstract
The invention discloses a deep neural network model pruning method based on feature map statistical features, realized in the following steps. Step 1: for a feature layer in the deep neural network model, calculate the statistical features of the feature map corresponding to each of its output channels; a feature layer consists of a convolution layer and an activation layer, or of a convolution layer, a normalization layer and an activation layer. Step 2: from the statistical features of the feature map of each output channel in the feature layer, calculate the judgment index of each output channel. Step 3: judge the importance of each output channel in the feature layer by its judgment index, and remove the unimportant output channels together with their corresponding parameters. The invention effectively reduces the dimensionality of the network's feature layers and improves the running efficiency of the network model, while also reducing network size with little impact on accuracy.
Description
Technical Field
The invention belongs to the field of artificial intelligence and pattern recognition, and particularly relates to deep neural network model compression.
Background
Deep learning has achieved remarkable results on high-level abstract cognition problems, pushing artificial intelligence forward and providing a technical basis for high-precision, multi-class target detection, recognition and tracking. However, because of its heavy computation and huge resource requirements, a deep neural network can usually be deployed only on high-performance computing platforms, which limits its application on mobile devices. In 2015, Han's Deep Compression applied network pruning, weight sharing, quantization and coding to model compression, achieving good storage reduction and triggering broad research on network compression methods. Current research on deep learning model compression can be divided into the following directions:
(1) More compact model design: using finer, more efficient architectures can greatly reduce model size while retaining good performance.
(2) Model pruning: networks with complex structure perform very well, but their parameters are redundant. For a trained network, an effective criterion is sought to judge the importance of the parameters, and unimportant connections or convolution kernels are pruned to reduce redundancy.
(3) Kernel sparsification: weight updates are regularized during training so that the kernels become sparser. A sparse matrix can be stored more compactly, but sparse-matrix operations are inefficient on common hardware platforms and easily bandwidth-bound, so the resulting speedup is not obvious.
Pruning a pre-trained network model is the most widely used approach in current model compression: an effective judgment index is sought to evaluate the importance of neurons or feature maps, and unimportant connections or convolution kernels are pruned to reduce the redundancy of the model. Li proposed magnitude-based pruning, judging the importance of weights by the sum of their absolute values and using the sum of the absolute values of all weights in a convolution kernel as its evaluation index. Hu defined APoZ (Average Percentage of Zeros), measuring how often each convolution kernel's activations are zero, as the criterion for whether a convolution kernel is important. Luo proposed entropy-based pruning, using entropy to determine the importance of a convolution kernel. Anwar adopted random pruning, then selected the locally optimal pruning configuration according to the performance statistics of each random trial. Tian's LDA analysis found that, for each class, many convolution kernels have highly uncorrelated activations, which can be exploited to discard a large number of filters carrying little information without affecting the performance of the model.
In summary, the limitations of the existing solutions are as follows:
a. kernel sparsification only compresses storage; the compression effect at run time is not obvious, and the speedup is small;
b. using weight magnitude as the judgment index considers only the numerical characteristics of the weights, not the data characteristics of the network layers, so the compression effect is limited;
c. some evaluation indexes are expensive to compute and consume considerable computing power;
d. random pruning is highly stochastic and easily damages the parameter characteristics of the network.
Therefore, a deep neural network compression and acceleration method is needed that is simple to compute, fully exploits the redundancy in the network, is widely applicable, and does not depend on a special acceleration library.
Disclosure of Invention
To overcome the defects of the prior art, the invention discloses a deep neural network model pruning method based on feature map statistics. Compared with other compression methods, it uses several statistical features of the network's feature layers as the judgment standard and prunes the network's parameter layers, fully taking both the numerical and the statistical characteristics of the network into account, thereby obtaining good compression efficiency while improving running speed.
The technical scheme adopted by the invention is as follows:
a deep neural network model cutting method based on feature map statistical features comprises the following steps:
step 1, calculating statistical characteristics of characteristic graphs corresponding to all output channels of a characteristic layer in a deep neural network model; wherein the characteristic layer is composed of a convolution layer and an active layer, or composed of a convolution layer, a normalization layer (BatchNorm layer) and an active layer; the invention only calculates the characteristic layer of the layer (characteristic layer/full connection layer) with the storage parameter at the back;
step 2, calculating the judgment indexes of each output channel in the characteristic layer according to the statistical characteristics of the characteristic graph corresponding to each output channel in the characteristic layer;
and 3, judging the importance of each output channel in the characteristic layer according to the judgment indexes, and removing the unimportant output channels and the corresponding parameters thereof.
Within one iteration (epoch) of the deep neural network, the samples are fed to the network in batches. In step 2, the statistics of the feature maps corresponding to the output channels of a feature layer in the network model are accumulated batch by batch. For the ith feature layer, the statistical features of the feature maps of all output channels consist of a mean vector M_vi and a standard deviation vector S_vi, computed as follows:
S11: Initialization. Set N_sum, the count of processed samples, to 0. Let N_batch be the number of batches, N_batch = ceil(total number of samples / N), where N is the number of samples per batch and ceil(·) is the round-up function. Let n_batch, the index of the current batch, be initialized to 1. Initialize the mean vector M_vi and the standard deviation vector S_vi as 1×C_i zero vectors, where C_i is the number of output channels of the ith feature layer.
S12: Express the output (feature maps) X_i of the ith feature layer for the n_batch-th batch of samples as a four-dimensional tensor of size N×C_i×H_i×W_i, where H_i and W_i are the height and width of the feature map of one output channel of the ith feature layer. The feature map X_ikj of the jth output channel for the kth sample in the batch is a matrix of size H_i×W_i, k = 1, 2, …, N, j = 1, 2, …, C_i.
S13: Apply a dimension conversion (view or reshape) to X_i to obtain a three-dimensional tensor X*_i of size N×C_i×(H_i×W_i); that is, each feature map X_ikj in X_i is stretched from a two-dimensional matrix into a one-dimensional vector X*_ikj.
S14: Calculate the statistical features of X*_ikj, namely its mean m_ikj and standard deviation S_ikj (both scalars):
m_ikj = (1/(H_i·W_i)) · Σ_{m=1..H_i·W_i} X*_ikj(m)   (1)
S_ikj = sqrt( (1/(H_i·W_i)) · Σ_{m=1..H_i·W_i} (X*_ikj(m) − m_ikj)² )   (2)
where X*_ikj(m) denotes the mth element of X*_ikj.
S15: The values m_ikj, k = 1, 2, …, N, j = 1, 2, …, C_i form a mean matrix M_mi of size N×C_i; the values S_ikj, k = 1, 2, …, N, j = 1, 2, …, C_i form a standard deviation matrix S_mi of size N×C_i.
S16: Average the mean matrix M_mi and the standard deviation matrix S_mi channel by channel (mean filtering), using the processing formulas
M_vi = (N_sum·M_vi + Σ_{k=1..N} M_mi[k]) / (N_sum + N)   (3)
S_vi = (N_sum·S_vi + Σ_{k=1..N} S_mi[k]) / (N_sum + N)   (4)
N_sum = N_sum + N   (5)
where M_mi[k] is the row vector of the kth row of M_mi, and S_mi[k] is the row vector of the kth row of S_mi.
S17: Judge whether the current batch is the last batch, i.e. whether n_batch = N_batch. If so, the batch loop terminates, and the current mean vector M_vi and standard deviation vector S_vi are the statistical features of the feature maps of all output channels of the feature layer; otherwise update the batch index, n_batch = n_batch + 1, jump to S12 and continue.
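Steps S11-S17 above can be sketched in NumPy. The function and variable names (`channel_statistics`, `M_v`, `S_v`) are illustrative rather than the patent's, and `batches` stands in for the per-batch feature-layer outputs of shape (N, C_i, H_i, W_i):

```python
import numpy as np

def channel_statistics(batches):
    """Running per-channel mean/std of a feature layer's output maps,
    accumulated batch by batch as in steps S11-S17.  `batches` iterates
    over arrays of shape (N, C, H, W); names are illustrative."""
    M_v = S_v = None                 # mean / std vectors, length C
    N_sum = 0                        # samples processed so far
    for X in batches:
        N, C, H, W = X.shape
        if M_v is None:              # S11: zero-initialise on first batch
            M_v, S_v = np.zeros(C), np.zeros(C)
        X_flat = X.reshape(N, C, H * W)      # S12-S13: stretch each map
        M_m = X_flat.mean(axis=2)            # S14-S15: N x C mean matrix
        S_m = X_flat.std(axis=2)             # N x C std matrix (ddof=0)
        # S16: channel-wise running average over all samples seen so far
        M_v = (N_sum * M_v + M_m.sum(axis=0)) / (N_sum + N)
        S_v = (N_sum * S_v + S_m.sum(axis=0)) / (N_sum + N)
        N_sum += N                   # S17: continue with the next batch
    return M_v, S_v
```

The running-average form of (3)-(4) means the final vectors equal the per-channel average over every sample seen, without keeping all batches in memory.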
Further, in the step 2, the judgment index of the jth output channel in the ith feature layerThe calculation formula of (a) is as follows:
wherein,and S _ vijIs the mean vector corresponding to the feature layerAnd the standard deviation vector S _ viThe j-th element in the feature layer represents the mean and standard deviation of the feature map corresponding to the j-th output channel in the feature layer, α and β are two scale factors (hyper-parameters), α is the meanThreshold of (2), mean value With followingThe value becomes smaller moving towards minus infinity; on the contrary, when the mean value isThenWith followingThe value becomes larger and moves towards zero, β is the standard deviation S _ vijWhen the standard deviation S _ v isij<β,With S _ vijThe value becomes smaller and moves towards the zero direction; otherwise, thenWith S _ vijThe value becomes larger moving to positive infinity. When mean valueSum standard deviation S _ vijWhen the value of (c) is smaller, α -subentry plays a dominant role, when the mean value is smallerSum standard deviation S _ vijThe β -sub-item plays a leading role when the value of (b) is larger, α and β are determined by two methods, one is that the range of the over-parameter value is set from low to high according to the empirical value, the judgment index is calculated according to the value set each time, so as to cut the neural network model, and the model after cutting is retrained to recover the precision, so as to gradually achieve an optimal effect (namely, the number of channels cut off is the most under the condition that the precision of the network is reduced and does not exceed the set threshold), and the other is proportional scaling, namely, the number of channels cut off is scaledAnd β ═ η Σ S _ vij/CiMu and η are scaling factors with a range of (0, 0.4), which can be dynamically adjusted according to the network parameters.
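The behaviour of the judgment index can be illustrated with a small sketch. The closed form used here, S_vij/β − α/|M_vij|, is only one reading consistent with the limiting behaviour described in the text (the original formula images are not reproduced in this extraction); the function names, the zero pruning threshold, and the |M_vij| guard are all assumptions:

```python
import numpy as np

def importance_index(M_v, S_v, alpha, beta):
    # One plausible closed form matching the described limits: the alpha-term
    # tends to minus infinity as the channel mean shrinks below alpha, and the
    # beta-term grows without bound as the standard deviation exceeds beta.
    # abs() guards against zero or negative channel means.
    return np.asarray(S_v) / beta - alpha / np.abs(np.asarray(M_v))

def channels_to_prune(M_v, S_v, alpha, beta, threshold=0.0):
    # The set R_i of step 3: channels whose index falls below the threshold.
    T = importance_index(M_v, S_v, alpha, beta)
    return set(np.flatnonzero(T < threshold).tolist())

def scaled_hyperparams(M_v, S_v, mu=0.2, eta=0.2):
    # Proportional scaling as described: alpha = mu * sum(M_v)/C_i and
    # beta = eta * sum(S_v)/C_i, with mu, eta in (0, 0.4).
    C = len(M_v)
    return mu * np.abs(np.asarray(M_v)).sum() / C, eta * np.sum(S_v) / C
```

A channel with near-zero mean and near-zero spread scores far below zero and is pruned, while a channel with large mean and spread scores positive and is kept.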
Further, in step 3, for the ith feature layer L_i, if the judgment index of the jth output channel falls below the pruning threshold, that output channel is judged unimportant, and the channel and its corresponding parameters are removed.
Further, for the ith feature layer L_i, the steps for removing the unimportant channels and their corresponding parameters are as follows:
S31: Record the set R_i of channels whose judgment index falls below the pruning threshold; the number of elements of R_i is denoted length(R_i).
S32: Express the convolution kernel W_i of feature layer L_i as a four-dimensional tensor of size C_{i−1}×C_i×Kh_i×Kw_i, and the bias B_i corresponding to W_i as a vector of size 1×C_i, where C_{i−1} is the number of output channels of the previous feature layer L_{i−1} (if L_i is the first feature layer, C_{i−1} is the number of channels of the sample input), and Kh_i and Kw_i are the height and width of the convolution kernel. Remove from W_i the parameters of the channels in the set R_i, forming a new convolution kernel W'_i of size C_{i−1}×(C_i−length(R_i))×Kh_i×Kw_i, and replace W_i with W'_i. Remove from B_i the elements corresponding to the channels in R_i, forming a new bias B'_i of size 1×(C_i−length(R_i)), and replace B_i with B'_i.
S33: If the layer L_{i+1} following feature layer L_i is also a feature layer, express its convolution kernel W_{i+1} as a four-dimensional tensor of size C_i×C_{i+1}×Kh_{i+1}×Kw_{i+1}, where C_{i+1} is the number of output channels of L_{i+1}, and Kh_{i+1} and Kw_{i+1} are the height and width of W_{i+1}. Remove from W_{i+1} the parameters of the channels in R_i, forming a new convolution kernel W'_{i+1} of size (C_i−length(R_i))×C_{i+1}×Kh_{i+1}×Kw_{i+1}, and replace W_{i+1} with W'_{i+1}.
S34: If the layer L_{i+1} following feature layer L_i is a fully connected layer, express its parameter V_{i+1} as a matrix of size (C_i×Kh_i×Kw_i)×C_{i+1}. Remove from V_{i+1} the elements corresponding to the channels in R_i, forming a new parameter V'_{i+1} of size ((C_i−length(R_i))×Kh_i×Kw_i)×C_{i+1}, and replace V_{i+1} with V'_{i+1}.
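Steps S31-S34 amount to deleting slices of the affected tensors along the channel axis. The NumPy sketch below follows the patent's tensor layouts (output channels on axis 1 of W_i, input channels on axis 0 of the following kernel); the function and argument names are illustrative:

```python
import numpy as np

def prune_feature_layer(W_i, B_i, R_i, W_next=None, V_next=None):
    """Remove the output channels listed in R_i from feature layer L_i and
    from the input side of the following layer (steps S31-S34).  Layouts
    follow the patent's convention: W_i is (C_prev, C_i, Kh, Kw) with
    output channels on axis 1; a following conv kernel W_next is
    (C_i, C_next, Kh, Kw) with input channels on axis 0; a following
    fully connected matrix V_next has C_i*Kh*Kw rows."""
    keep = [j for j in range(W_i.shape[1]) if j not in R_i]
    out = [W_i[:, keep],        # new kernel with C_i - length(R_i) outputs
           B_i[keep]]           # matching bias entries
    if W_next is not None:      # S33: next layer is another feature layer
        out.append(W_next[keep])
    if V_next is not None:      # S34: next layer is fully connected
        per = V_next.shape[0] // W_i.shape[1]   # rows owned by one channel
        rows = [c * per + r for c in keep for r in range(per)]
        out.append(V_next[rows])
    return out
```

Because fancy indexing copies the kept slices into freshly allocated arrays, this mirrors the patent's "construct a new kernel, copy, replace" procedure.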
After model pruning is completed, the deep neural network needs to be retrained for several iterations to recover its accuracy; the number of iterations depends on the pruned feature layer and on the judgment criterion.
Advantageous effects:
Compared with the prior art, the method makes full use of the statistical characteristics of the deep neural network and constructs an evaluation index based on the mean and standard deviation. It effectively reduces the dimensionality of the network's feature layers, speeds up training, shrinks the network's frame scale and number of weights, improves running speed and efficiency, and has little impact on accuracy. The method has the following characteristics and effects:
First, when constructing the judgment index for convolution-kernel pruning, the statistical characteristics of the neural network are taken into account: both the numerical characteristics of the network and the characteristics within a feature layer are captured through the mean and standard deviation of the feature layer. The data characteristics of a feature layer reflect the effect of the convolution kernel's parameters, so feature maps that perform poorly, and their corresponding convolution kernels, can be pruned, reducing the network model frame and compressing the number of parameters.
Second, in the criterion formula, the hyper-parameters α and β can be set flexibly to change the number of removed channels: when the statistical feature falls close to 0, the α-subterm dominates, and when it falls away from 0, the β-subterm dominates.
Third, on the basis of fully considering the statistical characteristics of the neural network, the invention provides a new judgment criterion with low algorithmic complexity and good performance, which can be deployed in real-time networks and on embedded devices.
Drawings
FIG. 1 is a flow diagram of the present invention;
FIG. 2 is a schematic structural diagram of a feature layer;
FIG. 3 is a schematic diagram of the internal structure of a feature layer;
FIG. 4 is an example of the existence pattern of feature layers in a neural network; FIG. 4(a) is a plurality of successive layers of features, and FIG. 4(b) is a single layer of features;
FIG. 5 is a general block diagram of the design of the present invention;
FIG. 6 is a schematic diagram of model selection and clipping according to the present invention;
Detailed Description
The present invention is described in detail below with reference to specific examples, which will help those skilled in the art to further understand it. The examples described with reference to the drawings are illustrative, intended to explain the invention, and are not to be construed as limiting it.
Fig. 1 is a schematic flow chart of the deep neural network model pruning method based on feature map statistics in this example, which reduces the network model frame and compresses the parameters by removing specific feature maps and their corresponding convolution kernels. The specific implementation steps are as follows:
(1) for each feature layer in the deep neural network, calculate in turn the statistical features of each feature map in the layer;
(2) construct a judgment criterion from the statistical features;
(3) remove the feature maps that do not satisfy the judgment criterion, together with their corresponding convolution kernels.
It should be noted that the object operated on by this embodiment is a feature layer of a deep neural network that has been trained to convergence. A feature layer is formed by combining a convolution layer and an activation layer (also describable as an activation function or nonlinear layer), or by combining a convolution layer, a BatchNorm layer and an activation layer, as shown in Fig. 2. The types and modules of the network include, but are not limited to, convolution layers, batch normalization layers, activation layers, fully connected layers and Resnet modules. The internal structure of a feature layer in the deep neural network framework is shown in Fig. 3: the ith feature layer is L_i, its convolution kernel is W_i, and the bias corresponding to W_i is B_i. The invention only processes feature layers that are followed by a layer with stored parameters (another feature layer or a fully connected layer), such as all but the last feature layer in Fig. 4(a); feature layers not followed by such a layer, as in Fig. 4(b), are not processed. That is, if a feature layer is followed only by a pooling layer, normalization layer, activation layer or softmax layer, no operation is performed on it.
Within one iteration (epoch) of the deep neural network, the samples are fed to the network in batches, and the statistical features of the feature maps in the network model are accumulated batch by batch. The following description takes the ith feature layer L_i as an example; the implementation steps are:
S31: Initialize the intermediate variables. N_sum counts the number of processed samples and is initialized to 0. N_batch is the number of batches, N_batch = ceil(total number of samples / N), where N is the number of samples per batch and ceil(·) is the round-up function. n_batch, the index of the current batch, is initialized to 0. The mean m_ikj and standard deviation S_ikj are scalars, k = 1, 2, …, N, j = 1, 2, …, C_i, initialized to 0. The mean matrix M_mi and standard deviation matrix S_mi are initialized as zero matrices of size (N, C_i), and the mean vector M_vi and standard deviation vector S_vi as zero vectors of size (1, C_i), where C_i is the number of output channels of feature layer L_i.
S32: Apply a view or reshape (dimension conversion) to the output of the ith feature layer L_i for the n_batch-th batch of samples, changing its size from (N, C_i, H_i, W_i) to (N, C_i, H_i*W_i). This is equivalent to stretching each two-dimensional feature map X_ikj into its one-dimensional representation X*_ikj, where k ∈ [1, N] indexes the samples and j ∈ [1, C_i] indexes the channels of feature layer L_i. It should be emphasized that the feature map X_ikj is the set of elements of size (H_i, W_i) of the jth channel for the kth sample, and X*_ikj is the corresponding set of elements of size (H_i*W_i).
S33: Calculate the statistical features of the feature map X*_ikj of the jth channel of the feature layer, namely its mean m_ikj and standard deviation S_ikj:
m_ikj = (1/(H_i·W_i)) · Σ_{m=1..H_i·W_i} X*_ikj(m)   (1)
S_ikj = sqrt( (1/(H_i·W_i)) · Σ_{m=1..H_i·W_i} (X*_ikj(m) − m_ikj)² )   (2)
For any feature map X*_ikj of the feature layer, the mean m_ikj and standard deviation S_ikj are taken as its statistical features. Over the whole feature layer they generate a mean matrix M_mi and a standard deviation matrix S_mi, each of size (N, C_i).
S34: Average the mean matrix M_mi and the standard deviation matrix S_mi channel by channel, implemented as:
M_vi = (N_sum·M_vi + Σ_{k=1..N} M_mi[k]) / (N_sum + N)   (3)
S_vi = (N_sum·S_vi + Σ_{k=1..N} S_mi[k]) / (N_sum + N)   (4)
N_sum = N_sum + N   (5)
where N_sum accumulates the number of samples of the first n_batch batches; N is the number of samples in the n_batch-th batch; M_vi is the mean-filtered result of the mean matrix M_mi, whose kth row M_mi[k] holds the channel means for the kth sample; and S_vi is the mean-filtered result of the standard deviation matrix S_mi, whose kth row S_mi[k] holds the channel standard deviations for the kth sample.
S35: Update the current batch index: n_batch = n_batch + 1. If n_batch = N_batch, the batch loop terminates; otherwise the means m_ikj and standard deviations S_ikj are set to 0, the mean matrix M_mi and standard deviation matrix S_mi are reset to zero matrices, while the mean vector M_vi and standard deviation vector S_vi are retained as running averages, and execution jumps back to S32. The batch iteration yields the mean vector M_vi and standard deviation vector S_vi. For feature layer L_i, the judgment index T_ij of the jth channel is then calculated as follows:
T_ij = S_vij/β − α/|M_vij|   (6)
where M_vij and S_vij are the mean and standard deviation of the jth channel of feature layer L_i, and α and β are two hyper-parameters; these two scale factors set minimum values for the mean M_vij and the standard deviation S_vij and prevent division by zero. For smaller hyper-parameters, proportional scaling is used, i.e. α = μ·Σ_j M_vij/C_i and β = η·Σ_j S_vij/C_i, where μ and η are scale factors with range (0, 0.4). For larger hyper-parameters, successive approximation is adopted: the range of hyper-parameter values is increased gradually from low to high until an optimal effect is reached.
Feature maps that do not satisfy the criterion, and their associated parameters, are removed. For a feature layer L_i with C_i output channels, the judgment index T_ij is calculated for every channel, and the set R_i of channels whose index falls below the pruning threshold is recorded. The corresponding channels are removed as follows:
S71: The convolution kernel W_i of feature layer L_i has size (C_{i−1}, C_i, Kh_i, Kw_i), where C_{i−1} is the number of output channels of the previous feature layer L_{i−1} (or the number of channels of the sample input if L_i is the first feature layer), C_i is the number of output channels of the current feature layer L_i, and Kh_i, Kw_i give the size of the convolution kernel. Construct a new convolution kernel W'_i of size (C_{i−1}, C_i − length(R_i), Kh_i, Kw_i), i.e. C_i minus the number of elements contained in R_i. Copy into W'_i the parameters of W_i, along its output-channel dimension, for the channels not belonging to R_i, then replace W_i with W'_i. Construct a new bias B'_i of size (1, C_i − length(R_i)); copy into B'_i the elements of B_i whose channels do not belong to R_i, then replace B_i with B'_i.
S72: If feature layer L_i is followed by another feature layer L_{i+1}, its convolution kernel W_{i+1} has size (C_i, C_{i+1}, Kh_{i+1}, Kw_{i+1}), where C_{i+1} is the number of output channels of L_{i+1}. Construct a new convolution kernel W'_{i+1} of size (C_i − length(R_i), C_{i+1}, Kh_{i+1}, Kw_{i+1}); copy into W'_{i+1} the parameters of W_{i+1}, along its input-channel dimension, for the channels not belonging to R_i, then replace W_{i+1} with W'_{i+1}.
S73: If feature layer L_i is followed by a fully connected layer with C_{i+1} output channels, the corresponding parameter has size (C_i×Kh_i×Kw_i, C_{i+1}). Construct a new parameter V'_{i+1} of size ((C_i − length(R_i))×Kh_i×Kw_i, C_{i+1}); copy into V'_{i+1} the elements of V_{i+1} whose channels do not belong to R_i, then replace V_{i+1} with V'_{i+1}.
Further, after model pruning is completed, the deep neural network needs to be retrained for several iterations to recover its accuracy. The number of iterations depends on the pruned feature layer and on the judgment criterion: feature layers pruned close to the input layer need fewer iterations, those close to the output layer need more, and in the judgment criterion, the higher the values of α and β, the more iterations are needed to restore the accuracy of the network.
It will be understood by those skilled in the art that all or part of the steps of the above implementation method can be carried out by program instructions, and the program can be stored in a computer-readable storage medium. The foregoing describes a specific embodiment of the present invention. It should be understood that the invention performs the compression operation on feature layers and fully connected layers; any modification, equivalent replacement or improvement made within the spirit and principle of the present invention shall fall within its scope of protection.
Claims (5)
1. A deep neural network model clipping method based on feature map statistical features, characterized in that, in order to improve the compression efficiency and acceleration performance of the network, the deep neural network is optimized through the following steps:
step 1, calculating statistical characteristics of characteristic graphs corresponding to all output channels of a characteristic layer in a deep neural network model; wherein the characteristic layer is composed of a convolution layer and an activation layer, or composed of a convolution layer, a normalization layer and an activation layer;
step 2, calculating the judgment indexes of each output channel in the characteristic layer according to the statistical characteristics of the characteristic graph corresponding to each output channel in the characteristic layer;
step 3, judging the importance of each output channel in the feature layer according to the judgment indexes, and removing the unimportant output channels and their corresponding parameters.
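To make the three claimed steps concrete, here is a compact numpy sketch (illustrative only; α, β, ε and the clipping threshold are hypothetical free parameters, and the scoring formula is a plausible reading of claims 2–4, not a quotation of the patent):

```python
import numpy as np

def channel_scores(feature_maps, alpha=1.0, beta=1.0, eps=1e-8):
    """Steps 1-2: per-channel statistics and judgment indexes.

    feature_maps: activations of one feature layer, shape (samples, C, H, W).
    """
    flat = feature_maps.reshape(feature_maps.shape[0], feature_maps.shape[1], -1)
    mean = flat.mean(axis=(0, 2))          # per-channel mean over all maps
    std = flat.std(axis=2).mean(axis=0)    # per-map std, averaged over samples
    return alpha * mean + beta * std + eps

def unimportant_channels(scores, threshold):
    """Step 3: channels scoring below the threshold are clipped."""
    return np.where(scores < threshold)[0]

# Toy layer with 3 channels: channel 0 is silent, so only it is clipped.
fmaps = np.zeros((2, 3, 4, 4))
fmaps[:, 1] = 5.0
fmaps[:, 2] = 1.0
pruned = unimportant_channels(channel_scores(fmaps), threshold=0.5)
print(pruned)  # [0]
```

A channel whose activations have both small mean and small spread contributes little information downstream, which is the intuition behind combining the two statistics.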
2. The feature map statistical feature-based deep neural network model clipping method according to claim 1, wherein in step 1, feature statistics are computed in batches over the feature maps corresponding to the output channels of the feature layers in the deep neural network model; for the ith feature layer, the statistical features of the feature maps corresponding to all of its output channels comprise a mean vector M_vi and a standard deviation vector S_vi, calculated as follows:
S11: initialization; set N_sum, the count of processed samples, initialized to 0; N_batch is the number of batches, N_batch = ceil(total number of samples / N), where N is the number of samples in a batch and ceil(·) is the upward rounding function; n_batch is the count of the current batch, initialized to 1; initialize the mean vector M_vi and the standard deviation vector S_vi as 1 × C_i zero vectors, where C_i is the number of output channels of the ith feature layer;
S12: express the output X_i of the ith feature layer for the n_batch-th batch of samples as a four-dimensional tensor of size N × C_i × H_i × W_i, where H_i and W_i are respectively the height and width of the feature map corresponding to an output channel of the ith feature layer; the feature map X_ikj of the jth output channel of the feature layer for the kth sample in the batch is a two-dimensional matrix of size H_i × W_i, k = 1, 2, …, N, j = 1, 2, …, C_i;
S13: stretch each feature map X_ikj in X_i from a two-dimensional matrix into a one-dimensional vector X*_ikj;
S14: calculate the statistical characteristics of X*_ikj, including the mean M_ikj and the standard deviation S_ikj:
M_ikj = (1 / (H_i × W_i)) · Σ_m X*_ikj(m)
S_ikj = sqrt( (1 / (H_i × W_i)) · Σ_m (X*_ikj(m) − M_ikj)² )
wherein X*_ikj(m) represents the mth element of X*_ikj;
S15: form from M_ikj, k = 1, 2, …, N, j = 1, 2, …, C_i a mean matrix M_mi of size N × C_i, and from S_ikj, k = 1, 2, …, N, j = 1, 2, …, C_i a standard deviation matrix S_mi of size N × C_i;
S16: merge the mean matrix M_mi and the standard deviation matrix S_mi into the running per-channel statistics, according to the formulas:
M_vi = ( N_sum · M_vi + Σ_{k=1..N} M_mi(k) ) / (N_sum + N)
S_vi = ( N_sum · S_vi + Σ_{k=1..N} S_mi(k) ) / (N_sum + N)
N_sum = N_sum + N (5)
wherein M_mi(k) is the row vector corresponding to the kth row of M_mi, and S_mi(k) is the row vector corresponding to the kth row of S_mi;
S17: judge whether the current batch is the last batch, i.e., whether n_batch = N_batch; if so, terminate the batch loop, and the current mean vector M_vi and standard deviation vector S_vi are the statistical features of the feature maps corresponding to all output channels of the feature layer; otherwise, update the count of the current batch, n_batch = n_batch + 1, and jump to S12 to continue execution.
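The batch loop S11–S17 amounts to maintaining running averages of the per-map means and standard deviations. A numpy sketch under the definitions above (the iterable of batches is assumed supplied by the caller; names are illustrative):

```python
import numpy as np

def layer_statistics(batches):
    """Per-channel running mean/std vectors for one feature layer.

    batches: iterable of arrays X_i of shape (N, C_i, H_i, W_i),
    one per batch, as in steps S12-S17.
    """
    m_v = s_v = None
    n_sum = 0                                 # S11: processed-sample count
    for x in batches:
        n, c = x.shape[0], x.shape[1]
        flat = x.reshape(n, c, -1)            # S13: stretch each map
        m_m = flat.mean(axis=2)               # S14-S15: N x C_i mean matrix
        s_m = flat.std(axis=2)                # S14-S15: N x C_i std matrix
        if m_v is None:
            m_v = np.zeros(c)                 # S11: zero-vector init
            s_v = np.zeros(c)
        # S16: merge this batch's rows into the running vectors
        m_v = (n_sum * m_v + m_m.sum(axis=0)) / (n_sum + n)
        s_v = (n_sum * s_v + s_m.sum(axis=0)) / (n_sum + n)
        n_sum += n                            # formula (5)
    return m_v, s_v

# Two batches of constant feature maps: running mean is 3, std is 0.
m_v, s_v = layer_statistics([np.full((2, 3, 4, 4), 2.0),
                             np.full((2, 3, 4, 4), 4.0)])
print(m_v, s_v)  # [3. 3. 3.] [0. 0. 0.]
```

Because only the running vectors and a sample count are kept, memory use is independent of the dataset size, which is the point of processing in batches.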
3. The method according to claim 2, wherein in step 2, the judgment index λ_ij of the jth output channel in the ith feature layer is calculated by the following formula:
λ_ij = α · M_vij + β · S_vij + ε
wherein M_vij and S_vij are the jth elements of the mean vector M_vi and the standard deviation vector S_vi corresponding to the feature layer, representing the mean and standard deviation of the feature map corresponding to the jth output channel in the feature layer; α and β are two scale factors, and ε is a minimum value.
4. The method for deep neural network model clipping based on feature map statistical features according to claim 3, wherein in step 3, for the ith feature layer L_i, if the judgment index of the jth output channel is below a preset clipping threshold, then the jth output channel in the feature layer is judged to be unimportant, and the channel and its corresponding parameters are removed.
5. The feature map statistical feature-based deep neural network model clipping method of claim 4, wherein for the ith feature layer L_i, the steps of removing the unimportant channels and their corresponding parameters are as follows:
S31: record the set R_i of channels whose judgment indexes satisfy the evaluation criterion of claim 4; the number of elements in the set R_i is denoted length(R_i);
S32: express the convolution kernel W_i of the feature layer L_i as a four-dimensional tensor of size C_{i−1} × C_i × K_hi × K_wi, and the offset B_i corresponding to the convolution kernel W_i as a vector of size 1 × C_i, where C_{i−1} is the number of output channels of the previous feature layer L_{i−1} (if L_i is the first feature layer, C_{i−1} is the number of channels of the sample input), and K_hi and K_wi are respectively the height and width of the convolution kernel W_i; remove from W_i the elements corresponding to the channels in the set R_i to form a new convolution kernel W*_i of size C_{i−1} × (C_i − length(R_i)) × K_hi × K_wi, and replace W_i with W*_i; remove from B_i the elements corresponding to the channels in the set R_i to form a new offset B*_i of size 1 × (C_i − length(R_i)), and replace B_i with B*_i;
S33: if the next layer L_{i+1} of the feature layer L_i is also a feature layer, express the convolution kernel W_{i+1} of L_{i+1} as a four-dimensional tensor of size C_i × C_{i+1} × K_h(i+1) × K_w(i+1), where C_{i+1} is the number of output channels of L_{i+1}, and K_h(i+1) and K_w(i+1) are respectively the height and width of the convolution kernel W_{i+1}; remove from W_{i+1} the elements corresponding to the channels in the set R_i to form a new convolution kernel W*_{i+1} of size (C_i − length(R_i)) × C_{i+1} × K_h(i+1) × K_w(i+1), and replace W_{i+1} with W*_{i+1};
S34: if the next layer L_{i+1} of the feature layer L_i is a fully connected layer, express the parameter V_{i+1} of L_{i+1} as a matrix of size (C_i × K_hi × K_wi) × C_{i+1}; remove from V_{i+1} the elements corresponding to the channels in the set R_i to form a new parameter V*_{i+1} of size ((C_i − length(R_i)) × K_hi × K_wi) × C_{i+1}, and replace V_{i+1} with V*_{i+1}.
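The slicing in S31–S33 can be sketched with numpy fancy indexing (illustrative; the claim's kernel layout with output channels on the second axis is assumed, and the fully connected case of S34 is handled separately above):

```python
import numpy as np

def prune_layer(W, B, R, W_next=None):
    """Remove the output channels listed in R from (W, B) and, if a
    following convolution kernel W_next is given, remove the matching
    input channels from it (steps S32-S33).

    W: (C_prev, C_i, Kh, Kw); B: (C_i,); W_next: (C_i, C_next, Kh2, Kw2).
    """
    keep = np.setdiff1d(np.arange(W.shape[1]), R)   # channels not in R_i
    W_star = W[:, keep]                             # S32: drop output channels
    B_star = B[keep]                                # S32: drop matching biases
    W_next_star = W_next[keep] if W_next is not None else None  # S33
    return W_star, B_star, W_next_star

# Example: a layer with 5 output channels, channels {1, 3} pruned.
W = np.random.randn(3, 5, 3, 3)
B = np.random.randn(5)
W2 = np.random.randn(5, 8, 3, 3)
Ws, Bs, W2s = prune_layer(W, B, R=[1, 3], W_next=W2)
print(Ws.shape, Bs.shape, W2s.shape)  # (3, 3, 3, 3) (3,) (3, 8, 3, 3)
```

Slicing the successor kernel along its input-channel axis in the same call keeps the two layers' shapes consistent, which is what makes the clipped network runnable without further surgery.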
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811440153.2A CN109472352A (en) | 2018-11-29 | 2018-11-29 | A kind of deep neural network model method of cutting out based on characteristic pattern statistical nature |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109472352A true CN109472352A (en) | 2019-03-15 |
Family
ID=65674220
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811440153.2A Pending CN109472352A (en) | 2018-11-29 | 2018-11-29 | A kind of deep neural network model method of cutting out based on characteristic pattern statistical nature |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109472352A (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109978069A (en) * | 2019-04-02 | 2019-07-05 | 南京大学 | The method for reducing ResNeXt model over-fitting in picture classification |
CN110309847A (en) * | 2019-04-26 | 2019-10-08 | 深圳前海微众银行股份有限公司 | A kind of model compression method and device |
CN110232436A (en) * | 2019-05-08 | 2019-09-13 | 华为技术有限公司 | Pruning method, device and the storage medium of convolutional neural networks |
CN110119811A (en) * | 2019-05-15 | 2019-08-13 | 电科瑞达(成都)科技有限公司 | A kind of convolution kernel method of cutting out based on entropy significance criteria model |
CN110119811B (en) * | 2019-05-15 | 2021-07-27 | 电科瑞达(成都)科技有限公司 | Convolution kernel cutting method based on entropy importance criterion model |
CN112036563A (en) * | 2019-06-03 | 2020-12-04 | 国际商业机器公司 | Deep learning model insights using provenance data |
CN117636057A (en) * | 2023-12-13 | 2024-03-01 | 石家庄铁道大学 | Train bearing damage classification and identification method based on multi-branch cross-space attention model |
CN117636057B (en) * | 2023-12-13 | 2024-06-11 | 石家庄铁道大学 | Train bearing damage classification and identification method based on multi-branch cross-space attention model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||