CN109472352A - Deep neural network model pruning method based on feature map statistical features - Google Patents

Deep neural network model pruning method based on feature map statistical features

Info

Publication number
CN109472352A
CN109472352A
Authority
CN
China
Prior art keywords
layer
feature
characteristic
batch
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811440153.2A
Other languages
Chinese (zh)
Inventor
周彦
刘广毅
王冬丽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiangtan University
Original Assignee
Xiangtan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiangtan University filed Critical Xiangtan University
Priority to CN201811440153.2A priority Critical patent/CN109472352A/en
Publication of CN109472352A publication Critical patent/CN109472352A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Mathematical Optimization (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Computational Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a deep neural network model pruning method based on feature map statistical features, implemented in the following steps. Step 1: for a feature layer in the deep neural network model, compute the statistical features of the feature map corresponding to each of its output channels; a feature layer consists of a convolutional layer and an activation layer, or of a convolutional layer, a normalization layer and an activation layer. Step 2: from the statistical features of the feature map corresponding to each output channel in the feature layer, compute a judgment index for each output channel. Step 3: judge the importance of each output channel in the feature layer according to the judgment index, and remove the unimportant output channels together with their corresponding parameters. The invention can effectively reduce the dimensionality of the neural network feature layers and improve the running efficiency of the network model, while reducing the network size with only a small impact on accuracy.

Description

Deep neural network model pruning method based on feature map statistical features
Technical Field
The invention belongs to the field of artificial intelligence and pattern recognition, and particularly relates to deep neural network model compression.
Background
Deep learning has achieved remarkable results on high-level abstract cognition problems, marking a new stage in artificial intelligence and providing a technical basis for high-precision, multi-class target detection, recognition and tracking. However, because of its heavy computation and large resource requirements, a deep neural network can usually only be deployed on high-performance computing platforms, which limits its application on mobile devices. In 2015, the Deep Compression work published by Han applied network pruning, weight sharing, quantization and coding to model compression, achieving good results for model storage and triggering broad research on network compression methods. Current research on deep learning model compression can be divided mainly into the following directions:
(1) More compact model design: using a more refined and efficient model design can greatly reduce the model size while maintaining good performance.
(2) Model pruning: a network with a complex structure performs very well but its parameters are redundant, so for a trained network an effective criterion is usually sought to judge the importance of the parameters, and unimportant connections or convolution kernels are pruned to reduce the redundancy of the model.
(3) Kernel sparsification: the weight updates during training are induced to make the kernels sparser; a sparse matrix can be stored more compactly, but sparse-matrix operations on hardware platforms are not efficient and are easily limited by memory bandwidth, so the speed-up is not obvious.
Pruning a pre-trained network model is currently the most widely used approach to model compression: an effective judgment index is sought to assess the importance of neurons or feature maps, and unimportant connections or convolution kernels are pruned to reduce the redundancy of the model. Li proposed a magnitude-based pruning criterion that judges the importance of weights by the sum of their absolute values, using the sum of the absolute values of all weights in a convolution kernel as the kernel's evaluation index. Hu defined the quantity APoZ (Average Percentage of Zeros), which measures the proportion of activations equal to 0 for each convolution kernel, as a criterion for whether the kernel is important. Luo proposed an entropy-based pruning criterion that uses entropy to determine the importance of a convolution kernel. Anwar adopted random pruning and then determines the locally optimal pruning configuration from the performance statistics of each random trial. Tian's LDA analysis found that, for each class, many convolution kernels have highly uncorrelated activations, which can be exploited to remove a large number of filters carrying only little information without affecting the performance of the model.
In summary, the limitations of the existing solutions are as follows:
a. kernel sparsification only addresses compressed storage of the network; the compression effect at run time is not obvious and the speed improvement is limited;
b. using weight magnitude as the judgment index considers only the numerical characteristics of the weights, not the data characteristics of the network layer, so the compression effect is limited;
c. some evaluation indices are computationally complex and consume considerable computing power;
d. random pruning is highly stochastic and easily damages the parameter characteristics of the network.
Therefore, it is necessary to provide a method for deep neural network compression and acceleration that is simple to compute, fully exploits the redundancy in the network, has wide applicability, and does not depend on a special acceleration library.
Disclosure of Invention
To overcome the shortcomings of the prior art, the invention discloses a deep neural network model pruning method based on feature map statistics. Compared with other compression methods, it uses several statistical features of the neural network feature layers as judgment criteria to prune the parameter layers of the network; by fully considering both the numerical and the statistical characteristics of the network, it obtains good compression efficiency while also improving running speed.
The technical scheme adopted by the invention is as follows:
a deep neural network model cutting method based on feature map statistical features comprises the following steps:
step 1, calculating statistical characteristics of characteristic graphs corresponding to all output channels of a characteristic layer in a deep neural network model; wherein the characteristic layer is composed of a convolution layer and an active layer, or composed of a convolution layer, a normalization layer (BatchNorm layer) and an active layer; the invention only calculates the characteristic layer of the layer (characteristic layer/full connection layer) with the storage parameter at the back;
step 2, calculating the judgment indexes of each output channel in the characteristic layer according to the statistical characteristics of the characteristic graph corresponding to each output channel in the characteristic layer;
and 3, judging the importance of each output channel in the characteristic layer according to the judgment indexes, and removing the unimportant output channels and the corresponding parameters thereof.
In one iteration (epoch) of the deep neural network, samples are fed to the network in separate batches for computation. In step 2, feature statistics are computed batch by batch for the feature maps corresponding to the output channels of the feature layers in the network model. For the ith feature layer, the statistical features of the feature maps of all its output channels comprise a mean vector X̄_vi and a standard deviation vector S_vi, computed as follows:
S11: Initialization. N_sum counts the number of processed samples and is initialized to 0; N_batch is the number of batches, N_batch = ceil(total number of samples / N), where N is the number of samples in a batch and ceil(·) is the ceiling function; n_batch is the count of the current batch, initialized to 1; the mean vector X̄_vi and the standard deviation vector S_vi are initialized as 1×C_i zero vectors, where C_i is the number of output channels of the ith feature layer;
S12: Express the output (feature maps) X_i of the ith feature layer for the n_batch-th batch of samples as a four-dimensional tensor of size N×C_i×H_i×W_i, where H_i and W_i are respectively the height and width of the feature map corresponding to one output channel of the ith feature layer; the feature map X_ikj of the jth output channel of the feature layer for the kth sample in the batch is a two-dimensional matrix of size H_i×W_i, k = 1, 2, …, N, j = 1, 2, …, C_i;
S13: Apply a dimension conversion (view or reshape) to X_i to obtain a three-dimensional tensor X*_i of size N×C_i×(H_i×W_i), i.e. stretch each feature map X_ikj in X_i from a two-dimensional matrix into a one-dimensional vector X*_ikj;
S14: Compute the statistical features of X*_ikj, namely the mean X̄_ikj and the standard deviation S_ikj (both scalars):
X̄_ikj = (1 / (H_i×W_i)) Σ_{m=1}^{H_i×W_i} X*_ikj(m)   (1)
S_ikj = sqrt( (1 / (H_i×W_i)) Σ_{m=1}^{H_i×W_i} (X*_ikj(m) - X̄_ikj)² )   (2)
where X*_ikj(m) denotes the m-th element of X*_ikj;
S15: Form from X̄_ikj, k = 1, 2, …, N, j = 1, 2, …, C_i, a mean matrix X̄_mi of size N×C_i, and from S_ikj, k = 1, 2, …, N, j = 1, 2, …, C_i, a standard deviation matrix S_mi of size N×C_i;
S16: Perform channel-by-channel averaging (mean filtering) on the mean matrix X̄_mi and the standard deviation matrix S_mi, with the processing formulas:
X̄_vi = (N_sum × X̄_vi + Σ_{k=1}^{N} X̄_mi(k)) / (N_sum + N)   (3)
S_vi = (N_sum × S_vi + Σ_{k=1}^{N} S_mi(k)) / (N_sum + N)   (4)
N_sum = N_sum + N   (5)
where X̄_mi(k) is the row vector of X̄_mi corresponding to its k-th row, and S_mi(k) is the row vector of S_mi corresponding to its k-th row;
S17: Judge whether the current batch is the last batch, i.e. whether n_batch = N_batch; if so, the batch loop terminates and the current mean vector X̄_vi and standard deviation vector S_vi are the statistical features of the feature maps corresponding to all output channels of the feature layer; otherwise, update the current batch count n_batch = n_batch + 1, jump to S12 and continue.
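As an illustration of S11 to S17, the following sketch computes the per-channel mean and standard deviation vectors over batches. It assumes a PyTorch-style tensor layout (N, C_i, H_i, W_i) for the feature-layer outputs; the helper name and the way the batch outputs are supplied are illustrative, and the update inside the loop follows the sample-count-weighted averaging written out in formulas (3) to (5).

```python
# A minimal sketch of S11-S17, assuming PyTorch tensors of shape (N, C_i, H_i, W_i);
# the helper name and input format are illustrative, not taken from the patent text.
import torch

@torch.no_grad()
def channel_statistics(batch_outputs):
    """batch_outputs: iterable of feature-layer outputs, one tensor per batch."""
    n_sum = 0                        # N_sum: number of samples processed so far
    mean_vec, std_vec = None, None   # running per-channel mean / std, shape (C_i,)
    for x in batch_outputs:                              # S12: one batch
        n, c = x.shape[0], x.shape[1]
        x_flat = x.reshape(n, c, -1)                     # S13: (N, C_i, H_i*W_i)
        mean_mat = x_flat.mean(dim=2)                    # S14-S15: (N, C_i)
        std_mat = x_flat.std(dim=2, unbiased=False)      # population std, as in (2)
        if mean_vec is None:                             # S11: zero initialisation
            mean_vec = torch.zeros(c)
            std_vec = torch.zeros(c)
        # S16: sample-count-weighted running average over batches, formulas (3)-(4)
        mean_vec = (n_sum * mean_vec + mean_mat.sum(dim=0)) / (n_sum + n)
        std_vec = (n_sum * std_vec + std_mat.sum(dim=0)) / (n_sum + n)
        n_sum += n                                       # formula (5)
    return mean_vec, std_vec                             # S17: final statistics
```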
Further, in step 2, the judgment index of the jth output channel in the ith feature layer is computed from X̄_vij and S_vij, the j-th elements of the mean vector X̄_vi and the standard deviation vector S_vi of the feature layer, which represent the mean and standard deviation of the feature map corresponding to the jth output channel of the feature layer, together with two scale factors (hyperparameters) α and β. α is a threshold on the mean X̄_vij: when X̄_vij < α, the index decreases towards minus infinity as X̄_vij decreases; conversely, when X̄_vij ≥ α, the index increases towards zero as X̄_vij increases. β is a threshold on the standard deviation S_vij: when S_vij < β, the index decreases towards zero as S_vij decreases; otherwise, the index increases towards positive infinity as S_vij increases. When the mean X̄_vij and the standard deviation S_vij are small, the α sub-term plays the dominant role; when they are large, the β sub-term plays the leading role. The values of α and β are determined in one of two ways. One is to set the range of hyperparameter values from low to high according to empirical values, compute the judgment index for each setting, prune the neural network model accordingly, and retrain the pruned model to recover its accuracy, gradually reaching an optimal result, i.e. the largest number of pruned channels such that the drop in network accuracy does not exceed a set threshold. The other is proportional scaling, i.e. α = μ Σ_j X̄_vij / C_i and β = η Σ_j S_vij / C_i, where μ and η are scaling factors with value range (0, 0.4) that can be adjusted dynamically according to the network parameters.
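The exact index formula is not reproduced in this text, so the sketch below should be read only as one possible form consistent with the qualitative behaviour described above (an α term tending to minus infinity for small means, a β term growing with the standard deviation, and a small ε guarding the division); it is an assumption, not the patent's formula. The second helper implements the stated proportional scaling β = η Σ_j S_vij / C_i, with the analogous expression assumed for α.

```python
# Illustrative only: a per-channel score with the behaviour described above.
# The actual index formula of the patent is not reproduced here; this is an assumption.
def channel_score(mean_j, std_j, alpha, beta, eps=1e-8):
    alpha_term = -alpha / (mean_j + eps)  # dominates (large negative) for small means
    beta_term = std_j / beta              # dominates (large positive) for large stds
    return alpha_term + beta_term

def scaled_hyperparameters(mean_vec, std_vec, mu, eta):
    """Proportional scaling with mu, eta in (0, 0.4): beta scales the average
    channel std as stated above; the alpha expression mirrors it and is an
    assumption made here for illustration."""
    c = len(mean_vec)
    alpha = mu * sum(mean_vec) / c
    beta = eta * sum(std_vec) / c
    return alpha, beta
```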
Further, in step 3, for the ith feature layer L_i, if the judgment index of the jth output channel does not meet the criterion, the jth output channel of the feature layer is determined to be unimportant, and the channel and its corresponding parameters are removed.
Further, for the ith feature layer L_i, the steps of removing the unimportant channels and their corresponding parameters are as follows:
S31: Record the set R_i of channels whose judgment index does not meet the criterion; the number of elements in R_i is denoted length(R_i);
S32: Express the convolution kernel W_i of feature layer L_i as a four-dimensional tensor of size C_{i-1}×C_i×Kh_i×Kw_i, and the bias B_i corresponding to W_i as a vector of size 1×C_i, where C_{i-1} is the number of output channels of the previous feature layer L_{i-1} (or, if L_i is the first feature layer, the number of channels of the sample input), and Kh_i and Kw_i are respectively the height and width of the convolution kernel. Remove from W_i the elements corresponding to the channels in the set R_i to form a new convolution kernel W*_i of size C_{i-1}×(C_i - length(R_i))×Kh_i×Kw_i, and replace W_i by W*_i. Remove from the bias B_i the elements corresponding to the channels in R_i to form a new bias B*_i of size 1×(C_i - length(R_i)), and replace B_i by B*_i;
S33: If the next layer L_{i+1} of feature layer L_i is also a feature layer, express the convolution kernel W_{i+1} of the next layer as a four-dimensional tensor of size C_i×C_{i+1}×Kh_{i+1}×Kw_{i+1}, where C_{i+1} is the number of output channels of L_{i+1}, and Kh_{i+1} and Kw_{i+1} are respectively the height and width of W_{i+1}. Remove from W_{i+1} the elements corresponding to the channels in R_i to form a new convolution kernel W*_{i+1} of size (C_i - length(R_i))×C_{i+1}×Kh_{i+1}×Kw_{i+1}, and replace W_{i+1} by W*_{i+1};
S34: If the next layer L_{i+1} of feature layer L_i is a fully connected layer, express its parameter V_{i+1} as a matrix of size (C_i×Kh_i×Kw_i)×C_{i+1}. Remove from V_{i+1} the elements corresponding to the channels in R_i to form a new parameter V*_{i+1} of size ((C_i - length(R_i))×Kh_i×Kw_i)×C_{i+1}, and replace V_{i+1} by V*_{i+1}.
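A NumPy sketch of S31 to S34 is given below, following exactly the tensor layouts stated above: W_i of size (C_{i-1}, C_i, Kh_i, Kw_i), bias B_i with one element per output channel, the next convolution kernel of size (C_i, C_{i+1}, Kh_{i+1}, Kw_{i+1}), and the fully connected parameter of size (C_i·Kh_i·Kw_i, C_{i+1}). The assumption that the flattened fully connected rows are grouped channel by channel is made here for illustration and is not stated in the text.

```python
# A minimal NumPy sketch of S31-S34, following the tensor sizes stated above.
import numpy as np

def prune_conv(W_i, b_i, removed):
    """S31-S32: drop the output channels listed in the set R_i (`removed`).
    W_i: (C_{i-1}, C_i, Kh_i, Kw_i), b_i: (C_i,)."""
    keep = [j for j in range(W_i.shape[1]) if j not in removed]
    return W_i[:, keep, :, :], b_i[keep]

def prune_next_conv(W_next, removed):
    """S33: the next feature layer's kernel (C_i, C_{i+1}, Kh, Kw) loses the
    matching input channels."""
    keep = [j for j in range(W_next.shape[0]) if j not in removed]
    return W_next[keep, :, :, :]

def prune_next_fc(V_next, removed, kh, kw):
    """S34: V_next has shape (C_i*Kh_i*Kw_i, C_{i+1}); assuming its rows are
    grouped channel by channel, each pruned channel removes Kh_i*Kw_i rows."""
    keep_rows = [r for r in range(V_next.shape[0]) if r // (kh * kw) not in removed]
    return V_next[keep_rows, :]

# Example with illustrative shapes: prune channels {1, 3} from a 3-to-8 convolution.
W = np.random.randn(3, 8, 3, 3); b = np.random.randn(8)
W_new, b_new = prune_conv(W, b, removed={1, 3})   # -> shapes (3, 6, 3, 3) and (6,)
```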
After model pruning is completed, the deep neural network needs to be retrained for several iterations to recover its accuracy; the number of iterations depends on which feature layers were pruned and on the judgment criterion.
Beneficial effects:
Compared with the prior art, the method fully exploits the statistical characteristics of the deep neural network and constructs an evaluation index based on the mean and the standard deviation. It can effectively reduce the dimensionality of the neural network feature layers, speed up training, reduce the framework scale and the number of weights of the deep neural network, and improve its running speed and efficiency, with only a small impact on accuracy. The method has the following characteristics and effects:
First, when constructing the judgment index for convolution kernel pruning, the statistical characteristics of the neural network are taken into account: by using the mean and standard deviation of the feature layer, both the numerical characteristics of the network and the characteristics within the feature layer are considered. The data characteristics of the feature layer reflect the effect of the convolution kernel parameters, so feature maps that perform poorly can be pruned together with their corresponding convolution kernels, reducing the network model framework and compressing the number of parameters.
Second, in the criterion formula, the hyperparameters α and β can be set flexibly to change the number of removed channels: when the statistical features are close to 0, the α sub-term plays the dominant role, and when they are far from 0, the β sub-term does.
Third, the invention provides a new judgment criterion on the basis of fully considering the statistical characteristics of the neural network; it has low algorithmic complexity and good performance, and can be deployed in real-time networks and on embedded devices.
Drawings
FIG. 1 is a flow diagram of the present invention;
FIG. 2 is a schematic structural diagram of a feature layer;
FIG. 3 is a schematic diagram of the internal structure of a feature layer;
FIG. 4 shows examples of how feature layers appear in a neural network; FIG. 4(a) shows several consecutive feature layers, and FIG. 4(b) a single feature layer;
FIG. 5 is a general block diagram of the design of the present invention;
FIG. 6 is a schematic diagram of model selection and clipping according to the present invention;
Detailed Description
The present invention is described in detail below with reference to specific examples, which will help those skilled in the art to further understand the present invention. The examples described below with reference to the drawings are illustrative only and are not to be construed as limiting the invention.
Fig. 1 is a schematic flow chart of the deep neural network model pruning method based on feature map statistics in this example, which reduces the network model framework and compresses the parameters by removing specific feature maps and their corresponding convolution kernels. The specific implementation steps of the pruning method are as follows:
(1) for each feature layer in the deep neural network, sequentially calculate the statistical features of each feature map in the layer;
(2) construct a judgment criterion from the statistical features;
(3) remove the feature maps that do not meet the judgment criterion together with their corresponding convolution kernels.
It should be noted that the object operated on in this embodiment is a feature layer of a deep neural network that has already been trained to convergence. A feature layer is formed either from two parts, a convolutional layer and an activation layer (also describable as an activation function or nonlinear layer), or from three parts, a convolutional layer, a BatchNorm layer and an activation layer, as shown in FIG. 2. The layer types and modules of the network include, but are not limited to, convolutional layers, batch normalization layers, activation layers, fully connected layers and Resnet modules. In the deep neural network framework, the internal structure of a feature layer is shown in Fig. 3: the ith feature layer is L_i, the convolution kernel of the ith feature layer is W_i, and the bias corresponding to W_i is B_i. The invention only processes feature layers that are followed by a layer with stored parameters (another feature layer or a fully connected layer), such as all feature layers except the last one in Fig. 4(a); it does not process feature layers that are not followed by a layer with stored parameters, such as the feature layer in Fig. 4(b). That is, if a feature layer is followed only by a pooling layer, normalization layer, activation layer or softmax layer, no operation is performed on that feature layer.
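As an illustration of this selection rule, the sketch below scans a network for convolutional layers and keeps only those that are still followed, somewhere later, by another parameterised layer (a convolution or a fully connected layer). It assumes the model is a flat torch.nn.Sequential, which is a simplification; networks with branches such as Resnet modules would need a more careful traversal.

```python
# A minimal sketch, assuming the network is a flat torch.nn.Sequential.
import torch.nn as nn

def prunable_feature_layers(model: nn.Sequential):
    """Return indices of Conv2d layers that are followed, later in the network,
    by another layer with stored parameters (Conv2d or Linear)."""
    layers = list(model)
    prunable = []
    for idx, module in enumerate(layers):
        if not isinstance(module, nn.Conv2d):
            continue
        followed = any(isinstance(m, (nn.Conv2d, nn.Linear)) for m in layers[idx + 1:])
        if followed:               # e.g. all but the last feature layer in Fig. 4(a)
            prunable.append(idx)
    return prunable                # layers followed only by pooling/softmax are skipped
```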
In one iteration (epoch) of the deep neural network, samples are fed to the network in separate batches for computation. The feature statistics of the feature maps in the network model are computed batch by batch; taking the ith feature layer L_i as an example, the implementation steps are as follows:
S31: Initialize the intermediate variables. N_sum counts the number of processed samples and is initialized to 0; N_batch is the number of batches, N_batch = ceil(total number of samples / N), where N is the number of samples in a batch and ceil(·) is the ceiling function; n_batch is the count of the current batch, initialized to 0; the means X̄_ikj and standard deviations S_ikj are scalars, k = 1, 2, …, N, j = 1, 2, …, C_i, initialized to 0; the mean matrix X̄_mi and the standard deviation matrix S_mi are initialized to zero matrices of size (N, C_i), and the mean vector X̄_vi and the standard deviation vector S_vi are initialized to zero vectors of size (1, C_i), where C_i is the number of output channels of feature layer L_i.
S32: Take the output of the ith feature layer L_i for the n_batch-th batch of samples and apply a view or reshape (dimension conversion), changing its size from (N, C_i, H_i, W_i) to (N, C_i, H_i*W_i); this is equivalent to stretching each two-dimensional feature map X_ikj into its one-dimensional representation X*_ikj, where k ∈ [1, N] is the sample index in feature layer L_i and j ∈ [1, C_i] is the channel index of the feature map in L_i. It should be emphasized that the feature map X_ikj denotes the set of elements of size (H_i, W_i) of the jth channel of L_i for the kth sample, and the feature map X*_ikj the corresponding set of elements of size (H_i*W_i).
S33: Compute the statistical features of the feature map X*_ikj of the jth channel of the feature layer: the mean X̄_ikj and the standard deviation S_ikj.
For any feature map X*_ikj of the feature layer, the mean X̄_ikj and the standard deviation S_ikj are adopted as its statistical features. Over the whole feature layer, this yields a mean matrix X̄_mi and a standard deviation matrix S_mi, each of size (N, C_i).
S34: Perform channel-wise averaging on the mean matrix X̄_mi and the standard deviation matrix S_mi, implemented as follows:
X̄_vi = (N_sum × X̄_vi + Σ_{k=1}^{N} X̄_mi(k)) / (N_sum + N)   (3)
S_vi = (N_sum × S_vi + Σ_{k=1}^{N} S_mi(k)) / (N_sum + N)   (4)
N_sum = N_sum + N   (5)
where N_sum is the accumulated number of samples over the first n_batch batches, used to count the processed samples; N is the number of samples in the n_batch-th batch; X̄_vi is the result of mean filtering of the mean matrix X̄_mi, with X̄_mi(k) the row vector of channel means corresponding to the kth sample; and S_vi is the result of mean filtering of the standard deviation matrix S_mi, with S_mi(k) the row vector of channel standard deviations corresponding to the kth sample.
S35: Update the current batch: n_batch = n_batch + 1; if n_batch = N_batch, the batch loop terminates; otherwise the per-batch intermediate quantities, namely the means X̄_ikj and standard deviations S_ikj together with the matrices X̄_mi and S_mi, are reset to zero, and execution returns to S32. The batch iteration yields the mean vector X̄_vi and the standard deviation vector S_vi. The judgment index of the jth channel of feature layer L_i is then computed from X̄_vij and S_vij, the mean and standard deviation of the jth channel of L_i, together with two hyperparameters α and β; these two scale factors also serve as lower bounds on the mean and the standard deviation, preventing the occurrence of a zero divisor. For smaller hyperparameter values, proportional scaling is used, i.e. α = μ Σ_j X̄_vij / C_i and β = η Σ_j S_vij / C_i, where μ and η are scale factors with value range (0, 0.4); when larger hyperparameter values are used, a successive approximation method is adopted, gradually increasing the range of hyperparameter values from low to high to progressively reach an optimal result.
Feature maps that do not meet the judgment criterion, and their associated parameters, are removed as follows. For a feature layer L_i with C_i output channels, the evaluation index of each channel is calculated, the set R_i of channels whose evaluation index does not meet the criterion is recorded, and the corresponding channels are removed in the following steps:
S71: The convolution kernel W_i of feature layer L_i has size (C_{i-1}, C_i, Kh_i, Kw_i), where C_{i-1} is the number of output channels of the previous feature layer L_{i-1} (or the number of channels of the sample input, if L_i is the first feature layer), C_i is the number of output channels of the current feature layer L_i, and Kh_i, Kw_i give the size of the convolution kernel. Construct a new convolution kernel W*_i of size (C_{i-1}, C_i - length(R_i), Kh_i, Kw_i), where C_i - length(R_i) is the number of channels remaining after subtracting from C_i the number of elements contained in the set R_i. Along the output-channel dimension of W_i, copy the slices of the channels not belonging to R_i into the new convolution kernel W*_i, then replace W_i by W*_i. Construct a new bias B*_i of size 1×(C_i - length(R_i)); copy from the bias B_i the elements of the channels not belonging to R_i into the new bias B*_i, then replace B_i by B*_i.
S72: If feature layer L_i is followed by another feature layer, the convolution kernel W_{i+1} of the next feature layer L_{i+1} has size (C_i, C_{i+1}, Kh_{i+1}, Kw_{i+1}), where C_{i+1} is the number of output channels of L_{i+1}. Construct a new convolution kernel W*_{i+1} of size (C_i - length(R_i), C_{i+1}, Kh_{i+1}, Kw_{i+1}); along the input-channel dimension of W_{i+1}, copy the slices of the channels not belonging to R_i into W*_{i+1}, then replace W_{i+1} by W*_{i+1}.
S73: If feature layer L_i is followed by a fully connected layer with C_{i+1} output channels, the corresponding parameter V_{i+1} has size (C_i×Kh_i×Kw_i, C_{i+1}). Construct a new parameter V*_{i+1} of size ((C_i - length(R_i))×Kh_i×Kw_i, C_{i+1}); from the rows of V_{i+1}, copy the elements corresponding to the channels not belonging to R_i into V*_{i+1}, then replace V_{i+1} by V*_{i+1}.
Further, after model pruning is completed, the deep neural network needs to be retrained for several iterations to restore its accuracy. The number of iterations depends on the pruned feature layer and on the evaluation criterion: pruning a feature layer close to the input layer requires fewer iterations, pruning one close to the output layer requires more, and within the evaluation criterion, the higher the values of α and β, the more iterations are needed to restore the accuracy of the network.
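A minimal retraining sketch is shown below; the optimizer, learning rate and epoch count are illustrative choices, not prescribed by the text, which only states that more iterations are needed for layers near the output and for larger α and β.

```python
# A minimal retraining sketch, assuming a PyTorch classification model and DataLoader;
# the optimizer, learning rate and epoch count are illustrative assumptions.
import torch
import torch.nn as nn

def finetune(pruned_model, train_loader, epochs=5, lr=1e-3, device="cpu"):
    pruned_model.to(device).train()
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(pruned_model.parameters(), lr=lr, momentum=0.9)
    for _ in range(epochs):                      # a few recovery iterations
        for inputs, targets in train_loader:
            inputs, targets = inputs.to(device), targets.to(device)
            optimizer.zero_grad()
            loss = criterion(pruned_model(inputs), targets)
            loss.backward()
            optimizer.step()
    return pruned_model
```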
It will be understood by those skilled in the art that all or part of the steps implementing the above method can be realized by program instructions, and the program can be stored in a computer-readable storage medium. The foregoing describes a specific embodiment of the present invention, which performs the compression operation on feature layers and fully connected layers; any modification, equivalent replacement or improvement made within the spirit and principles of the present invention should be included in the scope of protection of the present invention.

Claims (5)

1. A deep neural network model pruning method based on feature map statistical features, characterized in that, in order to improve the compression efficiency and acceleration performance of a network, the following steps are implemented for optimizing a deep neural network:
step 1, calculating the statistical features of the feature maps corresponding to each output channel of a feature layer in the deep neural network model, wherein the feature layer is composed of a convolutional layer and an activation layer, or of a convolutional layer, a normalization layer and an activation layer;
step 2, calculating a judgment index for each output channel in the feature layer from the statistical features of the feature map corresponding to that output channel;
step 3, judging the importance of each output channel in the feature layer according to the judgment indices, and removing the unimportant output channels together with their corresponding parameters.
2. The deep neural network model pruning method based on feature map statistical features according to claim 1, wherein in step 2, feature statistics are computed batch by batch for the feature maps corresponding to the output channels of the feature layers in the deep neural network model; for the ith feature layer, the statistical features of the feature maps corresponding to all its output channels comprise a mean vector X̄_vi and a standard deviation vector S_vi, computed as follows:
S11: initialization; N_sum counts the number of processed samples and is initialized to 0; N_batch is the number of batches, N_batch = ceil(total number of samples / N), where N is the number of samples in a batch and ceil(·) is the ceiling function; n_batch is the count of the current batch, initialized to 1; the mean vector X̄_vi and the standard deviation vector S_vi are initialized as 1×C_i zero vectors, where C_i is the number of output channels of the ith feature layer;
S12: expressing the output X_i of the ith feature layer for the n_batch-th batch of samples as a four-dimensional tensor of size N×C_i×H_i×W_i, where H_i and W_i are respectively the height and width of the feature map corresponding to one output channel of the ith feature layer; the feature map X_ikj of the jth output channel of the feature layer for the kth sample in the batch is a two-dimensional matrix of size H_i×W_i, k = 1, 2, …, N, j = 1, 2, …, C_i;
S13: stretching each feature map X_ikj in X_i from a two-dimensional matrix into a one-dimensional vector X*_ikj;
S14: computing the statistical features of X*_ikj, namely the mean X̄_ikj and the standard deviation S_ikj:
X̄_ikj = (1 / (H_i×W_i)) Σ_{m=1}^{H_i×W_i} X*_ikj(m)   (1)
S_ikj = sqrt( (1 / (H_i×W_i)) Σ_{m=1}^{H_i×W_i} (X*_ikj(m) - X̄_ikj)² )   (2)
where X*_ikj(m) denotes the m-th element of X*_ikj;
S15: forming from X̄_ikj, k = 1, 2, …, N, j = 1, 2, …, C_i, a mean matrix X̄_mi of size N×C_i, and from S_ikj, k = 1, 2, …, N, j = 1, 2, …, C_i, a standard deviation matrix S_mi of size N×C_i;
S16: performing channel-wise averaging on the mean matrix X̄_mi and the standard deviation matrix S_mi, according to the formulas:
X̄_vi = (N_sum × X̄_vi + Σ_{k=1}^{N} X̄_mi(k)) / (N_sum + N)   (3)
S_vi = (N_sum × S_vi + Σ_{k=1}^{N} S_mi(k)) / (N_sum + N)   (4)
N_sum = N_sum + N   (5)
where X̄_mi(k) is the row vector of X̄_mi corresponding to its k-th row, and S_mi(k) is the row vector of S_mi corresponding to its k-th row;
S17: judging whether the current batch is the last batch, i.e. whether n_batch = N_batch; if so, the batch loop terminates and the current mean vector X̄_vi and standard deviation vector S_vi are the statistical features of the feature maps corresponding to all output channels of the feature layer; otherwise, updating the current batch count n_batch = n_batch + 1, jumping to S12 and continuing.
3. The deep neural network model pruning method based on feature map statistical features according to claim 2, wherein in step 2 the judgment index of the jth output channel in the ith feature layer is computed from X̄_vij and S_vij, the j-th elements of the mean vector X̄_vi and the standard deviation vector S_vi of the feature layer, which represent the mean and standard deviation of the feature map corresponding to the jth output channel of the feature layer, together with two scale factors α and β, ε being a minimum value.
4. The deep neural network model pruning method based on feature map statistical features according to claim 3, wherein in step 3, for the ith feature layer L_i, if the judgment index of the jth output channel does not meet the criterion, the jth output channel of the feature layer is determined to be unimportant, and the channel and its corresponding parameters are removed.
5. The deep neural network model pruning method based on feature map statistical features according to claim 4, wherein, for the ith feature layer L_i, the steps of removing the unimportant channels and their corresponding parameters are as follows:
S31: recording the set R_i of channels whose judgment index does not meet the criterion, the number of elements in R_i being denoted length(R_i);
S32: expressing the convolution kernel W_i of feature layer L_i as a four-dimensional tensor of size C_{i-1}×C_i×Kh_i×Kw_i, and the bias B_i corresponding to W_i as a vector of size 1×C_i, where C_{i-1} is the number of output channels of the previous feature layer L_{i-1} (or, if L_i is the first feature layer, the number of channels of the sample input), and Kh_i and Kw_i are respectively the height and width of the convolution kernel W_i; removing from W_i the elements corresponding to the channels in the set R_i to form a new convolution kernel W*_i of size C_{i-1}×(C_i - length(R_i))×Kh_i×Kw_i, and replacing W_i by the new convolution kernel W*_i; removing from the bias B_i the elements corresponding to the channels in R_i to form a new bias B*_i of size 1×(C_i - length(R_i)), and replacing B_i by B*_i;
S33: if the next layer L_{i+1} of feature layer L_i is also a feature layer, expressing the convolution kernel W_{i+1} of the next layer L_{i+1} as a four-dimensional tensor of size C_i×C_{i+1}×Kh_{i+1}×Kw_{i+1}, where C_{i+1} is the number of output channels of the next layer L_{i+1}, and Kh_{i+1} and Kw_{i+1} are respectively the height and width of the convolution kernel W_{i+1}; removing from W_{i+1} the elements corresponding to the channels in the set R_i to form a new convolution kernel W*_{i+1} of size (C_i - length(R_i))×C_{i+1}×Kh_{i+1}×Kw_{i+1}, and replacing W_{i+1} by W*_{i+1};
S34: if the next layer L_{i+1} of feature layer L_i is a fully connected layer, expressing the parameter V_{i+1} of the next layer L_{i+1} as a matrix of size (C_i×Kh_i×Kw_i)×C_{i+1}; removing from V_{i+1} the elements corresponding to the channels in the set R_i to form a new parameter V*_{i+1} of size ((C_i - length(R_i))×Kh_i×Kw_i)×C_{i+1}, and replacing V_{i+1} by V*_{i+1}.
CN201811440153.2A 2018-11-29 2018-11-29 A kind of deep neural network model method of cutting out based on characteristic pattern statistical nature Pending CN109472352A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811440153.2A CN109472352A (en) 2018-11-29 2018-11-29 A kind of deep neural network model method of cutting out based on characteristic pattern statistical nature

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811440153.2A CN109472352A (en) 2018-11-29 2018-11-29 A kind of deep neural network model method of cutting out based on characteristic pattern statistical nature

Publications (1)

Publication Number Publication Date
CN109472352A true CN109472352A (en) 2019-03-15

Family

ID=65674220

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811440153.2A Pending CN109472352A (en) 2018-11-29 2018-11-29 A kind of deep neural network model method of cutting out based on characteristic pattern statistical nature

Country Status (1)

Country Link
CN (1) CN109472352A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109978069A (en) * 2019-04-02 2019-07-05 南京大学 The method for reducing ResNeXt model over-fitting in picture classification
CN110119811A (en) * 2019-05-15 2019-08-13 电科瑞达(成都)科技有限公司 A kind of convolution kernel method of cutting out based on entropy significance criteria model
CN110232436A (en) * 2019-05-08 2019-09-13 华为技术有限公司 Pruning method, device and the storage medium of convolutional neural networks
CN110309847A (en) * 2019-04-26 2019-10-08 深圳前海微众银行股份有限公司 A kind of model compression method and device
CN112036563A (en) * 2019-06-03 2020-12-04 国际商业机器公司 Deep learning model insights using provenance data
CN117636057A (en) * 2023-12-13 2024-03-01 石家庄铁道大学 Train bearing damage classification and identification method based on multi-branch cross-space attention model

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109978069A (en) * 2019-04-02 2019-07-05 南京大学 The method for reducing ResNeXt model over-fitting in picture classification
CN110309847A (en) * 2019-04-26 2019-10-08 深圳前海微众银行股份有限公司 A kind of model compression method and device
CN110232436A (en) * 2019-05-08 2019-09-13 华为技术有限公司 Pruning method, device and the storage medium of convolutional neural networks
CN110119811A (en) * 2019-05-15 2019-08-13 电科瑞达(成都)科技有限公司 A kind of convolution kernel method of cutting out based on entropy significance criteria model
CN110119811B (en) * 2019-05-15 2021-07-27 电科瑞达(成都)科技有限公司 Convolution kernel cutting method based on entropy importance criterion model
CN112036563A (en) * 2019-06-03 2020-12-04 国际商业机器公司 Deep learning model insights using provenance data
CN117636057A (en) * 2023-12-13 2024-03-01 石家庄铁道大学 Train bearing damage classification and identification method based on multi-branch cross-space attention model
CN117636057B (en) * 2023-12-13 2024-06-11 石家庄铁道大学 Train bearing damage classification and identification method based on multi-branch cross-space attention model

Similar Documents

Publication Publication Date Title
CN109472352A (en) A kind of deep neural network model method of cutting out based on characteristic pattern statistical nature
Dai et al. Compressing neural networks using the variational information bottleneck
CN114937151B (en) Lightweight target detection method based on multiple receptive fields and attention feature pyramid
CN108510012B (en) Target rapid detection method based on multi-scale feature map
CN113449864B (en) Feedback type impulse neural network model training method for image data classification
CN103927531B (en) It is a kind of based on local binary and the face identification method of particle group optimizing BP neural network
CN110334580A (en) The equipment fault classification method of changeable weight combination based on integrated increment
Chang et al. Automatic channel pruning via clustering and swarm intelligence optimization for CNN
Paupamah et al. Quantisation and pruning for neural network compression and regularisation
CN111988329B (en) Network intrusion detection method based on deep learning
CN111898689A (en) Image classification method based on neural network architecture search
CN110442143B (en) Unmanned aerial vehicle situation data clustering method based on combined multi-target pigeon swarm optimization
CN111047078B (en) Traffic characteristic prediction method, system and storage medium
Ullah et al. About pyramid structure in convolutional neural networks
CN104050505A (en) Multilayer-perceptron training method based on bee colony algorithm with learning factor
CN107563430A (en) A kind of convolutional neural networks algorithm optimization method based on sparse autocoder and gray scale correlation fractal dimension
CN112884149A (en) Deep neural network pruning method and system based on random sensitivity ST-SM
CN110298434A (en) A kind of integrated deepness belief network based on fuzzy division and FUZZY WEIGHTED
Ma et al. A survey of sparse-learning methods for deep neural networks
CN101833670B (en) Image matching method based on lateral inhibition and chaos quantum particle swarm optimization
CN117726939A (en) Hyperspectral image classification method based on multi-feature fusion
CN117154256A (en) Electrochemical repair method for lithium battery
CN117590173A (en) Cable partial discharge pattern recognition method based on convolutional neural network
Chin et al. A high-performance adaptive quantization approach for edge CNN applications
Hollósi et al. Improve the accuracy of neural networks using capsule layers

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination