CN109472352A - A deep neural network model tailoring method based on feature map statistical features - Google Patents

A deep neural network model tailoring method based on feature map statistical features Download PDF

Info

Publication number
CN109472352A
CN109472352A CN201811440153.2A CN201811440153A CN109472352A CN 109472352 A CN109472352 A CN 109472352A CN 201811440153 A CN201811440153 A CN 201811440153A CN 109472352 A CN109472352 A CN 109472352A
Authority
CN
China
Prior art keywords
feature
layer
batch
feature layer
convolution kernel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811440153.2A
Other languages
Chinese (zh)
Inventor
周彦
刘广毅
王冬丽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiangtan University
Original Assignee
Xiangtan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiangtan University filed Critical Xiangtan University
Priority to CN201811440153.2A priority Critical patent/CN109472352A/en
Publication of CN109472352A publication Critical patent/CN109472352A/en
Pending legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/16Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/082Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Mathematical Optimization (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Computational Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Image Analysis (AREA)

Abstract

本发明公开了一种基于特征图统计特征的深度神经网络模型裁剪方法,实现步骤为:步骤1、针对深度神经网络模型中的特征层,计算其各个输出通道对应的特征图的统计特征;其中特征层由卷积层和激活层两部分组成,或是由卷积层、归一化层和激活层三部分组成;步骤2、根据特征层中各输出通道对应的特征图的统计特征,计算特征层中各输出通道的评判指标;步骤3、根据评判指标判断特征层中各输出通道的重要性,将不重要的输出通道及其对应的参数进行移除。本发明能够有效降低神经网络特征层的维度,提高了网络模型的运行效率,同时能够减少网络规模,并且对精度产生的影响较小。

The invention discloses a deep neural network model cutting method based on statistical features of feature maps. The implementation steps are as follows: Step 1. For the feature layers in the deep neural network model, calculate the statistical features of the feature maps corresponding to each output channel; wherein The feature layer consists of two parts: the convolution layer and the activation layer, or consists of three parts: the convolution layer, the normalization layer and the activation layer; step 2, according to the statistical features of the feature maps corresponding to each output channel in the feature layer, calculate The evaluation index of each output channel in the feature layer; Step 3, according to the evaluation index, determine the importance of each output channel in the feature layer, and remove the unimportant output channels and their corresponding parameters. The invention can effectively reduce the dimension of the feature layer of the neural network, improve the operation efficiency of the network model, and at the same time, the network scale can be reduced, and the impact on the accuracy is small.

Description

一种基于特征图统计特征的深度神经网络模型裁剪方法A deep neural network model tailoring method based on feature map statistical features

技术领域technical field

本发明属于人工智能、模式识别领域,具体涉及深度神经网络模型压缩。The invention belongs to the fields of artificial intelligence and pattern recognition, and in particular relates to deep neural network model compression.

背景技术Background technique

深度学习(deep learning)作在解决高级抽象认知问题上有着显著的成果,使人工智能上了一个新台阶,为高精度、多种类的目标检测、识别与跟踪提供了技术基础。但是复杂的运算,庞大的资源需求,使得神经网络只能在高性能计算平台上部署,限制了在移动设备上的应用。2015年,Han发表的Deep Compression将网络模型裁剪、权值共享和量化、编码等方式运用在模型压缩上,使得模型存储实现了很好的效果,也引起了研究人员对网络压缩方法的研究。目前深度学习模型压缩方法的研究主要可以分为以下几个方向:Deep learning has achieved remarkable results in solving high-level abstract cognitive problems, bringing artificial intelligence to a new level and providing a technical foundation for high-precision, multi-type target detection, recognition and tracking. However, complex computing and huge resource requirements make neural networks only deployed on high-performance computing platforms, limiting their application on mobile devices. In 2015, Deep Compression published by Han applied network model clipping, weight sharing, quantization, and encoding to model compression, which achieved good results in model storage and led researchers to study network compression methods. At present, the research on deep learning model compression methods can be mainly divided into the following directions:

(1)更精细模型的设计,使用更加细致、高效的模型设计,能够很大程度的减少模型尺寸,并且也具有不错的性能。(1) The design of finer models, using more detailed and efficient model design, can greatly reduce the size of the model, and also has good performance.

(2)模型裁剪,结构复杂的网络具有非常好的性能,其参数也存在冗余,因此对于已训练好的模型网络,通常是寻找一种有效的评判手段,来判断参数的重要性,将不重要的连接或者卷积核进行裁剪来减少模型的冗余。(2) Model tailoring. Networks with complex structures have very good performance, and their parameters are also redundant. Therefore, for a trained model network, an effective evaluation method is usually found to judge the importance of parameters. Unimportant connections or convolution kernels are cropped to reduce model redundancy.

(3)核的稀疏化,在训练过程中,对权重的更新进行诱导,使其更加稀疏,对于稀疏矩阵,可以使用更加紧致的存储方式,但是使用稀疏矩阵操作在硬件平台上运算效率不高,容易受到带宽的影响,因此加速并不明显。(3) The sparseness of the kernel. During the training process, the update of the weight is induced to make it more sparse. For the sparse matrix, a more compact storage method can be used, but the operation efficiency of the sparse matrix operation on the hardware platform is not high. High, vulnerable to bandwidth, so the speedup is not noticeable.

对预训练好的网络模型进行裁剪的方法,是目前模型压缩中使用最多的方法,通常是寻找一种有效的评判指标,来判断神经元或是特征图的重要性,将不重要的连接或者卷积核进行裁剪来减少模型的冗余。Li提出了基于量级的裁剪方式,用权重的绝对值之和来评判其重要性,以卷积核中所有权重的绝对值之和来作为该卷积核的评价指标。Hu定义了一个变量APoZ(Average Percentage of Zeros)来衡量每一个卷积核中激活为0的值的数量,作为评价一个卷积核是否重要的标准。Luo提出了一种基于熵值的裁剪方式,利用熵值来判定卷积核的重要性。Anwar采取一种随机裁剪的方式,然后对于每一种随机方式统计模型的性能,来确定局部最优的裁剪方式。Tian经过LDA分析发现对于每一个类别,有很多卷积核之间的激活是高度不相关的,因此可以利用这点来剔除大量的只具有少量信息的filter而不影响模型的性能。The method of trimming the pre-trained network model is currently the most used method in model compression. It is usually to find an effective evaluation index to judge the importance of neurons or feature maps, and connect unimportant or The convolution kernel is pruned to reduce the redundancy of the model. Li proposed a clipping method based on magnitude, using the sum of the absolute values of the weights to judge its importance, and using the sum of the absolute values of all the weights in the convolution kernel as the evaluation index of the convolution kernel. Hu defines a variable APoZ (Average Percentage of Zeros) to measure the number of values activated to 0 in each convolution kernel, as a criterion for evaluating whether a convolution kernel is important. Luo proposed a cropping method based on entropy value, which uses entropy value to determine the importance of convolution kernels. Anwar adopts a random clipping method, and then counts the performance of the model for each random method to determine the locally optimal clipping method. Tian's LDA analysis found that for each category, the activations between many convolution kernels are highly irrelevant, so this can be used to eliminate a large number of filters with only a small amount of information without affecting the performance of the model.

综上所述,现有方案的局限性如下:To sum up, the limitations of existing solutions are as follows:

a.核的稀疏化方法仅仅考虑到网络的压缩存储,在运行过程中的压缩效果不明显,速度也没有明显的提升;a. The sparse method of the kernel only considers the compressed storage of the network, the compression effect during operation is not obvious, and the speed is not significantly improved;

b.用权重的大小作为评判指标仅仅考虑了权重本身的数值特性,没有考虑网络层的数据特征,压缩的效果不高;b. Using the size of the weight as the evaluation index only considers the numerical characteristics of the weight itself, and does not consider the data characteristics of the network layer, and the compression effect is not high;

c.评判指标的计算的复杂程度较高,耗费了较多的计算能力;c. The calculation of the evaluation index is more complicated and consumes a lot of computing power;

d.随机裁剪的方法随机性太强,容易破坏网络本身的参数特征;d. The random cropping method is too random, and it is easy to destroy the parameter characteristics of the network itself;

因此,有必要提供一种计算方法简单,能充分考虑网络中冗余,适用性强,不依赖专用加速库,实现深度神经网络压缩和加速的方法。Therefore, it is necessary to provide a method that is simple in calculation method, can fully consider the redundancy in the network, has strong applicability, and does not rely on a dedicated acceleration library to achieve deep neural network compression and acceleration.

发明内容SUMMARY OF THE INVENTION

为了解决现有技术的缺陷,本发明公开了一种基于特征图统计的深度神经网络模型裁剪方法。与其他压缩方法相比,本发明通过将神经网络特征层的多种统计特征作为评判标准,对网络的参数层进行裁剪,充分考虑了网络的数值特征和统计特性,取得了很好的压缩效率,同时提升了运行速度。In order to solve the defects of the prior art, the present invention discloses a deep neural network model trimming method based on feature map statistics. Compared with other compression methods, the present invention cuts the parameter layer of the network by using various statistical features of the feature layer of the neural network as the evaluation criteria, fully considers the numerical and statistical characteristics of the network, and achieves good compression efficiency. , while improving the running speed.

本发明所采用的技术方案为:The technical scheme adopted in the present invention is:

一种基于特征图统计特征的深度神经网络模型裁剪方法,包括以下步骤:A deep neural network model clipping method based on statistical features of feature maps, comprising the following steps:

步骤1、针对深度神经网络模型中的特征层,计算其各个输出通道对应的特征图的统计特征;其中特征层由卷积层和激活层两部分组成,或是由卷积层、归一化层(BatchNorm层)和激活层三部分组成;本发明只对后面还有储存参数的层(特征层/全连接层)的特征层进行计算;Step 1. For the feature layer in the deep neural network model, calculate the statistical features of the feature map corresponding to each output channel; the feature layer is composed of two parts: the convolution layer and the activation layer, or the convolution layer, normalization layer The layer (BatchNorm layer) and the activation layer are composed of three parts; the present invention only calculates the feature layer of the layer (feature layer/full connection layer) that stores parameters later;

步骤2、根据特征层中各输出通道对应的特征图的统计特征,计算特征层中各输出通道的评判指标;Step 2. Calculate the evaluation index of each output channel in the feature layer according to the statistical features of the feature map corresponding to each output channel in the feature layer;

步骤3、根据评判指标判断特征层中各输出通道的重要性,将不重要的输出通道及其对应的参数进行移除。Step 3: Determine the importance of each output channel in the feature layer according to the evaluation index, and remove the unimportant output channels and their corresponding parameters.

在深度神经网络的一个迭代步(epoch)中,样本分不同的批次送入神经网络进行计算。所述步骤2中,对网络模型中各特征层各个输出通道对应的特征图,分批次进行特征统计;对于第i个特征层,其所有输出通道对应的特征图的统计特征包括一个均值向量和一个标准差向量S_vi,计算步骤如下:In an iterative step (epoch) of the deep neural network, the samples are sent to the neural network in different batches for calculation. In the step 2, the feature maps corresponding to each output channel of each feature layer in the network model are subjected to feature statistics in batches; for the i-th feature layer, the statistical features of the feature maps corresponding to all output channels include a mean vector. and a standard deviation vector S_v i , the calculation steps are as follows:

S11:初始化;设置Nsum用来统计已经处理过的样本数量,初始化为0;Nbatch为批次数量,Nbatch=ceil(样本总数/N),N为一个批次中样本的数量,ceil(·)是向上取整函数;nbatch是当前批次的计数,初始化为1;初始化均值向量和标准差向量S_vi为一个1×Ci的零向量,其中,Ci为第i个特征层的输出通道数量;S11: Initialization; set N sum to count the number of samples that have been processed, initialized to 0; N batch is the number of batches, N batch = ceil (total number of samples/N), N is the number of samples in a batch, ceil ( ) is the round-up function; n batch is the count of the current batch, initialized to 1; initialized mean vector The sum standard deviation vector S_v i is a 1×C i zero vector, where C i is the number of output channels of the ith feature layer;

S12:将第nbatch个批次样本对应的第i个特征层的输出(features maps)Xi表示为一个大小为N×Ci×Hi×Wi的四维张量,其中,Hi和Wi分别为第i个特征层的输出通道对应的特征图(feature map)的高和宽;该批次中第k个样本对应的该特征层第j个输出通道特征图(feature map)Xikj为一个大小为Hi×Wi的二维矩阵,k=1,2,…,N,j=1,2,…,CiS12: Represent the output (features maps) X i of the i-th feature layer corresponding to the n-th batch of samples as a four-dimensional tensor of size N×C i ×H i ×W i , where H i and Wi are the height and width of the feature map (feature map) corresponding to the output channel of the ith feature layer; the feature map (feature map) X of the jth output channel of the feature layer corresponding to the kth sample in the batch ikj is a two-dimensional matrix of size H i ×W i , k=1, 2, . . . , N, j=1, 2, . . . , C i ;

S13:将Xi进行维度变换(view或reshape),得到大小为N×Ci×(Hi×Wi)的三维张量X* i,即把Xi中每一个特征图Xikj由二维矩阵拉伸为一维向量X* ikjS13: Perform dimension transformation (view or reshape) on X i to obtain a three-dimensional tensor X * i of size N×C i ×(H i ×W i ), that is, transform each feature map X ikj in X i by two The dimensional matrix is stretched to a one-dimensional vector X * ikj ;

S14:计算X* ikj的统计特征,包括均值和标准差Sikj(两者均为标量):S14: Compute statistical features of X * ikj , including mean and standard deviation S ikj (both scalars):

其中,表示X* kj中第m个元素;in, represents the mth element in X * kj ;

S15:由k=1,2,…,N,j=1,2,…,Ci构成一个大小N×Ci为均值矩阵由Sikj,k=1,2,…,N,j=1,2,…,Ci构成一个大小N×Ci为标准差矩阵S_miS15: by k=1, 2, ..., N, j=1, 2, ..., C i forms a matrix of size N × C i is the mean value A size N×C i is a standard deviation matrix S_m i formed by S ikj , k=1, 2,..., N, j=1, 2,..., C i ;

S16:对均值矩阵和标准差矩阵S_mi按通道进行均值化处理(均值滤波),处理公式如下:S16: For the mean matrix And the standard deviation matrix S_m i is averaged by channel (average filter), and the processing formula is as follows:

Nsum=Nsum+N (5)N sum = N sum + N (5)

其中,中第k行对应的行向量;是S_mi中第k行对应的行向量;in, Yes The row vector corresponding to the kth row in ; is the row vector corresponding to the kth row in S_m i ;

S17:判断当前批次是否为最后一个批次,即是否有nbatch=Nbatch,若是,则终止批次循环,当前的均值向量和标准差向量S_vi即为该特征层所有输出通道对应的特征图的统计特征;否则更新当前批次的计数:nbatch=nbatch+1,跳转到S12继续执行。S17: Determine whether the current batch is the last batch, that is, whether there are n batch = N batch , if so, terminate the batch cycle, and the current mean vector The sum standard deviation vector S_v i is the statistical feature of the feature map corresponding to all output channels of the feature layer; otherwise, update the count of the current batch: n batch = n batch +1, and jump to S12 to continue execution.

进一步地,所述步骤2中,第i个特征层中第j个输出通道的评判指标的计算公式如下:Further, in the step 2, the evaluation index of the jth output channel in the ith feature layer The calculation formula is as follows:

其中,和S_vij是该特征层对应的均值向量和标准差向量S_vi中第j个元素,表示该特征层中第j个输出通道对应的特征图的均值和标准差,α和β是两个比例因子(超参数),α是均值的阈值,当均值 随着值变小朝着负无穷的方向移动;反之当均值随着值变大向零方向移动;β是标准差S_vij的阈值,当标准差S_vij<β,随着S_vij值变小向零方向移动;反之,则随着S_vij值变大向正无穷移动。当均值和标准差S_vij的值较小时,α-子项起主导作用;当均值和标准差S_vij的值较大时,β-子项起主导作用。α和β的确定有两种方法,其一为根据经验值指定,超参数值的范围由低到高设定,根据每次设定的值计算评判指标,从而对神经网络模型进行裁剪,并对裁剪之后的模型重新训练以恢复精度,逐渐达到一个最优的效果(即在网络精度下降不超过设定阈值的情况下裁剪掉的通道数目最多);其二为比例缩放,即和β=η∑S_vij/Ci,μ和η是比例因子,取值范围为(0,0.4),该方法可以根据网络的参数进行动态的调整。ε是一个极小值,防止除数为零的情况。in, and S_v ij is the mean vector corresponding to the feature layer and the jth element in the standard deviation vector S_v i , representing the mean and standard deviation of the feature map corresponding to the jth output channel in the feature layer, α and β are two scale factors (hyperparameters), α is the mean threshold, when the mean along with The value becomes smaller and moves in the direction of negative infinity; otherwise, it is the mean value but along with The value becomes larger and moves to the zero direction; β is the threshold of the standard deviation S_v ij , when the standard deviation S_v ij <β, As the value of S_v ij becomes smaller, it moves to the zero direction; otherwise, the Move toward positive infinity as the value of S_v ij increases. when the mean When the value of and standard deviation S_v ij is small, the α-subterm plays a leading role; when the mean When the value of the standard deviation S_v ij is large, the β-subterm plays a leading role. There are two ways to determine α and β. One is to specify according to the empirical value. The range of hyperparameter values is set from low to high, and the evaluation index is calculated according to the value set each time, so as to tailor the neural network model, and Retrain the cropped model to restore the accuracy, and gradually achieve an optimal effect (that is, the maximum number of channels cropped when the network accuracy does not drop beyond the set threshold); the second is scaling, that is and β=η∑S_v ij /C i , μ and η are scale factors, the value range is (0, 0.4), the method can be dynamically adjusted according to the parameters of the network. ε is a minimal value that prevents division by zero.

进一步地,所述步骤3中,对于第i个特征层Li,若则判定该特征层中第j个输出通道不重要,将该通道及其对应的参数进行移除。Further, in the step 3, for the i -th feature layer Li, if Then it is determined that the jth output channel in the feature layer is not important, and the channel and its corresponding parameters are removed.

进一步地,对于第i个特征层Li,移除其中不重要的通道及其对应的参数的步骤如下:Further, for the ith feature layer Li, the steps of removing unimportant channels and their corresponding parameters are as follows:

S31:记录对应的评判标准的通道的集合Ri,集合Ri中的元素个数记为length(Ri);S31: Record the corresponding evaluation criteria The set R i of the channels of , the number of elements in the set R i is recorded as length(R i );

S32:将特征层Li的卷积核Wi表示为一个大小为Ci-1×Ci×Khi×Kwi的四维张量,卷积核Wi对应的偏置Bi为一个大小为1×Ci的向量,其中Ci-1表示上一个特征层Li-1的输出通道数,若Li为第一个特征层,则Ci-1表示样本输入的通道数,Khi和Kwi分别为卷积核的高度和宽度;移除卷积核Wi中属于集合Ri的通道对应的元素,形成一个新的卷积核其大小为用新的卷积核来代替卷积核Wi;移除偏置Bi中属于集合Ri的通道对应的元素,形成一个新的偏置其大小为 S32: Represent the convolution kernel Wi of the feature layer Li as a four-dimensional tensor of size C i-1 ×C i ×K hi ×K wi , and the bias B i corresponding to the convolution kernel Wi is a size is a 1×C i vector, where C i-1 represents the number of output channels of the previous feature layer Li-1 , if Li is the first feature layer, then C i -1 represents the number of sample input channels, K hi and Kwi are the height and width of the convolution kernel respectively; remove the elements corresponding to the channels belonging to the set Ri in the convolution kernel Wi to form a new convolution kernel Its size is with a new convolution kernel to replace the convolution kernel Wi ; remove the elements corresponding to the channels belonging to the set Ri in the bias Bi to form a new bias Its size is

S33:若特征层Li的下一层Li+1还是特征层,则将下一层Li+1的卷积核Wi+1表示为一个大小为Ci×Ci+1×Kh(i+1)×Kw(i+1)的四维张量,其中Ci+1表示下一层Li+1的输出通道数,Kh(i+1)和Kw(i+1)分别为卷积核Wi+1的高度和宽度;;移除卷积核Wi+1中属于集合Ri的通道对应的元素,形成一个新的卷积核其大小为用新的卷积核来代替卷积核Wi+1S33: If the next layer Li+1 of the feature layer Li is still a feature layer, express the convolution kernel Wi +1 of the next layer Li +1 as a size C i ×C i +1 ×K A four-dimensional tensor of h(i+1) ×K w(i+1) , where C i+1 represents the number of output channels of the next layer L i+1 , K h(i+1) and K w(i+ 1) are the height and width of the convolution kernel Wi +1 respectively; ; Remove the elements corresponding to the channels belonging to the set R i in the convolution kernel Wi +1 to form a new convolution kernel Its size is with a new convolution kernel to replace the convolution kernel Wi +1 ;

S34:若特征层Li的下一层Li+1是全连接层,则将下一层Li+1的参数Vi+1表示为一个大小(Ci×Khi×Kwi)×Ci+1的矩阵;移除参数Vi+1中属于集合Ri的通道对应的元素,形成一个新的参数其大小为用新的参数来代替参数Vi+1S34: If the next layer Li+1 of the feature layer Li is a fully connected layer, express the parameter Vi +1 of the next layer Li+1 as a size (C i × K hi ×K wi )× The matrix of C i+1 ; remove the elements corresponding to the channels belonging to the set R i in the parameter V i+1 to form a new parameter Its size is with new parameters to replace the parameter V i+1 .

深度神经网络在完成了模型裁剪之后,需要经过几次迭代来重新训练使得网络的精度得以恢复,迭代的次数与修剪的特征层、以及评判准则有关。修剪的特征层靠近输入层,迭代的次数较少;修剪的特征层靠近输出层,迭代的次数较多。评判准则中,α和β的值越高,需要更多的迭代次数来恢复网络的精度。After completing the model pruning, the deep neural network needs to undergo several iterations to retrain to restore the accuracy of the network. The number of iterations is related to the pruned feature layer and the evaluation criteria. The pruned feature layer is close to the input layer, and the number of iterations is less; the pruned feature layer is close to the output layer, and the number of iterations is more. In the evaluation criteria, the higher the values of α and β, the more iterations are required to restore the accuracy of the network.

有益效果:Beneficial effects:

与已有技术相比,本发明充分利用了深度神经网络的统计特征,构建了基于均值和标准差的评判指标,能够有效降低神经网络特征层的维度,提升了深度神经网络的训练速度,减少了深度神经网络的框架规模和权重数量,提高了深度神经网络的运行速度/效率,并且对精度产生的影响较小。具体有以下的特点与效果:Compared with the prior art, the present invention makes full use of the statistical features of the deep neural network, and constructs an evaluation index based on the mean and standard deviation, which can effectively reduce the dimension of the feature layer of the neural network, improve the training speed of the deep neural network, and reduce the The frame size and number of weights of the deep neural network are improved, and the running speed/efficiency of the deep neural network is improved, and the impact on the accuracy is small. It has the following features and effects:

第一、本发明在构造卷积核裁剪的评判指标时,考虑了神经网络的统计特征,利用特征层的均值和标准差来兼顾神经网络值的特征和特征层内的特征。通过特征层的数据特征反映卷积核参数的作用效果,进而将表现不良的特征图及相对应的卷积核裁剪掉,实现了网络模型框架的缩小和参数量的压缩。First, the present invention considers the statistical characteristics of the neural network when constructing the evaluation index for convolution kernel clipping, and uses the mean and standard deviation of the feature layer to take into account the features of the neural network value and the features in the feature layer. The data features of the feature layer reflect the effect of the parameters of the convolution kernel, and then the feature maps and corresponding convolution kernels with poor performance are cut out, which realizes the reduction of the network model framework and the compression of the parameter amount.

第二、评判准则公式中,可以灵活的设定超参数α和β,来改变移除的通道数目;当统计特征落在靠近0的地方,α-子项会起到主导作用;相反,当统计特征落在远离0的区域,β-子项会起到主导作用。Second, in the criterion formula, hyperparameters α and β can be flexibly set to change the number of removed channels; when the statistical features fall close to 0, the α-sub-item will play a leading role; on the contrary, when Statistical features fall in the region far from 0, and the β-subterm will play a dominant role.

第三、本发明在充分考虑了神经网络的统计特征的基础上,提出了新的评判准则,算法的复杂度较低,性能较好,可以在实时网络以及嵌入式设备上进行部署。Third, the present invention proposes a new evaluation criterion on the basis of fully considering the statistical characteristics of the neural network. The algorithm has lower complexity and better performance, and can be deployed on real-time networks and embedded devices.

附图说明Description of drawings

图1是本发明的流程示图;Fig. 1 is the flow chart of the present invention;

图2是特征层的结构示意图;FIG. 2 is a schematic structural diagram of a feature layer;

图3是特征层内部结构示意图;3 is a schematic diagram of the internal structure of the feature layer;

图4是神经网络中特征层的存在形式示例;图4(a)是多个连续的特征层,图4(b)是单个特征层;Fig. 4 is an example of the existence form of the feature layer in the neural network; Fig. 4(a) is a plurality of continuous feature layers, and Fig. 4(b) is a single feature layer;

图5是本发明设计的总体框图;Fig. 5 is the overall block diagram designed by the present invention;

图6是本发明模型挑选与裁剪的示意图;Fig. 6 is the schematic diagram of model selection and cutting of the present invention;

具体实施方式Detailed ways

下面结合具体实施例对本发明进行详细说明,以下实施例将有助于本领域的技术人员进一步理解本发明。下面通过参考附图描述的实例是示例性的,旨在用于解释本发明,而不能理解为对本发明的限制。The present invention will be described in detail below with reference to specific embodiments, and the following embodiments will help those skilled in the art to further understand the present invention. The examples described below with reference to the accompanying drawings are exemplary and are intended to explain the present invention, but should not be construed as limiting the present invention.

图1是本实例的一种基于特征图统计的深度神经网络模型裁剪方法流程示意图,通过对特定的特征图及相对应的卷积核进行移除,实现网络模型框架的缩小和参数量的压缩,针对模型的裁剪方法,具体实现步骤为:Figure 1 is a schematic flowchart of a deep neural network model clipping method based on feature map statistics in this example. By removing specific feature maps and corresponding convolution kernels, the network model framework is reduced and the parameter amount is compressed. , for the model cutting method, the specific implementation steps are:

(1)针对深度神经网络中的特征层,依次计算特征层中每个特征图的统计特征;(1) For the feature layer in the deep neural network, calculate the statistical features of each feature map in the feature layer in turn;

(2)根据统计特征构造评判准则;(2) Construct judging criteria according to statistical characteristics;

(3)将不符合评判准则的特征图及其相对应的卷积核进行移除。(3) The feature maps that do not meet the evaluation criteria and their corresponding convolution kernels are removed.

需要说明的是,本发明实例的操作对象是已经训练收敛的深度神经网络的特征层,其中特征层是由卷积层和激活层(或可描述为激活函数、非线性层)两部分组合而成,或是由卷积层、BatchNorm层(归一化层)和激活层三部分组合而成,如图2所示。对于网络的类型及模块,包括但不限于卷积层,批归一化层,激活层,全连接层,Resnet模块。在深度神经网络框架中,特征层的内部结构示意图如图3所示,第i个特征层为Li,第i个特征层的卷积核为Wi,卷积核Wi对应的偏置为Bi。本发明只对后面还有储存参数的层(特征层/全连接层)的特征层进行计算,比如图4(a)中除最后一个特征层之外的其它特征层,而不对后面没有储存参数的层的特征层进行计算,比如图4(b)中的特征层,即若特征层后面只有池化层、归一化层、激活层或softmax层,则不对这种特征层进行操作。It should be noted that the operation object of the example of the present invention is the feature layer of the deep neural network that has been trained and converged, wherein the feature layer is a combination of the convolution layer and the activation layer (or can be described as an activation function, a nonlinear layer) two parts. It is composed of three parts: convolution layer, BatchNorm layer (normalization layer) and activation layer, as shown in Figure 2. For network types and modules, including but not limited to convolutional layers, batch normalization layers, activation layers, fully connected layers, and Resnet modules. In the deep neural network framework, the schematic diagram of the internal structure of the feature layer is shown in Figure 3. The ith feature layer is Li, the convolution kernel of the ith feature layer is Wi , and the bias corresponding to the convolution kernel Wi is B i . The present invention only calculates the feature layer of the layer (feature layer/fully connected layer) that has stored parameters in the back, such as other feature layers except the last feature layer in Fig. 4(a), but does not store parameters later. The feature layer of the layer is calculated, such as the feature layer in Figure 4(b), that is, if there is only a pooling layer, a normalization layer, an activation layer or a softmax layer behind the feature layer, this feature layer will not be operated.

在深度神经网络的一个迭代步(epoch)中,样本分不同的批次送入神经网络进行计算。对网络模型中特征图的统计特征,分批次进行特征统计,以下以第i个特征层Li为例进行说明,实现步骤如下:In an iterative step (epoch) of the deep neural network, the samples are sent to the neural network in different batches for calculation. For the statistical features of the feature map in the network model, feature statistics are performed in batches. The following takes the i -th feature layer Li as an example to illustrate, and the implementation steps are as follows:

S31:初始化中间变量。Nsum用来统计已经处理过的样本数量,初始化为0;Nbatch为批次数量,Nbatch=ceil(样本总数/N),N是一个批次中样本的数量,ceil(·)是向上取整函数;nbatch是当前批次的计数,初始化为0;均值和标准差Sikj是标量,k=1,2,…,N,j=1,2,…,Ci,初始化为0;均值矩阵和标准差矩阵S_mi的初始化为大小是(N,Ci)的零矩阵,而均值向量和标准差向量S_vi的初始化为大小是(1,Ci)的零向量,Ci为特征层Li的输出通道数量。S31: Initialize intermediate variables. N sum is used to count the number of samples that have been processed, initialized to 0; N batch is the number of batches, N batch = ceil (total number of samples/N), N is the number of samples in a batch, ceil( ) is upward Rounding function; n batch is the count of the current batch, initialized to 0; mean and standard deviation S ikj are scalars, k = 1, 2, ..., N, j = 1, 2, ..., C i , initialized to 0; mean matrix and the standard deviation matrix S_m i is initialized as a zero matrix of size (N, C i ), while the mean vector The sum standard deviation vector S_v i is initialized as a zero vector of size (1, C i ), where C i is the number of output channels of the feature layer Li .

S32:将第nbatch个批次样本对应的第i个特征层的Li输出进行view或reshape(维度变换)得到大小由(N,Ci,Hi,Wi)变为(N,Ci,Hi*Wi),相当于把二维的特征图Xikj拉伸为一维的表示X* ikj,k∈[1,N]是第i个特征层Li中样本的索引,j∈[1,Ci]是第i个特征层Li中第j个通道的特征图的索引,需要强调的是特征图Xikj表示第i个特征层Li中第j个通道的(Hi,Wi)大小的元素集合,特征图X* ikj表示特征层中第j个通道的(Hi*Wi)大小的元素集合;S32: Perform view or reshape (dimension transformation) on the Li output of the i -th feature layer corresponding to the n-th batch of samples to obtain The size is changed from (N, C i , H i , Wi ) to (N, C i , H i *W i ) , which is equivalent to stretching the two-dimensional feature map X ikj into a one-dimensional representation X * ikj , k∈[1,N] is the index of the sample in the ith feature layer Li, j∈[1,C i ] is the index of the feature map of the jth channel in the ith feature layer Li , it needs to be emphasized is the feature map X ikj represents the element set of size (H i , Wi ) of the j-th channel in the i - th feature layer Li, and the feature map X * ikj represents the feature layer The set of elements of size (H i *W i ) of the jth channel in ;

S33:计算特征层中第j个通道特征图X* ikj的统计特征:均值和标准差SikjS33: Calculate the feature layer Statistical features of the jth channel feature map X * ikj in: mean and standard deviation S ikj :

针对特征层其中的任一特征图X* ikj,均采用均值和标准差Sikj作为特征图X* ikj的统计特征。在特征层的一个批次中,可产生对应大小为(N,Ci)的均值矩阵和标准差矩阵S_miFor feature layer For any feature map X * ikj , the mean value is used and standard deviation S ikj as statistical features of feature map X * ikj . at the feature level In a batch of , a mean matrix of size (N, C i ) can be generated and standard deviation matrix S_m i ;

S34:对均值矩阵和标准差矩阵S_mi按通道进行均值化处理,其实现如下:S34: For the mean matrix and the standard deviation matrix S_m i are averaged by channel, which is implemented as follows:

Nsum=Nsum+N (5)N sum = N sum + N (5)

其中,Nsum是前nbatch批样本数量的叠加,用来统计已经处理过的样本数量;N是第nbatch个批次的样本数量;是均值矩阵进行均值滤波后的结果,是第k个样本对应的所有通道的均值;S_vi是标准差矩阵S_mi进行均值滤波后的结果,是第k个样本对应的所有通道的标准差。Among them, N sum is the superposition of the number of samples in the first n batches, which is used to count the number of samples that have been processed; N is the number of samples in the nth batch ; is the mean matrix The result after mean filtering, is the mean of all channels corresponding to the kth sample; S_v i is the result of mean filtering by the standard deviation matrix S_m i , is the standard deviation of all channels corresponding to the kth sample.

S35:更新当前批次:nbatch=nbatch+1,如果nbatch=Nbatch,那么终止批次循环;否则均值和标准差Sikj置为0;均值矩阵和标准差矩阵S_mi重置为零矩阵,而均值向量和标准差向量S_vi也重置为零向量,跳转到S32继续执行。根据上述批次迭代得到的均值向量和标准差向量S_vi,对特征层Li的第j个通道的评判指标计算过程如下:S35: Update the current batch: n batch = n batch +1, if n batch = N batch , then terminate the batch cycle; otherwise, the mean and standard deviation S ikj set to 0; mean matrix and standard deviation matrix S_m i reset to zero matrix, while mean vector The sum standard deviation vector S_v i is also reset to the zero vector, and jumps to S32 to continue the execution. The mean vector obtained by iterating over the above batches and the standard deviation vector S_v i , the evaluation index for the jth channel of the feature layer Li The calculation process is as follows:

其中,和S_vij是特征层Li中第j个通道的均值和标准差,α和β是两个超参数,这两个比例因子用来限定均值和标准差S_vij的起作用范围,ε是一个极小值,防止除数为零的存在。在超参数较小的情况下,采用比例缩放,即和β=η∑S_vij/Ci,μ和η是比例因子,取值范围为(0,0.4),当超参数变大之后,采用逐步逼近的方法,超参数值的范围由低到高,逐渐达到一个最优的效果。in, and S_v ij are the mean and standard deviation of the jth channel in the feature layer Li, α and β are two hyperparameters, these two scale factors are used to define the mean And the working range of standard deviation S_v ij , ε is a minimum value, preventing the existence of division by zero. In the case of small hyperparameters, scaling is used, i.e. and β=η∑S_v ij /C i , μ and η are scale factors, the value range is (0, 0.4), when the hyperparameter becomes larger, the step-by-step approximation method is adopted, and the range of the hyperparameter value is from low to high , and gradually achieve an optimal effect.

移除不符合评判准则的特征图及其相关联的参数,针对Ci个输出通道的特征层Li而言,计算得到评判指标记录对应的评判指标的通道的集合Ri,移除相对应的通道的步骤如下:Remove the feature maps that do not meet the evaluation criteria and their associated parameters, and calculate the evaluation index for the feature layer Li of the C i output channels Record the corresponding evaluation indicators The set of channels R i of , the steps of removing the corresponding channel are as follows:

S71:特征层Li的卷积核Wi的大小为(Ci-1,Ci,Khi,Kwi),Ci-1表示上一个特征层Li-1的输出通道数或样本输入的通道数(如果特征层Li为第一个特征层),Ci为当前特征层Li的输出通道数,Khi,Kwi为卷积核的尺寸。构建一个新的卷积核其大小为 表示从Ci中减去集合Ri包含的元素数量后的通道数(Ci-length(Ri))。将卷积核Wi的第1个维度上切取不属于集合Ri的通道对应的元素,复制到新的卷积核中,然后用新的卷积核来代替卷积核Wi;构建一个新的偏置其大小为将偏置Bi中切取不属于集合Ri的通道对应的元素,复制到新的偏置然后用新的偏置来代替偏置BiS71: The size of the convolution kernel Wi of the feature layer Li is (C i -1 , C i , K hi , K wi ), and C i -1 represents the number of output channels or samples of the previous feature layer Li-1 The number of input channels (if the feature layer Li is the first feature layer), C i is the number of output channels of the current feature layer Li, K hi , K wi are the size of the convolution kernel. Build a new convolution kernel Its size is Indicates the number of channels (C i -length(R i )) after subtracting the number of elements contained in the set R i from C i . Cut out the elements corresponding to the channels that do not belong to the set Ri from the first dimension of the convolution kernel Wi, and copy them to the new convolution kernel , and then use the new convolution kernel to replace the convolution kernel Wi ; construct a new bias Its size is Cut out the elements corresponding to the channels that do not belong to the set Ri from the offset B i and copy them to the new offset then use the new bias instead of bias Bi ;

S72:如果特征层Li之后还存在特征层的话,对于下一个特征层Li+1的卷积核Wi+1的大小为(Ci,Ci+1,Kh(i+1)×Kw(i+1)),Ci+1表示下一个特征层Li+1的输出通道数。构建一个新的卷积核其大小为将卷积核Wi+1的第1个维度上切取不属于集合Ri的通道对应的元素,复制到新的卷积核中,然后用新的卷积核来代替卷积核Wi+1S72: If there is a feature layer after the feature layer Li, the size of the convolution kernel Wi+1 for the next feature layer Li+1 is (C i , C i +1 , K h(i+1) ×K w(i+1) ), C i+1 represents the number of output channels of the next feature layer Li+1 . Build a new convolution kernel Its size is Cut out the elements corresponding to the channels that do not belong to the set R i from the first dimension of the convolution kernel Wi +1 , and copy them to the new convolution kernel , and then use the new convolution kernel to replace the convolution kernel Wi +1 ;

S73:如果特征层Li之后是全连接层,连接层的输出通道数量为Ci+1,对应的参数大小为(Ci×Khi×Kwi,Ci+1)。构建一个新的参数其大小为将参数Vi+1的第1个维度上切取不属于集合Ri的通道对应的元素,复制到新的参数中,然后用新的参数来代替参数Vi+1S73: If the feature layer Li is followed by a fully connected layer, the number of output channels of the connection layer is C i +1 , and the corresponding parameter size is (C i ×K hi ×K wi , C i+1 ). build a new parameter Its size is Cut out the elements corresponding to the channels that do not belong to the set R i from the first dimension of the parameter V i+1 , and copy them to the new parameter , then use the new parameters to replace the parameter V i+1 ;

进一步地,深度神经网络在完成了模型裁剪之后,需要经过几次迭代来重新训练使得网络的精度得以恢复,迭代的次数与修剪的特征层、以及评判准则有关。修剪的特征层靠近输入层,迭代的次数较少;修剪的特征层靠近输出层,迭代的次数较多。评判准则中,α和β的值越高,需要更多的迭代次数来恢复网络的精度。Further, after completing the model pruning, the deep neural network needs to undergo several iterations to retrain to restore the accuracy of the network. The number of iterations is related to the pruned feature layer and the evaluation criteria. The pruned feature layer is close to the input layer, and the number of iterations is less; the pruned feature layer is close to the output layer, and the number of iterations is more. In the evaluation criteria, the higher the values of α and β, the more iterations are required to restore the accuracy of the network.

本技术领域的普通技术人员可以理解实现上述实施方法携带的全部或部分步骤是可以通过程序指令来实现,所述的程序可以存储于一种计算机可读存储介质中。以上对本发明的具体实施进行了描述。应当理解的是,本发明对特征层和全连接层进行压缩操作,因此,凡在本发明的精神实质与原理之内所做的任何修改、等同替换、改进等,均应包含在本发明保护的范围之内。Those skilled in the art can understand that all or part of the steps carried by the above implementation method can be implemented by program instructions, and the program can be stored in a computer-readable storage medium. The specific implementation of the present invention has been described above. It should be understood that the present invention compresses the feature layer and the fully connected layer. Therefore, any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention shall be included in the protection of the present invention. within the range.

Claims (5)

1.一种基于特征图统计特征的深度神经网络模型裁剪方法,其特征在于,为了提高网络的压缩效率与加速性能,针对深度神经网络进行的优化,实施以下步骤:1. a deep neural network model clipping method based on feature map statistical features, is characterized in that, in order to improve the compression efficiency and acceleration performance of the network, for the optimization carried out by deep neural network, the following steps are implemented: 步骤1、针对深度神经网络模型中的特征层,计算其各个输出通道对应的特征图的统计特征;其中特征层由卷积层和激活层两部分组成,或是由卷积层、归一化层和激活层三部分组成;Step 1. For the feature layer in the deep neural network model, calculate the statistical features of the feature map corresponding to each output channel; the feature layer is composed of two parts: the convolution layer and the activation layer, or the convolution layer, normalization layer The layer and the activation layer are composed of three parts; 步骤2、根据特征层中各输出通道对应的特征图的统计特征,计算特征层中各输出通道的评判指标;Step 2. Calculate the evaluation index of each output channel in the feature layer according to the statistical features of the feature map corresponding to each output channel in the feature layer; 步骤3、根据评判指标判断特征层中各输出通道的重要性,将不重要的输出通道及其对应的参数进行移除。Step 3: Determine the importance of each output channel in the feature layer according to the evaluation index, and remove the unimportant output channels and their corresponding parameters. 2.根据权利要求1所述的基于特征图统计特征的深度神经网络模型裁剪方法,其特征在于,所述步骤2中,对深度神经网络模型中各特征层各个输出通道对应的特征图,分批次进行特征统计;对于第i个特征层,其所有输出通道对应的特征图的统计特征包括一个均值向量和一个标准差向量S_vi,计算步骤如下:2. the deep neural network model clipping method based on feature map statistical feature according to claim 1, is characterized in that, in described step 2, to the feature map corresponding to each output channel of each feature layer in the deep neural network model, divide Perform feature statistics in batches; for the i-th feature layer, the statistical features of the feature maps corresponding to all output channels include a mean vector and a standard deviation vector S_v i , the calculation steps are as follows: S11:初始化;设置Nsum用来统计已经处理过的样本数量,初始化为0;Nbatch为批次数量,Nbatch=ceil(样本总数/N),N为一个批次中样本的数量,ceil(·)是向上取整函数;nbatch是当前批次的计数,初始化为1;初始化均值向量和标准差向量S_vi为一个1×Ci的零向量,其中,Ci为第i个特征层的输出通道数量;S11: Initialization; set N sum to count the number of samples that have been processed, initialized to 0; N batch is the number of batches, N batch = ceil (total number of samples/N), N is the number of samples in a batch, ceil ( ) is the round-up function; n batch is the count of the current batch, initialized to 1; initialized mean vector The sum standard deviation vector S_v i is a 1×C i zero vector, where C i is the number of output channels of the ith feature layer; S12:将第nbatch个批次样本对应的第i个特征层的输出Xi表示为一个大小为N×Ci×Hi×Wi的四维张量,其中,Hi和Wi分别为第i个特征层的输出通道对应的特征图的高和宽;该批次中第k个样本对应的该特征层第j个输出通道特征图Xikj为一个大小为Hi×Wi的二维矩阵,k=1,2,…,N,j=1,2,…,CiS12: Represent the output X i of the i-th feature layer corresponding to the n-th batch of samples as a four-dimensional tensor of size N×C i ×H i ×W i , where H i and Wi are respectively The height and width of the feature map corresponding to the output channel of the i-th feature layer; the feature map X ikj of the j-th output channel of the feature layer corresponding to the k-th sample in the batch is a two-dimensional H i ×W i dimensional matrix, k=1,2,...,N,j=1,2,...,C i ; S13:把Xi中每一个特征图Xikj由二维矩阵拉伸为一维向量X* ikjS13: Stretch each feature map X ikj in Xi from a two-dimensional matrix into a one-dimensional vector X * ikj ; S14:计算X* ikj的统计特征,包括均值和标准差SikjS14: Compute statistical features of X * ikj , including mean and standard deviation S ikj : 其中,表示X* ikj中第m个元素;in, represents the mth element in X * ikj ; S15:由构成一个大小N×Ci为均值矩阵由Sikj,k=1,2,…,N,j=1,2,…,Ci构成一个大小N×Ci为标准差矩阵S_miS15: by Form a matrix of size N × C i as the mean value A size N×C i is a standard deviation matrix S_m i formed by S ikj , k=1, 2,..., N, j=1, 2,..., C i ; S16:对均值矩阵和标准差矩阵S_mi按通道进行均值化处理,处理公式如下:S16: For the mean matrix And the standard deviation matrix S_m i is averaged by channel, and the processing formula is as follows: Nsum=Nsum+N (5)N sum = N sum + N (5) 其中,中第k行对应的行向量;是S_mi中第k行对应的行向量;in, Yes The row vector corresponding to the kth row in ; is the row vector corresponding to the kth row in S_m i ; S17:判断当前批次是否为最后一个批次,即是否有nbatch=Natch,若是,则终止批次循环,当前的均值向量和标准差向量S_vi即为该特征层所有输出通道对应的特征图的统计特征;否则更新当前批次的计数:nbatch=nbatch+1,跳转到S12继续执行。S17: Determine whether the current batch is the last batch, that is, whether there is n batch =N atch , if so, terminate the batch cycle, and the current mean vector The sum standard deviation vector S_v i is the statistical feature of the feature map corresponding to all output channels of the feature layer; otherwise, update the count of the current batch: n batch = n batch +1, and jump to S12 to continue execution. 3.根据权利要求2所述的基于特征图统计特征的深度神经网络模型裁剪方法,其特征在于,所述步骤2中,第i个特征层中第j个输出通道的评判指标的计算公式如下:3. the deep neural network model clipping method based on feature map statistical feature according to claim 2, is characterized in that, in described step 2, the evaluation index of the jth output channel in the ith feature layer The calculation formula is as follows: 其中,和S_vij是该特征层对应的均值向量和标准差向量S_vi中第j个元素,表示该特征层中第j个输出通道对应的特征图的均值和标准差,α和β是两个比例因子,ε是一个极小值。in, and S_v ij is the mean vector corresponding to the feature layer The jth element in the sum standard deviation vector S_v i represents the mean and standard deviation of the feature map corresponding to the jth output channel in the feature layer, α and β are two scale factors, and ε is a minimum value. 4.根据权利要求3所述的基于特征图统计特征的深度神经网络模型裁剪方法,其特征在于,所述步骤3中,对于第i个特征层Li,若则判定该特征层中第j个输出通道不重要,移除该通道及其对应的参数。4. the deep neural network model clipping method based on feature map statistical feature according to claim 3, is characterized in that, in described step 3, for the i -th feature layer Li, if Then it is determined that the jth output channel in the feature layer is not important, and the channel and its corresponding parameters are removed. 5.根据权利要求4所述的基于特征图统计特征的深度神经网络模型裁剪方法,其特征在于,对于第i个特征层Li,移除其中不重要的通道及其对应的参数的步骤如下:5. the deep neural network model clipping method based on feature map statistical feature according to claim 4, is characterized in that, for the i -th feature layer Li, the step of removing unimportant channel and its corresponding parameter is as follows : S31:记录对应的评判标准的通道的集合Ri,集合Ri中的元素个数记为length(Ri);S31: Record the corresponding evaluation criteria The set R i of the channels of , the number of elements in the set R i is recorded as length(R i ); S32:将特征层Li的卷积核Wi表示为一个大小为Ci-1×Ci×Khi×Kwi的四维张量,卷积核Wi对应的偏置Bi为一个大小为1×Ci的向量,其中Ci-1表示上一个特征层Li-1的输出通道数,若Li为第一个特征层,则Ci-1表示样本输入的通道数,Khi和Kwi分别为卷积核Wi的高度和宽度;移除卷积核Wi中属于集合Ri的通道对应的元素,形成一个新的卷积核Wi *,其大小为用新的卷积核Wi *来代替卷积核Wi;移除偏置Bi中属于集合Ri的通道对应的元素,形成一个新的偏置其大小为 S32: Represent the convolution kernel Wi of the feature layer Li as a four-dimensional tensor of size C i-1 ×C i ×K hi ×K wi , and the bias B i corresponding to the convolution kernel Wi is a size is a 1×C i vector, where C i-1 represents the number of output channels of the previous feature layer Li-1 , if Li is the first feature layer, then C i -1 represents the number of sample input channels, K hi and Kwi are the height and width of the convolution kernel Wi , respectively; remove the elements corresponding to the channels belonging to the set Ri in the convolution kernel Wi to form a new convolution kernel Wi * , the size of which is Replace the convolution kernel Wi with a new convolution kernel Wi * ; remove the element corresponding to the channel belonging to the set Ri in the bias B i to form a new bias Its size is S33:若特征层Li的下一层Li+1还是特征层,则将下一层Li+1的卷积核Wi+1表示为一个大小为Ci×Ci+1×Kh(i+1)×Kw(i+1)的四维张量,其中Ci+1表示下一层Li+1的输出通道数,Kh(i+1)和Kw(i+1)分别为卷积核Wi+1的高度和宽度;移除卷积核Wi+1中属于集合Ri的通道对应的元素,形成一个新的卷积核其大小为用新的卷积核来代替卷积核Wi+1S33: If the next layer Li+1 of the feature layer Li is still a feature layer, express the convolution kernel Wi +1 of the next layer Li +1 as a size C i ×C i +1 ×K A four-dimensional tensor of h(i+1) ×K w(i+1) , where C i+1 represents the number of output channels of the next layer L i+1 , K h(i+1) and K w(i+ 1) are the height and width of the convolution kernel Wi +1 respectively; remove the elements corresponding to the channels belonging to the set R i in the convolution kernel Wi +1 to form a new convolution kernel Its size is with a new convolution kernel to replace the convolution kernel Wi +1 ; S34:若特征层Li的下一层Li+1是全连接层,则将下一层Li+1的参数Vi+1表示为一个大小(Ci×Khi×Kwi)×Ci+1的矩阵;移除参数Vi+1中属于集合Ri的通道对应的元素,形成一个新的参数其大小为用新的参数来代替参数Vi+1S34: If the next layer Li+1 of the feature layer Li is a fully connected layer, express the parameter Vi +1 of the next layer Li+1 as a size (C i × K hi ×K wi )× The matrix of C i+1 ; remove the elements corresponding to the channels belonging to the set R i in the parameter V i+1 to form a new parameter Its size is with new parameters to replace the parameter V i+1 .
CN201811440153.2A 2018-11-29 2018-11-29 A deep neural network model tailoring method based on feature map statistical features Pending CN109472352A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811440153.2A CN109472352A (en) 2018-11-29 2018-11-29 A deep neural network model tailoring method based on feature map statistical features

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811440153.2A CN109472352A (en) 2018-11-29 2018-11-29 A deep neural network model tailoring method based on feature map statistical features

Publications (1)

Publication Number Publication Date
CN109472352A true CN109472352A (en) 2019-03-15

Family

ID=65674220

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811440153.2A Pending CN109472352A (en) 2018-11-29 2018-11-29 A deep neural network model tailoring method based on feature map statistical features

Country Status (1)

Country Link
CN (1) CN109472352A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109978069A (en) * 2019-04-02 2019-07-05 南京大学 The method for reducing ResNeXt model over-fitting in picture classification
CN110119811A (en) * 2019-05-15 2019-08-13 电科瑞达(成都)科技有限公司 A kind of convolution kernel method of cutting out based on entropy significance criteria model
CN110232436A (en) * 2019-05-08 2019-09-13 华为技术有限公司 Pruning method, device and the storage medium of convolutional neural networks
CN110309847A (en) * 2019-04-26 2019-10-08 深圳前海微众银行股份有限公司 A model compression method and device
CN112036563A (en) * 2019-06-03 2020-12-04 国际商业机器公司 Deep learning model insights using provenance data
CN117636057A (en) * 2023-12-13 2024-03-01 石家庄铁道大学 Train bearing damage classification and identification method based on multi-branch cross-space attention model

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109978069A (en) * 2019-04-02 2019-07-05 南京大学 The method for reducing ResNeXt model over-fitting in picture classification
CN110309847A (en) * 2019-04-26 2019-10-08 深圳前海微众银行股份有限公司 A model compression method and device
CN110232436A (en) * 2019-05-08 2019-09-13 华为技术有限公司 Pruning method, device and the storage medium of convolutional neural networks
CN110119811A (en) * 2019-05-15 2019-08-13 电科瑞达(成都)科技有限公司 A kind of convolution kernel method of cutting out based on entropy significance criteria model
CN110119811B (en) * 2019-05-15 2021-07-27 电科瑞达(成都)科技有限公司 Convolution kernel cutting method based on entropy importance criterion model
CN112036563A (en) * 2019-06-03 2020-12-04 国际商业机器公司 Deep learning model insights using provenance data
CN117636057A (en) * 2023-12-13 2024-03-01 石家庄铁道大学 Train bearing damage classification and identification method based on multi-branch cross-space attention model
CN117636057B (en) * 2023-12-13 2024-06-11 石家庄铁道大学 Train bearing damage classification and recognition method based on multi-branch cross-space attention model

Similar Documents

Publication Publication Date Title
CN109472352A (en) A deep neural network model tailoring method based on feature map statistical features
US11030528B1 (en) Convolutional neural network pruning method based on feature map sparsification
CN108986470B (en) Travel time prediction method for optimizing LSTM neural network by particle swarm optimization
CN113052211B (en) Pruning method based on characteristic rank and channel importance
CN110119811B (en) Convolution kernel cutting method based on entropy importance criterion model
CN114037844A (en) Global rank-aware neural network model compression method based on filter feature map
CN111063194A (en) A traffic flow forecasting method
CN110555989B (en) Xgboost algorithm-based traffic prediction method
CN108614997B (en) A Remote Sensing Image Recognition Method Based on Improved AlexNet
CN110334580A (en) The equipment fault classification method of changeable weight combination based on integrated increment
CN111461322A (en) A deep neural network model compression method
CN110006650A (en) A fault diagnosis method based on stack pruning sparse denoising autoencoder
CN112070128A (en) A Transformer Fault Diagnosis Method Based on Deep Learning
CN114021811B (en) Traffic prediction method based on attention improvement and computer medium
CN111488917A (en) Garbage image fine-grained classification method based on incremental learning
CN113344182A (en) Network model compression method based on deep learning
CN114154626B (en) Filter pruning method for image classification task
CN109444604A (en) A kind of DC/DC converter method for diagnosing faults based on convolutional neural networks
CN116316573A (en) Short-term power load prediction method based on nonstandard Bayesian algorithm optimization
CN110110915A (en) A kind of integrated prediction technique of the load based on CNN-SVR model
CN117349622A (en) Wind speed prediction method for wind farms based on hybrid deep learning mechanism
CN113947182A (en) Traffic flow prediction model construction method based on double-stage stack graph convolution network
Ma et al. A survey of sparse-learning methods for deep neural networks
CN117726939A (en) Hyperspectral image classification method based on multi-feature fusion
CN117669655A (en) Network intrusion detection deep learning model compression method

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination