CN109472352A - A deep neural network model tailoring method based on feature map statistical features - Google Patents
- Publication number
- CN109472352A (application CN201811440153.2A)
- Authority
- CN
- China
- Prior art keywords
- feature
- layer
- batch
- feature layer
- convolution kernel
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/08—Learning methods
- G06N3/082—Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
Abstract
The invention discloses a deep neural network model pruning method based on statistical features of feature maps. The implementation steps are: Step 1, for each feature layer in the deep neural network model, compute the statistical features of the feature map corresponding to each of its output channels, where a feature layer consists of a convolution layer and an activation layer, or of a convolution layer, a normalization layer and an activation layer; Step 2, from the statistical features of the feature maps of the output channels, compute an evaluation index for each output channel of the feature layer; Step 3, judge the importance of each output channel of the feature layer from its evaluation index, and remove the unimportant output channels together with their corresponding parameters. The invention effectively reduces the dimensionality of the network's feature layers, improves the running efficiency of the network model, and shrinks the network scale, with only a small impact on accuracy.
Description
Technical Field
The invention belongs to the fields of artificial intelligence and pattern recognition, and relates in particular to deep neural network model compression.
Background Art
Deep learning has achieved remarkable results on high-level abstract cognitive problems, bringing artificial intelligence to a new level and providing a technical foundation for high-precision, multi-class object detection, recognition and tracking. However, complex computation and huge resource requirements mean that neural networks can only be deployed on high-performance computing platforms, which limits their application on mobile devices. In 2015, Han's Deep Compression applied network pruning, weight sharing, quantization and encoding to model compression, achieving very good results for model storage and prompting broad research into network compression methods. Current research on deep learning model compression can be divided into the following directions:
(1) Finer model design: more careful and efficient architecture design can greatly reduce model size while retaining good performance.
(2) Model pruning: networks with complex structure perform very well but carry redundant parameters, so for a trained model one usually seeks an effective criterion to judge parameter importance and prunes the unimportant connections or convolution kernels to reduce redundancy.
(3) Kernel sparsification: during training, weight updates are induced toward sparsity. Sparse matrices admit more compact storage, but sparse-matrix operations run inefficiently on hardware platforms and are bandwidth-bound, so the acceleration is not noticeable.
Pruning a pre-trained network model is currently the most widely used approach to model compression. One typically seeks an effective evaluation index to judge the importance of neurons or feature maps, and prunes the unimportant connections or convolution kernels to reduce model redundancy. Li proposed magnitude-based pruning, judging a kernel's importance by the sum of the absolute values of its weights and using that sum as the kernel's evaluation index. Hu defined APoZ (Average Percentage of Zeros), which measures the fraction of zero activations produced by each convolution kernel, as the criterion for whether a kernel is important. Luo proposed an entropy-based pruning method, using an entropy value to judge kernel importance. Anwar adopted random pruning, measuring model performance for each random configuration to find a locally optimal pruning scheme. Tian found through LDA analysis that, for each class, the activations of many convolution kernels are highly uncorrelated, which can be exploited to remove a large number of filters that carry little information without affecting model performance.
In summary, the limitations of existing solutions are as follows:
a. Kernel sparsification only addresses compressed storage of the network; the compression effect at run time is not obvious and the speed is not noticeably improved;
b. Using weight magnitude as the evaluation index considers only the numerical properties of the weights themselves and ignores the data characteristics of the network layers, so the compression effect is limited;
c. Some evaluation indices are computationally complex and consume considerable computing power;
d. Random pruning is too random and easily destroys the parameter characteristics of the network itself.
Therefore, there is a need for a deep neural network compression and acceleration method that is simple to compute, fully accounts for the redundancy in the network, is widely applicable, and does not depend on dedicated acceleration libraries.
Summary of the Invention
To overcome the defects of the prior art, the invention discloses a deep neural network model pruning method based on feature map statistics. Compared with other compression methods, the invention prunes the parameter layers of the network using several statistical features of the network's feature layers as evaluation criteria. It fully considers both the numerical and the statistical characteristics of the network, achieving good compression efficiency while improving running speed.
The technical solution adopted by the invention is as follows:
A deep neural network model pruning method based on statistical features of feature maps, comprising the following steps:
Step 1: for each feature layer in the deep neural network model, compute the statistical features of the feature map corresponding to each of its output channels. A feature layer consists of a convolution layer and an activation layer, or of a convolution layer, a normalization (BatchNorm) layer and an activation layer. The invention only processes feature layers that are followed by another parameter-storing layer (a feature layer or a fully connected layer);
Step 2: from the statistical features of the feature maps of the output channels, compute an evaluation index for each output channel of the feature layer;
Step 3: judge the importance of each output channel of the feature layer from its evaluation index, and remove the unimportant output channels together with their corresponding parameters.
In one epoch of the deep neural network, the samples are fed through the network in batches. In step 1, feature statistics are accumulated batch by batch over the feature maps corresponding to each output channel of every feature layer in the network model. For the i-th feature layer, the statistical features of the feature maps of all its output channels comprise a mean vector M_vi and a standard deviation vector S_vi, computed as follows (a sketch in code is given after step S17):
S11: Initialization. Set N_sum, which counts the samples processed so far, to 0. Let N_batch be the number of batches, N_batch = ceil(total number of samples / N), where N is the number of samples per batch and ceil(·) is the round-up function. Let n_batch, the index of the current batch, be initialized to 1. Initialize the mean vector M_vi and the standard deviation vector S_vi as 1×C_i zero vectors, where C_i is the number of output channels of the i-th feature layer;
S12: Represent the output feature maps X_i of the i-th feature layer for the n_batch-th batch as a four-dimensional tensor of size N×C_i×H_i×W_i, where H_i and W_i are the height and width of the feature maps of the i-th feature layer's output channels. The feature map X_ikj of the j-th output channel for the k-th sample in the batch is then a two-dimensional matrix of size H_i×W_i, k = 1, 2, …, N, j = 1, 2, …, C_i;
S13: Apply a dimension transformation (view or reshape) to X_i to obtain a three-dimensional tensor X*_i of size N×C_i×(H_i·W_i); that is, each feature map X_ikj in X_i is flattened from a two-dimensional matrix into a one-dimensional vector X*_ikj;
S14: Compute the statistical features of X*_ikj, namely its mean M_ikj and standard deviation S_ikj (both scalars):

M_ikj = (1/(H_i·W_i)) · Σ_{m=1}^{H_i·W_i} X*_ikj(m)   (1)

S_ikj = sqrt( (1/(H_i·W_i)) · Σ_{m=1}^{H_i·W_i} (X*_ikj(m) - M_ikj)^2 )   (2)

where X*_ikj(m) denotes the m-th element of X*_ikj;
S15: The values M_ikj, k = 1, 2, …, N, j = 1, 2, …, C_i form an N×C_i mean matrix M_mi, and the values S_ikj, k = 1, 2, …, N, j = 1, 2, …, C_i form an N×C_i standard deviation matrix S_mi;
S16: Average the mean matrix M_mi and the standard deviation matrix S_mi per channel (mean filtering), folding them into the running vectors:

M_vi = (N_sum · M_vi + Σ_{k=1}^{N} M_mik) / (N_sum + N)   (3)

S_vi = (N_sum · S_vi + Σ_{k=1}^{N} S_mik) / (N_sum + N)   (4)

N_sum = N_sum + N   (5)

where M_mik is the row vector of the k-th row of M_mi and S_mik is the row vector of the k-th row of S_mi;
S17: Check whether the current batch is the last one, i.e. whether n_batch = N_batch. If so, terminate the batch loop; the current mean vector M_vi and standard deviation vector S_vi are the statistical features of the feature maps of all output channels of this feature layer. Otherwise update the batch index, n_batch = n_batch + 1, and return to S12.
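As a concrete illustration of steps S11-S17, the following is a minimal PyTorch sketch; the iterable `feature_batches`, which yields the per-batch outputs of layer L_i, is an assumption of the example, and the running update implements equations (3)-(5) as reconstructed above.

```python
import torch

def channel_statistics(feature_batches, C_i):
    """Steps S11-S17: accumulate the per-channel mean vector M_vi and
    standard-deviation vector S_vi of feature layer L_i over all batches."""
    n_sum = 0                                    # N_sum: samples processed so far
    M_v = torch.zeros(C_i)                       # running mean vector (1 x C_i)
    S_v = torch.zeros(C_i)                       # running std vector (1 x C_i)
    for X in feature_batches:                    # X: (N, C_i, H_i, W_i), one batch
        N = X.shape[0]
        X_flat = X.reshape(N, C_i, -1)           # S13: flatten maps to (N, C_i, H_i*W_i)
        M_m = X_flat.mean(dim=2)                 # S14-S15: (N, C_i) mean matrix
        S_m = X_flat.std(dim=2, unbiased=False)  # S14-S15: (N, C_i) std matrix
        # S16: equations (3)-(5), per-channel running average over samples
        M_v = (n_sum * M_v + M_m.sum(dim=0)) / (n_sum + N)
        S_v = (n_sum * S_v + S_m.sum(dim=0)) / (n_sum + N)
        n_sum += N
    return M_v, S_v                              # S17: final statistics
```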
Further, in step 2, the evaluation index T_ij of the j-th output channel of the i-th feature layer is computed as

T_ij = S_vij/β - α/(M_vij + ε)   (6)

where M_vij and S_vij are the j-th elements of the feature layer's mean vector M_vi and standard deviation vector S_vi, i.e. the mean and standard deviation of the feature map of the j-th output channel, and α and β are two scale factors (hyperparameters). α is the threshold on the mean: when the mean M_vij < α, T_ij moves toward negative infinity as M_vij decreases; conversely, when M_vij ≥ α, the α sub-term moves toward zero as M_vij grows. β is the threshold on the standard deviation S_vij: when S_vij < β, T_ij moves toward zero as S_vij decreases; otherwise T_ij moves toward positive infinity as S_vij grows. When the mean M_vij and standard deviation S_vij are small, the α sub-term dominates; when they are large, the β sub-term dominates. There are two ways to determine α and β. The first is to set them from empirical values: the hyperparameter values are swept from low to high, the evaluation index is computed for each setting, the neural network model is pruned accordingly, and the pruned model is retrained to recover accuracy, gradually reaching an optimal result (i.e. the largest number of channels pruned while the drop in network accuracy stays within a set threshold). The second is proportional scaling, i.e. α = μ·Σ_j M_vij / C_i and β = η·Σ_j S_vij / C_i, where μ and η are scale factors in the range (0, 0.4); this method adjusts dynamically to the parameters of the network. ε is a small constant that prevents division by zero.
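Under the reconstructed form of equation (6), the index and the proportionally scaled thresholds can be computed as below; the function names and the default values of μ and η are illustrative assumptions.

```python
import torch

def evaluation_index(M_v, S_v, alpha, beta, eps=1e-8):
    """Equation (6): T_ij = S_vij / beta - alpha / (M_vij + eps).
    The alpha sub-term dominates for channels with small means, the beta
    sub-term for channels with large spread."""
    return S_v / beta - alpha / (M_v + eps)

def scaled_thresholds(M_v, S_v, mu=0.2, eta=0.2):
    """Proportional scaling: alpha = mu * sum(M_v)/C_i, beta = eta * sum(S_v)/C_i,
    with mu, eta in (0, 0.4)."""
    C_i = M_v.numel()
    return mu * M_v.sum() / C_i, eta * S_v.sum() / C_i

# channels whose index is negative form the removal set R_i
alpha, beta = scaled_thresholds(M_v, S_v)   # M_v, S_v from channel_statistics
T = evaluation_index(M_v, S_v, alpha, beta)
R_i = torch.nonzero(T < 0).flatten().tolist()
```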
Further, in step 3, for the i-th feature layer L_i, if the evaluation index T_ij is negative (the α sub-term dominates), the j-th output channel of the feature layer is judged unimportant, and the channel and its corresponding parameters are removed.
Further, for the i-th feature layer L_i, the unimportant channels and their corresponding parameters are removed as follows (a code sketch of steps S32-S33 is given after step S34):
S31: Record the set R_i of channels whose evaluation index T_ij is negative; the number of elements of R_i is denoted length(R_i);
S32: Represent the convolution kernel W_i of feature layer L_i as a four-dimensional tensor of size C_{i-1}×C_i×K_hi×K_wi, with bias B_i a vector of size 1×C_i, where C_{i-1} is the number of output channels of the previous feature layer L_{i-1} (or, if L_i is the first feature layer, the number of input channels of the samples), and K_hi and K_wi are the height and width of the convolution kernel. Remove from W_i the elements of the channels belonging to R_i to form a new kernel W'_i of size C_{i-1}×(C_i - length(R_i))×K_hi×K_wi, and replace W_i with W'_i. Remove from B_i the elements of the channels belonging to R_i to form a new bias B'_i of size 1×(C_i - length(R_i));
S33: If the layer L_{i+1} following feature layer L_i is also a feature layer, represent its convolution kernel W_{i+1} as a four-dimensional tensor of size C_i×C_{i+1}×K_h(i+1)×K_w(i+1), where C_{i+1} is the number of output channels of L_{i+1} and K_h(i+1) and K_w(i+1) are the height and width of W_{i+1}. Remove from W_{i+1} the elements of the channels belonging to R_i to form a new kernel W'_{i+1} of size (C_i - length(R_i))×C_{i+1}×K_h(i+1)×K_w(i+1), and replace W_{i+1} with W'_{i+1};
S34: If the layer L_{i+1} following feature layer L_i is a fully connected layer, represent its parameters V_{i+1} as a matrix of size (C_i×K_hi×K_wi)×C_{i+1}. Remove from V_{i+1} the elements of the channels belonging to R_i to form a new parameter matrix V'_{i+1} of size ((C_i - length(R_i))×K_hi×K_wi)×C_{i+1}, and replace V_{i+1} with V'_{i+1}.
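For steps S32-S33, a hedged PyTorch sketch follows. Note that torch.nn.Conv2d stores its weight as (out_channels, in_channels, K_h, K_w), so the output channels of W_i are sliced on dimension 0 and the matching input channels of W_{i+1} on dimension 1; the in-place mutation of the modules is a simplification for illustration.

```python
import torch

def prune_conv_channels(conv, next_conv, R_i):
    """Steps S32-S33: drop output channels R_i of `conv` and the matching
    input channels of the following convolution `next_conv`."""
    keep = [j for j in range(conv.out_channels) if j not in set(R_i)]
    # new kernel W'_i and bias B'_i for the pruned layer
    conv.weight = torch.nn.Parameter(conv.weight.data[keep].clone())
    if conv.bias is not None:
        conv.bias = torch.nn.Parameter(conv.bias.data[keep].clone())
    conv.out_channels = len(keep)
    # new kernel W'_{i+1}: the next layer loses the same channels on its input
    next_conv.weight = torch.nn.Parameter(next_conv.weight.data[:, keep].clone())
    next_conv.in_channels = len(keep)
```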
After model pruning, the deep neural network needs several retraining iterations to recover its accuracy. The number of iterations depends on which feature layer is pruned and on the evaluation criterion: pruning a feature layer near the input requires fewer iterations, while pruning one near the output requires more. In the evaluation criterion, the higher the values of α and β, the more iterations are needed to recover the network's accuracy.
Beneficial Effects:
Compared with the prior art, the invention makes full use of the statistical features of the deep neural network and constructs an evaluation index based on the mean and standard deviation. It effectively reduces the dimensionality of the network's feature layers, speeds up training, reduces the framework size and the number of weights, and improves the running speed and efficiency of the deep neural network, with only a small impact on accuracy. Specific features and effects include:
First, when constructing the evaluation index for convolution kernel pruning, the invention takes the statistical features of the neural network into account, using the mean and standard deviation of the feature layers to capture both the network's value characteristics and the characteristics within a feature layer. The data features of a feature layer reflect the effect of its convolution kernel parameters, so poorly performing feature maps and their corresponding kernels can be pruned, shrinking the network model framework and compressing the parameter count.
Second, in the criterion formula the hyperparameters α and β can be set flexibly to change the number of channels removed. When the statistical features fall close to 0, the α sub-term dominates; conversely, when they fall far from 0, the β sub-term dominates.
Third, on the basis of fully considering the statistical features of the neural network, the invention proposes a new evaluation criterion with low algorithmic complexity and good performance, which can be deployed in real-time networks and on embedded devices.
Brief Description of the Drawings
Fig. 1 is a flow chart of the invention;
Fig. 2 is a schematic diagram of the structure of a feature layer;
Fig. 3 is a schematic diagram of the internal structure of a feature layer;
Fig. 4 shows examples of how feature layers occur in a neural network: Fig. 4(a) shows multiple consecutive feature layers, Fig. 4(b) a single feature layer;
Fig. 5 is the overall block diagram of the design of the invention;
Fig. 6 is a schematic diagram of model selection and pruning in the invention.
Detailed Description
The invention is described in detail below with reference to specific embodiments, which will help those skilled in the art to further understand it. The examples described with reference to the accompanying drawings are exemplary; they are intended to explain the invention and should not be construed as limiting it.
Fig. 1 is a flow chart of the deep neural network model pruning method based on feature map statistics of this example. By removing particular feature maps and their corresponding convolution kernels, the network model framework is shrunk and the parameter count is compressed. The pruning method comprises the following steps:
(1) For each feature layer in the deep neural network, compute in turn the statistical features of every feature map in the feature layer;
(2) Construct the evaluation criterion from the statistical features;
(3) Remove the feature maps that do not satisfy the evaluation criterion, together with their corresponding convolution kernels.
It should be noted that the method of this example operates on the feature layers of a deep neural network that has already been trained to convergence, where a feature layer is a combination of a convolution layer and an activation layer (also describable as an activation function or nonlinear layer), or of a convolution layer, a BatchNorm (normalization) layer and an activation layer, as shown in Fig. 2. The applicable network types and modules include, but are not limited to, convolution layers, batch normalization layers, activation layers, fully connected layers, and ResNet modules. Fig. 3 shows the internal structure of a feature layer in a deep neural network framework: the i-th feature layer is L_i, its convolution kernel is W_i, and the bias of W_i is B_i. The invention only processes feature layers that are followed by another parameter-storing layer (feature layer or fully connected layer), such as all feature layers except the last one in Fig. 4(a); it does not process feature layers with no parameter-storing layer after them, such as the feature layer in Fig. 4(b). That is, if a feature layer is followed only by a pooling, normalization, activation or softmax layer, it is not operated on. A sketch of how such prunable layers can be identified follows.
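The selection rule above can be expressed as a short helper. This sketch assumes the network is given as a flat list of modules (e.g. `list(model.children())`); the set of "transparent" pass-through layers is chosen for illustration.

```python
import torch.nn as nn

PARAMETERIZED = (nn.Conv2d, nn.Linear)              # layers that store parameters
TRANSPARENT = (nn.BatchNorm2d, nn.ReLU, nn.MaxPool2d,
               nn.AvgPool2d, nn.Softmax)            # pass-through layers

def prunable_layers(layers):
    """Indices of conv layers followed, possibly through normalization,
    activation or pooling, by another parameter-storing layer (Fig. 4(a));
    a trailing feature layer as in Fig. 4(b) is skipped."""
    out = []
    for i, layer in enumerate(layers):
        if not isinstance(layer, nn.Conv2d):
            continue
        for nxt in layers[i + 1:]:
            if isinstance(nxt, PARAMETERIZED):
                out.append(i)                       # a parameterized layer follows
                break
            if not isinstance(nxt, TRANSPARENT):
                break                               # anything else ends the chain
    return out
```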
In one epoch of the deep neural network, the samples are fed through the network in batches, and the statistical features of the feature maps in the network model are accumulated batch by batch. Taking the i-th feature layer L_i as an example, the procedure is as follows:
S31: Initialize the intermediate variables. N_sum, which counts the samples processed so far, is initialized to 0; N_batch is the number of batches, N_batch = ceil(total number of samples / N), where N is the number of samples per batch and ceil(·) is the round-up function; n_batch, the index of the current batch, is initialized to 0; the means M_ikj and standard deviations S_ikj are scalars, k = 1, 2, …, N, j = 1, 2, …, C_i, initialized to 0; the mean matrix M_mi and the standard deviation matrix S_mi are initialized as zero matrices of size (N, C_i), while the mean vector M_vi and standard deviation vector S_vi are initialized as zero vectors of size (1, C_i), where C_i is the number of output channels of feature layer L_i.
S32: Apply view or reshape (dimension transformation) to the output of the i-th feature layer L_i for the n_batch-th batch, changing its size from (N, C_i, H_i, W_i) to (N, C_i, H_i·W_i). This is equivalent to flattening each two-dimensional feature map X_ikj into a one-dimensional representation X*_ikj, where k ∈ [1, N] indexes the samples in feature layer L_i and j ∈ [1, C_i] indexes the channel feature maps of L_i. Note that X_ikj denotes the set of (H_i, W_i) elements of the j-th channel of L_i, while X*_ikj denotes the corresponding set of (H_i·W_i) elements;
S33: Compute the statistical features of the j-th channel feature map X*_ikj of the feature layer, namely its mean M_ikj and standard deviation S_ikj, as in equations (1) and (2). The mean M_ikj and standard deviation S_ikj serve as the statistical features of every feature map X*_ikj in the feature layer. Over one batch, this yields a mean matrix M_mi and a standard deviation matrix S_mi, both of size (N, C_i);
S34: Average the mean matrix M_mi and the standard deviation matrix S_mi per channel, as in equations (3)-(5):

M_vi = (N_sum · M_vi + Σ_{k=1}^{N} M_mik) / (N_sum + N)   (3)

S_vi = (N_sum · S_vi + Σ_{k=1}^{N} S_mik) / (N_sum + N)   (4)

N_sum = N_sum + N   (5)

where N_sum accumulates the sample counts of the first n_batch batches, counting the samples already processed, and N is the number of samples in the n_batch-th batch. M_vi is the result of mean-filtering the mean matrix M_mi, whose k-th row M_mik holds the channel means of the k-th sample; S_vi is the result of mean-filtering the standard deviation matrix S_mi, whose k-th row S_mik holds the channel standard deviations of the k-th sample.
S35: Update the current batch index: n_batch = n_batch + 1. If n_batch = N_batch, terminate the batch loop; otherwise reset the per-batch quantities, i.e. the means M_ikj and standard deviations S_ikj to 0 and the mean matrix M_mi and standard deviation matrix S_mi to zero matrices (the running vectors M_vi and S_vi retain their accumulated values), and return to S32. From the mean vector M_vi and standard deviation vector S_vi obtained by the above batch iteration, the evaluation index T_ij of the j-th channel of feature layer L_i is computed as follows:
T_ij = S_vij/β - α/(M_vij + ε)   (6)

where M_vij and S_vij are the mean and standard deviation of the j-th channel of feature layer L_i, α and β are two hyperparameters, scale factors that bound the ranges over which the mean M_vij and standard deviation S_vij take effect, and ε is a small constant that prevents division by zero. When the hyperparameters are small, proportional scaling is used, i.e. α = μ·Σ_j M_vij / C_i and β = η·Σ_j S_vij / C_i, where μ and η are scale factors in the range (0, 0.4). When the hyperparameters become larger, a step-by-step approximation is adopted: the hyperparameter values are swept from low to high until an optimal result is gradually reached. A sketch of such a sweep follows.
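The step-by-step approximation can be organized as a prune-retrain sweep. Everything in this sketch (the function names, the shared grid for μ and η, the accuracy tolerance) is an assumption of the example, not part of the patent text.

```python
def progressive_prune(model, prune_and_retrain, accuracy,
                      grid=(0.05, 0.1, 0.2, 0.3, 0.4), tol=0.01):
    """Sweep the scale factors from low to high; keep the most aggressive
    pruning whose retrained accuracy stays within `tol` of the baseline."""
    baseline = accuracy(model)
    best = model
    for mu in grid:                              # one factor reused for mu and eta
        candidate = prune_and_retrain(model, mu, mu)
        if baseline - accuracy(candidate) <= tol:
            best = candidate                     # accept the larger pruned set
        else:
            break                                # accuracy loss exceeds tolerance
    return best
```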
The feature maps that do not satisfy the evaluation criterion are removed together with their associated parameters. For a feature layer L_i with C_i output channels, the evaluation indices T_ij are computed and the set R_i of channels with negative T_ij is recorded. The corresponding channels are removed as follows (a sketch of the fully connected case S73 follows the steps):
S71: The convolution kernel W_i of feature layer L_i has size (C_{i-1}, C_i, K_hi, K_wi), where C_{i-1} is the number of output channels of the previous feature layer L_{i-1} (or the number of input channels of the samples, if L_i is the first feature layer), C_i is the number of output channels of the current feature layer L_i, and K_hi, K_wi are the kernel dimensions. Construct a new kernel W'_i of size (C_{i-1}, C_i - length(R_i), K_hi, K_wi), where C_i - length(R_i) is the channel count after subtracting the number of elements of R_i from C_i. Along the output-channel dimension of W_i, slice out the elements of the channels not belonging to R_i, copy them into the new kernel W'_i, and replace W_i with W'_i. Construct a new bias B'_i of size (1, C_i - length(R_i)), copy into it the elements of B_i for the channels not belonging to R_i, and replace B_i with B'_i;
S72: If another feature layer follows L_i, the convolution kernel W_{i+1} of the next feature layer L_{i+1} has size (C_i, C_{i+1}, K_h(i+1), K_w(i+1)), where C_{i+1} is the number of output channels of L_{i+1}. Construct a new kernel W'_{i+1} of size (C_i - length(R_i), C_{i+1}, K_h(i+1), K_w(i+1)), slice out along the input-channel dimension of W_{i+1} the elements of the channels not belonging to R_i, copy them into W'_{i+1}, and replace W_{i+1} with W'_{i+1};
S73: If feature layer L_i is followed by a fully connected layer with C_{i+1} output channels, the corresponding parameter matrix V_{i+1} has size (C_i×K_hi×K_wi, C_{i+1}). Construct a new parameter matrix V'_{i+1} of size ((C_i - length(R_i))×K_hi×K_wi, C_{i+1}), slice out along its first dimension the elements of the channels not belonging to R_i, copy them into V'_{i+1}, and replace V_{i+1} with V'_{i+1}.
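Step S73 groups the rows of the fully connected weight per conv channel. A sketch follows; note that torch.nn.Linear stores its weight as (out_features, in_features), the transpose of the (C_i·K_hi·K_wi) × C_{i+1} layout written above, so rows of V_{i+1} become columns here.

```python
import torch

def prune_fc_after_conv(fc, keep, C_i, K_h, K_w):
    """Step S73: the flattened conv output feeds `fc` as C_i blocks of
    K_h*K_w features; retain only the blocks of the kept channels."""
    W = fc.weight.data.view(fc.out_features, C_i, K_h * K_w)  # group per channel
    fc.weight = torch.nn.Parameter(
        W[:, keep].reshape(fc.out_features, -1).clone())
    fc.in_features = len(keep) * K_h * K_w
```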
Further, after model pruning the deep neural network needs several retraining iterations to recover its accuracy. The number of iterations depends on which feature layer is pruned and on the evaluation criterion: pruning a feature layer near the input requires fewer iterations, while pruning one near the output requires more; and the higher the values of α and β in the criterion, the more iterations are needed to recover accuracy. A sketch of such a retraining loop follows.
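A few epochs of ordinary fine-tuning suffice to recover accuracy; the optimizer, learning rate and loss below are illustrative assumptions.

```python
import torch

def finetune(model, loader, epochs=5, lr=1e-3):
    """Retrain the pruned network for a few epochs to recover accuracy;
    layers pruned near the output, or aggressive alpha/beta, need more."""
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    loss_fn = torch.nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return model
```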
Those of ordinary skill in the art will understand that all or part of the steps of the above implementation can be realized by program instructions, and that the program can be stored in a computer-readable storage medium. The specific implementation of the invention has been described above. It should be understood that the invention compresses feature layers and fully connected layers; therefore, any modification, equivalent replacement or improvement made within the spirit and principles of the invention shall be included within its scope of protection.
Claims (5)
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201811440153.2A | 2018-11-29 | 2018-11-29 | A deep neural network model tailoring method based on feature map statistical features |

Applications Claiming Priority (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201811440153.2A | 2018-11-29 | 2018-11-29 | A deep neural network model tailoring method based on feature map statistical features |
Publications (1)

| Publication Number | Publication Date |
|---|---|
| CN109472352A | 2019-03-15 |
Family ID: 65674220

Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201811440153.2A (Pending) | A deep neural network model tailoring method based on feature map statistical features | 2018-11-29 | 2018-11-29 |

Country Status (1)

| Country | Link |
|---|---|
| CN | CN109472352A (en) |
Cited By (8)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN109978069A | 2019-04-02 | 2019-07-05 | 南京大学 | The method for reducing ResNeXt model over-fitting in picture classification |
| CN110309847A | 2019-04-26 | 2019-10-08 | 深圳前海微众银行股份有限公司 | A model compression method and device |
| CN110232436A | 2019-05-08 | 2019-09-13 | 华为技术有限公司 | Pruning method, device and the storage medium of convolutional neural networks |
| CN110119811A | 2019-05-15 | 2019-08-13 | 电科瑞达(成都)科技有限公司 | A kind of convolution kernel method of cutting out based on entropy significance criteria model |
| CN110119811B | 2019-05-15 | 2021-07-27 | 电科瑞达(成都)科技有限公司 | Convolution kernel cutting method based on entropy importance criterion model |
| CN112036563A | 2019-06-03 | 2020-12-04 | 国际商业机器公司 | Deep learning model insights using provenance data |
| CN117636057A | 2023-12-13 | 2024-03-01 | 石家庄铁道大学 | Train bearing damage classification and identification method based on multi-branch cross-space attention model |
| CN117636057B | 2023-12-13 | 2024-06-11 | 石家庄铁道大学 | Train bearing damage classification and recognition method based on multi-branch cross-space attention model |
Legal Events

| Date | Code | Title |
|---|---|---|
| | PB01 | Publication |
| | SE01 | Entry into force of request for substantive examination |