CN113065640A - Image classification network compression method based on convolution kernel shape automatic learning - Google Patents
- Publication number: CN113065640A (application CN202110283921.3A)
- Authority: CN (China)
- Legal status: Granted
Classifications
- G06N 3/045: Neural network architectures; combinations of networks
- G06F 18/24: Pattern recognition; classification techniques
- G06N 3/082: Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
Abstract
The invention relates to an image classification network compression method based on automatic learning of convolution kernel shapes, and belongs to the technical field of image processing and recognition. By applying multiple sparse regularization constraints to the parameters at each position of a conventional convolution kernel, the parameters inside the kernel are sparsified during network training, and an automatically learned kernel shape is obtained by setting a clipping threshold according to the desired compression ratio, so that redundant parameters inside the kernel are effectively eliminated. Applied to the image classification task, the method further improves the compression ratio of the network model while maintaining classification accuracy, reduces the parameter count and computational cost of the model, and facilitates deployment on resource-limited mobile devices.
Description
Technical Field
The invention belongs to the technical field of image processing and recognition, and particularly relates to an image classification network compression method based on convolution kernel shape automatic learning.
Background
Image classification and recognition are important subjects in the field of machine vision. Early image recognition methods relied mainly on handcrafted features, and suffered from low accuracy and limited applicability across scenes. With the advent of deep learning, convolutional neural networks have achieved remarkable success in machine vision tasks such as image recognition and object detection; deep neural networks can effectively extract high-level semantic features from images and can even surpass human recognition ability.
However, as network performance has improved, network structures have become increasingly complex, placing ever higher demands on the storage and computing capacity of the hardware and limiting deployment on resource-constrained mobile devices. Large neural network models are often highly redundant: not all parameters contribute effectively to network performance, and excess parameters cause problems such as slow convergence and overfitting. To ease the deployment of neural networks, compression methods have therefore received growing attention.
Parameter pruning is an effective neural network compression approach that reduces model complexity by cutting redundant or unimportant parameters from the network. A published model pruning algorithm based on sparse convolutional neural networks (Computer Engineering, https://doi.org/10.19678/j.issn.1000-3428.0059375) applies sparse regularization constraints to the convolutional and Batch Normalization (BN) layers during training to sparsify the network weights, sets a pruning threshold, prunes the filter channels of low importance, and restores model accuracy through fine-tuning, thereby compressing the convolutional neural network. This is a structured pruning method: pruning is performed with the convolution channel as the smallest unit, so redundant parameters inside the convolution kernel cannot be removed. Smaller pruning units are needed to achieve higher compression ratios.
Disclosure of Invention
Technical problem to be solved
Existing sparsification-based pruning methods for convolutional neural networks sparsify entire convolution channels and cannot eliminate redundant parameters inside a convolution kernel; as a result, the compression ratio of the network model is limited, which ultimately affects image classification accuracy. The invention provides an image classification network compression method based on automatic learning of convolution kernel shapes.
Technical scheme
An image classification network compression method based on convolution kernel shape automatic learning is characterized by comprising the following steps:
step 1: building a convolutional neural network for image classification;
step 2: introducing a coefficient matrix $F$ into the conventional convolution process, and adding to the loss function a weight sparsification regularization term $L_s$, a distribution equalization regularization term $L_d$ and a between-group equalization regularization term $L_g$:

$$L = L_{cls}(W) + \lambda_w \|W\|_2^2 + \lambda_1 L_s(F) + \lambda_2 L_d(F) + \lambda_3 L_g(F)$$

where $\lambda_1$, $\lambda_2$, $\lambda_3$ are coefficients used to balance the terms;
in the network training process, the partial derivatives of the three loss terms with respect to each element $f_{ij}$ of the coefficient matrix are computed and used to update the coefficient matrix $F$ by back-propagation; a sparse coefficient matrix $F$ is obtained when training finishes;
step 3: setting a threshold according to the desired model compression ratio, and removing the convolution kernel parameters at the positions whose $f_{ij}$ falls below the threshold, thereby obtaining the convolution kernel shape of each convolution layer;
step 4: replacing the original conventional convolution kernels with the sparse-shaped kernels obtained by automatic learning, and training the network again to obtain the final image classification neural network model.
Preferably: the convolutional neural network in step 1 is VGG.
Preferably: the convolutional neural network in step 1 is ResNet.
Advantageous effects
The image classification network compression method based on automatic learning of convolution kernel shapes applies multiple sparse regularization constraints to the parameters at each position of a conventional convolution kernel, sparsifies the parameters inside the kernel during network training, and obtains an automatically learned kernel shape by setting a clipping threshold according to the desired compression ratio, so that redundant parameters inside the kernel are effectively eliminated. Applied to the image classification task, the method further improves the compression ratio of the network model while maintaining classification accuracy, reduces the parameter count and computational cost of the model, and facilitates deployment on resource-limited mobile devices.
The method can automatically learn the kernel shape of every convolution layer during network training, so that the receptive field of each kernel adapts to the network depth; at the same time, redundant parameters inside the kernels are eliminated, achieving a good network compression effect.
The automatic kernel-shape learning method also offers a new idea for efficient network architecture design: adding the kernel shape to the search space of neural architecture search (NAS) yields a larger search space, allowing richer target features to be extracted, which benefits network performance.
The method can effectively compress the parameters of the convolutional layers of a neural network; for example, 59.07% of the parameters and 51.91% of the computation of a VGG-16 network can be removed without reducing accuracy, making the image classification network easy to deploy on mobile terminal devices.
The method reduces redundant parameters in the convolutional layers, lowering the overfitting risk of the image classification network and helping to improve its classification accuracy; for example, classification accuracy improves by 0.72% while the VGG-16 network is compressed.
Drawings
FIG. 1 is a diagram of a convolution calculation process incorporating a matrix of convolution kernel coefficients.
Fig. 2 is a diagram of the numbering of the parameters of the 3 × 3 convolution kernel and their division into groups.
Fig. 3 is a flow chart of convolution kernel shape auto-learning.
FIG. 4 is a convolution kernel shape for each convolution layer automatically learned on a CIFAR-10 dataset using a VGG-16 network.
Detailed Description
The invention will now be further described with reference to the following examples and drawings:
the invention provides an image classification network compression method based on convolution kernel shape automatic learning, and the convolution kernel shape automatic learning process is shown in figure 3. The following describes an embodiment of the present invention with reference to an image classification example, but the technical content of the present invention is not limited to the scope, and the embodiment includes the following steps:
(1) Build a convolutional neural network for image classification, and prepare an image dataset with a large number of labeled training samples.
(2) For a convolutional layer in the neural network, the conventional convolution is

$$Y = X * w$$

where $X \in \mathbb{R}^{c \times h \times w}$ is the input feature map tensor, $Y \in \mathbb{R}^{n \times h' \times w'}$ is the output feature map tensor, $w \in \mathbb{R}^{n \times c \times k \times k}$ is the convolution weight parameter, $c$ and $n$ are the numbers of input and output channels, $h$ and $w$ are the height and width of the input feature map, $h'$ and $w'$ are the height and width of the output feature map, $k \times k$ is the convolution kernel size, and $*$ is the image convolution operation.
The convolution kernels of the $n$ output channels are evenly divided into $d$ groups, each containing $n/d$ convolution channels. To sparsify the parameters inside the kernels, a coefficient matrix $F \in \mathbb{R}^{d \times k \times k}$ with elements $f_{ij}$ (group $i$, kernel position $j$) is introduced; the convolution weights $w$ are multiplied point-by-point by the corresponding group's coefficients before convolving with the input $X$:

$$Y = X * (F \odot w)$$

where $\odot$ denotes point-by-point (element-wise) multiplication, with each row of $F$ broadcast over the $n/d$ kernels of its group. The whole calculation process is illustrated in Fig. 1.
The loss function of conventional convolutional neural network training is

$$L = L_{cls}(W) + \lambda_w \|W\|_2^2$$

where $L_{cls}$ is the classification loss term, determined by the input images and predicted labels during training, and $\lambda_w \|W\|_2^2$ is the weight-decay regularization term, which reduces network overfitting.
A coefficient matrix $F$ is thus introduced into the conventional convolution, and the weight sparsification regularization term $L_s$, the distribution equalization regularization term $L_d$ and the between-group equalization regularization term $L_g$ are added to the loss function. During network training, the partial derivatives of these three loss terms with respect to each element $f_{ij}$ of the coefficient matrix are computed and used to update $F$ by back-propagation. A sparse coefficient matrix $F$ is obtained when training finishes.
In order to learn the convolution kernel shape automatically, sparse regularization constraints are applied to the coefficient matrix $F$, giving the loss function

$$L = L_{cls}(W) + \lambda_w \|W\|_2^2 + \lambda_1 L_s(F) + \lambda_2 L_d(F) + \lambda_3 L_g(F)$$

where $L_s$ is the regularization term that sparsifies the convolution kernel weights, $L_d$ is the regularization term that equalizes the distribution of the kernel parameters, $L_g$ is the regularization term that equalizes the parameters between groups, and $\lambda_1$, $\lambda_2$, $\lambda_3$ are coefficients used to balance the terms.
The regularization terms are constructed separately below.

1) $L_s$. Taking a 3 × 3 convolution kernel as an example, the 9 parameters are numbered 1 to 9 and divided into corner ($G_{corner}$), edge ($G_{edge}$) and center ($G_{center}$) groups, as shown in Fig. 2. With row-major numbering,

$$G_{corner} = \{1, 3, 7, 9\}, \qquad G_{edge} = \{2, 4, 6, 8\}, \qquad G_{center} = \{5\}$$

and the sparsity term is

$$L_s = \sum_{i=1}^{d} \sum_{j=1}^{k^2} k_j \, g(f_{ij})$$

where $k_j$ is the coefficient for position $j$; the coefficients form a vector $k \in \mathbb{R}^{1 \times 9}$ used to apply different regularization strengths to the corner, edge and center positions. For example, taking $k = [4, 2, 4, 2, 1, 2, 4, 2, 4]$ applies to the $G_{corner}$ positions 4 times the constraint of $G_{center}$ and to the $G_{edge}$ positions 2 times the constraint of $G_{center}$, thus placing more emphasis on preserving the parameters near the center of the convolution kernel.
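Written out in code, the grouping and per-position constraint weights look as follows; the row-major numbering of positions 1 to 9 is an assumption consistent with the example vector $k$ above:

```python
# Position groups of a 3x3 kernel, positions numbered 1..9 in row-major order
# (an assumed numbering consistent with k = [4, 2, 4, 2, 1, 2, 4, 2, 4]).
G_CORNER = [1, 3, 7, 9]
G_EDGE = [2, 4, 6, 8]
G_CENTER = [5]
K_WEIGHTS = [4, 2, 4, 2, 1, 2, 4, 2, 4]  # corners 4x, edges 2x, center 1x

def group_of(j):
    """Return the name of the group containing kernel position j (1-based)."""
    if j in G_CORNER:
        return "corner"
    if j in G_EDGE:
        return "edge"
    return "center"
```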
Here $g(\cdot)$ is the regularization norm; with the $L_1$ norm,

$$L_s = \sum_{i=1}^{d} \sum_{j=1}^{9} k_j \, |f_{ij}|$$
during the training process, the reason is thatIndependent of the training samples, it can be solved in advance for each coefficient fijFor propagating the update coefficient matrix F backward. The partial derivatives are:
in the formula, sgn (. cndot.) is a sign function.
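A minimal sketch of the weighted $L_1$ sparsity term and its sample-independent subgradient, assuming $F$ is stored as a (d, p) array with one column per kernel position (all names are illustrative):

```python
import numpy as np

def sparsity_loss_and_grad(F, k_weights):
    """L_s = sum_i sum_j k_j * |f_ij| and dL_s/df_ij = k_j * sgn(f_ij).
    F: (d, p) coefficient matrix; k_weights: length-p per-position weights."""
    k = np.asarray(k_weights, dtype=float)
    loss = float(np.sum(k * np.abs(F)))   # k broadcasts over the d rows of F
    grad = k * np.sign(F)
    return loss, grad
```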
2) $L_d$ is a regularization term that equalizes the distribution of the convolution kernel parameters, so that convolution parameters in all directions are taken into account and feature-map offset is avoided. Summing the absolute values of the coefficients at the same position $j$ over all $d$ groups gives

$$F_j = \sum_{i=1}^{d} |f_{ij}|$$

For $G_{corner}$ and $G_{edge}$, the pairwise differences of the corresponding $F_j$ within each group are squared and summed:

$$L_d = \sum_{\substack{j, j' \in G_{corner} \\ j < j'}} (F_j - F_{j'})^2 + \sum_{\substack{j, j' \in G_{edge} \\ j < j'}} (F_j - F_{j'})^2$$
according to the chain-type derivation rule,about eachCoefficient fijThe partial derivatives of (a) are:
in the formula (I), the compound is shown in the specification,
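The distribution equalization term and its chain-rule gradient can be sketched as follows, assuming $F$ is a (d, p) array and the corner and edge position groups are passed in as 0-based column-index lists (names are illustrative):

```python
import numpy as np
from itertools import combinations

def distribution_loss_and_grad(F, position_groups):
    """L_d: for each position group (corner and edge), penalize the pairwise
    squared differences of the column magnitudes F_j = sum_i |f_ij|.
    F: (d, p); position_groups: lists of 0-based column indices."""
    Fj = np.sum(np.abs(F), axis=0)        # F_j, one magnitude per position
    loss = 0.0
    dL_dFj = np.zeros_like(Fj)
    for g in position_groups:
        for a, b in combinations(g, 2):
            diff = Fj[a] - Fj[b]
            loss += diff ** 2
            dL_dFj[a] += 2 * diff         # d(F_a - F_b)^2 / dF_a
            dL_dFj[b] -= 2 * diff
    grad = dL_dFj * np.sign(F)            # chain rule: dF_j/df_ij = sgn(f_ij)
    return loss, grad
```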
3) $L_g$ is a regularization term that equalizes the parameters between groups, preventing the number of retained parameters from differing too much among the $d$ groups. For each of the $d$ groups, sum the absolute values of $f_{ij}$ over the positions of $G_{corner}$, $G_{edge}$ and $G_{center}$ respectively:

$$F_i^{corner} = \sum_{j \in G_{corner}} |f_{ij}|, \qquad F_i^{edge} = \sum_{j \in G_{edge}} |f_{ij}|, \qquad F_i^{center} = \sum_{j \in G_{center}} |f_{ij}|$$

where $F_i^{corner}$ is the sum of absolute values of the coefficients at the $G_{corner}$ positions in the $i$-th group of convolution kernels, and $F_i^{edge}$ and $F_i^{center}$ are defined analogously for the $G_{edge}$ and $G_{center}$ positions.
The values $F_i^{corner}$, $F_i^{edge}$ and $F_i^{center}$ are each differenced pairwise across the $d$ groups, squared and summed:

$$L_g^{corner} = \sum_{i < i'} \left(F_i^{corner} - F_{i'}^{corner}\right)^2, \qquad L_g^{edge} = \sum_{i < i'} \left(F_i^{edge} - F_{i'}^{edge}\right)^2, \qquad L_g^{center} = \sum_{i < i'} \left(F_i^{center} - F_{i'}^{center}\right)^2$$

where $L_g^{corner}$ is the between-group balance loss contributed by the $G_{corner}$ positions, and similarly for the edge and center terms. The total between-group balance loss is

$$L_g = L_g^{corner} + L_g^{edge} + L_g^{center}$$
according to the chain-type derivation rule,about each coefficient fijThe partial derivatives of (a) are:
in the formula (I), the compound is shown in the specification,
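A sketch of the between-group equalization term and its gradient under the same (d, p) layout assumption (names are illustrative):

```python
import numpy as np
from itertools import combinations

def group_balance_loss_and_grad(F, position_groups):
    """L_g: for each position group G, compute F_i^G = sum_{j in G} |f_ij| per
    kernel group i, then penalize pairwise squared differences across the d
    kernel groups. F: (d, p); position_groups: lists of 0-based column indices."""
    d = F.shape[0]
    loss = 0.0
    grad = np.zeros_like(F)
    for G in position_groups:
        FiG = np.sum(np.abs(F[:, G]), axis=1)   # per-kernel-group magnitude in G
        dL_dFiG = np.zeros(d)
        for a, b in combinations(range(d), 2):
            diff = FiG[a] - FiG[b]
            loss += diff ** 2
            dL_dFiG[a] += 2 * diff
            dL_dFiG[b] -= 2 * diff
        grad[:, G] += dL_dFiG[:, None] * np.sign(F[:, G])
    return loss, grad
```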
(3) Train the coefficient matrix $F$ introduced in step (2) sparsely using the loss function above. After training, a sparse $F$ is obtained; set a clipping threshold and remove the convolution kernel parameters at the positions whose $f_{ij}$ falls below the threshold, obtaining the automatically learned convolution kernel shape.
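Step (3) can be sketched as follows; picking the clipping threshold as the quantile of $|f_{ij}|$ that matches the target compression ratio is an assumption, since the text only states that the threshold is set according to the desired ratio:

```python
import numpy as np

def learn_kernel_shape(F, compression_ratio):
    """Zero out the fraction `compression_ratio` of coefficients with the
    smallest magnitudes; the surviving 0/1 mask is the learned kernel shape.
    The quantile-based threshold choice is an illustrative assumption."""
    thresh = np.quantile(np.abs(F), compression_ratio)
    mask = (np.abs(F) > thresh).astype(float)
    return mask, thresh
```

For example, a ratio of 0.6 removes roughly 60% of the coefficients, matching the VGG-16 experiment described in the embodiment.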
(4) Replace the original conventional convolution kernels with the sparse-shaped kernels obtained by automatic learning, and train the network again to obtain the final neural network model. The parameter count and computation of this model are lower than those of the original model, so network compression is achieved while correct classification results are maintained.
Based on the automatic learning method of the convolution kernel shape, the convolution kernel shape which is adapted to each convolution layer can be obtained, and redundant parameters in the convolution kernel can be effectively removed, so that the purpose of model compression is achieved.
Fig. 4 shows the convolution kernel shape of each convolution layer obtained by the VGG-16 network through automatic learning on the CIFAR-10 dataset, with $d = 2$ during learning, i.e. the kernels of the $n$ output channels of each convolution layer are evenly divided into 2 groups, and 60% of the parameters in the network are removed during clipping. The left-most column (scheme one) shows the result of adding only the sparsification regularization term $L_s$ with the same constraint coefficient $k_j$ at every corner, edge and center position. The second column (scheme two) builds on scheme one by using different constraint coefficients $k_j$ for the corner, edge and center positions; compared with the first column, more parameters near the kernel center are preserved. The third column (scheme three) adds the distribution equalization term $L_d$ on top of scheme two; compared with the second column, the resulting kernels balance the convolution parameters in every direction, especially at layers 1 and 10, avoiding feature maps shifted toward one direction. The fourth column (scheme four) adds the between-group equalization term $L_g$ on top of scheme three; compared with the third column, it better balances the parameter counts of the two groups of kernels, especially at layer 5.
Table 1 shows the compression results of the invention. The original VGG-16 model contains 15.0M parameters and 314M operations, with 93.45% accuracy on CIFAR-10. The traditional structured pruning method yields 5.4M parameters and 206M operations, with accuracy dropping to 93.40%. With the proposed automatic kernel-shape learning, each added regularization constraint further compresses the network while improving accuracy, showing that every constraint contributes beneficially to the compression result; the final model (scheme four) has 6.14M parameters and 151M operations, with accuracy improved to 94.17%.

Table 1. Network compression results of the invention

| Model | Parameters | Operations | Accuracy |
|---|---|---|---|
| VGG-16 (original) | 15.0M | 314M | 93.45% |
| Structured pruning | 5.4M | 206M | 93.40% |
| Proposed (scheme four) | 6.14M | 151M | 94.17% |
Claims (3)
1. An image classification network compression method based on convolution kernel shape automatic learning is characterized by comprising the following steps:
step 1: building a convolutional neural network for image classification;
step 2: introducing a coefficient matrix $F$ into the conventional convolution process, and adding to the loss function a weight sparsification regularization term $L_s$, a distribution equalization regularization term $L_d$ and a between-group equalization regularization term $L_g$:

$$L = L_{cls}(W) + \lambda_w \|W\|_2^2 + \lambda_1 L_s(F) + \lambda_2 L_d(F) + \lambda_3 L_g(F)$$

where $\lambda_1$, $\lambda_2$, $\lambda_3$ are coefficients used to balance the terms;

in the network training process, the partial derivatives of the three loss terms with respect to each element $f_{ij}$ of the coefficient matrix are computed and used to update the coefficient matrix $F$ by back-propagation; a sparse coefficient matrix $F$ is obtained when training finishes;

step 3: setting a threshold according to the desired model compression ratio, and removing the convolution kernel parameters at the positions whose $f_{ij}$ falls below the threshold, thereby obtaining the convolution kernel shape of each convolution layer;

step 4: replacing the original conventional convolution kernels with the sparse-shaped kernels obtained by automatic learning, and training the network again to obtain the final image classification neural network model.
2. The image classification network compression method based on convolution kernel shape automatic learning of claim 1, characterized in that the convolutional neural network in step 1 is VGG.
3. The image classification network compression method based on convolution kernel shape automatic learning of claim 1, characterized in that the convolutional neural network in step 1 is ResNet.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110283921.3A CN113065640B (en) | 2021-03-17 | 2021-03-17 | Image classification network compression method based on convolution kernel shape automatic learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113065640A true CN113065640A (en) | 2021-07-02 |
CN113065640B CN113065640B (en) | 2024-01-09 |
Family
ID=76560834
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110283921.3A Active CN113065640B (en) | 2021-03-17 | 2021-03-17 | Image classification network compression method based on convolution kernel shape automatic learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113065640B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104200224A (en) * | 2014-08-28 | 2014-12-10 | 西北工业大学 | Valueless image removing method based on deep convolutional neural networks |
CN107609525A (en) * | 2017-09-19 | 2018-01-19 | 吉林大学 | Remote Sensing Target detection method based on Pruning strategy structure convolutional neural networks |
WO2018028255A1 (en) * | 2016-08-11 | 2018-02-15 | 深圳市未来媒体技术研究院 | Image saliency detection method based on adversarial network |
KR20180052063A (en) * | 2016-11-07 | 2018-05-17 | 한국전자통신연구원 | Convolution neural network system and operation method thereof |
Non-Patent Citations (2)
Title |
---|
S. Lin et al., "Accelerating convolutional networks via global & dynamic filter pruning", Proc. Joint Conf. Artif. Intell. |
Zhang Ke, Su Yu, Wang Jingyu, Wang Xianyu, Zhang Yanhua, "Research on an environmental sound classification system based on fused features and convolutional neural networks", Journal of Northwestern Polytechnical University, no. 01 |
Also Published As
Publication number | Publication date |
---|---|
CN113065640B (en) | 2024-01-09 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |