CN110197217B

CN110197217B - Image classification method based on deep interleaving fusion packet convolution network

Info

Publication number: CN110197217B
Application number: CN201910437505.7A
Authority: CN
Inventors: 王雪松; 吕恩辉; 程玉虎
Original assignee: China University of Mining and Technology CUMT
Current assignee: China University of Mining and Technology CUMT
Priority date: 2019-05-24
Filing date: 2019-05-24
Publication date: 2020-12-18
Anticipated expiration: 2039-05-24
Also published as: CN110197217A

Abstract

The invention discloses an image classification method based on a depth interleaving fusion packet convolution network, which comprises the steps of firstly selecting image data to be classified, and carrying out image preprocessing to obtain amplified image data; then, constructing a template module by using the same topological structure and a staggered fusion strategy; then, a small convolution kernel and a grouping convolution strategy are used for constructing a structured sparse convolution kernel; and finally, forming a deep interleaving fusion packet convolution network by stacking a plurality of template modules and a structured sparse convolution kernel. And training the deep interleaved fusion packet convolution network by adopting a small batch random gradient descent method with momentum coefficients, analyzing to obtain network parameters, and finishing image classification. The method can reduce the model and improve the calculation efficiency on the premise of no loss of performance.

Description

Image classification method based on deep interleaving fusion packet convolution network

Technical Field

The invention relates to a method for realizing image classification by using a deep interleaving fusion packet convolution network model, belonging to the field of pattern recognition.

Background

The image classification achieved by using the deep convolutional network has become a research hotspot in the field of computer vision at present, and unlike the traditional convolutional neural network, the deep convolutional network generally has very deep nonlinear convolutional kernel stacking. In the image classification task, in order to achieve higher classification accuracy, the deep convolutional network needs to be trained by large-scale data, and the depth and the width are amplified to different degrees. Taking the example of the depth of the amplified network, the classic network VGG constructs a 19-layer deep convolutional network by stacking modules of the same shape and fully connected layers. Although experimental results show that the classification performance of the network can be effectively improved by the design strategy, training problems such as overfitting and gradient disappearance can be caused along with the deep amplification of the network, so that information behind the network cannot be well fed back to the front, and the performance is reduced. Based on this, various new convolutional neural networks have been proposed in succession for better performance. He demonstrates a simple and effective strategy for building deep convolutional networks by referencing VGG: modules with the same topological structure are stacked, and a Deep Residual Network, ResNet, is provided. Based on a Residual module structure, Zagoruyko provides a new Network structure Wide Residual Network, Wide ResNet, by reducing the Network depth and increasing the Network width, and also obtains unsophisticated classification performance. Experimental results show that in the image classification method based on the deep convolutional network, the classification performance of the network can be remarkably improved by reconstructing the structure of the convolutional neural network in a manner of easily learning the deep features. However, for a proper deep convolution network, with the increase of the structure scale, not only the parameter number and the computational complexity of the network are increased, but also the redundancy problem in the basic convolution unit of the neural network is caused.

Disclosure of Invention

The purpose of the invention is as follows: in order to overcome the defects in the prior art, the invention provides an image classification method based on a deep interleaving fusion packet convolution network, which can reduce the model and improve the calculation efficiency on the premise of lossless performance.

The technical scheme is as follows: an image classification method based on a depth interleaving fusion packet convolution network comprises the following steps:

step 1: selecting image data to be classified, and carrying out image preprocessing to obtain amplified image data;

step 2: constructing a template module by using the same topological structure and a staggered fusion strategy;

and step 3: constructing a structured sparse convolution kernel by using a small convolution kernel and a grouping convolution strategy;

and 4, step 4: forming a deep interleaving fusion packet convolution network by stacking a plurality of template modules and a structured sparse convolution kernel; if the spatial sizes of the feature maps output by the template modules are the same, the template modules share the same hyper-parameter; the width of the template module is doubled each time the spatial dimension of the feature map is double down sampled;

and 5: and training the deep interleaved fusion packet convolution network by adopting a small batch random gradient descent method with momentum coefficients, analyzing to obtain network parameters, and finishing image classification.

Further, the template module comprises three layers: scaling layer, bottleneck module, pooling layer, said step 2 includes the following concrete step:

step 2.1: in the scaling layer, down-sampling is directly performed by a convolution layer with a step size of 1 and a convolution kernel of 1 × 1, and the scaling layer is repeatedly performed every time the spatial size of the feature map is halved;

step 2.2: the template module comprises two branch channels, each branch channel is formed by connecting a plurality of bottleneck modules in series, and the two branch channels are connected in a staggered manner and are spliced and fused by characteristic diagrams;

step 2.3: the output of the two branch channels and the splicing and fusing result of the last layer of feature map are subjected to integral fusion output, and the establishment of the template module is completed; if the spatial sizes of the feature maps output by the template modules are the same, the template modules share the same hyper-parameter; the width of the template module is doubled each time the spatial size of the feature map is double down sampled.

Further, the step 3 comprises the following specific steps:

step 3.1: in the template module, the bottleneck module is composed of a stack of 1 × 1 convolutional layers and 3 × 3 convolutional layers;

step 3.2: given that the number of output feature maps of the previous layer is N, that is, the number of channels is equal to N, and the number of packets of the packet convolution is G, the operation process of the packet convolution layer includes:

step 3.2.1: dividing the channel into G parts, wherein each group corresponds to N/G branch channels and is independently connected with the group, and only executing group convolution on a 3 multiplied by 3 convolution layer in the bottleneck module;

step 3.2.2: fusing the packet convolutions of different branch channels and transmitting to the next layer of convolution;

step 3.2.3: and repeating the step 3.2.1 and the step 3.2.2 until the output layer of the network.

Further, the step 4 comprises the following specific steps:

step 4.1: after an image is input into a first template module, firstly, convolution operation with the step length of 1 and the convolution kernel size of 1 multiplied by 1 is carried out on the input image, and the result is used as a scaling layer; for a convolution layer with a convolution kernel size of 3 × 3, each boundary of the input feature map is filled with a 0 pixel value to keep the feature map size fixed;

step 4.2: sequentially using pooling layers with the size of 2 multiplied by 2 and the step length of 2 to carry out down-sampling on the output of each template module, and adding a random inactivation layer to prevent overfitting; the first pooling layer of the deep interleaving fusion packet convolution network adopts maximum pooling, and the rest is uniform pooling;

step 4.3: after the tail end of the last template module performs pooling, a classifier is attached.

Has the advantages that: the method combines the advantages of a high-degree modularization and a lightweight convolution network structure, and has the following advantages: (1) the highly modularized network structure design reduces the free selection of hyper-parameters and achieves the effect of model reduction; (2) on the premise of lossless performance, the lightweight network formed by the structured sparse convolution kernel further simplifies the network structure and reduces the parameter number.

Drawings

FIG. 1 is a schematic structural view of a template module structure;

FIG. 2 is a schematic diagram of the packet convolution principle;

FIG. 3 is a schematic diagram of a deep-interleaved converged packet convolutional network;

fig. 4 shows the classification error rate and the calculation complexity variation for different packet numbers.

Detailed Description

The invention is further explained below with reference to the drawings.

An image classification method based on a depth interleaving fusion packet convolution network comprises the following steps:

step 1: and selecting image data to be classified, and carrying out image preprocessing to obtain the amplified image data. Taking CIFAR data as an example, the image preprocessing comprises the following steps:

step 1.1: the boundary of each image in the data is filled with 40 pixels and expanded to 40 × 40 pixels.

Step 1.2: image blocks of size 32 x 32 were randomly cropped and half of the images were randomly horizontally mirrored for data amplification.

Step 1.3: normalized image pre-processing was performed using channel-subtracted means.

Step 2: and constructing a template module by using the same topological structure and a staggered fusion strategy. Wherein, the template module comprises three layers: scaling layer, bottleneck module, pooling layer, as shown in fig. 1, the construction of the template module comprises the following specific steps:

step 2.1: in the scaling layer, down-sampling is directly performed by the convolution layer having a step size of 1 and a convolution kernel of 1 × 1, and the scaling layer is repeatedly performed every time the spatial size of the feature map is halved.

Step 2.2: the template module comprises two branch channels, each branch channel is formed by connecting a plurality of bottleneck modules in series, and the two branch channels are connected in a staggered mode and are subjected to characteristic diagram splicing and fusion. The staggered connection is used for realizing the cross-channel interaction and transmission of the characteristic information between different convolution layers; the characteristic diagram splicing and fusion are used for optimizing the flow of information, improving the diversity of the series connection integrated components and improving the expression capacity of the network.

Step 2.3: and the output of the two branch channels and the splicing and fusion result of the last layer of feature map are subjected to integral fusion output to complete the establishment of the template module.

And step 3: and constructing a structured sparse convolution kernel by using a small convolution kernel and a grouping convolution strategy. Wherein, the step 3 comprises the following specific steps:

step 3.1: in the template module, the bottleneck module is composed of a stack of 1 × 1 convolutional layers and 3 × 3 convolutional layers.

Step 3.2: as shown in fig. 2, given that the number of output feature maps of the previous layer is N, i.e. the number of channels is equal to N, and the number of packets of the packet convolution is G, the operation process of the packet convolution layer includes:

step 3.2.1: the channels are divided into G parts, each group is independently connected with the group corresponding to N/G branch channels, and the group convolution is executed on a 3 multiplied by 3 convolution layer in the bottleneck module.

Step 3.2.2: and fusing the grouped convolutions of different branch channels, and transmitting the fused grouped convolutions to the next layer of convolution so as to realize information exchange between different groups and different convolution layers.

And 4, step 4: a Deep interleaved fused packet convolutional Network (DN-IFR) is formed by stacking a plurality of template modules and structured sparse convolution kernels. The present embodiment is implemented for the practice of CIFAR data, and the constructed convolution network is composed of a plurality of template modules, and only 3 template modules with the same structure are illustrated in fig. 3. The construction process comprises the following specific steps:

step 4.2: down-sampling the output of each template module by using pooling layers with the size of 2 x 2 and the step size of 2 in sequence, and adding a random inactivation layer (Dropout) to prevent overfitting; the first pooling layer of the deep interleaving fusion packet convolution network adopts maximum pooling, and the rest is uniform pooling;

step 4.3: after performing pooling at the tail end of the last template module, a classifier (softmax) is added, and the feature map sizes in the three template modules are 32 × 32, 16 × 16 and 8 × 8, respectively.

The convolutional network of the present invention consists of a series of template modules that have the same topology and follow two simple rules: 1) if the spatial sizes of the feature maps output by the template modules are the same, the template modules share the same hyper-parameters (width, convolution kernel size, etc.); 2) the width of the template module (i.e., the number of channels) is doubled each time the spatial size of the feature map is double down sampled. According to the rule, only one template module needs to be designed, and all modules in the network can be determined. This simple, highly modular network architecture design allows for the scale of the transformation to be adjusted at will without specially tailored modules, and reduces the freedom of choice of hyper-parameters, acting as a reduced model. A highly modular deep network can be formed by repeatedly stacking template modules, the performance of the deep network in turn typically depends on the capacity of each serially integrated component and how many template modules, and the depth of the network helps to guarantee how many template modules.

In this embodiment, consider that there is one image x₀Through the convolution network, the network comprises l layers, and each layer of the network can realize nonlinear transformation H_l(. cndot.). Here the non-linear transformation H_l(. cndot.) is defined as a composite function of three successive operations: the Convolution operation (Convolution), Batch regularization (BN), and Linear rectification activation function (ReLU) are included in this order.

In order to verify the effectiveness and superiority of the deep interleaving fusion packet convolution network, through a CIFAR data experiment, DN-IFR of the invention is compared with the existing ResNet and Wide ResNe. The CIFAR dataset contains CIFAR-10 and CIFAR-100, both datasets having 60000 images of 32X 32 pixels, of which 50000 images were used for training and 10000 images were used for testing. The CIFAR-10 dataset has 10 classes, each class containing 6000 images, 5000 for training and 1000 for testing. The CIFAR-100 dataset has 100 classes, each containing 600 images, 500 for training and 100 for testing.

In conjunction with fig. 4, the effect of different packet numbers on the DN-IFR classification error rate and computational complexity is analyzed, as shown in fig. 4. First, the number of branch paths L of the network is fixed to 2, and the number of network layers is fixed to 22. As can be seen from fig. 4, when the number of packets is (4,4,4), DN-IFR can achieve a lower classification error rate than the original model. However, as the number of packets increases, the computational complexity of the network tends to decrease and compress approximately 50% of the computational load, but the classification error rate tends to increase. Therefore, (4,4,4) will be adopted as the standard packet number in the subsequent experiments on the premise of no damage to the network performance.

Analyzing the influence of different branch channel numbers on the classification error rate and the calculation complexity of DN-IFR, as shown in Table 1, wherein "-" indicates that no relevant calculation is performed, "fail" indicates that the calculation exceeds the memory of the computer, and "FLOPs" indicates the calculation complexity. From table 1, it can be seen that: (1) when the networks have the same depth, the network computation complexity tends to increase along with the increase of the number of the branch channels, and the lowest classification error rate is positioned on the network with 3 branch channels; (2) as networks increase with depth, the computational complexity of all networks tends to increase, but the classification error rate tends to decrease except for networks with 2-branch channels. This shows that networks with 2 branch channels can achieve a balance between the number of branch channels and the depth, with approximately the same computational complexity.

TABLE 1 DN-IFR Performance comparison (CIFAR-10) for different numbers of tributary channels

To evaluate the performance comparison of various deep convolutional networks for image classification, as shown in table 2, where "width × 4" represents a network width factor of 4 and "L" represents the number of branch channels. From table 2, it can be seen that: (1) in CIFAR-10 data, DN-IFR of 34 layers is compressed by 10 times and 2 times in parameter quantity compared with Wide ResNet of 40 layers and ResNet of 110 layers respectively, and the lowest classification error rate is obtained; (2) DN-IFR reduced the error rate by nearly 10% compared to ResNet on CIFAR-100 data. Therefore, the DN-IFR is adopted to classify the image data, so that the network parameter quantity is reduced, and the classification accuracy is improved.

TABLE 2 comparison of Performance of various deep convolutional networks for image classification

The foregoing is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, various modifications and decorations can be made without departing from the principle of the present invention, and these modifications and decorations should also be regarded as the protection scope of the present invention.

Claims

1. An image classification method based on a depth interleaving fusion packet convolution network is characterized by comprising the following steps:

and 4, step 4: forming a deep interleaved fused packet convolutional network by stacking a plurality of template modules and a structured sparse convolutional kernel; if the spatial sizes of the feature maps output by the template modules are the same, the template modules share the same hyper-parameter; the width of the template module is doubled each time the spatial dimension of the feature map is double down sampled;

and 5: training the deep interleaved fusion packet convolution network by adopting a small batch random gradient descent method with momentum coefficients, analyzing to obtain network parameters, and finishing image classification;

the template module comprises three layers: scaling layer, bottleneck module, pooling layer, said step 2 includes the following concrete step:

step 2.3: and the output of the two branch channels and the splicing and fusing result of the last layer of feature map are subjected to integral fusion output, and the establishment of the template module is completed.

2. The image classification method based on the deep interleaved fused packet convolutional network of claim 1, wherein the step 3 comprises the following specific steps:

3. The image classification method based on the deep interleaved fused packet convolutional network of claim 2, wherein the step 4 comprises the following specific steps: