CN112001431B - Efficient image classification method based on comb convolution - Google Patents
- Publication number: CN112001431B
- Application number: CN202010803534.3A
- Authority: CN (China)
- Prior art keywords: convolution; comb; layer; mapping; layers
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F18/24: Pattern recognition; classification techniques
- G06F18/214: Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06N3/045: Neural networks; combinations of networks
- G06N3/048: Neural networks; activation functions
Abstract
The invention relates to an efficient image classification method based on comb convolution, comprising the following steps: preparing a data set and preprocessing the data; training a comb-convolution-based deep convolutional neural network, which covers: (1) the internal operation of the comb convolution layer: low-cost linear mappings are inserted into the standard convolution mapping in a parallel, interleaved manner, so that part of the feature points on the output feature map are obtained by the standard convolution mapping and the rest by the linear mapping; (2) the interleaved mapping operation between comb convolution layers; (3) the overall network structure of the efficient comb-convolution network model; (4) training; and finally testing the classification performance of the model.
Description
Technical Field
The invention relates to the field of image processing and computer vision, and in particular to an efficient image classification method based on comb convolution.
Background
Image classification is a fundamental problem underlying much of the research in computer vision and a basic component of computer vision tasks. It refers to assigning an input image to one of several predefined classes, i.e. labeling the input image with the corresponding category. In recent years, deep learning has performed image classification through multi-layer, hierarchical information processing. Compared with conventional methods that learn shallow representations or extract low-level image features, deep learning learns the hierarchical structural features of images directly from training data with a fixed network structure and can extract abstract features closer to the high-level semantics of the image, so its classification performance far exceeds that of conventional methods.
Image classification models based on deep convolutional neural networks have a strong capability for hierarchically learning data features, and such models have become the main research direction in the field. However, as the performance of convolutional neural network models continues to approach the accuracy limits of computer vision tasks, the computational resources they consume increase dramatically. For example, AlexNet [1] requires 1.4 × 10^10 FLOPs to process a single 224 × 224 image, and ResNet-152 [2] requires 2.26 × 10^11 FLOPs. Therefore, to meet industrial requirements on computational overhead, real-time performance, and so on, designing an efficient image classification method has become a pressing problem.
Recent studies on accelerating convolution operations have focused on reducing the redundancy of connections between feature-map channels. Previous work, such as ResNeXt [3], Xception [4], ShuffleNet [5], MobileNets [6], Deep Roots [7], CondenseNet [8], and IGCNets [9], consistently improves model efficiency by building structured sparse relationships between the channel dimensions of feature maps. However, this work overlooks the large amount of connection redundancy in the spatial dimension of the feature maps of adjacent layers, which limits further improvement of model efficiency.
References:
[1] Krizhevsky A, Sutskever I, Hinton G. ImageNet Classification with Deep Convolutional Neural Networks. NIPS, 2012.
[2] He K, Zhang X, Ren S, et al. Deep Residual Learning for Image Recognition. 2015.
[3] Xie S, Girshick R, Dollár P, et al. Aggregated Residual Transformations for Deep Neural Networks. 2016.
[4] Chollet F. Xception: Deep Learning with Depthwise Separable Convolutions. 2016.
[5] Zhang X, Zhou X, Lin M, et al. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices. 2017.
[6] Howard A G, Zhu M, Chen B, et al. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. 2017.
[7] Ioannou Y, Robertson D, Cipolla R, et al. Deep Roots: Improving CNN Efficiency with Hierarchical Filter Groups. 2016.
[8] Huang G, Liu S, van der Maaten L, et al. CondenseNet: An Efficient DenseNet using Learned Group Convolutions. 2017.
[9] Zhang T, Qi G J, Xiao B, et al. Interleaved Group Convolutions for Deep Neural Networks. 2017.
Disclosure of Invention
To solve the above problems, the present invention provides an efficient image classification method based on comb convolution. By inserting linear mappings into the standard convolution mapping in a parallel, interleaved manner, the method significantly reduces the computational overhead incurred by the network model during image classification, so that the model classifies images more efficiently while achieving comparable classification accuracy. Moreover, the proposed method supports end-to-end training and is simple and efficient. Compared with the prior art, the method addresses intra-layer computational redundancy, is compatible with image classification methods that reduce inter-layer computational redundancy, and therefore extends well. The technical scheme is as follows:
An efficient image classification method based on comb convolution comprises the following steps:
firstly, preparing a data set and preprocessing the data;
secondly, training the comb convolution-based deep convolutional neural network, which comprises the following aspects:
(1) internal operation of the comb convolution layer: low-cost linear mappings are inserted into the standard convolution mapping in a parallel, interleaved manner, so that part of the feature points on the output feature map are obtained by the standard convolution mapping and the rest by the linear mapping, as follows:
The (l-1)-th layer feature map x^{l-1} is passed simultaneously through the standard convolution mapping f(·) and the linear mapping μ(·) to obtain the l-th layer feature map x^l; the feature map of the j-th channel of the l-th layer is denoted x_j^l, and its mapping relationship with the (l-1)-th layer feature map x^{l-1} is defined as follows:

x_j^l = M_j^l ⊙ f(x^{l-1}, W_j^l) + (1 - M_j^l) ⊙ μ(x^{l-1})    (1)

wherein W_j^l denotes the convolution kernel corresponding to the feature map x_j^l of the j-th channel of the l-th layer, f(x^{l-1}, W_j^l) denotes the standard convolution mapping of the feature map x^{l-1} with the kernel W_j^l, ⊙ denotes element-wise multiplication, and 1 denotes the all-ones matrix with the same resolution as the feature map of the j-th channel of the l-th layer; M_j^l denotes the binary chessboard mask matrix corresponding to x_j^l, defined as follows:

M_j^l(p, q) = 1 if (p + q) mod 2 = 0, and 0 otherwise, with p, q ∈ ℕ    (2)

wherein (p, q) denotes the coordinates of a point on the feature map x_j^l and ℕ denotes the set of non-negative integers; meanwhile, μ(x^{l-1}) denotes the linear mapping operation, defined as follows:

μ(x^{l-1}) = (1 / C_{l-1}) Σ_{c=1}^{C_{l-1}} x_c^{l-1}    (3)

wherein C_{l-1} denotes the total number of channels of the (l-1)-th layer feature map;
when the convolution operation f(·) is taken to be a standard depthwise separable convolution mapping and the linear mapping is taken to be the identity μ(x^{l-1}) = x^{l-1}, the comb convolution layer becomes a comb convolution layer fused with depthwise separable convolution;
the binary chessboard mask matrix is then inverted and the calculation of equation (1) repeated to obtain the mapping relationship between the feature map x_{j+1}^l of the (j+1)-th channel of the l-th layer and the (l-1)-th layer feature map x^{l-1}; by analogy, the mask is inverted for each successive channel, finally yielding the overall mapping relationship between the l-th layer feature map x^l and the (l-1)-th layer feature map x^{l-1};
(2) interleaved mapping operation between comb convolution layers: the comb convolution operations are stacked while the binary mask tensor of the previous layer is repeatedly inverted, i.e.

Mask^{l+1} = 1 - Mask^l    (4)

wherein Mask^l and Mask^{l+1} are the mask tensors of the l-th and (l+1)-th layers and 1 is the all-ones tensor with the same dimensions as Mask^l; the intra-layer comb convolution operation of step (1) is then repeated in each layer;
(3) the overall network structure of the efficient comb-convolution network model: based on the Xception network, it contains 28 convolution layers in total, namely 2 comb convolution layers and 26 comb convolution layers fused with depthwise separable convolution; after its features are extracted by the 2 comb convolution layers, the input image is fed into a series of stacked modules; each module comprises 2 cascaded comb convolution layers fused with depthwise separable convolution and one 2 × 2 max pooling layer, 12 modules are stacked in total, and each module adopts the residual connection of ResNet; finally, after 2 further comb convolution layers fused with depthwise separable convolution, global average pooling is applied and the classification result is output through a fully connected layer; all convolution kernels are 3 × 3 with a sliding stride of 1, and the ReLU activation function is adopted;
(4) training: the image after data preprocessing is input into the constructed efficient comb-convolution network model, the parameters of the neural network model are set, the network weights are updated once after each forward and backward pass, and iteration yields the trained efficient image classification network.
Thirdly, testing the classification performance of the model.
Drawings
FIG. 1 is a schematic diagram of an algorithm
FIG. 2 is a flow chart of TensorFlow-based implementation
FIG. 3 Overall network architecture diagram
FIG. 4 algorithm execution results
Detailed Description
In order to make the technical scheme of the invention clearer, the invention is further explained with reference to the attached drawings. The invention is realized by the following steps:
first, a data set is prepared and data is preprocessed.
The proposed method trains the image classification model on the CIFAR-100 dataset. CIFAR-100 consists of 60,000 32 × 32 color images divided by content into 100 classes (grouped into 20 superclasses), and the intersection between classes is empty. Each class has 600 images: 500 training images and 100 test images. The experiments use a standard data augmentation scheme: each image is zero-padded by 4 pixels on every edge, randomly cropped back to 32 × 32, and horizontally flipped with probability 0.5. During training, 5% of the training images are randomly drawn as the validation set.
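The augmentation scheme above can be sketched as follows. This is an illustrative NumPy version (the function name `augment` is ours, not from the patent), assuming images are H × W × C arrays:

```python
import numpy as np

def augment(image, rng):
    """Standard CIFAR augmentation described above: zero-pad 4 pixels on
    each edge, randomly crop back to 32 x 32, and flip horizontally with
    probability 0.5."""
    padded = np.pad(image, ((4, 4), (4, 4), (0, 0)))  # 40 x 40 x C
    top, left = rng.integers(0, 9, size=2)            # 40 - 32 + 1 = 9 offsets
    crop = padded[top:top + 32, left:left + 32, :]
    if rng.random() < 0.5:
        crop = crop[:, ::-1, :]                       # horizontal flip
    return crop

rng = np.random.default_rng(0)
out = augment(np.ones((32, 32, 3), dtype=np.float32), rng)
```

In a real pipeline this would be applied per image per epoch; here it only illustrates the pad-crop-flip order.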
And secondly, training the comb convolution-based deep convolutional neural network.
The proposed comb convolution principle is divided into two main parts (see FIG. 1); its details are as follows:
1. comb convolutional layer internal operation:
The operation inside the comb convolution layer inserts low-cost linear mappings into the standard convolution mapping in a parallel, interleaved manner, so that half of the feature points on the output feature map are obtained by the standard convolution mapping and the other half by the linear mapping, greatly reducing the cost of the convolution operation. The specific design is as follows:
(1) The (l-1)-th layer feature map x^{l-1} is passed simultaneously through the standard convolution mapping f(·) and the linear mapping μ(·) to obtain the l-th layer feature map x^l. The feature map of the j-th channel of the l-th layer is denoted x_j^l, and its mapping relationship with the (l-1)-th layer feature map x^{l-1} is defined as follows:

x_j^l = M_j^l ⊙ f(x^{l-1}, W_j^l) + (1 - M_j^l) ⊙ μ(x^{l-1})    (1)

wherein W_j^l denotes the convolution kernel corresponding to the feature map x_j^l of the j-th channel of the l-th layer, f(x^{l-1}, W_j^l) denotes the standard convolution mapping of the feature map x^{l-1} with the kernel W_j^l, ⊙ denotes element-wise multiplication, and 1 denotes the all-ones matrix with the same resolution as the feature map of the j-th channel of the l-th layer. M_j^l denotes the binary chessboard mask matrix corresponding to x_j^l, defined as follows:

M_j^l(p, q) = 1 if (p + q) mod 2 = 0, and 0 otherwise, with p, q ∈ ℕ    (2)

wherein (p, q) denotes the coordinates of a point on the feature map x_j^l and ℕ denotes the set of non-negative integers. Meanwhile, μ(x^{l-1}) denotes the linear mapping operation, defined as follows:

μ(x^{l-1}) = (1 / C_{l-1}) Σ_{c=1}^{C_{l-1}} x_c^{l-1}    (3)

wherein C_{l-1} denotes the total number of channels of the (l-1)-th layer feature map.
In particular, when the convolution operation f(·) is a standard depthwise separable convolution mapping and the linear mapping is the identity μ(x^{l-1}) = x^{l-1}, the comb convolution layer becomes a comb convolution layer fused with depthwise separable convolution.
(2) The binary chessboard mask matrix is then inverted and the calculation of equation (1) repeated, giving the mapping relationship between the feature map x_{j+1}^l of the (j+1)-th channel of the l-th layer and the (l-1)-th layer feature map x^{l-1}. By analogy, the mask is inverted for each successive channel, finally yielding the overall mapping relationship between the l-th layer feature map x^l and the (l-1)-th layer feature map x^{l-1}.
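As a concrete illustration of equations (1)-(3), the per-channel computation can be sketched in NumPy as below. This is a naive reference implementation under our own naming (`checkerboard_mask`, `comb_conv_channel`), not the patent's TensorFlow implementation, and it assumes "same" zero padding for the 3 × 3 convolution:

```python
import numpy as np

def checkerboard_mask(height, width, invert=False):
    """Binary chessboard mask of Eq. (2): M(p, q) = 1 when (p + q) is even.
    invert=True gives the complementary mask used by the next channel."""
    p, q = np.meshgrid(np.arange(height), np.arange(width), indexing="ij")
    mask = ((p + q) % 2 == 0).astype(np.float32)
    return 1.0 - mask if invert else mask

def comb_conv_channel(x_prev, weight, mask):
    """One output channel of comb convolution, Eq. (1):
    x_j^l = M * conv(x^{l-1}, W) + (1 - M) * mu(x^{l-1}).
    x_prev: (C, H, W); weight: (C, 3, 3); mask: (H, W)."""
    C, H, W = x_prev.shape
    padded = np.pad(x_prev, ((0, 0), (1, 1), (1, 1)))  # 'same' zero padding
    conv = np.zeros((H, W), dtype=np.float32)
    for p in range(H):
        for q in range(W):
            conv[p, q] = np.sum(padded[:, p:p + 3, q:q + 3] * weight)
    mu = x_prev.mean(axis=0)  # Eq. (3): average over input channels
    return mask * conv + (1.0 - mask) * mu

x = np.ones((2, 4, 4), dtype=np.float32)
w = np.zeros((2, 3, 3), dtype=np.float32)
y = comb_conv_channel(x, w, checkerboard_mask(4, 4))
```

With all-zero weights the convolved half of the output is 0 while the linear-mapping half equals the channel average 1.0, so the checkerboard pattern is directly visible in `y`.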
2. Comb convolutional layer inter-layer interleaving mapping operation:
The comb convolution operations obtained above are stacked while the binary mask tensor of the previous layer is repeatedly inverted, i.e.

Mask^{l+1} = 1 - Mask^l    (4)

wherein Mask^l and Mask^{l+1} are the mask tensors of the l-th and (l+1)-th layers and 1 is the all-ones tensor with the same dimensions as Mask^l. Repeating the intra-layer comb convolution operation of Section 1 in each layer finally yields the efficient comb-convolution network model. An equivalent efficient implementation flow of comb convolution based on the TensorFlow framework is shown in FIG. 2.
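The layer-to-layer mask inversion of equation (4) can be checked with a small sketch (NumPy, our own naming, assuming feature maps of equal spatial size across the stack):

```python
import numpy as np

def layer_masks(num_layers, height, width):
    """Mask tensors for a stack of comb layers following Eq. (4):
    Mask^{l+1} = 1 - Mask^l, starting from the chessboard pattern."""
    p, q = np.meshgrid(np.arange(height), np.arange(width), indexing="ij")
    mask = ((p + q) % 2 == 0).astype(np.float32)
    masks = []
    for _ in range(num_layers):
        masks.append(mask)
        mask = 1.0 - mask  # invert for the next layer
    return masks

masks = layer_masks(4, 8, 8)
```

Adjacent layers thus use complementary masks, so each spatial position is produced by a standard convolution in every other layer rather than always by the cheap linear mapping.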
The overall network structure used in the experiments is shown in FIG. 3. It is inspired by the Xception network and contains 28 convolution layers in total: 2 comb convolution layers and 26 comb convolution layers fused with depthwise separable convolution. After its features are extracted by the 2 comb convolution layers, the input image is fed into a series of stacked modules. Each module comprises 2 cascaded comb convolution layers fused with depthwise separable convolution and one 2 × 2 max pooling layer; 12 modules are stacked in total, and each module adopts the residual connection of ResNet. Finally, after 2 further comb convolution layers fused with depthwise separable convolution, global average pooling is applied and the classification result is output through a fully connected layer. All convolution kernels are 3 × 3 with a sliding stride of 1, and the ReLU activation function is used.
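The stated layer count can be verified with a quick tally of the architecture described above:

```python
# 2 stem comb layers + 12 modules x 2 fused comb layers + 2 final fused layers
stem, modules, layers_per_module, head = 2, 12, 2, 2
total_conv_layers = stem + modules * layers_per_module + head
print(total_conv_layers)  # 28, matching the figure given above
```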
In operation, the image after data preprocessing is input into the constructed neural network model. The initial learning rate is set to 0.1, the Momentum optimization method with parameter 0.9 is adopted, and the weight decay rate is set to 10^-4; when the training process reaches 50% and again at 75% completion, the learning rate is reduced by a factor of 10. Parameters are initialized with the He initialization method, and the training batch size is 100. The network weights are updated once after each forward and backward pass, and the trained efficient image classification network is obtained after 300 iterations.
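The learning-rate schedule can be written as a small helper. This reflects our reading of "reduced by 10 times" at the 50% and 75% marks, assuming the stated 300 iterations serve as the schedule horizon:

```python
def learning_rate(step, total=300, base=0.1):
    """Step schedule described above: divide the base rate of 0.1 by 10
    once 50% of training is reached, and by 10 again at 75%."""
    lr = base
    if step >= 0.5 * total:
        lr /= 10.0
    if step >= 0.75 * total:
        lr /= 10.0
    return lr
```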
Thirdly, testing the classification performance of the model.
During testing, the test-set images are input into the network model; the results are shown in FIG. 4. The invention compares the classification results of the model before and after applying efficient comb convolution. The experiments show that the method effectively reduces the computational complexity of the model while achieving comparable classification accuracy. Moreover, the proposed method supports end-to-end training and is simple and efficient. In addition, it addresses intra-layer computational redundancy, is compatible with image classification methods that reduce inter-layer computational redundancy, and extends well.
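The source of the savings can be made concrete with a rough multiply count: since half of the output feature points come from the linear mapping instead of the k × k convolution, the convolutional cost is roughly halved. This is an illustrative back-of-envelope count of our own, ignoring the small cost of the channel average:

```python
def standard_conv_mults(H, W, C_in, C_out, k=3):
    """Multiplications in a standard convolution layer."""
    return H * W * C_out * C_in * k * k

def comb_conv_mults(H, W, C_in, C_out, k=3):
    """Comb convolution computes only half of the output points with the
    k x k kernel; the other half use the cheap linear mapping."""
    return standard_conv_mults(H, W, C_in, C_out, k) // 2
```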
Claims (3)
1. An efficient image classification method based on comb convolution, comprising the following steps:
firstly, preparing a data set and preprocessing the data;
secondly, training the comb convolution-based deep convolutional neural network, which comprises the following aspects:
(1) internal operation of the comb convolution layer: low-cost linear mappings are inserted into the standard convolution mapping in a parallel, interleaved manner, so that part of the feature points on the output feature map are obtained by the standard convolution mapping and the rest by the linear mapping, as follows:
The (l-1)-th layer feature map x^{l-1} is passed simultaneously through the standard convolution mapping f(·) and the linear mapping μ(·) to obtain the l-th layer feature map x^l; the feature map of the j-th channel of the l-th layer is denoted x_j^l, and its mapping relationship with the (l-1)-th layer feature map x^{l-1} is defined as follows:

x_j^l = M_j^l ⊙ f(x^{l-1}, W_j^l) + (1 - M_j^l) ⊙ μ(x^{l-1})    (1)

wherein W_j^l denotes the convolution kernel corresponding to the feature map x_j^l of the j-th channel of the l-th layer, f(x^{l-1}, W_j^l) denotes the standard convolution mapping of the feature map x^{l-1} with the kernel W_j^l, ⊙ denotes element-wise multiplication, and 1 denotes the all-ones matrix with the same resolution as the feature map of the j-th channel of the l-th layer; M_j^l denotes the binary chessboard mask matrix corresponding to x_j^l, defined as follows:

M_j^l(p, q) = 1 if (p + q) mod 2 = 0, and 0 otherwise, with p, q ∈ ℕ    (2)

wherein (p, q) denotes the coordinates of a point on the feature map x_j^l and ℕ denotes the set of non-negative integers; meanwhile, μ(x^{l-1}) denotes the linear mapping operation, defined as follows:

μ(x^{l-1}) = (1 / C_{l-1}) Σ_{c=1}^{C_{l-1}} x_c^{l-1}    (3)

wherein C_{l-1} denotes the total number of channels of the (l-1)-th layer feature map;
when the convolution operation f(·) is taken to be a standard depthwise separable convolution mapping and the linear mapping is taken to be the identity μ(x^{l-1}) = x^{l-1}, the comb convolution layer becomes a comb convolution layer fused with depthwise separable convolution;
the binary chessboard mask matrix is then inverted and the calculation of equation (1) repeated to obtain the mapping relationship between the feature map x_{j+1}^l of the (j+1)-th channel of the l-th layer and the (l-1)-th layer feature map x^{l-1}; by analogy, the mask is inverted for each successive channel, finally yielding the overall mapping relationship between the l-th layer feature map x^l and the (l-1)-th layer feature map x^{l-1};
(2) interleaved mapping operation between comb convolution layers: the comb convolution operations are stacked while the binary mask tensor of the previous layer is repeatedly inverted, i.e.

Mask^{l+1} = 1 - Mask^l    (4)

wherein Mask^l and Mask^{l+1} are the mask tensors of the l-th and (l+1)-th layers and 1 is the all-ones tensor with the same dimensions as Mask^l; the intra-layer comb convolution operation of step (1) is then repeated in each layer;
(3) establishing a comb convolution-based efficient network model;
(4) training a high-efficiency image classification network;
and thirdly, testing the classification performance of the model.
2. The method of claim 1, wherein the efficient comb-convolution network model is established as follows: based on the Xception network, the model contains 28 convolution layers in total, namely 2 comb convolution layers and 26 comb convolution layers fused with depthwise separable convolution; after its features are extracted by the 2 comb convolution layers, the input image is fed into a series of stacked modules; each module comprises 2 cascaded comb convolution layers fused with depthwise separable convolution and one 2 × 2 max pooling layer, 12 modules are stacked in total, and each module adopts the residual connection of ResNet; finally, after 2 further comb convolution layers fused with depthwise separable convolution, global average pooling is applied and the classification result is output through a fully connected layer; all convolution kernels are 3 × 3 with a sliding stride of 1, and the ReLU activation function is adopted.
3. The method of claim 1, wherein the efficient image classification network is trained as follows: the image after data preprocessing is input into the constructed efficient comb-convolution network model, the parameters of the neural network model are set, the network weights are updated once after each forward and backward pass, and iteration yields the trained efficient image classification network.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202010803534.3A (CN112001431B) | 2020-08-11 | 2020-08-11 | Efficient image classification method based on comb convolution |
Publications (2)

| Publication Number | Publication Date |
|---|---|
| CN112001431A | 2020-11-27 |
| CN112001431B | 2022-06-28 |
Family
- ID: 73464070

Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202010803534.3A (granted as CN112001431B, active) | Efficient image classification method based on comb convolution | 2020-08-11 | 2020-08-11 |

Country Status (1)

| Country | Link |
|---|---|
| CN | CN112001431B |
Families Citing this family (1)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN115170917B | 2022-06-20 | 2023-11-07 | Midea Group (Shanghai) Co., Ltd. | Image processing method, electronic device and storage medium |
Citations (7)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN103489454A | 2013-09-22 | 2014-01-01 | Zhejiang University | Voice endpoint detection method based on waveform morphological characteristic clustering |
| CN110211064A | 2019-05-21 | 2019-09-06 | South China University of Technology | Mixed-degradation text image restoration method based on edge guidance |
| CN110427990A | 2019-07-22 | 2019-11-08 | Zhejiang Sci-Tech University | Art pattern classification method based on convolutional neural networks |
| CN110781912A | 2019-09-10 | 2020-02-11 | Southeast University | Image classification method based on channel-expansion inverse convolution neural network |
| CN110799991A | 2017-06-28 | 2020-02-14 | Magic Leap, Inc. | Method and system for performing simultaneous localization and mapping using a convolutional image transform |
| CN110995382A | 2019-11-29 | 2020-04-10 | Institute of Microelectronics, Chinese Academy of Sciences | Interference-avoidance communication model based on meta-learning and its training method |
| CN111340907A | 2020-03-03 | 2020-06-26 | Qufu Normal University | Text-to-image generation method with adaptive attribute and instance mask embedding |
Non-Patent Citations (1)

| Title |
|---|
| An Kun. "Peak-to-average power ratio optimization for comb spectrum signals based on neural networks" (基于神经网络的梳状谱信号峰均比优化技术). Journal of China Academy of Electronics and Information Technology, Vol. 15, No. 6, 2020-06-20, pp. 566-572. |
Legal Events

| Code | Title |
|---|---|
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| GR01 | Patent grant |