CN114565792A - Image classification method and device based on lightweight convolutional neural network - Google Patents

Image classification method and device based on lightweight convolutional neural network

Info

Publication number
CN114565792A
Authority
CN
China
Prior art keywords
layer
sampling
neural network
feature map
convolutional neural
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210189921.1A
Other languages
Chinese (zh)
Inventor
王天江
张量奇
沈海波
罗逸豪
曹翔
潘蕾西兰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong University of Science and Technology filed Critical Huazhong University of Science and Technology
Priority to CN202210189921.1A
Publication of CN114565792A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/082Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00General purpose image data processing
    • G06T1/20Processor architectures; Processor configuration, e.g. pipelining

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Neurology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an image classification method and device based on a lightweight convolutional neural network, belonging to the field of image classification in deep learning. The method comprises the following steps. S1: constructing a lightweight convolutional neural network model comprising, connected in sequence, a standard convolution layer, a plurality of sampling concatenation units, a global pooling layer and a fully connected layer, where each sampling concatenation unit comprises, connected in sequence, a downsampling layer, a plurality of generic layers and a splicing layer; S2: inputting the picture to be classified into the lightweight convolutional neural network model to obtain a classification result. By first constructing a lightweight convolutional neural network model with few parameters, low computation cost and high inference speed, and then using it for image classification, the method greatly reduces the parameter count and markedly improves the classification speed of the model at a classification accuracy similar to that of existing lightweight convolutional neural network models.

Description

Image classification method and device based on lightweight convolutional neural network
Technical Field
The invention belongs to the field of deep learning image classification, and particularly relates to an image classification method and device based on a lightweight convolutional neural network.
Background
In recent years, Convolutional Neural Networks (CNNs) have been widely used in computer vision tasks such as image classification. To improve classification accuracy, the depth and width of CNN models have grown rapidly, and with them the number of parameters and the amount of computation, which hinders the deployment of CNNs on devices with weak computing power.
For applications on mobile or embedded devices, lightweight models are an important approach. At present, methods that build models from a depthwise convolution layer followed by a pointwise convolution layer, based on Depthwise Separable Convolution (DSC), and that search for an optimal architecture with a Neural Architecture Search (NAS) algorithm have achieved remarkable success, e.g. the MobileNet and EfficientNet series. These models replace standard convolution with depthwise separable convolution, whose parameter count and computation are several times lower than those of standard convolution; after the backbone of the architecture is fixed, a NAS algorithm searches for the optimal model configuration, such as the width of each layer, so that the classification accuracy of the model is maintained while the number of parameters and the amount of computation are reduced.
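To make the scale of the saving concrete, the parameter counts of a standard convolution and a depthwise separable convolution can be compared directly. The helper functions and the 128-channel example below are illustrative, not taken from any model discussed in this document.

```python
def conv_params(c_in, c_out, k):
    """Parameter count of a standard k x k convolution (bias omitted)."""
    return c_in * c_out * k * k

def dsc_params(c_in, c_out, k):
    """Depthwise separable convolution: a depthwise k x k convolution
    (one kernel per input channel) followed by a pointwise 1 x 1
    convolution mixing channels."""
    return c_in * k * k + c_in * c_out

# Illustrative layer: 128 input channels, 128 output channels, 3 x 3 kernels.
std = conv_params(128, 128, 3)   # 147456 parameters
dsc = dsc_params(128, 128, 3)    # 17536 parameters, roughly 8.4x fewer
```

The ratio grows with the number of output channels and the kernel size, which is why the saving is described as "several times" for typical layer widths.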
These methods greatly reduce the number of parameters and the amount of computation, but their inference speed on a GPU is not improved, and is even lower than that of classical networks such as ResNet. The reason is that depthwise separable convolution cannot make good use of GPU resources, and at the same accuracy these networks use more network layers than classical networks, including more nonlinear activation layers and batch normalization layers, all of which slow down model inference.
Disclosure of Invention
In view of the above drawbacks of or needs for improvement in the prior art, the present invention provides an image classification method and apparatus based on a lightweight convolutional neural network. The aim is to design a lightweight neural network model comprising, connected in sequence, a standard convolution layer, a plurality of sampling concatenation units, a global pooling layer and a fully connected layer, where each sampling concatenation unit comprises, connected in sequence, a downsampling layer, a plurality of generic layers and a splicing layer; the picture to be classified is input into the lightweight convolutional neural network model to obtain a classification result. The inference speed of the model is thereby improved while the number of parameters of the convolutional neural network model is reduced.
To achieve the above object, according to one aspect of the present invention, there is provided an image classification method based on a lightweight convolutional neural network, comprising:
S1: constructing a lightweight convolutional neural network model, wherein the lightweight neural network model comprises, connected in sequence: a standard convolution layer, a plurality of sampling concatenation units, a global pooling layer and a fully connected layer, and each sampling concatenation unit comprises, connected in sequence: a downsampling layer, a plurality of generic layers and a splicing layer;
S2: inputting the picture to be classified into the lightweight convolutional neural network model to obtain a classification result, which comprises the following steps:
S21: expanding the channels of the picture to be classified to a specified number of channels using the standard convolution layer, so as to obtain an original feature map;
S22: downsampling the original feature map with the downsampling layer of the first sampling concatenation unit to obtain two groups of first feature maps; performing feature extraction on the two groups of first feature maps with the plurality of generic layers to obtain corresponding second feature maps; splicing the two groups of second feature maps with the splicing layer to obtain a first target feature map; inputting the first target feature map into the adjacent second sampling concatenation unit, which downsamples it, extracts features and splices them to obtain a second target feature map; inputting the second target feature map into the adjacent third sampling concatenation unit, and so on, until the last sampling concatenation unit outputs a final target feature map;
S23: inputting the final target feature map output by the last sampling concatenation unit into the global pooling layer to reduce its dimensionality, and then into the fully connected layer, so that the fully connected layer outputs the classification result corresponding to the picture to be classified.
In one embodiment, the downsampling layer comprises, connected in sequence: a Gaussian downsampling layer, a pointwise convolution layer, a nonlinear activation layer and a batch normalization layer, and it outputs two groups of output feature maps:
the input feature map passes through the Gaussian downsampling layer alone to give one group of output feature maps;
the input feature map passes in sequence through the Gaussian downsampling layer, the pointwise convolution layer, the nonlinear activation layer and the batch normalization layer to give the other group of output feature maps.
In one embodiment, the Gaussian downsampling layer in the downsampling layer performs a convolution operation on the feature map output by the previous layer, and the resolution of its output feature map is half that of the input feature map.
In one embodiment, the pointwise convolution layer in the downsampling layer expands or contracts the channels of the input feature map according to the numbers of input and output feature channels.
In one embodiment, ReLU is used as the activation function in the nonlinear activation layer of the downsampling layer.
In one embodiment, the generic layer comprises, connected in sequence: a depthwise convolution layer, a splicing layer, a pointwise convolution layer, a nonlinear activation layer and a batch normalization layer;
each generic layer receives a group-a set of feature maps and a group-b set of feature maps; the group-b feature maps are input into the depthwise convolution layer and convolved to obtain group-c feature maps; the group-a and group-c feature maps are passed through the splicing layer, the pointwise convolution layer, the nonlinear activation layer and the batch normalization layer of the generic layer to obtain a new group-b set of feature maps;
the input group-b feature maps serve as the new group-a feature maps and, together with the new group-b feature maps, form the input of the next adjacent generic layer.
According to another aspect of the present invention, there is provided an image classification apparatus based on a lightweight convolutional neural network, including:
the building module is used for building a lightweight convolutional neural network model, wherein the lightweight neural network model comprises, connected in sequence: a standard convolution layer, a plurality of sampling concatenation units, a global pooling layer and a fully connected layer, and each sampling concatenation unit comprises, connected in sequence: a downsampling layer, a plurality of generic layers and a splicing layer;
the classification module is used for inputting the picture to be classified into the lightweight convolutional neural network model to obtain a classification result, and specifically for:
expanding the channels of the picture to be classified to a specified number of channels using the standard convolution layer, so as to obtain an original feature map;
downsampling the original feature map with the downsampling layer of the first sampling concatenation unit to obtain two groups of first feature maps; performing feature extraction on the two groups of first feature maps with the plurality of generic layers to obtain corresponding second feature maps; splicing the two groups of second feature maps with the splicing layer to obtain a first target feature map; inputting the first target feature map into the adjacent second sampling concatenation unit, which downsamples it, extracts features and splices them to obtain a second target feature map; inputting the second target feature map into the adjacent third sampling concatenation unit, and so on, until the last sampling concatenation unit outputs a final target feature map;
inputting the final target feature map output by the last sampling concatenation unit into the global pooling layer to reduce its dimensionality, and then into the fully connected layer, so that the fully connected layer outputs the classification result corresponding to the picture to be classified.
Generally, compared with the prior art, the above technical solution conceived by the present invention has the following beneficial effects:
according to the method, a lightweight convolutional neural network model with low parameter, low calculation amount and high inference speed is constructed by modifying a network unit structure, and then the lightweight convolutional neural network model is used for carrying out picture classification; the image classification method has the advantages that the parameter number and the calculated amount of the model are low, the inference speed on the GPU is higher, and the image classification efficiency can be improved by utilizing the model to classify the images.
Drawings
FIG. 1 is a visualization of some of the convolution kernels of a MobileNetV2 downsampling layer (kernel size 1 × 3 × 3) in an embodiment of the present invention;
FIG. 2 is a visualization of some of the convolution kernels of an EfficientNet-B0 non-downsampling layer (kernel size 1 × 5 × 5) in an embodiment of the present invention;
FIG. 3 is a visualization of partially similar feature maps output by adjacent layers 5 and 7 of RegNetX-400MF in an embodiment of the invention;
FIG. 4 is a schematic diagram of a downsampling layer constructed in an embodiment of the present invention;
FIG. 5 is a schematic diagram of a generic layer constructed in one embodiment of the invention;
fig. 6 is a flowchart of an image classification method based on a lightweight convolutional neural network according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
The design criteria for the lightweight convolutional neural network architecture are derived by visualizing feature maps and convolution kernels. It should be noted that the visualized network may be any depthwise separable convolutional neural network with trained parameters; the feature maps and convolution kernels are normalized before visualization, and if the central value of a convolution kernel is negative, the whole kernel is multiplied by -1.
Criterion 1: as shown in fig. 1, most convolution kernels in a downsampling layer (a depthwise convolution layer with stride 2) approximate a Gaussian blur kernel, and blurring before downsampling conforms to the sampling theorem; therefore, in the network constructed by this method, Gaussian blur kernels replace the convolution kernels of the downsampling layer. It should be noted that fig. 1 consists of 3 × 3 convolution kernels (each smallest square represents one weight, 9 weights per kernel), where darker values are larger. Many of these 3 × 3 kernels resemble Gaussian kernels of different variances, so Gaussian convolution kernels can be used in their place.
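A Gaussian blur kernel of the kind proposed as a replacement can be generated directly. The patent does not specify the variance, so the `sigma` value below is an illustrative assumption, as is the helper function itself.

```python
import numpy as np

def gaussian_kernel(size=3, sigma=1.0):
    """Normalized 2-D Gaussian kernel; sigma is an assumed value."""
    ax = np.arange(size, dtype=float) - (size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

k = gaussian_kernel(3, 1.0)
# Like the visualized downsampling kernels in Fig. 1, the kernel is
# symmetric, sums to 1, and peaks at the centre.
```

Because such a kernel is fixed rather than learned, substituting it for the downsampling kernels removes those weights from the trainable parameter count entirely.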
Criterion 2: as shown in fig. 2, most convolution kernels in the depthwise convolution layers other than the downsampling layers approximate an identity kernel (a kernel whose only non-zero value is at the center), so identity kernels replace part of the depthwise convolution kernels in the constructed network, which is equivalent to directly removing those kernels. To preserve accuracy and computational efficiency, no nonlinear activation layer or batch normalization layer is used after the depthwise convolution layer. It should be noted that fig. 2 likewise contains a number of 3 × 3 convolution kernels; many resemble the 3 × 3 identity kernel (1 in the center, 0 elsewhere), and such kernels can be removed outright, reducing the amount of computation.
Criterion 3: as shown in fig. 3, a large proportion of the feature maps output by adjacent layers are similar, repeated feature maps; therefore some feature maps are reused across adjacent layers via identity mapping and do not participate in the next depthwise convolution operation, which keeps the network width unchanged while halving the number of parameters and the amount of computation.
The invention provides an image classification method based on a lightweight convolutional neural network, which comprises the following steps:
S1: constructing a lightweight convolutional neural network model, wherein the lightweight neural network model comprises, connected in sequence: a standard convolution layer, a plurality of sampling concatenation units, a global pooling layer and a fully connected layer, and each sampling concatenation unit comprises, connected in sequence: a downsampling layer, a plurality of generic layers and a splicing layer;
S2: inputting the picture to be classified into the lightweight convolutional neural network model to obtain a classification result, which comprises the following steps:
S21: expanding the channels of the picture to be classified to a specified number of channels using the standard convolution layer, so as to obtain an original feature map;
S22: downsampling the original feature map with the downsampling layer of the first sampling concatenation unit to obtain two groups of first feature maps; performing feature extraction on the two groups of first feature maps with the plurality of generic layers to obtain corresponding second feature maps; splicing the two groups of second feature maps with the splicing layer to obtain a first target feature map; inputting the first target feature map into the adjacent second sampling concatenation unit, which downsamples it, extracts features and splices them to obtain a second target feature map; inputting the second target feature map into the adjacent third sampling concatenation unit, and so on, until the last sampling concatenation unit outputs a final target feature map;
S23: inputting the final target feature map output by the last sampling concatenation unit into the global pooling layer to reduce its dimensionality, and then into the fully connected layer, so that the fully connected layer outputs the classification result corresponding to the picture to be classified.
In one embodiment, the downsampling layer comprises, connected in sequence: a Gaussian downsampling layer, a pointwise convolution layer, a nonlinear activation layer and a batch normalization layer, and it outputs two groups of output feature maps:
the input feature map passes through the Gaussian downsampling layer alone to give one group of output feature maps;
the input feature map passes in sequence through the Gaussian downsampling layer, the pointwise convolution layer, the nonlinear activation layer and the batch normalization layer to give the other group of output feature maps.
As shown in fig. 4, the downsampling layer provided by the present application consists of 4 layers: a Gaussian downsampling layer (a depthwise convolution layer with stride 2 whose kernels are replaced by Gaussian convolution kernels), a pointwise convolution layer, a nonlinear activation layer and a batch normalization layer. The Gaussian downsampling layer convolves the feature map output by the previous layer, and the resolution of its output feature map is half that of its input. The pointwise convolution layer expands or contracts the channels of the input feature map according to the numbers of input and output feature channels. The nonlinear activation layer and the batch normalization layer apply nonlinear activation and normalization, respectively, to the output of the previous layer. Finally, the output feature map of the Gaussian downsampling layer and the output feature map of the batch normalization layer together form the output of the unit. ReLU is used as the activation function in the nonlinear activation layer.
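The four-layer structure just described admits a minimal PyTorch sketch. This is a reading of Fig. 4, not the patented configuration: the class name follows the DownBlock label used later, while the 3 × 3 kernel size, sigma = 1.0 and the channel counts in the usage note are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DownBlock(nn.Module):
    """Sketch of the downsampling layer: a fixed (non-learned) Gaussian
    depthwise convolution with stride 2, followed on one branch by a
    pointwise convolution, ReLU and batch normalization. Both branch
    outputs are returned, as the unit outputs two groups of maps."""

    def __init__(self, in_ch, out_ch, sigma=1.0):
        super().__init__()
        # Build one 3x3 Gaussian kernel and replicate it per channel.
        ax = torch.arange(3, dtype=torch.float32) - 1.0
        xx, yy = torch.meshgrid(ax, ax, indexing="ij")
        g = torch.exp(-(xx ** 2 + yy ** 2) / (2 * sigma ** 2))
        g = (g / g.sum()).expand(in_ch, 1, 3, 3).clone()
        self.register_buffer("gauss", g)  # fixed weights, not trained
        self.pw = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)

    def forward(self, x):
        # Gaussian downsampling: depthwise conv, stride 2 halves resolution.
        down = F.conv2d(x, self.gauss, stride=2, padding=1,
                        groups=self.gauss.shape[0])
        # Second branch: pointwise conv -> ReLU -> batch normalization.
        branch = self.bn(F.relu(self.pw(down)))
        return down, branch
```

For an 8-channel 32 × 32 input and out_ch = 16, the two outputs would be 8 × 16 × 16 and 16 × 16 × 16 feature maps.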
In one embodiment, the Gaussian downsampling layer in the downsampling layer performs a convolution operation on the feature map output by the previous layer, and the resolution of its output feature map is half that of the input feature map.
In one embodiment, the pointwise convolution layer in the downsampling layer expands or contracts the channels of the input feature map according to the numbers of input and output feature channels.
In one embodiment, ReLU is used as the activation function in the nonlinear activation layer of the downsampling layer.
In one embodiment, the generic layer comprises, connected in sequence: a depthwise convolution layer, a splicing layer, a pointwise convolution layer, a nonlinear activation layer and a batch normalization layer;
each generic layer receives a group-a set of feature maps and a group-b set of feature maps; the group-b feature maps are input into the depthwise convolution layer and convolved to obtain group-c feature maps; the group-a and group-c feature maps are passed through the splicing layer, the pointwise convolution layer, the nonlinear activation layer and the batch normalization layer of the generic layer to obtain a new group-b set of feature maps;
the input group-b feature maps serve as the new group-a feature maps and, together with the new group-b feature maps, form the input of the next adjacent generic layer.
As shown in fig. 5, the generic layer consists of 5 layers in total: a depthwise convolution layer, a splicing layer, a pointwise convolution layer, a nonlinear activation layer and a batch normalization layer. According to criterion 2 in step one, half of the convolution kernels in the depthwise convolution layer are removed, i.e. the depthwise convolution layer convolves only the 2nd group of input feature maps. The splicing layer concatenates the 1st group of feature maps input to the unit with the feature maps output by the depthwise convolution layer into 1 group. According to criterion 3 in step one, to reuse feature maps and reduce the number of parameters, the output of the splicing layer is processed by the pointwise convolution layer, which halves the number of channels, and then by the nonlinear activation layer and the batch normalization layer. Finally, the unit outputs the input 2nd group of feature maps as its 1st output group and the result of the pointwise convolution (after activation and normalization) as its 2nd output group.
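Read from Fig. 5, the generic layer admits a compact PyTorch sketch. The assumption that both input groups have the same channel count, and the 3 × 3 depthwise kernel size, are illustrative; the class name follows the HalfConvBlock label used later.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HalfConvBlock(nn.Module):
    """Sketch of the generic layer: only group b passes through the
    depthwise convolution (criterion 2: no activation or normalization
    directly after it); group a is reused by identity mapping
    (criterion 3), roughly halving parameters and computation."""

    def __init__(self, ch):
        super().__init__()
        self.dw = nn.Conv2d(ch, ch, 3, padding=1, groups=ch, bias=False)
        # Pointwise convolution halves the 2*ch concatenated channels.
        self.pw = nn.Conv2d(2 * ch, ch, 1, bias=False)
        self.bn = nn.BatchNorm2d(ch)

    def forward(self, a, b):
        c = self.dw(b)  # group c: depthwise conv of group b only
        new_b = self.bn(F.relu(self.pw(torch.cat([a, c], dim=1))))
        # The input group b becomes the next layer's group a.
        return b, new_b
```

Because the depthwise convolution sees only half of the incoming maps and the identity branch carries no weights, stacking these blocks keeps the network width while spending far fewer parameters per layer.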
In the present invention, a lightweight neural network is constructed using the downsampling layer (DownBlock) and the generic layer (HalfConvBlock), as shown in fig. 6:
1. expand the channels of the input picture to a specified number of channels using standard convolution;
2. downsample and expand the number of channels using a DownBlock, outputting 2 groups of feature maps;
3. repeatedly apply HalfConvBlock for feature extraction, with 2 groups of feature maps as input and output;
4. splice the 2 groups of feature maps into 1 group;
5. repeat operations 2, 3 and 4 three more times;
6. output the final classification result using the fully connected layer.
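The construction steps above can be assembled end to end. The sketch below shows only the data flow of Fig. 6: the unit internals are deliberately simplified stand-ins (plain strided and pointwise convolutions rather than the Gaussian DownBlock and HalfConvBlock of the patent), and the widths, depths and class count are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SampleConcatUnit(nn.Module):
    """Stand-in for one sampling concatenation unit:
    downsample -> two groups -> generic layers -> splice."""

    def __init__(self, in_ch, w, n_generic=2):
        super().__init__()
        self.down = nn.Conv2d(in_ch, w, 3, stride=2, padding=1, bias=False)
        self.pw = nn.Conv2d(w, w, 1, bias=False)
        self.generic = nn.ModuleList(
            nn.Conv2d(2 * w, w, 1, bias=False) for _ in range(n_generic))

    def forward(self, x):
        a = self.down(x)            # group 1: downsampled maps
        b = F.relu(self.pw(a))      # group 2: pointwise branch
        for g in self.generic:      # generic layers pass 2 groups along
            a, b = b, F.relu(g(torch.cat([a, b], dim=1)))
        return torch.cat([a, b], dim=1)   # splicing layer: 2*w channels

class TinyNet(nn.Module):
    """Stem conv -> repeated units -> global pooling -> classifier."""

    def __init__(self, num_classes=10, widths=(16, 32, 64)):
        super().__init__()
        self.stem = nn.Conv2d(3, widths[0], 3, padding=1, bias=False)
        units, in_ch = [], widths[0]
        for w in widths:
            units.append(SampleConcatUnit(in_ch, w))
            in_ch = 2 * w
        self.units = nn.Sequential(*units)
        self.fc = nn.Linear(in_ch, num_classes)

    def forward(self, x):
        x = self.units(self.stem(x))
        x = F.adaptive_avg_pool2d(x, 1).flatten(1)  # global pooling
        return self.fc(x)
```

Each unit halves the spatial resolution and doubles the working width via the final splice, so a 32 × 32 input reaches the classifier as a 4 × 4 map after three units.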
It should be noted that the final model may set the number of layers at each stage according to specific requirements (such as the number of parameters or the amount of computation); for example, the configuration of a network with 1.5M parameters is shown in Table 1 (the splicing layer after each generic layer is omitted from the table).
TABLE 1 (network configuration; reproduced as an image in the original publication)
When the parameter count is limited to 1.5M or 2.5M and accuracy comparable to other models is reached, the model of this patent leads by a large margin in both parameter count and inference speed, as shown in Table 2 (accuracy on ImageNet with 1.5M parameters; GPU inference speed measured on an RTX 6000) and Table 3. The embodiments with 1.5M and 2.5M parameters are provided only to illustrate the technical solution of the present invention; those skilled in the art will understand that modifications or equivalent substitutions, in particular changes only to the number of network layers and channels, do not depart from the spirit and scope of the technical solution.
TABLE 2 (reproduced as an image in the original publication)
TABLE 3 (reproduced as an image in the original publication)
According to another aspect of the present invention, there is provided an image classification apparatus based on a lightweight convolutional neural network, including:
the building module is used for building a lightweight convolutional neural network model, wherein the lightweight neural network model comprises, connected in sequence: a standard convolution layer, a plurality of sampling concatenation units, a global pooling layer and a fully connected layer, and each sampling concatenation unit comprises, connected in sequence: a downsampling layer, a plurality of generic layers and a splicing layer;
the classification module is used for inputting the picture to be classified into the lightweight convolutional neural network model to obtain a classification result, and specifically for:
expanding the channels of the picture to be classified to a specified number of channels using the standard convolution layer, so as to obtain an original feature map;
downsampling the original feature map with the downsampling layer of the first sampling concatenation unit to obtain two groups of first feature maps; performing feature extraction on the two groups of first feature maps with the plurality of generic layers to obtain corresponding second feature maps; splicing the two groups of second feature maps with the splicing layer to obtain a first target feature map; inputting the first target feature map into the adjacent second sampling concatenation unit, which downsamples it, extracts features and splices them to obtain a second target feature map; inputting the second target feature map into the adjacent third sampling concatenation unit, and so on, until the last sampling concatenation unit outputs a final target feature map;
inputting the final target feature map output by the last sampling concatenation unit into the global pooling layer to reduce its dimensionality, and then into the fully connected layer, so that the fully connected layer outputs the classification result corresponding to the picture to be classified.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (7)

1. An image classification method based on a lightweight convolutional neural network, comprising:
S1: constructing a lightweight convolutional neural network model, wherein the lightweight convolutional neural network model comprises, connected in sequence: a standard convolutional layer, a plurality of sampling-splicing units, and a fully connected layer; each sampling-splicing unit comprises, connected in sequence: a downsampling layer, a plurality of generic layers, and a splicing layer;
S2: inputting the picture to be classified into the lightweight convolutional neural network model to obtain a classification result, which specifically comprises:
S21: expanding the channels of the picture to be classified to a specified number of channels by using the standard convolutional layer, so as to obtain an original feature map;
S22: downsampling the original feature map by using the downsampling layer in the first sampling-splicing unit to obtain two groups of first feature maps; then performing feature extraction on the two groups of first feature maps respectively by using the plurality of generic layers to obtain corresponding second feature maps; and splicing the two groups of second feature maps by using the splicing layer to obtain a first target feature map; inputting the first target feature map into the adjacent second sampling-splicing unit, where it is downsampled, feature-extracted, and spliced to obtain a second target feature map; inputting the second target feature map into the adjacent third sampling-splicing unit; and so on, until the last sampling-splicing unit outputs a final target feature map;
S23: inputting the final target feature map output by the last sampling-splicing unit into a global pooling layer to reduce its dimensionality, and then into the fully connected layer, so that the fully connected layer outputs the classification result corresponding to the picture to be classified.
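Step S21's channel expansion with the standard convolutional layer can be sketched as follows. This is a hedged NumPy illustration; the 3×3 kernel size, the 3→16 channel expansion, and the function name `expand_channels` are assumptions, since the claim does not fix these values:

```python
import numpy as np

def expand_channels(img, kernels):
    """Standard 3x3 convolution (stride 1, zero padding) used to expand
    an input picture's channels to a specified number (step S21).

    img: (C_in, H, W); kernels: (C_out, C_in, 3, 3). Shapes are illustrative.
    """
    c_out = kernels.shape[0]
    _, H, W = img.shape
    xp = np.pad(img, ((0, 0), (1, 1), (1, 1)))      # zero-pad H and W by 1
    out = np.zeros((c_out, H, W))
    for i in range(H):
        for j in range(W):
            patch = xp[:, i:i + 3, j:j + 3]          # (C_in, 3, 3)
            out[:, i, j] = (kernels * patch).sum(axis=(1, 2, 3))
    return out

rng = np.random.default_rng(1)
img = rng.standard_normal((3, 8, 8))    # RGB picture to be classified
k = rng.standard_normal((16, 3, 3, 3))  # expand 3 channels -> 16 channels
fm = expand_channels(img, k)            # original feature map
```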
2. The method for image classification based on a lightweight convolutional neural network as claimed in claim 1,
the downsampling layer comprises, connected in sequence: a Gaussian downsampling layer, a pointwise convolutional layer, a nonlinear activation layer, and a batch normalization layer; and the downsampling layer outputs two groups of output feature maps:
one group of output feature maps is obtained by passing the input feature map through the Gaussian downsampling layer;
and the other group of output feature maps is obtained by passing the input feature map sequentially through the Gaussian downsampling layer, the pointwise convolutional layer, the nonlinear activation layer, and the batch normalization layer of the downsampling layer.
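The two output groups of the downsampling layer described in claim 2 can be sketched as follows. A minimal NumPy illustration; the 3×3 Gaussian kernel, the per-feature-map normalization used as a stand-in for batch normalization, and all tensor shapes are illustrative assumptions:

```python
import numpy as np

def gaussian_downsample(x):
    """3x3 Gaussian-kernel convolution with stride 2: halves H and W.

    x: (C, H, W) with even H, W. The exact kernel is not specified in the
    patent; a standard 3x3 Gaussian approximation is assumed here.
    """
    k = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]], dtype=float) / 16.0
    C, H, W = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)), mode="edge")
    out = np.zeros((C, H // 2, W // 2))
    for i in range(H // 2):
        for j in range(W // 2):
            out[:, i, j] = (xp[:, 2*i:2*i + 3, 2*j:2*j + 3] * k).sum(axis=(1, 2))
    return out

def downsampling_layer(x, pw):
    """Returns two output groups: the Gaussian-downsampled map itself, and
    that map passed through pointwise conv -> ReLU -> normalization.

    pw: (C_out, C_in) pointwise (1x1) convolution weights.
    """
    g = gaussian_downsample(x)                  # first output group
    y = np.tensordot(pw, g, axes=([1], [0]))    # pointwise convolution
    y = np.maximum(y, 0.0)                      # ReLU activation
    mu = y.mean(axis=(1, 2), keepdims=True)
    sd = y.std(axis=(1, 2), keepdims=True) + 1e-5
    return g, (y - mu) / sd                     # second output group

rng = np.random.default_rng(2)
x = rng.standard_normal((4, 8, 8))
pw = rng.standard_normal((6, 4))
g1, g2 = downsampling_layer(x, pw)
```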
3. The method for image classification based on a lightweight convolutional neural network as claimed in claim 2,
the Gaussian downsampling layer in the downsampling layer performs a convolution operation on the feature map output by the previous layer, and the resolution of the output feature map is half that of the input feature map.
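Since each Gaussian downsampling halves the resolution as claim 3 states, the resolution of the final target feature map follows directly from the input size and the number of sampling-splicing units. A small sketch (the 224×224 input size and the count of 5 units are illustrative assumptions, not from the patent):

```python
def final_resolution(h, w, num_units):
    # Each sampling-splicing unit's downsampling layer halves H and W once
    for _ in range(num_units):
        h, w = h // 2, w // 2
    return h, w

# e.g. a 224x224 input passed through 5 sampling-splicing units
print(final_resolution(224, 224, 5))  # -> (7, 7)
```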
4. The method for image classification based on a lightweight convolutional neural network according to claim 2,
the pointwise convolutional layer in the downsampling layer expands or contracts the channels of the input feature map according to the numbers of input and output feature channels.
5. The method for image classification based on a lightweight convolutional neural network as claimed in claim 2,
ReLU is used as the activation function in the nonlinear activation layer of the downsampling layer.
6. The method for image classification based on a lightweight convolutional neural network as claimed in claim 1,
the generic layer comprises, connected in sequence: a depthwise convolutional layer, a splicing layer, a pointwise convolutional layer, a nonlinear activation layer, and a batch normalization layer;
each generic layer takes a group-a set of feature maps and a group-b set of feature maps as input; the group-b feature maps are input into the depthwise convolutional layer and convolved to obtain group-c feature maps; the group-a and group-c feature maps are then input into the splicing layer, the pointwise convolutional layer, the nonlinear activation layer, and the batch normalization layer of the generic layer to obtain new group-b feature maps;
and the new group-b feature maps serve as the input of the next adjacent generic layer, acting as the group-a feature maps and the input group-b feature maps of the next round.
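The group-a/b/c data flow of the generic layer in claim 6 can be sketched as follows. A minimal NumPy illustration under stated assumptions: 3×3 depthwise kernels, per-map normalization as a stand-in for batch normalization, and illustrative shapes throughout:

```python
import numpy as np

def depthwise_conv3x3(x, k):
    """Per-channel 3x3 convolution (stride 1, zero padding).

    x: (C, H, W); k: (C, 3, 3), one kernel per channel.
    """
    C, H, W = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros_like(x)
    for i in range(H):
        for j in range(W):
            out[:, i, j] = (xp[:, i:i + 3, j:j + 3] * k).sum(axis=(1, 2))
    return out

def generic_layer(a, b, dw_k, pw_w):
    """Group b -> depthwise conv -> group c; splice groups a and c, then
    pointwise conv -> ReLU -> normalization -> new group b.

    pw_w: (C_b, C_a + C_b) pointwise convolution weights (illustrative).
    """
    c = depthwise_conv3x3(b, dw_k)                    # group-c feature maps
    spliced = np.concatenate([a, c], axis=0)          # splicing layer
    y = np.tensordot(pw_w, spliced, axes=([1], [0]))  # pointwise convolution
    y = np.maximum(y, 0.0)                            # ReLU activation
    mu = y.mean(axis=(1, 2), keepdims=True)
    sd = y.std(axis=(1, 2), keepdims=True) + 1e-5
    return (y - mu) / sd                              # new group-b maps

rng = np.random.default_rng(3)
a = rng.standard_normal((4, 8, 8))
b = rng.standard_normal((4, 8, 8))
new_b = generic_layer(a, b, rng.standard_normal((4, 3, 3)),
                      rng.standard_normal((4, 8)))
```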
7. An image classification device based on a lightweight convolutional neural network, configured to perform the method of any one of claims 1-6, the image classification device comprising:
the building module is used for building a lightweight convolutional neural network model, wherein the lightweight convolutional neural network model comprises, connected in sequence: a standard convolutional layer, a plurality of sampling-splicing units, and a fully connected layer; each sampling-splicing unit comprises, connected in sequence: a downsampling layer, a plurality of generic layers, and a splicing layer;
the classification module is used for inputting the picture to be classified into the lightweight convolutional neural network model to obtain a classification result, which specifically comprises:
expanding the channels of the picture to be classified to a specified number of channels by using the standard convolutional layer, so as to obtain an original feature map;
downsampling the original feature map by using the downsampling layer in the first sampling-splicing unit to obtain two groups of first feature maps; then performing feature extraction on the two groups of first feature maps respectively by using the plurality of generic layers to obtain corresponding second feature maps; and splicing the two groups of second feature maps by using the splicing layer to obtain a first target feature map; inputting the first target feature map into the adjacent second sampling-splicing unit, where it is downsampled, feature-extracted, and spliced to obtain a second target feature map; inputting the second target feature map into the adjacent third sampling-splicing unit; and so on, until the last sampling-splicing unit outputs a final target feature map;
and inputting the final target feature map output by the last sampling-splicing unit into a global pooling layer to reduce its dimensionality, and then into the fully connected layer, so that the fully connected layer outputs the classification result corresponding to the picture to be classified.
CN202210189921.1A 2022-02-28 2022-02-28 Image classification method and device based on lightweight convolutional neural network Pending CN114565792A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210189921.1A CN114565792A (en) 2022-02-28 2022-02-28 Image classification method and device based on lightweight convolutional neural network

Publications (1)

Publication Number Publication Date
CN114565792A true CN114565792A (en) 2022-05-31

Family

ID=81715751

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210189921.1A Pending CN114565792A (en) 2022-02-28 2022-02-28 Image classification method and device based on lightweight convolutional neural network

Country Status (1)

Country Link
CN (1) CN114565792A (en)

Similar Documents

Publication Publication Date Title
CN109964250B (en) Method and system for analyzing images in convolutional neural networks
CN110084274B (en) Real-time image semantic segmentation method and system, readable storage medium and terminal
CN111209910A (en) Systems, methods, and non-transitory computer-readable media for semantic segmentation
CN112990219B (en) Method and device for image semantic segmentation
CN110598788A (en) Target detection method and device, electronic equipment and storage medium
CN115358932B (en) Multi-scale feature fusion face super-resolution reconstruction method and system
CN112183295A (en) Pedestrian re-identification method and device, computer equipment and storage medium
CN112598110B (en) Neural network construction method, device, equipment and medium
CN112419152A (en) Image super-resolution method and device, terminal equipment and storage medium
CN111709415B (en) Target detection method, device, computer equipment and storage medium
CN112419191A (en) Image motion blur removing method based on convolution neural network
CN111882053B (en) Neural network model compression method based on splicing convolution
CN116863194A (en) Foot ulcer image classification method, system, equipment and medium
CN115082928A (en) Method for asymmetric double-branch real-time semantic segmentation of network for complex scene
Hua et al. Dynamic scene deblurring with continuous cross-layer attention transmission
CN111414823B (en) Human body characteristic point detection method and device, electronic equipment and storage medium
CN114494006A (en) Training method and device for image reconstruction model, electronic equipment and storage medium
CN110490876B (en) Image segmentation method based on lightweight neural network
CN111882028A (en) Convolution operation device for convolution neural network
CN111967478A (en) Feature map reconstruction method and system based on weight inversion, storage medium and terminal
CN116029905A (en) Face super-resolution reconstruction method and system based on progressive difference complementation
CN114565792A (en) Image classification method and device based on lightweight convolutional neural network
CN115578561A (en) Real-time semantic segmentation method and device based on multi-scale context aggregation network
CN113688783B (en) Face feature extraction method, low-resolution face recognition method and equipment
CN112529064B (en) Efficient real-time semantic segmentation method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination