CN112258431B - Image classification model based on mixed depth separable expansion convolution and classification method thereof - Google Patents


Info

Publication number
CN112258431B
Authority
CN
China
Prior art keywords
layer
convolution
depth separable
image classification
mixed depth
Prior art date
Legal status
Active
Application number
CN202011032957.6A
Other languages
Chinese (zh)
Other versions
CN112258431A (en)
Inventor
闫超
黄俊洁
陶陶
Current Assignee
Chengdu Dongfang Tiancheng Intelligent Technology Co ltd
Original Assignee
Chengdu Dongfang Tiancheng Intelligent Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Chengdu Dongfang Tiancheng Intelligent Technology Co ltd filed Critical Chengdu Dongfang Tiancheng Intelligent Technology Co ltd
Priority to CN202011032957.6A
Publication of CN112258431A
Application granted
Publication of CN112258431B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00: Image enhancement or restoration
    • G06T5/20: Image enhancement or restoration by the use of local operators
    • G06T5/30: Erosion or dilatation, e.g. thinning
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/243: Classification techniques relating to the number of classes
    • G06F18/2431: Multiple classes
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G06N3/084: Backpropagation, e.g. using gradient descent
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/10: Image acquisition modality
    • G06T2207/10004: Still image; Photographic image
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/20: Special algorithmic details
    • G06T2207/20081: Training; Learning
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/20: Special algorithmic details
    • G06T2207/20084: Artificial neural networks [ANN]

Abstract

The invention discloses an image classification model based on mixed depth separable expansion convolution, constructed as follows: a depth separable expansion convolution layer, a feature connecting layer, a convolution layer, a batch normalization layer and a rectified linear unit layer are packaged, from front to back, into a mixed depth separable expansion convolution module; a convolution layer, a batch normalization layer, a rectified linear unit layer, the mixed depth separable expansion convolution modules, maximum pooling layers, a flattening layer, a random deactivation layer and a fully connected layer are packaged, from front to back, to form the backbone network of the deep neural network; the parameter weights of the backbone network are randomly initialized, and the number of iterations and the momentum parameter of the batch normalization layer are preset; the network model parameters are then optimized by stochastic gradient descent, and the iterative calculation is repeated until the loss value converges, yielding the optimal network model. Through this scheme, the method offers a simple structure, a small computational workload and accurate classification.

Description

Image classification model based on mixed depth separable expansion convolution and classification method thereof
Technical Field
The invention relates to the technical field of image processing, in particular to an image classification model based on mixed depth separable expansion convolution and a classification method thereof.
Background
Images objectively depict natural objects and are an important resource through which people learn about the world; technicians can obtain useful information by analyzing images and develop related algorithms. Image classification belongs to the computer vision field and is widely applied in areas such as medicine and food safety.
At present, the main idea of prior-art image classification algorithms is to assign corresponding labels to a set of images to be classified; to a computer, an image is a pixel matrix, and effective information is extracted from that matrix by algorithmic techniques, which differs from how humans recognize images. The traditional image classification algorithm comprises three steps: manual feature extraction, feature coding and classifier classification. Commonly used manual features include the Local Binary Pattern (LBP), the Scale-Invariant Feature Transform (SIFT) and the Histogram of Oriented Gradients (HOG). Feature coding removes redundant feature information and improves the robustness of the remaining information. Common classifiers include the Support Vector Machine (SVM) and the Adaptive Boosting classifier (AdaBoost). Because manual features generalize poorly, traditional image classification algorithms struggle in practical scenes and hold no great advantage in either accuracy or speed. With the growth of computing power, deep learning has developed rapidly, and deep-learning image classification algorithms have risen with it.
In 2012, Alex Krizhevsky et al. proposed the AlexNet network structure, trained it on the ImageNet data set, and won the 2012 ImageNet competition; deep learning algorithms for large-scale image classification have developed continuously ever since. Deep-learning image classification algorithms use convolutional neural networks to extract convolutional features from samples, which carry richer semantic information, allow the model to learn more expressive representations, and are more accurate than manual features. However, when such algorithms train a network model, downsampling operations such as pooling layers are added to reduce feature dimensionality and, to a certain extent, avoid overfitting, which causes the convolutional features to lose detail information. Jonathan Long et al. proposed the fully convolutional network structure, which uses deconvolution layers to increase the resolution of the feature map, recovering some detail information, though only roughly. In the same year, Kaiming He et al. proposed the deep residual network, which uses skip connections to fuse deep and shallow network-layer features and so improves the utilization of detail information in the feature map.
Prior-art deep-learning image classification algorithms downsample the feature map with a pooling layer or a convolution layer of stride 2, which reduces feature dimensionality, enlarges the receptive field and increases the semantic information of the feature map, but loses its detail information.
In summary, the above methods can only repair the detail information of the feature map to a certain extent, but the parameters of the network model are increased, so that the network model becomes complicated and difficult to optimize, and therefore the problem of loss of the detail information of the feature map still affects the classification detection accuracy.
In addition, the chinese invention patent with patent application number "201910818758.9" and name "industrial product defect image classification method based on lightweight deep neural network" includes: 1. preparing an industrial product image data set; 2. constructing a lightweight deep neural network; 3. inputting a defect image data set of an industrial product into a built lightweight deep neural network, extracting multi-scale features of a polarizer image through network training, and inputting the extracted features into a Softmax layer for classification to obtain a classification model; 4. inputting the test image into a classification model, inputting the probability of the image belonging to a certain class and the label corresponding to the image into an Accuracy layer, and outputting the correct classification result of the image. However, this technique has the following drawbacks:
firstly, this technique mainly fuses features processed by depth separable convolution with a 3x3 kernel and features processed by ordinary convolution with a 1x1 kernel; because the two parts have different receptive fields and scales, combining them enlarges the receptive field, but the resulting receptive field is still single-scale, so the classification of small targets is easily overlooked;
secondly, in the technology, the maximum pooling layer is used for down-sampling the feature map, but the resolution of the feature map is not repaired, so that the detail information of the features is greatly lost, and the classification precision is reduced;
thirdly, the parallel depth separable convolution module in this technique improves the semantics of the feature map by stacking convolutions, which greatly increases the parameter count of the network structure and makes it hard to optimize; moreover, classification is performed without combining context information, which limits classification accuracy to a certain extent.
Therefore, an image classification model based on mixed depth separable expansion convolution and a classification method thereof, which have a simple structure and do not increase the calculation amount of the original model, are urgently needed to be provided, so that the loss of detail information of a feature map is reduced, the generalization and robustness of a network model are improved, and the classification accuracy of the network model is improved.
Disclosure of Invention
In view of the above problems, an object of the present invention is to provide an image classification model based on mixed depth separable dilation convolution and a classification method thereof, and the technical solution adopted by the present invention is as follows:
an image classification model based on mixed depth separable dilation convolution, wherein the image classification model is constructed by the following process:
packaging a depth separable expansion convolution layer, a feature connecting layer, a convolution layer, a batch normalization layer and a rectified linear unit layer, from front to back, into a mixed depth separable expansion convolution module;
packaging a convolution layer, a batch normalization layer, a rectified linear unit layer, the mixed depth separable expansion convolution modules, maximum pooling layers, a flattening layer, a random deactivation layer and a fully connected layer, from front to back, to form the backbone network of the deep neural network;
carrying out random initialization on the parameter weight of the backbone network, and presetting iteration times and momentum parameters of a batch normalization layer; and optimizing parameters of the network model by adopting a random gradient descent method, and repeating iterative calculation until the loss value is converged to obtain the optimal network model.
Further, the mixed depth separable expansion convolution module contains 3 depth separable expansion convolution layers arranged in parallel.
Preferably, 8 mixed depth separable expansion convolution modules are provided, grouped into blocks of 2, 3 and 3, with a maximum pooling layer arranged after each block.
Preferably, the expansion rates of the 3 depth separable expansion convolution layers from top to bottom in the mixed depth separable expansion convolution module are sequentially 1, 2 and 3, the sizes of convolution kernels are all 3 × 3, and the step lengths are all 1.
Further, convolution kernels of convolution layers in the mixed depth separable dilation convolution module are all 1 × 1 in size, and step sizes are all 1.
Further, the deactivation rate of the random deactivation layer was 0.5.
Further, the sampling kernel size of any of the maximum pooling layers is 2 and the step size is 2.
Preferably, the momentum parameter of the batch normalization layer is 0.975, and the learning rate is set to 0.1.
Preferably, the number of iterations is 30000.
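As a non-authoritative illustration of the module construction described above (three parallel depth separable expansion convolution layers with expansion rates 1, 2 and 3, a feature connecting layer, a 1x1 convolution layer, a batch normalization layer and a rectified linear unit layer), the following sketch shows one plausible realization. PyTorch and the concrete channel counts are assumptions for illustration, not specified by the patent; the ReLU6 variant follows the embodiment's cap of 6:

```python
import torch
import torch.nn as nn


class MixedDepthwiseDilatedConv(nn.Module):
    """Sketch of the mixed depth separable expansion convolution module:
    the input feature block is split into three channel groups, each
    processed by a depthwise 3x3 convolution (stride 1) with dilation
    rate 1, 2 or 3, then the branches are concatenated and fused by a
    1x1 pointwise convolution, batch normalization and ReLU6."""

    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        assert in_channels % 3 == 0, "channels must split evenly into 3 branches"
        c = in_channels // 3
        self.branches = nn.ModuleList(
            # 'same' padding for a dilated 3x3 kernel equals the dilation rate
            nn.Conv2d(c, c, kernel_size=3, padding=r, dilation=r,
                      groups=c, bias=False)
            for r in (1, 2, 3)
        )
        self.fuse = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU6(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        parts = torch.chunk(x, 3, dim=1)           # split by channel count
        parts = [b(p) for b, p in zip(self.branches, parts)]
        return self.fuse(torch.cat(parts, dim=1))  # splice back into one block


# Example: a 48-channel feature block keeps its spatial size (stride 1,
# 'same' padding); only the 1x1 convolution changes the channel count.
m = MixedDepthwiseDilatedConv(48, 96)
y = m(torch.randn(2, 48, 32, 32))
```

Because the stride is 1 and padding equals the expansion rate, the spatial resolution of the feature block is preserved through the module, consistent with the claim that the receptive field grows without downsampling.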
An image classification method adopts an image classification model based on mixed depth separable expansion convolution for classification.
Compared with the prior art, the invention has the following beneficial effects:
(1) the invention skillfully adopts the mixed depth separable expansion convolution module which can keep the detail information of the characteristics under the condition of unchanged original network parameters and quickly increase the receptive field; in addition, the invention builds a mixed depth separable expansion convolution module, and adjusts the original convolution kernel by adjusting the expansion rate, so that the parameter quantity of the original network model is not increased, the context information in the characteristic diagram can be supplemented, the information loss caused by operations such as downsampling and the like is compensated, and the expression capability of the network model is improved.
(2) The invention provides a method for expanding the receptive field of a characteristic image by using mixed depth separable expansion convolution in the field of image classification, obtaining more context information, weakening the sensitivity of a network model to the scale change of an image target, being beneficial to distinguishing the category of the image and improving the classification precision of the network model.
(3) The invention skillfully adopts 3 parallel depth separable expansion convolution layers, when the characteristic block is input into the mixed depth separable expansion convolution module, the characteristic block is divided into three parts according to the number of channels and respectively input into the parallel depth separable expansion convolution layers, and the receptive field of the characteristic diagram is expanded by utilizing the characteristics of expansion convolution.
(4) The backbone network of the invention extracts feature information and sends it to the flattening layer, which converts the multidimensional features into one dimension; the fully connected layer then performs multi-class classification. The classification loss function of the invention is the softmax loss function; during training, the loss value computed by the loss function is continuously optimized so that the predicted value computed by the network model gradually approaches the true value.
(5) According to the invention, the expansion rates of the 3 depth separable expansion convolution layers are set, from top to bottom, to 1, 2 and 3 in a zigzag structure, and the three parts of feature information are then spliced into one feature block; this enlarges the feature-extraction range and lets the parts compensate each other's information, so the gridding effect can be effectively avoided.
(6) According to the method, an ordinary convolution is placed after the depth separable expansion convolution layers to integrate features and establish relations between feature maps, which improves the expression capability of the feature blocks and can also increase the number of channels, raising the dimensionality of the feature blocks.
In conclusion, the method has the advantages of simple structure, less calculation workload, accurate classification and the like, and has high practical value and popularization value in the technical field of image processing.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention, and therefore should not be considered as limiting the scope of protection, and it is obvious for those skilled in the art that other related drawings can be obtained according to these drawings without inventive efforts.
FIG. 1 is a schematic diagram of the hybrid depth separable dilation-convolution module of the present invention.
Fig. 2 is a schematic structural diagram of the present invention.
FIG. 3 is a diagram illustrating the classification effect of the present invention.
Detailed Description
To further clarify the objects, technical solutions and advantages of the present application, the present invention will be further described with reference to the accompanying drawings and examples, and embodiments of the present invention include, but are not limited to, the following examples. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Examples
As shown in fig. 1 to 2, in the present embodiment, an image classification method based on mixed depth separable expansion convolution is provided, in which a mixed depth separable expansion convolution module is added to a constructed network model structure, and the module is composed of a plurality of depth separable expansion convolution layers with different expansion rates, so as to expand a range of feature extraction, supplement context information of features, improve an expression capability of a network model, and thereby improve classification accuracy of the model.
Specifically, the image classification model of this embodiment packages, from front to back, a convolution layer, a batch normalization layer, a rectified linear unit layer, 2 mixed depth separable expansion convolution modules, 1 maximum pooling layer, 3 mixed depth separable expansion convolution modules, 1 maximum pooling layer, a flattening layer, a random deactivation layer and a fully connected layer. This embodiment uses the softmax loss function; by continuously optimizing the difference computed by the loss function during training, the predicted value computed by the network model is guided to gradually approach the true value. A number of iterations is set; forward propagation computes the loss value between the predicted value and the true value, backward propagation optimizes the network model parameters, and the calculation is repeated until the loss value converges, yielding the optimal network model. Each mixed depth separable expansion convolution module is formed by packaging 3 depth separable expansion convolution layers arranged in parallel together with a feature connecting layer, a convolution layer, a batch normalization layer and a rectified linear unit layer connected in sequence.
In this embodiment, when the feature block is input to the module, the feature block is divided into three parts according to the number of channels, and the three parts are respectively input to the 3 depth separable expansion convolution layers, so as to expand the feature extraction range and supplement the context information of the feature map, and the calculation formula of the actual convolution kernel size of the expansion convolution layer is as follows:
K=k+(k-1)(r-1)
where k represents the original convolution kernel size input to the depth separable expansion convolution layer, r represents the expansion rate, and K is the theoretically calculated actual convolution kernel size of the depth separable expansion convolution layer.
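The formula above can be sketched as a one-line function (plain Python; the function name is ours, chosen for illustration):

```python
def effective_kernel_size(k: int, r: int) -> int:
    """K = k + (k - 1) * (r - 1): effective size of a kernel of original
    size k applied with expansion (dilation) rate r."""
    return k + (k - 1) * (r - 1)


# The three parallel branches use k = 3 with r = 1, 2, 3, giving effective
# kernels of 3, 5 and 7 (i.e. 3x3, 5x5 and 7x7 receptive fields).
sizes = [effective_kernel_size(3, r) for r in (1, 2, 3)]
print(sizes)  # [3, 5, 7]
```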
In this embodiment, because expansion convolution samples at equal intervals determined by the expansion rate, local information on the feature map is lost, the correlation between feature points is weakened, a gridding effect occurs, and the expression capability of the feature map is reduced. The expansion rate of each depth separable expansion convolution layer is therefore set differently, namely to 1, 2 and 3 in a zigzag structure, and the three parts of feature information are then spliced into one feature block; this enlarges the feature-extraction range and lets the parts compensate each other's information, effectively avoiding the gridding effect, while splicing feature maps with different receptive fields forms a multi-scale feature and reduces sensitivity to multi-scale changes of the target. Secondly, after the feature information has been processed by the depth separable expansion convolution, an ordinary convolution with a 1x1 kernel integrates the features, establishing relations between feature maps, improving the expression capability of the feature blocks, and increasing the number of channels to raise the dimensionality of the feature blocks.
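The split-and-splice behaviour of the feature connecting layer described above can be shown with a small sketch (PyTorch assumed; the tensor sizes are made up for illustration):

```python
import torch

# A feature block is divided into three parts along the channel dimension,
# one per parallel depth separable expansion convolution branch, and the
# feature connecting layer splices the processed parts back into one block.
x = torch.randn(1, 48, 32, 32)        # feature block with 48 channels
parts = torch.chunk(x, 3, dim=1)      # three 16-channel groups
assert all(p.shape == (1, 16, 32, 32) for p in parts)
merged = torch.cat(parts, dim=1)      # feature connecting layer
assert merged.shape == x.shape
```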
In this embodiment, a mixed depth separable expansion convolution module is built that adjusts the original convolution kernel by adjusting the expansion rate: for a depth separable expansion convolution with a 3x3 kernel, the effective kernel becomes 5x5 at an expansion rate of 2 and 7x7 at an expansion rate of 3. In this embodiment, stochastic gradient descent with Nesterov momentum set to 0.9 is used to optimize the parameters of the network model. In addition, the activation function layer used throughout the network structure is a rectified linear unit capped at a maximum value of 6 (ReLU6). The deactivation rate of the random deactivation layer is 0.5.
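A minimal training-step sketch under the settings stated in this embodiment and the disclosure: stochastic gradient descent with Nesterov momentum 0.9, learning rate 0.1, and softmax cross-entropy as the classification loss. The tiny stand-in model and the random data below are ours for illustration, not the patent's backbone:

```python
import torch
import torch.nn as nn

# Stand-in model: flatten + fully connected layer over 5 classes.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 5))

# SGD with Nesterov momentum 0.9 and learning rate 0.1, per the patent.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, nesterov=True)
criterion = nn.CrossEntropyLoss()          # softmax loss function

images = torch.randn(4, 3, 8, 8)
labels = torch.randint(0, 5, (4,))

optimizer.zero_grad()
loss = criterion(model(images), labels)    # forward pass: loss value
loss.backward()                            # backward pass: gradients
optimizer.step()                           # parameter update
```

In practice this step would be repeated for the preset number of iterations until the loss value converges, as the disclosure describes.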
In this example, the convolution parameters and receptive-field sizes are compared in the following table:
TABLE 1 convolution partial parameter and receptive field size comparison
[Table 1 is available only as an image in the original publication.]
In order to verify that the model has good classification performance, the following tests are carried out:
experiments were performed as in this example using the public data set flower _ lights data set, with 5 types of flowers, tulips (tulips), sunflowers (sunflowers), roses (roses), dandelions (dandelion), and brookfield (daisy), respectively; its data set, although less, can also be adapted to conventional classification. In the experiment of the embodiment, the parallel convolution part in the mixed depth separable expansion convolution layer is replaced by the common depth separable convolution layer, and the influence of the two on the accuracy of the network model is compared. In the present embodiment, the accuracy pair of the data set is as follows:
TABLE 2 Accuracy comparison on the flower_photos data set
[Table 2 is available only as an image in the original publication.]
The overall network structure is the same and the convolution kernels are all 3x3 in size; according to the table, the network structure using the mixed depth separable expansion convolution module improves accuracy by about 1.2% over the network structure using ordinary depth separable convolution layers. The recognition result of this embodiment is shown in fig. 3.
The above-mentioned embodiments are only preferred embodiments of the present invention, and do not limit the scope of the present invention, but all the modifications made by the principles of the present invention and the non-inventive efforts based on the above-mentioned embodiments shall fall within the scope of the present invention.

Claims (9)

1. The image classification model based on the mixed depth separable dilation convolution is characterized in that the construction process of the image classification model is as follows:
packaging a depth separable expansion convolution layer, a feature connecting layer, a convolution layer, a batch normalization layer and a rectified linear unit layer, from front to back, into a mixed depth separable expansion convolution module, the mixed depth separable expansion convolution module containing 3 depth separable expansion convolution layers arranged in parallel;
packaging a convolution layer, a batch normalization layer, a rectified linear unit layer, the mixed depth separable expansion convolution modules, maximum pooling layers, a flattening layer, a random deactivation layer and a fully connected layer, from front to back, to form the backbone network of the deep neural network;
carrying out random initialization on the parameter weight of the backbone network, and presetting iteration times and momentum parameters of a batch normalization layer; and optimizing parameters of the network model by adopting a random gradient descent method, and repeating iterative calculation until the loss value is converged to obtain the optimal network model.
2. The mixed depth separable dilated convolution based image classification model of claim 1, wherein 8 mixed depth separable dilated convolution modules are provided, partitioned into blocks of 2, 3 and 3, each block being followed by a maximum pooling layer.
3. The image classification model based on the mixed depth separable dilation convolution of claim 1, wherein dilation rates of 3 depth separable dilation convolution layers from top to bottom in the mixed depth separable dilation convolution module are 1, 2, and 3 in sequence, convolution kernel sizes are all 3 × 3, and step sizes are all 1.
4. The mixed depth separable dilated convolution based image classification model of claim 3, wherein convolution layers in the mixed depth separable dilated convolution module have convolution kernel sizes of 1x1 and step sizes of 1.
5. The mixed depth separable dilated convolution based image classification model of claim 1, wherein the inactivation rate of the random inactivation layer is 0.5.
6. The mixed depth separable dilated convolution based image classification model of claim 2, wherein the sampling kernel size of any of the maximum pooling layers is 2 and the step size is 2.
7. The mixed depth separable dilated convolution based image classification model of claim 1, wherein the momentum parameter of the batch normalization layer is 0.975 and the learning rate is set to 0.1.
8. The mixed depth separable dilated convolution based image classification model of claim 1, wherein the number of iterations is 30000.
9. An image classification method, characterized in that the image classification model based on the mixed depth separable dilation convolution of any one of claims 1 to 8 is used for classification.
CN202011032957.6A 2020-09-27 2020-09-27 Image classification model based on mixed depth separable expansion convolution and classification method thereof Active CN112258431B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011032957.6A CN112258431B (en) 2020-09-27 2020-09-27 Image classification model based on mixed depth separable expansion convolution and classification method thereof


Publications (2)

Publication Number | Publication Date
CN112258431A (en) | 2021-01-22
CN112258431B (en) | 2021-07-20

Family

ID=74234344

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011032957.6A Active CN112258431B (en) 2020-09-27 2020-09-27 Image classification model based on mixed depth separable expansion convolution and classification method thereof

Country Status (1)

Country Link
CN (1) CN112258431B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112507982B (en) * 2021-02-02 2021-05-07 成都东方天呈智能科技有限公司 Cross-model conversion system and method for face feature codes
CN112507996B (en) * 2021-02-05 2021-04-20 成都东方天呈智能科技有限公司 Face detection method of main sample attention mechanism
CN113111889A (en) * 2021-03-10 2021-07-13 国网浙江省电力有限公司宁波供电公司 Target detection network processing method for edge computing terminal
CN112949614B (en) * 2021-04-29 2021-09-10 成都市威虎科技有限公司 Face detection method and device for automatically allocating candidate areas and electronic equipment
CN113033504B (en) * 2021-05-19 2021-08-27 广东众聚人工智能科技有限公司 Multi-scale video anomaly detection method

Citations (5)

Publication number Priority date Publication date Assignee Title
CN109544621A (en) * 2018-11-21 2019-03-29 马浩鑫 Light field depth estimation method, system and medium based on convolutional neural networks
CN109635882A (en) * 2019-01-23 2019-04-16 福州大学 Salient object detection method based on multi-scale convolution feature extraction and fusion
CN111179283A (en) * 2019-12-30 2020-05-19 深圳市商汤科技有限公司 Image semantic segmentation method and device and storage medium
CN111462124A (en) * 2020-03-31 2020-07-28 武汉卓目科技有限公司 Remote sensing satellite cloud detection method based on Deep L abV3+
CN111523459A (en) * 2020-04-22 2020-08-11 中科三清科技有限公司 Remote sensing image bare area identification method and device, electronic equipment and storage medium

Family Cites Families (12)

Publication number Priority date Publication date Assignee Title
CN109214406B (en) * 2018-05-16 2021-07-09 长沙理工大学 Image classification method based on D-MobileNet neural network
CN111582007A (en) * 2019-02-19 2020-08-25 富士通株式会社 Object identification method, device and network
CN110232696B (en) * 2019-06-20 2024-03-08 腾讯科技(深圳)有限公司 Image region segmentation method, model training method and device
CN110660046B (en) * 2019-08-30 2022-09-30 太原科技大学 Industrial product defect image classification method based on lightweight deep neural network
CN111104962B (en) * 2019-11-05 2023-04-18 北京航空航天大学青岛研究院 Semantic segmentation method and device for image, electronic equipment and readable storage medium
CN110929602B (en) * 2019-11-09 2023-08-22 北京格镭信息科技有限公司 Foundation cloud picture cloud identification method based on convolutional neural network
CN111401361B (en) * 2020-03-06 2022-09-30 南京理工大学 End-to-end lightweight depth license plate recognition method
CN111401292B (en) * 2020-03-25 2023-05-26 成都东方天呈智能科技有限公司 Face recognition network construction method integrating infrared image training
CN111598108A (en) * 2020-04-22 2020-08-28 南开大学 Rapid salient object detection method of multi-scale neural network based on three-dimensional attention control
CN111488884A (en) * 2020-04-28 2020-08-04 东南大学 Real-time semantic segmentation method with low calculation amount and high feature fusion
CN111652129A (en) * 2020-06-02 2020-09-11 北京联合大学 Vehicle front obstacle detection method based on semantic segmentation and multi-feature fusion
AU2020101729A4 (en) * 2020-08-08 2020-09-17 Babuprasad, Shweta MS Continuous labelling assessment of products to improve efficiency of reverse logistics by deep learning model


Also Published As

Publication number Publication date
CN112258431A (en) 2021-01-22

Similar Documents

Publication Publication Date Title
CN112258431B (en) Image classification model based on mixed depth separable expansion convolution and classification method thereof
US10713563B2 (en) Object recognition using a convolutional neural network trained by principal component analysis and repeated spectral clustering
CN110321967B (en) Image classification improvement method based on convolutional neural network
CN110909801B (en) Data classification method, system, medium and device based on convolutional neural network
CN111898621B (en) Contour shape recognition method
CN109002755B (en) Age estimation model construction method and estimation method based on face image
Vijayaraghavan et al. Handwritten Tamil recognition using a convolutional neural network
Yang et al. Down image recognition based on deep convolutional neural network
CN112163114B (en) Image retrieval method based on feature fusion
CN108985442B (en) Handwriting model training method, handwritten character recognition method, device, equipment and medium
Jasitha et al. Venation based plant leaves classification using GoogLeNet and VGG
Giraddi et al. Flower classification using deep learning models
CN114818963A (en) Small sample detection algorithm based on cross-image feature fusion
CN111428758A (en) Improved remote sensing image scene classification method based on unsupervised characterization learning
Ulaganathan et al. Isolated handwritten Tamil character recognition using convolutional neural networks
Dan et al. Pf-vit: Parallel and fast vision transformer for offline handwritten chinese character recognition
James et al. Malayalam handwritten character recognition using AlexNet based architecture
CN113033345B (en) V2V video face recognition method based on public feature subspace
Elaraby et al. A Novel Siamese Network for Few/Zero-Shot Handwritten Character Recognition Tasks.
CN108805280A (en) A kind of method and apparatus of image retrieval
CN113963272A (en) Unmanned aerial vehicle image target detection method based on improved yolov3
Zhang et al. Consecutive convolutional activations for scene character recognition
CN111027570B (en) Image multi-scale feature extraction method based on cellular neural network
Ghayoumi et al. Local sensitive hashing (LSH) and convolutional neural networks (CNNs) for object recognition
CN113486175A (en) Text classification method, text classification device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant