CN108764317B - Residual convolutional neural network image classification method based on multipath feature weighting

Info

Publication number: CN108764317B
Application number: CN201810485738.XA
Authority: CN
Inventors: 刘义鹏; 李湛青; 陈朋; 蒋莉; 王海霞; 梁荣华
Original assignee: Zhejiang University of Technology ZJUT
Current assignee: Zhejiang University of Technology ZJUT
Priority date: 2018-05-21
Filing date: 2018-05-21
Publication date: 2021-11-23
Anticipated expiration: 2038-05-21
Also published as: CN108764317A

Abstract

A residual convolutional neural network image classification method based on multipath feature weighting comprises the following steps: 1) the input image of the model is a preprocessed original image, and the preprocessed image is cropped to a fixed size; 2) a convolution operation and a pooling operation with larger kernel sizes are performed on the image; 3) the features output in step 2) are sent to a first multipath feature weighted residual module; 4) the output of the multipath feature weighted residual module in step 3) is sent on to the next multipath feature weighted residual module; after passing through several such modules, the output feature map is gradually reduced to a smaller size and is finally reduced to feature points by an average pooling layer; the resulting feature points are either sent directly to a classification layer for classification or classified after passing through a fully connected layer. The method is applied to complex image classification tasks, enriches feature expression, and avoids the vanishing gradient problem caused by increasing the depth of the neural network.

Description

Residual convolutional neural network image classification method based on multipath feature weighting

Technical Field

The invention relates to the field of computer vision, and in particular to an image classification method that belongs to deep learning technology and is mainly used to train deep image classification models.

Background

In recent years, with the exponential growth of computing power and the emergence of new neural network architectures, deep learning techniques have come to prominence in computer vision, speech recognition, natural language processing, and other fields. In computer vision, the emergence of convolutional neural networks has greatly improved machine performance on image segmentation and image recognition tasks, with recognition accuracy far higher than that of traditional machine learning algorithms. Image recognition technology based on convolutional neural networks is now widely used.

A conventional convolutional neural network comprises convolutional layers, pooling layers, fully connected layers, and a classification layer, and its basic structure is a series combination of these units. Convolutional layers learn low-level and high-level features in an image, while pooling layers extract these features and progressively reduce the size of the feature map. The fully connected layers and the classification layer sit at the end of the network and classify the high-level features that are finally extracted. In a specific image classification task, the objects to be recognized may exhibit many types of features with complex structures. To cope with this, a conventional convolutional neural network needs to be improved in both width and depth. First, a single convolutional layer often cannot learn all useful features, so existing network structures address the problem by increasing the network width, that is, the number of output features of a given layer. However, increasing the number of channels may cause the network to learn repeated or useless features, which makes the model redundant, hinders classification, and adds wasted computation. Second, so that the network can learn more complex features, networks are designed to be deeper. However, as depth increases, vanishing and exploding gradients occur more easily during training, so that learning may fail to converge. To address this problem, a research team at Microsoft Research Asia proposed a convolutional neural network based on a residual structure, whose basic idea is to introduce a shortcut connection between low-level and high-level features. On the one hand, this structure introduces low-level features into the high-level features and increases the diversity of the extracted features; on the other hand, the additional forward path avoids the vanishing gradient problem during backpropagation.
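The shortcut idea described above is commonly written as the relation below, given here only as an illustrative sketch. The symbols are assumptions introduced for illustration and do not appear in the patent text: x is the input to a group of stacked layers, F(x) is what those layers compute, y is the output, and I is the identity.

```latex
y = x + F(x), \qquad \frac{\partial y}{\partial x} = I + \frac{\partial F}{\partial x}
```

Because the derivative contains the identity term I, the gradient has a direct path back to x even when the contribution of F is small.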

The two schemes above optimize the conventional convolutional neural network in terms of network width and network depth, respectively. However, to better handle the complexity of image classification problems, convolutional neural network models need to learn more complex feature expressions. At the same time, the redundancy caused by increasing the network width still needs to be resolved.

Disclosure of Invention

In order to overcome the poor performance of existing image classification methods on complex images, the invention provides a residual convolutional neural network image classification method based on multipath feature weighting. The method fuses the width-based and depth-based approaches to improving a neural network. On the basis of an increased network width, the multipath features are fused by weighting, so that the expression of key features is enhanced, redundant features are removed, and the model redundancy caused by outputting repeated and useless features is resolved. The network structure also incorporates a residual structure to address the depth problem. The residual structure adds further forward paths, which further enriches feature expression and avoids the gradient problem.

The technical solution adopted by the invention to solve the technical problem is as follows:

a residual convolutional neural network image classification method based on multipath feature weighting, comprising the following steps:

1) the input image of the model is a preprocessed original image, and the preprocessed image is cropped to a fixed size;

2) a convolution operation and a pooling operation with larger kernel sizes are performed on the image;

3) the features output in step 2) are sent to a first multipath feature weighted residual module, wherein the multipath feature weighted residual module is composed of a plurality of convolution groups and a weighted combination module: each convolution group comprises one or more convolution layers, and the convolution layers are configured differently from group to group so as to ensure a diversity of convolution kernel sizes; in addition, when the feature maps in the network pass through certain multipath feature weighted residual modules, their size is reduced accordingly, so, in order to keep the output feature map sizes of all convolution groups consistent, some convolution groups also contain pooling layers;

the outputs of all the convolution groups are sent to the weighted combination module for integration; specifically, the output features of each convolution group are multiplied by that group's own randomly initialized parameter k, and the weighted outputs are concatenated along the feature channel dimension; the concatenated features are then added to the input of the multipath feature weighted residual module to give the output of the whole module;

4) the output of the multipath feature weighted residual module in step 3) is sent on to the next multipath feature weighted residual module. After passing through a plurality of multipath feature weighted residual modules, the output feature map is gradually reduced to a small size no larger than 10 × 10, and is finally reduced to feature points by an average pooling layer; the resulting feature points are either sent directly to a classification layer for classification or classified after passing through a fully connected layer.

Further, the method comprises the following step:

5) in the initial stage of model training, the weighting coefficient k and the coefficients of the convolution kernels are randomly initialized, for which various initialization schemes may be used; the network parameters used for classification are then continuously optimized during backpropagation until optimal values are obtained;

still further, in step 2), the convolution kernel size may be set to 7 × 7 with a stride of 2, and the max pooling size to 3 × 3 with a stride of 2; these sizes may be modified according to the size of the input image cropped in step 1). A larger convolution kernel is chosen in order to extract low-level features with a larger receptive field and to prevent the loss of detail features as far as possible. In addition, choosing a larger stride in the convolution and pooling operations further reduces the size of the feature map and thus the amount of computation;

further, in step 3), the convolution kernel sizes in the convolution groups are selected in one of the following ways: the first is to select as many combinations of convolution kernel sizes as computing resources allow, so as to cover all possible feature types; the second is to design favorable convolution kernel sizes manually according to the characteristics of the images to be classified; a combination of the two may also be used.
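The size reduction produced by the 7 × 7 stride-2 convolution and the 3 × 3 stride-2 max pooling of step 2) can be checked with a short calculation. The sketch below is illustrative only: the function names, the padding values (3 and 1), and the example input size of 224 are assumptions, since the text specifies only kernel sizes and strides.

```python
def conv_output_size(size: int, kernel: int, stride: int, padding: int) -> int:
    """Spatial output size of a convolution or pooling layer (floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1


def stem_output_size(size: int) -> int:
    """Apply the 7x7 stride-2 convolution, then the 3x3 stride-2 max pooling."""
    # Assumption: padding of 3 and 1; the text does not state any padding.
    after_conv = conv_output_size(size, kernel=7, stride=2, padding=3)
    return conv_output_size(after_conv, kernel=3, stride=2, padding=1)


# Example: a 224 x 224 input (an assumed size) shrinks by a factor of four.
print(stem_output_size(224))  # 56
```

Under these assumed paddings, each of the two stride-2 operations halves the side length, so the later modules operate on a feature map with one sixteenth of the input area.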

The invention provides a residual convolutional neural network image classification method based on multipath feature weighting, whose main characteristic is that it fuses and improves the width-based and depth-based improvement schemes for convolutional neural networks. The basic structure of the network comprises a convolution layer, a pooling layer, a fully connected layer, and multipath feature weighted residual modules. Specifically, a convolution layer, a pooling layer, and a plurality of multipath feature weighted residual modules are connected in series, and the final classification result is obtained through a fully connected layer and a classification layer. The main role of the convolution layer and the pooling layer is to extract low-level features while reducing the size of the feature map. The core of the network is the multipath feature weighted residual module, which comprises a plurality of convolution groups connected in parallel, each comprising one or more convolution layers. The number of convolution layers in a convolution group should be adjusted according to the total number of layers intended for the network, and the convolution kernel sizes should be chosen so that they differ from one convolution group to another. The outputs of all convolution groups are sent to weighted combination processing: the output of each convolution group is multiplied by its own learnable parameter k, and the weighted outputs are then concatenated along the feature channel dimension. By assigning a learnable weight to the output of each convolution group, the network can enhance the expression of important features and suppress redundant or duplicated features, thereby avoiding model redundancy.
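The weighted combination and the residual addition described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names are hypothetical, each feature map is stored as a list of flattened channels, the convolution groups are replaced by fixed stand-in outputs, and the concatenated channel count is assumed to equal the input channel count so that the addition is well defined (the text does not say how the channel counts are matched).

```python
from typing import List

Channel = List[float]        # one flattened feature map
FeatureMap = List[Channel]   # a list of channels


def weighted_concat(group_outputs: List[FeatureMap], k: List[float]) -> FeatureMap:
    """Scale each convolution group's output by its own weight, then
    concatenate the scaled outputs along the channel axis."""
    if len(group_outputs) != len(k):
        raise ValueError("one weight per convolution group is required")
    combined: FeatureMap = []
    for output, weight in zip(group_outputs, k):
        combined.extend([[weight * v for v in channel] for channel in output])
    return combined


def residual_add(combined: FeatureMap, module_input: FeatureMap) -> FeatureMap:
    """Add the module input to the combined features element by element."""
    # Assumption: both operands have the same channel count and spatial size.
    if len(combined) != len(module_input):
        raise ValueError("channel counts must match for the residual addition")
    return [[a + b for a, b in zip(c, x)] for c, x in zip(combined, module_input)]


# Toy module input: 3 channels, each a flattened 2 x 2 map.
x = [[1.0, 1.0, 1.0, 1.0], [2.0, 2.0, 2.0, 2.0], [3.0, 3.0, 3.0, 3.0]]
# Stand-ins for the outputs of three convolution groups, one channel each.
group_outputs = [[[1.0, 0.0, 0.0, 1.0]], [[0.0, 2.0, 2.0, 0.0]], [[4.0, 4.0, 4.0, 4.0]]]
k = [0.5, 1.0, 0.25]  # fixed here for illustration; learned during training

y = residual_add(weighted_concat(group_outputs, k), x)
print(y)  # [[1.5, 1.0, 1.0, 1.5], [2.0, 4.0, 4.0, 2.0], [4.0, 4.0, 4.0, 4.0]]
```

In a trained network the values of k would be updated by backpropagation, so a group whose features are redundant can be driven toward a small weight.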

The invention has the following beneficial effects: the convolutional neural network is improved in width and depth at the same time, so that features are extracted more effectively and the performance of image classification on complex images is improved.

Drawings

Fig. 1 is a general flow chart of a residual convolutional neural network based on multipath feature weighting.

Fig. 2 is a diagram of the internal structure of a multipath feature weighted residual module.

Detailed Description

The present invention is further described below with reference to the flow chart.

Referring to fig. 1 and 2, a residual convolutional neural network image classification method based on multipath feature weighting includes the following steps:

1) The input image of the model is a preprocessed original image. Every preprocessed image must be cropped to a fixed size; to facilitate model training, the height and width of this fixed size are preferably equal, and the specific size is determined by the application and by the size of the model. Common input image sizes are 512, 299, and 224.

2) The image is subjected to convolution and pooling operations with larger kernel sizes, for example a 7 × 7 convolution with a stride of 2 and a 3 × 3 max pooling with a stride of 2. A larger convolution kernel is chosen in order to extract low-level features with a larger receptive field and to prevent the loss of detail features as far as possible. In addition, choosing a larger stride in the convolution and pooling operations further reduces the size of the feature map.

3) The features output in step 2) are sent to a first multipath feature weighted residual module. The detailed structure of the module is shown in fig. 2: the module contains a plurality of convolution groups, each convolution group contains one or more convolution layers, and the kernel size parameters of the convolution layers should differ from group to group so as to ensure diverse feature extraction. For example, convolution group 1 is configured as a 7 × 1 convolution plus a 1 × 7 convolution, convolution group 2 as a 3 × 3 convolution plus a 1 × 1 convolution, and convolution group 3 as a 5 × 5 convolution plus a 1 × 1 convolution. As for the specific sizes, more size combinations can be selected where computing resources allow, or the convolution kernel sizes can be designed manually for the specific classification task. In addition, the feature map should be reduced in size as it passes through some of the multipath feature weighted residual modules, so some convolution groups also contain pooling layers for compressing the feature map.

4) The outputs of all convolution groups in step 3) are sent to a weighted combination module for feature fusion. The weighted combination module in fig. 2 is implemented as follows: a randomly initialized coefficient k is defined for the output of each convolution group, the output of each convolution group is multiplied by its own coefficient, and the weighted features are concatenated along the feature channel dimension;

the features fused in step 4) are then added to the input of the multipath feature weighted residual module in step 3) to obtain the output of the whole multipath feature weighted residual module.

The output of the multipath feature weighted residual module in step 3) is sent on to the next multipath feature weighted residual module. After passing through a plurality of multipath feature weighted residual modules, the feature map is progressively reduced to a small size no larger than 10 × 10 and is then reduced to feature points by an average pooling layer; the resulting feature points can either be sent directly to a classification layer for classification or be classified after passing through a fully connected layer.

5) In the initial stage of model training, the weighting parameter k and the convolution kernel parameters are randomly initialized, for which various initialization schemes may be used. The parameters used for classification are then continuously optimized during backpropagation until optimal values are obtained.
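The progressive size reduction and the final average pooling described in the steps above can be illustrated with a small sketch. The assumptions are labeled here and in the code: the function names are hypothetical, each size-reducing module is assumed to halve the side length, the starting size of 56 is an arbitrary example, and the average pooling is taken over the whole feature map so that each channel yields one feature point.

```python
from typing import List


def sizes_through_modules(size: int, limit: int = 10) -> List[int]:
    """Halve the feature-map side length once per size-reducing module until
    it is no larger than `limit`, and return every intermediate size."""
    # Assumption: each size-reducing module halves the side (rounding up).
    sizes = [size]
    while sizes[-1] > limit:
        sizes.append((sizes[-1] + 1) // 2)
    return sizes


def global_average_pool(feature_map: List[List[List[float]]]) -> List[float]:
    """Reduce each channel (a 2-D grid of values) to a single mean value."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]


print(sizes_through_modules(56))  # [56, 28, 14, 7]

# Two toy 2 x 2 channels stand in for the final small feature map.
feature_points = global_average_pool(
    [[[1.0, 3.0], [5.0, 7.0]], [[2.0, 2.0], [2.0, 2.0]]]
)
print(feature_points)  # [4.0, 2.0]
```

The resulting list of feature points, one per channel, is what would be passed to the classification layer, either directly or through a fully connected layer.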

Claims

1. A residual convolutional neural network image classification method based on multipath feature weighting is characterized by comprising the following steps:

2) performing a convolution operation and a pooling operation with large kernel sizes on the image;

3) sending the features output in step 2) to a first multipath feature weighted residual module, wherein the multipath feature weighted residual module is composed of a plurality of convolution groups and a weighted combination module: each convolution group comprises one or more convolution layers, and the convolution layers are configured differently from group to group so as to ensure a diversity of convolution kernel sizes; in addition, when the feature maps in the network pass through the multipath feature weighted residual module, their size is reduced accordingly, so, in order to keep the output feature map sizes of all convolution groups consistent, the convolution groups also contain a pooling layer;

4) sending the output of the multipath feature weighted residual module in step 3) on to the next multipath feature weighted residual module, the output feature map being gradually reduced, after passing through a plurality of multipath feature weighted residual modules, to a small size no larger than 10 × 10, and finally being reduced to feature points by an average pooling layer; the resulting feature points are either sent directly to a classification layer for classification or classified after passing through a fully connected layer.

2. The residual convolutional neural network image classification method based on multipath feature weighting as claimed in claim 1, wherein the method further comprises the following step:

5) in the initial stage of model training, the weighting coefficient k and the coefficients of the convolution kernels are randomly initialized, for which various initialization schemes may be used, and the network parameters used for classification are then continuously optimized during backpropagation until optimal values are obtained.

3. The residual convolutional neural network image classification method based on multipath feature weighting as claimed in claim 1 or 2, wherein in step 2), the convolution kernel size is set to 7 × 7 with a stride of 2, and the max pooling size is 3 × 3 with a stride of 2; these sizes are modified according to the size of the input image cropped in step 1).

4. The residual convolutional neural network image classification method based on multipath feature weighting as claimed in claim 1 or 2, wherein in step 3), the convolution kernel sizes in the convolution groups are selected in one of the following ways: the first is to select combinations of convolution kernel sizes that cover all feature types, where computing resources allow; the second is to design the convolution kernel size parameters manually according to the image features to be classified; or the two ways are combined.