CN110321967B

CN110321967B - Image classification improvement method based on convolutional neural network

Info

Publication number: CN110321967B
Application number: CN201910624323.0A
Authority: CN
Inventors: 李跃辉; 赵诚诚
Original assignee: Nanjing University of Posts and Telecommunications
Current assignee: Nanjing University of Posts and Telecommunications
Priority date: 2019-07-11
Filing date: 2019-07-11
Publication date: 2021-06-01
Anticipated expiration: 2039-07-11
Also published as: CN110321967A

Abstract

The invention discloses an improved image classification algorithm based on a convolutional neural network that uses the AlexNet network model as its basic framework. The input image is first suitably preprocessed and augmented to reduce the network's dependence on the number of samples. Features are then extracted by the convolutional layers, and pooling layers retain the main features while reducing the parameters and computation of the next layer. The improved algorithm reduces the network model's dependence on the number of samples; by adopting the LDA algorithm and multi-scale convolution it further reduces the number of parameters, simplifies computation, and improves image classification accuracy.

Description

Image classification improvement method based on convolutional neural network

Technical Field

The invention belongs to the fields of deep learning and image processing, and relates to the application of an improved deep neural network to image classification and recognition tasks.

Background

Because its input layer can process multidimensional data directly, the convolutional neural network (CNN) is widely used in computer vision. Meanwhile, ongoing digitization keeps driving social development: data volumes keep growing and massive data sets of all kinds continue to appear, which poses a major challenge for neural networks. To accelerate neural network learning, many optimization algorithms for CNNs have been proposed. At present, CNNs are optimized mainly along three directions: model depth, model width, and data processing. In 2018, building on the CNN model proposed by LeCun et al., Gauno et al. addressed problems such as vanishing gradients and slow convergence by combining and improving several traditional activation functions: they combined the Sigmoid and Softplus activation functions to obtain a new CNN model and applied it to handwritten digit recognition. This improved the accuracy of handwritten digit recognition and, at the same time, reduced the number of training parameters, making the network structure simpler and more adaptable. In the same year, Wang Hua et al. proposed an image classification model combining an unsupervised learning algorithm with convolution: image patches of equal size are randomly extracted from unlabeled input images to form a data set and are preprocessed; a dictionary is extracted from the preprocessed patches by applying K-means clustering twice; final image features are extracted by a discrete convolution operation; and the extracted features are classified by a Softmax classifier. This improved classification precision and reduced training complexity.
In summary, most optimization work targets the structure of the network model, improving the speed and accuracy of the neural network by adopting different activation functions or by applying various preprocessing operations to the image.

To remove the restriction that a convolutional neural network places on the input image, further reduce training complexity, and accelerate model convergence, the invention provides an image classification algorithm based on an improved convolutional neural network. The network no longer restricts the size of the input image, has fewer parameters, and achieves higher accuracy and speed.

Disclosure of Invention

The purpose of the invention is as follows: the invention provides an image classification improvement method based on a convolutional neural network, which is used for accurately, efficiently and quickly identifying and classifying images.

The technical scheme is as follows: an image classification improvement method based on a convolutional neural network specifically comprises the following steps:

step one: performing image enhancement, filtering and noise-reduction preprocessing on the input image so as to reduce interference with image feature extraction;

step two: performing a convolution operation on the preprocessed image to extract image features, then pooling with max pooling, which keeps the pixel with the maximum value in each receptive field and discards the others, so that the feature map shrinks while the key information of the image is retained; then reducing the convolution kernel size and performing convolution and pooling again to output a more abstract feature map;

step three: convolving the feature map obtained from the convolution and pooling operations through several consecutive convolutional layers to fully fuse the features of different channels, and sending the resulting feature map into a pyramid pooling layer for pooling;

step four: performing convolution on the feature map through the multi-scale convolutional layer to obtain a feature map of fixed size;

step five: using the LDA method to further reduce the dimension of the feature map and classify it: the features are projected with LDA, the projected sample feature information is substituted into a probability density function to obtain probability distribution information, and the computed result and the predicted category are output.

Further, in step one, Gaussian filtering is applied to the input image to suppress noise and smooth the image; at the same time, the image is flipped and its color, saturation and contrast are adjusted.
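The preprocessing of step one can be sketched in plain Python on a small grayscale image. This is only an illustration under assumptions the text does not state: the kernel size, σ = 1.0, border replication, the choice of a horizontal flip, and the contrast formula are all hypothetical, and the color and saturation adjustments are omitted because the sketch uses a single gray channel.

```python
import math

def gaussian_kernel(size, sigma):
    """Return a normalized size x size Gaussian kernel (size assumed odd)."""
    half = size // 2
    k = [[math.exp(-(x * x + y * y) / (2 * sigma * sigma))
          for x in range(-half, half + 1)]
         for y in range(-half, half + 1)]
    total = sum(sum(row) for row in k)
    return [[v / total for v in row] for row in k]

def gaussian_blur(img, size=3, sigma=1.0):
    """Smooth a 2-D grayscale image (list of lists), replicating border
    pixels so the output has the same size as the input."""
    k = gaussian_kernel(size, sigma)
    h, w, half = len(img), len(img[0]), size // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(-half, half + 1):
                for dj in range(-half, half + 1):
                    ii = min(max(i + di, 0), h - 1)  # clamp row to the border
                    jj = min(max(j + dj, 0), w - 1)  # clamp column to the border
                    acc += k[di + half][dj + half] * img[ii][jj]
            out[i][j] = acc
    return out

def horizontal_flip(img):
    """Mirror the image left to right (one possible reading of 'flipping')."""
    return [row[::-1] for row in img]

def adjust_contrast(img, factor):
    """Scale each pixel's deviation from the image mean by `factor`,
    clipping the result to [0, 255]."""
    pixels = [v for row in img for v in row]
    mean = sum(pixels) / len(pixels)
    return [[min(max(mean + factor * (v - mean), 0.0), 255.0) for v in row]
            for row in img]
```

Because the kernel is normalized, blurring a constant image leaves it unchanged, which is a quick sanity check for the filter.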

Furthermore, in step three, four consecutive convolutional layers are used together to fully fuse the features of the image's different channels.

Further, in step four, the feature map is mapped at three different scales and each mapped map is convolved with a different convolution operation, so that a feature map of fixed size is finally obtained regardless of the size of the input image.
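One way to map a feature map of arbitrary size onto the three fixed scales (8 × 8, 6 × 6 and 4 × 4 in the detailed description) is adaptive max pooling of the kind used in spatial pyramid pooling. The text does not specify how the mapping is computed, so the bin rule below (floor for the start and ceiling for the end of each bin) is an assumption, and the helper names are hypothetical.

```python
import math

def adaptive_max_pool(fmap, out_size):
    """Max-pool a 2-D feature map (list of lists) onto an out_size x out_size
    grid, whatever the input height and width are."""
    h, w = len(fmap), len(fmap[0])
    out = []
    for i in range(out_size):
        r0 = (i * h) // out_size                # first row of bin i (floor)
        r1 = math.ceil((i + 1) * h / out_size)  # one past its last row (ceiling)
        row = []
        for j in range(out_size):
            c0 = (j * w) // out_size
            c1 = math.ceil((j + 1) * w / out_size)
            row.append(max(fmap[r][c] for r in range(r0, r1) for c in range(c0, c1)))
        out.append(row)
    return out

def map_to_scales(fmap, scales=(8, 6, 4)):
    """Return one fixed-size pooled map per scale (hypothetical helper)."""
    return {s: adaptive_max_pool(fmap, s) for s in scales}
```

A 13 × 17 map and a 25 × 30 map both come out as grids of exactly 8 × 8, 6 × 6 and 4 × 4, which is the property the multi-scale layer relies on.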

Further, in step five, LDA is used to reduce the dimension of the feature matrix. The global scatter matrix S_t is defined as:

$$S_t = \sum_{i=1}^{m} (x_i - \mu)(x_i - \mu)^T$$

where m is the total number of samples, x_i is the i-th sample vector, μ is the mean vector of all samples, and the superscript T denotes the matrix transpose.

The within-class scatter matrix S_ω is defined as:

$$S_\omega = \sum_{i=1}^{N} \sum_{x \in X_i} (x - \mu_i)(x - \mu_i)^T$$

where N is the total number of sample classes, X_i is the sample matrix of the i-th class, x is each sample vector of the i-th class, and μ_i is the mean vector of all samples of the i-th class.

The between-class scatter matrix S_b is defined as:

$$S_b = S_t - S_\omega$$

The optimization objective is therefore defined as:

$$\max_{W} \frac{\operatorname{tr}(W^T S_b W)}{\operatorname{tr}(W^T S_\omega W)}$$

where tr(·) denotes the matrix trace and W ∈ R^{d×(N-1)} is a matrix formed by N-1 eigenvectors. Solving this optimization objective yields a projection matrix formed by a set of optimal discriminant vectors (the eigenvectors of S_ω^{-1} S_b with the N-1 largest eigenvalues); this matrix projects the d-dimensional feature space onto an (N-1)-dimensional low-dimensional feature space.
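The scatter-matrix definitions can be checked numerically. The sketch below, in plain Python with hypothetical helper names, computes S_t, S_ω and S_b = S_t - S_ω for a toy two-class data set. For the special case N = 2, d = 2 it computes the single discriminant direction in closed form as w ∝ S_ω⁻¹(μ_0 - μ_1), the standard two-class Fisher solution, instead of solving the general eigenproblem.

```python
def mean_vec(xs):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(xs) for col in zip(*xs)]

def scatter(xs, mu):
    """Sum of (x - mu)(x - mu)^T over the vectors in xs."""
    d = len(mu)
    S = [[0.0] * d for _ in range(d)]
    for x in xs:
        diff = [a - b for a, b in zip(x, mu)]
        for r in range(d):
            for c in range(d):
                S[r][c] += diff[r] * diff[c]
    return S

def lda_scatter(classes):
    """classes: one list of sample vectors per class. Returns (S_t, S_w, S_b)."""
    all_x = [x for xs in classes for x in xs]
    S_t = scatter(all_x, mean_vec(all_x))  # global scatter
    d = len(all_x[0])
    S_w = [[0.0] * d for _ in range(d)]
    for xs in classes:                     # within-class scatter, summed over classes
        S_i = scatter(xs, mean_vec(xs))
        S_w = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(S_w, S_i)]
    S_b = [[t - w for t, w in zip(rt, rw)] for rt, rw in zip(S_t, S_w)]  # S_b = S_t - S_w
    return S_t, S_w, S_b

def two_class_direction(classes):
    """N = 2 and d = 2 only: w proportional to S_w^{-1} (mu_0 - mu_1)."""
    _, S_w, _ = lda_scatter(classes)
    (a, b), (c, e) = S_w
    det = a * e - b * c                    # 2 x 2 inverse is [[e, -b], [-c, a]] / det
    m0, m1 = mean_vec(classes[0]), mean_vec(classes[1])
    dx, dy = m0[0] - m1[0], m0[1] - m1[1]
    return [(e * dx - b * dy) / det, (-c * dx + a * dy) / det]
```

For two square clusters centered at (1, 1) and (5, 1), all between-class scatter lies along the x axis, so S_b has a single nonzero entry and the discriminant direction is parallel to the x axis.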

Further, the network uses overlapping max pooling, i.e., adjacent pooling windows overlap, which increases the richness of the features and helps avoid overfitting.
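Overlapping max pooling simply means the stride is smaller than the window size. A minimal sketch, assuming a 3 × 3 window with stride 2 (the setting of the original AlexNet, which the abstract names as the base framework; the patent itself gives no values):

```python
def max_pool2d(fmap, k=3, s=2):
    """Max pooling with a k x k window and stride s, no padding.
    When s < k, adjacent windows share rows and columns (overlap)."""
    h, w = len(fmap), len(fmap[0])
    oh, ow = (h - k) // s + 1, (w - k) // s + 1
    return [[max(fmap[i * s + di][j * s + dj] for di in range(k) for dj in range(k))
             for j in range(ow)]
            for i in range(oh)]
```

On a 5 × 5 map, the 3 × 3 / stride-2 windows share row and column 2, whereas 2 × 2 / stride-2 windows do not overlap at all.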

Further, data in the neural network are passed through an activation function; the activation function used is the leaky rectified linear unit (Leaky ReLU). This function retains part of the information on the negative axis, so after feature information passes through it, the information in the negative interval is not completely lost and the resulting information is more complete.
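A minimal sketch of Leaky ReLU next to plain ReLU. The negative slope of 0.01 is a common default and is an assumption here, since the text does not give a value.

```python
def relu(x):
    """Standard ReLU: negative inputs are zeroed out entirely."""
    return x if x > 0 else 0.0

def leaky_relu(x, alpha=0.01):
    """Leaky ReLU: identity for x > 0, small slope alpha (assumed) for x <= 0,
    so negative inputs keep a scaled-down trace instead of becoming 0."""
    return x if x > 0 else alpha * x
```

For an input of -2.0, ReLU returns 0.0 while Leaky ReLU returns a small negative value, which is the "retained negative-axis information" described above.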

Compared with the prior art, the invention adopting the technical scheme has the following technical effects:

1. The invention does not restrict the size of the input image: thanks to the multi-scale convolutional layer structure in the network, images of any size can be input, image feature information is largely preserved, and final accuracy is improved.

2. The invention further reduces the number of model parameters and speeds up the network. It uses the LDA algorithm to learn similarity, which strengthens the discriminative power of the features; at the same time, LDA reduces dimensionality without damaging the main feature information of the image, making the network more efficient and faster.

3. The improved image classification algorithm based on a convolutional neural network reduces the network model's dependence on the number of samples; by adopting the LDA algorithm and multi-scale convolution it further reduces the number of parameters, simplifies computation, and improves image classification accuracy.

Drawings

Fig. 1 is a diagram of the improved multi-scale convolutional layer structure proposed by the present invention.

Fig. 2 is a network architecture diagram of the improved image classification algorithm of the present invention.

Detailed Description

The invention is further explained below with reference to the drawings.

As shown in FIG. 1, the improved multi-scale convolutional layer structure proposed for the image classification improvement method based on a convolutional neural network maps the feature map at three different scales, 8 × 8, 6 × 6 and 4 × 4, and then convolves the mapped maps with convolution kernels of three different sizes, 2 × 2, 3 × 3 and 1 × 1, using strides of 2, 1 and 1 (S2, S1, S1) respectively; 256 convolution kernels are used in each case, and Leaky ReLU is the activation function. As a result, a feature map of fixed size is finally obtained regardless of the size of the input image.

As shown in Fig. 2, the image classification improvement method based on a convolutional neural network first preprocesses the image to obtain an image with distinct feature information and little interfering information. The preprocessed image is then fed into the neural network, where features are extracted by two convolution-pooling layer groups; their relatively large convolution kernels extract the more prominent edge features of the image. The resulting feature map then passes through four consecutive convolutional layers with the same kernel size, which extract more feature information from the image's different channels while fully fusing the multi-channel information into a more abstract and representative feature map. In this process, the kernel sizes and strides are chosen so that the image size does not change through the series of convolutions. After the pyramid pooling layer of the invention, input images of different sizes are all converted into feature maps of fixed size, on which the LDA algorithm performs computation and classification to obtain the final predicted class label of the input image. The specific implementation process is as follows:
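Keeping the feature-map size unchanged through the consecutive convolutional layers follows from the usual output-size formula out = ⌊(n + 2p - k)/s⌋ + 1: with stride s = 1 and padding p = (k - 1)/2 for an odd kernel size k, out = n. The sketch assumes 3 × 3 kernels and a 13 × 13 input purely for illustration; the text gives neither value, and the helper names are hypothetical.

```python
def conv_output_size(n, k, s=1, p=0):
    """Spatial output size of a convolution: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * p - k) // s + 1

def same_padding(k):
    """Padding that preserves the size at stride 1 (odd kernel sizes only)."""
    if k % 2 == 0:
        raise ValueError("size-preserving padding at stride 1 needs an odd kernel size")
    return (k - 1) // 2

def stacked_sizes(n, k=3, layers=4):
    """Feature-map side length after each of `layers` stride-1 convolutions."""
    sizes = [n]
    for _ in range(layers):
        sizes.append(conv_output_size(sizes[-1], k, s=1, p=same_padding(k)))
    return sizes
```

Without padding, each 3 × 3 layer would shrink a 13 × 13 map by 2 pixels per side length, so four layers would reduce it to 5 × 5.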

step one: performing a series of preprocessing operations, such as image enhancement, filtering and noise reduction, on the input image so as to reduce interference with image feature extraction;

step two: performing convolution on the preprocessed image to extract image features, pooling with max pooling to obtain a feature map, then reducing the convolution kernel size and performing convolution and pooling again to obtain a more abstract feature map for the subsequent feature fusion;

step three: convolving the feature map obtained in step two through several consecutive convolutional layers to fully fuse the features of different channels, so that the final feature map is more abstract and representative, and sending it into a pyramid pooling layer for pooling;

step four: after mapping the feature map at different scales, convolving each mapped map with a convolution kernel of a different size, thereby obtaining a feature map of fixed size;

step five: using the LDA method to further reduce the dimension of the feature map and classify it: the features are projected with LDA, the projected sample feature information is substituted into a probability density function to compute the probability that it belongs to each class, and the class with the highest probability is the predicted class of the image.
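The classification in step five can be sketched for the two-class case, where LDA projects each sample onto a single value. Assumptions not stated in the text: each class's projected values are modeled by a one-dimensional Gaussian density fitted by maximum likelihood, the classes have equal priors, and the function names are hypothetical.

```python
import math

def fit_class_gaussians(projected, labels):
    """Fit a 1-D Gaussian (mean, variance) to each class's projected values."""
    stats = {}
    for c in sorted(set(labels)):
        vals = [z for z, y in zip(projected, labels) if y == c]
        mu = sum(vals) / len(vals)
        var = sum((z - mu) ** 2 for z in vals) / len(vals)  # ML estimate; assumes var > 0
        stats[c] = (mu, var)
    return stats

def gaussian_pdf(z, mu, var):
    """Normal probability density at z."""
    return math.exp(-(z - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def predict(z, stats):
    """Return (predicted class, class probabilities) for one projected value,
    assuming equal class priors so the densities only need normalizing."""
    dens = {c: gaussian_pdf(z, mu, var) for c, (mu, var) in stats.items()}
    total = sum(dens.values())
    probs = {c: d / total for c, d in dens.items()}
    return max(probs, key=probs.get), probs
```

With training projections centered at -2 and 2, a new sample projected to -1.5 is assigned to the first class and one projected to 0.5 to the second; the returned probabilities sum to 1.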

The multi-scale convolutional layer in the network structure has multiple layers; its specific structure is as follows:

The feature map is mapped at three different scales: 8 × 8, 6 × 6 and 4 × 4. Each mapped feature map is convolved with a different convolution kernel: the strides are 2, 1 and 1 (S2, S1, S1), the kernel sizes are 2 × 2, 3 × 3 and 1 × 1, the numbers of kernels are 256, 256 and 256, and Leaky ReLU is the activation function. With this configuration, a feature map of fixed size is obtained regardless of the size of the input image, so the input image size is not restricted.
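The listed branch settings can be checked with the output-size formula out = ⌊(n + 2p - k)/s⌋ + 1. Assuming no padding (the text does not state the padding), every branch yields a 4 × 4 map with 256 channels, which is consistent with the statement that the output size is fixed. Flattening and concatenating the three branches, a step the text also does not describe and which is shown here only as a hypothetical, would give 3 × 4 × 4 × 256 = 12288 features.

```python
# (mapped scale n, kernel size k, stride s, number of kernels) as listed in the text
BRANCHES = [(8, 2, 2, 256), (6, 3, 1, 256), (4, 1, 1, 256)]

def conv_output_size(n, k, s, p=0):
    """floor((n + 2p - k) / s) + 1; padding defaults to 0 (an assumption)."""
    return (n + 2 * p - k) // s + 1

def branch_shapes(branches=BRANCHES):
    """(height, width, channels) produced by each branch of the multi-scale layer."""
    shapes = []
    for n, k, s, c in branches:
        side = conv_output_size(n, k, s)
        shapes.append((side, side, c))
    return shapes

def concatenated_length(branches=BRANCHES):
    """Length of all branch outputs flattened and concatenated (hypothetical step)."""
    return sum(h * w * c for h, w, c in branch_shapes(branches))
```

Because the three inputs are already fixed at 8 × 8, 6 × 6 and 4 × 4, these output shapes do not depend on the original image size.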

The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various improvements and modifications without departing from the principle of the present invention, and such improvements and modifications should also be regarded as falling within the protection scope of the present invention.

Claims

1. An image classification improvement method based on a convolutional neural network, characterized by specifically comprising the following steps:

step three: convolving the feature map output in step two, obtained by the convolution and pooling operations, through several consecutive convolutional layers to fully fuse the features of different channels, and sending the feature map into a pyramid pooling layer for pooling;

2. The image classification improvement method based on a convolutional neural network according to claim 1, wherein in step one, Gaussian filtering is applied to the input image to suppress noise and smooth the image, and at the same time the image is flipped and its color, saturation and contrast are adjusted.

3. The image classification improvement method based on a convolutional neural network according to claim 1, wherein in step three, four consecutive convolutional layers are used together to fully fuse the features of the image's different channels.

4. The image classification improvement method based on a convolutional neural network according to claim 1, wherein in step four, the feature map is mapped at three different scales and each mapped map is convolved with a different convolution operation, so that a feature map of fixed size is finally obtained regardless of the size of the input image.

5. The image classification improvement method based on a convolutional neural network according to claim 1, wherein in step five, LDA is used to reduce the dimension of the feature matrix, and the global scatter matrix S_t is defined as:

$$S_t = \sum_{i=1}^{m} (x_i - \mu)(x_i - \mu)^T$$

where m is the total number of samples, x_i is the i-th sample vector, μ is the mean vector of all samples, and the superscript T denotes the matrix transpose;

the within-class scatter matrix S_ω is defined as:

$$S_\omega = \sum_{i=1}^{N} \sum_{x \in X_i} (x - \mu_i)(x - \mu_i)^T$$

where N is the total number of sample classes, X_i is the sample matrix of the i-th class, x is each sample vector of the i-th class, and μ_i is the mean vector of all samples of the i-th class;

the between-class scatter matrix S_b is defined as:

$$S_b = S_t - S_\omega$$

the optimization objective is therefore defined as:

$$\max_{W} \frac{\operatorname{tr}(W^T S_b W)}{\operatorname{tr}(W^T S_\omega W)}$$

where tr(·) denotes the matrix trace and W ∈ R^{d×(N-1)} is a matrix composed of N-1 eigenvectors; a projection matrix composed of a set of optimal discriminant vectors is obtained by solving the optimization objective, and this matrix projects the d-dimensional feature space onto an (N-1)-dimensional low-dimensional feature space.

6. The image classification improvement method based on a convolutional neural network according to claim 1, wherein the network uses overlapping max pooling, i.e., adjacent pooling windows overlap, which increases the richness of the features and helps avoid overfitting.

7. The image classification improvement method based on a convolutional neural network according to claim 1, wherein data in the neural network are passed through an activation function, and the activation function used is the leaky rectified linear unit (Leaky ReLU).