CN106991440B - Image classification method of convolutional neural network based on spatial pyramid - Google Patents

Info

Publication number
CN106991440B
CN106991440B (application CN201710198700.XA)
Authority
CN
China
Prior art date
Legal status
Expired - Fee Related
Application number
CN201710198700.XA
Other languages
Chinese (zh)
Other versions
CN106991440A (en)
Inventor
王改华
吕朦
李涛
袁国亮
Current Assignee
Hubei University of Technology
Original Assignee
Hubei University of Technology
Priority date
Filing date
Publication date
Application filed by Hubei University of Technology filed Critical Hubei University of Technology
Priority to CN201710198700.XA priority Critical patent/CN106991440B/en
Publication of CN106991440A publication Critical patent/CN106991440A/en
Application granted granted Critical
Publication of CN106991440B publication Critical patent/CN106991440B/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an image classification algorithm for a convolutional neural network based on a spatial pyramid. Global features are first extracted with the spatial pyramid, and each pyramid-level picture then acquires local features over a grid; together these form the overall spatial-pyramid feature. A new convolutional neural network model is constructed: the first half of the model is a traditional convolutional network with 3 convolutional layers and 2 pooling layers; the 3 convolutional layers are then each pooled uniformly over a grid to obtain their respective feature maps. The feature maps of each layer are concatenated column-wise into a feature vector, and the 3 feature vectors are concatenated in turn into one total feature vector. The total feature vector covers the features of the classical final convolutional layer and adds those of the earlier convolutional layers, so the loss of important features is avoided; at the same time, the grid size adjusts the weight given to each convolutional layer's feature maps, which improves the recognition efficiency of the network.

Description

Image classification method of convolutional neural network based on spatial pyramid
Technical Field
The invention belongs to the technical field of image processing and pattern recognition, and particularly relates to an image recognition method of a deep convolutional neural network based on a spatial pyramid.
Background
The spatial pyramid first extracts global features of the original image, then divides the image into fine mesh sequences at each pyramid level, extracts features from each mesh at each pyramid level, and connects them into a large feature vector.
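As an illustrative sketch (not code from the patent), the spatial-pyramid idea described above can be expressed as follows, assuming grid levels of 1 × 1, 2 × 2 and 4 × 4 and a simple mean-intensity feature per grid cell:

```python
import numpy as np

def spatial_pyramid_features(img, levels=(1, 2, 4)):
    """At each pyramid level, split the image into an n x n grid,
    take one mean-intensity feature per cell, and concatenate all
    cells of all levels into one large feature vector."""
    h, w = img.shape
    feats = []
    for n in levels:
        for r in range(n):
            for c in range(n):
                cell = img[r * h // n:(r + 1) * h // n,
                           c * w // n:(c + 1) * w // n]
                feats.append(cell.mean())
    return np.array(feats)

img = np.arange(64, dtype=float).reshape(8, 8)
v = spatial_pyramid_features(img)
print(v.shape)  # (21,): 1 + 4 + 16 cells
```

The first entry is the global feature (the whole-image mean); the remaining entries are the progressively finer local features.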
Convolutional neural networks have been widely used in recent years and have achieved remarkable results in image processing; many researchers have since modified the classical networks. To obtain better image recognition results, this patent draws on the idea of the spatial pyramid, proposes a new deep convolutional neural network, and achieves a better recognition effect than traditional methods.
Disclosure of Invention
The invention aims to provide an image classification method using a deep convolutional neural network based on the spatial pyramid, thereby improving image pattern recognition capability.
The technical scheme adopted by the invention is as follows: an image classification method based on a convolutional neural network of a spatial pyramid is characterized by comprising the following steps:
step 1: forward propagation, the specific implementation includes the following substeps:
step 1.1: establishing the first half of the network: a convolutional neural network with M convolutional layers and M-1 pooling layers;
step 1.2: pooling the M convolutional layers respectively to obtain M types of features, connecting each type into a large feature vector, and finally connecting these into one total feature vector serving as the final feature of the image;
step 1.3: performing primary full connection and softmax classification on the final feature vector to obtain a convolutional neural network;
step 1.4: initializing all weights of the whole convolutional neural network through an empirical formula, inputting a training picture x into the initialized convolutional neural network, and propagating according to a forward propagation formula;
step 2: backward adjustment.
The invention has the beneficial effects that: a new convolutional neural network algorithm structure is provided, and the recognition efficiency is improved.
Drawings
FIG. 1: a method schematic of an embodiment of the invention.
Detailed Description
In order to facilitate the understanding and implementation of the present invention for those of ordinary skill in the art, the present invention is further described in detail with reference to the accompanying drawings and examples, it is to be understood that the embodiments described herein are merely illustrative and explanatory of the present invention and are not restrictive thereof.
Referring to fig. 1, the image classification method of the convolutional neural network based on the spatial pyramid provided by the present invention includes the following steps:
step 1: forward propagation, the specific implementation includes the following substeps:
step 1.1: establishing the first half of the convolutional neural network, with 3 convolutional layers and 2 pooling layers;
step 1.2: pooling the 3 convolutional layers respectively to obtain 3 types of features, connecting each type into a large feature vector, and finally connecting these into one total feature vector serving as the final feature of the image;
after the picture is input, the feature maps of the first convolutional layer are obtained through the convolution kernels and the hidden-layer biases; the convolutional feature maps x_j^1 of the first layer are given by:
x_j^1 = δ( Σ_{i=1}^{n_0} x_i^0 * k_j^1 + b_j^1 ), j = 1, …, n_1;
wherein: x_j^1 is the j-th feature map of the 1st convolutional layer; x_i^0 is the i-th picture of the preprocessed input x^0, and n_0 is the number of pictures in x^0; k_j^1 is the j-th two-dimensional convolution kernel of layer 1, and b_j^1 is the bias of the j-th feature map of the 1st hidden layer; δ is the sigmoid function, and the result of the formula is the obtained feature map;
n_1 is the number of convolution kernels in the first layer and is also the number of feature maps of the 1st convolutional layer;
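The first convolutional layer above can be sketched as follows; the shapes, map counts and kernel values are illustrative assumptions, since the patent's figure is not reproduced here:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def conv_layer(x0, kernels, biases):
    """Each of the n1 kernels is slid over every input map ('valid'
    cross-correlation), the responses over the n0 input maps are
    summed, the bias is added, and the sigmoid delta is applied."""
    n0, H, W = x0.shape
    n1, kh, kw = kernels.shape
    out = np.zeros((n1, H - kh + 1, W - kw + 1))
    for j in range(n1):
        for i in range(n0):
            for r in range(H - kh + 1):
                for c in range(W - kw + 1):
                    out[j, r, c] += np.sum(x0[i, r:r + kh, c:c + kw] * kernels[j])
        out[j] += biases[j]
    return sigmoid(out)

x0 = np.ones((1, 6, 6))      # one preprocessed input picture
k = np.ones((2, 3, 3))       # n1 = 2 kernels of size 3 x 3
b = np.zeros(2)
maps = conv_layer(x0, k, b)
print(maps.shape)  # (2, 4, 4)
```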
the obtained convolutional-layer feature maps are down-sampled by 2 × 2 uniform pooling to obtain feature maps v^1 with half the original number of rows and columns:
v^1 = mean-pooling{x^1};
wherein mean-pooling denotes uniform (average) pooling;
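The 2 × 2 uniform pooling step above, halving the rows and columns of each feature map, can be sketched with a reshape-and-mean trick (assuming even spatial dimensions):

```python
import numpy as np

def mean_pool_2x2(x):
    """Non-overlapping 2 x 2 average pooling over a stack of feature
    maps, as in v^1 = mean-pooling{x^1} above."""
    n, h, w = x.shape
    return x.reshape(n, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

x1 = np.arange(2 * 4 * 4, dtype=float).reshape(2, 4, 4)
v1 = mean_pool_2x2(x1)
print(v1.shape)  # (2, 2, 2)
```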
The characteristic diagram of each convolution layer can be obtained by the following formula;
the characteristic diagram of each pooling layer can be obtained by the following formula;
vl=mean-pooling{xl};
there are 3 convolutional layers in total, i.e. x^1, x^2, x^3; following the spatial-pyramid method, features are then extracted by drawing a grid over the feature maps of the 3 convolutional layers. In this embodiment the 1st convolutional layer is divided into a 4 × 4 grid, one feature is extracted from each grid cell by uniform pooling, and the 1st convolutional layer thus becomes a 4 × 4 feature map p^1 after feature extraction:
p^1 = mean-pooling(v^1);
The 3 classes of feature maps p^1, p^2, p^3 are obtained according to:
p^l = mean-pooling(v^l);
wherein the pooling window size and step size change with the input picture size; the sizes of p^1, p^2, p^3 are preset to 4 × 4, 2 × 2 and 1 × 1 respectively. Each p^1 map is then stacked column by column into a column vector of size 16, so the 6 maps of p^1 form a column vector of size 4 × 4 × 6 = 96; in the same way p^2 is stacked into a column vector of size 2 × 2 × 16 = 64 and p^3 into one of size 1 × 1 × 120 = 120. Finally these are concatenated in order into one column vector p of total size 280, which serves as the feature of the input picture.
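The grid pooling of the three convolutional layers to 4 × 4, 2 × 2 and 1 × 1 and the concatenation into the 280-dimensional vector p can be sketched as below. The map counts 6 / 16 / 120 follow from the arithmetic above (96 = 4 × 4 × 6, etc.); the spatial sizes of the feature maps are illustrative assumptions:

```python
import numpy as np

def grid_pool(maps, n):
    """Pool each feature map down to an n x n grid by averaging
    within each grid cell (adaptive average pooling)."""
    m, h, w = maps.shape
    out = np.zeros((m, n, n))
    for r in range(n):
        for c in range(n):
            out[:, r, c] = maps[:, r * h // n:(r + 1) * h // n,
                                   c * w // n:(c + 1) * w // n].mean(axis=(1, 2))
    return out

x1 = np.random.rand(6, 24, 24)    # 1st conv layer: 6 maps
x2 = np.random.rand(16, 8, 8)     # 2nd conv layer: 16 maps
x3 = np.random.rand(120, 4, 4)    # 3rd conv layer: 120 maps
p = np.concatenate([grid_pool(x, n).ravel()
                    for x, n in ((x1, 4), (x2, 2), (x3, 1))])
print(p.size)  # 96 + 64 + 120 = 280
```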
Step 1.3: performing primary full connection and softmax classification on the final feature vector to obtain a convolutional neural network;
step 1.4: initializing all weights of the whole convolutional neural network through an empirical formula, inputting a training picture x into the initialized convolutional neural network, and propagating according to a forward propagation formula;
all weights of the whole convolutional neural network are initialized through an empirical formula: the weights w_kj between input units and hidden-layer units and the biases b_j of the hidden-layer units are randomly generated according to the empirical formula, with the initial value of b set to 0;
wherein w denotes a weight, l denotes the l-th layer of the convolutional network, j denotes the j-th neuron of the l-th convolutional layer, k denotes the k-th fully connected layer, layerinput denotes the number of input neurons of the layer, and layeroutput denotes the number of output neurons of the layer; k_l is the size of the l-th convolution kernel, which can be initialized to weights between -1 and 1.
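The patent does not reproduce its empirical initialization formula; a minimal sketch under that caveat uses the widely known rule consistent with its layerinput and layeroutput quantities (uniform weights in plus or minus sqrt(6 / (n_in + n_out)), biases starting at 0):

```python
import numpy as np

rng = np.random.default_rng(0)

def init_weights(layer_input, layer_output):
    """Assumed stand-in for the omitted empirical formula: uniform
    weights bounded by sqrt(6 / (layerinput + layeroutput))."""
    bound = np.sqrt(6.0 / (layer_input + layer_output))
    return rng.uniform(-bound, bound, size=(layer_output, layer_input))

w = init_weights(280, 10)   # e.g. fully connected layer: 280 features -> 10 classes
b = np.zeros(10)            # initial value of b set to 0, as stated above
print(w.shape)  # (10, 280)
```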
Each input picture is denoted x, and the image input into the convolutional neural network is denoted x^0. When the input picture is a grayscale picture, x^0 = x; when the input picture is a color picture, it is grayed by x^0 = rgb2gray(x).
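The graying step can be sketched with the standard luminance weights used by the rgb2gray function named in the text (ITU-R BT.601 coefficients):

```python
import numpy as np

def rgb2gray(x):
    """Weighted-sum luminance conversion: 0.2989 R + 0.5870 G + 0.1140 B."""
    return x[..., 0] * 0.2989 + x[..., 1] * 0.5870 + x[..., 2] * 0.1140

rgb = np.zeros((2, 2, 3))
rgb[..., 1] = 1.0            # a pure-green picture
x0 = rgb2gray(rgb)
print(x0[0, 0])  # 0.587
```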
A training image x and its label are input, and the output value of each layer is calculated with the forward conduction formula:
h_{w,b}(x) = f(w^T x + b);
wherein h_{w,b}(x) is the output value of the neuron, w^T is the transpose of the weights, b is the bias, and f is the activation function.
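The forward conduction formula h_{w,b}(x) = f(w^T x + b) for one fully connected layer can be sketched directly; the sigmoid activation matches the δ used earlier:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(w, b, x, f=sigmoid):
    """One layer of forward conduction: w has one column per output
    neuron, so w.T @ x matches w^T x in the formula above."""
    return f(w.T @ x + b)

x = np.array([1.0, -1.0])
w = np.zeros((2, 3))         # 2 inputs -> 3 output neurons
b = np.zeros(3)
print(forward(w, b, x))  # [0.5 0.5 0.5], since sigmoid(0) = 0.5
```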
Step 2: backward adjustment; the specific implementation comprises the following substeps:
step 2.1: the deviation of the last layer is calculated from the label value and the last-layer output value obtained with the forward conduction formula, using the loss function:
J_l = (1/2) Σ_i ( h_{w,b}(x^(i)) - y^(i) )^2;
wherein J_l is the loss function of the l layers, h_{w,b}(x^(i)) is the output value of the output-layer neurons for the i-th picture, and y^(i) is the label of the i-th input picture;
step 2.2: the deviation of each layer is calculated from the deviation of the last layer to obtain the gradient direction, and the weights are updated by:
w := w - α ∂J/∂w, b := b - α ∂J/∂b;
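Since the per-layer update formulas are not reproduced in the text, a minimal stand-in for the gradient step is sketched below for a single sigmoid neuron under the squared-error loss; the learning rate alpha and the toy data are assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sgd_step(w, b, x, y, alpha=0.5):
    """One step of w := w - alpha * dJ/dw for J = 1/2 (h - y)^2
    with h = sigmoid(w.x + b)."""
    h = sigmoid(w @ x + b)
    delta = (h - y) * h * (1.0 - h)   # dJ/d(pre-activation)
    return w - alpha * delta * x, b - alpha * delta

w, b = np.array([0.0, 0.0]), 0.0
x, y = np.array([1.0, 2.0]), 1.0
for _ in range(200):
    w, b = sgd_step(w, b, x, y)
print(float(sigmoid(w @ x + b)) > 0.9)  # True: the output moved toward the label
```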
In the backward adjustment of this embodiment, the convolutional layers x^1, x^2 receive gradients from two directions, and the algorithm adjusts them by adding the gradients from the two directions.
In this embodiment, a certain number of pictures are input into the trained convolutional neural network, the classification result is obtained by forward propagation and compared with each picture's own label; a match counts as correct, which yields the accuracy of the network algorithm.
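The accuracy test described above, comparing each classification result with the picture's own label, can be sketched as:

```python
import numpy as np

def accuracy(scores, labels):
    """Fraction of pictures whose arg-max class matches the label."""
    return float(np.mean(np.argmax(scores, axis=1) == labels))

scores = np.array([[0.1, 0.9],    # predicted class 1
                   [0.8, 0.2],    # predicted class 0
                   [0.3, 0.7]])   # predicted class 1
labels = np.array([1, 0, 0])      # the last picture is misclassified
print(accuracy(scores, labels))  # 0.6666666666666666
```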
It should be understood that parts of the specification not set forth in detail belong to the prior art.
It should be understood that the above description of the preferred embodiments is given for clarity and not for any purpose of limitation, and that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (6)

1. An image classification method based on a convolutional neural network of a spatial pyramid is characterized by comprising the following steps:
step 1: forward propagation, the specific implementation includes the following substeps:
step 1.1: establishing the first half of the network: a convolutional neural network with M convolutional layers and M-1 pooling layers;
step 1.2: pooling the M convolutional layers respectively to obtain M types of features, connecting each type into a large feature vector, and finally connecting these into one total feature vector serving as the final feature of the image;
in step 1.2, if the convolutional neural network with 3 convolutional layers and 2 pooling layers is established in step 1.1, pooling the 3 convolutional layers respectively to obtain 3 types of characteristics; the specific implementation process of the step 1.2 is as follows:
after the picture is input, the feature maps of the first convolutional layer are obtained through the convolution kernels and the hidden-layer biases; the convolutional feature maps x_j^1 of the first layer are given by:
x_j^1 = δ( Σ_{i=1}^{n_0} x_i^0 * k_j^1 + b_j^1 ), j = 1, …, n_1;
wherein: x_j^1 is the j-th feature map of the 1st convolutional layer; x_i^0 is the i-th picture of the preprocessed input x^0, and n_0 is the number of pictures in x^0; k_j^1 is the j-th two-dimensional convolution kernel of layer 1, and b_j^1 is the bias of the j-th feature map of the 1st hidden layer; δ is the sigmoid function, and the result of the formula is the obtained feature map;
n_1 is the number of convolution kernels in the first layer and is also the number of feature maps of the 1st convolutional layer;
the obtained convolutional-layer feature maps are down-sampled by 2 × 2 uniform pooling to obtain feature maps v^1 with half the original number of rows and columns:
v^1 = mean-pooling{x^1};
wherein mean-pooling denotes uniform (average) pooling;
The characteristic diagram of each convolution layer can be obtained by the following formula;
the characteristic diagram of each pooling layer can be obtained by the following formula;
vl=mean-pooling{xl};
there are 3 convolutional layers in total, i.e. x^1, x^2, x^3; features are then extracted by drawing a grid over the feature maps of the 3 convolutional layers:
the 1st convolutional layer is divided into a 4 × 4 grid, one feature is extracted from each grid cell by uniform pooling, and the 1st convolutional layer thus becomes a 4 × 4 feature map p^1 after feature extraction:
p^1 = mean-pooling(v^1);
The 3 classes of feature maps p^1, p^2, p^3 are obtained according to:
p^l = mean-pooling(v^l);
wherein the pooling window size and step size change with the input picture size; the sizes of p^1, p^2, p^3 are preset to 4 × 4, 2 × 2 and 1 × 1 respectively. Each p^1 map is then stacked column by column into a column vector of size 16, so the 6 maps of p^1 form a column vector of size 4 × 4 × 6 = 96; in the same way p^2 is stacked into a column vector of size 2 × 2 × 16 = 64 and p^3 into one of size 1 × 1 × 120 = 120. Finally these are concatenated in order into one column vector p of total size 280, which serves as the feature of the input picture;
step 1.3: performing primary full connection and softmax classification on the final feature vector to obtain a convolutional neural network;
step 1.4: initializing all weights of the whole convolutional neural network through an empirical formula, inputting a training picture x into the initialized convolutional neural network, and propagating according to a forward propagation formula;
step 2: and (4) reverse regulation.
2. The image classification method based on the convolutional neural network of the spatial pyramid as claimed in claim 1, wherein in step 1.4 all weights of the whole convolutional neural network are initialized through an empirical formula: the weights w_kj between input units and hidden-layer units and the biases b_j of the hidden-layer units are randomly generated according to the empirical formula, with the initial value of b set to 0;
wherein w denotes a weight, l denotes the l-th layer of the convolutional network, j denotes the j-th neuron of the l-th convolutional layer, k denotes the k-th fully connected layer, layerinput denotes the number of input neurons of the layer, and layeroutput denotes the number of output neurons of the layer; k_l is the size of the l-th convolution kernel, which can be initialized to weights between -1 and 1.
3. The image classification method based on the convolutional neural network of the spatial pyramid as claimed in claim 1, wherein in step 1.4 a training image x and its label are input, and the output value of each layer is calculated with the forward conduction formula:
h_{w,b}(x) = f(w^T x + b);
wherein h_{w,b}(x) is the output value of the neuron, w^T is the transpose of the weights, b is the bias, and f is the activation function.
4. The image classification method based on the convolutional neural network of the spatial pyramid as claimed in any one of claims 1 to 3, wherein in step 1.4 each input picture is denoted x and the image input into the convolutional neural network is denoted x^0; when the input picture is a grayscale picture, x^0 = x; when the input picture is a color picture, it is grayed by x^0 = rgb2gray(x).
5. The image classification method based on the convolutional neural network of the spatial pyramid as claimed in claim 3, wherein step 2 is implemented by the following substeps:
step 2.1: the deviation of the last layer is calculated from the label value and the last-layer output value obtained with the forward conduction formula, using the loss function:
J_l = (1/2) Σ_i ( h_{w,b}(x^(i)) - y^(i) )^2;
wherein J_l is the loss function of the l layers, h_{w,b}(x^(i)) is the output value of the output-layer neurons for the i-th picture, and y^(i) is the label of the i-th input picture;
step 2.2: the deviation of each layer is calculated from the deviation of the last layer to obtain the gradient direction, and the weights are updated by:
w := w - α ∂J/∂w, b := b - α ∂J/∂b;
6. The image classification method based on the convolutional neural network of the spatial pyramid as claimed in claim 5, wherein in the backward adjustment the convolutional layers x^1, x^2 receive gradients from two directions, and the algorithm adjusts them by adding the gradients from the two directions.
CN201710198700.XA 2017-03-29 2017-03-29 Image classification method of convolutional neural network based on spatial pyramid Expired - Fee Related CN106991440B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710198700.XA CN106991440B (en) 2017-03-29 2017-03-29 Image classification method of convolutional neural network based on spatial pyramid


Publications (2)

Publication Number Publication Date
CN106991440A CN106991440A (en) 2017-07-28
CN106991440B true CN106991440B (en) 2019-12-24

Family

ID=59412271

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710198700.XA Expired - Fee Related CN106991440B (en) 2017-03-29 2017-03-29 Image classification method of convolutional neural network based on spatial pyramid

Country Status (1)

Country Link
CN (1) CN106991440B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109389208B (en) * 2017-08-09 2021-08-31 上海寒武纪信息科技有限公司 Data quantization device and quantization method
CN107689036A (en) * 2017-09-01 2018-02-13 深圳市唯特视科技有限公司 A kind of Real-time image enhancement method based on the bilateral study of depth
CN107609638B (en) * 2017-10-12 2019-12-10 湖北工业大学 method for optimizing convolutional neural network based on linear encoder and interpolation sampling
CN107808139B (en) * 2017-11-01 2021-08-06 电子科技大学 Real-time monitoring threat analysis method and system based on deep learning
CN107862707A (en) * 2017-11-06 2018-03-30 深圳市唯特视科技有限公司 A kind of method for registering images based on Lucas card Nader's image alignment
CN108682029A (en) * 2018-03-22 2018-10-19 深圳飞马机器人科技有限公司 Multiple dimensioned dense Stereo Matching method and system
CN108596260A (en) * 2018-04-27 2018-09-28 安徽建筑大学 A kind of grid leakage loss localization method and device
CN108734679A (en) * 2018-05-23 2018-11-02 西安电子科技大学 A kind of computer vision system
CN109003223B (en) * 2018-07-13 2020-02-28 北京字节跳动网络技术有限公司 Picture processing method and device
CN109165738B (en) * 2018-09-19 2021-09-14 北京市商汤科技开发有限公司 Neural network model optimization method and device, electronic device and storage medium
CN110866550B (en) * 2019-11-01 2022-06-14 云南大学 Convolutional neural network, pyramid strip pooling method and malicious software classification method
CN110852263B (en) * 2019-11-11 2021-08-03 北京智能工场科技有限公司 Mobile phone photographing garbage classification recognition method based on artificial intelligence
CN111598101B (en) * 2020-05-25 2021-03-23 中国测绘科学研究院 Urban area intelligent extraction method, system and equipment based on remote sensing image scene segmentation

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105512661A (en) * 2015-11-25 2016-04-20 中国人民解放军信息工程大学 Multi-mode-characteristic-fusion-based remote-sensing image classification method
CN105917354A (en) * 2014-10-09 2016-08-31 微软技术许可有限责任公司 Spatial pyramid pooling networks for image processing
CN105975931A (en) * 2016-05-04 2016-09-28 浙江大学 Convolutional neural network face recognition method based on multi-scale pooling
CN106157953A (en) * 2015-04-16 2016-11-23 科大讯飞股份有限公司 continuous speech recognition method and system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9633282B2 (en) * 2015-07-30 2017-04-25 Xerox Corporation Cross-trained convolutional neural networks using multimodal images


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Andrew Shin et al., "Dense Image Representation with Spatial Pyramid VLAD Coding of CNN for Locally Robust Captioning", arXiv, 2016-03-31, pp. 1-18. *


Similar Documents

Publication Publication Date Title
CN106991440B (en) Image classification method of convolutional neural network based on spatial pyramid
CN110020682B (en) Attention mechanism relation comparison network model method based on small sample learning
CN112766199B (en) Hyperspectral image classification method based on self-adaptive multi-scale feature extraction model
CN110059878B (en) Photovoltaic power generation power prediction model based on CNN LSTM and construction method thereof
CN110414377B (en) Remote sensing image scene classification method based on scale attention network
CN110348399B (en) Hyperspectral intelligent classification method based on prototype learning mechanism and multidimensional residual error network
CN106778604B (en) Pedestrian re-identification method based on matching convolutional neural network
CN103927531B (en) It is a kind of based on local binary and the face identification method of particle group optimizing BP neural network
Lin et al. Hyperspectral image denoising via matrix factorization and deep prior regularization
CN108510012A (en) A kind of target rapid detection method based on Analysis On Multi-scale Features figure
CN112434655B (en) Gait recognition method based on adaptive confidence map convolution network
CN109255364A (en) A kind of scene recognition method generating confrontation network based on depth convolution
CN110097609B (en) Sample domain-based refined embroidery texture migration method
CN110097178A (en) It is a kind of paid attention to based on entropy neural network model compression and accelerated method
CN107766794A (en) The image, semantic dividing method that a kind of Fusion Features coefficient can learn
CN112347888B (en) Remote sensing image scene classification method based on bi-directional feature iterative fusion
CN111861906B (en) Pavement crack image virtual augmentation model establishment and image virtual augmentation method
CN107944483B (en) Multispectral image classification method based on dual-channel DCGAN and feature fusion
CN108021947A (en) A kind of layering extreme learning machine target identification method of view-based access control model
CN112115781A (en) Unsupervised pedestrian re-identification method based on anti-attack sample and multi-view clustering
CN109190666B (en) Flower image classification method based on improved deep neural network
CN105550712B (en) Aurora image classification method based on optimization convolution autocoding network
Yang et al. Down image recognition based on deep convolutional neural network
CN106529586A (en) Image classification method based on supplemented text characteristic
CN107423705A (en) SAR image target recognition method based on multilayer probability statistics model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20191224