CN112115973A

CN112115973A - Convolutional neural network based image identification method

Info

Publication number: CN112115973A
Application number: CN202010829114.2A
Authority: CN
Inventors: 刘航; 白仞祥; 张玉红; 菅秀凯; 刘鸣泰
Original assignee: Jilin Jianzhu University
Current assignee: Jilin Jianzhu University
Priority date: 2020-08-18
Filing date: 2020-08-18
Publication date: 2020-12-22
Anticipated expiration: 2040-08-18
Also published as: CN112115973B

Abstract

The invention belongs to the fields of deep learning and image recognition, and in particular relates to an image recognition method based on a convolutional neural network. The method comprises the following steps: training a model on original pictures using a convolutional neural network; and inputting a picture to be processed into the trained model to recognize it. During training, the method uses GPU acceleration to speed up neural-network training, adds Dropout regularization to the training model to prevent overfitting, and expands the photo data set by data augmentation.

Description

Convolutional neural network based image identification method

Technical Field

The invention belongs to the fields of deep learning and image recognition, and in particular relates to an image recognition method based on a convolutional neural network.

Background

Since Rumelhart and others developed their learning algorithm in 1985, research on neural networks has grown worldwide and artificial neural networks have spread across many research fields. In particular, image classification for pattern recognition is increasingly applied: character recognition, license plate recognition, face recognition, banknote recognition, seal recognition and the recognition of certain military targets are widely studied in China and abroad. When an artificial neural network performs image recognition, it faces the following main problems:

(1) Too many parameters. In CIFAR-10 (a benchmark image dataset), the images are only 32x32x3 (32 wide, 32 high, 3 color channels), so a single fully connected neuron in the first hidden layer of an ordinary neural network already has 32x32x3 = 3072 weights. This number is still manageable, but the fully connected structure clearly does not scale to larger images. For an image of more realistic size, such as 200x200x3, each such neuron would have 120,000 weights. Moreover, a network almost certainly has many such neurons, so the number of parameters grows rapidly. Such full connectivity is wasteful, and the large number of parameters can quickly lead to overfitting.

(2) Positional information between pixels is not exploited. In image recognition, each pixel is closely related to its neighbouring pixels, while pixels far apart are usually only weakly related. If a neuron is connected to every neuron in the previous layer, it treats all pixels of the image equally, which contradicts this assumption. After all connection weights have been learned, we may find that a large number of them have very small values. Spending effort on learning so many unimportant weights makes learning very inefficient.

(3) Limited network depth. The more layers a network has, the greater its expressive power; however, training a deep artificial neural network by gradient descent is difficult, because in a fully connected network the gradient is hard to propagate back through more than 3 layers. It is therefore impractical to obtain a deep fully connected network, which limits its capability.
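The parameter counts in problem (1) can be checked with a few lines of Python. This sketch is only illustrative: the convolutional comparison (a single 3 × 3 filter over 3 input channels) is a hypothetical example added to show that a convolutional layer's parameter count depends on kernel size and channels, not on image size.

```python
def fc_weights_per_neuron(width: int, height: int, channels: int) -> int:
    """Weights one fully connected neuron needs to see every input value."""
    return width * height * channels


def conv_layer_params(kernel: int, in_channels: int, filters: int) -> int:
    """Conv layer parameters: filters * (k*k*C weights + 1 bias).

    Note that the image size does not appear: the same kernel is shared
    across all positions of the image.
    """
    return filters * (kernel * kernel * in_channels + 1)


print(fc_weights_per_neuron(32, 32, 3))    # 3072   (CIFAR-10 image)
print(fc_weights_per_neuron(200, 200, 3))  # 120000 (200x200x3 image)
print(conv_layer_params(3, 3, 1))          # 28, for any image size
```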

Disclosure of Invention

In order to solve the problems in the prior art, the technical problem to be solved by the invention is to provide an image identification method based on a convolutional neural network.

The invention is realized as follows:

an image recognition method based on a convolutional neural network, comprising:

step 1, training a model on original pictures using a convolutional neural network;

step 2, inputting the picture to be processed into the trained model and recognizing the picture.

Further: step 1, training the model with the convolutional neural network, comprises: preliminarily extracting image features through the convolutional layers; extracting the main features through the down-sampling layers;

aggregating the features of all parts through the fully connected layers; and generating a classifier for prediction and recognition;

the method specifically comprises the following steps:

step 11: initializing a weight value of the convolutional neural network;

step 12: forward-propagating the input picture data through the convolutional, down-sampling and fully connected layers to obtain an output value;

the output of each convolutional layer is computed as:

y^{(l)} = f\left( \sum_{i \in M} w_i^{(l)} * x_i^{(l-1)} + b^{(l)} \right)

where y^{(l)} is the output of convolutional layer l, f(\cdot) is the nonlinear activation function, M is the set of feature maps input to the layer, w_i^{(l)} is the weight of the layer's convolution kernel, * denotes the convolution operation, x_i^{(l-1)} is the feature map input to the convolutional layer, and b^{(l)} is the bias;

step 13: computing the error between the output value of the convolutional neural network and the target value. When the network output does not match the expected value, back propagation is performed: the error between the result and the expected value is computed and propagated back layer by layer, the error of each layer is computed, and the weights are updated, so that the network weights are adjusted using the training samples and their expected values;

the parameters inside the model are determined from the network's forward-propagated predictions for the samples and the corresponding expected outputs. The objective function of the convolutional neural network is defined as:

J(w, b) = \frac{1}{m} \sum_{i=1}^{m} L\left( \hat{y}^{(i)}, y^{(i)} \right)

where L(\cdot) is the loss function, m is the number of samples, \hat{y}^{(i)} is the network output for sample i, and y^{(i)} is its expected output. Gradient descent is used to compute the partial derivatives of J with respect to the parameters w and b of each layer, giving the updated parameter values

w \leftarrow w - \eta \frac{\partial J}{\partial w}, \qquad b \leftarrow b - \eta \frac{\partial J}{\partial b}

where \eta is the learning rate, so that the actual network output moves closer to the expected value;

step 14: when the error is larger than the expected value, the error is propagated back through the convolutional neural network and the errors of the fully connected layers, down-sampling layers and convolutional layers are obtained in turn; when the error is equal to or less than the expected value, training ends;

step 15: judging from the obtained error whether the weights are optimal, and if not, updating the weights;

step 16: judging whether all epochs have been completed; if so, exiting model training, otherwise performing the next round of training;

step 17: training of the model is complete.
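The control flow of steps 11 to 17 can be sketched on a deliberately tiny stand-in model. This is not the patented network: the one-weight linear model y = w * x, the mean-squared error, the tolerance and the learning rate are all assumptions chosen so that the loop is easy to follow.

```python
import random


def train(xs, ys, epochs=40, lr=0.1, tolerance=1e-6, seed=0):
    """Sketch of steps 11-17 on a toy model y = w * x (not the patented CNN)."""
    rng = random.Random(seed)
    w = rng.uniform(-1.0, 1.0)                 # step 11: random weight initialization
    best_w, best_err = w, float("inf")
    for _ in range(epochs):                    # step 16: stop after the set number of epochs
        preds = [w * x for x in xs]            # step 12: forward propagation
        err = sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(xs)  # step 13: error vs target
        if err < best_err:                     # step 15: keep the best weight seen so far
            best_w, best_err = w, err
        if err <= tolerance:                   # step 14: error small enough, finish training
            break
        # back-propagated gradient of the mean-squared error with respect to w
        grad = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / len(xs)
        w -= lr * grad                         # gradient-descent weight update
    return best_w, best_err                    # step 17: training finished


w, err = train([1, 2, 3], [2, 4, 6])
```

With xs = [1, 2, 3] and ys = [2, 4, 6], the returned weight approaches 2 within a few epochs and the loop stops early once the error falls below the tolerance.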

Further: the updating in step 15 includes updating the convolutional layers and updating the fully connected layers:

the error is propagated back layer by layer using the back propagation algorithm, and the weights of each layer are updated by gradient descent.
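Steps 13 and 15 minimize an objective averaged over the m samples by gradient descent. The sketch below assumes a single fully connected softmax layer with cross-entropy loss (the source does not name a loss function), for which the gradient with respect to the layer's pre-activations reduces to (P - Y) / m; the toy data and labels are hypothetical.

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)       # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def objective(W, b, X, Y):
    """J(W, b) = (1/m) * sum_i L(y_hat_i, y_i), with cross-entropy as L."""
    P = softmax(X @ W + b)
    return -np.mean(np.sum(Y * np.log(P + 1e-12), axis=1))


def gradient_step(W, b, X, Y, lr):
    """One update: W <- W - lr * dJ/dW, b <- b - lr * dJ/db."""
    m = X.shape[0]
    P = softmax(X @ W + b)
    dZ = (P - Y) / m                           # dJ/dZ for softmax + cross-entropy
    return W - lr * (X.T @ dZ), b - lr * dZ.sum(axis=0)


rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))
labels = (X[:, 0] > 0).astype(int)             # hypothetical 2-class toy labels
Y = np.eye(2)[labels]                          # one-hot expected outputs
W, b = np.zeros((4, 2)), np.zeros(2)
before = objective(W, b, X, Y)                 # log(2) for an untrained 2-class layer
for _ in range(100):
    W, b = gradient_step(W, b, X, Y, lr=0.5)
after = objective(W, b, X, Y)
```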

Further: in step 13,

the forward propagation of the convolutional layer convolves the input data with the convolution kernels: each kernel slides over the entire input picture with a stride of 1, covering one local receptive field at a time; for each receptive field, the weighted sum of the weight matrix and the picture's feature values is computed, and the output is then obtained through an activation function;

the forward propagation of the down-sampling layer takes the features extracted by the preceding convolutional layer as input and reduces the dimensionality of the data through the pooling operation, using max pooling, which selects the maximum value in each region of the feature map;

the forward propagation of the fully connected layer: after the feature maps have passed through the convolutional and down-sampling layers for feature extraction, the extracted features are passed to the fully connected layer, which performs classification to produce the classification model and the final result. In the fully connected layer, the number of parameters equals the number of nodes in the layer multiplied by the number of input features, plus the number of nodes (one bias per node); after the output matrix is obtained, it is activated by an activation function and passed to the next layer.
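The convolutional-layer forward pass described above (a weighted sum over the input feature maps within each receptive field, plus a bias, followed by an activation, with stride 1) can be written compactly with NumPy. Assumptions: no padding, ReLU as the default activation (the embodiment uses PReLU), and convolution implemented as cross-correlation, as deep-learning libraries do. sliding_window_view requires NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv_layer_forward(x, w, b, activation=lambda z: np.maximum(z, 0.0)):
    """Compute f(sum_i w_i * x_i + b) with stride 1 and no padding.

    x: input feature maps, shape (C_in, H, W)
    w: kernels, shape (C_out, C_in, k, k)
    b: biases, shape (C_out,)
    returns: shape (C_out, H - k + 1, W - k + 1)
    """
    k = w.shape[-1]
    # every k x k receptive field of every input map: (C_in, H-k+1, W-k+1, k, k)
    windows = sliding_window_view(x, (k, k), axis=(1, 2))
    # weighted sum over input channels and kernel positions, then add the bias
    z = np.einsum("chwij,ocij->ohw", windows, w) + b[:, None, None]
    return activation(z)
```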

Further: step 2 comprises:

step 21: loading into the image recognition system the optimal trained weights that the training model of step 1 saved in a designated file;

step 22: obtaining, through weight sharing, the optimal weights of each layer's convolution kernels in the training model, and loading the trained convolution kernel weights into the image recognition system;

step 23: passing the output of the last fully connected layer of the convolutional neural network through a softmax classifier, which divides the data into the correct and wrong classes, and loading the image labels used by the training model for classification into the image recognition system;

step 24: normalizing the picture to be recognized as preprocessing;

step 25: performing recognition with the recognition system based on the convolutional neural network and outputting the recognition result.
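Steps 23 to 25 end with a softmax over the last fully connected layer's outputs and a lookup of the class label. A minimal sketch, assuming the loaded label file holds the two hypothetical labels "correct" and "wrong" in output order:

```python
import numpy as np

LABELS = ["correct", "wrong"]   # hypothetical label file contents, in output order


def classify(logits):
    """Map the last fully connected layer's outputs to (label, probability)."""
    z = np.asarray(logits, dtype=float)
    p = np.exp(z - z.max())     # numerically stable softmax
    p /= p.sum()
    idx = int(np.argmax(p))
    return LABELS[idx], float(p[idx])
```

For example, outputs [2.0, 0.0] map to "correct" with a probability of about 0.88.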

Compared with the prior art, the invention has the beneficial effects that:

In this method, GPU acceleration is used to speed up training of the neural network, Dropout regularization is added to the training model to prevent overfitting during training, and the photo data set is expanded by data augmentation such as rotation, scaling and flipping; during training, the model showed no overfitting on the augmented data set. The loss curve in FIG. 8 shows that in the later stage of training the loss keeps decreasing steadily as the learning rate is gradually reduced, and once the convolutional neural network has been trained for 25 iterations the loss curve gradually levels off. The training accuracy curve in FIG. 9 shows that the accuracy is low in the first few iterations, because with so few training iterations the model parameters have not yet been optimized; as the number of training iterations increases, the recognition rate on the data set rises steadily, and after 25 iterations the accuracy curve gradually stabilizes. Taken together, the two curves show that the model reaches its optimal number of iterations at 25. With the training model designed on the basis of a convolutional neural network, the accuracy reaches 96%.

Drawings

FIG. 1 shows the pattern used as the correct pattern in an embodiment of the present invention;

FIG. 2 shows the pattern used as the wrong pattern in an embodiment of the present invention;

FIG. 3 is a first layer convolution structure according to an embodiment of the present invention;

FIG. 4 is a second layer convolution structure according to an embodiment of the present invention;

FIG. 5 is a third layer convolution structure according to an embodiment of the present invention;

FIG. 6 is a fourth layer convolution structure according to an embodiment of the present invention;

FIG. 7 is a fifth layer convolution structure according to an embodiment of the present invention;

FIG. 8 is the loss-function descent curve according to an embodiment of the present invention;

FIG. 9 shows the recognition accuracy of the training model according to the embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.


The convolutional layer is an important component of the convolutional neural network and is mainly used to extract and abstract image features. Its core is the convolution operation, for which the image is first converted into a matrix. Suppose an image is 6 x 6, with each pixel storing image information. A convolution kernel (equivalent to a set of weights) is defined to extract certain features from the image. The kernel is multiplied element by element with the corresponding entries of the pixel matrix and the products are summed to obtain the output of the convolutional layer.

Without prior learned experience, the values of the convolution kernel can be generated randomly by a function and then adjusted step by step during training.

Once the kernel has covered every pixel at least once (with a convolution stride of 1), the output of the convolutional layer is complete.

At first the machine does not know which features the part to be recognized has; it compares the output values obtained with different convolution kernels to determine which kernel best represents a feature of the picture. For example, to recognize a feature such as a curve in the image, a suitable kernel produces a high output value for the curve and a low output value for other shapes such as a triangle. The higher the output value of the convolutional layer, the better the match, and the better the kernel expresses that feature of the picture.
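The 6 × 6 example and the feature-matching idea can be sketched as follows. The image, the vertical-line pattern and the kernel values are hypothetical; the function slides the kernel with stride 1 and no padding, so a 6 × 6 image and a 3 × 3 kernel give a 4 × 4 output.

```python
import numpy as np


def correlate2d_valid(image, kernel):
    """Slide the kernel over the image (stride 1, no padding); each output value
    is the sum of element-wise products of the kernel and the covered pixels."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for r in range(oh):
        for c in range(ow):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out


image = np.zeros((6, 6))
image[:, 2] = 1.0                                # a vertical line in column 2
kernel = np.array([[0, 1, 0],
                   [0, 1, 0],
                   [0, 1, 0]], dtype=float)      # hypothetical "vertical line" kernel
response = correlate2d_valid(image, kernel)      # shape (4, 4)
```

Only the output column where the kernel is centred on the line responds (value 3 in every row); everywhere else the response is 0, which is the "high output for the matching shape" behaviour described above.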

The down-sampling layer is also called the pooling layer, and works as follows:

the pooling layer mainly reduces the number of parameters, speeds up computation, makes the extracted features more robust and helps prevent overfitting; it is generally placed after a convolutional layer to reduce the model size and the feature dimension.

The two most common forms of pooling layer are:

max pooling: the largest value in a given region is chosen to represent the whole region;

mean pooling: the average of the values in a given region is chosen to represent the whole region.
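Both pooling forms can be sketched for a single feature map; the 4 × 4 input below is a made-up example, and "valid" padding (windows that do not fit are dropped) is assumed.

```python
import numpy as np


def pool2d(x, size=2, stride=2, mode="max"):
    """Max or mean pooling over one 2-D feature map with 'valid' padding."""
    oh = (x.shape[0] - size) // stride + 1
    ow = (x.shape[1] - size) // stride + 1
    reduce = np.max if mode == "max" else np.mean
    out = np.empty((oh, ow))
    for r in range(oh):
        for c in range(ow):
            window = x[r * stride:r * stride + size, c * stride:c * stride + size]
            out[r, c] = reduce(window)   # one value represents the whole region
    return out


fmap = np.array([[1, 3, 2, 0],
                 [4, 2, 1, 1],
                 [0, 1, 5, 6],
                 [2, 2, 7, 8]], dtype=float)
```

With size=2 and stride=2, max pooling of fmap gives [[4, 2], [2, 8]] and mean pooling gives [[2.5, 1.0], [1.25, 6.5]]; with size=3 and stride=2, a 55 × 55 map pools to 27 × 27.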

The convolutional and pooling layers extract features and reduce the number of parameters that the original image would otherwise require. To produce the final output, however, a fully connected layer must be applied to generate a classifier output whose size equals the number of required classes.

The fully connected layer works like the layers of earlier ordinary neural networks: the tensor output by the pooling layer is flattened into a vector, the vector is multiplied by the weight matrix and the bias is added, the ReLU activation function is then applied, and the parameters are optimized by gradient descent.
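The fully connected computation described above (flatten, multiply by the weight matrix, add the bias, apply ReLU) in NumPy. The 16 output nodes and the random weights are hypothetical, chosen only to keep the example small; the input shape matches the 6 × 6 × 256 output of the last pooling layer in the embodiment.

```python
import numpy as np


def dense_relu(feature_maps, W, b):
    """Flatten the pooled tensor, multiply by W, add b, then apply ReLU."""
    v = feature_maps.reshape(-1)            # cut the tensor into a vector
    return np.maximum(W @ v + b, 0.0)


rng = np.random.default_rng(0)
pooled = rng.normal(size=(6, 6, 256))       # shape of the last pooling output
W = rng.normal(scale=0.01, size=(16, pooled.size))   # 16 hypothetical output nodes
b = np.zeros(16)
out = dense_relu(pooled, W, b)
```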

Example:

In this embodiment, the training model is trained for 40 epochs while the learning rate is updated: a relatively large learning rate is set at the beginning of training and is gradually reduced as the total error of the system decreases. The best weights are saved each time an epoch of training is completed, so that the neural network model can be deployed later. During training the system is optimized with SGD (stochastic gradient descent), and mini-batch training is used to accelerate convergence of the model. After the 40 epochs of training, the best weights obtained are saved, and model prediction directly loads these saved weights to initialize its parameters before predicting pictures.
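The embodiment does not give its exact learning-rate schedule or batch handling, so the following is only a sketch under assumptions: a step decay that halves a hypothetical initial rate of 0.01 every 10 epochs, and a generator that shuffles sample indices into mini-batches each epoch.

```python
import random


def step_decay(epoch, lr0=0.01, drop=0.5, every=10):
    """Hypothetical schedule: larger rate at the start, halved every `every` epochs."""
    return lr0 * (drop ** (epoch // every))


def minibatches(n_samples, batch_size, seed=0):
    """Yield shuffled index lists that cover every sample once per epoch."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    for start in range(0, n_samples, batch_size):
        yield idx[start:start + batch_size]


rates = [step_decay(e) for e in range(40)]   # 40 epochs, as in the embodiment
batches = list(minibatches(70, 32))          # e.g. 70 training images
```

In Keras, saving the best weights after each epoch is typically done with tf.keras.callbacks.ModelCheckpoint(..., save_best_only=True).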

Before training begins, the pictures to be trained are loaded and the training set is preprocessed, including picture normalization and unifying the number of picture channels. The model is then built and trained, i.e. forward and backward propagation begin, with backward propagation optimized by stochastic gradient descent. After each optimization pass, the system judges whether the result has improved; if so, the relevant weights are updated. It then judges whether all epochs have been trained; if not, it returns to continue training the model, otherwise training of the whole model is finished.

For prediction with the neural network model, the trained model parameters are loaded together with the label values of the image classes, which are used to output the model's prediction. The image to be classified is then passed in from the user side; after obtaining it, the system displays and preprocesses the image to be recognized so that the relevant parameters are unified, predicts with the loaded neural network, and finally outputs the recognition result for the current image, completing the whole image recognition process.
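The preprocessing step (unifying channels and normalizing the picture before prediction) can be sketched without any imaging library. Assumptions: 8-bit input pixels, nearest-neighbour resizing to the 128 × 128 input size used in the embodiment, and scaling to [0, 1]; the source does not specify the exact normalization.

```python
import numpy as np


def preprocess(img, size=128):
    """Hypothetical preprocessing: unify to 3 channels, resize, scale to [0, 1]."""
    img = np.asarray(img)
    if img.ndim == 2:                              # grayscale -> 3 identical channels
        img = np.stack([img] * 3, axis=-1)
    elif img.shape[-1] == 4:                       # RGBA -> drop the alpha channel
        img = img[..., :3]
    rows = np.arange(size) * img.shape[0] // size  # nearest-neighbour row indices
    cols = np.arange(size) * img.shape[1] // size  # nearest-neighbour column indices
    img = img[rows][:, cols]
    return img.astype(np.float32) / 255.0          # assumes 8-bit pixel values
```

Before prediction, the result would be given a batch axis, e.g. preprocess(img)[None, ...], to match the network's input shape.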

The data set contains 2 classes, with 70 training images used to optimize the model parameters and 10 test images used to evaluate recognition. Two patterns are selected: a red-green-blue tri-color pattern serves as the correct pattern, as shown in FIG. 1, and a pattern that is not a red-green-blue tri-color pattern serves as the wrong pattern, as shown in FIG. 2. Forty photos of each pattern are taken from different angles; 35 photos of each pattern form the training set used to optimize the network parameters, and the remaining 5 photos of each pattern form the validation set. The procedure is as follows: (1) The input picture matrix is preprocessed and normalized, and pictures of size 128x128 are fed into the network.

(2) The first convolutional layer uses 96 convolution kernels of size 20 × 20 with a stride of 2 and "valid" padding, giving an output feature map of 55 × 55 × 96. After normalization and PReLU activation, max pooling is applied with a 3 × 3 window, a stride of 2 and "valid" padding, giving an output feature map of 27 × 27 × 96, as shown in FIG. 3.

(3) The second convolutional layer takes the 27 × 27 × 96 feature map as input and uses 256 convolution kernels of size 5 × 5 with a stride of 1 and "same" padding, giving an output feature map of 27 × 27 × 256. After normalization and PReLU activation, max pooling is applied with a 3 × 3 window, a stride of 2 and "valid" padding, giving an output feature map of 13 × 13 × 256, as shown in FIG. 4.

(4) The third convolutional layer takes the 13 × 13 × 256 feature map as input and uses 384 convolution kernels of size 3 × 3 with a stride of 1 and "same" padding; this layer applies only normalization and PReLU activation, without pooling, and its output feature map is 13 × 13 × 384, as shown in FIG. 5.

(5) The fourth convolutional layer takes the 13 × 13 × 384 feature map as input and uses 384 convolution kernels of size 3 × 3 with a stride of 1 and "same" padding; this layer likewise applies only normalization and PReLU activation, without pooling, and its output feature map is 13 × 13 × 384, as shown in FIG. 6.

(6) The fifth convolutional layer takes the 13 × 13 × 384 feature map as input and uses 256 convolution kernels of size 3 × 3 with a stride of 1 and "same" padding, giving an output feature map of 13 × 13 × 256. After normalization and PReLU activation, max pooling is applied with a 3 × 3 window, a stride of 2 and "valid" padding, giving an output feature map of 6 × 6 × 256, as shown in FIG. 7.

(7) The first fully connected layer flattens the feature map output by the fifth convolutional layer into a one-dimensional vector; it has 4096 outputs and is followed by a Dropout layer with rate 0.2 to prevent overfitting, so its output is 4096 × 1.

(8) The second fully connected layer takes the output of the first fully connected layer as input; it has 4096 outputs and a Dropout rate of 0.25, so its output is 4096 × 1.

(9) The third fully connected layer takes the 4096 × 1 vector as input and has 2 outputs, giving a 2 × 1 output.

(10) Finally, the 2 × 1 output of the third fully connected layer is fed to a softmax classifier, which outputs the classification into 2 classes.
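The feature-map sizes in items (1) to (10) follow from the usual output-size rule, floor((n - k) / s) + 1 for "valid" padding and ceil(n / s) for "same" padding. The sketch recomputes them together with the fully connected parameter counts (nodes × inputs + nodes). It uses a 3 × 3 window for the second pooling layer, as in the code listing; a 5 × 5 window would give 12 rather than 13.

```python
def out_size(n, k, s, padding):
    """Spatial output size of a conv/pool layer with Keras-style padding."""
    if padding == "valid":
        return (n - k) // s + 1
    return -(-n // s)                      # 'same': ceil(n / s)


layer_specs = [                            # (name, kernel/window, stride, padding)
    ("conv1", 20, 2, "valid"), ("pool1", 3, 2, "valid"),
    ("conv2", 5, 1, "same"), ("pool2", 3, 2, "valid"),
    ("conv3", 3, 1, "same"), ("conv4", 3, 1, "same"),
    ("conv5", 3, 1, "same"), ("pool5", 3, 2, "valid"),
]
sizes = {}
n = 128                                    # 128 x 128 input picture
for name, k, s, pad in layer_specs:
    n = out_size(n, k, s, pad)
    sizes[name] = n

flat = n * n * 256                         # features entering the first FC layer
fc_params = [flat * 4096 + 4096,           # fc1: nodes * inputs + nodes
             4096 * 4096 + 4096,           # fc2
             4096 * 2 + 2]                 # fc3 (2 classes)
```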

The convolutional neural network used in the experiment is described above; the corresponding program is as follows:

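(1) Convolution and pooling layers of the convolutional neural network:

from tensorflow.keras import Model, layers

def bn_relu(x):
    # Helper called by the listing; its body is assumed here to be batch normalization
    # followed by PReLU, matching "normalization and PReLU activation" in the layer description
    x = layers.BatchNormalization()(x)
    return layers.PReLU()(x)

input_dim = layers.Input(shape=(128, 128, 3))  # normalized 128 x 128 RGB picture
x = layers.Conv2D(96, (20, 20), strides=(2, 2), padding='valid')(input_dim)   # 55 x 55 x 96
x = bn_relu(x)
x = layers.MaxPooling2D(pool_size=(3, 3), strides=(2, 2), padding='valid')(x)  # 27 x 27 x 96
x = layers.Conv2D(256, (5, 5), strides=(1, 1), padding='same')(x)              # 27 x 27 x 256
x = bn_relu(x)
x = layers.MaxPooling2D(pool_size=(3, 3), strides=(2, 2), padding='valid')(x)  # 13 x 13 x 256
x = layers.Conv2D(384, (3, 3), strides=(1, 1), padding='same')(x)              # 13 x 13 x 384
x = layers.PReLU()(x)
x = layers.Conv2D(384, (3, 3), strides=(1, 1), padding='same')(x)              # 13 x 13 x 384
x = layers.PReLU()(x)
x = layers.Conv2D(256, (3, 3), strides=(1, 1), padding='same')(x)              # 13 x 13 x 256
x = layers.PReLU()(x)
x = layers.MaxPooling2D(pool_size=(3, 3), strides=(2, 2), padding='valid')(x)  # 6 x 6 x 256

(2) Fully connected layers of the convolutional neural network:

out_dims = 2                                       # two classes: correct / wrong pattern
x = layers.Flatten()(x)                            # 6 * 6 * 256 = 9216 features
fc1 = layers.Dense(4096, activation='relu')(x)     # ReLU, per the fully connected layer description
dr1 = layers.Dropout(0.2)(fc1)
fc2 = layers.Dense(4096, activation='relu')(dr1)
dr2 = layers.Dropout(0.25)(fc2)
fc3 = layers.Dense(out_dims)(dr2)                  # 2 x 1 output
output = layers.Softmax()(fc3)                     # softmax classifier over the 2 classes
model = Model(inputs=input_dim, outputs=output)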

In this embodiment, model training runs for at most 40 epochs with a batch size of 128. GPU acceleration is used to speed up training of the neural network, Dropout regularization is added to the training model to prevent overfitting during training, and the photo data set is expanded by data augmentation such as rotation, scaling and flipping; during training, the model showed no overfitting on the augmented data set. The loss curve in FIG. 8 shows that in the later stage of training the loss keeps decreasing steadily as the learning rate is gradually reduced, and once the convolutional neural network has been trained for 25 iterations the loss curve gradually levels off. The training accuracy curve in FIG. 9 shows that the accuracy is low in the first few iterations, because with so few training iterations the model parameters have not yet been optimized; as the number of training iterations increases, the recognition rate on the data set rises steadily, and after 25 iterations the accuracy curve gradually stabilizes. Taken together, the two curves show that the model reaches its optimal number of iterations at 25. With the training model designed on the basis of a convolutional neural network, the accuracy reaches 96%.
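Data augmentation by rotation, scaling and flipping can be sketched with NumPy alone. This is a hypothetical stand-in rather than the embodiment's pipeline: rotation is restricted to multiples of 90 degrees, scaling is a 2× zoom on the centre crop, and the input is assumed to be a square image with an even side length so that the output shape is unchanged.

```python
import numpy as np


def augment(img, rng):
    """Random 90-degree rotation, horizontal flip and 2x centre zoom (square, even-sized images)."""
    img = np.rot90(img, k=int(rng.integers(4)))    # rotation by a multiple of 90 degrees
    if rng.random() < 0.5:
        img = np.fliplr(img)                       # horizontal flip
    if rng.random() < 0.5:                         # scaling: crop the centre half, upsample 2x
        h, w = img.shape[:2]
        crop = img[h // 4:h // 4 + h // 2, w // 4:w // 4 + w // 2]
        img = crop.repeat(2, axis=0).repeat(2, axis=1)
    return img


rng = np.random.default_rng(0)
photo = np.arange(128 * 128 * 3).reshape(128, 128, 3)   # stand-in for a training photo
augmented = [augment(photo, rng) for _ in range(20)]
```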

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims

1. An image identification method based on a convolutional neural network, characterized by comprising the following steps:

2. The method of claim 1, wherein: step 1, training the model with the convolutional neural network, comprises: preliminarily extracting image features through the convolutional layers; extracting the main features through the down-sampling layers; aggregating the features of all parts through the fully connected layers; and generating a classifier for prediction and recognition;

the method specifically comprises the following steps:

step 11: initializing a weight value of the convolutional neural network;

step 12: forward-propagating the input picture data through the convolutional, down-sampling and fully connected layers to obtain an output value, the output of each convolutional layer being:

y^{(l)} = f\left( \sum_{i \in M} w_i^{(l)} * x_i^{(l-1)} + b^{(l)} \right)

where w_i^{(l)} is the weight of the layer's convolution kernel, * denotes the convolution operation, x_i^{(l-1)} is the feature map input to the convolutional layer, and b^{(l)} is the bias;

the objective function of the network being:

J(w, b) = \frac{1}{m} \sum_{i=1}^{m} L\left( \hat{y}^{(i)}, y^{(i)} \right)

where L(\cdot) is the loss function and m is the number of samples;

step 17: training of the model is complete.

3. The method of claim 2, wherein the updating in step 15 includes updating the convolutional layers and updating the fully connected layers: the error is propagated back layer by layer using back propagation, and the weights of each layer are updated by gradient descent.

4. The method of claim 2, wherein, in step 13,

5. The method of claim 1, wherein, in step 2,