CN111598157A - Identity card image classification method based on VGG16 network level optimization - Google Patents

Identity card image classification method based on VGG16 network level optimization Download PDF

Info

Publication number
CN111598157A
Authority
CN
China
Prior art keywords
training
model
network
vgg16
layers
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010405901.4A
Other languages
Chinese (zh)
Other versions
CN111598157B (en)
Inventor
杨浩楠
李娟
王全增
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Technology filed Critical Beijing University of Technology
Priority to CN202010405901.4A priority Critical patent/CN111598157B/en
Publication of CN111598157A publication Critical patent/CN111598157A/en
Application granted granted Critical
Publication of CN111598157B publication Critical patent/CN111598157B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an identity card image classification method based on VGG16 network level optimization. The method comprises: obtaining the VOC2007 image data set and performing data preprocessing and division into training and test sets; constructing an improved convolutional neural network training model based on VGG16; training the model with the divided training sample set; and loading the model's pre-training weights. By replacing the last few ordinary convolutional layers of the classic classification network with depthwise separable convolutional layers, the invention improves both training speed and recognition speed: the structure of depthwise separable convolution markedly improves the speed metric, while the large number of layers in the VGG16 network still requires the network to be trained with a pre-training approach. Combining these two advantages, and comparing the loss curves in the experiments, the network improved with depthwise separable convolutional layers shows a clearly better speed metric than the original VGG16 with ordinary convolutional layers, with accuracy comparable to the original network.

Description

Identity card image classification method based on VGG16 network level optimization
Technical Field
The invention belongs to the technical field of deep learning, and in particular relates to an image classification algorithm based on hierarchy-level optimization of a convolutional neural network model, with the aim of markedly improving the speed metric relative to the original VGG16 with ordinary convolutional layers while keeping the accuracy unchanged.
Background
Since convolutional neural networks first took the stage, convolution has been continuously improved and optimized and is no longer what it once was: variants such as group convolution and dilated convolution (also called atrous convolution) have emerged. Depthwise separable convolution, the essence of the Xception and MobileNet families, was first proposed in 2013 by Laurent Sifre, then an intern at Google Brain, who analyzed it and experimented with it in detail in his doctoral thesis.
Disclosure of Invention
Aiming at existing depthwise separable convolutional neural network methods, the invention improves the training speed and recognition speed of a classic classification network by providing an improved VGG16 network. VGGNet was developed jointly by researchers of the Visual Geometry Group at the University of Oxford and Google DeepMind, and took first place in the localization task and second place in the classification task of the ILSVRC-2014 competition. VGGNet quickly became popular thanks to its excellent performance. The VGGNet model has several variants, some common ones of which are shown in Table 1; the most popular is VGG16 (configuration D), a model with 16 weight layers whose input dimension is 224×224×3.
TABLE 1 VGGNet
To understand the design principles of the VGG16 model, two main features of convolutional neural networks must first be understood:
1: local connection
2: weight sharing
Depth: With local connections, the output of each neuron depends only on the neurons in a local neighborhood of the previous layer. This greatly reduces the number of network parameters and makes deeper network models practical to design. Restricting connections in this way can lose some global information, but if the designed network structure is deep, the local features of the lower layers are combined at the higher layers, eventually yielding global features. Local connectivity is thus a clever trade-off: on one hand it may lose global features, but on the other it allows deeper models that recover them. Depth is extremely important to network performance; generally speaking, the deeper the network, the better it performs, since a deep network naturally integrates low-, mid-, and high-level features and can express richer features and meanings. In the VGG series, network depth reaches 16-19 layers, the deepest at the time the paper was published, which is why the VGG models show such excellent performance.
Width: With weight sharing, the weights are those of the convolution kernel, and sharing means that information learned from one local region is applied everywhere else in the image; that is, the whole image is convolved with the same kernel. Weight sharing greatly reduces the number of parameters trained in the network, allowing deeper models to be trained. Because of weight sharing, however, one convolution kernel can learn only one corresponding feature, so many different kernels must be designed to extract the different features in an image; the number of kernels is the width of the convolutional neural network. In the VGG16 model the number of kernels starts at 64 and grows to 512 as the network deepens; this is a very wide network, which is part of why its final performance is so strong.
Convolution kernel: It can also be observed that all convolution kernels in the VGG models are 3×3 in size, and stacks of small 3×3 kernels replace large kernels, mainly for the following reasons:
(1) 3×3 is the smallest size that can capture a pixel's eight-neighborhood information.
(2) The equivalent receptive field of two stacked 3×3 convolutional layers is 5×5, and that of three stacked 3×3 convolutional layers is 7×7, so a stack of small convolutional layers can replace a single large one with the receptive field unchanged. Three 3×3 filters can therefore be regarded as a decomposition of one 7×7 filter, with the nonlinearities in the intermediate layers acting as implicit regularization.
(3) Multiple 3×3 convolutional layers have more nonlinearity than one large-kernel convolutional layer (more levels of nonlinear functions, using three nonlinear activations), making the decision function more discriminative.
(4) A stack of 3×3 convolutional layers has fewer parameters than one large-kernel layer. Assuming the input and output feature maps of the convolutional layers both have C channels, the parameter count of three 3×3 layers is 3×(3×3×C×C) = 27C², while one 7×7 layer has 7×7×C×C = 49C² parameters. The former can express more powerful features of the input data with fewer parameters.
The disadvantage is that the intermediate convolutional layers may occupy more memory during backpropagation.
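The receptive-field and parameter arithmetic of points (2) and (4) can be checked with a short script (plain Python; the function names are illustrative):

```python
def stacked_3x3_receptive_field(n_layers):
    """Receptive field of n stacked 3x3, stride-1 convolutions: grows by 2 per layer."""
    rf = 1
    for _ in range(n_layers):
        rf += 2  # each 3x3 layer extends the field by (3 - 1)
    return rf

def conv_params(k, c_in, c_out):
    """Weight count of a single k x k convolution (biases ignored)."""
    return k * k * c_in * c_out

C = 512
print(stacked_3x3_receptive_field(2))  # 5  (two 3x3 layers ~ one 5x5)
print(stacked_3x3_receptive_field(3))  # 7  (three 3x3 layers ~ one 7x7)
print(3 * conv_params(3, C, C))        # 27*C*C parameters for the 3x3 stack
print(conv_params(7, C, C))            # 49*C*C parameters for one 7x7 layer
```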
Pooling kernel: Compared with AlexNet's 3×3 pooling kernels, VGG uses 2×2 pooling kernels throughout, which retains more image information.
Fully-connected layers converted to convolutions (test phase): This is another feature of VGG. In the network's test phase, the three fully-connected layers of the training phase are replaced by three convolutional layers, so that the resulting fully convolutional network can accept inputs of arbitrary width and height. This matters at test time, because the constraints of fully-connected layers no longer apply.
The only difference between a convolutional layer and a fully-connected layer is that the neurons of a convolutional layer are locally connected to the input, and neurons within the same channel share weights.
For example, in VGGNet [1], the first fully-connected layer has an input of 7×7×512 and an output of 4096. It can be expressed as a convolution with kernel size 7×7, stride 1, no padding, and 4096 output channels; the output is 1×1×4096, equivalent to the fully-connected layer, and the subsequent fully-connected layers can be replaced by equivalent 1×1 convolutions. In short, the rule for converting a fully-connected layer into a convolutional layer is to set the kernel size equal to the input spatial size. The advantage is that the convolutional layer imposes no limit on input size, so sliding-window-style prediction can be performed efficiently on test images. Even so, the training and detection speed of the VGG16 network still needs further improvement.
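This equivalence can be checked numerically. The NumPy sketch below uses reduced dimensions (a 7×7×8 input and 16 outputs instead of 7×7×512 and 4096) so that it runs quickly; the construction is otherwise the same:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((7, 7, 8))        # one 7x7x8 feature map
W = rng.standard_normal((7 * 7 * 8, 16))  # fully-connected weights

# Fully-connected layer: flatten, then matrix multiply.
fc_out = x.reshape(-1) @ W                # shape (16,)

# Equivalent convolution: a 7x7 kernel per output channel, stride 1, no padding.
K = W.reshape(7, 7, 8, 16)                # kernel height x width x in x out
conv_out = np.einsum('ijc,ijck->k', x, K) # "valid" conv on a 7x7 input -> 1x1x16

print(np.allclose(fc_out, conv_out))      # True: the two layers compute the same thing
```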
The invention improves the VGG16 network to increase the training speed and recognition speed of the classic classification network. The improved image classification method based on the VGG16 network comprises the following steps:
Step 1, acquire the VOC2007 image data set, and perform data preprocessing and division into training and test sets.
Step 2, construct an improved convolutional neural network training model based on VGG16.
Step 3, train the model with the divided training sample set.
Step 4, load the model's pre-training weights.
Step 5, analyze the experimental results based on the model's predictions.
Preferably, step 1 specifically comprises the following steps:
step 1.1, obtaining a VOC2007 data set through a public data set;
step 1.2, data verification is carried out, wherein the data verification comprises image integrity and corresponding class labels;
step 1.3, randomly dividing the 60000 images in the data set into a training set and a test set at a ratio of 5:1, where each sample comprises one image and its corresponding class label;
preferably, step 2 specifically comprises the following steps:
step 2.1, firstly, building an original VGG16 network as a control experiment model;
step 2.2, improving on the basis of the VGG network by replacing the last three convolutional layers with depthwise separable convolutional layers;
step 2.3, setting the dropout retention value during training to 0.7, i.e., randomly dropping 30% of the neurons during training to prevent the model from overfitting;
preferably, step 3 specifically comprises the following steps:
step 3.1, adjusting the dimension of the training data to 224×224×3;
step 3.2, performing model training with the resized training set, where the loss is computed with the multi-class cross-entropy and the optimizer used in training is RMSprop;
step 3.3, stopping training when the network has iterated until the loss value falls below a certain threshold, and saving the deep convolutional neural network model;
step 3.4, carrying out the same operation on the training of the control experiment;
preferably, step 4 specifically comprises the following steps:
step 4.1, removing the weights of the corresponding last three convolutional layers from the weight model, and loading the remaining parts into the model provided by the invention;
step 4.2, loading the pre-training weights for the control experiment by directly using the officially provided weight model;
step 4.3, verifying and comparing the optimization effect of the model provided by the invention by observing the loss;
preferably, step 5 specifically comprises the following steps:
step 5.1, observing the experimental results of the method provided by the invention and the original VGG16 method;
step 5.2, comparing the loss curves of the models before and after optimization on the same data set to analyze the optimization effect. The loss curves are shown in FIG. 2 and FIG. 3, where FIG. 2 is the loss curve of the original VGG16 and FIG. 3 is the loss curve of the optimized model proposed by the invention. Analysis shows that the loss curve of the original model declines gently and slowly, while the loss curve of the invention drops rapidly and fits well. The main reason is that depthwise separable convolution separates spatial feature learning from channel feature learning, so fewer parameters and less computation are needed, yielding a smaller and faster model that learns a better representation, and hence performs better, from less data. The slow decline of the loss early in training is mainly caused by the layers for which no corresponding pre-training weights were loaded.
Compared with the prior art, the invention has the following obvious advantages:
the invention improves the training speed and the recognition speed by modifying the last few layers of common convolutional layers of the classical classification network into deep separable convolutional layers, and can obviously improve the speed index due to the structural advantage of the deep separable convolutional layers, but the network needs to be trained by a pre-training method because the number of layers of the VGG16 network is large. If the pre-training weight is not called after the deep separable convolutional layer is completely replaced, a compromise method is adopted to replace the last three layers of common convolutional layers of the original VGG16 network into the deep separable convolutional layer, so that the pre-training weight can be called, and the training speed and the recognition speed can be increased. Due to the combination of the advantages of the two parts and the comparison of loss curves in an analysis experiment, the network improved by the deep separable convolutional layer has a speed index obviously improved compared with the VGG16 of the original common convolutional layer, and the accuracy is equivalent to that of the original network.
Drawings
To more clearly illustrate the embodiments of the present invention and the technical solutions of the prior art, the drawings and tables used in describing the embodiments are briefly introduced below. Obviously, the drawings and tables described below cover only some embodiments of the invention; those skilled in the art can obtain other drawings and tables from them without creative effort.
FIG. 1 is a representation of a depthwise separable convolutional neural network;
FIG. 2 is a loss curve of original VGG 16;
FIG. 3 is a loss curve of the optimization model proposed by the present invention.
Detailed Description
The present invention will be described in further detail below with reference to specific embodiments and with reference to the attached drawings.
To more clearly illustrate the embodiments of the present invention relative to the prior art, the following description details the data set, model, framework, and experimental results used in the experiments. The VOC2007 data set is used as the image input of the model, the method of the invention is implemented with the TensorFlow framework, and the network of the invention is compared experimentally with the VGG16 network before optimization.
Step 1, acquiring an image data set VOC2007 data set, and performing data preprocessing and division of a training set test set.
Step 1.1, acquire the VOC2007 data set from a public data set.
And step 1.2, data verification is carried out, wherein the data verification comprises image integrity and corresponding class labels.
Step 1.3, randomly divide the 60000 images in the data set into a training set and a test set at a ratio of 5:1, where each sample comprises one image and its corresponding class label.
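The 5:1 split of step 1.3 can be sketched in plain Python; `samples` and the file names here are placeholders for the real image/label pairs:

```python
import random

def split_dataset(samples, train_parts=5, test_parts=1, seed=42):
    """Randomly split (image, label) pairs into training and test sets at 5:1."""
    samples = list(samples)
    random.Random(seed).shuffle(samples)       # fixed seed for reproducibility
    n_test = len(samples) * test_parts // (train_parts + test_parts)
    return samples[n_test:], samples[:n_test]  # (train_set, test_set)

# 60000 hypothetical images with 20 VOC-style class labels
samples = [(f"img_{i:05d}.jpg", i % 20) for i in range(60000)]
train_set, test_set = split_dataset(samples)
print(len(train_set), len(test_set))           # 50000 10000
```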
And 2, constructing an improved convolutional neural network training model based on VGG 16.
Step 2.1, firstly, building an original VGG16 network as a control experiment model.
Step 2.2, improve on the basis of the VGG network by replacing the last three convolutional layers with depthwise separable convolutional layers.
Step 2.3, set the dropout retention value during training to 0.7, i.e., randomly drop 30% of the neurons during training to prevent the model from overfitting.
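As an illustration of what the replacement in step 2.2 computes, the NumPy sketch below implements one depthwise separable layer by hand and compares its weight count with a standard 3×3 convolution at the 512-channel width of VGG16's last block. The array shapes are illustrative; in practice the layer would be a framework primitive such as TensorFlow's separable convolution:

```python
import numpy as np

def depthwise_separable_conv(x, dw, pw):
    """One depthwise separable layer: per-channel 3x3 spatial filtering (depthwise),
    then a 1x1 convolution mixing channels (pointwise). 'Valid' padding, stride 1."""
    H, W, C_in = x.shape
    out = np.zeros((H - 2, W - 2, C_in))
    for i in range(H - 2):
        for j in range(W - 2):
            # each channel is filtered by its own 3x3 kernel; no channel mixing yet
            out[i, j] = np.sum(x[i:i + 3, j:j + 3, :] * dw, axis=(0, 1))
    return out @ pw  # pointwise 1x1 convolution combines the channels

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 16))   # small feature map: 8x8, 16 channels
dw = rng.standard_normal((3, 3, 16))  # one 3x3 kernel per input channel
pw = rng.standard_normal((16, 32))    # 1x1 kernels: 16 in, 32 out
y = depthwise_separable_conv(x, dw, pw)
print(y.shape)  # (6, 6, 32)

# Weight counts for one 3x3 layer at the 512-channel width of VGG16's last block:
standard = 3 * 3 * 512 * 512          # ordinary convolution: 2,359,296 weights
separable = 3 * 3 * 512 + 512 * 512   # depthwise + pointwise: 266,752 weights
print(standard // separable)          # roughly 8x fewer parameters
```

The parameter gap is the structural advantage the invention relies on: spatial filtering and channel mixing are learned separately, so the replaced layers are both smaller and faster.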
And 3, training the model by using the divided training sample set.
Step 3.1, adjust the dimension of the training data to 224×224×3.
Step 3.2, perform model training with the resized training set, where the loss is computed with the multi-class cross-entropy and the optimizer used in training is RMSprop.
Step 3.3, stop training when the network has iterated until the loss value falls below a certain threshold, and save the deep convolutional neural network model.
Step 3.4, perform the same procedure for the training of the control experiment.
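The multi-class cross-entropy loss named in step 3.2 can be written out directly. This NumPy sketch of the standard formula is for illustration; actual training would use the framework's built-in loss:

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean cross-entropy between one-hot labels and predicted class probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid log(0)
    return float(-np.mean(np.sum(y_true * np.log(y_pred), axis=1)))

y_true = np.array([[0.0, 1.0, 0.0, 0.0]])         # one-hot label for class 1
confident = np.array([[0.01, 0.97, 0.01, 0.01]])  # near-correct prediction: low loss
uniform = np.full((1, 4), 0.25)                   # uninformative prediction: loss = ln 4
print(categorical_crossentropy(y_true, confident))
print(categorical_crossentropy(y_true, uniform))  # ~1.386
```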
And 4, calling the model pre-training weight.
Step 4.1, remove the weights of the corresponding last three convolutional layers from the weight model, and load the remaining parts into the model provided by the invention.
Step 4.2, load the pre-training weights for the control experiment by directly using the officially provided weight model.
Step 4.3, verify and compare the optimization effect of the model provided by the invention by observing the loss.
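Step 4.1 amounts to a name-based filter over the pretrained weights. The sketch below uses hypothetical layer names and placeholder arrays rather than a real VGG16 weight file:

```python
# Hypothetical pretrained weights keyed by layer name, standing in for a real
# VGG16 weight file (13 convolutional layers: blocks 1-2 have 2, blocks 3-5 have 3).
pretrained = {f"block{b}_conv{c}": object()
              for b, n in zip((1, 2, 3, 4, 5), (2, 2, 3, 3, 3))
              for c in range(1, n + 1)}

# The last three convolutions (block 5) were replaced by depthwise separable
# layers, so their pretrained weights no longer match and must be skipped.
REPLACED = {"block5_conv1", "block5_conv2", "block5_conv3"}

loaded = {name: w for name, w in pretrained.items() if name not in REPLACED}
print(len(loaded))  # 10 of 13 layers keep their pretrained weights
```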
And 5, analyzing the experimental result aiming at the model prediction result.
And 5.1, observing the experimental results of the method provided by the invention and the original VGG16 method.
Step 5.2, compare the loss curves of the models before and after optimization on the same data set to analyze the optimization effect. The loss curves are shown in FIG. 2 and FIG. 3, where FIG. 2 is the loss curve of the original VGG16 and FIG. 3 is the loss curve of the optimized model proposed by the invention. Analysis shows that the loss curve of the original model declines gently and slowly, while the loss curve of the invention drops rapidly and fits well. The main reason is that depthwise separable convolution separates spatial feature learning from channel feature learning, so fewer parameters and less computation are needed, yielding a smaller and faster model that learns a better representation, and hence performs better, from less data. The slow decline of the loss early in training is mainly caused by the layers for which no corresponding pre-training weights were loaded.
The above embodiments are only exemplary embodiments of the present invention, and are not intended to limit the present invention, and the scope of the present invention is defined by the claims. Various modifications and equivalents may be made by those skilled in the art within the spirit and scope of the present invention, and such modifications and equivalents should also be considered as falling within the scope of the present invention.

Claims (6)

1. An identity card image classification method based on VGG16 network level optimization, characterized by comprising the following steps:
step 1, acquiring a VOC2007 data set of an image data set, and performing data preprocessing and division of a training set test set;
step 2, constructing a convolutional neural network training model improved based on VGG 16;
step 3, training the model by using the divided training sample set;
step 4, calling model pre-training weight;
and 5, analyzing the experimental result aiming at the model prediction result.
2. The identity card image classification method based on VGG16 network level optimization of claim 1, wherein step 1 specifically comprises the following steps:
step 1.1, obtaining a VOC2007 data set through a public data set;
step 1.2, data verification is carried out, wherein the data verification comprises image integrity and corresponding class labels;
step 1.3, randomly dividing the 60000 images in the data set into a training set and a test set at a ratio of 5:1, where each sample comprises one image and its corresponding class label.
3. The identity card image classification method based on VGG16 network level optimization of claim 1, wherein step 2 specifically comprises the following steps:
step 2.1, firstly, building an original VGG16 network as a control experiment model;
step 2.2, improving on the basis of the VGG network by replacing the last three convolutional layers with depthwise separable convolutional layers;
and 2.3, setting the dropout value in the training process to be 0.7, namely, in the training process, randomly removing 30% of neurons to prevent the model from being over-fitted.
4. The identity card image classification method based on VGG16 network level optimization of claim 1, wherein step 3 specifically comprises the following steps:
step 3.1, adjusting the dimension of the training data to 224×224×3;
step 3.2, performing model training with the resized training set, where the loss is computed with the multi-class cross-entropy and the optimizer used in training is RMSprop;
step 3.3, stopping training when the network has iterated until the loss value falls below a certain threshold, and saving the deep convolutional neural network model;
step 3.4, the same procedure as described above was performed for training of the control experiment.
5. The identity card image classification method based on VGG16 network level optimization of claim 1, wherein step 4 specifically comprises the following steps:
step 4.1, removing the weights of the corresponding last three convolutional layers from the weight model, and loading the remaining parts into the model provided by the invention;
step 4.2, loading the pre-training weights for the control experiment by directly using the officially provided weight model;
and 4.3, canceling and comparing the optimization effect of the model provided by the invention by observing loss.
6. The identity card image classification method based on VGG16 network level optimization of claim 1, wherein step 5 specifically comprises the following steps:
step 5.1, observing the experimental results of the method and the original VGG16 method;
and 5.2, comparing loss curves of the models before and after optimization on the same data set to analyze the optimization effect.
CN202010405901.4A 2020-05-14 2020-05-14 VGG16 network level optimization-based identity card image classification method Active CN111598157B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010405901.4A CN111598157B (en) 2020-05-14 2020-05-14 VGG16 network level optimization-based identity card image classification method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010405901.4A CN111598157B (en) 2020-05-14 2020-05-14 VGG16 network level optimization-based identity card image classification method

Publications (2)

Publication Number Publication Date
CN111598157A true CN111598157A (en) 2020-08-28
CN111598157B CN111598157B (en) 2023-09-15

Family

ID=72185655

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010405901.4A Active CN111598157B (en) 2020-05-14 2020-05-14 VGG16 network level optimization-based identity card image classification method

Country Status (1)

Country Link
CN (1) CN111598157B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114092819A (en) * 2022-01-19 2022-02-25 成都四方伟业软件股份有限公司 Image classification method and device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108596138A (en) * 2018-05-03 2018-09-28 南京大学 A kind of face identification method based on migration hierarchical network
CN109272107A (en) * 2018-08-10 2019-01-25 广东工业大学 A method of improving the number of parameters of deep layer convolutional neural networks
CN110210355A (en) * 2019-05-24 2019-09-06 华南农业大学 Weeds in paddy field category identification method and system, target position detection method and system
CN110717451A (en) * 2019-10-10 2020-01-21 电子科技大学 Medicinal plant leaf disease image identification method based on deep learning
CN110929603A (en) * 2019-11-09 2020-03-27 北京工业大学 Weather image identification method based on lightweight convolutional neural network
CN111126333A (en) * 2019-12-30 2020-05-08 齐齐哈尔大学 Garbage classification method based on light convolutional neural network


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
FRANCOIS CHOLLET et al.: "Xception: Deep Learning with Depthwise Separable Convolutions", CVPR *
姜开永; 甘俊英; 谭海英: "A facial beauty prediction model based on deep learning and its application" (基于深度学习的人脸美丽预测模型及其应用), Journal of Wuyi University (Natural Science Edition), no. 02 *


Also Published As

Publication number Publication date
CN111598157B (en) 2023-09-15

Similar Documents

Publication Publication Date Title
Shao et al. Feature learning for image classification via multiobjective genetic programming
US20190228268A1 (en) Method and system for cell image segmentation using multi-stage convolutional neural networks
CN110188795A (en) Image classification method, data processing method and device
CN110069958A (en) A kind of EEG signals method for quickly identifying of dense depth convolutional neural networks
CN111696101A (en) Light-weight solanaceae disease identification method based on SE-Inception
CN109741341A (en) A kind of image partition method based on super-pixel and long memory network in short-term
CN113688894B (en) Fine granularity image classification method integrating multiple granularity features
CN111612799A (en) Face data pair-oriented incomplete reticulate pattern face repairing method and system and storage medium
Li et al. Data-driven neuron allocation for scale aggregation networks
CN107292225B (en) Face recognition method
CN114898151A (en) Image classification method based on deep learning and support vector machine fusion
CN115116054B (en) Multi-scale lightweight network-based pest and disease damage identification method
CN111046793B (en) Tomato disease identification method based on deep convolutional neural network
CN113538359B (en) System and method for finger vein image segmentation
CN112232395B (en) Semi-supervised image classification method for generating countermeasure network based on joint training
CN113159067A (en) Fine-grained image identification method and device based on multi-grained local feature soft association aggregation
CN109472733A (en) Image latent writing analysis method based on convolutional neural networks
Pichel et al. A new approach for sparse matrix classification based on deep learning techniques
CN111310820A (en) Foundation meteorological cloud chart classification method based on cross validation depth CNN feature integration
CN114882278A (en) Tire pattern classification method and device based on attention mechanism and transfer learning
CN111598157A (en) Identity card image classification method based on VGG16 network level optimization
CN111914993B (en) Multi-scale deep convolutional neural network model construction method based on non-uniform grouping
CN111860601B (en) Method and device for predicting type of large fungi
Sun et al. Deep learning based pedestrian detection
Zhang et al. A new JPEG image steganalysis technique combining rich model features and convolutional neural networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant