Background
With the development of deep learning, ever higher accuracy is demanded of image recognition. Many current studies adopt the Convolutional Neural Network (CNN) as the entry point for improving recognition accuracy. A CNN can take the raw pixels of an image directly as input, with no separate feature-extraction step; moreover, a model trained with a CNN is invariant to distortions such as scaling, translation, and rotation, and generalizes well. Local receptive fields and weight sharing in the convolution greatly reduce the number of network parameters, which prevents overfitting, lowers the complexity of the network model, and leaves room to optimize for high classification accuracy. The radar profiles studied by the invention mainly carry low-level semantics such as spectrum-like regions and color blocks. Traditional methods such as texture detection and statistical analysis offer little advantage on such special images, mainly for the following three reasons:
1. the inherent characteristics of the radar profile, including the distribution and gradual change of pixel points, cannot be learned effectively;
2. the radar profile contains too much information, and traditional methods process it too slowly to cope with big data;
3. an efficient learning strategy is lacking, so recognition accuracy is difficult to improve.
The upper layers of a CNN are more sensitive to semantics, while the middle layers are particularly sensitive to low-level patterns such as color and gradient, so solving the radar profile recognition problem with a CNN is scientifically sound and feasible. Most CNN image classification is based on supervised learning, and this learning method needs a large amount of data as training samples to reach accurate classification. In radar profile recognition, weather conditions limit sample collection: gathering samples of disaster weather such as thunderstorms and strong winds is extremely difficult. Moreover, excessive similarity between samples also harms the training results, making features hard to learn efficiently. Aiming at the problems of a small number of samples and excessive sample similarity, the invention adopts a Deep Convolutional Generative Adversarial Network (DCGAN). The DCGAN extends the GAN, retaining its excellent data generation capability while integrating the feature-extraction strength of the CNN, which improves its image analysis and processing capability. In the invention, Batch Normalization realizes local normalization, which mitigates problems such as gradient vanishing and gradient diffusion during the training of the network model. In published evaluations, DCGANs trained on large-scale real-world datasets such as CelebA, LSUN, and ImageNet have yielded satisfactory results. A DCGAN-based network structure performs the sample generation, and combining it with a CNN-based image recognition system effectively improves recognition accuracy, so this secondary combination of DCGAN and CNN can better serve scientific research, production, and decision-making.
Disclosure of Invention
The purpose of the invention is as follows: aiming at the defects of the prior art, the invention provides a method for improving CNN-based image recognition performance by using a DCGAN. The method combines the excellent data generation capability of the DCGAN with a CNN-based image recognition framework in two stages, which solves problems in the image recognition process such as the difficulty of collecting training sample data and excessive sample similarity, breaks through the limitation that sample quantity and quality place on optimizing the classification model, strengthens the classification model, and improves the accuracy of image recognition.
The technical scheme is as follows: a method for improving CNN-based image recognition performance by using DCGAN comprises the following steps:
(1) defining the structures of the generative model and the discriminant model in the DCGAN;
(2) establishing a learning rate acceleration strategy;
(3) generating and testing samples;
(4) constructing a CNN-based image recognition framework;
(5) optimizing performance.
Further, the generative model in step (1) includes a data transformation layer and deconvolution layers, with a LeakyReLU function as the activation between them. The data transformation layer converts the noise vector into an image-type tensor through a reshape operation; the deconvolution layers then further convert the data dimensions into image format.
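As a minimal sketch of such a generative model in Keras (the noise dimension, layer sizes, and 64×64 single-channel output are illustrative assumptions, not the patent's exact settings; the tanh output layer follows Step 1 of the detailed description):

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_generator(noise_dim=100):
    return tf.keras.Sequential([
        # Data transformation layer: project the noise vector and
        # reshape it into an image-type tensor.
        layers.Dense(8 * 8 * 128, input_shape=(noise_dim,)),
        layers.LeakyReLU(0.2),
        layers.Reshape((8, 8, 128)),
        # Deconvolution layers convert the data dimensions toward image
        # format, with LeakyReLU in between.
        layers.Conv2DTranspose(64, 5, strides=2, padding="same"),
        layers.LeakyReLU(0.2),
        layers.Conv2DTranspose(32, 5, strides=2, padding="same"),
        layers.LeakyReLU(0.2),
        layers.Conv2DTranspose(1, 5, strides=2, padding="same",
                               activation="tanh"),
    ])
```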
Further, the discriminant model in step (1) includes convolutional layers and a fully-connected layer, with a ReLU function as the activation between them; the end of the fully-connected layer uses a Sigmoid or SoftMax function for binary classification. Preferably, the discriminant model includes four convolutional layers and one fully-connected layer.
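A corresponding sketch of the discriminant model under the preferred configuration (input size and filter counts are assumptions for illustration):

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_discriminator():
    # Four convolutional layers with ReLU, then one fully-connected
    # layer ending in Sigmoid for the true/false binary classification.
    return tf.keras.Sequential([
        layers.Conv2D(32, 5, strides=2, padding="same", activation="relu",
                      input_shape=(64, 64, 1)),
        layers.Conv2D(64, 5, strides=2, padding="same", activation="relu"),
        layers.Conv2D(128, 3, strides=2, padding="same", activation="relu"),
        layers.Conv2D(128, 3, strides=2, padding="same", activation="relu"),
        layers.Flatten(),
        layers.Dense(1, activation="sigmoid"),
    ])
```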
Further, step (1) includes training a generator G having the ability to generate data comparable to the real samples; its effect is to package a noise vector into a realistic sample, making the discriminator mistake it for a real one. The discriminator D is a binary classifier used for judging sample authenticity and is the source from which the generator learns.
Further, step (1) includes establishing the network loss functions, which comprise the overall network loss function, the generative model loss function, and the discriminant model loss function, defined as follows:
the overall network loss function is calculated as:

min_G max_D V(D, G) = E_{x~p_data(x)}[log D(x)] + E_{z~p_z(z)}[log(1 − D(G(z)))]

the generative model loss function is calculated as:

LOSS(G) = −log(D_2(G(z)))

the discriminant model loss function is calculated as:

LOSS(D) = −(log(D_1(x)) + log(1 − D_2(G(z))))

wherein: D(x) is the discriminant function applied to data x, and G(z) is the generation function applied to noise z; x ~ p_data(x) denotes that x originates from the data probability distribution, and likewise z ~ p_z(z) denotes that z is drawn from the noise distribution; D_1(x) and D_2(x) denote the same discriminator operation, applied to real data and to generated data respectively.
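As a minimal sketch (not the patent's verbatim implementation), the two model loss functions can be written in TensorFlow as follows; the eps term is an assumption added for numerical stability:

```python
import tensorflow as tf

def gan_losses(d_real, d_fake, eps=1e-8):
    """d_real = D_1(x): discriminator output on real samples;
    d_fake = D_2(G(z)): discriminator output on generated samples."""
    # LOSS(D) = -(log(D_1(x)) + log(1 - D_2(G(z))))
    loss_d = -tf.reduce_mean(tf.math.log(d_real + eps)
                             + tf.math.log(1.0 - d_fake + eps))
    # LOSS(G) = -log(D_2(G(z)))
    loss_g = -tf.reduce_mean(tf.math.log(d_fake + eps))
    return loss_d, loss_g
```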
Further, step (2) includes optimizing the network parameters with the mini-batch gradient descent method, where the network parameters include the batch size batch, the iteration number epoch, the learning rate α, the weights W and bias values b to be adjusted, and the momentum factors m and v added when optimizing the gradient descent method.
Further, step (2) specifically comprises the following steps:
(21) setting the learning rate, wherein the initial value of the learning rate is in the range [0.9, 1.0], and back-propagation updates the weights and bias values following the calculation formula:

W := W − α · ∂LOSS/∂W

wherein W represents the weight or bias value being updated, α is the learning rate, and ∂LOSS/∂W is the partial derivative of the loss function with respect to that weight or bias;
(22) reducing the learning rate step by step through iteration: each cycle invokes the back-propagation mechanism to adjust the weights and bias values and thereby seek the minimum of the loss function, with the learning rate decay placed inside the iterative operation. The learning rate decays according to Equation 4 below, wherein the decay_rate is in the range 0.1 to 1.0, epoch_i is the i-th training iteration, and α_0 is the initial learning rate, with value range 0.1 to 1.0.
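The image of the decay formula itself (Equation 4) is not reproduced here, so the sketch below assumes the common exponential-decay reading, in which the rate is multiplied down per iteration, together with the back-propagation update rule of step (21):

```python
def decayed_lr(alpha0, decay_rate, epoch_i):
    # Assumed form of Equation 4: exponential decay per training
    # iteration, e.g. decay_rate = 0.95 as preset in Step 2 below.
    return alpha0 * (decay_rate ** epoch_i)

def sgd_update(w, grad_w, lr):
    # Step (21) update rule: W := W - alpha * dLOSS/dW.
    return w - lr * grad_w
```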
Further, the recognition model in step (4) adopts a neural network of 4 convolutional layers and 3 fully-connected layers, and the output falls into four categories: rain with wind, rain without wind, wind without rain, and neither rain nor wind.
Advantageous effects: compared with the prior art, the invention has the following remarkable effects. First, the excellent data generation capability of the DCGAN is combined with a CNN-based image recognition framework in two stages, which solves problems in the image recognition process such as the difficulty of collecting training sample data and excessive sample similarity. Second, hidden details in the radar profile can be learned automatically without manual feature extraction. Third, the problem of batch processing of big data can be handled. Fourth, the limitation that sample quantity and quality place on optimizing the classification model is broken through, and through an effective algorithm the accuracy of image recognition improves progressively over repeated training.
Detailed Description
To explain the technical solution disclosed in the present invention in detail, the following description is made with reference to the drawings and the specific implementation mechanism.
The method mainly aims at identifying radar profile images. Unlike general object images, radar profiles describe categories through spectrum-like regional distributions and color, and this level of semantics is well characterized by a CNN. In a recognition system, specific classification is still required after feature extraction. To unify feature extraction and classification, the present invention does not use a conventional SVM as the classifier, but performs the classification through fully-connected layers and Softmax.
The invention discloses a method for improving CNN-based image recognition performance by using DCGAN; a system flow chart of the method is shown in Figure 1, and the specific steps are as follows:
step 1: construction of custom DCGAN
According to the scale of the training data, the structures of the generative model and the discriminant model in the DCGAN are customized, including parameter settings and depth settings. In the invention, the fully-connected layer of the discriminant model is removed, all of its activation functions are set to the LeakyReLU function, and binary classification into "true" and "false" is carried out through a Sigmoid or SoftMax function. The generative model is essentially a deconvolution process: the nonlinear activation functions between its convolutional layers adopt the ReLU function, and the output layer adopts the tanh function. The goal of the DCGAN design is to train a generator G that converts the noise vector z into sample data x for the later enhancement of the recognition model. The training target of the generator G is defined by the discriminator D, which distinguishes real sample data x ~ p_data(x) from generated data G(z) with z ~ p_z(z), while the generator G maximally leads the discriminator D to judge its output as true. Through repeated training, G and D finally reach the equilibrium of a non-convex game, and G generates data almost indistinguishable from real samples. We make no assumptions or modeling requirements on the data distribution in advance, but optimize directly by gradient descent. The loss function of the network as a whole is defined by:

min_G max_D V(D, G) = E_{x~p_data(x)}[log D(x)] + E_{z~p_z(z)}[log(1 − D(G(z)))]  (Equation 1)
the convergence direction of the network is minGmaxDV (D, G). We decompose the loss function in equation 1 according to two models, where equation 2 is the discriminant model loss function and equation 3 is the generative model loss function.
LOSS(D) = −(log(D_1(x)) + log(1 − D_2(G(z))))  (Equation 2)
LOSS(G) = −log(D_2(G(z)))  (Equation 3)
We use the machine learning algorithms in the TensorFlow framework to make these loss functions converge toward their minima, obtaining the optimal weights through back-propagation. By repeating the iterative optimization and continuously refining the weights and bias values, an excellent generative model can be trained to generate the required data. The same holds for the discriminant model; we follow the batch gradient descent principle of the BP neural network and minimize the loss functions of the two simultaneously. The structure of the custom DCGAN is shown in Fig. 2.
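As an illustrative sketch of this alternating optimization (assuming the generator/discriminator builders and the gan_losses helper sketched earlier; the learning rates and noise dimension are assumptions, not the patent's values):

```python
import tensorflow as tf

g_opt = tf.keras.optimizers.Adam(2e-4)
d_opt = tf.keras.optimizers.Adam(2e-4)

@tf.function
def train_step(generator, discriminator, real_images, noise_dim=100):
    noise = tf.random.normal([tf.shape(real_images)[0], noise_dim])
    with tf.GradientTape() as g_tape, tf.GradientTape() as d_tape:
        fake = generator(noise, training=True)
        d_real = discriminator(real_images, training=True)  # D_1(x)
        d_fake = discriminator(fake, training=True)         # D_2(G(z))
        loss_d, loss_g = gan_losses(d_real, d_fake)
    # Back-propagation: minimize each model's loss over its own variables.
    d_opt.apply_gradients(zip(
        d_tape.gradient(loss_d, discriminator.trainable_variables),
        discriminator.trainable_variables))
    g_opt.apply_gradients(zip(
        g_tape.gradient(loss_g, generator.trainable_variables),
        generator.trainable_variables))
    return loss_d, loss_g
```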
Step 2: accelerated learning by introducing learning rate attenuation strategy
To accelerate the training of the DCGAN, a strategy of continuously decaying the learning rate is adopted. The custom DCGAN optimizes network parameters by mini-batch gradient descent; although Batch Normalization is added between the convolutions and activation functions to protect the gradients, noise inevitably arises during iteration, so the descent does not converge exactly to the minimum but oscillates around it. The reason for introducing the learning rate decay strategy is as follows: at the beginning, a large learning rate achieves a very fast convergence rate; as the learning rate becomes smaller, the convergence step also shrinks, so even oscillating around the minimum does not cause much error. The preceding normalization operation makes the gradients more stable, so the training process is fast and steady. During training, the learning rate is decayed after every fixed number of iterations, with the following specific steps:
1. first, a larger learning rate is used;
2. the learning rate is reduced step by step through iteration.
The learning rate decay strategy follows Equation 4 (the exponential decay sketched after step (22) above), wherein the decay_rate can be preset to 0.95, epoch_i is the i-th training iteration, and α_0 is the initial learning rate. The learning rate decay proceeds synchronously with back-propagation: every time an iteration performs back-propagation, the learning rate is updated once, ensuring that the learning rate differs at every step.
The stochastic gradient descent method alone does not work well here: the decayed learning rate only improves the quality of its result, not its efficiency. To reach an optimal solution quickly and keep the later stage of training stable, the decayed learning rate needs to be combined with an optimizer for convergence. Optimizers such as Momentum operate on the parameter updates: taking momentum into account makes the gradient steps larger, which converges but makes the path very tortuous. Another kind of optimizer, such as AdaGrad, modifies the learning rate itself, which is equivalent to adding a penalty so that each parameter has its own learning efficiency. We combine the two approaches and use Adam to accelerate the training of the neural network; its mathematical form is shown below.
m_i = b_1 · m_{i−1} + (1 − b_1) · dx  (Equation 5)
v_i = b_2 · v_{i−1} + (1 − b_2) · dx²  (Equation 6)
W_i = W_{i−1} − α · m_i / (√v_i + ε)  (Equation 7)

The updating of the weight parameter depends on the two variables m and v, where dx is the gradient (the amount of change). In Equation 5, m carries the momentum gradient property of Momentum, while the calculation of v in Equation 6 carries the resistance property of AdaGrad. Equation 7 considers both m and v to realize the update of the weight parameter. In the experiments, the loss function is handed to the optimizer as the source of back-propagation, and back-propagation is coupled with the iterative operation. After each round of training, the accuracy and error rate returned by the feed-forward network are checked to judge the robustness of the model.
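A minimal sketch of one Adam step implementing Equations 5-7 (the b_1/b_2 defaults and the bias-correction terms are standard Adam conventions assumed here, not values fixed by the patent):

```python
import numpy as np

def adam_step(w, dx, m, v, i, alpha=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update of parameter w with gradient dx; i is the
    1-based iteration index."""
    # Equation 5: momentum-style moving average of the gradient.
    m = b1 * m + (1 - b1) * dx
    # Equation 6: AdaGrad-style moving average of the squared gradient.
    v = b2 * v + (1 - b2) * dx ** 2
    # Bias correction for early iterations (standard Adam, assumed).
    m_hat = m / (1 - b1 ** i)
    v_hat = v / (1 - b2 ** i)
    # Equation 7: update the weight using both m and v.
    w = w - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
```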
Step 3: Generating and testing samples
Before samples can be tested, they must first be generated. We train DCGANs separately on radar profiles of the rain-with-wind and rain-without-wind categories, because these two types of sample are comparatively difficult to collect. To train efficiently and avoid the stall caused by reading all pictures into memory at once, a mini-batch training mode is adopted, with 64 pictures per batch. Every 100 batches, a sample image is generated and saved locally. Considerable training is required to learn the features of the images adequately. To facilitate subsequent training and sample generation, the model is saved every 100 training rounds, and after training finishes, the trained model is loaded to generate samples. Fig. 3 shows a real image and a generated image. Although a DCGAN-generated sample is visually close to a real one, the human eye cannot serve as the standard for judging whether a generated sample is qualified; we must test whether the generated samples possess the properties of real data. Using the pre-trained CNN recognition framework of Fig. 4 as the detection tool, we feed in part of the generated samples at random and verify their quality from the classification results: if a generated sample is classified into its corresponding category, it can be considered qualified. In our tests, the success rate of correctly classifying generated rain-with-wind samples was 90%, and that of generated rain-without-wind samples was 88%. These rates remain within a reasonable error range of the accuracy achieved on real data during pre-training, proving that the generated samples can be used for training together with the real ones.
Once samples have been generated, they can be tested. Since the samples used for training fall into two categories, the generated samples also fall into two categories. The four-way classification is performed on the generated samples; if the two types of sample are classified correctly, the generated samples are qualified. To rule out low-probability recognition effects, the network used to detect the samples is the same as the network that later performs image recognition. We first build the image recognition model and finish pre-training while comparing against the original CNN; after its superior performance is confirmed, all tensor values and the structure of the model, namely its ckpt checkpoint file, are copied, and the classification test of the samples is carried out.
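As an illustrative sketch of this detection step (the paths, the label index, and the build_classifier helper, sketched under Step 4 below, are hypothetical):

```python
import numpy as np
import tensorflow as tf

RAIN_WITH_WIND_LABEL = 0  # hypothetical class index

classifier = build_classifier()  # recognition network, see Step 4
# Restore the copied tensor values and structure from the ckpt checkpoint.
ckpt = tf.train.Checkpoint(model=classifier)
ckpt.restore(tf.train.latest_checkpoint("checkpoints/cnn")).expect_partial()

generated = np.load("generated_rain_with_wind.npy")  # batch of DCGAN samples
pred = classifier.predict(generated).argmax(axis=1)
# A generated sample is qualified if it falls into its own category.
success_rate = float((pred == RAIN_WITH_WIND_LABEL).mean())
print(f"classification success rate: {success_rate:.2%}")
```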
Step 4: Construction of the CNN-based image recognition framework
In the invention, a neural network with 4 convolutional layers and 3 fully-connected layers is built as the recognition model, and its outputs fall into four categories. The depth of the network model is determined by the scale and number of classes of the data to be tested, and it can be extended later according to the actual situation. The model framework is shown in Fig. 4.
In the first convolutional layer, we define 32 convolution kernels of dimension 5×5; the weights are initialized to random values from a normal distribution with standard deviation 0.01, and the bias values are initialized to 0. The stride of the convolution operation is uniformly set to 1, and boundary handling pads across the boundary with 0. The stride of the pooling operation is set to 2, and its boundary handling directly discards regions too small for the kernel. The initialization of weights and biases, the convolution kernels, and the pooling in the remaining convolutional layers are kept consistent with the first layer. The second convolutional layer has 64 kernels of 5×5; the third has 128 kernels of 3×3; the fourth also has 128 kernels of 3×3. Since a CNN takes picture pixels as direct input, the data dimensions must be changed to obtain the final one-dimensional classification result, so we define 1024 neurons in the first fully-connected layer to transform the dimensions. Considering the activation law of neurons, namely that the more significant the activation effect of the data, the more strongly the neuron is evoked, ReLU is used as the nonlinear activation function. To prevent excessive unnecessary neurons from participating in the computation, a dropout mechanism is defined between the fully-connected layers: it puts part of the neurons into a dormant state, avoiding the excessive computation caused by too many activated neurons and making the mechanism closer to human thinking. In the second fully-connected layer we define 512 neurons, again using the ReLU activation function and appending the dropout mechanism. In the final fully-connected layer we define 4 neurons for the output, representing the probabilistic results of the four classes, as shown in Fig. 5.
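A minimal sketch of this recognition network in Keras, assuming 64×64 single-channel inputs and a 0.5 dropout rate (both assumptions; the patent's radar images are 540×440 and the dropout rate is unspecified):

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_classifier(input_shape=(64, 64, 1)):
    init_w = tf.keras.initializers.RandomNormal(stddev=0.01)  # per Step 4
    conv = lambda n, k: layers.Conv2D(
        n, k, strides=1, padding="same", activation="relu",  # stride 1, zero-pad
        kernel_initializer=init_w, bias_initializer="zeros")
    pool = lambda: layers.MaxPooling2D(2, strides=2, padding="valid")
    return tf.keras.Sequential([
        layers.Conv2D(32, 5, strides=1, padding="same", activation="relu",
                      kernel_initializer=init_w, bias_initializer="zeros",
                      input_shape=input_shape),
        pool(),
        conv(64, 5), pool(),
        conv(128, 3), pool(),
        conv(128, 3), pool(),
        layers.Flatten(),                       # change data dimensions
        layers.Dense(1024, activation="relu"),
        layers.Dropout(0.5),                    # dropout rate assumed
        layers.Dense(512, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(4, activation="softmax"),  # four weather categories
    ])
```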
And 5: performance comparison and optimization
Our pre-training dataset has 10,000 radar profiles covering 4 classification categories: rain with wind, rain without wind, wind without rain, and neither rain nor wind. There are 2,500 images per category, each with a pixel size of 540×440. The images come from radar stations in the Nanjing and Anhui areas, from 2016 and 2017. The DCGAN data for quality validation are of two types: 200 images were generated for each of the rain-with-wind and rain-without-wind categories.
In the final mixed training, the dataset generated by the DCGAN is expanded to 1,000 images per class; after the real data are added, the four classes contain equal numbers of samples. In the final testing stage, we run the 4-way classification test with 200 radar images per class, all of which are real radar data that never appeared in training.
The data generated by the DCGAN are put into the training sets of the corresponding classes and trained again together with the real data, and the overall accuracy of the mixed training is found to improve. As shown in Fig. 6, the overall accuracy after the hybrid training is more stable and higher than before. To verify whether the model is enhanced after the hybrid training, we compare a set of recognition results: Fig. 7 presents the 4-way classification results after the recognition framework is pre-trained, and Fig. 8 shows the recognition results after model enhancement. By comparison, the recognition accuracy of the enhanced model is improved.