CN114219778A - Data depth enhancement method based on WGAN-GP data generation and Poisson fusion - Google Patents
- Publication number
- CN114219778A (application CN202111482780.4A)
- Authority
- CN
- China
- Prior art keywords
- data
- network
- wgan
- image
- gradient
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
- 238000000034 method Methods 0.000 title claims abstract description 39
- 230000004927 fusion Effects 0.000 title claims abstract description 33
- 238000012549 training Methods 0.000 claims abstract description 38
- 201000010099 disease Diseases 0.000 claims abstract description 33
- 208000037265 diseases, disorders, signs and symptoms Diseases 0.000 claims abstract description 33
- 238000001514 detection method Methods 0.000 claims abstract description 23
- 230000007547 defect Effects 0.000 claims abstract description 12
- 238000009826 distribution Methods 0.000 claims abstract description 10
- 238000005516 engineering process Methods 0.000 claims abstract description 6
- 230000006870 function Effects 0.000 claims description 31
- 230000004913 activation Effects 0.000 claims description 10
- 238000013500 data storage Methods 0.000 claims description 7
- 238000005457 optimization Methods 0.000 claims description 7
- 238000005286 illumination Methods 0.000 claims description 6
- 238000013528 artificial neural network Methods 0.000 claims description 5
- 230000000694 effects Effects 0.000 claims description 5
- 238000005520 cutting process Methods 0.000 claims description 4
- 230000006872 improvement Effects 0.000 claims description 4
- 238000010606 normalization Methods 0.000 claims description 4
- 238000013527 convolutional neural network Methods 0.000 claims description 3
- 230000007423 decrease Effects 0.000 claims description 2
- 239000011159 matrix material Substances 0.000 claims description 2
- 239000013598 vector Substances 0.000 claims description 2
- 238000012795 verification Methods 0.000 claims description 2
- 238000013135 deep learning Methods 0.000 description 5
- 238000012423 maintenance Methods 0.000 description 4
- 230000008569 process Effects 0.000 description 4
- 230000015572 biosynthetic process Effects 0.000 description 3
- 239000000203 mixture Substances 0.000 description 3
- 238000012545 processing Methods 0.000 description 3
- 238000003786 synthesis reaction Methods 0.000 description 3
- 238000004364 calculation method Methods 0.000 description 2
- 230000008859 change Effects 0.000 description 2
- 238000007726 management method Methods 0.000 description 2
- 238000007500 overflow downdraw method Methods 0.000 description 2
- 230000001133 acceleration Effects 0.000 description 1
- 230000003044 adaptive effect Effects 0.000 description 1
- 238000013473 artificial intelligence Methods 0.000 description 1
- 238000010586 diagram Methods 0.000 description 1
- 239000000284 extract Substances 0.000 description 1
- 230000002068 genetic effect Effects 0.000 description 1
- 238000007689 inspection Methods 0.000 description 1
- 238000002156 mixing Methods 0.000 description 1
- 230000001737 promoting effect Effects 0.000 description 1
- 238000005096 rolling process Methods 0.000 description 1
- 239000013589 supplement Substances 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/0002—Inspection of images, e.g. flaw detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/50—Image enhancement or restoration using two or more images, e.g. averaging or subtraction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20212—Image combination
- G06T2207/20221—Image fusion; Image merging
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Biomedical Technology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Quality & Reliability (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Image Analysis (AREA)
- Image Processing (AREA)
Abstract
The invention discloses a data depth enhancement method based on WGAN-GP data generation and Poisson fusion. WGAN-GP is a generative adversarial network with a gradient penalty, a generative model based on a game-theoretic idea that comprises two networks: a generation network G and a discrimination network D. New training data are synthesized by inserting road surface disease images generated by WGAN-GP into disease-free road images. The inserted image should look as realistic as possible, avoiding protruding edges, so that the target detection model does not merely learn the edge features of the pasted object instead of the disease features. The method uses the WGAN-GP data generation technology and the Poisson fusion technology to deeply enhance the data; compared with traditional data enhancement methods, the generated pictures are added to the training set of the pavement disease detection model, providing a sufficient amount of data for the detection model to learn the possible distribution and improving the precision of intelligent pavement disease detection.
Description
Technical Field
The invention belongs to the field of deep-learning image processing and relates to a data depth enhancement method based on WGAN-GP data generation and Poisson fusion. The method is suitable for intelligent road disease detection tasks in which target detection suffers from a lack of data samples or unbalanced classes.
Background
Due to the influence of temperature changes, repeated vehicle loading, and inadequate routine maintenance, road surfaces develop diseases of different degrees and types; pavement cracks, potholes and the like seriously affect driving comfort and safety, so necessary maintenance management is important for improving the service level of urban road facilities and ensuring safe road operation. At present, road maintenance relies heavily on manual inspection, which is costly, subjective and inefficient. Therefore, realizing automatic road disease detection with advanced technical means has become the focus of road maintenance management.
In recent years, image recognition techniques powered by artificial intelligence have been increasingly applied to pavement disease detection; their core is deep learning with convolutional neural networks. Deep learning is data-driven and requires a sufficient amount of data for the model to learn the possible distributions. In an actual collection environment, however, it is difficult to obtain enough disease samples with balanced categories to support neural network training. Traditional data enhancement methods are varied, including random scaling, contrast enhancement, random cropping, horizontal flipping and vertical flipping, but pavement disease characteristics are often tied to the color distribution and contrast of the image, so contrast enhancement may be unsuitable for road images. Moreover, when the number of images is small, operations such as random cropping and flipping can hardly supplement the latent distribution implied by the data.
To address these problems, the invention provides a data depth enhancement method combining WGAN-GP data generation and Poisson fusion: road disease pictures under different occlusions and illumination conditions are artificially generated and added to the training set of a target detection model, thereby improving the precision of intelligent road disease detection.
Disclosure of Invention
The technical scheme adopted by the invention is a data depth enhancement method combining a generative adversarial network, WGAN-GP (Wasserstein Generative Adversarial Network with Gradient Penalty), with Poisson fusion. It comprises two parts, WGAN-GP data generation and Poisson fusion; the implementation flow is shown in figure 1, and the specific steps are as follows:
the method comprises the following steps: WGAN-GP data generation
WGAN-GP is a generative adversarial network with a gradient penalty, a generative model based on a game-theoretic idea comprising two networks: a generation network G (Generator Network) and a discrimination network D (Discriminator Network). As shown in fig. 2, in the adversarial generation framework, randomly distributed noise (a) is added to an original image (b) to produce data (c); the data (c) are input to the generation network G, which extracts the feature information (d) of the pothole and produces fitting data (e) intended to deceive the discrimination network D; the discrimination network D judges whether the generator's output is real or fake. The model is trained by this continuous alternating feedback, and training of the generator is complete when the discriminator can no longer distinguish real samples from generated ones.
The WGAN-GP data generation algorithm comprises the following steps:
First, road disease bounding boxes are cropped from original road pictures to serve as input images for WGAN-GP data generation; the cropped pictures are potholes under different occlusions and illumination conditions. The cropped pictures are then resized to 80x80 pixels and packed into an HDF5 data storage file, which is fed into the WGAN-GP adversarial generation network.
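As an illustrative sketch (not the patent's own code), the packing step could be implemented with h5py; the dataset name `images` and the helper `build_hdf5` are assumptions, and the crops are assumed to be already resized (e.g. with `cv2.resize`):

```python
import numpy as np
import h5py

def build_hdf5(crops, out_path="potholes.h5"):
    """Stack pre-resized 80x80 RGB crops into a single HDF5 dataset.

    `crops` is an iterable of 80x80x3 uint8 arrays (pothole bounding
    boxes cropped from road pictures and already resized).
    """
    data = np.stack([np.asarray(c, dtype=np.uint8) for c in crops])
    with h5py.File(out_path, "w") as f:
        # gzip keeps the training file small; name "images" is arbitrary
        f.create_dataset("images", data=data, compression="gzip")
    return data.shape
```

The returned shape can be used to sanity-check the file before handing it to the training loop.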
Second, the WGAN-GP network reads the HDF5 data storage file and sets the network training parameters, including the batch size Batch_size, the picture pixel size, the number of training epochs Epoch, the generator parameters and the discriminator parameters.
Next, the generator and discriminator network structures are set. The WGAN-GP generation network G and discrimination network D are each five-layer convolutional neural networks: the generation network consists of ConvTranspose2D deconvolution layers, BN (BatchNorm) normalization layers, and ReLU and Tanh activation functions; the discrimination network consists of Conv2D convolution layers, IN (InstanceNorm) normalization layers, and LeakyReLU activation functions. The network settings are shown in table 1.
TABLE 1 WGAN-GP countermeasure Generation network setup
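Table 1's contents are not reproduced in this text, but a five-layer pair matching the described layer types (ConvTranspose2D + BN + ReLU/Tanh for G; Conv2D + IN + LeakyReLU for D) on 80x80 images could look like this PyTorch sketch; the channel widths `ngf`/`ndf` and kernel sizes are illustrative assumptions, not the patent's exact settings:

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """100-dim noise -> 80x80 RGB image (spatial path: 5->10->20->40->80)."""
    def __init__(self, nz=100, ngf=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(nz, ngf * 8, 5, 1, 0, bias=False),      # 5x5
            nn.BatchNorm2d(ngf * 8), nn.ReLU(True),
            nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),  # 10x10
            nn.BatchNorm2d(ngf * 4), nn.ReLU(True),
            nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),  # 20x20
            nn.BatchNorm2d(ngf * 2), nn.ReLU(True),
            nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False),      # 40x40
            nn.BatchNorm2d(ngf), nn.ReLU(True),
            nn.ConvTranspose2d(ngf, 3, 4, 2, 1, bias=False),            # 80x80
            nn.Tanh(),
        )
    def forward(self, z):
        return self.net(z)

class Discriminator(nn.Module):
    """80x80 RGB image -> scalar critic score (no sigmoid, as in WGAN-GP)."""
    def __init__(self, ndf=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, ndf, 4, 2, 1), nn.LeakyReLU(0.2, True),        # 40x40
            nn.Conv2d(ndf, ndf * 2, 4, 2, 1),
            nn.InstanceNorm2d(ndf * 2, affine=True), nn.LeakyReLU(0.2, True),  # 20x20
            nn.Conv2d(ndf * 2, ndf * 4, 4, 2, 1),
            nn.InstanceNorm2d(ndf * 4, affine=True), nn.LeakyReLU(0.2, True),  # 10x10
            nn.Conv2d(ndf * 4, ndf * 8, 4, 2, 1),
            nn.InstanceNorm2d(ndf * 8, affine=True), nn.LeakyReLU(0.2, True),  # 5x5
            nn.Conv2d(ndf * 8, 1, 5, 1, 0),                                     # 1x1
        )
    def forward(self, x):
        return self.net(x).view(-1)
```

Note the critic ends without a sigmoid: WGAN-GP trains against a Wasserstein objective, so its output is an unbounded score rather than a probability.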
The ReLU activation function used in the present invention is as in formula (1):

f(x) = max(0, x)  (1)

The LeakyReLU activation function used is as in formula (2):

f(x) = x for x ≥ 0; f(x) = αx for x < 0, where α is a small positive slope  (2)

The Tanh activation function used in the present invention is as in formula (3):

f(x) = (e^x − e^(−x)) / (e^x + e^(−x))  (3)

In formulas (1), (2) and (3), x is the input of each neural network layer and f(x) is its output.
Then, the loss function and optimization function of the discriminator are set. In the loss function, the first two terms are the discriminator's game loss and the last term is the introduced Lipschitz-constraint gradient penalty. For the game loss, training a generative adversarial network means training the discrimination network D to maximize discrimination accuracy while training the generation network G to minimize log(1 − D(G(z))); that is, G and D play a minimax game over the value function V(G, D), as in formula (4):

min_G max_D V(D, G) = E_{x∼P_data(x)}[log D(x)] + E_{z∼P_z(z)}[log(1 − D(G(z)))]  (4)

The discriminator loss function used is as in formula (5):

L = E_{z∼P_z(z)}[D(G(z))] − E_{x∼P_data(x)}[D(x)] + λ E_{x̂∼P_x̂}[(‖∇_{x̂} D(x̂)‖₂ − 1)²]  (5)

In the formulas: x follows the real data distribution, and maximizing D(x) means the discriminator learns the real sample distribution as far as possible; P_z(z) is the input noise distribution; G(z) denotes the sample distribution generated by the generator; D(G(z)) is the discriminator's judgment of whether the generated samples are real; x̂ is sampled on straight lines between real and generated samples; and λ weights the gradient penalty.
For the optimization function, the model parameters are optimized with the Adam method. Adam is a simple and computationally efficient algorithm for stochastic gradient-based optimization of an objective function, with the advantages of handling both sparse gradients and non-stationary objectives well.
Adam keeps an exponentially decaying average of past squared gradients v_t and an exponentially decaying average of past gradients m_t, which gives it a preference for flat minima in the error surface. The decaying averages m_t and v_t are computed as in formulas (6) and (7):

m_t = β₁ m_{t−1} + (1 − β₁) g_t  (6)

v_t = β₂ v_{t−1} + (1 − β₂) g_t²  (7)

where m_t and v_t are estimates of the first moment (mean) and the second moment (uncentered variance) of the gradient g_t. Whereas plain stochastic gradient descent maintains a single learning rate for all weight updates, Adam adapts the rate per parameter and updates all weights in the generative adversarial network.

Since m_t and v_t are initialized as zero vectors, they are biased toward 0; the bias-corrected estimates are computed as in formulas (8) and (9):

m̂_t = m_t / (1 − β₁^t)  (8)

v̂_t = v_t / (1 − β₂^t)  (9)

These corrected estimates are then used to update the parameters as in formula (10):

θ_t = θ_{t−1} − η m̂_t / (√v̂_t + ε)  (10)

where the default value of β₁ is 0.9, the default value of β₂ is 0.999, and the default value of ε is 10⁻⁸.
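The moment updates, bias corrections and parameter update of formulas (6)-(10) can be sketched as a single NumPy step; `adam_step` and its `state` dictionary are illustrative names, not part of the patent:

```python
import numpy as np

def adam_step(theta, grad, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update implementing formulas (6)-(10).

    `state` carries the step counter t and the moment estimates m, v
    between calls (all start at zero).
    """
    state["t"] += 1
    t = state["t"]
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad       # formula (6)
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad ** 2  # formula (7)
    m_hat = state["m"] / (1 - beta1 ** t)                      # formula (8)
    v_hat = state["v"] / (1 - beta2 ** t)                      # formula (9)
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps)         # formula (10)
```

On the first step the bias correction exactly undoes the zero initialization, so the update direction equals the raw gradient's sign scaled by roughly the learning rate.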
Finally, the discriminator computes the gradient between real and generated samples; the Lipschitz constraint requires its norm not to exceed M (set here to M = 1), and the loss penalizes deviations so that the gradient norm stays as close to M as possible. The network is then continuously updated and optimized through iteration.
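The gradient-penalty term described here (interpolating between real and generated samples and penalizing deviations of the critic's gradient norm from M = 1) is commonly computed as in the following PyTorch sketch; the helper name `gradient_penalty` and the penalty weight `lam` are assumptions:

```python
import torch

def gradient_penalty(D, real, fake, lam=10.0):
    """WGAN-GP penalty: lam * E[(||grad_x_hat D(x_hat)||_2 - 1)^2]."""
    # random interpolation coefficient per sample
    eps = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    x_hat = (eps * real + (1 - eps) * fake).requires_grad_(True)
    d_hat = D(x_hat)
    # gradient of the critic's output w.r.t. the interpolated input
    grads = torch.autograd.grad(outputs=d_hat, inputs=x_hat,
                                grad_outputs=torch.ones_like(d_hat),
                                create_graph=True)[0]
    grad_norm = grads.reshape(grads.size(0), -1).norm(2, dim=1)
    return lam * ((grad_norm - 1) ** 2).mean()
```

This term is added to the critic's loss each iteration; `create_graph=True` lets the penalty itself be backpropagated when the critic is updated.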
Step two: poisson fusion
In the present invention, new training data are synthesized by inserting road surface disease images generated by WGAN-GP into disease-free road images. The inserted patch should look as realistic as possible, avoiding protruding edges, so that the target detection model does not merely learn the edge features of the pasted object instead of the disease features. Poisson blending is an image fusion method that can blend a source image into a target scene so that the tone and illumination of the source image remain consistent with the target scene.

Poisson fusion, introduced from the Poisson equation, solves for optimal pixel values that retain the gradient information of the source image while blending the source and target images. It solves a Poisson equation under specified boundary conditions, achieving continuity in the gradient domain and therefore seamless fusion at the boundary.
The Poisson fusion algorithm comprises the following steps:
First, a source image (source) and a background image (destination) are prepared: the source image is a road pothole disease generated by the WGAN-GP network, and the background image is a disease-free road picture captured by a vehicle-mounted mobile phone.
Second, a point P is set to specify where the source image is placed on the background image; P is the location of the source image's center point.
Then, the gradient fields of the source image and the background image are computed to obtain the gradient field of the fused image. The fusion algorithm used in the invention computes the pixel values of the unknown region from the image gradients and boundary conditions via the Poisson equation, as in formula (11):

min_f ∬_Ω ‖∇f − v‖² dΩ,  subject to f|_∂Ω = f*|_∂Ω  (11)

where ∇ is the gradient operator; v is the guidance gradient field over the blended region; f is the unknown function over the target region Ω; f* is the known function of the surrounding (destination) image; and ∂Ω is the boundary between the source region and the target region.
Finally, the pixel values of the fused image are determined by solving the poisson equation.
Step three: effect verification
The improvement in target detection performance brought by the data depth enhancement technique is verified with the YOLOv5 object detector, and performance is quantitatively evaluated with the F1-Score metric.
Precision is defined as the proportion of correct detections among all detections, as in formula (12):

Precision = TP / (TP + FP)  (12)

Recall is the proportion of correctly detected objects among all ground-truth positive samples, as in formula (13):

Recall = TP / (TP + FN)  (13)

where TP is the number of correctly detected diseases, FP is the number of non-diseases detected as diseases, and FN is the number of diseases missed as non-diseases.

F1-Score is the harmonic mean of precision and recall, combining both into a single result, as in formula (14):

F1 = 2 × Precision × Recall / (Precision + Recall)  (14)
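The three metrics of formulas (12)-(14) can be computed with a small helper; `detection_metrics` is an assumed name:

```python
def detection_metrics(tp, fp, fn):
    """Precision, recall and F1-Score per formulas (12)-(14).

    tp: correctly detected diseases; fp: non-diseases flagged as
    diseases; fn: diseases missed by the detector.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

The zero-denominator guards return 0.0 rather than raising, which is convenient when evaluating early epochs with no detections.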
Compared with the prior art, the invention has the following technical advantages. It uses the WGAN-GP data generation technology together with the Poisson fusion technology to deeply enhance the data; unlike traditional data enhancement methods, it can mine the latent distribution implied by the data and generate road disease pictures under different occlusions and illumination conditions. The generated pictures are added to the training set of the pavement disease detection model, providing a sufficient amount of data for the detection model to learn the possible distribution and thus improving the precision of intelligent pavement disease detection.
Drawings
Fig. 1 shows a data depth enhancement calculation step.
FIG. 2 is a WGAN-GP algorithm architecture diagram.
Fig. 3 is a WGAN-GP training process.
Fig. 4 shows pit diseases generated based on WGAN-GP data.
Fig. 5 is a pit defect data synthesis flow.
Fig. 6 is a road picture synthesized using a Mixed fusion method.
Fig. 7 is a road picture synthesized by using the Normal fusion method.
FIG. 8 is a comparison of training accuracy before and after data depth enhancement.
Figure 9 is a comparison of training loss before and after data depth enhancement.
Detailed Description
The data set used to train the pavement disease data enhancement algorithm comes from the Global Road Damage Detection Challenge: high-definition road surface pictures captured by a vehicle-mounted smartphone at a driving speed of 40 km/h, collected as non-overlapping images of 600 x 600 pixels.
First, pothole disease bounding boxes are cropped from the data set as input images for data generation: 300 pothole diseases under different occlusions and illumination conditions are cropped, resized to 80x80 pixels, packed into an HDF5 data storage file, and fed into WGAN-GP network training; the training process is shown in fig. 3. The WGAN-GP adversarial generation network parameter settings are shown in table 2, where K (Kernel_size) denotes the convolution kernel size, N_out the number of convolution kernels, and S (Stride) the step size.
TABLE 2 WGAN-GP countermeasure Generation network parameter settings
After 100,000 training iterations of the generative adversarial network, a generator.pkl weight file is produced, containing the generator weights trained on this batch of pothole pictures. Loading and running these weights generates 600 pothole disease pictures; a random sample of the results is shown in fig. 4.
Then, Poisson fusion is used to synthesize data from the generated pothole diseases; the flow is shown in fig. 5. The 600 disease-free road pictures from the Japanese data set serve as background images and the 600 road pothole diseases generated by the WGAN-GP network trained in step one serve as source images, giving 1200 pictures for data synthesis. Next, the placement point of the pothole in the background picture is set; in the invention, point P is placed at 3/5 of the picture height and 1/2 of the picture width. The road picture synthesized with the Normal fusion mode is shown in fig. 6, and the road picture synthesized with the Mixed fusion mode in fig. 7. As the figures show, compared with the Normal mode, the Mixed fusion mode better blends the source image into the target background while retaining the gradient information of the source image; the Mixed fusion mode is therefore selected.
Finally, the improvement in target detection performance from the combined WGAN-GP data generation and Poisson fusion data depth enhancement method is verified: the 600 synthesized pavement pictures with pothole diseases are added to the original Japanese public data set, and the YOLOv5 detection algorithm is used to compare detection precision before and after enhancement. The experiments are trained on a Windows 10 operating system with the PyTorch framework; the deep-learning hardware environment is an Intel(R) Core(TM) i7-8850H CPU, 64 GB of memory, and a Quadro P4000 graphics card. The language is Python 3.8, with the CUDA 10.2 parallel computing architecture and the CUDA-based deep-learning GPU acceleration library cuDNN 8.1 loaded. Training optimizes the network with a stochastic gradient descent algorithm with an attenuation coefficient of 0.937; the learning rate, batch size, number of training epochs and other parameter settings are shown in table 3.
Table 3 experimental parameter settings
The training accuracy curves are shown in fig. 8 and the training loss curves in fig. 9, where the solid line is the training process after data enhancement and the dotted line is the training process on the raw data. As fig. 8 shows, after data depth enhancement the accuracy of the target detection algorithm is significantly higher than when training on the original data, with F1-Score improved by 3.6%. As fig. 9 shows, the loss after data depth enhancement is lower than the training loss on the original data, verifying the effectiveness of the proposed method in addressing unbalanced pavement disease classes and small sample counts.
Claims (4)
1. A data depth enhancement method based on WGAN-GP data generation and Poisson fusion is characterized by comprising the two parts of WGAN-GP data generation and Poisson fusion and comprising the following specific steps of:
the method comprises the following steps: generating WGAN-GP data;
first, road disease bounding boxes are cropped from original road pictures as input images for WGAN-GP data generation, the cropped pictures being potholes under different occlusions and illumination conditions; the cropped pictures are then resized to 80x80 pixels, packed into an HDF5 data storage file, and fed into the WGAN-GP adversarial generation network;
second, the WGAN-GP network reads the HDF5 data storage file and sets the network training parameters, including the batch size Batch_size, the picture pixel size, the number of training epochs Epoch, the generator parameters and the discriminator parameters;
then, the generator and discriminator network structures are set; the WGAN-GP generation network G and discrimination network D are each five-layer convolutional neural networks: the generation network consists of ConvTranspose2D deconvolution layers, BN normalization layers, and ReLU and Tanh activation functions, and the discrimination network consists of Conv2D convolution layers, IN normalization layers, and LeakyReLU activation functions;
finally, the discriminator computes the gradient between real and generated samples; the Lipschitz constraint requires its norm not to exceed M, the loss penalizes deviations so that the gradient norm stays as close to M as possible, and the network is continuously updated and optimized through iteration;
step two: poisson fusion
New training data are synthesized by inserting the road surface disease images generated by WGAN-GP into disease-free road images; Poisson fusion, introduced from the Poisson equation, solves for optimal pixel values that retain the gradient information of the source image while fusing the source and target images; it solves a Poisson equation under specified boundary conditions, achieving continuity in the gradient domain and therefore seamless fusion at the boundary.
2. The method for data depth enhancement based on WGAN-GP data generation and Poisson fusion according to claim 1, wherein the ReLU activation function used is as in formula (1):

f(x) = max(0, x)  (1)

the LeakyReLU activation function used is as in formula (2):

f(x) = x for x ≥ 0; f(x) = αx for x < 0, where α is a small positive slope  (2)

and the Tanh activation function used is as in formula (3):

f(x) = (e^x − e^(−x)) / (e^x + e^(−x))  (3)

in formulas (1), (2) and (3), x is the input of each neural network layer and f(x) is its output;
the loss function and optimization function of the discriminator are set; in the loss function, the first two terms are the discriminator's game loss and the last term is the introduced Lipschitz-constraint gradient penalty; for the game loss, training the generative adversarial network means training the discrimination network D to maximize discrimination accuracy while training the generation network G to minimize log(1 − D(G(z))), i.e. G and D play a minimax game over the value function V(G, D), as in formula (4):

min_G max_D V(D, G) = E_{x∼P_data(x)}[log D(x)] + E_{z∼P_z(z)}[log(1 − D(G(z)))]  (4)

the discriminator loss function used is as in formula (5):

L = E_{z∼P_z(z)}[D(G(z))] − E_{x∼P_data(x)}[D(x)] + λ E_{x̂∼P_x̂}[(‖∇_{x̂} D(x̂)‖₂ − 1)²]  (5)

in the formulas: x follows the real data distribution, and maximizing D(x) means the discriminator learns the real sample distribution as far as possible; P_z(z) is the input noise distribution; G(z) denotes the sample distribution generated by the generator; and D(G(z)) is the discriminator's judgment of whether the generated samples are real.
3. The method for data depth enhancement based on WGAN-GP data generation and Poisson fusion of claim 1, wherein for the optimization function, an Adam method is used to optimize the model parameters;
Adam keeps an exponentially decaying average of past squared gradients v_t and an exponentially decaying average of past gradients m_t, giving it a preference for flat minima in the error surface; the decaying averages m_t and v_t are computed as in formulas (6) and (7):

m_t = β₁ m_{t−1} + (1 − β₁) g_t  (6)

v_t = β₂ v_{t−1} + (1 − β₂) g_t²  (7)

where m_t and v_t are estimates of the first and second moments of the gradient g_t; whereas plain stochastic gradient descent maintains a single learning rate for all weight updates, Adam adapts the rate per parameter and updates all weights in the generative adversarial network;

since m_t and v_t are initialized as zero vectors, they are biased toward 0, and the bias-corrected estimates are computed as in formulas (8) and (9):

m̂_t = m_t / (1 − β₁^t)  (8)

v̂_t = v_t / (1 − β₂^t)  (9)

these estimates are then used to update the parameters as in formula (10):

θ_t = θ_{t−1} − η m̂_t / (√v̂_t + ε)  (10)

where the default value of β₁ is 0.9, the default value of β₂ is 0.999, and the default value of ε is 10⁻⁸.
4. The method for data depth enhancement based on WGAN-GP data generation and Poisson fusion of claim 1, wherein the Poisson fusion algorithm comprises the steps of:
first, a source image (source) and a background image (destination) are prepared: the source image is a road pothole disease generated by the WGAN-GP network, and the background image is a disease-free road picture captured by a vehicle-mounted mobile phone;

second, a point P is set to specify where the source image is placed on the background image, P being the location of the source image's center point;
then, the gradient fields of the source image and the background image are computed to obtain the gradient field of the fused image; the fusion algorithm used in the invention computes the pixel values of the unknown region from the image gradients and boundary conditions via the Poisson equation, as in formula (11):

min_f ∬_Ω ‖∇f − v‖² dΩ,  subject to f|_∂Ω = f*|_∂Ω  (11)

where ∇ is the gradient operator; v is the guidance gradient field over the blended region; f is the unknown function over the target region Ω; f* is the known function of the surrounding (destination) image; and ∂Ω is the boundary between the source region and the target region;
finally, determining the pixel value of the fused image by solving a Poisson equation;
step three: effect verification
The improvement effect of the data depth enhancement technology on the target detection performance is verified by adopting the Yolov5 target detection, and the performance is quantitatively evaluated through the F1-Score quantitative index;
the precision rate is defined as the proportion of correctly detected objects among all detected objects, as in equation (12):
recall refers to the proportion of correctly detected subjects in all detected positive samples, as in formula (13):
wherein TP is the number of correctly detected diseases, FP is the number of non-diseases regarded as diseases, and FN is the number of diseases regarded as non-diseases;
F1-Score is the harmonic mean of precision and recall, the output result of comprehensive precision and recall, and is shown as formula (14):
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111482780.4A CN114219778B (en) | 2021-12-07 | 2021-12-07 | Data depth enhancement method based on WGAN-GP data generation and poisson fusion |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111482780.4A CN114219778B (en) | 2021-12-07 | 2021-12-07 | Data depth enhancement method based on WGAN-GP data generation and poisson fusion |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114219778A true CN114219778A (en) | 2022-03-22 |
CN114219778B CN114219778B (en) | 2024-04-02 |
Family
ID=80699973
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110827216A (en) * | 2019-10-23 | 2020-02-21 | 上海理工大学 | Multi-generator generation countermeasure network learning method for image denoising |
CN112767344A (en) * | 2021-01-16 | 2021-05-07 | 北京工业大学 | Disease enhancement method based on vehicle-mounted camera shooting and coupling tradition and machine learning |
CN113034411A (en) * | 2020-12-19 | 2021-06-25 | 北京工业大学 | Road disease picture enhancement method for resisting generation network by coupling traditional method and deep convolution |
WO2021169292A1 (en) * | 2020-02-24 | 2021-09-02 | 上海理工大学 | Adversarial optimization method for training process of generative adversarial neural network |
Non-Patent Citations (1)
Title |
---|
XIAO ERLIANG; ZHOU YING; JIAN XIANZHONG: "A medical image fusion model combining transfer learning and GAN", Journal of Chinese Computer Systems, no. 09, 4 September 2020 (2020-09-04) * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
CB03 | Change of inventor or designer information | ||
Inventor after: Chen Ning; Zhang Huiting; Hou Yue; Liu Zhuo; Chen Yanyan |
Inventor before: Hou Yue; Zhang Huiting; Chen Ning; Liu Zhuo; Chen Yanyan |
GR01 | Patent grant | ||