CN111353449A

CN111353449A - Infrared road image water body detection method based on conditional generative adversarial network

Info

Publication number: CN111353449A
Application number: CN202010149314.3A
Authority: CN
Inventors: 王欢; 汪立
Original assignee: Nanjing University of Science and Technology
Current assignee: Nanjing University of Science and Technology
Priority date: 2020-03-03
Filing date: 2020-03-03
Publication date: 2020-06-30

Abstract

The invention discloses a method for detecting water bodies in infrared road images based on a conditional generative adversarial network, comprising the following steps: acquiring a road image with an infrared camera, scaling it to a specified size, and annotating it to obtain a mask representing the water body positions; constructing a conditional generative adversarial network that adopts the Wasserstein GAN structure, uses a fully convolutional neural network as the generator and a convolutional neural network as the discriminator, and applies the preprocessing function of the reflection attention unit to the input images of both the generator and the discriminator; training the network with the infrared road images and their corresponding masks; and scaling the image to be detected to the corresponding size and feeding it to the trained conditional generative adversarial network, the output of the generator being a binary image representing the water body detection result. The method detects water regions on the road surface from infrared road images with high accuracy and recall, and is suitable for related tasks in the field of autonomous driving.

Description

Infrared road image water body detection method based on conditional generative adversarial network

Technical Field

The invention relates to the technical field of road-surface water detection, and in particular to a method for detecting water bodies in infrared road images based on a conditional generative adversarial network.

Background

In the field of autonomous driving, detecting water on the road surface is a key task. A water-covered region of road may conceal hazards that are hard to detect, such as water-filled potholes, and failing to detect it correctly can cause serious damage to an unmanned vehicle. Compared with images from a visible-light camera, infrared road images show a larger difference between water regions and their surroundings, which makes detection easier; in practice, therefore, an infrared camera is often used to capture images of water on the road, and the water regions are detected with image processing and computer vision methods. The reflective properties of water complicate the detection task: reflections are less pronounced in infrared images than in visible-light images, but they still exist, and water detection algorithms built on traditional image processing are consequently prone to false detections and missed detections. Moreover, the number and area of water bodies in an image are arbitrary and their shapes are irregular, so water detection in infrared road images is best treated as an image segmentation problem. With the development of deep learning and artificial intelligence, deep learning has been applied to image segmentation in numerous cases, where it achieves higher accuracy and recall and better segmentation quality than traditional methods; deep learning is therefore an important route to solving road-surface water detection.

The paper "Wasserstein GAN" proposes improvements to the structure and training procedure of the traditional generative adversarial network, yielding the Wasserstein GAN structure. In addition, conditional generative adversarial networks have been widely applied to image segmentation, often with good results, so using one to detect water on the road surface is a promising approach. In the ECCV 2018 paper "Single Image Water Hazard Detection using FCN with Reflection Attention Units", the authors propose a network structure for road-surface water detection, the Reflection Attention Unit (RAU). It is based on the observation that the line connecting a reflection on the water surface to the real object is usually close to vertical, so a feature map produced during detection can be cut into horizontal strips and compared along the vertical direction to judge whether a reflection relationship exists. Used appropriately, the reflection attention unit addresses the road-surface water detection problem and improves the performance of a deep learning network.

However, the detection result of the water body detection method proposed in the paper tends to be smooth and not fine enough, so that much detail information is lost, and especially, missing detection is easily caused to a small water body area. Moreover, the method mainly aims at visible light images, and the effect on infrared images is not verified.

Disclosure of Invention

The invention aims to provide a method for detecting water bodies in infrared road images based on a conditional generative adversarial network. The method constructs a conditional generative adversarial network that follows the Wasserstein GAN structure, uses a fully convolutional network as the generator and a convolutional neural network as the discriminator, and applies the preprocessing function of the reflection attention unit to the input images of both the generator and the discriminator.

The technical solution for realizing the purpose of the invention is as follows: an infrared road image water body detection method based on a conditional generative adversarial network, comprising the following steps:

step 1, acquiring an infrared road image with an infrared camera, cropping and scaling the road image to a specified size, and obtaining by annotation a mask containing the position of road-surface water in the acquired image;

step 2, constructing a conditional generative adversarial network, wherein the network as a whole adopts the basic structure of Wasserstein GAN; the generator is a fully convolutional neural network following the U-Net model structure; the discriminator is a convolutional neural network; and the input images of the generator and the discriminator are preprocessed with the preprocessing function of the reflection attention unit;

step 3, training the conditional generative adversarial network with the acquired infrared road images and their annotated masks;

step 4, scaling the single-channel infrared image to be detected to the specified size, inputting it to the trained conditional generative adversarial network, and obtaining the binary image output by the generator, which represents the water body detection result.

Compared with the prior art, the invention has the following notable advantages: (1) adopting the Wasserstein GAN approach makes training of the conditional generative adversarial network more stable; in addition, combining the conditional generative adversarial network with the preprocessing function of the reflection attention unit makes road-surface water detection less affected by water-surface reflections and by the imbalance between positive and negative samples, so the detection results have lower false-detection and missed-detection rates and higher accuracy and recall; (2) compared with water detection methods that use a classical fully convolutional neural network, the method combines the U-Net structure with the conditional generative adversarial network structure, so the detection results are finer and details are handled better.

Drawings

FIG. 1 is a flow chart of the preprocessing function of the reflection attention unit according to the present invention.

FIG. 2 is a block diagram of the conditional generative adversarial network employed by the present invention.

FIG. 3 is a diagram showing the effects of the present invention.

Detailed Description

In the course of research and practice on existing methods and theories, the applicant has found that if Wasserstein GAN is combined with the structure of a conditional generative adversarial network to optimize that network, and the preprocessing function of the reflection attention unit is used to preprocess the infrared road image input to the generator and the discriminator, the water regions in the infrared road image can be detected effectively.

The invention provides an infrared road image water body detection method based on a conditional generative adversarial network, which comprises the following steps:

Further, in step 2, the conditional generative adversarial network follows the basic Wasserstein GAN structure and meets the following requirements:

1) the output layer of the discriminator network has no activation function;

2) the loss function of the discriminator network is the difference between the discriminator's prediction for the generated mask and its prediction for the real mask;

3) the adversarial loss term of the generator's loss function is the negative of the discriminator's prediction for the corresponding generated mask;

4) after each optimization of the discriminator, all of its trainable parameter values are clipped so that they lie within a specified interval;

5) the optimizer is a stochastic gradient descent optimizer.
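Requirements 2) to 4) can be illustrated with a minimal Python sketch. The function names and the `bound` parameter are hypothetical and chosen only for illustration; the scores `y_fake` and `y_real` stand for the discriminator's outputs on the generated and real masks, and NumPy is assumed to be available.

```python
import numpy as np


def discriminator_loss(y_fake, y_real):
    """Requirement 2: score on the generated mask minus score on the real mask."""
    return y_fake - y_real


def generator_adversarial_loss(y_fake):
    """Requirement 3: negative of the discriminator's score on the generated mask."""
    return -y_fake


def clip_parameters(params, bound):
    """Requirement 4: clip every parameter array to the interval [-bound, bound].

    `params` is a list of NumPy arrays standing in for the discriminator's
    trainable parameters (a hypothetical representation, not from the source).
    """
    return [np.clip(p, -bound, bound) for p in params]
```

Minimizing `discriminator_loss` pushes the score of real masks up and the score of generated masks down, while the clipping step keeps the discriminator's parameters inside the specified interval after every update.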

Further, in step 2, the conditional generative adversarial network has the following structure:

the network comprises a generator network and a discriminator network;

the generator network is a fully convolutional neural network following the U-Net model structure. Its input is a fixed-size real image, either for detection or for training; the image is first preprocessed by the preprocessing function of the reflection attention unit and then fed to the first convolutional layer. The output of the generator is a feature map representing the water body detection result for the input infrared road image, i.e. the generated mask, in which a larger pixel value means a higher probability that the pixel at the same position in the original image belongs to a road-surface water region;

the discriminator network is a convolutional neural network. Its input is either a real mask together with the corresponding real image, or a generated mask output by the generator together with the corresponding real image. The input real image is preprocessed by the preprocessing function of the reflection attention unit, concatenated along the channel dimension with the input real or generated mask, and then fed to the first convolutional layer. The last layer of the discriminator is a fully connected layer whose output is a single value representing how likely it is that the input mask is the real mask corresponding to the real image.

Further, in step 2, the preprocessing function of the reflection attention unit specifically operates as follows:

as shown in FIG. 1, let h, w and c be the height, width and number of channels of the input feature map I of the preprocessing function of the reflection attention unit. The preprocessing function first uses mean pooling to reduce the height of the input feature map to n and the width to w/2, keeping the number of channels unchanged; the resulting feature map is denoted X. Each row of X is then split out, and each of the n rows is upsampled to a new feature map of height h, width w and c channels. These n new feature maps are concatenated along the channel dimension to give a new feature map with n × c channels, denoted X'. The input feature map I is then replicated n times along the channel dimension, which is equivalent to concatenating n copies of I along the channel dimension, giving a new feature map with n × c channels, denoted I'. The difference of X' and I', i.e. the subtraction of elements at corresponding positions, gives a new feature map D. Finally, I and D are concatenated along the channel dimension to give a feature map with (n + 1) × c channels, which is the output feature map of the preprocessing function.

Further, n is 8 or 16.
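The preprocessing function described above can be sketched in NumPy as follows. This is an illustrative reading of the text, not the patented implementation: the function name `rau_preprocess` is hypothetical, nearest-neighbour upsampling is assumed because the text does not name an interpolation method, and the sketch assumes that h is divisible by n and that w is even (an input whose height is not a multiple of n would need a pooling scheme with uneven windows).

```python
import numpy as np


def rau_preprocess(image, n):
    """Map an (h, w, c) array I to an (h, w, (n + 1) * c) array.

    Assumptions of this sketch: h % n == 0, w % 2 == 0, nearest-neighbour
    upsampling.
    """
    image = np.asarray(image, dtype=np.float64)
    h, w, c = image.shape
    if h % n != 0 or w % 2 != 0:
        raise ValueError("sketch assumes h divisible by n and even w")

    # Mean pooling: height h -> n, width w -> w/2, channels unchanged (X).
    x = image.reshape(n, h // n, w // 2, 2, c).mean(axis=(1, 3))  # (n, w/2, c)

    # Upsample each row of X to a full h x w x c map (nearest neighbour).
    rows = np.repeat(x, 2, axis=1)                                # (n, w, c)
    maps = np.broadcast_to(rows[:, None, :, :], (n, h, w, c))

    # X': the n upsampled maps concatenated along the channel axis.
    x_prime = np.concatenate(list(maps), axis=-1)                 # (h, w, n*c)

    # I': the input replicated n times along the channel axis.
    i_prime = np.tile(image, (1, 1, n))                           # (h, w, n*c)

    # D = X' - I', then the output is I concatenated with D.
    d = x_prime - i_prime
    return np.concatenate([image, d], axis=-1)
```

For a constant image every pooled row equals the image itself, so D is zero everywhere; non-zero entries in D therefore mark pixels that differ from the horizontal strip they are compared against.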

Further, in step 3, the conditional generative adversarial network is trained as follows:

a) set the network parameters, randomly initialize the parameters to be trained, and input the real training images and their corresponding real masks one by one, each iteration consisting of steps b) to e);

b) input the real image into the generator to obtain a generated mask;

c) input the real image and the real mask into the discriminator to obtain the discriminator output y_t; likewise input the real image and the generated mask into the discriminator to obtain the discriminator output y_f;

d) calculate the loss of the generator from the generated mask, the real mask and the discriminator output y_f according to the generator's loss function, and calculate the loss of the discriminator from the discriminator outputs y_t and y_f according to the discriminator's loss function;

e) optimize the network parameters according to the losses of the generator and the discriminator and the network structure;

f) when all the training data has been used, training ends and the network parameters are saved.

Further, in step 4, obtaining the binary image representing the water body detection result with the trained conditional generative adversarial network comprises:

a) scaling the image to be detected to the size expected by the generator, and inputting the scaled image to the trained conditional generative adversarial network;

b) obtaining the generated mask produced by the generator and binarizing it with a threshold, the threshold being the average of the pixel value that represents a road-surface water region and the pixel value that represents a non-water region in the input real masks, i.e. the sum of the two possible values divided by 2; the binarized mask is the road-surface water detection result for the input image.
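The thresholding rule in b) can be sketched as below, assuming NumPy. The function name and default arguments are illustrative; the defaults follow the embodiment described later, in which the mask values are mapped to -1 (non-water) and 1 (water) during training and the binary result uses 0 and 255.

```python
import numpy as np


def binarize_mask(generated, water_value=1.0, background_value=-1.0):
    """Binarize a generated mask at the midpoint of the two mask values."""
    # Threshold = (value for water + value for non-water) / 2.
    threshold = (water_value + background_value) / 2.0
    # Pixels above the threshold become 255 (water), all others 0.
    return np.where(generated > threshold, 255, 0).astype(np.uint8)
```

With the default values the threshold is 0, so `binarize_mask(np.array([[-0.9, 0.0], [0.2, 1.0]]))` marks only the bottom row as water.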

The technical solution of the present invention will be described in detail below with reference to the embodiments and the accompanying drawings.

Examples

An infrared road image water body detection method based on a conditional generative adversarial network comprises the following steps:

Step 1: acquire infrared road images with an infrared camera, scale them to a specified size, and obtain by annotation a mask containing the position of road-surface water in each acquired image. The specified size is 640 × 360. Pixels belonging to water regions are determined by manual annotation, producing a binary image representing the water positions, i.e. the mask, whose size is also 640 × 360. Regions with pixel value 0 represent non-water road regions in the original image, and regions with pixel value 255 represent road-surface water regions. Every acquired image has a corresponding real mask.

Step 2: construct a conditional generative adversarial network following the basic structure of Wasserstein GAN. Its structure is shown in FIG. 2, and the network consists of two parts:

a) The generator network, whose structure comprises:

Input: the original picture to be detected, 640 pixels wide and 360 pixels high, with 3 channels;

Preprocessing layer: processes the input original picture with the preprocessing function of the reflection attention unit;

Convolutional layer 1: 64 kernels of size 5 × 5, stride 2 × 2;

Convolutional layer 2: 128 kernels of size 5 × 5, stride 2 × 2; the output is batch-normalized and then activated by a leaky ReLU with a negative slope of 0.2;

Convolutional layer 3: 256 kernels of size 5 × 5, stride 2 × 2; the output is batch-normalized;

Convolutional layer 4: 512 kernels of size 5 × 5, stride 2 × 2; the output is batch-normalized and then activated by a leaky ReLU with a negative slope of 0.2;

Convolutional layer 5: 512 kernels of size 5 × 5, stride 2 × 2; the output is batch-normalized and then activated by a leaky ReLU with a negative slope of 0.2;

Convolutional layer 6: 512 kernels of size 5 × 5, stride 2 × 2; the output is batch-normalized and then activated by a leaky ReLU with a negative slope of 0.2;

Convolutional layer 7: 512 kernels of size 5 × 5, stride 2 × 2; the output is batch-normalized and then activated by a leaky ReLU with a negative slope of 0.2;

Convolutional layer 8: 512 kernels of size 5 × 5, stride 2 × 2; the output is batch-normalized and then activated by a rectified linear unit (ReLU);

Deconvolution layer 1: 512 kernels of size 5 × 5, stride 2 × 2; the height and width of the output feature map match the output of convolutional layer 7; the deconvolution result is batch-normalized, passed through dropout with probability 0.5, concatenated along the channel dimension with the pre-activation output of convolutional layer 7, and activated by a ReLU;

Deconvolution layer 2: 512 kernels of size 5 × 5, stride 2 × 2; the height and width of the output feature map match the output of convolutional layer 6; the deconvolution result is batch-normalized, passed through dropout with probability 0.5, concatenated along the channel dimension with the pre-activation output of convolutional layer 6, and activated by a ReLU;

Deconvolution layer 3: 512 kernels of size 5 × 5, stride 2 × 2; the height and width of the output feature map match the output of convolutional layer 5; the deconvolution result is batch-normalized, passed through dropout with probability 0.5, concatenated along the channel dimension with the pre-activation output of convolutional layer 5, and activated by a ReLU;

Deconvolution layer 4: 512 kernels of size 5 × 5, stride 2 × 2; the height and width of the output feature map match the output of convolutional layer 4; the deconvolution result is batch-normalized, concatenated along the channel dimension with the pre-activation output of convolutional layer 4, and activated by a ReLU;

Deconvolution layer 5: 256 kernels of size 5 × 5, stride 2 × 2; the height and width of the output feature map match the output of reflection attention unit 2; the deconvolution result is batch-normalized, concatenated along the channel dimension with the pre-activation output of reflection attention unit 2, and activated by a ReLU;

Deconvolution layer 6: 128 kernels of size 5 × 5, stride 2 × 2; the height and width of the output feature map match the output of convolutional layer 2; the deconvolution result is batch-normalized, concatenated along the channel dimension with the pre-activation output of convolutional layer 2, and activated by a ReLU;

Deconvolution layer 7: 64 kernels of size 5 × 5, stride 2 × 2; the height and width of the output feature map match the output of reflection attention unit 1; the deconvolution result is batch-normalized, concatenated along the channel dimension with the pre-activation output of reflection attention unit 1, and activated by a ReLU;

Deconvolution layer 8: 1 kernel of size 5 × 5, stride 2 × 2; the height and width of the output feature map match the input image of the generator; the deconvolution result is activated by the hyperbolic tangent function (tanh) and is the output of the generator.

b) The discriminator network, whose structure comprises:

Input: an original picture and a mask picture; the original picture is 640 pixels wide and 360 pixels high with 3 channels, and the mask picture has 1 channel;

Preprocessing layer: processes the input original picture with the preprocessing function of the reflection attention unit and concatenates the result with the input mask picture along the channel dimension;

Convolutional layer 1: 64 kernels of size 5 × 5, stride 2 × 2; the output is activated by a leaky ReLU with a negative slope of 0.2;

Convolutional layer 3: 256 kernels of size 5 × 5, stride 2 × 2; the output is batch-normalized and then activated by a leaky ReLU with a negative slope of 0.2;

Fully connected layer: the output is a single value; the higher the value, the more likely it is that the input mask is the real mask representing the position of the road-surface water region in the input image; this layer has no activation function.
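To see why each deconvolution layer of the generator is specified by the size of a matching encoder output, it helps to trace the spatial size through the eight stride-2 convolutional layers. The sketch below assumes "same"-style padding, so that each layer outputs ceil(size / stride); the padding mode is not stated in the text, and the function name is hypothetical.

```python
import math


def encoder_sizes(height, width, num_layers=8, stride=2):
    """Return the (height, width) after each stride-2 convolutional layer.

    Assumes 'same' padding, i.e. output size = ceil(input size / stride).
    """
    sizes = []
    for _ in range(num_layers):
        height = math.ceil(height / stride)
        width = math.ceil(width / stride)
        sizes.append((height, width))
    return sizes
```

Under this assumption, `encoder_sizes(360, 640)` yields (180, 320), (90, 160), (45, 80), (23, 40), (12, 20), (6, 10), (3, 5) and (2, 3). Because 45 is odd, simply doubling on the way back up would give 46 rather than 45, which is one reason to pin each deconvolution output to the size of the corresponding encoder feature map.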

The input of the preprocessing function of the reflection attention unit is the infrared road image to be detected, resized and stored as a three-channel image and denoted I. The processing flow of the preprocessing function is shown in FIG. 1, and it specifically operates as follows:

Let h, w and c be the height, width and number of channels of the input feature map of the reflection attention unit; in this embodiment h = 360, w = 640 and c = 3. The preprocessing function first uses mean pooling to reduce the height of the input feature map to 16 and the width to w/2, i.e. 320, leaving the number of channels unchanged; the resulting feature map is denoted X. Each row of X is then split out, and each of the 16 rows is upsampled to a new feature map of height h (360), width w (640) and c (3) channels. These 16 new feature maps are concatenated along the channel dimension to give a new feature map with 16 × c = 48 channels, denoted X'. The input feature map I is then replicated 16 times along the channel dimension, which is equivalent to concatenating 16 copies of I, giving a new feature map with 16 × c = 48 channels, denoted I'. The element-wise difference of X' and I' gives a new feature map D, and I and D are finally concatenated along the channel dimension to give the output of the preprocessing function, with (16 + 1) × c = 51 channels.

Step 3: train the conditional generative adversarial network with the acquired infrared road images and their annotated masks. First, the pixel values of the binary masks are remapped: 0 is mapped to -1 and 255 is mapped to 1. The trainable parameters of the network are randomly initialized. During training, each time a training picture and its real mask are supplied, the picture is input to the generator to produce a generated mask; the real picture and the real mask are input to the discriminator to produce the result y_t, and the real picture and the generated mask are input to the discriminator to produce the result y_f. The loss function of the discriminator can be expressed as y_f - y_t, and the loss function of the generator as -200 × y_f + L_data, where L_data is the data loss term of the generator. L_data is computed by subtracting the real mask from the generated mask, taking the absolute value of the result, and dividing the sum of absolute values by the total number of pixels of the generated mask, which gives the average per-pixel distance. In each training iteration the discriminator is optimized once, after which its updated parameters are clipped to the interval [-0.5, 0.5], i.e. parameters greater than 0.5 are set to 0.5 and parameters less than -0.5 are set to -0.5; the generator is then optimized twice. The optimizer is stochastic gradient descent, the optimization objective is to minimize the corresponding loss value, and the learning rate is set to 0.0002. After the training pictures and real masks have been cycled through 300 times, training ends and the model parameters are saved.

Step 4: scale the infrared road image to be detected to the specified size, i.e. 640 × 360, and input it to the generator of the trained conditional generative adversarial network; no training takes place at this stage and the parameters are fixed. The image output by the generator represents the water body detection result and is segmented with 0 as the threshold: pixels with values greater than 0 are set to 255 and pixels with values less than or equal to 0 are set to 0, giving a binary image of the result. Pixels with value 255 indicate that the corresponding region of the original image is road-surface water, and pixels with value 0 indicate that it is not. The original image, its real mask and the prediction result are shown in FIG. 3.
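The mask remapping and the generator loss of step 3 can be sketched in NumPy as follows. The function names and the constant name `ADVERSARIAL_WEIGHT` are hypothetical; the weight 200, the mapping of 0 and 255 to -1 and 1, and the mean absolute difference for L_data come from the embodiment.

```python
import numpy as np

# Weight of the adversarial term in the generator loss (from the embodiment).
ADVERSARIAL_WEIGHT = 200.0


def map_mask(mask_u8):
    """Map a 0/255 binary mask to -1/1, the range of the generator's tanh output."""
    return mask_u8.astype(np.float32) / 127.5 - 1.0


def data_loss(generated, real):
    """L_data: mean absolute per-pixel difference between the two masks."""
    return float(np.mean(np.abs(generated - real)))


def generator_loss(y_fake, generated, real):
    """Generator loss: -200 * y_f + L_data."""
    return -ADVERSARIAL_WEIGHT * y_fake + data_loss(generated, real)
```

Mapping the masks to [-1, 1] puts the target in the same range as the tanh output of the generator, which is also why the step 4 threshold of 0 sits midway between the two mask values.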

Claims

1. An infrared road image water body detection method based on a conditional generative adversarial network, characterized by comprising the following steps:

2. The infrared road image water body detection method based on a conditional generative adversarial network of claim 1, wherein in step 2 the conditional generative adversarial network follows the basic Wasserstein GAN structure and meets the following requirements:

1) the output layer of the discriminator network has no activation function;

5) the optimizer is a stochastic gradient descent optimizer.

3. The infrared road image water body detection method based on a conditional generative adversarial network of claim 1, wherein in step 2 the conditional generative adversarial network has the following structure:

the network comprises a generator network and a discriminator network;

4. The infrared road image water body detection method based on a conditional generative adversarial network of claim 1, wherein in step 2 the preprocessing function of the reflection attention unit specifically comprises the following operations:

let h, w and c be the height, width and number of channels of the input feature map I of the preprocessing function of the reflection attention unit; the preprocessing function first uses mean pooling to reduce the height of the input feature map to n and the width to w/2, keeping the number of channels unchanged, and the resulting feature map is denoted X; each row of X is then split out, and each of the n rows is upsampled to a new feature map of height h, width w and c channels; these n new feature maps are concatenated along the channel dimension to give a new feature map with n × c channels, denoted X'; the input feature map I is then replicated n times along the channel dimension, which is equivalent to concatenating n copies of I along the channel dimension, giving a new feature map with n × c channels, denoted I'; the difference of X' and I', i.e. the subtraction of elements at corresponding positions, gives a new feature map D; finally, I and D are concatenated along the channel dimension to give a feature map with (n + 1) × c channels, which is the output of the preprocessing function.

5. The infrared road image water body detection method based on a conditional generative adversarial network of claim 4, wherein n is 8 or 16.

6. The infrared road image water body detection method based on a conditional generative adversarial network of claim 1, wherein in step 3 the conditional generative adversarial network is trained as follows:

b) inputting a real image into the generator to obtain a generated mask;

7. The infrared road image water body detection method based on a conditional generative adversarial network of claim 1, wherein in step 4 obtaining the binary image representing the water body detection result with the trained conditional generative adversarial network comprises: