CN111008979A - Robust night image semantic segmentation method - Google Patents
- Publication number
- CN111008979A (application CN201911250296.1A)
- Authority
- CN
- China
- Prior art keywords
- semantic segmentation
- night
- data set
- image
- images
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10004—Still image; Photographic image
Abstract
The invention discloses a robustness-enhancing method for night-time semantic segmentation. A generative adversarial network is trained to convert part of the images in a street-view dataset captured under normal illumination, with semantic segmentation labels, into artificial night street-view images; the resulting dataset, containing a proportion of night images, is then used to train a semantic segmentation network model that is robust when predicting semantic segmentation on night images. The method offers high real-time performance and low cost, and does not require annotating a large night-time dataset.
Description
Technical Field
The invention belongs to the technical fields of pattern recognition technology, image processing technology, computer vision technology and deep learning, and relates to a robust night image semantic segmentation method.
Background
Autonomous driving occupies an important position in the intelligent transportation industry, so image semantic segmentation has gradually become a research hotspot in computer vision: semantic segmentation provides pixel-level classification and labeling of traffic scenes. Thanks to the strong feature-representation capability of deep convolutional neural networks, semantic segmentation methods based on them have improved greatly.
At present, most semantic segmentation datasets for road scenes are collected in clear weather. Segmentation models trained on these datasets perform well under normal illumination, but on night-time road-scene images, where illumination is poor and stray light abundant, the extracted features differ greatly from those extracted under normal illumination; the accuracy of these methods drops sharply and cannot meet the requirements of autonomous driving. To solve this problem, the robustness of semantic segmentation at night must be improved.
Disclosure of Invention
The invention aims to: in order to solve the problem that existing semantic segmentation techniques have low accuracy on night images, the invention provides a robust night image semantic segmentation method based on a generative adversarial network.
The purpose of the invention is achieved by the following technical scheme: part of the daytime images in a dataset containing semantic segmentation labels are converted into artificial night images by a generative adversarial network model, yielding a dataset with a certain proportion of artificial night images; a semantic segmentation neural network is trained on this data; the trained model then gives more accurate object-class predictions on actually acquired night images. Specifically, the method comprises the following steps:
Step 1: acquiring a dataset for training the generative adversarial network model, the dataset containing equal numbers of night road-scene images and day road-scene images;
Step 2: constructing the generative adversarial network model, comprising a pair of generators and discriminators;
Step 3: inputting the dataset obtained in step 1 into the generative adversarial network for training, obtaining two generators: one converting night images into day images and one converting day images into night images;
Step 4: acquiring a dataset containing semantic segmentation labels for training the semantic segmentation network model;
Step 5: using the day-to-night generator obtained in step 3 to convert part of the daytime images in the dataset containing semantic segmentation labels into artificial night images, obtaining a dataset containing artificial night images;
Step 6: inputting the dataset containing artificial night images obtained in step 5 into the semantic segmentation network model for training, obtaining a robust night image semantic segmentation model;
Step 7: inputting actually acquired night images into the semantic segmentation model obtained in step 6 to realize robust night image semantic segmentation.
Further, the semantic segmentation network model is ERF-PSPNet, composed of an encoder and a decoder, wherein the encoder is a residual factorized convolutional network containing the factorized convolution layer Non-bottleneck-1D, which reduces computation while preserving accuracy, and the decoder is a spatial pyramid pooling network. Each layer of the network is shown in the following table:
layer(s) | Module | Number of output channels | Output resolution |
1 | Down sampling module | 3 | 320×240 |
2 | Down sampling module | 16 | 160×120 |
3-7 | 5×Non-bt-1D | 64 | 80×60 |
8 | Down sampling module | 128 | 40×30 |
9 | Non-bt-1D(dilated 2) | 128 | 40×30 |
10 | Non-bt-1D(dilated 4) | 128 | 40×30 |
11 | Non-bt-1D(dilated 8) | 128 | 40×30 |
12 | Non-bt-1D(dilated 16) | 128 | 40×30 |
13 | Non-bt-1D(dilated 2) | 128 | 40×30 |
14 | Non-bt-1D(dilated 4) | 128 | 40×30 |
15 | Non-bt-1D(dilated 8) | 128 | 40×30 |
16 | Non-bt-1D(dilated 16) | 128 | 40×30 |
17 | Non-bt-1D(dilated 2) | 128 | 40×30 |
18a | Layer-17 feature map | 128 | 40×30 |
18b | Pooling, convolution | 32 | 40×30 |
18c | Pooling, convolution | 32 | 20×15 |
18d | Pooling, convolution | 32 | 10×8 |
18e | Pooling, convolution | 32 | 5×4 |
19 | Convolution | Number of categories | 40×30 |
20 | Upsampling | Number of categories | 640×480 |
The ERF-PSPNet classifies the RGB input image pixel by pixel, producing a corresponding label map.
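As an illustration, the factorized residual block named in the table can be sketched in PyTorch roughly as below. The exact layer ordering and normalization placement follow the public ERFNet design and are assumptions here, not details stated in this patent:

```python
import torch
import torch.nn as nn

class NonBottleneck1D(nn.Module):
    """Sketch of the Non-bottleneck-1D block: each 3x3 convolution is
    factorized into a 3x1 and a 1x3 convolution to cut computation,
    with a residual (skip) connection around the whole block."""

    def __init__(self, channels, dilation=1):
        super().__init__()
        d = dilation
        self.conv3x1_1 = nn.Conv2d(channels, channels, (3, 1), padding=(1, 0))
        self.conv1x3_1 = nn.Conv2d(channels, channels, (1, 3), padding=(0, 1))
        self.bn1 = nn.BatchNorm2d(channels)
        # Second pair carries the dilation (e.g. "dilated 2" in the table).
        self.conv3x1_2 = nn.Conv2d(channels, channels, (3, 1),
                                   padding=(d, 0), dilation=(d, 1))
        self.conv1x3_2 = nn.Conv2d(channels, channels, (1, 3),
                                   padding=(0, d), dilation=(1, d))
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.conv3x1_1(x))
        out = self.relu(self.bn1(self.conv1x3_1(out)))
        out = self.relu(self.conv3x1_2(out))
        out = self.bn2(self.conv1x3_2(out))
        return self.relu(out + x)   # residual connection

# Matches layer 9 of the table: 128 channels at 40x30 resolution.
block = NonBottleneck1D(128, dilation=2)
y = block(torch.randn(1, 128, 30, 40))
```

The factorization replaces one 3×3 convolution (9 weights per channel pair) with a 3×1 plus a 1×3 convolution (6 weights), which is the source of the stated computation savings.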
Further, the dataset used to train the generative adversarial network model is an autonomous driving dataset, such as Cityscapes or BDD.
Further, the generative adversarial network model is CycleGAN.
Further, the CycleGAN training process is as follows:
the night road scene image and the day road scene image are respectively input into two generators of the cycleGAN for training, wherein 200 epochs are used, the learning rate is set to be 0.0002, and the random cutting size is set to be 256 multiplied by 256.
Further, in step 5, the proportion of artificial night images in the dataset containing artificial night images is 30%.
Further, in step 6, the loss function adopted by the semantic segmentation model is focal loss, and the formula is as follows:
loss(p) = -(1 - p)^γ log(p)
where p is the predicted probability that the pixel belongs to a certain class and γ is a modulation factor; γ is set to 2 in the invention.
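As an illustration, the per-pixel focal loss can be written directly from the formula above (a minimal sketch, not the patent's implementation):

```python
import math

def focal_loss(p, gamma=2.0):
    """Focal loss for one pixel: loss(p) = -(1 - p)**gamma * log(p),
    where p is the predicted probability of the true class."""
    return -((1.0 - p) ** gamma) * math.log(p)

easy = focal_loss(0.9)   # well-classified pixel: loss heavily down-weighted
hard = focal_loss(0.1)   # badly classified pixel: loss stays near full size
```

The (1 - p)^γ factor shrinks the loss of already well-classified pixels, so training focuses on the hard pixels, which is useful under the class imbalance typical of street scenes.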
Compared with other methods for enhancing semantic segmentation robustness, the method has the following advantages:
a large number of label data sets are not needed, and a large number of manpower and material resources can be saved. An artificial night data set is generated only by generating a confrontation network, and a semantic segmentation network is input for training by mixing an artificial night image and a daytime image, so that the robustness of the confrontation network is improved;
the real-time performance is high. As the trained model does not need to be additionally operated and processed in the reasoning stage, and the extra operation amount is not increased, the semantic segmentation model keeps the original real-time performance and supports the high-real-time night road information prediction.
Low cost. Because the method works entirely at the algorithm level, no additional sensors such as infrared cameras or radar are needed; compared with other night environment perception methods, no extra hardware cost is incurred.
High prediction accuracy. The semantic segmentation network model trained with this method predicts night street-view images more accurately than comparable methods while still running in real time.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a diagram of the Non-bottleneck-1D module;
FIG. 3 is a diagram of a semantic segmentation network ERF-PSPNet model;
FIG. 4 is a diagram of the ResnetBlock model in the generative adversarial network;
FIG. 5 is a diagram of a night image actually acquired;
FIG. 6 is the semantic segmentation prediction of a network trained without the proposed method;
FIG. 7 is the semantic segmentation prediction of a network trained with the proposed method;
FIG. 8 is a semantic segmentation truth label diagram;
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to specific embodiments and accompanying drawings.
The invention relates to a method for enhancing the robustness of night image semantic segmentation. Its core is to use a generative adversarial network to preprocess the dataset used for semantic segmentation training. The scheme framework is shown in FIG. 1, and the specific implementation steps are as follows:
Step 1: acquire a dataset for training the generative adversarial network. The dataset must contain a certain number of night images; an autonomous driving dataset such as Cityscapes or BDD may be adopted, from which equal numbers of night road-scene images and day road-scene images are selected to form the training set for the generative adversarial network;
Step 2: construct an unpaired generative adversarial network model comprising a pair of generators and discriminators;
Step 3: input the dataset obtained in step 1 into the generative adversarial network for training, obtaining two generators: one converting night images into day images and one converting day images into night images. In this embodiment, the generative adversarial network model adopted is CycleGAN; the generator structure is as follows:
layer(s) | Module | Number of output channels |
1 | 7 x 7 convolutional layer | 64 |
2 | ReLU activation function | 64 |
3 | 3 x 3 convolutional layer | 128 |
4 | BatchNorm layer | 128 |
5 | ReLU activation function | 128 |
6 | 3 x 3 convolutional layer | 256 |
7 | BatchNorm layer | 256 |
8 | ReLU activation function | 256 |
9~17 | 9×ResnetBlock | 256 |
18 | 3 x 3 deconvolution layer | 128 |
19 | BatchNorm layer | 128 |
20 | ReLU activation function | 128 |
21 | 3 x 3 deconvolution layer | 64 |
22 | BatchNorm layer | 64 |
23 | ReLU activation function | 64 |
24 | 7 x 7 convolutional layer | 3 |
25 | Tanh activation function | 3 |
The ResnetBlock structure is shown in FIG. 4.
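A hedged PyTorch sketch of such a ResnetBlock, following the reference CycleGAN implementation: the reflection padding is an assumption (the patent only shows the block in FIG. 4), while the BatchNorm layers follow the entries in the generator table above.

```python
import torch
import torch.nn as nn

class ResnetBlock(nn.Module):
    """Residual block of the CycleGAN generator: two 3x3 convolutions
    with normalization, plus a skip connection around them."""

    def __init__(self, channels=256):
        super().__init__()
        self.block = nn.Sequential(
            nn.ReflectionPad2d(1),                  # assumed padding style
            nn.Conv2d(channels, channels, kernel_size=3),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.ReflectionPad2d(1),
            nn.Conv2d(channels, channels, kernel_size=3),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return x + self.block(x)   # residual connection

# Layers 9-17 of the table stack nine such blocks at 256 channels.
y = ResnetBlock(256)(torch.randn(1, 256, 16, 16))
```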
During CycleGAN training, night road-scene images and day road-scene images are respectively input into the two generators of CycleGAN, using 200 epochs, a learning rate of 0.0002, and a random crop size of 256 × 256. Two generators are finally obtained: one converting night images into day images and one converting day images into night images;
Step 4: acquire a dataset containing semantic segmentation labels for training the semantic segmentation network model;
Step 5: using the day-to-night generator from the trained generative adversarial network obtained in step 3, convert part of the daytime images in the dataset provided for the semantic segmentation network model into artificial night images, obtaining a dataset containing artificial night images. Tests show that the semantic segmentation result is closest to the ground truth when artificial night images make up 30% of this dataset; 30% is the proportion adopted in this embodiment;
Step 6: the semantic segmentation network model may be any real-time semantic segmentation network, such as SegNet (Badrinarayanan, V., Kendall, A., and Cipolla, R., "SegNet: A deep convolutional encoder-decoder architecture for image segmentation," IEEE Transactions on Pattern Analysis and Machine Intelligence 39(12), 2481-2495, 2017), ERFNet (Romera, E., Alvarez, J. M., Bergasa, L. M., and Arroyo, R., "ERFNet: Efficient residual factorized ConvNet for real-time semantic segmentation," IEEE Transactions on Intelligent Transportation Systems 19(1), 263-272, 2018), or the network of Oršić et al. (Oršić, M., Krešo, I., Bevandić, P., et al., "In defense of pre-trained ImageNet architectures for real-time semantic segmentation of road-driving images," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019: 12607-12616). In this embodiment, ERF-PSPNet is adopted; the model consists of an encoder and a decoder, as shown in FIG. 3, where the encoder is a residual factorized convolutional network containing the factorized convolution layer Non-bottleneck-1D and the decoder is a spatial pyramid pooling network. Each layer of the ERF-PSPNet semantic segmentation network model is shown in the following table:
layer(s) | Module | Number of output channels | Output resolution |
1 | Down sampling module | 3 | 320×240 |
2 | Down sampling module | 16 | 160×120 |
3-7 | 5×Non-bt-1D | 64 | 80×60 |
8 | Down sampling module | 128 | 40×30 |
9 | Non-bt-1D(dilated 2) | 128 | 40×30 |
10 | Non-bt-1D(dilated 4) | 128 | 40×30 |
11 | Non-bt-1D(dilated 8) | 128 | 40×30 |
12 | Non-bt-1D(dilated 16) | 128 | 40×30 |
13 | Non-bt-1D(dilated 2) | 128 | 40×30 |
14 | Non-bt-1D(dilated 4) | 128 | 40×30 |
15 | Non-bt-1D(dilated 8) | 128 | 40×30 |
16 | Non-bt-1D(dilated 16) | 128 | 40×30 |
17 | Non-bt-1D(dilated 2) | 128 | 40×30 |
18a | Layer-17 feature map | 128 | 40×30 |
18b | Pooling, convolution | 32 | 40×30 |
18c | Pooling, convolution | 32 | 20×15 |
18d | Pooling, convolution | 32 | 10×8 |
18e | Pooling, convolution | 32 | 5×4 |
19 | Convolution | Number of categories | 40×30 |
20 | Upsampling | Number of categories | 640×480 |
The loss function used is focal loss, which is formulated as follows:
loss(p) = -(1 - p)^γ log(p)
where p is the predicted probability that the pixel belongs to a certain class and γ is a modulation factor; γ is set to 2 in this embodiment.
Step 7: input the actually acquired night images into the semantic segmentation model trained in step 6 for classification prediction; the ERF-PSPNet classifies the RGB input image pixel by pixel, generating a corresponding label map as the classification prediction result.
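A minimal sketch of this pixel-wise prediction step, with random logits standing in for the network output; the class count of 19 is illustrative (e.g. the Cityscapes label set), not stated in the patent:

```python
import torch

# The segmentation network outputs one score map per class (N x C x H x W);
# taking the argmax over the class dimension yields the label map in which
# every pixel carries one class id.
num_classes = 19                                  # illustrative class count
logits = torch.randn(1, num_classes, 480, 640)    # stand-in for network output
label_map = logits.argmax(dim=1)                  # N x H x W label map
```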
FIG. 5 is an actually acquired night image; FIG. 8 shows its ground-truth classification; FIG. 6 shows the classification prediction of a model trained without the method of the present invention; FIG. 7 shows the classification prediction of a model trained with the method of the present invention.
Claims (7)
1. A robust night image semantic segmentation method, characterized by comprising: converting part of the daytime images in a dataset containing semantic segmentation labels into artificial night images through a generative adversarial network model, generating a dataset containing artificial night images and using it to train a semantic segmentation neural network model; and inputting actually acquired night images into the trained semantic segmentation neural network model to obtain night image semantic segmentation prediction results. Specifically, the method comprises the following steps:
Step 1: acquiring a dataset for training the generative adversarial network model, the dataset containing equal numbers of night road-scene images and day road-scene images;
Step 2: constructing the generative adversarial network model, comprising a pair of generators and discriminators;
Step 3: inputting the dataset obtained in step 1 into the generative adversarial network for training, obtaining two generators: one converting night images into day images and one converting day images into night images;
Step 4: acquiring a dataset containing semantic segmentation labels for training the semantic segmentation network model;
Step 5: using the day-to-night generator obtained in step 3 to convert part of the daytime images in the dataset containing semantic segmentation labels into artificial night images, obtaining a dataset containing artificial night images;
Step 6: inputting the dataset containing artificial night images obtained in step 5 into the semantic segmentation network model for training, obtaining a robust night image semantic segmentation model;
Step 7: inputting actually acquired night images into the semantic segmentation model obtained in step 6 to realize robust night image semantic segmentation.
2. The method of claim 1, wherein the semantic segmentation network model is ERF-PSPNet, composed of an encoder and a decoder, wherein the encoder is a residual factorized convolutional network containing the factorized convolution layer Non-bottleneck-1D, and the decoder is a spatial pyramid pooling network.
3. The method of claim 1, wherein the dataset used to train the generative adversarial network model is an autonomous driving dataset, such as Cityscapes or BDD.
4. The method of claim 1, wherein the generative adversarial network model is CycleGAN.
5. The method of claim 4, wherein the CycleGAN training process is as follows:
the night road scene image and the day road scene image are respectively input into two generators of the cycleGAN for training, wherein 200 epochs are used, the learning rate is set to be 0.0002, and the random cutting size is set to be 256 multiplied by 256.
6. The method of claim 1, wherein in step 5 the proportion of artificial night images in the dataset containing artificial night images is 30%.
7. The method of claim 1, wherein in step 6 the loss function adopted by the semantic segmentation model is the focal loss, with the following formula:
loss(p) = -(1 - p)^γ log(p)
where p is the probability of determining that the pixel is of a certain class, and γ is the modulation factor.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911250296.1A CN111008979A (en) | 2019-12-09 | 2019-12-09 | Robust night image semantic segmentation method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111008979A true CN111008979A (en) | 2020-04-14 |
Family
ID=70114053
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911250296.1A Withdrawn CN111008979A (en) | 2019-12-09 | 2019-12-09 | Robust night image semantic segmentation method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111008979A (en) |
- 2019-12-09: application CN201911250296.1A filed; published as CN111008979A; status: withdrawn
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109670409A (en) * | 2018-11-28 | 2019-04-23 | Zhejiang University | A scene representation system and method based on semantic stixels |
CN110188817A (en) * | 2019-05-28 | 2019-08-30 | Xiamen University | A real-time high-performance street-view image semantic segmentation method based on deep learning |
Non-Patent Citations (2)
Title |
---|
Kailun Yang et al.: "Unifying terrain awareness through real-time semantic segmentation" *
Lei Sun et al.: "See Clearer at Night: Towards Robust Nighttime Semantic Segmentation through Day-Night Image Conversion" *
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111504331A (en) * | 2020-04-29 | 2020-08-07 | 杭州环峻科技有限公司 | Method and device for positioning panoramic intelligent vehicle from coarse to fine |
CN112287938A (en) * | 2020-10-29 | 2021-01-29 | 苏州浪潮智能科技有限公司 | Text segmentation method, system, device and medium |
CN112287938B (en) * | 2020-10-29 | 2022-12-06 | 苏州浪潮智能科技有限公司 | Text segmentation method, system, device and medium |
CN112756742A (en) * | 2021-01-08 | 2021-05-07 | 南京理工大学 | Laser vision weld joint tracking system based on ERFNet network |
CN113537228A (en) * | 2021-07-07 | 2021-10-22 | 中国电子科技集团公司第五十四研究所 | Real-time image semantic segmentation method based on depth features |
CN113537228B (en) * | 2021-07-07 | 2022-10-21 | 中国电子科技集团公司第五十四研究所 | Real-time image semantic segmentation method based on depth features |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| WW01 | Invention patent application withdrawn after publication | Application publication date: 20200414 |