CN111008979A - Robust night image semantic segmentation method - Google Patents

Robust night image semantic segmentation method

Info

Publication number
CN111008979A
CN111008979A (application CN201911250296.1A)
Authority
CN
China
Prior art keywords
semantic segmentation
night
data set
image
images
Prior art date
Legal status
Withdrawn
Application number
CN201911250296.1A
Other languages
Chinese (zh)
Inventor
孙磊 (Sun Lei)
杨恺伦 (Yang Kailun)
李华兵 (Li Huabing)
汪凯巍 (Wang Kaiwei)
Current Assignee
Hangzhou Lingxiang Technology Co ltd
Original Assignee
Hangzhou Lingxiang Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Hangzhou Lingxiang Technology Co., Ltd.
Priority to CN201911250296.1A
Publication of CN111008979A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10004 Still image; Photographic image

Abstract

The invention discloses a method for enhancing the robustness of night semantic segmentation. A generative adversarial network is trained to convert part of the images in a semantically labeled street scene data set captured under normal illumination into artificial night street scene images; the resulting data set, containing a proportion of night images, is then used to train a semantic segmentation network model that is highly robust when predicting semantic segmentation on night images. The method offers high real-time performance and low cost, and requires no large annotated night data set.

Description

Robust night image semantic segmentation method
Technical Field
The invention belongs to the technical fields of pattern recognition, image processing, computer vision, and deep learning, and relates to a robust night image semantic segmentation method.
Background
Automatic driving occupies an important position in the intelligent transportation industry, and image semantic segmentation, which classifies and labels traffic scenes at the pixel level, has therefore gradually become a research hotspot in computer vision. Owing to the strong feature representation capability of deep convolutional neural networks, semantic segmentation methods based on them have improved greatly.
At present, most semantic segmentation data sets for road scenes are collected in clear weather, and models trained on them perform well under normal illumination. When applied to road scene images at night, however, poor illumination and abundant stray light make the extracted features differ greatly from those extracted under normal illumination; the accuracy of these methods drops sharply and cannot meet the requirements of automatic driving. To solve this problem, the robustness of semantic segmentation at night must be improved.
Disclosure of Invention
The invention aims to solve the problem that existing semantic segmentation techniques achieve low accuracy on night images. To this end, the invention provides a robust night image semantic segmentation method based on a generative adversarial network.
The purpose of the invention is achieved by the following technical scheme: part of the daytime images in a data set containing semantic segmentation labels are converted into artificial night images by a generative adversarial network model, producing a data set with a certain proportion of artificial night images; a semantic segmentation neural network is trained on this data; the trained semantic segmentation model then yields accurate object class predictions on actually acquired night images. Specifically, the method comprises the following steps:
Step 1: acquire a data set for training the generative adversarial network model, containing equal numbers of night road scene images and day road scene images;
Step 2: construct the generative adversarial network model, comprising a pair of generators and a pair of discriminators;
Step 3: input the data set obtained in step 1 into the generative adversarial network for training, obtaining two generators, one converting night images into day images and one converting day images into night images;
Step 4: acquire a data set containing semantic segmentation labels for training the semantic segmentation network model;
Step 5: using the day-to-night generator obtained in step 3, convert part of the daytime images in the labeled data set into artificial night images, obtaining a data set containing artificial night images;
Step 6: input the data set containing artificial night images obtained in step 5 into the semantic segmentation network model for training, obtaining a robust night image semantic segmentation model;
Step 7: input actually acquired night images into the model obtained in step 6 to realize robust night image semantic segmentation (a sketch of the full pipeline follows).
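The following minimal Python sketch ties the seven steps together; every function here is an illustrative stub standing in for a component described above (CycleGAN training, day-to-night translation, segmentation training), not an API defined by the invention:

```python
import random

def train_cyclegan(day_images, night_images):
    # Steps 1-3 (stub): would train CycleGAN and return the two generators
    # (night-to-day, day-to-night); identity functions stand in here.
    return (lambda x: x), (lambda x: x)

def train_segmentation(mixed_dataset):
    # Step 6 (stub): would train the segmentation network on the mixed data;
    # returns a per-pixel classifier placeholder.
    return lambda image: image

def night_segmentation_pipeline(gan_day, gan_night, labeled_day, labels, real_night):
    _, day2night = train_cyclegan(gan_day, gan_night)
    # Steps 4-5: convert ~30% of the labeled daytime images to artificial
    # night; the semantic labels are reused unchanged.
    mixed = [((day2night(img) if random.random() < 0.30 else img), lab)
             for img, lab in zip(labeled_day, labels)]
    model = train_segmentation(mixed)            # step 6
    return [model(img) for img in real_night]    # step 7
```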
Further, the semantic segmentation network model is the ERF-PSPNet model, composed of an encoder and a decoder. The encoder is a residual factorized convolutional network built from factorized convolution layers (Non-bottleneck-1D), which reduce computation while preserving accuracy; the decoder is a spatial pyramid pooling network (a code sketch of the factorized block is given after the table below). The layers of the network are listed in the following table:
Layer | Module | Output channels | Output resolution
1 | Downsampling module | 3 | 320×240
2 | Downsampling module | 16 | 160×120
3-7 | 5× Non-bt-1D | 64 | 80×60
8 | Downsampling module | 128 | 40×30
9 | Non-bt-1D (dilated 2) | 128 | 40×30
10 | Non-bt-1D (dilated 4) | 128 | 40×30
11 | Non-bt-1D (dilated 8) | 128 | 40×30
12 | Non-bt-1D (dilated 16) | 128 | 40×30
13 | Non-bt-1D (dilated 2) | 128 | 40×30
14 | Non-bt-1D (dilated 4) | 128 | 40×30
15 | Non-bt-1D (dilated 8) | 128 | 40×30
16 | Non-bt-1D (dilated 16) | 128 | 40×30
17 | Non-bt-1D (dilated 2) | 128 | 40×30
18a | Layer-17 feature map | 128 | 40×30
18b | Pooling, convolution | 32 | 40×30
18c | Pooling, convolution | 32 | 20×15
18d | Pooling, convolution | 32 | 10×8
18e | Pooling, convolution | 32 | 5×4
19 | Convolution | Number of classes | 40×30
20 | Upsampling | Number of classes | 640×480
The ERF-PSPNet may classify the RGB input image pixel by pixel, producing a corresponding label map.
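The factorized residual block can be sketched as follows in PyTorch (an illustrative implementation following the ERFNet design of Non-bottleneck-1D; the dropout rate and exact layer ordering are assumptions not fixed by the table above):

```python
import torch
import torch.nn as nn

class NonBottleneck1D(nn.Module):
    """Factorized residual block: each 3x3 convolution is split into a 3x1
    and a 1x3 convolution, reducing computation while keeping accuracy."""
    def __init__(self, channels, dilation=1, dropout=0.3):
        super().__init__()
        d = dilation
        self.conv3x1_1 = nn.Conv2d(channels, channels, (3, 1), padding=(1, 0))
        self.conv1x3_1 = nn.Conv2d(channels, channels, (1, 3), padding=(0, 1))
        self.bn1 = nn.BatchNorm2d(channels)
        # The second pair of factorized convolutions carries the dilation.
        self.conv3x1_2 = nn.Conv2d(channels, channels, (3, 1),
                                   padding=(d, 0), dilation=(d, 1))
        self.conv1x3_2 = nn.Conv2d(channels, channels, (1, 3),
                                   padding=(0, d), dilation=(1, d))
        self.bn2 = nn.BatchNorm2d(channels)
        self.drop = nn.Dropout2d(dropout)

    def forward(self, x):
        out = torch.relu(self.conv3x1_1(x))
        out = torch.relu(self.bn1(self.conv1x3_1(out)))
        out = torch.relu(self.conv3x1_2(out))
        out = self.drop(self.bn2(self.conv1x3_2(out)))
        return torch.relu(out + x)  # residual connection
```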
Further, the data set used to train the generative adversarial network model is an autonomous driving data set such as Cityscapes or BDD.
Further, the generative adversarial network model is CycleGAN.
Further, the CycleGAN training process is as follows:
The night road scene images and the day road scene images are input into the two generators of CycleGAN for training over 200 epochs, with the learning rate set to 0.0002 and the random crop size set to 256×256.
Further, in step 5, the proportion of artificial night images in the data set containing artificial night images is 30%.
Further, in step 6, the loss function adopted by the semantic segmentation model is the focal loss:
loss(p) = -(1-p)^γ · log(p)
where p is the predicted probability that a pixel belongs to a given class and γ is a modulation factor; γ is set to 2 in the present invention.
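As a concrete reference, the focal loss above can be implemented as follows (a sketch assuming PyTorch, logits of shape (N, C, H, W), and integer class targets of shape (N, H, W)):

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, target, gamma=2.0):
    """loss(p) = -(1 - p)^gamma * log(p), averaged over all pixels, where p
    is the predicted probability of each pixel's true class."""
    log_p = F.log_softmax(logits, dim=1)                      # per-class log-probabilities
    log_pt = log_p.gather(1, target.unsqueeze(1)).squeeze(1)  # log p of the true class
    pt = log_pt.exp()
    return (-(1.0 - pt) ** gamma * log_pt).mean()             # down-weights easy pixels
```

Relative to plain cross-entropy, the (1-p)^γ factor suppresses the loss of pixels the network already classifies confidently, focusing training on hard pixels.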
Compared with other methods for enhancing semantic segmentation robustness, the method has the following advantages:
No large labeled night data set is needed, saving substantial manpower and material resources. An artificial night data set is generated purely with a generative adversarial network, and the mixture of artificial night images and daytime images is fed to the semantic segmentation network for training, improving its robustness;
High real-time performance. Since the trained model needs no extra processing at inference time and incurs no additional computation, the segmentation model retains its original speed and supports highly real-time prediction of road information at night.
Low cost. The method operates purely at the algorithm level, so no additional sensors such as infrared cameras or radar are needed; compared with other night environment perception methods, it adds no hardware cost.
High prediction accuracy. The semantic segmentation network model trained with this method predicts night street scene information more accurately than other comparable methods while still running in real time.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a diagram of the Non-bottleneck-1D module;
FIG. 3 is a diagram of a semantic segmentation network ERF-PSPNet model;
FIG. 4 is a diagram of a ResnetBlock model in a generation countermeasure network;
FIG. 5 is a diagram of a night image actually acquired;
FIG. 6 is a graph of semantic segmentation network prediction without the proposed method;
FIG. 7 is a graph of a semantic segmentation prediction using the proposed method;
FIG. 8 is a semantic segmentation truth label diagram;
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to specific embodiments and accompanying drawings.
The method enhances the robustness of night image semantic segmentation; its core is to preprocess the data set used for semantic segmentation training with a generative adversarial network. The overall scheme is shown in FIG. 1, and the specific implementation steps are as follows:
Step 1: acquire a data set for training the generative adversarial network. The data set must contain a sufficient number of night images; an autonomous driving data set such as Cityscapes or BDD may be used, from which equal numbers of night road scene images and day road scene images are selected to form the training set for the generative adversarial network;
Step 2: construct an unpaired generative adversarial network model comprising a pair of generators and a pair of discriminators;
Step 3: input the data set obtained in step 1 into the generative adversarial network for training, obtaining two generators, one converting night images into day images and one converting day images into night images. In this embodiment, the adopted generative adversarial network model is CycleGAN; specifically, the generator structure is as follows:
Layer | Module | Output channels
1 | 7×7 convolutional layer | 64
2 | ReLU activation function | 64
3 | 3×3 convolutional layer | 128
4 | BatchNorm layer | 128
5 | ReLU activation function | 128
6 | 3×3 convolutional layer | 256
7 | BatchNorm layer | 256
8 | ReLU activation function | 256
9-17 | 9× ResnetBlock | 256
18 | 3×3 deconvolution layer | 128
19 | BatchNorm layer | 128
20 | ReLU activation function | 128
21 | 3×3 deconvolution layer | 64
22 | BatchNorm layer | 64
23 | ReLU activation function | 64
24 | 7×7 convolutional layer | 3
25 | Tanh activation function | 3
The ResnetBlock structure is shown in FIG. 4; a code sketch follows.
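A sketch of such a residual block in PyTorch is given below; the 3×3 kernels and BatchNorm follow the table above, while reflection padding is borrowed from common CycleGAN implementations and is an assumption:

```python
import torch.nn as nn

class ResnetBlock(nn.Module):
    """Residual block used in layers 9-17 of the generator: two 3x3
    convolutions with BatchNorm, a ReLU between them, and an identity
    skip connection."""
    def __init__(self, channels=256):
        super().__init__()
        self.block = nn.Sequential(
            nn.ReflectionPad2d(1),
            nn.Conv2d(channels, channels, kernel_size=3),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.ReflectionPad2d(1),
            nn.Conv2d(channels, channels, kernel_size=3),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return x + self.block(x)  # identity skip connection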
During CycleGAN training, the night road scene images and day road scene images are input into the two generators of CycleGAN for training over 200 epochs, with the learning rate set to 0.0002 and the random crop size set to 256×256. Two generators are finally obtained, one converting night images into day images and one converting day images into night images;
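The stated recipe translates to roughly the following setup (a hedged sketch: the Adam optimizer and its betas follow the original CycleGAN paper and are assumptions; the placeholder 1×1 convolutions stand in for the real generators of the table above):

```python
import torch
import torch.nn as nn
from torchvision import transforms

EPOCHS = 200                         # as specified
preprocess = transforms.Compose([
    transforms.RandomCrop(256),      # random 256x256 crops, as specified
    transforms.ToTensor(),
])

# Placeholders standing in for the real generators of the table above.
G_day2night = nn.Conv2d(3, 3, 1)
G_night2day = nn.Conv2d(3, 3, 1)

optimizer_G = torch.optim.Adam(
    list(G_day2night.parameters()) + list(G_night2day.parameters()),
    lr=2e-4,                         # learning rate 0.0002, as specified
    betas=(0.5, 0.999))              # CycleGAN-paper defaults (assumption)
```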
Step 4: acquire a data set containing semantic segmentation labels for training the semantic segmentation network model;
Step 5: using the day-to-night generator from the trained generative adversarial network of step 3, convert part of the daytime images in the labeled data set into artificial night images, obtaining a data set containing artificial night images. Experiments show that the semantic segmentation results are closest to the ground truth when artificial night images make up 30% of this data set, so a proportion of 30% is adopted in this embodiment (see the sketch below);
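Step 5 can be sketched as follows (illustrative names: `g_day2night` is the trained day-to-night generator, and images are assumed to be (3, H, W) tensors):

```python
import random
import torch

@torch.no_grad()
def mix_in_artificial_night(day_images, labels, g_day2night, ratio=0.30):
    """Translate a random `ratio` of labeled daytime images into artificial
    night images with the trained day-to-night generator; the pixel-level
    labels are reused unchanged because the translation preserves scene
    layout."""
    mixed = []
    for img, lab in zip(day_images, labels):
        if random.random() < ratio:
            img = g_day2night(img.unsqueeze(0)).squeeze(0)  # day -> night
        mixed.append((img, lab))
    return mixed
```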
Step 6: train the semantic segmentation network model. The model may be any real-time semantic segmentation network, such as SegNet (see Badrinarayanan, V., Kendall, A., and Cipolla, R., "SegNet: A deep convolutional encoder-decoder architecture for image segmentation," IEEE Transactions on Pattern Analysis and Machine Intelligence 39(12), 2481-2495 (2017)), ERFNet (see Romera, E., Alvarez, J. M., Bergasa, L. M., and Arroyo, R., "ERFNet: Efficient residual factorized ConvNet for real-time semantic segmentation," IEEE Transactions on Intelligent Transportation Systems, 2018), or SwiftNet (see Krešo, I., Bevandić, P., et al., "In defense of pre-trained ImageNet architectures for real-time semantic segmentation of road-driving images," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019: 12607-12616). In this embodiment, ERF-PSPNet is adopted. The model is composed of an encoder and a decoder, as shown in FIG. 3; the encoder is a residual factorized convolutional network containing factorized convolution layers (Non-bottleneck-1D), and the decoder is a spatial pyramid pooling network. The layers of the ERF-PSPNet semantic segmentation network model are shown in the following table (a sketch of the pyramid pooling decoder follows the table):
Layer | Module | Output channels | Output resolution
1 | Downsampling module | 3 | 320×240
2 | Downsampling module | 16 | 160×120
3-7 | 5× Non-bt-1D | 64 | 80×60
8 | Downsampling module | 128 | 40×30
9 | Non-bt-1D (dilated 2) | 128 | 40×30
10 | Non-bt-1D (dilated 4) | 128 | 40×30
11 | Non-bt-1D (dilated 8) | 128 | 40×30
12 | Non-bt-1D (dilated 16) | 128 | 40×30
13 | Non-bt-1D (dilated 2) | 128 | 40×30
14 | Non-bt-1D (dilated 4) | 128 | 40×30
15 | Non-bt-1D (dilated 8) | 128 | 40×30
16 | Non-bt-1D (dilated 16) | 128 | 40×30
17 | Non-bt-1D (dilated 2) | 128 | 40×30
18a | Layer-17 feature map | 128 | 40×30
18b | Pooling, convolution | 32 | 40×30
18c | Pooling, convolution | 32 | 20×15
18d | Pooling, convolution | 32 | 10×8
18e | Pooling, convolution | 32 | 5×4
19 | Convolution | Number of classes | 40×30
20 | Upsampling | Number of classes | 640×480
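Rows 18a-20 of the table form the pyramid pooling decoder, sketched below in PyTorch. Resolutions in the table are width×height; upsampling each pooled branch back to the input size before concatenation follows the PSPNet design and is an assumption, as is `num_classes=19` (e.g., for Cityscapes):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidPoolingDecoder(nn.Module):
    """Rows 18a-20: the layer-17 feature map (128 channels, 40x30) is pooled
    to four grids and reduced to 32 channels each (rows 18b-18e), upsampled
    back, concatenated with the input (row 18a), classified by a convolution
    (row 19), and upsampled to 640x480 (row 20)."""
    def __init__(self, in_ch=128, branch_ch=32, num_classes=19):
        super().__init__()
        # Pool grids as (H, W), matching rows 18b-18e of the table.
        self.grids = [(30, 40), (15, 20), (8, 10), (4, 5)]
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, branch_ch, 1) for _ in self.grids)
        self.classifier = nn.Conv2d(in_ch + 4 * branch_ch, num_classes, 1)

    def forward(self, x):                                # x: (N, 128, 30, 40)
        feats = [x]
        for grid, conv in zip(self.grids, self.branches):
            p = conv(F.adaptive_avg_pool2d(x, grid))     # pool + 1x1 convolution
            feats.append(F.interpolate(p, size=x.shape[2:], mode='bilinear',
                                       align_corners=False))
        out = self.classifier(torch.cat(feats, dim=1))   # row 19
        return F.interpolate(out, size=(480, 640), mode='bilinear',
                             align_corners=False)        # row 20
```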
The loss function used is the focal loss:
loss(p) = -(1-p)^γ · log(p)
where p is the predicted probability that a pixel belongs to a given class and γ is a modulation factor; γ is set to 2 in this embodiment.
Step 7: input the actually acquired night images into the semantic segmentation model trained in step 6 for classification prediction. The ERF-PSPNet classifies the RGB input image pixel by pixel, generating a corresponding label map as the classification prediction result.
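Step 7 reduces to a forward pass plus a per-pixel argmax (a minimal sketch; `model` is the trained network and `night_image` a (3, H, W) RGB tensor preprocessed as during training):

```python
import torch

@torch.no_grad()
def predict_label_map(model, night_image):
    """Per-pixel classification of an RGB night image; returns an (H, W)
    map of class indices."""
    logits = model(night_image.unsqueeze(0))  # (1, num_classes, H, W)
    return logits.argmax(dim=1).squeeze(0)    # label map
```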
FIG. 5 shows an actually acquired night image; FIG. 8 shows its ground-truth class labels; FIG. 6 shows the classification prediction of a semantic segmentation model trained without the method of the present invention; and FIG. 7 shows the prediction of a model trained with the method.

Claims (7)

1. A robust night image semantic segmentation method, characterized by comprising: converting part of the daytime images in a data set containing semantic segmentation labels into artificial night images through a generative adversarial network model, generating a data set containing artificial night images and using it to train a semantic segmentation neural network model; and inputting actually acquired night images into the trained semantic segmentation neural network model to obtain night image semantic segmentation predictions. Specifically, the method comprises the following steps:
Step 1: acquiring a data set for training the generative adversarial network model, containing equal numbers of night road scene images and day road scene images;
Step 2: constructing the generative adversarial network model, comprising a pair of generators and a pair of discriminators;
Step 3: inputting the data set obtained in step 1 into the generative adversarial network for training, obtaining two generators, one converting night images into day images and one converting day images into night images;
Step 4: acquiring a data set containing semantic segmentation labels for training the semantic segmentation network model;
Step 5: using the day-to-night generator obtained in step 3, converting part of the daytime images in the labeled data set into artificial night images, obtaining a data set containing artificial night images;
Step 6: inputting the data set containing artificial night images obtained in step 5 into the semantic segmentation network model for training, obtaining a robust night image semantic segmentation model;
Step 7: inputting actually acquired night images into the model obtained in step 6 to realize robust night image semantic segmentation.
2. The method of claim 1, wherein the semantic segmentation network model is ERF-PSPNet, composed of an encoder and a decoder, wherein the encoder is a residual factorized convolutional network containing factorized convolution layers (Non-bottleneck-1D) and the decoder is a spatial pyramid pooling network; the layers of the ERF-PSPNet semantic segmentation network model are as follows:
[Table: ERF-PSPNet layer configuration, as given in the description]
3. The method of claim 1, wherein the data set used to train the generative adversarial network model is an autonomous driving data set such as Cityscapes or BDD.
4. The method of claim 1, wherein the generative adversarial network model is CycleGAN.
5. The method of claim 4, wherein the CycleGAN training process is as follows:
The night road scene images and the day road scene images are input into the two generators of CycleGAN for training over 200 epochs, with the learning rate set to 0.0002 and the random crop size set to 256×256.
6. The method of claim 1, wherein in step 5, the proportion of artificial night images in the data set containing artificial night images is 30%.
7. The method of claim 1, wherein in step 6, the loss function adopted by the semantic segmentation model is the focal loss:
loss(p) = -(1-p)^γ · log(p)
where p is the predicted probability that a pixel belongs to a given class and γ is a modulation factor.
CN201911250296.1A 2019-12-09 2019-12-09 Robust night image semantic segmentation method Withdrawn CN111008979A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911250296.1A CN111008979A (en) 2019-12-09 2019-12-09 Robust night image semantic segmentation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911250296.1A CN111008979A (en) 2019-12-09 2019-12-09 Robust night image semantic segmentation method

Publications (1)

Publication Number Publication Date
CN111008979A true CN111008979A (en) 2020-04-14

Family

ID=70114053

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911250296.1A Withdrawn CN111008979A (en) 2019-12-09 2019-12-09 Robust night image semantic segmentation method

Country Status (1)

Country Link
CN (1) CN111008979A (en)



Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109670409A (en) * 2018-11-28 2019-04-23 浙江大学 A kind of scene expression system and method for the rodlike pixel of semanteme
CN110188817A (en) * 2019-05-28 2019-08-30 厦门大学 A kind of real-time high-performance street view image semantic segmentation method based on deep learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
KAILUN YANG et al.: "Unifying terrain awareness through real-time semantic segmentation" *
LEI SUN et al.: "See Clearer at Night: Towards Robust Nighttime Semantic Segmentation through Day-Night Image Conversion" *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111504331A (en) * 2020-04-29 2020-08-07 杭州环峻科技有限公司 Method and device for positioning panoramic intelligent vehicle from coarse to fine
CN112287938A (en) * 2020-10-29 2021-01-29 苏州浪潮智能科技有限公司 Text segmentation method, system, device and medium
CN112287938B (en) * 2020-10-29 2022-12-06 苏州浪潮智能科技有限公司 Text segmentation method, system, device and medium
CN112756742A (en) * 2021-01-08 2021-05-07 南京理工大学 Laser vision weld joint tracking system based on ERFNet network
CN113537228A (en) * 2021-07-07 2021-10-22 中国电子科技集团公司第五十四研究所 Real-time image semantic segmentation method based on depth features
CN113537228B (en) * 2021-07-07 2022-10-21 中国电子科技集团公司第五十四研究所 Real-time image semantic segmentation method based on depth features

Similar Documents

Publication Publication Date Title
CN109241982B (en) Target detection method based on deep and shallow layer convolutional neural network
CN111008979A (en) Robust night image semantic segmentation method
CN108334881B (en) License plate recognition method based on deep learning
CN110147794A (en) A kind of unmanned vehicle outdoor scene real time method for segmenting based on deep learning
CN111967313B (en) Unmanned aerial vehicle image annotation method assisted by deep learning target detection algorithm
CN111310773A (en) Efficient license plate positioning method of convolutional neural network
CN108154102A (en) A kind of traffic sign recognition method
CN109509156B (en) Image defogging processing method based on generation countermeasure model
CN109670555B (en) Instance-level pedestrian detection and pedestrian re-recognition system based on deep learning
CN112800906B (en) Improved YOLOv 3-based cross-domain target detection method for automatic driving automobile
CN113160062B (en) Infrared image target detection method, device, equipment and storage medium
CN112287941A (en) License plate recognition method based on automatic character region perception
CN113408584A (en) RGB-D multi-modal feature fusion 3D target detection method
CN115861619A (en) Airborne LiDAR (light detection and ranging) urban point cloud semantic segmentation method and system of recursive residual double-attention kernel point convolution network
CN115331183A (en) Improved YOLOv5s infrared target detection method
CN112084897A (en) Rapid traffic large-scene vehicle target detection method of GS-SSD
CN116385958A (en) Edge intelligent detection method for power grid inspection and monitoring
CN117079163A (en) Aerial image small target detection method based on improved YOLOX-S
CN112668662B (en) Outdoor mountain forest environment target detection method based on improved YOLOv3 network
CN113902753A (en) Image semantic segmentation method and system based on dual-channel and self-attention mechanism
Aldabbagh et al. Classification of chili plant growth using deep learning
CN111353509B (en) Key point extractor generation method of visual SLAM system
CN114111647A (en) Artificial intelligence-based method and system for measuring damaged area of insulator umbrella skirt
CN113011308A (en) Pedestrian detection method introducing attention mechanism
CN112132835A (en) SeFa and artificial intelligence-based jelly effect analysis method for photovoltaic track camera

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication (application publication date: 2020-04-14)