CN111008979A - Robust night image semantic segmentation method - Google Patents
- Publication number
- CN111008979A (application CN201911250296.1A)
- Authority
- CN
- China
- Prior art keywords
- semantic segmentation
- night
- data set
- image
- images
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10004—Still image; Photographic image
Abstract
The invention discloses a robustness-enhancing method for night-time semantic segmentation. A generative adversarial network is trained to convert part of the images in a street-view dataset captured under normal illumination, with semantic segmentation labels, into artificial night street-view images; the resulting dataset, containing a proportion of night images, is then used to train a semantic segmentation network model that is robust when predicting semantic segmentation on night images. The method offers high real-time performance and low cost, and does not require annotating a large night-time dataset.
Description
Technical Field
The invention belongs to the technical fields of pattern recognition technology, image processing technology, computer vision technology and deep learning, and relates to a robust night image semantic segmentation method.
Background
Autonomous driving occupies an important position in the intelligent transportation industry, so image semantic segmentation has gradually become a research hotspot in computer vision: semantic segmentation provides pixel-level classification and labeling of traffic scenes. Thanks to the strong feature-representation capability of deep convolutional neural networks, semantic segmentation methods based on them have improved greatly.
At present, most semantic segmentation datasets for road scenes are collected in clear weather. Segmentation models trained on these datasets perform well under normal illumination, but on night-time road-scene images, where illumination is poor and stray light abundant, the extracted features differ greatly from those extracted under normal illumination; the accuracy of these methods drops sharply and cannot meet the requirements of autonomous driving. To solve this problem, the robustness of semantic segmentation at night must be improved.
Disclosure of Invention
The invention aims to: in order to solve the problem that existing semantic segmentation techniques have low accuracy on night images, the invention provides a robust night image semantic segmentation method based on a generative adversarial network.
The purpose of the invention is achieved by the following technical scheme: part of the daytime images in a dataset containing semantic segmentation labels are converted into artificial night images by a generative adversarial network model, yielding a dataset with a certain proportion of artificial night images; a semantic segmentation neural network is trained on this data; the trained model then gives more accurate object-class predictions on actually acquired night images. Specifically, the method comprises the following steps:
Step 1: acquiring a dataset for training the generative adversarial network model, the dataset containing equal numbers of night road-scene images and day road-scene images;
Step 2: constructing the generative adversarial network model, comprising a pair of generators and discriminators;
Step 3: inputting the dataset obtained in step 1 into the generative adversarial network for training, obtaining two generators: one converting night images into day images and one converting day images into night images;
Step 4: acquiring a dataset containing semantic segmentation labels for training the semantic segmentation network model;
Step 5: using the day-to-night generator obtained in step 3 to convert part of the daytime images in the dataset containing semantic segmentation labels into artificial night images, obtaining a dataset containing artificial night images;
Step 6: inputting the dataset containing artificial night images obtained in step 5 into the semantic segmentation network model for training, obtaining a robust night image semantic segmentation model;
Step 7: inputting actually acquired night images into the semantic segmentation model obtained in step 6 to realize robust night image semantic segmentation.
Further, the semantic segmentation network model is ERF-PSPNet, composed of an encoder and a decoder, wherein the encoder is a residual factorized convolutional network containing the factorized convolution layer Non-bottleneck-1D, which reduces computation while preserving accuracy, and the decoder is a spatial pyramid pooling network. Each layer of the network is shown in the following table:
layer(s) | Module | Number of output channels | Output resolution |
1 | Down sampling module | 3 | 320×240 |
2 | Down sampling module | 16 | 160×120 |
3-7 | 5×Non-bt-1D | 64 | 80×60 |
8 | Down sampling module | 128 | 40×30 |
9 | Non-bt-1D(dilated 2) | 128 | 40×30 |
10 | Non-bt-1D(dilated 4) | 128 | 40×30 |
11 | Non-bt-1D(dilated 8) | 128 | 40×30 |
12 | Non-bt-1D(dilated 16) | 128 | 40×30 |
13 | Non-bt-1D(dilated 2) | 128 | 40×30 |
14 | Non-bt-1D(dilated 4) | 128 | 40×30 |
15 | Non-bt-1D(dilated 8) | 128 | 40×30 |
16 | Non-bt-1D(dilated 16) | 128 | 40×30 |
17 | Non-bt-1D(dilated 2) | 128 | 40×30 |
18a | Layer-17 feature map | 128 | 40×30 |
18b | Pooling, convolution | 32 | 40×30 |
18c | Pooling, convolution | 32 | 20×15 |
18d | Pooling, convolution | 32 | 10×8 |
18e | Pooling, convolution | 32 | 5×4 |
19 | Convolution | Number of categories | 40×30 |
20 | Upsampling | Number of categories | 640×480 |
The ERF-PSPNet classifies the RGB input image pixel by pixel, producing a corresponding label map.
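As an illustration, the factorized residual block named in the table can be sketched in PyTorch roughly as below. The exact layer ordering and normalization placement follow the public ERFNet design and are assumptions here, not details stated in this patent:

```python
import torch
import torch.nn as nn

class NonBottleneck1D(nn.Module):
    """Sketch of the Non-bottleneck-1D block: each 3x3 convolution is
    factorized into a 3x1 and a 1x3 convolution to cut computation,
    with a residual (skip) connection around the whole block."""

    def __init__(self, channels, dilation=1):
        super().__init__()
        d = dilation
        self.conv3x1_1 = nn.Conv2d(channels, channels, (3, 1), padding=(1, 0))
        self.conv1x3_1 = nn.Conv2d(channels, channels, (1, 3), padding=(0, 1))
        self.bn1 = nn.BatchNorm2d(channels)
        # Second pair carries the dilation (e.g. "dilated 2" in the table).
        self.conv3x1_2 = nn.Conv2d(channels, channels, (3, 1),
                                   padding=(d, 0), dilation=(d, 1))
        self.conv1x3_2 = nn.Conv2d(channels, channels, (1, 3),
                                   padding=(0, d), dilation=(1, d))
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.conv3x1_1(x))
        out = self.relu(self.bn1(self.conv1x3_1(out)))
        out = self.relu(self.conv3x1_2(out))
        out = self.bn2(self.conv1x3_2(out))
        return self.relu(out + x)   # residual connection

# Matches layer 9 of the table: 128 channels at 40x30 resolution.
block = NonBottleneck1D(128, dilation=2)
y = block(torch.randn(1, 128, 30, 40))
```

The factorization replaces one 3×3 convolution (9 weights per channel pair) with a 3×1 plus a 1×3 convolution (6 weights), which is the source of the stated computation savings.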
Further, the dataset used to train the generative adversarial network model is an autonomous driving dataset, such as Cityscapes or BDD.
Further, the generative adversarial network model is CycleGAN.
Further, the CycleGAN training process is as follows:
the night road scene image and the day road scene image are respectively input into two generators of the cycleGAN for training, wherein 200 epochs are used, the learning rate is set to be 0.0002, and the random cutting size is set to be 256 multiplied by 256.
Further, in step 5, the proportion of artificial night images in the dataset containing artificial night images is 30%.
Further, in step 6, the loss function adopted by the semantic segmentation model is focal loss, and the formula is as follows:
loss(p) = -(1 - p)^γ log(p)
where p is the predicted probability that the pixel belongs to a certain class and γ is a modulation factor; γ is set to 2 in the invention.
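As an illustration, the per-pixel focal loss can be written directly from the formula above (a minimal sketch, not the patent's implementation):

```python
import math

def focal_loss(p, gamma=2.0):
    """Focal loss for one pixel: loss(p) = -(1 - p)**gamma * log(p),
    where p is the predicted probability of the true class."""
    return -((1.0 - p) ** gamma) * math.log(p)

easy = focal_loss(0.9)   # well-classified pixel: loss heavily down-weighted
hard = focal_loss(0.1)   # badly classified pixel: loss stays near full size
```

The (1 - p)^γ factor shrinks the loss of already well-classified pixels, so training focuses on the hard pixels, which is useful under the class imbalance typical of street scenes.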
Compared with other methods for enhancing semantic segmentation robustness, the method has the following advantages:
a large number of label data sets are not needed, and a large number of manpower and material resources can be saved. An artificial night data set is generated only by generating a confrontation network, and a semantic segmentation network is input for training by mixing an artificial night image and a daytime image, so that the robustness of the confrontation network is improved;
the real-time performance is high. As the trained model does not need to be additionally operated and processed in the reasoning stage, and the extra operation amount is not increased, the semantic segmentation model keeps the original real-time performance and supports the high-real-time night road information prediction.
Low cost. Because the method works entirely at the algorithm level, no additional sensors such as infrared cameras or radar are needed; compared with other night environment perception methods, no extra hardware cost is incurred.
High prediction accuracy. The semantic segmentation network model trained with this method predicts night street-view images more accurately than comparable methods while still running in real time.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a diagram of the Non-bottleneck-1D module;
FIG. 3 is a diagram of a semantic segmentation network ERF-PSPNet model;
FIG. 4 is a diagram of the ResnetBlock model in the generative adversarial network;
FIG. 5 is a diagram of a night image actually acquired;
FIG. 6 is the semantic segmentation prediction of a network trained without the proposed method;
FIG. 7 is the semantic segmentation prediction of a network trained with the proposed method;
FIG. 8 is a semantic segmentation truth label diagram;
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to specific embodiments and accompanying drawings.
The invention relates to a method for enhancing the robustness of night image semantic segmentation. Its core is to use a generative adversarial network to preprocess the dataset used for semantic segmentation training. The scheme framework is shown in FIG. 1, and the specific implementation steps are as follows:
Step 1: acquire a dataset for training the generative adversarial network. The dataset must contain a certain number of night images; an autonomous driving dataset such as Cityscapes or BDD may be adopted, from which equal numbers of night road-scene images and day road-scene images are selected to form the training set for the generative adversarial network;
Step 2: construct an unpaired generative adversarial network model comprising a pair of generators and discriminators;
Step 3: input the dataset obtained in step 1 into the generative adversarial network for training, obtaining two generators: one converting night images into day images and one converting day images into night images. In this embodiment, the generative adversarial network model adopted is CycleGAN; the generator structure is as follows:
layer(s) | Module | Number of output channels |
1 | 7 x 7 convolutional layer | 64 |
2 | ReLU activation function | 64 |
3 | 3 x 3 convolutional layer | 128 |
4 | BatchNorm layer | 128 |
5 | ReLU activation function | 128 |
6 | 3 x 3 convolutional layer | 256 |
7 | BatchNorm layer | 256 |
8 | ReLU activation function | 256 |
9~17 | 9×ResnetBlock | 256 |
18 | 3 x 3 deconvolution layer | 128 |
19 | BatchNorm layer | 128 |
20 | ReLU activation function | 128 |
21 | 3 x 3 deconvolution layer | 64 |
22 | BatchNorm layer | 64 |
23 | ReLU activation function | 64 |
24 | 7 x 7 convolutional layer | 3 |
25 | Tanh activation function | 3 |
The ResnetBlock structure is shown in FIG. 4.
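A hedged PyTorch sketch of such a ResnetBlock, following the reference CycleGAN implementation: the reflection padding is an assumption (the patent only shows the block in FIG. 4), while the BatchNorm layers follow the entries in the generator table above.

```python
import torch
import torch.nn as nn

class ResnetBlock(nn.Module):
    """Residual block of the CycleGAN generator: two 3x3 convolutions
    with normalization, plus a skip connection around them."""

    def __init__(self, channels=256):
        super().__init__()
        self.block = nn.Sequential(
            nn.ReflectionPad2d(1),                  # assumed padding style
            nn.Conv2d(channels, channels, kernel_size=3),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.ReflectionPad2d(1),
            nn.Conv2d(channels, channels, kernel_size=3),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return x + self.block(x)   # residual connection

# Layers 9-17 of the table stack nine such blocks at 256 channels.
y = ResnetBlock(256)(torch.randn(1, 256, 16, 16))
```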
During CycleGAN training, night road-scene images and day road-scene images are respectively input into the two generators of CycleGAN, using 200 epochs, a learning rate of 0.0002, and a random crop size of 256 × 256. Two generators are finally obtained: one converting night images into day images and one converting day images into night images;
Step 4: acquire a dataset containing semantic segmentation labels for training the semantic segmentation network model;
Step 5: using the day-to-night generator from the trained generative adversarial network obtained in step 3, convert part of the daytime images in the dataset provided for the semantic segmentation network model into artificial night images, obtaining a dataset containing artificial night images. Tests show that the semantic segmentation result is closest to the ground truth when artificial night images make up 30% of this dataset; 30% is the proportion adopted in this embodiment;
Step 6: the semantic segmentation network model may be any real-time semantic segmentation network, such as SegNet (Badrinarayanan, V., Kendall, A., and Cipolla, R., "SegNet: A deep convolutional encoder-decoder architecture for image segmentation," IEEE Transactions on Pattern Analysis and Machine Intelligence 39(12), 2481-2495, 2017), ERFNet (Romera, E., Alvarez, J. M., Bergasa, L. M., and Arroyo, R., "ERFNet: Efficient residual factorized ConvNet for real-time semantic segmentation," IEEE Transactions on Intelligent Transportation Systems 19(1), 263-272, 2018), or the network of Oršić et al. (Oršić, M., Krešo, I., Bevandić, P., et al., "In defense of pre-trained ImageNet architectures for real-time semantic segmentation of road-driving images," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019: 12607-12616). In this embodiment, ERF-PSPNet is adopted; the model consists of an encoder and a decoder, as shown in FIG. 3, where the encoder is a residual factorized convolutional network containing the factorized convolution layer Non-bottleneck-1D and the decoder is a spatial pyramid pooling network. Each layer of the ERF-PSPNet semantic segmentation network model is shown in the following table:
layer(s) | Module | Number of output channels | Output resolution |
1 | Down sampling module | 3 | 320×240 |
2 | Down sampling module | 16 | 160×120 |
3-7 | 5×Non-bt-1D | 64 | 80×60 |
8 | Down sampling module | 128 | 40×30 |
9 | Non-bt-1D(dilated 2) | 128 | 40×30 |
10 | Non-bt-1D(dilated 4) | 128 | 40×30 |
11 | Non-bt-1D(dilated 8) | 128 | 40×30 |
12 | Non-bt-1D(dilated 16) | 128 | 40×30 |
13 | Non-bt-1D(dilated 2) | 128 | 40×30 |
14 | Non-bt-1D(dilated 4) | 128 | 40×30 |
15 | Non-bt-1D(dilated 8) | 128 | 40×30 |
16 | Non-bt-1D(dilated 16) | 128 | 40×30 |
17 | Non-bt-1D(dilated 2) | 128 | 40×30 |
18a | Layer-17 feature map | 128 | 40×30 |
18b | Pooling, convolution | 32 | 40×30 |
18c | Pooling, convolution | 32 | 20×15 |
18d | Pooling, convolution | 32 | 10×8 |
18e | Pooling, convolution | 32 | 5×4 |
19 | Convolution | Number of categories | 40×30 |
20 | Upsampling | Number of categories | 640×480 |
The loss function used is focal loss, which is formulated as follows:
loss(p) = -(1 - p)^γ log(p)
where p is the predicted probability that the pixel belongs to a certain class and γ is a modulation factor; γ is set to 2 in this embodiment.
Step 7: input the actually acquired night images into the semantic segmentation model trained in step 6 for classification prediction; the ERF-PSPNet classifies the RGB input image pixel by pixel, generating a corresponding label map as the classification prediction result.
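A minimal sketch of this pixel-wise prediction step, with random logits standing in for the network output; the class count of 19 is illustrative (e.g. the Cityscapes label set), not stated in the patent:

```python
import torch

# The segmentation network outputs one score map per class (N x C x H x W);
# taking the argmax over the class dimension yields the label map in which
# every pixel carries one class id.
num_classes = 19                                  # illustrative class count
logits = torch.randn(1, num_classes, 480, 640)    # stand-in for network output
label_map = logits.argmax(dim=1)                  # N x H x W label map
```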
FIG. 5 is an actually acquired night image; FIG. 8 shows its ground-truth classification; FIG. 6 shows the classification prediction of a model trained without the method of the present invention; FIG. 7 shows the classification prediction of a model trained with the method of the present invention.
Claims (7)
1. A robust night image semantic segmentation method, characterized by comprising: converting part of the daytime images in a dataset containing semantic segmentation labels into artificial night images through a generative adversarial network model, generating a dataset containing artificial night images and using it to train a semantic segmentation neural network model; and inputting actually acquired night images into the trained semantic segmentation neural network model to obtain night image semantic segmentation prediction results. Specifically, the method comprises the following steps:
Step 1: acquiring a dataset for training the generative adversarial network model, the dataset containing equal numbers of night road-scene images and day road-scene images;
Step 2: constructing the generative adversarial network model, comprising a pair of generators and discriminators;
Step 3: inputting the dataset obtained in step 1 into the generative adversarial network for training, obtaining two generators: one converting night images into day images and one converting day images into night images;
Step 4: acquiring a dataset containing semantic segmentation labels for training the semantic segmentation network model;
Step 5: using the day-to-night generator obtained in step 3 to convert part of the daytime images in the dataset containing semantic segmentation labels into artificial night images, obtaining a dataset containing artificial night images;
Step 6: inputting the dataset containing artificial night images obtained in step 5 into the semantic segmentation network model for training, obtaining a robust night image semantic segmentation model;
Step 7: inputting actually acquired night images into the semantic segmentation model obtained in step 6 to realize robust night image semantic segmentation.
2. The method of claim 1, wherein the semantic segmentation network model is ERF-PSPNet, composed of an encoder and a decoder, wherein the encoder is a residual factorized convolutional network containing the factorized convolution layer Non-bottleneck-1D, and the decoder is a spatial pyramid pooling network.
3. The method of claim 1, wherein the dataset used to train the generative adversarial network model is an autonomous driving dataset, such as Cityscapes or BDD.
4. The method of claim 1, wherein the generative adversarial network model is CycleGAN.
5. The method of claim 4, wherein the CycleGAN training process is as follows:
the night road scene image and the day road scene image are respectively input into two generators of the cycleGAN for training, wherein 200 epochs are used, the learning rate is set to be 0.0002, and the random cutting size is set to be 256 multiplied by 256.
6. The method of claim 1, wherein in step 5 the proportion of artificial night images in the dataset containing artificial night images is 30%.
7. The method of claim 1, wherein in step 6 the loss function adopted by the semantic segmentation model is the focal loss, with the following formula:
loss(p) = -(1 - p)^γ log(p)
where p is the probability of determining that the pixel is of a certain class, and γ is the modulation factor.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911250296.1A CN111008979A (en) | 2019-12-09 | 2019-12-09 | Robust night image semantic segmentation method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111008979A true CN111008979A (en) | 2020-04-14 |
Family
ID=70114053
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911250296.1A Withdrawn CN111008979A (en) | 2019-12-09 | 2019-12-09 | Robust night image semantic segmentation method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111008979A (en) |
- 2019-12-09: application CN201911250296.1A filed; published as CN111008979A; status: withdrawn
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109670409A (en) * | 2018-11-28 | 2019-04-23 | Zhejiang University | A scene representation system and method based on semantic stixels |
CN110188817A (en) * | 2019-05-28 | 2019-08-30 | Xiamen University | A real-time high-performance street-view image semantic segmentation method based on deep learning |
Non-Patent Citations (2)
Title |
---|
Kailun Yang et al.: "Unifying terrain awareness through real-time semantic segmentation" *
Lei Sun et al.: "See Clearer at Night: Towards Robust Nighttime Semantic Segmentation through Day-Night Image Conversion" *
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111504331A (en) * | 2020-04-29 | 2020-08-07 | 杭州环峻科技有限公司 | Method and device for positioning panoramic intelligent vehicle from coarse to fine |
CN112287938A (en) * | 2020-10-29 | 2021-01-29 | 苏州浪潮智能科技有限公司 | Text segmentation method, system, device and medium |
CN112287938B (en) * | 2020-10-29 | 2022-12-06 | 苏州浪潮智能科技有限公司 | Text segmentation method, system, device and medium |
CN112756742A (en) * | 2021-01-08 | 2021-05-07 | 南京理工大学 | Laser vision weld joint tracking system based on ERFNet network |
CN113537228A (en) * | 2021-07-07 | 2021-10-22 | 中国电子科技集团公司第五十四研究所 | Real-time image semantic segmentation method based on depth features |
CN113537228B (en) * | 2021-07-07 | 2022-10-21 | 中国电子科技集团公司第五十四研究所 | Real-time image semantic segmentation method based on depth features |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| WW01 | Invention patent application withdrawn after publication | Application publication date: 20200414 |