CN109635642A - A kind of road scene dividing method based on residual error network and expansion convolution - Google Patents
A kind of road scene dividing method based on residual error network and expansion convolution
- Publication number
- CN109635642A (application CN201811293377.5A)
- Authority
- CN
- China
- Prior art keywords
- residual block
- road scene
- convolution layer
- semantic segmentation
- layer
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/56—Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/20—Scenes; Scene-specific elements in augmented reality scenes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/35—Categorising the entire scene, e.g. birthday party or wedding scene
- G06V20/38—Outdoor scenes
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Data Mining & Analysis (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Engineering & Computer Science (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a road scene segmentation method based on a residual network and dilated convolution. In the training stage, a convolutional neural network is constructed whose hidden layer is formed by 10 sequentially arranged Residual blocks; each original road scene image in the training set is input into the convolutional neural network for training, and the 12 semantic segmentation prediction maps corresponding to each original road scene image are obtained; by calculating the loss function value between the set formed by the 12 semantic segmentation prediction maps corresponding to each original road scene image and the set formed by the 12 one-hot coded images obtained from the corresponding real semantic segmentation image, the optimal weight vector of the convolutional neural network classification training model is obtained. In the testing stage, a prediction is made using the optimal weight vector of the convolutional neural network classification training model to obtain the predicted semantic segmentation image corresponding to the road scene image to be semantically segmented. Advantages are low computational complexity, high segmentation efficiency, high segmentation accuracy and good robustness.
Description
Technical Field
The invention relates to semantic segmentation technology based on deep learning, and in particular to a road scene segmentation method based on a residual network and dilated convolution.
Background
Deep learning is a branch of artificial neural network research, and artificial neural networks with deep network structures are the earliest network models for deep learning. Originally, deep learning was applied mainly in the image and speech domains. Since 2006, interest in deep learning has risen continuously in academia; deep learning and neural networks are now widely used in semantic segmentation, computer vision, speech recognition and tracking, and their high efficiency also gives them great potential in real-time applications and other areas.
Convolutional neural networks have been successful in image classification, localization, and scene understanding. With the proliferation of tasks such as augmented reality and autonomous driving of vehicles, many researchers have turned their attention to scene understanding, where one of the main steps is semantic segmentation, i.e., classification of each pixel in a given image. Semantic segmentation has important implications in mobile and robot related applications.
The semantic segmentation problem plays an important role in many application scenarios, such as image understanding and automatic driving, and has therefore recently attracted much attention in both academia and industry. Classical semantic segmentation methods include the Fully Convolutional Network (FCN) and the encoder-decoder network SegNet, which achieve good pixel accuracy, mean pixel accuracy and mean intersection-over-union on road scene segmentation databases. However, one disadvantage of the FCN is that the pooling layers make the response tensor (its length and width) smaller and smaller, while the FCN is required to produce an output of the same size as the input; the FCN therefore performs upsampling, but upsampling cannot recover all of the lost information. SegNet is a network model built on the basis of the FCN, yet it does not adequately address the information-loss problem either. Information loss therefore limits the segmentation accuracy of these methods and reduces their robustness.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a road scene segmentation method based on a residual network and dilated convolution that has low computational complexity, high segmentation efficiency, high segmentation accuracy and good robustness.
The technical scheme adopted by the invention to solve the above technical problem is as follows: a road scene segmentation method based on a residual network and dilated convolution, characterized by comprising a training stage and a testing stage;
the specific steps of the training phase process are as follows:
step 1_1: selecting Q original road scene images and the real semantic segmentation image corresponding to each original road scene image to form a training set, recording the q-th original road scene image in the training set as {I_q(i,j)}, and recording the real semantic segmentation image in the training set corresponding to {I_q(i,j)} as {I_q^true(i,j)}; then processing the real semantic segmentation image corresponding to each original road scene image in the training set into 12 one-hot coded images by using a one-hot coding technique, and recording the set of the 12 one-hot coded images obtained from {I_q^true(i,j)} as J_q; wherein the road scene images are RGB color images, Q is a positive integer, Q ≥ 100, q is a positive integer, 1 ≤ q ≤ Q, 1 ≤ i ≤ W, 1 ≤ j ≤ H, W denotes the width of {I_q(i,j)}, H denotes the height of {I_q(i,j)}, I_q(i,j) denotes the pixel value of the pixel at coordinate position (i,j) in {I_q(i,j)}, and I_q^true(i,j) denotes the pixel value of the pixel at coordinate position (i,j) in {I_q^true(i,j)};
step 1_2: constructing a convolutional neural network: the convolutional neural network comprises an input layer, a hidden layer and an output layer; the hidden layer is composed of 10 sequentially arranged Residual blocks, and every convolution layer in each Residual block is a dilated convolution layer formed by setting its dilation rate, the dilation rates of the 1st to 10th Residual blocks being 1, 1, 2, 2, 4, 4, 2, 2, 1 and 1 respectively;
for an input layer, the input end of the input layer receives an R channel component, a G channel component and a B channel component of an original input image, and the output end of the input layer outputs the R channel component, the G channel component and the B channel component of the original input image to a hidden layer; wherein, the width of the original input image received by the input end of the input layer is required to be W, and the height of the original input image is required to be H;
for the 1st Residual block, its input end receives the R channel component, the G channel component and the B channel component of the original input image output by the output end of the input layer, and its output end outputs 32 feature maps, the set formed by these 32 feature maps being recorded as R1; wherein each feature map in R1 has a width of W and a height of H;
for the 2nd Residual block, its input end receives all the feature maps in R1, and its output end outputs 32 feature maps, the set formed by these 32 feature maps being recorded as R2; wherein each feature map in R2 has a width of W and a height of H;
for the 3rd Residual block, its input end receives all the feature maps in R2, and its output end outputs 64 feature maps, the set formed by these 64 feature maps being recorded as R3; wherein each feature map in R3 has a width of W and a height of H;
for the 4th Residual block, its input end receives all the feature maps in R3, and its output end outputs 64 feature maps, the set formed by these 64 feature maps being recorded as R4; wherein each feature map in R4 has a width of W and a height of H;
for the 5th Residual block, its input end receives all the feature maps in R4, and its output end outputs 128 feature maps, the set formed by these 128 feature maps being recorded as R5; wherein each feature map in R5 has a width of W and a height of H;
for the 6th Residual block, its input end receives all the feature maps in R5, and its output end outputs 128 feature maps, the set formed by these 128 feature maps being recorded as R6; wherein each feature map in R6 has a width of W and a height of H;
for the 7th Residual block, its input end receives all the feature maps in R6, and its output end outputs 64 feature maps, the set formed by these 64 feature maps being recorded as R7; wherein each feature map in R7 has a width of W and a height of H;
for the 8th Residual block, its input end receives all the feature maps in R7, and its output end outputs 64 feature maps, the set formed by these 64 feature maps being recorded as R8; wherein each feature map in R8 has a width of W and a height of H;
for the 9th Residual block, its input end receives all the feature maps in R8, and its output end outputs 32 feature maps, the set formed by these 32 feature maps being recorded as R9; wherein each feature map in R9 has a width of W and a height of H;
for the 10th Residual block, its input end receives all the feature maps in R9, and its output end outputs 32 feature maps, the set formed by these 32 feature maps being recorded as R10; wherein each feature map in R10 has a width of W and a height of H;
for the output layer, which consists of 1 convolution layer, its input end receives all the feature maps in R10, and its output end outputs 12 semantic segmentation prediction maps corresponding to the original input image; wherein each semantic segmentation prediction map has a width of W and a height of H;
step 1_3: taking each original road scene image in the training set as an original input image and inputting it into the convolutional neural network for training, obtaining the 12 semantic segmentation prediction maps corresponding to each original road scene image in the training set; the set formed by the 12 semantic segmentation prediction maps corresponding to {I_q(i,j)} is recorded as P_q;
step 1_4: calculating the loss function value between the set formed by the 12 semantic segmentation prediction maps corresponding to each original road scene image in the training set and the set formed by the 12 one-hot coded images obtained from the corresponding real semantic segmentation image; the loss function value between P_q and J_q is recorded as Loss_q;
step 1_5: repeatedly executing step 1_3 and step 1_4 for V times to obtain a convolutional neural network classification training model and Q × V loss function values; then finding the loss function value with the minimum value among the Q × V loss function values; then taking the weight vector and the bias term corresponding to the loss function value with the minimum value as the optimal weight vector and the optimal bias term of the convolutional neural network classification training model, recorded as Wbest and bbest respectively; wherein V > 1;
the test stage process comprises the following specific steps:
step 2_1: let {I^test(i',j')} denote the road scene image to be semantically segmented; wherein 1 ≤ i' ≤ W', 1 ≤ j' ≤ H', W' denotes the width of {I^test(i',j')}, H' denotes the height of {I^test(i',j')}, and I^test(i',j') denotes the pixel value of the pixel at coordinate position (i',j') in {I^test(i',j')};
step 2_2: inputting the R channel component, the G channel component and the B channel component of {I^test(i',j')} into the convolutional neural network classification training model, and making a prediction based on Wbest and bbest to obtain the predicted semantic segmentation image corresponding to {I^test(i',j')}, recorded as {I^pred(i',j')}; wherein I^pred(i',j') denotes the pixel value of the pixel at coordinate position (i',j') in {I^pred(i',j')}.
In step 1_4, Loss_q is obtained using categorical cross entropy.
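For reference, the categorical cross entropy named above can be written out explicitly; the following formulation, with c indexing the 12 classes and the sums running over all W × H pixel positions, is the standard definition rather than an equation quoted from the patent:

$$\mathrm{Loss}_q = -\frac{1}{W H}\sum_{i=1}^{W}\sum_{j=1}^{H}\sum_{c=1}^{12} J_q^{(c)}(i,j)\,\log P_q^{(c)}(i,j)$$

where $J_q^{(c)}(i,j)$ is the one-hot ground-truth value of class $c$ at pixel $(i,j)$ and $P_q^{(c)}(i,j)$ is the corresponding predicted probability from the c-th semantic segmentation prediction map.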
Compared with the prior art, the invention has the advantages that:
1) In the process of constructing the convolutional neural network, the method introduces the Residual blocks with identity shortcut connections used in ResNet (residual network), and the hidden layer of the convolutional neural network is formed by stacking 10 such Residual blocks; this arrangement strengthens the extraction of feature information and fully exploits the structural efficiency of the basic residual network module, thereby improving the prediction accuracy of the trained convolutional neural network classification training model.
2) The hidden layer of the convolutional neural network constructed by the method uses only 10 Residual blocks, which greatly reduces the cost caused by redundancy, large data volumes and similar problems, so the computational complexity is low; each Residual block uses dilated convolution layers formed by setting the dilation rate of its convolution layers, and dilated convolution avoids the information lost during size conversion, enlarges the receptive field while keeping the resolution of the feature maps unchanged, and retains effective depth information to the greatest extent, so the semantic segmentation prediction maps obtained in the training stage and the predicted semantic segmentation images obtained in the testing stage have high resolution, accurate boundaries and good spatial continuity.
3) The Residual blocks adopted by the method not only greatly strengthen feature extraction but also prevent overfitting of the model, giving the method strong robustness and greatly improving segmentation efficiency.
Drawings
FIG. 1 is a block diagram of an overall implementation of the method of the present invention;
FIG. 2 is a schematic diagram of the structure of a convolutional neural network created by the method of the present invention;
FIG. 3a is a selected road scene image to be semantically segmented;
FIG. 3b is a real semantic segmentation image corresponding to the road scene image to be semantically segmented shown in FIG. 3 a;
FIG. 3c is a predicted semantic segmentation image obtained by predicting the road scene image to be semantically segmented shown in FIG. 3a by using the method of the present invention;
FIG. 4a is another selected road scene image to be semantically segmented;
FIG. 4b is a real semantic segmentation image corresponding to the road scene image to be semantically segmented shown in FIG. 4 a;
fig. 4c is a predicted semantic segmentation image obtained by predicting the road scene image to be semantically segmented shown in fig. 4a by using the method of the present invention.
Detailed Description
The invention is described in further detail below with reference to the accompanying drawings and embodiments.
The general implementation block diagram of the road scene segmentation method based on the residual error network and the dilation convolution is shown in fig. 1, and the road scene segmentation method comprises a training stage and a testing stage.
The specific steps of the training phase process are as follows:
step 1_1: selecting Q original road scene images and the real semantic segmentation image corresponding to each original road scene image to form a training set, recording the q-th original road scene image in the training set as {I_q(i,j)}, and recording the real semantic segmentation image in the training set corresponding to {I_q(i,j)} as {I_q^true(i,j)}; then processing the real semantic segmentation image corresponding to each original road scene image in the training set into 12 one-hot coded images by using the existing one-hot encoding technique, and recording the set of the 12 one-hot coded images obtained from {I_q^true(i,j)} as J_q; wherein the road scene images are RGB color images, Q is a positive integer with Q ≥ 100, for example Q = 100, q is a positive integer with 1 ≤ q ≤ Q, 1 ≤ i ≤ W, 1 ≤ j ≤ H, W denotes the width of {I_q(i,j)} and H denotes the height of {I_q(i,j)}, for example W = 352 and H = 480, I_q(i,j) denotes the pixel value of the pixel at coordinate position (i,j) in {I_q(i,j)}, and I_q^true(i,j) denotes the pixel value of the pixel at coordinate position (i,j) in {I_q^true(i,j)}; here, the original road scene images are 100 images taken directly from the training set of the CamVid road scene image database.
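As an illustration of this one-hot encoding step, the following Python sketch converts an integer label map into 12 binary images; the function name and the assumption that the CamVid ground truth is stored as per-pixel class indices in [0, 11] are mine and are not taken from the patent.

```python
import numpy as np

def to_one_hot(label_map, num_classes=12):
    """Convert an (H, W) integer label map into num_classes binary images.

    Assumes label values are integers in [0, num_classes - 1].
    Returns an array of shape (H, W, num_classes) with 0/1 entries.
    """
    h, w = label_map.shape
    one_hot = np.zeros((h, w, num_classes), dtype=np.float32)
    for c in range(num_classes):
        one_hot[:, :, c] = (label_map == c).astype(np.float32)
    return one_hot

# Example: a 480 x 352 CamVid-style label map with 12 classes
labels = np.random.randint(0, 12, size=(480, 352))
encoded = to_one_hot(labels)              # shape (480, 352, 12)
assert encoded.sum(axis=-1).max() == 1.0  # exactly one class per pixel
```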
Step 1_ 2: constructing a convolutional neural network: as shown in fig. 2, the convolutional neural network includes an input layer, a hidden layer, and an output layer; the hidden layer is composed of 10 sequentially arranged Residual blocks, wherein each convolution layer in the 1 st Residual block forms an expanded convolution layer by setting an expansion ratio (1), each convolution layer in the 2 nd Residual block forms an expanded convolution layer by setting an expansion ratio (1), each convolution layer in the 3 rd Residual block forms an expanded convolution layer by setting an expansion ratio (2), each convolution layer in the 4 th Residual block forms an expanded convolution layer by setting an expansion ratio (2), each convolution layer in the 5 th Residual block forms an expanded convolution layer by setting an expansion ratio (4), each convolution layer in the 6 th Residual block forms an expanded convolution layer by setting an expansion ratio (4), each convolution layer in the 7 th Residual block forms a convolution layer by setting an expansion ratio (2), each expanded convolution layer in the 8 th Residual block forms an expanded convolution layer by setting an expansion ratio (2), each convolution layer in the 9 th Residual block forms an expanded convolution layer by setting the expansion ratio to 1, each convolution layer in the 10 th Residual block forms an expanded convolution layer by setting the expansion ratio to 1, and the convolution kernel size of the expanded convolution layer in the 10 Residual blocks remains unchanged and is 3 x 3.
For an input layer, the input end of the input layer receives an R channel component, a G channel component and a B channel component of an original input image, and the output end of the input layer outputs the R channel component, the G channel component and the B channel component of the original input image to a hidden layer; wherein the input end of the input layer is required to receive the original input image with width W and height H.
For the 1st Residual block, its input end receives the R channel component, the G channel component and the B channel component of the original input image output by the output end of the input layer, and its output end outputs 32 feature maps, the set formed by these 32 feature maps being recorded as R1; each feature map in R1 has a width of W and a height of H.
For the 2nd Residual block, its input end receives all the feature maps in R1, and its output end outputs 32 feature maps, the set formed by these 32 feature maps being recorded as R2; each feature map in R2 has a width of W and a height of H.
For the 3rd Residual block, its input end receives all the feature maps in R2, and its output end outputs 64 feature maps, the set formed by these 64 feature maps being recorded as R3; each feature map in R3 has a width of W and a height of H.
For the 4th Residual block, its input end receives all the feature maps in R3, and its output end outputs 64 feature maps, the set formed by these 64 feature maps being recorded as R4; each feature map in R4 has a width of W and a height of H.
For the 5th Residual block, its input end receives all the feature maps in R4, and its output end outputs 128 feature maps, the set formed by these 128 feature maps being recorded as R5; each feature map in R5 has a width of W and a height of H.
For the 6th Residual block, its input end receives all the feature maps in R5, and its output end outputs 128 feature maps, the set formed by these 128 feature maps being recorded as R6; each feature map in R6 has a width of W and a height of H.
For the 7th Residual block, its input end receives all the feature maps in R6, and its output end outputs 64 feature maps, the set formed by these 64 feature maps being recorded as R7; each feature map in R7 has a width of W and a height of H.
For the 8th Residual block, its input end receives all the feature maps in R7, and its output end outputs 64 feature maps, the set formed by these 64 feature maps being recorded as R8; each feature map in R8 has a width of W and a height of H.
For the 9th Residual block, its input end receives all the feature maps in R8, and its output end outputs 32 feature maps, the set formed by these 32 feature maps being recorded as R9; each feature map in R9 has a width of W and a height of H.
For the 10th Residual block, its input end receives all the feature maps in R9, and its output end outputs 32 feature maps, the set formed by these 32 feature maps being recorded as R10; each feature map in R10 has a width of W and a height of H.
For the output layer, which consists of 1 convolution layer, its input end receives all the feature maps in R10, and its output end outputs 12 semantic segmentation prediction maps corresponding to the original input image; each semantic segmentation prediction map has a width of W and a height of H.
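Since the patent gives no code, the following Keras sketch (the embodiment mentions Keras 2.1.5) shows one way the dilated Residual blocks and the layer stack described above could be realised; the helper names, the use of batch normalization and ReLU, the 1×1 shortcut projection when the channel count changes, the 1×1 kernel of the output convolution and the softmax activation are all assumptions, not details stated in the patent.

```python
from keras import backend as K
from keras.layers import Input, Conv2D, BatchNormalization, Activation, Add
from keras.models import Model

def dilated_residual_block(x, filters, dilation_rate):
    """One Residual block whose 3x3 convolutions use the given dilation rate.

    'same' padding keeps the feature-map width W and height H unchanged,
    as required by the layer-by-layer description above.
    """
    shortcut = x
    y = Conv2D(filters, (3, 3), padding='same', dilation_rate=dilation_rate)(x)
    y = BatchNormalization()(y)
    y = Activation('relu')(y)
    y = Conv2D(filters, (3, 3), padding='same', dilation_rate=dilation_rate)(y)
    y = BatchNormalization()(y)
    # Project the shortcut with a 1x1 convolution when the channel count changes
    # (assumption; the patent only states the number of output feature maps per block).
    if K.int_shape(shortcut)[-1] != filters:
        shortcut = Conv2D(filters, (1, 1), padding='same')(shortcut)
    y = Add()([y, shortcut])
    return Activation('relu')(y)

def build_segmentation_network(height=480, width=352, num_classes=12):
    """Ten dilated Residual blocks (rates 1,1,2,2,4,4,2,2,1,1) plus one output convolution."""
    inputs = Input(shape=(height, width, 3))  # R, G and B channel components
    filters_per_block = [32, 32, 64, 64, 128, 128, 64, 64, 32, 32]
    dilation_rates = [1, 1, 2, 2, 4, 4, 2, 2, 1, 1]
    x = inputs
    for filters, rate in zip(filters_per_block, dilation_rates):
        x = dilated_residual_block(x, filters, rate)
    # Output layer: one convolution layer producing 12 semantic segmentation
    # prediction maps of width W and height H.
    outputs = Conv2D(num_classes, (1, 1), padding='same', activation='softmax')(x)
    return Model(inputs=inputs, outputs=outputs)

model = build_segmentation_network()
```

With 'same' padding and no pooling, every block keeps the feature maps at the full W × H resolution, which is the property the description attributes to the dilated convolutions.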
Step 1_3: taking each original road scene image in the training set as an original input image and inputting it into the convolutional neural network for training, obtaining the 12 semantic segmentation prediction maps corresponding to each original road scene image in the training set; the set formed by the 12 semantic segmentation prediction maps corresponding to {I_q(i,j)} is recorded as P_q.
Step 1_4: calculating the loss function value between the set formed by the 12 semantic segmentation prediction maps corresponding to each original road scene image in the training set and the set formed by the 12 one-hot coded images obtained from the corresponding real semantic segmentation image; the loss function value between P_q and J_q is recorded as Loss_q and is obtained using categorical cross entropy.
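A minimal sketch of attaching this loss to the model defined above; the choice of the Adam optimizer and its learning rate are illustrative assumptions and are not specified in the patent.

```python
from keras.optimizers import Adam

# Categorical cross entropy compares the 12 predicted probability maps with the
# 12 one-hot coded ground-truth images pixel by pixel.
model.compile(optimizer=Adam(lr=1e-4), loss='categorical_crossentropy')
```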
Step 1_5: repeatedly executing step 1_3 and step 1_4 for V times to obtain a convolutional neural network classification training model and Q × V loss function values; then finding the loss function value with the minimum value among the Q × V loss function values; then taking the weight vector and the bias term corresponding to the loss function value with the minimum value as the optimal weight vector and the optimal bias term of the convolutional neural network classification training model, recorded as Wbest and bbest respectively; where V > 1, and in this example V = 300.
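One possible realisation of step 1_5, keeping the weights that produce the smallest loss over the V training passes, uses a ModelCheckpoint callback; the file name, batch size and placeholder arrays below are illustrative assumptions.

```python
import numpy as np
from keras.callbacks import ModelCheckpoint

# Placeholder arrays standing in for the CamVid training data prepared in step 1_1:
# Q RGB images of size H x W and their 12-channel one-hot label volumes.
Q, H, W = 100, 480, 352
X_train = np.zeros((Q, H, W, 3), dtype=np.float32)
Y_train = np.zeros((Q, H, W, 12), dtype=np.float32)

V = 300  # number of training passes used in this example
checkpoint = ModelCheckpoint('best_weights.h5', monitor='loss', mode='min',
                             save_best_only=True, save_weights_only=True)
model.fit(X_train, Y_train, batch_size=4, epochs=V, callbacks=[checkpoint])
```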
The test stage process comprises the following specific steps:
step 2_1: let {I^test(i',j')} denote the road scene image to be semantically segmented; wherein 1 ≤ i' ≤ W', 1 ≤ j' ≤ H', W' denotes the width of {I^test(i',j')}, H' denotes the height of {I^test(i',j')}, and I^test(i',j') denotes the pixel value of the pixel at coordinate position (i',j') in {I^test(i',j')}.
Step 2_2: inputting the R channel component, the G channel component and the B channel component of {I^test(i',j')} into the convolutional neural network classification training model, and making a prediction based on Wbest and bbest to obtain the predicted semantic segmentation image corresponding to {I^test(i',j')}, recorded as {I^pred(i',j')}; wherein I^pred(i',j') denotes the pixel value of the pixel at coordinate position (i',j') in {I^pred(i',j')}.
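A sketch of the test-stage prediction, continuing the earlier sketches; forming the final predicted semantic segmentation image by taking the arg-max over the 12 prediction maps is my reading of this step, since the patent does not spell out how the class map is obtained from the 12 outputs.

```python
import numpy as np

# Load the optimal weights Wbest / bbest selected in step 1_5.
model.load_weights('best_weights.h5')

# test_image stands in for an (H', W', 3) RGB road scene image to be segmented.
test_image = np.zeros((480, 352, 3), dtype=np.float32)
probs = model.predict(test_image[np.newaxis, ...])[0]  # shape (H', W', 12)
predicted_segmentation = np.argmax(probs, axis=-1)     # one class index per pixel
```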
To further verify the feasibility and effectiveness of the method of the invention, experiments were performed.
The architecture of the convolutional neural network is built using the Python-based deep learning library Keras 2.1.5. The test set of the CamVid road scene image database is used to analyse the segmentation performance of the road scene images predicted by the method. Here, the segmentation performance of the predicted semantic segmentation images is evaluated using 3 common objective parameters of semantic segmentation methods as evaluation indexes, namely Pixel Accuracy (PA), Mean Pixel Accuracy (MPA) and Mean Intersection over Union (MIoU).
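For completeness, the three evaluation indexes can be computed from a confusion matrix as in the following sketch; these are the standard definitions of PA, MPA and MIoU rather than code from the patent, and empty classes are guarded against with a simple maximum.

```python
import numpy as np

def segmentation_metrics(y_true, y_pred, num_classes=12):
    """Pixel Accuracy, Mean Pixel Accuracy and Mean IoU from integer label maps."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    for t, p in zip(y_true.ravel(), y_pred.ravel()):
        cm[t, p] += 1
    pa = np.diag(cm).sum() / cm.sum()
    per_class_acc = np.diag(cm) / np.maximum(cm.sum(axis=1), 1)
    mpa = per_class_acc.mean()
    union = cm.sum(axis=1) + cm.sum(axis=0) - np.diag(cm)
    miou = (np.diag(cm) / np.maximum(union, 1)).mean()
    return pa, mpa, miou
```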
Each road scene image in the CamVid test set is predicted with the method to obtain the corresponding predicted semantic segmentation image, and the pixel accuracy PA, mean pixel accuracy MPA and mean intersection-over-union MIoU reflecting the semantic segmentation performance of the method are listed in Table 1; the higher these values are, the higher the effectiveness and prediction accuracy. The data listed in Table 1 show that the segmentation results obtained by the method of the invention are good, indicating that it is feasible and effective to obtain the predicted semantic segmentation image corresponding to a road scene image with the method.
TABLE 1 evaluation results on test sets using the method of the invention
FIG. 3a shows a selected road scene image to be semantically segmented; FIG. 3b shows a real semantic segmentation image corresponding to the road scene image to be semantically segmented shown in FIG. 3 a; FIG. 3c shows a predicted semantic segmentation image obtained by predicting the road scene image to be semantically segmented shown in FIG. 3a by using the method of the present invention; FIG. 4a shows another selected road scene image to be semantically segmented; FIG. 4b shows a real semantic segmentation image corresponding to the road scene image to be semantically segmented shown in FIG. 4 a; fig. 4c shows a predicted semantic segmentation image obtained by predicting the road scene image to be semantically segmented shown in fig. 4a by using the method of the present invention. Comparing fig. 3b and fig. 3c, and comparing fig. 4b and fig. 4c, it can be seen that the predicted semantic segmentation image obtained by the method of the present invention has high segmentation accuracy, which is close to the real semantic segmentation image.
Claims (2)
1. A road scene segmentation method based on residual error network and expansion convolution is characterized by comprising a training stage and a testing stage;
the specific steps of the training phase process are as follows:
step 1_1: selecting Q original road scene images and the real semantic segmentation image corresponding to each original road scene image to form a training set, recording the q-th original road scene image in the training set as {I_q(i,j)}, and recording the real semantic segmentation image in the training set corresponding to {I_q(i,j)} as {I_q^true(i,j)}; then processing the real semantic segmentation image corresponding to each original road scene image in the training set into 12 one-hot coded images by using a one-hot coding technique, and recording the set of the 12 one-hot coded images obtained from {I_q^true(i,j)} as J_q; wherein the road scene images are RGB color images, Q is a positive integer, Q ≥ 100, q is a positive integer, 1 ≤ q ≤ Q, 1 ≤ i ≤ W, 1 ≤ j ≤ H, W denotes the width of {I_q(i,j)}, H denotes the height of {I_q(i,j)}, I_q(i,j) denotes the pixel value of the pixel at coordinate position (i,j) in {I_q(i,j)}, and I_q^true(i,j) denotes the pixel value of the pixel at coordinate position (i,j) in {I_q^true(i,j)};
step 1_ 2: constructing a convolutional neural network: the convolutional neural network comprises an input layer, a hidden layer and an output layer; the hidden layers are composed of 10 sequentially arranged Residual blocks, wherein each convolution layer in the 1 st Residual block forms an expanded convolution layer by setting the expansion ratio to 1, each convolution layer in the 2 nd Residual block forms an expanded convolution layer by setting the expansion ratio to 1, each convolution layer in the 3 rd Residual block forms an expanded convolution layer by setting the expansion ratio to 2, each convolution layer in the 4 th Residual block forms an expanded convolution layer by setting the expansion ratio to 2, each convolution layer in the 5 th Residual block forms an expanded convolution layer by setting the expansion ratio to 4, each convolution layer in the 6 th Residual block forms an expanded convolution layer by setting the expansion ratio to 4, each convolution layer in the 7 th Residual block forms an expanded convolution layer by setting the expansion ratio to 2, each convolution layer in the 8 th Residual block forms an expanded convolution layer by setting the expansion ratio to 2, each convolution layer in the 9 th Residual block forms an expanded convolution layer by setting the expansion rate to be 1, and each convolution layer in the 10 th Residual block forms an expanded convolution layer by setting the expansion rate to be 1;
for an input layer, the input end of the input layer receives an R channel component, a G channel component and a B channel component of an original input image, and the output end of the input layer outputs the R channel component, the G channel component and the B channel component of the original input image to a hidden layer; wherein, the width of the original input image received by the input end of the input layer is required to be W, and the height of the original input image is required to be H;
for the 1st Residual block, its input end receives the R channel component, the G channel component and the B channel component of the original input image output by the output end of the input layer, and its output end outputs 32 feature maps, the set formed by these 32 feature maps being recorded as R1; wherein each feature map in R1 has a width of W and a height of H;
for the 2nd Residual block, its input end receives all the feature maps in R1, and its output end outputs 32 feature maps, the set formed by these 32 feature maps being recorded as R2; wherein each feature map in R2 has a width of W and a height of H;
for the 3rd Residual block, its input end receives all the feature maps in R2, and its output end outputs 64 feature maps, the set formed by these 64 feature maps being recorded as R3; wherein each feature map in R3 has a width of W and a height of H;
for the 4th Residual block, its input end receives all the feature maps in R3, and its output end outputs 64 feature maps, the set formed by these 64 feature maps being recorded as R4; wherein each feature map in R4 has a width of W and a height of H;
for the 5th Residual block, its input end receives all the feature maps in R4, and its output end outputs 128 feature maps, the set formed by these 128 feature maps being recorded as R5; wherein each feature map in R5 has a width of W and a height of H;
for the 6th Residual block, its input end receives all the feature maps in R5, and its output end outputs 128 feature maps, the set formed by these 128 feature maps being recorded as R6; wherein each feature map in R6 has a width of W and a height of H;
for the 7th Residual block, its input end receives all the feature maps in R6, and its output end outputs 64 feature maps, the set formed by these 64 feature maps being recorded as R7; wherein each feature map in R7 has a width of W and a height of H;
for the 8th Residual block, its input end receives all the feature maps in R7, and its output end outputs 64 feature maps, the set formed by these 64 feature maps being recorded as R8; wherein each feature map in R8 has a width of W and a height of H;
for the 9th Residual block, its input end receives all the feature maps in R8, and its output end outputs 32 feature maps, the set formed by these 32 feature maps being recorded as R9; wherein each feature map in R9 has a width of W and a height of H;
for the 10th Residual block, its input end receives all the feature maps in R9, and its output end outputs 32 feature maps, the set formed by these 32 feature maps being recorded as R10; wherein each feature map in R10 has a width of W and a height of H;
for the output layer, which consists of 1 convolution layer, its input end receives all the feature maps in R10, and its output end outputs 12 semantic segmentation prediction maps corresponding to the original input image; wherein each semantic segmentation prediction map has a width of W and a height of H;
step 1_3: taking each original road scene image in the training set as an original input image and inputting it into the convolutional neural network for training, obtaining the 12 semantic segmentation prediction maps corresponding to each original road scene image in the training set; the set formed by the 12 semantic segmentation prediction maps corresponding to {I_q(i,j)} is recorded as P_q;
step 1_4: calculating the loss function value between the set formed by the 12 semantic segmentation prediction maps corresponding to each original road scene image in the training set and the set formed by the 12 one-hot coded images obtained from the corresponding real semantic segmentation image; the loss function value between P_q and J_q is recorded as Loss_q;
step 1_5: repeatedly executing step 1_3 and step 1_4 for V times to obtain a convolutional neural network classification training model and Q × V loss function values; then finding the loss function value with the minimum value among the Q × V loss function values; then taking the weight vector and the bias term corresponding to the loss function value with the minimum value as the optimal weight vector and the optimal bias term of the convolutional neural network classification training model, recorded as Wbest and bbest respectively; wherein V > 1;
the test stage process comprises the following specific steps:
step 2_1: let {I^test(i',j')} denote the road scene image to be semantically segmented; wherein 1 ≤ i' ≤ W', 1 ≤ j' ≤ H', W' denotes the width of {I^test(i',j')}, H' denotes the height of {I^test(i',j')}, and I^test(i',j') denotes the pixel value of the pixel at coordinate position (i',j') in {I^test(i',j')};
step 2_2: inputting the R channel component, the G channel component and the B channel component of {I^test(i',j')} into the convolutional neural network classification training model, and making a prediction based on Wbest and bbest to obtain the predicted semantic segmentation image corresponding to {I^test(i',j')}, recorded as {I^pred(i',j')}; wherein I^pred(i',j') denotes the pixel value of the pixel at coordinate position (i',j') in {I^pred(i',j')}.
2. The road scene segmentation method based on residual error network and expansion convolution according to claim 1, characterized in that, in step 1_4, Loss_q is obtained using categorical cross entropy.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811293377.5A CN109635642A (en) | 2018-11-01 | 2018-11-01 | A kind of road scene dividing method based on residual error network and expansion convolution |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811293377.5A CN109635642A (en) | 2018-11-01 | 2018-11-01 | A kind of road scene dividing method based on residual error network and expansion convolution |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109635642A true CN109635642A (en) | 2019-04-16 |
Family
ID=66067090
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811293377.5A Withdrawn CN109635642A (en) | 2018-11-01 | 2018-11-01 | A kind of road scene dividing method based on residual error network and expansion convolution |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109635642A (en) |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110232721A (en) * | 2019-05-16 | 2019-09-13 | 福建自贸试验区厦门片区Manteia数据科技有限公司 | A kind of crisis organ delineates the training method and device of model automatically |
CN110276316A (en) * | 2019-06-26 | 2019-09-24 | 电子科技大学 | A kind of human body critical point detection method based on deep learning |
CN110287798A (en) * | 2019-05-27 | 2019-09-27 | 魏运 | Vector network pedestrian detection method based on characteristic module and context fusion |
CN110287932A (en) * | 2019-07-02 | 2019-09-27 | 中国科学院遥感与数字地球研究所 | Route denial information extraction based on the segmentation of deep learning image, semantic |
CN110490082A (en) * | 2019-07-23 | 2019-11-22 | 浙江科技学院 | A kind of road scene semantic segmentation method of effective integration neural network characteristics |
CN110728682A (en) * | 2019-09-09 | 2020-01-24 | 浙江科技学院 | Semantic segmentation method based on residual pyramid pooling neural network |
CN110782462A (en) * | 2019-10-30 | 2020-02-11 | 浙江科技学院 | Semantic segmentation method based on double-flow feature fusion |
CN110782458A (en) * | 2019-10-23 | 2020-02-11 | 浙江科技学院 | Object image 3D semantic prediction segmentation method of asymmetric coding network |
CN110991415A (en) * | 2019-12-21 | 2020-04-10 | 武汉中海庭数据技术有限公司 | Structural target high-precision segmentation method, electronic equipment and storage medium |
CN111401436A (en) * | 2020-03-13 | 2020-07-10 | 北京工商大学 | Streetscape image segmentation method fusing network and two-channel attention mechanism |
CN111507990A (en) * | 2020-04-20 | 2020-08-07 | 南京航空航天大学 | Tunnel surface defect segmentation method based on deep learning |
CN112529064A (en) * | 2020-12-03 | 2021-03-19 | 燕山大学 | Efficient real-time semantic segmentation method |
- 2018-11-01 CN CN201811293377.5A patent/CN109635642A/en not_active Withdrawn
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110232721A (en) * | 2019-05-16 | 2019-09-13 | 福建自贸试验区厦门片区Manteia数据科技有限公司 | A kind of crisis organ delineates the training method and device of model automatically |
CN110287798A (en) * | 2019-05-27 | 2019-09-27 | 魏运 | Vector network pedestrian detection method based on characteristic module and context fusion |
CN110276316B (en) * | 2019-06-26 | 2022-05-24 | 电子科技大学 | Human body key point detection method based on deep learning |
CN110276316A (en) * | 2019-06-26 | 2019-09-24 | 电子科技大学 | A kind of human body critical point detection method based on deep learning |
CN110287932A (en) * | 2019-07-02 | 2019-09-27 | 中国科学院遥感与数字地球研究所 | Route denial information extraction based on the segmentation of deep learning image, semantic |
CN110490082A (en) * | 2019-07-23 | 2019-11-22 | 浙江科技学院 | A kind of road scene semantic segmentation method of effective integration neural network characteristics |
CN110490082B (en) * | 2019-07-23 | 2022-04-05 | 浙江科技学院 | Road scene semantic segmentation method capable of effectively fusing neural network features |
CN110728682A (en) * | 2019-09-09 | 2020-01-24 | 浙江科技学院 | Semantic segmentation method based on residual pyramid pooling neural network |
CN110728682B (en) * | 2019-09-09 | 2022-03-29 | 浙江科技学院 | Semantic segmentation method based on residual pyramid pooling neural network |
CN110782458A (en) * | 2019-10-23 | 2020-02-11 | 浙江科技学院 | Object image 3D semantic prediction segmentation method of asymmetric coding network |
CN110782458B (en) * | 2019-10-23 | 2022-05-31 | 浙江科技学院 | Object image 3D semantic prediction segmentation method of asymmetric coding network |
CN110782462A (en) * | 2019-10-30 | 2020-02-11 | 浙江科技学院 | Semantic segmentation method based on double-flow feature fusion |
CN110782462B (en) * | 2019-10-30 | 2022-08-09 | 浙江科技学院 | Semantic segmentation method based on double-flow feature fusion |
CN110991415A (en) * | 2019-12-21 | 2020-04-10 | 武汉中海庭数据技术有限公司 | Structural target high-precision segmentation method, electronic equipment and storage medium |
CN111401436A (en) * | 2020-03-13 | 2020-07-10 | 北京工商大学 | Streetscape image segmentation method fusing network and two-channel attention mechanism |
CN111401436B (en) * | 2020-03-13 | 2023-04-18 | 中国科学院地理科学与资源研究所 | Streetscape image segmentation method fusing network and two-channel attention mechanism |
CN111507990B (en) * | 2020-04-20 | 2022-02-11 | 南京航空航天大学 | Tunnel surface defect segmentation method based on deep learning |
CN111507990A (en) * | 2020-04-20 | 2020-08-07 | 南京航空航天大学 | Tunnel surface defect segmentation method based on deep learning |
CN112529064A (en) * | 2020-12-03 | 2021-03-19 | 燕山大学 | Efficient real-time semantic segmentation method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109635642A (en) | A kind of road scene dividing method based on residual error network and expansion convolution | |
CN110782462B (en) | Semantic segmentation method based on double-flow feature fusion | |
CN110728682B (en) | Semantic segmentation method based on residual pyramid pooling neural network | |
CN111401436B (en) | Streetscape image segmentation method fusing network and two-channel attention mechanism | |
CN113469094A (en) | Multi-mode remote sensing data depth fusion-based earth surface coverage classification method | |
CN110929736B (en) | Multi-feature cascading RGB-D significance target detection method | |
CN112116030A (en) | Image classification method based on vector standardization and knowledge distillation | |
CN110490205B (en) | Road scene semantic segmentation method based on full-residual-error hole convolutional neural network | |
CN109635662B (en) | Road scene semantic segmentation method based on convolutional neural network | |
CN112396607A (en) | Streetscape image semantic segmentation method for deformable convolution fusion enhancement | |
CN112489164B (en) | Image coloring method based on improved depth separable convolutional neural network | |
CN109635763B (en) | Crowd density estimation method | |
CN113269787A (en) | Remote sensing image semantic segmentation method based on gating fusion | |
CN116797787B (en) | Remote sensing image semantic segmentation method based on cross-modal fusion and graph neural network | |
CN109446933B (en) | Road scene semantic segmentation method based on convolutional neural network | |
CN112560966B (en) | Polarized SAR image classification method, medium and equipment based on scattering map convolution network | |
CN109508639B (en) | Road scene semantic segmentation method based on multi-scale porous convolutional neural network | |
CN110647990A (en) | Cutting method of deep convolutional neural network model based on grey correlation analysis | |
CN113516133A (en) | Multi-modal image classification method and system | |
CN117237559A (en) | Digital twin city-oriented three-dimensional model data intelligent analysis method and system | |
CN116844004A (en) | Point cloud automatic semantic modeling method for digital twin scene | |
CN115222754A (en) | Mirror image segmentation method based on knowledge distillation and antagonistic learning | |
CN114581789A (en) | Hyperspectral image classification method and system | |
CN114565625A (en) | Mineral image segmentation method and device based on global features | |
CN112488117B (en) | Point cloud analysis method based on direction-induced convolution |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||
WW01 | Invention patent application withdrawn after publication ||
Application publication date: 20190416 |