CN110490205A - Road scene semantic segmentation method based on a fully residual dilated convolutional neural network - Google Patents
- Publication number
- CN110490205A CN110490205A CN201910664797.8A CN201910664797A CN110490205A CN 110490205 A CN110490205 A CN 110490205A CN 201910664797 A CN201910664797 A CN 201910664797A CN 110490205 A CN110490205 A CN 110490205A
- Authority
- CN
- China
- Prior art keywords
- neural network
- convolution kernel
- network block
- layer
- output end
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
Abstract
The invention discloses a road scene semantic segmentation method based on a fully residual dilated convolutional neural network. In the training stage, a fully residual dilated convolutional neural network is constructed, comprising an input layer, a hidden layer and an output layer; the hidden layer comprises 1 transition convolution block, 8 neural network blocks, 7 deconvolution blocks and 4 fusion layers. Each original road scene image in the training set is input into the fully residual dilated convolutional neural network for training, obtaining the 12 semantic segmentation prediction maps corresponding to each original road scene image. By computing the loss function value between the set formed by the 12 semantic segmentation prediction maps of each original road scene image and the set formed by the 12 one-hot-encoded images into which the corresponding true semantic segmentation image is processed, the trained fully residual dilated convolutional neural network model is obtained. In the test stage, prediction is performed using this trained model. The advantages of the method are its high segmentation accuracy and strong robustness.
Description
Technical field
The present invention relates to a deep-learning semantic segmentation method, and more particularly to a road scene semantic segmentation method based on a fully residual dilated convolutional neural network.
Background art
With the rise of the intelligent transportation industry, semantic segmentation has found more and more applications in intelligent transportation systems: traffic scene understanding, multi-target obstacle detection and vision-guided navigation can all be realized through semantic segmentation. At present, the most common traditional semantic segmentation methods are algorithms such as support vector machines and random forests. These classical machine learning methods concentrate mainly on binary classification tasks, detecting and recognizing particular objects such as road surface, vehicles and pedestrians, and generally must be realized through highly complex hand-crafted features.
Deep-learning semantic segmentation methods instead train directly at the pixel level, end-to-end: one only needs to feed the images of the training set into the model framework for training and obtain the corresponding model weights, after which the test set can be predicted. The strength of convolutional neural networks lies in the ability of their multi-layer structure to learn features automatically, at many levels of abstraction. Currently, deep-learning segmentation frameworks are mostly encoder-decoder architectures: during encoding, pooling layers progressively discard location information while abstract features are extracted; during decoding, location information is gradually recovered, and there are usually direct connections between decoder and encoder. Dilated convolution (atrous convolution), a method commonly used in segmentation tasks, abandons pooling layers and enlarges the receptive field through the dilation rate: a small dilation rate gives a small receptive field and learns local, specific features, while a larger dilation rate gives a larger receptive field and learns more abstract features, which are more robust to the size, position and orientation of objects.
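As an illustrative sketch (not part of the claimed method), the standard receptive-field arithmetic for dilated convolutions mentioned above can be written out in a few lines of Python: a k × k kernel with dilation rate d covers an effective extent of d·(k − 1) + 1, so stacking dilated layers grows the receptive field without any pooling.

```python
def effective_kernel(k, d):
    """Effective spatial extent of a k x k convolution with dilation rate d."""
    return d * (k - 1) + 1

def receptive_field(layers):
    """Receptive field of a stack of convolutions.
    layers: list of (kernel_size, dilation, stride) tuples, applied in order."""
    rf, jump = 1, 1
    for k, d, s in layers:
        rf += (effective_kernel(k, d) - 1) * jump  # each layer widens the field
        jump *= s                                  # stride compounds the spacing
    return rf

# A 3 x 3 kernel with dilation 2 (as used in this patent) acts like a 5 x 5 kernel.
print(effective_kernel(3, 2))
```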
Most existing road scene semantic segmentation methods use deep learning. Applying deep learning to road scene segmentation is simple and convenient and, more importantly, has greatly improved the accuracy of pixel-level classification of road scene images. However, current deep-learning methods mostly combine convolutional layers with pooling layers, and the feature maps obtained merely from pooling and plain convolution operations are monotonous and unrepresentative, so the feature information of the image is reduced; the resulting output is relatively coarse and the segmentation accuracy is low.
Summary of the invention
The technical problem to be solved by the invention is to provide a road scene semantic segmentation method based on a fully residual dilated convolutional neural network that has high segmentation accuracy and strong robustness.
The technical scheme adopted by the invention to solve the above technical problem is a road scene semantic segmentation method based on a fully residual dilated convolutional neural network, characterized by comprising two processes: a training stage and a test stage.
The specific steps of the training stage process are as follows:
Step 1_1: Select Q original road scene images together with the true semantic segmentation image corresponding to each original road scene image, and form the training set. Denote the q-th original road scene image in the training set as {Iq(i, j)}. Then, using one-hot encoding, process the true semantic segmentation image of each original road scene image in the training set into 12 one-hot-encoded images, and take the set of these 12 one-hot-encoded images. Here, the road scene images are RGB colour images; Q is a positive integer with Q ≥ 200; q is a positive integer with 1 ≤ q ≤ Q; 1 ≤ i ≤ W and 1 ≤ j ≤ H, where W denotes the width and H the height of {Iq(i, j)}; Iq(i, j) denotes the pixel value at coordinate position (i, j) in {Iq(i, j)}, and likewise the true semantic segmentation image has a pixel value at each coordinate position (i, j).
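The one-hot encoding of step 1_1 can be sketched in Python as follows (an illustrative sketch, not the patent's implementation; the 12-class road-scene label set is taken from the text above):

```python
NUM_CLASSES = 12  # number of semantic classes stated in the patent

def one_hot(label_image, num_classes=NUM_CLASSES):
    """Turn an H x W image of integer class labels (0..num_classes-1)
    into num_classes binary maps, one per class."""
    h, w = len(label_image), len(label_image[0])
    maps = [[[0] * w for _ in range(h)] for _ in range(num_classes)]
    for i in range(h):
        for j in range(w):
            maps[label_image[i][j]][i][j] = 1  # mark the true class at (i, j)
    return maps
```

Each true segmentation image thus yields 12 binary maps of the same width and height, matching the 12 prediction maps output by the network.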
Step 1_2: Construct the fully residual dilated convolutional neural network. It comprises an input layer, a hidden layer and an output layer; the hidden layer comprises 1 transition convolution block, 8 neural network blocks, 7 deconvolution blocks and 4 fusion layers.
For the input layer: its input receives the R channel component, G channel component and B channel component of an input image, and its output passes these three components to the hidden layer. The input image received by the input layer is required to have width W and height H.
For the hidden layer: the input of the transition convolution block is the input of the hidden layer and receives the R, G and B channel components output by the input layer; the output of the transition convolution block is 64 feature maps of width W and height H, whose set is denoted G1. The input of the 1st neural network block receives all feature maps in G1; its output is 128 feature maps of width W/2 and height H/2, whose set is denoted S1. The input of the 2nd neural network block receives all feature maps in S1; its output is 256 feature maps of width W/4 and height H/4, denoted S2. The input of the 3rd neural network block receives all feature maps in S2; its output is 512 feature maps of width W/8 and height H/8, denoted S3. The input of the 4th neural network block receives all feature maps in S3; its output is 1024 feature maps of width W/16 and height H/16, denoted S4. The input of the 1st deconvolution block receives all feature maps in S4; its output is 512 feature maps of width W/8 and height H/8, denoted F1. The input of the 5th neural network block receives all feature maps in S3; its output is 512 feature maps of width W/8 and height H/8, denoted S5. The input of the 1st fusion layer receives all feature maps in F1 and all feature maps in S5; after an element-wise addition fusion operation its output is 512 feature maps of width W/8 and height H/8, denoted A1. The input of the 2nd deconvolution block receives all feature maps in A1; its output is 256 feature maps of width W/4 and height H/4, denoted F2. The input of the 6th neural network block receives all feature maps in S2; its output is 256 feature maps of width W/4 and height H/4, denoted S6. The input of the 3rd deconvolution block receives all feature maps in S3; its output is 256 feature maps of width W/4 and height H/4, denoted F3. The input of the 2nd fusion layer receives all feature maps in F2, S6 and F3; after element-wise addition its output is 256 feature maps of width W/4 and height H/4, denoted A2. The input of the 4th deconvolution block receives all feature maps in A2; its output is 128 feature maps of width W/2 and height H/2, denoted F4. The input of the 7th neural network block receives all feature maps in S1; its output is 128 feature maps of width W/2 and height H/2, denoted S7. The input of the 5th deconvolution block receives all feature maps in S2; its output is 128 feature maps of width W/2 and height H/2, denoted F5. The input of the 3rd fusion layer receives all feature maps in F4, S7 and F5; after element-wise addition its output is 128 feature maps of width W/2 and height H/2, denoted A3. The input of the 6th deconvolution block receives all feature maps in A3; its output is 64 feature maps of width W and height H, denoted F6. The input of the 8th neural network block receives all feature maps in G1; its output is 64 feature maps of width W and height H, denoted S8. The input of the 7th deconvolution block receives all feature maps in S1; its output is 64 feature maps of width W and height H, denoted F7. The input of the 4th fusion layer receives all feature maps in F6, S8 and F7; after element-wise addition its output is 64 feature maps of width W and height H, denoted A4. The output of the 4th fusion layer is the output of the hidden layer.
For the output layer: its input receives all feature maps in A4, and its output is 12 feature maps of width W and height H, whose set is denoted O1.
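The channel counts and spatial sizes of all the stages above can be tabulated in a short Python sketch. The spatial sizes assume, consistently with the stride-2 convolutions specified below, that each encoder stage halves the width and height and each deconvolution block doubles them; the stage names are those used in the description.

```python
def hidden_layer_shapes(W, H):
    """(channels, width, height) of each stage of the hidden layer,
    for an input image of width W and height H (both divisible by 16)."""
    return {
        "G1": (64, W, H),                    # transition convolution block
        "S1": (128, W // 2, H // 2),         # 1st neural network block (stride 2)
        "S2": (256, W // 4, H // 4),         # 2nd neural network block
        "S3": (512, W // 8, H // 8),         # 3rd neural network block
        "S4": (1024, W // 16, H // 16),      # 4th neural network block
        "F1/S5/A1": (512, W // 8, H // 8),   # 1st fusion level
        "F2/S6/F3/A2": (256, W // 4, H // 4),# 2nd fusion level
        "F4/S7/F5/A3": (128, W // 2, H // 2),# 3rd fusion level
        "F6/S8/F7/A4": (64, W, H),           # 4th fusion level (hidden output)
        "O1": (12, W, H),                    # output layer: 12 class maps
    }
```

Note that every fusion layer adds feature maps element-wise, which is only well defined because the maps being fused have identical channel counts and spatial sizes, as the table shows.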
Step 1_3: Take each original road scene image in the training set as the input image, input it into the fully residual dilated convolutional neural network and train, obtaining the 12 semantic segmentation prediction maps corresponding to each original road scene image in the training set, including the set formed by the 12 semantic segmentation prediction maps corresponding to {Iq(i, j)}.
Step 1_4: Compute the loss function value between the set formed by the 12 semantic segmentation prediction maps corresponding to each original road scene image in the training set and the set formed by the 12 one-hot-encoded images into which the corresponding true semantic segmentation image was processed; this loss function value is obtained using the negative log-likelihood function.
Step 1_5: Repeat step 1_3 and step 1_4 a total of V times, obtaining the fully residual dilated convolutional neural network training model together with Q × V loss function values. Then find the smallest of these Q × V loss function values, and take the weight vector and bias term corresponding to that smallest loss function value as the best weight vector and best bias term of the trained model, denoted Wbest and bbest respectively; here V > 1.
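The per-image negative log-likelihood of steps 1_4 and 1_5 can be sketched as follows (an illustrative sketch using plain Python lists; in practice a framework loss such as a framework-provided NLL loss would be used). Given softmax probability maps and the one-hot target maps of step 1_1, the loss is the mean of −log p(true class) over all pixels:

```python
import math

def nll_loss(pred_probs, onehot_target):
    """Mean negative log-likelihood over all pixels.
    pred_probs: num_classes x H x W softmax outputs;
    onehot_target: matching one-hot maps (exactly one 1 per pixel)."""
    num_classes, h, w = len(pred_probs), len(pred_probs[0]), len(pred_probs[0][0])
    total, n = 0.0, 0
    for i in range(h):
        for j in range(w):
            for c in range(num_classes):
                if onehot_target[c][i][j] == 1:       # the true class at (i, j)
                    total -= math.log(pred_probs[c][i][j])
                    n += 1
    return total / n
```

Step 1_5 then simply keeps the weights from whichever of the Q × V evaluations produced the smallest such value.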
The specific steps of the test phase process are as follows:
Step 2_1: Let {I'(i', j')} denote the road scene image to be semantically segmented, where 1 ≤ i' ≤ W' and 1 ≤ j' ≤ H', W' denotes the width and H' the height of {I'(i', j')}, and I'(i', j') denotes the pixel value at coordinate position (i', j').
Step 2_2: Input the R channel component, G channel component and B channel component of {I'(i', j')} into the fully residual dilated convolutional neural network training model, and predict using Wbest and bbest, obtaining the prediction semantic segmentation image corresponding to {I'(i', j')}, whose value at each coordinate position (i', j') is the pixel value of the predicted segmentation at that position.
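Producing the single prediction semantic segmentation image of step 2_2 from the 12 class-score maps output by the network amounts to a per-pixel argmax; the following is an illustrative sketch (the patent does not spell out this step, so the argmax reading is an assumption consistent with the one-hot training targets):

```python
def predict_labels(score_maps):
    """Collapse num_classes x H x W class scores into an H x W label image
    by taking, at each pixel, the class with the largest score."""
    num_classes = len(score_maps)
    h, w = len(score_maps[0]), len(score_maps[0][0])
    return [[max(range(num_classes), key=lambda c: score_maps[c][i][j])
             for j in range(w)] for i in range(h)]
```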
In step 1_2, the transition convolution block consists, in order, of a first convolutional layer, a first batch normalization layer, a first activation layer, a second convolutional layer, a second batch normalization layer, a second activation layer, a third convolutional layer, a third batch normalization layer and a third activation layer. The input of the first convolutional layer is the input of the transition convolution block; the input of each subsequent layer receives all feature maps output by the layer immediately before it; and the output of the third activation layer is the output of the transition convolution block. The first, second and third convolutional layers all have convolution kernel size 3 × 3, 64 convolution kernels, zero-padding parameter "same" and stride 1; the activation mode of the first, second and third activation layers is "ReLU".
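With "same" zero padding, the output spatial size of each of these convolutions is simply the input size divided by the stride, rounded up (the Keras/TensorFlow convention that the "same"/stride parameter names above suggest; this reading is an assumption). A one-line Python helper makes the bookkeeping explicit:

```python
import math

def same_conv_out(size, stride):
    """Spatial output size of a convolution with 'same' zero padding:
    ceil(input_size / stride), independent of kernel size and dilation."""
    return math.ceil(size / stride)
```

This is why the stride-1 convolutions of the transition block keep the W × H size, while the stride-2 convolutions of the neural network blocks halve it.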
In step 1_2, the 1st to 4th neural network blocks have the same structure: each consists, in order, of a fourth convolutional layer, a first R-type neural network block and a first B-type neural network block. The input of the fourth convolutional layer is the input of the neural network block containing it; the input of the first R-type block receives all feature maps output by the fourth convolutional layer; the input of the first B-type block receives all feature maps output by the first R-type block; and the output of the first B-type block is the output of the neural network block. In all four blocks the fourth convolutional layer has kernel size 3 × 3, zero-padding parameter "same" and stride 2; its number of convolution kernels, which is also the number of kernels of the first R-type and first B-type blocks in the same neural network block, is 128 in the 1st neural network block, 256 in the 2nd, 512 in the 3rd and 1024 in the 4th.
The 5th to 8th neural network blocks have the same structure: each consists, in order, of a second R-type neural network block and a second B-type neural network block. The input of the second R-type block is the input of the neural network block containing it; the input of the second B-type block receives all feature maps output by the second R-type block; and the output of the second B-type block is the output of the neural network block. The number of convolution kernels of the second R-type and second B-type blocks is 512 in the 5th neural network block, 256 in the 6th, 128 in the 7th and 64 in the 8th.
The first R-type neural network block and the second R-type neural network block have the same structure: each consists, in order, of a fifth convolutional layer, a fourth batch normalization layer, a fourth activation layer, a first dilated convolutional layer, a fifth batch normalization layer, a fifth activation layer, a sixth convolutional layer, a sixth batch normalization layer and a sixth activation layer. The input of the fifth convolutional layer is the input of the R-type block, and the input of each subsequent layer receives all feature maps output by the layer immediately before it. All feature maps input to the fifth convolutional layer and all feature maps output by the sixth activation layer are combined through a skip (jump) connection, and the result is taken as the output of the R-type block. In every R-type block the fifth and sixth convolutional layers have kernel size 3 × 3, zero-padding parameter "same" and stride 1, and the first dilated convolutional layer has kernel size 3 × 3, zero-padding parameter "same", stride 1 and dilation rate 2. The number of convolution kernels of all three layers equals the kernel number of the neural network block containing the R-type block: 128 in the 1st neural network block, 256 in the 2nd, 512 in the 3rd, 1024 in the 4th, 512 in the 5th, 256 in the 6th, 128 in the 7th and 64 in the 8th. The activation mode of the fourth, fifth and sixth activation layers is "ReLU".
The first B-type neural network block and the second B-type neural network block have the same structure: each consists, in order, of a seventh convolutional layer, a seventh batch normalization layer, a seventh activation layer, a second dilated convolutional layer, an eighth batch normalization layer, an eighth activation layer, an eighth convolutional layer, a ninth batch normalization layer and a ninth activation layer. The input of the seventh convolutional layer is the input of the B-type block; the input of each subsequent layer receives all feature maps output by the layer immediately before it; and the output of the ninth activation layer is the output of the B-type block. In every B-type block the seventh and eighth convolutional layers have kernel size 3 × 3, zero-padding parameter "same" and stride 1, and the second dilated convolutional layer has kernel size 3 × 3, zero-padding parameter "same", stride 1 and dilation rate 2. The number of convolution kernels of all three layers equals the kernel number of the neural network block containing the B-type block: 128 in the 1st neural network block, 256 in the 2nd, 512 in the 3rd, 1024 in the 4th, 512 in the 5th, 256 in the 6th and 128 in the 7th. In the second B-type neural network block in the 8th neural network block, the convolution kernel size of the seventh and eighth convolutional layers is 3 × 3, and the convolution
It is " same ", step-length is 1 that core number, which is 64, zero padding parameter, and the convolution kernel size of the second empty convolutional layer is 3 × 3, volume
Product core number is 64, zero padding parameter is " same ", step-length 1, empty deconvolution parameter be 2;7th active coating, the 8th active coating and
The active mode of 9th active coating is " Relu ".
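The dilation parameter of 2 used throughout these B-type blocks enlarges the area a 3 × 3 kernel covers without adding weights. As an illustrative sketch (a standard formula, not code from the patent; the helper name is ours):

```python
def effective_kernel_size(k: int, dilation: int) -> int:
    """Spatial extent covered by a k x k kernel with the given dilation.

    With dilation d, adjacent kernel taps sit d pixels apart, so the
    kernel spans k + (k - 1) * (d - 1) pixels per side.
    """
    return k + (k - 1) * (dilation - 1)

# A 3x3 kernel with dilation 2, as in the second dilated convolutional
# layer above, covers a 5x5 area while still using only 9 weights.
span = effective_kernel_size(3, 2)
```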
In step 1_2, the 1st through 7th deconvolution blocks share the same structure, each consisting of, set in sequence, a deconvolution layer, a tenth batch-normalization layer and a tenth activation layer. The input of the deconvolution layer is the input of the deconvolution block to which it belongs, the input of the tenth batch-normalization layer receives all feature maps output by the deconvolution layer, the input of the tenth activation layer receives all feature maps output by the tenth batch-normalization layer, and the output of the tenth activation layer is the output of the deconvolution block to which it belongs. The deconvolution layer in the 1st deconvolution block has 512 kernels; the deconvolution layers in the 2nd and 3rd deconvolution blocks have 256 kernels; those in the 4th and 5th deconvolution blocks have 128 kernels; and those in the 6th and 7th deconvolution blocks have 64 kernels. In every case the kernel size is 4 × 4, the zero-padding parameter is "same" and the stride is 2. The activation function of the tenth activation layer is ReLU.
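Each deconvolution block therefore doubles the spatial size of its input: with "same" zero-padding, a stride-2 transposed convolution multiplies the width and height by the stride regardless of the 4 × 4 kernel. A small sketch of that output-size arithmetic (the helper name is ours, not from the patent):

```python
def deconv_output_size(in_size: int, stride: int = 2) -> int:
    """Output size of a transposed convolution with "same" padding.

    With "same" padding the transposed convolution simply multiplies
    the spatial size by the stride; the 4x4 kernel size drops out.
    """
    return in_size * stride

# e.g. the 6th deconvolution block maps width W/2 = 240 back to
# W = 480 for the embodiment's input width W = 480.
restored = deconv_output_size(240)
```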
In step 1_2, the 4 fusion layers are all Add fusion layers.
In step 1_2, the output layer consists of, set in sequence, a ninth convolutional layer, a tenth batch-normalization layer and an eleventh activation layer. The input of the ninth convolutional layer is the input of the output layer, the input of the tenth batch-normalization layer receives all feature maps output by the ninth convolutional layer, the input of the eleventh activation layer receives all feature maps output by the tenth batch-normalization layer, and the output of the eleventh activation layer is the output of the output layer. The ninth convolutional layer has a kernel size of 1 × 1, 12 kernels, a zero-padding parameter of "same" and a stride of 1; the activation function of the eleventh activation layer is ReLU.
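The 1 × 1 convolution of the output layer mixes the 64 input channels into 12 per-class score maps without changing width or height. A minimal NumPy sketch of a 1 × 1 convolution as a per-pixel matrix multiply (illustrative only; the weights here are random, not trained values from the method):

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1x1(feature_maps: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Apply a 1x1 convolution.

    feature_maps: (H, W, C_in); weights: (C_in, C_out).
    A 1x1 convolution is exactly a matrix multiply over the channel
    axis, so the spatial dimensions are preserved.
    """
    return feature_maps @ weights

features = rng.standard_normal((360, 480, 64))  # A4: 64 maps of H x W
w = rng.standard_normal((64, 12))               # 12 kernels of size 1x1
scores = conv1x1(features, w)                   # 12 class-score maps
```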
Compared with the prior art, the advantages of the present invention are as follows:
1) The method of the present invention constructs a fully residual dilated convolutional neural network in which convolutional layers with a stride of 2 replace the pooling layers commonly used at present. A pooling layer causes irreversible loss of image features, and semantic segmentation places very high demands on prediction accuracy; a stride-2 convolutional layer is therefore chosen instead, since it achieves the same size reduction as a pooling layer while effectively avoiding the irreversible information loss that pooling causes, so that image features are preserved without excessive loss.
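The size reduction that a stride-2 pooling layer would give is reproduced by the stride of the convolution itself. A hedged sketch of the "same"-padding output-size rule shared by both operations (the helper is ours, for illustration):

```python
import math

def same_padding_output_size(in_size: int, stride: int) -> int:
    """Spatial output size of a convolution or pooling with "same" padding.

    With "same" padding the kernel size drops out of the formula: the
    output is ceil(in_size / stride). A stride-2 convolution therefore
    halves the feature maps exactly as a stride-2 pooling layer would,
    while keeping learnable weights in place of a fixed max/average.
    """
    return math.ceil(in_size / stride)

halved = same_padding_output_size(480, 2)
```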
2) the method for the present invention expands network receptive field using empty convolutional layer, and due to pond layer the advantages of can more than have
Reduction image size is imitated, receptive field can be expanded effectively also to guarantee to extract more global informations, therefore be 2 with step-length
When convolutional layer substitutes pond layer, receptive field is not expanded effectively, has lost part global information, therefore empty convolution is added
Layer, to guarantee that network receptive field is constant or even increases, empty convolutional layer is combined with the convolutional layer that step-length is 2, it is ensured that complete
Residual error cavity convolutional neural networks extract most local feature and global characteristics.
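This interplay, where stride-2 convolutions shrink the maps and dilated convolutions widen the view, can be made concrete with the standard receptive-field recurrence. The two-layer chain below is a simplified stand-in, not the network's exact configuration:

```python
def receptive_field(layers) -> int:
    """Receptive field after a chain of convolutional layers.

    layers: iterable of (kernel_size, stride, dilation) triples.
    The field grows by (effective_kernel - 1) * jump at each layer,
    where jump is the product of all earlier strides.
    """
    rf, jump = 1, 1
    for k, s, d in layers:
        k_eff = k + (k - 1) * (d - 1)
        rf += (k_eff - 1) * jump
        jump *= s
    return rf

# A stride-2 3x3 conv followed by a 3x3 conv with dilation 2 sees more
# input pixels than the same pair without dilation.
with_dilation = receptive_field([(3, 2, 1), (3, 1, 2)])
without_dilation = receptive_field([(3, 2, 1), (3, 1, 1)])
```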
3) the method for the present invention uses jump when building Complete Disability difference cavity convolutional neural networks and is connected to main company
Mode is connect, to constitute Complete Disability difference network, residual error network has always very outstanding performance on semantic segmentation direction, therefore at this
Jump connection is added in inventive method, it can be with the loss of effective compensation image in an encoding process, to guarantee last prediction essence
Degree, and advanced features and low-level features have preferably been merged in connection of jumping, and avoid gradient disappearance or gradient explosion, thus
Improve the robustness of the empty convolutional neural networks training pattern of Complete Disability difference.
Description of the drawings
Fig. 1 is a schematic diagram of the architecture of the fully residual dilated convolutional neural network constructed in the method of the present invention;
Fig. 2a is the 1st original road scene image of the same scene;
Fig. 2b is the predicted semantic-segmentation image obtained by predicting the original road scene image shown in Fig. 2a with the method of the present invention;
Fig. 3a is the 2nd original road scene image of the same scene;
Fig. 3b is the predicted semantic-segmentation image obtained by predicting the original road scene image shown in Fig. 3a with the method of the present invention;
Fig. 4a is the 3rd original road scene image of the same scene;
Fig. 4b is the predicted semantic-segmentation image obtained by predicting the original road scene image shown in Fig. 4a with the method of the present invention;
Fig. 5a is the 4th original road scene image of the same scene;
Fig. 5b is the predicted semantic-segmentation image obtained by predicting the original road scene image shown in Fig. 5a with the method of the present invention.
Specific embodiment
The present invention is described in further detail below with reference to the drawings and embodiments.
The road scene semantic segmentation method based on a fully residual dilated convolutional neural network proposed by the present invention comprises two processes: a training stage and a test stage.
The specific steps of the training-stage process are as follows:
Step 1_1: Choose Q original road scene images and the true semantic-segmentation image corresponding to each original road scene image, and compose a training set; denote the q-th original road scene image in the training set as {I_q(i, j)}, with the true semantic-segmentation image in the training set corresponding to {I_q(i, j)} denoted accordingly. Then use the existing one-hot encoding technique (one-hot) to process the true semantic-segmentation image corresponding to each original road scene image in the training set into 12 one-hot-encoded images, and record the set formed by those 12 one-hot-encoded images. Here, the road scene images are RGB color images, Q is a positive integer with Q ≥ 200 (for example Q = 367), q is a positive integer with 1 ≤ q ≤ Q, 1 ≤ i ≤ W and 1 ≤ j ≤ H, W denotes the width of {I_q(i, j)} and H denotes its height (for example W = 480 and H = 360), I_q(i, j) denotes the pixel value of the pixel whose coordinate position in {I_q(i, j)} is (i, j), and the pixel value at coordinate position (i, j) in the corresponding true semantic-segmentation image is defined analogously.
Here, the original road scene images are 367 images taken directly from the training set of the road scene image database CamVid.
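The one-hot step turns each label map into 12 binary images, one per class. A minimal NumPy version (the 12-class count follows the text; the tiny label values used here are illustrative):

```python
import numpy as np

def one_hot_encode(label_map: np.ndarray, num_classes: int = 12) -> np.ndarray:
    """Convert an (H, W) integer label map into num_classes binary images.

    Output shape is (H, W, num_classes); exactly one channel is 1 at
    each pixel, the channel matching that pixel's class index.
    """
    return (label_map[..., None] == np.arange(num_classes)).astype(np.uint8)

labels = np.array([[0, 3],
                   [11, 3]])          # tiny 2x2 "segmentation image"
encoded = one_hot_encode(labels)      # 12 binary images, shape (2, 2, 12)
```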
Step 1_2: Construct the fully residual dilated convolutional neural network. As shown in Fig. 1, the fully residual dilated convolutional neural network comprises an input layer, a hidden layer and an output layer; the hidden layer comprises 1 transition convolution block, 8 neural network blocks, 7 deconvolution blocks and 4 fusion layers.
For the input layer, its input receives the R, G and B channel components of an input image, and its output passes the R, G and B channel components of the input image to the hidden layer; the input image received at the input of the input layer is required to have width W and height H.
For the hidden layer: the input of the transition convolution block is the input of the hidden layer and receives the R, G and B channel components of the input image output by the input layer; the output of the transition convolution block is 64 feature maps of width W and height H, and the set of these 64 feature maps is denoted G1. The input of the 1st neural network block receives all feature maps in G1, and its output is 128 feature maps of width W/2 and height H/2, whose set is denoted S1. The input of the 2nd neural network block receives all feature maps in S1, and its output is 256 feature maps of width W/4 and height H/4, whose set is denoted S2. The input of the 3rd neural network block receives all feature maps in S2, and its output is 512 feature maps of width W/8 and height H/8, whose set is denoted S3. The input of the 4th neural network block receives all feature maps in S3, and its output is 1024 feature maps of width W/16 and height H/16, whose set is denoted S4. The input of the 1st deconvolution block receives all feature maps in S4, and its output is 512 feature maps of width W/8 and height H/8, whose set is denoted F1. The input of the 5th neural network block receives all feature maps in S3, and its output is 512 feature maps of width W/8 and height H/8, whose set is denoted S5. The input of the 1st fusion layer receives all feature maps in F1 and all feature maps in S5; after the addition fusion operation, the output of the 1st fusion layer is 512 feature maps of width W/8 and height H/8, whose set is denoted A1. The input of the 2nd deconvolution block receives all feature maps in A1, and its output is 256 feature maps of width W/4 and height H/4, whose set is denoted F2. The input of the 6th neural network block receives all feature maps in S2, and its output is 256 feature maps of width W/4 and height H/4, whose set is denoted S6. The input of the 3rd deconvolution block receives all feature maps in S3, and its output is 256 feature maps of width W/4 and height H/4, whose set is denoted F3. The input of the 2nd fusion layer receives all feature maps in F2, S6 and F3; after the addition fusion operation, the output of the 2nd fusion layer is 256 feature maps of width W/4 and height H/4, whose set is denoted A2. The input of the 4th deconvolution block receives all feature maps in A2, and its output is 128 feature maps of width W/2 and height H/2, whose set is denoted F4. The input of the 7th neural network block receives all feature maps in S1, and its output is 128 feature maps of width W/2 and height H/2, whose set is denoted S7. The input of the 5th deconvolution block receives all feature maps in S2, and its output is 128 feature maps of width W/2 and height H/2, whose set is denoted F5. The input of the 3rd fusion layer receives all feature maps in F4, S7 and F5; after the addition fusion operation, the output of the 3rd fusion layer is 128 feature maps of width W/2 and height H/2, whose set is denoted A3. The input of the 6th deconvolution block receives all feature maps in A3, and its output is 64 feature maps of width W and height H, whose set is denoted F6. The input of the 8th neural network block receives all feature maps in G1, and its output is 64 feature maps of width W and height H, whose set is denoted S8. The input of the 7th deconvolution block receives all feature maps in S1, and its output is 64 feature maps of width W and height H, whose set is denoted F7. The input of the 4th fusion layer receives all feature maps in F6, S8 and F7; after the addition fusion operation, the output of the 4th fusion layer is 64 feature maps of width W and height H, whose set is denoted A4; the output of the 4th fusion layer is the output of the hidden layer.
For the output layer, its input receives all feature maps in A4, and its output is 12 feature maps of width W and height H, whose set is denoted O1.
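The encoder half of this hidden layer doubles the channel count while halving width and height at each of the first four neural network blocks; the decoder half reverses this. A small sketch tracking the nominal shapes for a 480 × 360 input (odd sizes are rounded up, an assumption consistent with "same" padding; the function is illustrative, not part of the method):

```python
import math

def encoder_shapes(width, height, base_channels=64, num_blocks=4):
    """Nominal (channels, width, height) after the transition block (G1)
    and each of the 4 downsampling neural network blocks (S1..S4)."""
    c, w, h = base_channels, width, height
    shapes = [(c, w, h)]                         # G1
    for _ in range(num_blocks):
        c, w, h = c * 2, math.ceil(w / 2), math.ceil(h / 2)
        shapes.append((c, w, h))                 # S1..S4
    return shapes

chain = encoder_shapes(480, 360)
```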
Step 1_3: Take each original road scene image in the training set as the input image, feed it into the fully residual dilated convolutional neural network, and train; this yields 12 semantic-segmentation prediction maps for each original road scene image in the training set, and the set formed by the 12 semantic-segmentation prediction maps corresponding to {I_q(i, j)} is recorded.
Step 1_4: For each original road scene image in the training set, compute the loss function value between the set formed by its 12 semantic-segmentation prediction maps and the set of 12 one-hot-encoded images into which its true semantic-segmentation image was processed; this loss function value is obtained with the negative log-likelihood (NLLLoss) function.
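With one-hot targets, the negative log-likelihood reduces to the negative log of the predicted probability at each pixel's true class. A hedged NumPy sketch (framework NLLLoss implementations expect log-probabilities; the mean-over-pixels convention here is an assumption, not stated in the text):

```python
import numpy as np

def nll_loss(log_probs: np.ndarray, one_hot_target: np.ndarray) -> float:
    """Mean negative log-likelihood over all pixels.

    log_probs, one_hot_target: (H, W, num_classes). The one-hot target
    selects the log-probability of each pixel's true class.
    """
    picked = (log_probs * one_hot_target).sum(axis=-1)
    return float(-picked.mean())

# Two pixels, three classes, perfectly confident and correct
# predictions give zero loss.
probs = np.array([[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]])
target = np.array([[[1, 0, 0], [0, 1, 0]]])
loss = nll_loss(np.log(np.clip(probs, 1e-9, 1.0)), target)
```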
Step 1_5: Repeat step 1_3 and step 1_4 a total of V times to obtain a trained model of the fully residual dilated convolutional neural network along with Q × V loss function values; then find the smallest of these Q × V loss function values; then take the weight vector and bias term corresponding to that smallest loss function value as the best weight vector and best bias term of the trained model of the fully residual dilated convolutional neural network, denoted correspondingly W_best and b_best. Here V > 1; in the present embodiment V = 500.
The specific steps of the test-stage process are as follows:
Step 2_1: Let a road scene image to be semantically segmented be given, with 1 ≤ i' ≤ W' and 1 ≤ j' ≤ H', where W' denotes its width and H' denotes its height, and its pixel value at coordinate position (i', j') is defined as before.
Step 2_2: Feed the R, G and B channel components of the road scene image to be segmented into the trained model of the fully residual dilated convolutional neural network, predict using W_best and b_best, and obtain the corresponding predicted semantic-segmentation image, whose pixel value at coordinate position (i', j') is defined accordingly.
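The 12 class-score maps produced by the trained model are collapsed into one predicted segmentation image by taking, at every pixel, the class with the highest score. An illustrative NumPy sketch (the final color-coding of the prediction image is not shown):

```python
import numpy as np

def predict_labels(class_scores: np.ndarray) -> np.ndarray:
    """Collapse (H, W, 12) class-score maps into an (H, W) label image."""
    return class_scores.argmax(axis=-1)

scores = np.zeros((2, 2, 12))
scores[0, 0, 5] = 1.0     # pixel (0, 0) scores highest for class 5
scores[1, 1, 11] = 2.0    # pixel (1, 1) scores highest for class 11
labels = predict_labels(scores)
```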
In this particular embodiment, in step 1_2, the transition convolution block consists of, set in sequence, a first convolutional layer (Convolution, Conv), a first batch-normalization layer (Batch Normalization, BN), a first activation layer (Activation, Act), a second convolutional layer, a second batch-normalization layer, a second activation layer, a third convolutional layer, a third batch-normalization layer and a third activation layer. The input of the first convolutional layer is the input of the transition convolution block; the input of the first batch-normalization layer receives all feature maps output by the first convolutional layer; the input of the first activation layer receives all feature maps output by the first batch-normalization layer; the input of the second convolutional layer receives all feature maps output by the first activation layer; the input of the second batch-normalization layer receives all feature maps output by the second convolutional layer; the input of the second activation layer receives all feature maps output by the second batch-normalization layer; the input of the third convolutional layer receives all feature maps output by the second activation layer; the input of the third batch-normalization layer receives all feature maps output by the third convolutional layer; the input of the third activation layer receives all feature maps output by the third batch-normalization layer; and the output of the third activation layer is the output of the transition convolution block. The first, second and third convolutional layers all have a kernel size (kernel_size) of 3 × 3, 64 kernels (filters), a zero-padding (padding) parameter of "same" and a stride (stride) of 1; the activation function of the first, second and third activation layers is ReLU.
In this particular embodiment, in step 1_2, the 1st through 4th neural network blocks share the same structure, each consisting of, set in sequence, a fourth convolutional layer, a first R-type neural network block and a first B-type neural network block. The input of the fourth convolutional layer is the input of the neural network block to which it belongs; the input of the first R-type neural network block receives all feature maps output by the fourth convolutional layer; the input of the first B-type neural network block receives all feature maps output by the first R-type neural network block; and the output of the first B-type neural network block is the output of the neural network block to which it belongs. The fourth convolutional layer in the 1st neural network block has a kernel size of 3 × 3, 128 kernels, a zero-padding parameter of "same" and a stride of 2, and the first R-type and first B-type neural network blocks in the 1st neural network block have 128 kernels each; the fourth convolutional layer in the 2nd neural network block has a kernel size of 3 × 3, 256 kernels, a zero-padding parameter of "same" and a stride of 2, and the first R-type and first B-type neural network blocks in the 2nd neural network block have 256 kernels each; the fourth convolutional layer in the 3rd neural network block has a kernel size of 3 × 3, 512 kernels, a zero-padding parameter of "same" and a stride of 2, and the first R-type and first B-type neural network blocks in the 3rd neural network block have 512 kernels each; the fourth convolutional layer in the 4th neural network block has a kernel size of 3 × 3, 1024 kernels, a zero-padding parameter of "same" and a stride of 2, and the first R-type and first B-type neural network blocks in the 4th neural network block have 1024 kernels each.
In this particular embodiment, the 5th through 8th neural network blocks share the same structure, each consisting of, set in sequence, a second R-type neural network block and a second B-type neural network block. The input of the second R-type neural network block is the input of the neural network block to which it belongs; the input of the second B-type neural network block receives all feature maps output by the second R-type neural network block; and the output of the second B-type neural network block is the output of the neural network block to which it belongs. The second R-type and second B-type neural network blocks have 512 kernels each in the 5th neural network block, 256 kernels each in the 6th neural network block, 128 kernels each in the 7th neural network block, and 64 kernels each in the 8th neural network block.
In this particular embodiment, the first R-type neural network block and the second R-type neural network block share the same structure, each consisting of, set in sequence, a fifth convolutional layer, a fourth batch-normalization layer, a fourth activation layer, a first dilated convolutional layer, a fifth batch-normalization layer, a fifth activation layer, a sixth convolutional layer, a sixth batch-normalization layer and a sixth activation layer. The input of the fifth convolutional layer is the input of the R-type neural network block to which it belongs; the input of the fourth batch-normalization layer receives all feature maps output by the fifth convolutional layer; the input of the fourth activation layer receives all feature maps output by the fourth batch-normalization layer; the input of the first dilated convolutional layer receives all feature maps output by the fourth activation layer; the input of the fifth batch-normalization layer receives all feature maps output by the first dilated convolutional layer; the input of the fifth activation layer receives all feature maps output by the fifth batch-normalization layer; the input of the sixth convolutional layer receives all feature maps output by the fifth activation layer; the input of the sixth batch-normalization layer receives all feature maps output by the sixth convolutional layer; and the input of the sixth activation layer receives all feature maps output by the sixth batch-normalization layer. All feature maps fed into the fifth convolutional layer and all feature maps output by the sixth activation layer are combined by a skip connection, and the result is output as all feature maps at the output of the R-type neural network block to which they belong. In the first R-type neural network block of the 1st neural network block, the fifth and sixth convolutional layers both have a kernel size of 3 × 3, 128 kernels, a zero-padding parameter of "same" and a stride of 1, and the first dilated convolutional layer has a kernel size of 3 × 3, 128 kernels, a zero-padding parameter of "same", a stride of 1 and a dilation (dilation) parameter of 2. The corresponding numbers of kernels are 256 in the first R-type neural network block of the 2nd neural network block, 512 in that of the 3rd neural network block, and 1024 in that of the 4th neural network block; they are 512 in the second R-type neural network block of the 5th neural network block, 256 in that of the 6th neural network block, 128 in that of the 7th neural network block, and 64 in that of the 8th neural network block. In every case the kernel size is 3 × 3, the zero-padding parameter is "same", the stride is 1, and the dilation parameter of the first dilated convolutional layer is 2. The activation function of the fourth, fifth and sixth activation layers is ReLU.
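Because every convolution inside an R-type block keeps a stride of 1 with "same" padding and a kernel count equal to the block's input channels, the block's input can be added element-wise to its output, which is the skip connection just described. A toy NumPy sketch (the `transform` callable is a stand-in for the conv/BN/activation stack, not the trained layers):

```python
import numpy as np

def r_type_block(x: np.ndarray, transform) -> np.ndarray:
    """Residual block: output = transform(x) + x.

    The element-wise add is valid only because transform preserves the
    shape, as the stride-1 "same"-padded convolutions of the R-type
    blocks do.
    """
    fx = transform(x)
    assert fx.shape == x.shape, "skip add requires matching shapes"
    return fx + x

x = np.ones((4, 4, 8))                      # 8 feature maps of size 4x4
out = r_type_block(x, lambda t: 0.5 * t)    # stand-in transform
```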
In this particular embodiment, the first B-type neural network block and the second B-type neural network block have the same structure, each consisting of a seventh convolutional layer, a seventh batch normalization layer, a seventh activation layer, a second dilated convolutional layer, an eighth batch normalization layer, an eighth activation layer, an eighth convolutional layer, a ninth batch normalization layer and a ninth activation layer, arranged in sequence. The input end of the seventh convolutional layer is the input end of the B-type neural network block to which it belongs; each subsequent layer receives all feature maps output by the output end of the layer before it; and the output end of the ninth activation layer is the output end of the B-type neural network block to which it belongs. In every B-type neural network block, the seventh convolutional layer, the eighth convolutional layer and the second dilated convolutional layer all have a convolution kernel size of 3 × 3, a zero-padding parameter of "same" and a stride of 1, and the second dilated convolutional layer additionally has a dilation rate of 2. The number of convolution kernels of these three layers is 128 in the first B-type neural network block in the 1st neural network block, 256 in the 2nd, 512 in the 3rd and 1024 in the 4th; for the second B-type neural network block, it is 512 in the 5th neural network block, 256 in the 6th, 128 in the 7th and 64 in the 8th. The activation function of the seventh activation layer, the eighth activation layer and the ninth activation layer is "ReLU".
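The B-type block differs from the R-type block only in lacking the skip connection; a minimal PyTorch sketch (function name is illustrative, not from the patent):

```python
import torch
import torch.nn as nn

def b_type_block(channels, dilation=2):
    """Plain (non-residual) block: conv-BN-ReLU, dilated conv-BN-ReLU, conv-BN-ReLU,
    all with stride 1 and 'same' padding, so the feature-map shape is unchanged."""
    return nn.Sequential(
        nn.Conv2d(channels, channels, 3, padding=1),                        # 7th conv
        nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
        nn.Conv2d(channels, channels, 3, padding=dilation, dilation=dilation),  # 2nd dilated conv
        nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
        nn.Conv2d(channels, channels, 3, padding=1),                        # 8th conv
        nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
    )

blk = b_type_block(128)
y = blk(torch.randn(1, 128, 16, 16))
print(y.shape)  # torch.Size([1, 128, 16, 16])
```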
In this particular embodiment, in step 1_2, the 1st to the 7th deconvolution blocks have the same structure, each consisting of a deconvolution layer, a tenth batch normalization layer and a tenth activation layer, arranged in sequence. The input end of the deconvolution layer is the input end of the deconvolution block to which it belongs; the input end of the tenth batch normalization layer receives all feature maps output by the output end of the deconvolution layer; the input end of the tenth activation layer receives all feature maps output by the output end of the tenth batch normalization layer; and the output end of the tenth activation layer is the output end of the deconvolution block to which it belongs. The deconvolution layer in the 1st deconvolution block has a convolution kernel size of 4 × 4, 512 convolution kernels, a zero-padding parameter of "same" and a stride of 2; the deconvolution layers in the 2nd and 3rd deconvolution blocks have a convolution kernel size of 4 × 4, 256 convolution kernels, a zero-padding parameter of "same" and a stride of 2; the deconvolution layers in the 4th and 5th deconvolution blocks have a convolution kernel size of 4 × 4, 128 convolution kernels, a zero-padding parameter of "same" and a stride of 2; and the deconvolution layers in the 6th and 7th deconvolution blocks have a convolution kernel size of 4 × 4, 64 convolution kernels, a zero-padding parameter of "same" and a stride of 2. The activation function of the tenth activation layer is "ReLU".
In this particular embodiment, in step 1_2, the 4 fusion layers are all Add fusion layers (element-wise addition).
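Since Add fusion is element-wise, the fused feature-map sets must agree in channel count, width and height; a minimal sketch (the tensor shapes pick W = 480, H = 360 purely for illustration):

```python
import torch

# Element-wise "Add" fusion of three same-shaped feature-map sets,
# as in the 2nd fusion layer (F2, S6, F3 -> A2), each 256 x H/4 x W/4.
f2 = torch.randn(1, 256, 90, 120)
s6 = torch.randn(1, 256, 90, 120)
f3 = torch.randn(1, 256, 90, 120)
a2 = f2 + s6 + f3  # channel count and spatial size are unchanged by fusion
print(a2.shape)  # torch.Size([1, 256, 90, 120])
```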
In this particular embodiment, in step 1_2, the output layer consists of a ninth convolutional layer, an eleventh batch normalization layer and an eleventh activation layer, arranged in sequence. The input end of the ninth convolutional layer is the input end of the output layer; the input end of the eleventh batch normalization layer receives all feature maps output by the output end of the ninth convolutional layer; the input end of the eleventh activation layer receives all feature maps output by the output end of the eleventh batch normalization layer; and the output end of the eleventh activation layer is the output end of the output layer. The ninth convolutional layer has a convolution kernel size of 1 × 1, 12 convolution kernels, a zero-padding parameter of "same" and a stride of 1, and the activation function of the eleventh activation layer is "ReLU".
In order to further verify the feasibility and validity of the method of the present invention, experiments were carried out.

The fully residual dilated convolutional neural network architecture was built with the Python-based deep learning framework PyTorch 0.4.1. The test set of the road scene image database CamVid (233 road scene images) was used to analyze how well the method of the present invention segments predicted road scene images. Here, 3 objective parameters commonly used to assess semantic segmentation methods were adopted as evaluation indices: class accuracy (CA), mean pixel accuracy (Mean Pixel Accuracy, MPA), and the ratio of the intersection to the union of the segmented image and the label image (Mean Intersection over Union, MIoU), which together evaluate the segmentation performance of the predicted semantic segmentation images.
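These indices can all be derived from a per-class confusion matrix; a minimal NumPy sketch (the function name and the tiny 2 × 2 label maps are illustrative, not from the patent):

```python
import numpy as np

def segmentation_metrics(pred, gt, num_classes):
    """Per-class accuracy (CA), mean pixel accuracy (MPA) and mean IoU (MIoU)
    from two integer label maps of identical shape."""
    conf = np.zeros((num_classes, num_classes), dtype=np.int64)
    for p, g in zip(pred.ravel(), gt.ravel()):
        conf[g, p] += 1                                  # rows: ground truth, cols: prediction
    tp = np.diag(conf).astype(float)
    ca = tp / np.maximum(conf.sum(axis=1), 1)            # correct pixels per ground-truth class
    iou = tp / np.maximum(conf.sum(axis=1) + conf.sum(axis=0) - tp, 1)
    return ca, ca.mean(), iou.mean()

pred = np.array([[0, 1], [1, 1]])
gt   = np.array([[0, 1], [0, 1]])
ca, mpa, miou = segmentation_metrics(pred, gt, 2)
print(ca, mpa, miou)  # [0.5 1. ] 0.75 0.5833333333333333
```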
Using the method for the present invention to every width road scene image in road scene image database CamVid test set into
Row prediction, obtains the corresponding prediction semantic segmentation image of every width road scene image, reflects the semantic segmentation effect of the method for the present invention
Class accuracy CA, mean pixel accuracy rate MPA, segmented image and the label image intersection of fruit and the ratio MIoU such as table 1 of union
It is listed.The data listed by the table 1 are it is found that the segmentation result of the road scene image obtained by the method for the present invention is preferable, table
The bright corresponding prediction semantic segmentation image of road scene image that obtained using the method for the present invention is feasible and effective.
Prediction result of the table 1 using the method for the present invention on test set
Fig. 2a shows the 1st original road scene image of the same scene; Fig. 2b shows the predicted semantic segmentation image obtained by predicting the original road scene image shown in Fig. 2a with the method of the present invention; Fig. 3a shows the 2nd original road scene image of the same scene; Fig. 3b shows the predicted semantic segmentation image obtained by predicting the original road scene image shown in Fig. 3a with the method of the present invention; Fig. 4a shows the 3rd original road scene image of the same scene; Fig. 4b shows the predicted semantic segmentation image obtained by predicting the original road scene image shown in Fig. 4a with the method of the present invention; Fig. 5a shows the 4th original road scene image of the same scene; Fig. 5b shows the predicted semantic segmentation image obtained by predicting the original road scene image shown in Fig. 5a with the method of the present invention. Comparing Fig. 2a with Fig. 2b, Fig. 3a with Fig. 3b, Fig. 4a with Fig. 4b, and Fig. 5a with Fig. 5b, it can be seen that the predicted semantic segmentation images obtained with the method of the present invention have high segmentation precision.
Claims (8)
1. A road scene semantic segmentation method based on a fully residual dilated convolutional neural network, characterized by comprising two processes, a training stage and a test stage;
The specific steps of the training stage process are as follows:
Step 1_1: choose Q original road scene images and the true semantic segmentation image corresponding to each original road scene image, and form a training set; denote the q-th original road scene image in the training set as {Iq(i, j)}, and denote the true semantic segmentation image in the training set corresponding to {Iq(i, j)} as {Iq^gt(i, j)}; then use the one-hot encoding technique to process the true semantic segmentation image corresponding to each original road scene image in the training set into 12 one-hot encoded images, and denote the set of the 12 one-hot encoded images processed from {Iq^gt(i, j)} as Jq; wherein the road scene images are RGB color images, Q is a positive integer with Q ≥ 200, q is a positive integer with 1 ≤ q ≤ Q, 1 ≤ i ≤ W, 1 ≤ j ≤ H, W denotes the width of {Iq(i, j)}, H denotes the height of {Iq(i, j)}, Iq(i, j) denotes the pixel value of the pixel with coordinate position (i, j) in {Iq(i, j)}, and Iq^gt(i, j) denotes the pixel value of the pixel with coordinate position (i, j) in {Iq^gt(i, j)};
Step 1_2: build a fully residual dilated convolutional neural network: the fully residual dilated convolutional neural network comprises an input layer, a hidden layer and an output layer; the hidden layer comprises 1 transition convolution block, 8 neural network blocks, 7 deconvolution blocks and 4 fusion layers;
For the input layer, its input end receives the R channel component, G channel component and B channel component of an input image, and its output end outputs the R channel component, G channel component and B channel component of the input image to the hidden layer; wherein the input image received at the input end of the input layer is required to have width W and height H;
For the hidden layer, the input end of the transition convolution block is the input end of the hidden layer and receives the R channel component, G channel component and B channel component of the input image output by the output end of the input layer; the output end of the transition convolution block outputs 64 feature maps of width W and height H, and the set of these 64 feature maps is denoted as G1; the input end of the 1st neural network block receives all feature maps in G1, and its output end outputs 128 feature maps of width W/2 and height H/2, the set of which is denoted as S1; the input end of the 2nd neural network block receives all feature maps in S1, and its output end outputs 256 feature maps of width W/4 and height H/4, the set of which is denoted as S2; the input end of the 3rd neural network block receives all feature maps in S2, and its output end outputs 512 feature maps of width W/8 and height H/8, the set of which is denoted as S3; the input end of the 4th neural network block receives all feature maps in S3, and its output end outputs 1024 feature maps of width W/16 and height H/16, the set of which is denoted as S4; the input end of the 1st deconvolution block receives all feature maps in S4, and its output end outputs 512 feature maps of width W/8 and height H/8, the set of which is denoted as F1; the input end of the 5th neural network block receives all feature maps in S3, and its output end outputs 512 feature maps of width W/8 and height H/8, the set of which is denoted as S5; the input end of the 1st fusion layer receives all feature maps in F1 and all feature maps in S5, and after the addition fusion operation the output end of the 1st fusion layer outputs 512 feature maps of width W/8 and height H/8, the set of which is denoted as A1; the input end of the 2nd deconvolution block receives all feature maps in A1, and its output end outputs 256 feature maps of width W/4 and height H/4, the set of which is denoted as F2; the input end of the 6th neural network block receives all feature maps in S2, and its output end outputs 256 feature maps of width W/4 and height H/4, the set of which is denoted as S6; the input end of the 3rd deconvolution block receives all feature maps in S3, and its output end outputs 256 feature maps of width W/4 and height H/4, the set of which is denoted as F3; the input end of the 2nd fusion layer receives all feature maps in F2, all feature maps in S6 and all feature maps in F3, and after the addition fusion operation the output end of the 2nd fusion layer outputs 256 feature maps of width W/4 and height H/4, the set of which is denoted as A2; the input end of the 4th deconvolution block receives all feature maps in A2, and its output end outputs 128 feature maps of width W/2 and height H/2, the set of which is denoted as F4; the input end of the 7th neural network block receives all feature maps in S1, and its output end outputs 128 feature maps of width W/2 and height H/2, the set of which is denoted as S7; the input end of the 5th deconvolution block receives all feature maps in S2, and its output end outputs 128 feature maps of width W/2 and height H/2, the set of which is denoted as F5; the input end of the 3rd fusion layer receives all feature maps in F4, all feature maps in S7 and all feature maps in F5, and after the addition fusion operation the output end of the 3rd fusion layer outputs 128 feature maps of width W/2 and height H/2, the set of which is denoted as A3; the input end of the 6th deconvolution block receives all feature maps in A3, and its output end outputs 64 feature maps of width W and height H, the set of which is denoted as F6; the input end of the 8th neural network block receives all feature maps in G1, and its output end outputs 64 feature maps of width W and height H, the set of which is denoted as S8; the input end of the 7th deconvolution block receives all feature maps in S1, and its output end outputs 64 feature maps of width W and height H, the set of which is denoted as F7; the input end of the 4th fusion layer receives all feature maps in F6, all feature maps in S8 and all feature maps in F7, and after the addition fusion operation the output end of the 4th fusion layer outputs 64 feature maps of width W and height H, the set of which is denoted as A4; the output end of the 4th fusion layer is the output end of the hidden layer;

For the output layer, its input end receives all feature maps in A4, and its output end outputs 12 feature maps of width W and height H; the set of these 12 feature maps is denoted as O1;
Step 1_3: take each original road scene image in the training set as an input image, input it into the fully residual dilated convolutional neural network for training, and obtain the 12 semantic segmentation prediction maps corresponding to each original road scene image in the training set; denote the set of the 12 semantic segmentation prediction maps corresponding to {Iq(i, j)} as Pq;
Step 1_4: calculate the loss function value between the set of the 12 semantic segmentation prediction maps corresponding to each original road scene image in the training set and the set of the 12 one-hot encoded images processed from the corresponding true semantic segmentation image; denote the loss function value corresponding to {Iq(i, j)} as Lossq, which is obtained using the negative log-likelihood function;
Step 1_5: repeat step 1_3 and step 1_4 a total of V times to obtain a trained fully residual dilated convolutional neural network model together with Q × V loss function values; then find the smallest of the Q × V loss function values; then take the weight vector and the bias term corresponding to that smallest loss function value as the best weight vector and the best bias term of the trained fully residual dilated convolutional neural network model, correspondingly denoted as Wbest and bbest; wherein V > 1;
The specific steps of the test stage process are as follows:
Step 2_1: let {Itest(i', j')} denote the road scene image to be semantically segmented; wherein 1 ≤ i' ≤ W', 1 ≤ j' ≤ H', W' denotes the width of {Itest(i', j')}, H' denotes the height of {Itest(i', j')}, and Itest(i', j') denotes the pixel value of the pixel with coordinate position (i', j') in {Itest(i', j')};
Step 2_2: input the R channel component, G channel component and B channel component of {Itest(i', j')} into the trained fully residual dilated convolutional neural network model, and use Wbest and bbest to make a prediction, obtaining the predicted semantic segmentation image corresponding to {Itest(i', j')}, denoted as {Ipred(i', j')}; wherein Ipred(i', j') denotes the pixel value of the pixel with coordinate position (i', j') in {Ipred(i', j')}.
2. The road scene semantic segmentation method based on a fully residual dilated convolutional neural network according to claim 1, characterized in that in step 1_2, the transition convolution block consists of a first convolutional layer, a first batch normalization layer, a first activation layer, a second convolutional layer, a second batch normalization layer, a second activation layer, a third convolutional layer, a third batch normalization layer and a third activation layer, arranged in sequence; the input end of the first convolutional layer is the input end of the transition convolution block; the input end of the first batch normalization layer receives all feature maps output by the output end of the first convolutional layer; the input end of the first activation layer receives all feature maps output by the output end of the first batch normalization layer; the input end of the second convolutional layer receives all feature maps output by the output end of the first activation layer; the input end of the second batch normalization layer receives all feature maps output by the output end of the second convolutional layer; the input end of the second activation layer receives all feature maps output by the output end of the second batch normalization layer; the input end of the third convolutional layer receives all feature maps output by the output end of the second activation layer; the input end of the third batch normalization layer receives all feature maps output by the output end of the third convolutional layer; the input end of the third activation layer receives all feature maps output by the output end of the third batch normalization layer; and the output end of the third activation layer is the output end of the transition convolution block; wherein the convolution kernel size of the first convolutional layer, the second convolutional layer and the third convolutional layer is 3 × 3, the number of convolution kernels is 64, the zero-padding parameter is "same", the stride is 1, and the activation function of the first activation layer, the second activation layer and the third activation layer is "ReLU".
3. The road scene semantic segmentation method based on a fully residual dilated convolutional neural network according to claim 1, characterized in that in step 1_2, the 1st to the 4th neural network blocks have the same structure, each consisting of a fourth convolutional layer, a first R-type neural network block and a first B-type neural network block, arranged in sequence; the input end of the fourth convolutional layer is the input end of the neural network block to which it belongs; the input end of the first R-type neural network block receives all feature maps output by the output end of the fourth convolutional layer; the input end of the first B-type neural network block receives all feature maps output by the output end of the first R-type neural network block; and the output end of the first B-type neural network block is the output end of the neural network block to which it belongs; wherein the fourth convolutional layer in the 1st neural network block has a convolution kernel size of 3 × 3, 128 convolution kernels, a zero-padding parameter of "same" and a stride of 2, and the number of convolution kernels of the first R-type neural network block and the first B-type neural network block in the 1st neural network block is 128; the fourth convolutional layer in the 2nd neural network block has a convolution kernel size of 3 × 3, 256 convolution kernels, a zero-padding parameter of "same" and a stride of 2, and the number of convolution kernels of the first R-type neural network block and the first B-type neural network block in the 2nd neural network block is 256; the fourth convolutional layer in the 3rd neural network block has a convolution kernel size of 3 × 3, 512 convolution kernels, a zero-padding parameter of "same" and a stride of 2, and the number of convolution kernels of the first R-type neural network block and the first B-type neural network block in the 3rd neural network block is 512; the fourth convolutional layer in the 4th neural network block has a convolution kernel size of 3 × 3, 1024 convolution kernels, a zero-padding parameter of "same" and a stride of 2, and the number of convolution kernels of the first R-type neural network block and the first B-type neural network block in the 4th neural network block is 1024;
The 5th to the 8th neural network blocks have the same structure, each consisting of a second R-type neural network block and a second B-type neural network block, arranged in sequence; the input end of the second R-type neural network block is the input end of the neural network block to which it belongs; the input end of the second B-type neural network block receives all feature maps output by the output end of the second R-type neural network block; and the output end of the second B-type neural network block is the output end of the neural network block to which it belongs; wherein the number of convolution kernels of the second R-type neural network block and the second B-type neural network block is 512 in the 5th neural network block, 256 in the 6th neural network block, 128 in the 7th neural network block, and 64 in the 8th neural network block.
4. the road scene semantic segmentation method according to claim 3 based on the empty convolutional neural networks of Complete Disability difference,
It is characterized in that the first R type neural network block is identical with the structure of the 2nd R type neural network block, by successively setting
The 5th convolutional layer set, the 4th batch normalization layer, the 4th active coating, the first empty convolutional layer, the 5th batch normalization layer, the
Five active coatings, the 6th convolutional layer, the 6th batch normalization layer, the 6th active coating composition, the input terminal of the 5th convolutional layer is its institute
R type neural network block input terminal, the 4th batch normalization layer input terminal receive the 5th convolutional layer output end output
All characteristic patterns, the input terminal of the 4th active coating receives all characteristic patterns of the output end output of the 4th batch normalization layer,
The input terminal of first empty convolutional layer receives all characteristic patterns of the output end output of the 4th active coating, the 5th batch normalization layer
Input terminal receive the first empty convolutional layer output end output all characteristic patterns, the input terminal of the 5th active coating receives the 5th
All characteristic patterns of the output end output of batch normalization layer, the input terminal of the 6th convolutional layer receive the output end of the 5th active coating
All characteristic patterns of output, the input terminal of the 6th batch normalization layer receive all features of the output end output of the 6th convolutional layer
Figure, the input terminal of the 6th active coating receive all characteristic patterns of the output end output of the 6th batch normalization layer, will enter into the
All characteristic patterns of the output end output of all characteristic patterns and the 6th active coating of the input terminal of five convolutional layers carry out jump connection
All characteristic patterns of the output end output of R type neural network block as where afterwards;Wherein, in the 1st neural network block
In first R type neural network block, the convolution kernel size of the 5th convolutional layer and the 6th convolutional layer is that 3 × 3, convolution kernel number is
128, it is 1 that zero padding parameter, which is " same ", step-length, and the convolution kernel size of the first empty convolutional layer is 3 × 3, convolution kernel number
It is for 128, zero padding parameter " same ", step-length 1, empty deconvolution parameter be 2;The first R type mind in the 2nd neural network block
Through in network block, it is 256, zero padding that the convolution kernel size of the 5th convolutional layer and the 6th convolutional layer, which is 3 × 3, convolution kernel number,
Parameter is that " same ", step-length are 1, the convolution kernel size of the first empty convolutional layer be 3 × 3, convolution kernel number be 256, benefit
Zero parameter is " same ", step-length 1, empty deconvolution parameter be 2;The first R type neural network block in the 3rd neural network block
In, the convolution kernel size of the 5th convolutional layer and the 6th convolutional layer is that 3 × 3, convolution kernel number is that 512, zero padding parameter is
" same ", step-length are 1, and the convolution kernel size of the first empty convolutional layer is 3 × 3, convolution kernel number is 512, zero padding parameter is "
Same ", step-length 1, empty deconvolution parameter are 2;In the first R type neural network block in the 4th neural network block, volume five
The fifth convolutional layer and the sixth convolutional layer each have a convolution kernel size of 3 × 3, 1024 convolution kernels, a zero-padding parameter of "same", and a stride of 1; the first dilated convolutional layer has a kernel size of 3 × 3, 1024 kernels, zero-padding "same", stride 1, and a dilation rate of 2. In the 2nd R-type neural network block in the 5th neural network block, the fifth and sixth convolutional layers each have a kernel size of 3 × 3, 512 kernels, zero-padding "same", and stride 1; the first dilated convolutional layer has a kernel size of 3 × 3, 512 kernels, zero-padding "same", stride 1, and a dilation rate of 2. In the 2nd R-type neural network block in the 6th neural network block, the fifth and sixth convolutional layers each have a kernel size of 3 × 3, 256 kernels, zero-padding "same", and stride 1; the first dilated convolutional layer has a kernel size of 3 × 3, 256 kernels, zero-padding "same", stride 1, and a dilation rate of 2. In the 2nd R-type neural network block in the 7th neural network block, the fifth and sixth convolutional layers each have a kernel size of 3 × 3, 128 kernels, zero-padding "same", and stride 1; the first dilated convolutional layer has a kernel size of 3 × 3, 128 kernels, zero-padding "same", stride 1, and a dilation rate of 2. In the 2nd R-type neural network block in the 8th neural network block, the fifth and sixth convolutional layers each have a kernel size of 3 × 3, 64 kernels, zero-padding "same", and stride 1; the first dilated convolutional layer has a kernel size of 3 × 3, 64 kernels, zero-padding "same", stride 1, and a dilation rate of 2. The activation mode of the fourth, fifth, and sixth activation layers is "ReLU".
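As a concrete illustration of the layer parameters above, the following minimal PyTorch sketch (the framework choice and variable names are ours, not part of the claim) shows the 3 × 3, stride-1 dilated convolution with dilation rate 2 that the claim parameterizes. With stride 1, the "same" zero-padding of the claim corresponds to padding = dilation × (kernel_size − 1) / 2 = 2, which preserves the spatial size. The channel count 1024 matches the R-type block in the 4th neural network block; the other blocks use 512, 256, 128, or 64 instead.

```python
import torch
import torch.nn as nn

# 3x3 dilated ("hole") convolution, stride 1, dilation rate 2.
# padding = dilation * (kernel_size - 1) // 2 = 2 reproduces the claim's
# "same" zero-padding, so input and output spatial sizes are equal.
dilated = nn.Conv2d(in_channels=1024, out_channels=1024,
                    kernel_size=3, stride=1, padding=2, dilation=2)

x = torch.randn(1, 1024, 16, 16)
y = dilated(x)
print(tuple(y.shape))  # spatial size preserved: (1, 1024, 16, 16)
```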
5. The road scene semantic segmentation method based on a full-residual dilated convolutional neural network according to claim 3 or 4, characterized in that the first B-type neural network block and the second B-type neural network block have the same structure, each consisting of, arranged in sequence, a seventh convolutional layer, a seventh batch normalization layer, a seventh activation layer, a second dilated convolutional layer, an eighth batch normalization layer, an eighth activation layer, an eighth convolutional layer, a ninth batch normalization layer, and a ninth activation layer; the input of the seventh convolutional layer is the input of the B-type neural network block to which it belongs, the input of the seventh batch normalization layer receives all feature maps output by the seventh convolutional layer, the input of the seventh activation layer receives all feature maps output by the seventh batch normalization layer, the input of the second dilated convolutional layer receives all feature maps output by the seventh activation layer, the input of the eighth batch normalization layer receives all feature maps output by the second dilated convolutional layer, the input of the eighth activation layer receives all feature maps output by the eighth batch normalization layer, the input of the eighth convolutional layer receives all feature maps output by the eighth activation layer, the input of the ninth batch normalization layer receives all feature maps output by the eighth convolutional layer, the input of the ninth activation layer receives all feature maps output by the ninth batch normalization layer, and the output of the ninth activation layer is the output of the B-type neural network block to which it belongs. In the first B-type neural network block in the 1st neural network block, the seventh and eighth convolutional layers each have a kernel size of 3 × 3, 128 convolution kernels, a zero-padding parameter of "same", and a stride of 1; the second dilated convolutional layer has a kernel size of 3 × 3, 128 kernels, zero-padding "same", stride 1, and a dilation rate of 2. In the first B-type neural network block in the 2nd neural network block, the seventh and eighth convolutional layers each have a kernel size of 3 × 3, 256 kernels, zero-padding "same", and stride 1; the second dilated convolutional layer has a kernel size of 3 × 3, 256 kernels, zero-padding "same", stride 1, and a dilation rate of 2. In the first B-type neural network block in the 3rd neural network block, the seventh and eighth convolutional layers each have a kernel size of 3 × 3, 512 kernels, zero-padding "same", and stride 1; the second dilated convolutional layer has a kernel size of 3 × 3, 512 kernels, zero-padding "same", stride 1, and a dilation rate of 2. In the first B-type neural network block in the 4th neural network block, the seventh and eighth convolutional layers each have a kernel size of 3 × 3, 1024 kernels, zero-padding "same", and stride 1; the second dilated convolutional layer has a kernel size of 3 × 3, 1024 kernels, zero-padding "same", stride 1, and a dilation rate of 2. In the second B-type neural network block in the 5th neural network block, the seventh and eighth convolutional layers each have a kernel size of 3 × 3, 512 kernels, zero-padding "same", and stride 1; the second dilated convolutional layer has a kernel size of 3 × 3, 512 kernels, zero-padding "same", stride 1, and a dilation rate of 2. In the second B-type neural network block in the 6th neural network block, the seventh and eighth convolutional layers each have a kernel size of 3 × 3, 256 kernels, zero-padding "same", and stride 1; the second dilated convolutional layer has a kernel size of 3 × 3, 256 kernels, zero-padding "same", stride 1, and a dilation rate of 2. In the second B-type neural network block in the 7th neural network block, the seventh and eighth convolutional layers each have a kernel size of 3 × 3, 128 kernels, zero-padding "same", and stride 1; the second dilated convolutional layer has a kernel size of 3 × 3, 128 kernels, zero-padding "same", stride 1, and a dilation rate of 2. In the second B-type neural network block in the 8th neural network block, the seventh and eighth convolutional layers each have a kernel size of 3 × 3, 64 kernels, zero-padding "same", and stride 1; the second dilated convolutional layer has a kernel size of 3 × 3, 64 kernels, zero-padding "same", stride 1, and a dilation rate of 2. The activation mode of the seventh, eighth, and ninth activation layers is "ReLU".
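The B-type block described above can be sketched in PyTorch as follows. This is a hypothetical rendering for illustration, not the patented implementation: the class name and the `channels` parameter are ours, and `channels` stands in for the per-stage kernel counts (64 to 1024) that the claim lists explicitly.

```python
import torch
import torch.nn as nn

class BTypeBlock(nn.Module):
    """Sketch of the claimed B-type block: Conv -> BN -> ReLU ->
    dilated Conv -> BN -> ReLU -> Conv -> BN -> ReLU."""

    def __init__(self, in_channels: int, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_channels, channels, 3, stride=1, padding=1),  # 7th conv
            nn.BatchNorm2d(channels),                                  # 7th BN
            nn.ReLU(inplace=True),                                     # 7th activation
            nn.Conv2d(channels, channels, 3, stride=1, padding=2,
                      dilation=2),                                     # 2nd dilated conv
            nn.BatchNorm2d(channels),                                  # 8th BN
            nn.ReLU(inplace=True),                                     # 8th activation
            nn.Conv2d(channels, channels, 3, stride=1, padding=1),     # 8th conv
            nn.BatchNorm2d(channels),                                  # 9th BN
            nn.ReLU(inplace=True),                                     # 9th activation
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.body(x)

block = BTypeBlock(in_channels=64, channels=128)  # e.g. the 1st neural network block
out = block(torch.randn(1, 64, 32, 32))           # spatial size is preserved
```

All three convolutions use stride 1 with "same"-style padding, so the block changes only the channel count, never the spatial resolution.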
6. The road scene semantic segmentation method based on a full-residual dilated convolutional neural network according to claim 1, characterized in that, in step 1_2, the 1st to 7th deconvolution blocks have the same structure, each consisting of, arranged in sequence, a deconvolution layer, a tenth batch normalization layer, and a tenth activation layer; the input of the deconvolution layer is the input of the deconvolution block to which it belongs, the input of the tenth batch normalization layer receives all feature maps output by the deconvolution layer, the input of the tenth activation layer receives all feature maps output by the tenth batch normalization layer, and the output of the tenth activation layer is the output of the deconvolution block to which it belongs. The deconvolution layer in the 1st deconvolution block has a convolution kernel size of 4 × 4, 512 convolution kernels, a zero-padding parameter of "same", and a stride of 2; the deconvolution layers in the 2nd and 3rd deconvolution blocks have a kernel size of 4 × 4, 256 kernels, zero-padding "same", and stride 2; the deconvolution layers in the 4th and 5th deconvolution blocks have a kernel size of 4 × 4, 128 kernels, zero-padding "same", and stride 2; and the deconvolution layers in the 6th and 7th deconvolution blocks have a kernel size of 4 × 4, 64 kernels, zero-padding "same", and stride 2. The activation mode of the tenth activation layer is "ReLU".
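One such deconvolution block can be sketched in PyTorch as below. The 512 output channels match the 1st deconvolution block; the 1024 input channels are an assumption for illustration, since the claim does not state them here. In PyTorch, `padding=1` reproduces the exact 2× upsampling that the claim's 4 × 4 kernel, stride 2, and "same" zero-padding imply: H_out = (H − 1) × 2 − 2 × 1 + 4 = 2H.

```python
import torch
import torch.nn as nn

# Sketch of one claimed deconvolution block: 4x4 transposed convolution
# with stride 2 (doubling the spatial size), then batch norm and ReLU.
deconv_block = nn.Sequential(
    nn.ConvTranspose2d(1024, 512, kernel_size=4, stride=2, padding=1),
    nn.BatchNorm2d(512),
    nn.ReLU(inplace=True),
)

y = deconv_block(torch.randn(1, 1024, 8, 8))  # -> (1, 512, 16, 16)
```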
7. The road scene semantic segmentation method based on a full-residual dilated convolutional neural network according to claim 1, characterized in that, in step 1_2, the 4 fusion layers are all Add fusion layers.
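An Add fusion layer performs an element-wise sum of its two inputs, as the short sketch below illustrates (the shapes are ours, chosen for illustration, not taken from the claim):

```python
import torch

# The claimed "Add" fusion is an element-wise sum: the two incoming
# feature maps must share the same (N, C, H, W) shape and are added
# value by value, leaving the shape unchanged.
a = torch.randn(1, 256, 16, 16)  # e.g. a decoder feature map
b = torch.randn(1, 256, 16, 16)  # e.g. a skip-connection feature map
fused = a + b                    # same shape as either input
```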
8. The road scene semantic segmentation method based on a full-residual dilated convolutional neural network according to claim 1, characterized in that, in step 1_2, the output layer consists of, arranged in sequence, a ninth convolutional layer, a tenth batch normalization layer, and an eleventh activation layer; the input of the ninth convolutional layer is the input of the output layer, the input of the tenth batch normalization layer receives all feature maps output by the ninth convolutional layer, the input of the eleventh activation layer receives all feature maps output by the tenth batch normalization layer, and the output of the eleventh activation layer is the output of the output layer. The ninth convolutional layer has a convolution kernel size of 1 × 1, 12 convolution kernels, a zero-padding parameter of "same", and a stride of 1; the activation mode of the eleventh activation layer is "ReLU".
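The output layer above maps the final feature maps to 12 per-class score maps with a 1 × 1 convolution. A minimal PyTorch sketch follows; the 64 input channels and the 44 × 60 feature-map size are assumptions for illustration only and are not stated in the claim.

```python
import torch
import torch.nn as nn

# Sketch of the claimed output layer: 1x1 convolution producing 12 feature
# maps (one per semantic class), then batch normalization and ReLU.
output_layer = nn.Sequential(
    nn.Conv2d(64, 12, kernel_size=1, stride=1, padding=0),
    nn.BatchNorm2d(12),
    nn.ReLU(inplace=True),
)

logits = output_layer(torch.randn(1, 64, 44, 60))  # -> (1, 12, 44, 60)
```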
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910664797.8A CN110490205B (en) | 2019-07-23 | 2019-07-23 | Road scene semantic segmentation method based on full-residual-error hole convolutional neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110490205A true CN110490205A (en) | 2019-11-22 |
CN110490205B CN110490205B (en) | 2021-10-12 |
Family
ID=68548010
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910664797.8A Active CN110490205B (en) | 2019-07-23 | 2019-07-23 | Road scene semantic segmentation method based on full-residual-error hole convolutional neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110490205B (en) |
- 2019-07-23: CN application CN201910664797.8A filed; granted as CN110490205B, status Active
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018128741A1 (en) * | 2017-01-06 | 2018-07-12 | Board Of Regents, The University Of Texas System | Segmenting generic foreground objects in images and videos |
US20190096125A1 (en) * | 2017-09-28 | 2019-03-28 | Nec Laboratories America, Inc. | Generating occlusion-aware bird eye view representations of complex road scenes |
EP3499459A1 (en) * | 2017-12-18 | 2019-06-19 | FEI Company | Method, device and system for remote deep learning for microscopic image reconstruction and segmentation |
CN108776969A (en) * | 2018-05-24 | 2018-11-09 | 复旦大学 | Breast ultrasound image lesion segmentation approach based on full convolutional network |
CN109271856A (en) * | 2018-08-03 | 2019-01-25 | 西安电子科技大学 | Remote sensing image object detection method based on expansion residual error convolution |
CN108985269A (en) * | 2018-08-16 | 2018-12-11 | 东南大学 | Converged network driving environment sensor model based on convolution sum cavity convolutional coding structure |
CN109447994A (en) * | 2018-11-05 | 2019-03-08 | 陕西师范大学 | In conjunction with the remote sensing image segmentation method of complete residual error and Fusion Features |
Non-Patent Citations (4)
Title |
---|
CHEN X. et al., "Semantic Segmentation with Modified Deep Residual Networks", CCPR 2016: Pattern Recognition *
TRAN MINH QUAN et al., "FusionNet: A Deep Fully Residual Convolutional Neural Network for Image Segmentation in Connectomics", arXiv:1612.05360v2 *
YU Tao, "Application of Deep-Learning-Based Image Semantic Segmentation in Autonomous Driving", Wanfang Database *
WANG Zhao, "Research on Semantic Segmentation Methods for Road Scene Understanding", China Masters' Theses Full-text Database, Information Science & Technology Series *
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111080646A (en) * | 2019-11-25 | 2020-04-28 | 杭州电子科技大学 | Improved image segmentation method based on wide-activation convolutional neural network |
CN111080646B (en) * | 2019-11-25 | 2023-09-05 | 杭州电子科技大学 | Improved image segmentation method based on wide-activation convolutional neural network |
CN113099066A (en) * | 2019-12-23 | 2021-07-09 | 浙江工商大学 | Large-capacity image steganography method based on multi-scale fusion cavity convolution residual error network |
CN113099066B (en) * | 2019-12-23 | 2022-09-30 | 浙江工商大学 | Large-capacity image steganography method based on multi-scale fusion cavity convolution residual error network |
CN111523546A (en) * | 2020-04-16 | 2020-08-11 | 湖南大学 | Image semantic segmentation method, system and computer storage medium |
CN111523546B (en) * | 2020-04-16 | 2023-06-16 | 湖南大学 | Image semantic segmentation method, system and computer storage medium |
CN111507990A (en) * | 2020-04-20 | 2020-08-07 | 南京航空航天大学 | Tunnel surface defect segmentation method based on deep learning |
CN112418228A (en) * | 2020-11-02 | 2021-02-26 | 暨南大学 | Image semantic segmentation method based on multi-feature fusion |
CN112418228B (en) * | 2020-11-02 | 2023-07-21 | 暨南大学 | Image semantic segmentation method based on multi-feature fusion |
CN112446353B (en) * | 2020-12-14 | 2023-05-02 | 浙江工商大学 | Video image trace line detection method based on deep convolutional neural network |
CN112446353A (en) * | 2020-12-14 | 2021-03-05 | 浙江工商大学 | Video image trace line detection method based on deep convolutional neural network |
WO2022126377A1 (en) * | 2020-12-15 | 2022-06-23 | 中国科学院深圳先进技术研究院 | Traffic lane line detection method and apparatus, and terminal device and readable storage medium |
CN113592009A (en) * | 2021-08-05 | 2021-11-02 | 杭州逗酷软件科技有限公司 | Image semantic segmentation method and device, storage medium and electronic equipment |
Also Published As
Publication number | Publication date |
---|---|
CN110490205B (en) | 2021-10-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110490205A (en) | Road scene semantic segmentation method based on a full-residual dilated convolutional neural network | |
CN100380396C (en) | Object detection apparatus, learning apparatus, object detection system, object detection method | |
CN110909673B (en) | Pedestrian re-identification method based on natural language description | |
CN105825235B (en) | Image recognition method based on multi-feature deep learning | |
Zhao et al. | Pixel-level semantics guided image colorization | |
CN109740413A (en) | Pedestrian re-identification method and apparatus, computer device, and computer storage medium | |
CN109840471A (en) | Road segmentation method based on an improved Unet network model | |
CN110046550B (en) | Pedestrian attribute identification system and method based on multi-layer feature learning | |
CN109635642A (en) | Road scene segmentation method based on residual networks and dilated convolution | |
CN109146944A (en) | Spatial depth estimation method based on deep convolutional neural networks | |
CN106096661B (en) | Zero-shot image classification method based on relative-attribute random forests | |
CN111191663A (en) | License plate number recognition method and device, electronic equipment and storage medium | |
CN109446933A (en) | Road scene semantic segmentation method based on convolutional neural networks | |
CN112036260B (en) | Expression recognition method and system based on multi-scale sub-block aggregation in natural environments | |
CN114373128A (en) | Remote sensing monitoring method for the "four disorders" of rivers and lakes based on category-adaptive pseudo-label generation | |
CN108681689B (en) | Frame-rate-enhanced gait recognition method and device based on generative adversarial networks | |
CN110458178A (en) | Multi-modal RGB-D salient object detection method based on multiple concatenations | |
CN110298394A (en) | Image recognition method and related apparatus | |
CN109460815A (en) | Monocular depth estimation method | |
CN109508639A (en) | Road scene semantic segmentation method based on multi-scale dilated convolutional neural networks | |
CN110222772B (en) | Medical image annotation recommendation method based on block-level active learning | |
CN113870254B (en) | Target object detection method and device, electronic equipment and storage medium | |
CN111428730B (en) | Weakly supervised fine-grained object classification method | |
CN109448039 (en) | Monocular depth estimation method based on deep convolutional neural networks | |
CN117372853A (en) | Underwater target detection algorithm based on image enhancement and attention mechanisms | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||