CN110490205A - Road scene semantic segmentation method based on a fully residual dilated convolutional neural network - Google Patents

Road scene semantic segmentation method based on a fully residual dilated convolutional neural network Download PDF

Info

Publication number
CN110490205A
CN110490205A CN201910664797.8A CN201910664797A CN110490205A
Authority
CN
China
Prior art keywords
neural network
convolution kernel
network block
layer
output end
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910664797.8A
Other languages
Chinese (zh)
Other versions
CN110490205B (en)
Inventor
周武杰
朱家懿
叶绿
雷景生
王海江
何成
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Lover Health Science and Technology Development Co Ltd
Zhejiang University of Science and Technology ZUST
Original Assignee
Zhejiang University of Science and Technology ZUST
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University of Science and Technology ZUST filed Critical Zhejiang University of Science and Technology ZUST
Priority to CN201910664797.8A priority Critical patent/CN110490205B/en
Publication of CN110490205A publication Critical patent/CN110490205A/en
Application granted granted Critical
Publication of CN110490205B publication Critical patent/CN110490205B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a road scene semantic segmentation method based on a fully residual dilated convolutional neural network. In the training stage, a fully residual dilated convolutional neural network is constructed, comprising an input layer, a hidden layer and an output layer; the hidden layer comprises 1 transition convolution block, 8 neural network blocks, 7 deconvolution blocks and 4 fusion layers. Each original road scene image in the training set is input into the network for training, yielding 12 semantic segmentation prediction maps for each original road scene image. A loss function value is then computed between the set constituted by the 12 semantic segmentation prediction maps of each original image and the set of 12 one-hot coded images obtained from the corresponding true semantic segmentation image, giving a trained fully residual dilated convolutional neural network model. In the test stage, predictions are made with this trained model. The advantages of the method are its high segmentation accuracy and strong robustness.

Description

Road scene semantic segmentation method based on a fully residual dilated convolutional neural network
Technical field
The present invention relates to a semantic segmentation method based on deep learning, and more particularly to a road scene semantic segmentation method based on a fully residual dilated convolutional neural network.
Background technique
With the rise of the intelligent transportation industry, semantic segmentation finds more and more applications in intelligent transportation systems: traffic scene understanding, multi-target obstacle detection and vision-based navigation can all be realized with semantic segmentation technology. At present, the most commonly used semantic segmentation methods are algorithms such as support vector machines and random forests. These traditional machine learning methods concentrate mainly on two-class tasks, detecting and recognizing particular objects such as the road surface, vehicles and pedestrians, and generally need to be realized through highly complex hand-crafted features.
Deep learning semantic segmentation methods carry out end-to-end training directly at the pixel level: one only needs to feed the images in the training set into the model framework for training to obtain the model weights, after which the test set can be predicted. The power of convolutional neural networks lies in the ability of their multi-layer structure to learn features automatically, and to learn features at many levels of abstraction. At present, semantic segmentation frameworks based on deep learning are largely encoder-decoder architectures: in the encoding stage, pooling layers gradually discard location information while abstract features are extracted; in the decoding stage, location information is gradually recovered, usually with direct connections between decoder and encoder. Dilated convolution (convolution with holes), a method commonly used in segmentation tasks, abandons the pooling layer and enlarges the receptive field by dilating the convolution kernel: a small dilation rate gives a small receptive field and learns local, specific features, while a large dilation rate gives a larger receptive field and can learn more abstract features, which are more robust to the size, position and orientation of objects.
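As a minimal illustration of how dilation enlarges the receptive field, the effective size of a dilated kernel can be computed directly; the kernel sizes and dilation rates below are examples for illustration, not values prescribed by the invention:

```python
def effective_kernel_size(k, d):
    """Effective extent of a k x k kernel with dilation rate d:
    (d - 1) zeros are inserted between adjacent kernel taps,
    so the kernel spans d * (k - 1) + 1 input positions."""
    return d * (k - 1) + 1

# A 3x3 kernel with dilation 1 covers a 3x3 region; with dilation 2
# the same nine weights cover a 5x5 region at no extra parameter cost.
print(effective_kernel_size(3, 1))  # 3
print(effective_kernel_size(3, 2))  # 5
```

This is why stacking dilated convolutions grows the receptive field without the downsampling that pooling would introduce.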
Most existing road scene semantic segmentation methods adopt deep learning. Using deep learning to perform semantic segmentation on road scenes is simple and convenient and, more importantly, greatly improves the precision of pixel-level classification of road scene images. At present, most deep-learning-based road scene segmentation models combine convolutional layers with pooling layers; however, the feature maps obtained merely by pooling and convolution operations are monotonous and unrepresentative, so the feature information of the image is reduced, the resulting information is relatively coarse, and the segmentation precision is low.
Summary of the invention
The technical problem to be solved by the invention is to provide a road scene semantic segmentation method based on a fully residual dilated convolutional neural network which has high segmentation accuracy and strong robustness.
The technical scheme adopted by the invention to solve the above technical problem is a road scene semantic segmentation method based on a fully residual dilated convolutional neural network, characterized by comprising a training stage and a test stage.
The specific steps of the training stage process are as follows:
Step 1_1: Select Q original road scene images and the true semantic segmentation image corresponding to each original road scene image, and let them constitute a training set. Denote the q-th original road scene image in the training set as {I_q(i,j)}, and denote the true semantic segmentation image in the training set corresponding to {I_q(i,j)} as {I_q^true(i,j)}. Then, using the one-hot coding technique, process every true semantic segmentation image in the training set into 12 one-hot coded images, and denote the set of 12 one-hot coded images obtained from {I_q^true(i,j)} as J_q. Here, every road scene image is an RGB color image; Q is a positive integer with Q ≥ 200; q is a positive integer with 1 ≤ q ≤ Q; 1 ≤ i ≤ W and 1 ≤ j ≤ H, where W denotes the width of {I_q(i,j)} and H denotes its height; I_q(i,j) denotes the pixel value of the pixel at coordinate position (i,j) in {I_q(i,j)}, and I_q^true(i,j) denotes the pixel value of the pixel at coordinate position (i,j) in {I_q^true(i,j)}.
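The one-hot coding of Step 1_1 can be sketched in plain Python as follows; this assumes class labels 0 through 11, and is an illustrative implementation rather than one prescribed by the patent:

```python
NUM_CLASSES = 12  # the 12 semantic classes of the road scene task

def one_hot_encode(label_map):
    """Turn an H x W map of class indices (0..11) into 12 binary maps,
    one per class, as used to form the set J_q in Step 1_1."""
    h, w = len(label_map), len(label_map[0])
    return [[[1 if label_map[y][x] == c else 0 for x in range(w)]
             for y in range(h)]
            for c in range(NUM_CLASSES)]

labels = [[0, 3], [11, 0]]           # toy 2x2 label map
planes = one_hot_encode(labels)
print(len(planes))                   # 12 binary maps
print(planes[0])                     # [[1, 0], [0, 1]] -- mask for class 0
```

Each pixel is 1 in exactly one of the 12 maps, matching the one-hot property the loss computation in Step 1_4 relies on.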
Step 1_2: Construct the fully residual dilated convolutional neural network. The network comprises an input layer, a hidden layer and an output layer; the hidden layer comprises 1 transition convolution block, 8 neural network blocks, 7 deconvolution blocks and 4 fusion layers.
For the input layer, its input receives the R channel component, G channel component and B channel component of an input image, and its output passes these R, G and B channel components on to the hidden layer; the input image received by the input layer is required to have width W and height H.
For the hidden layer, the input of the transition convolution block is the input of the hidden layer; it receives the R, G and B channel components of the input image output by the input layer, and its output produces 64 feature maps of width W and height H, whose set is denoted G1. The input of the 1st neural network block receives all feature maps in G1; its output produces 128 feature maps of width W/2 and height H/2, whose set is denoted S1. The input of the 2nd neural network block receives all feature maps in S1; its output produces 256 feature maps of width W/4 and height H/4, whose set is denoted S2. The input of the 3rd neural network block receives all feature maps in S2; its output produces 512 feature maps of width W/8 and height H/8, whose set is denoted S3. The input of the 4th neural network block receives all feature maps in S3; its output produces 1024 feature maps of width W/16 and height H/16, whose set is denoted S4. The input of the 1st deconvolution block receives all feature maps in S4; its output produces 512 feature maps of width W/8 and height H/8, whose set is denoted F1. The input of the 5th neural network block receives all feature maps in S3; its output produces 512 feature maps of width W/8 and height H/8, whose set is denoted S5. The input of the 1st fusion layer receives all feature maps in F1 and all feature maps in S5; after elementwise-addition fusion, its output produces 512 feature maps of width W/8 and height H/8, whose set is denoted A1. The input of the 2nd deconvolution block receives all feature maps in A1; its output produces 256 feature maps of width W/4 and height H/4, whose set is denoted F2. The input of the 6th neural network block receives all feature maps in S2; its output produces 256 feature maps of width W/4 and height H/4, whose set is denoted S6. The input of the 3rd deconvolution block receives all feature maps in S3; its output produces 256 feature maps of width W/4 and height H/4, whose set is denoted F3. The input of the 2nd fusion layer receives all feature maps in F2, S6 and F3; after elementwise-addition fusion, its output produces 256 feature maps of width W/4 and height H/4, whose set is denoted A2. The input of the 4th deconvolution block receives all feature maps in A2; its output produces 128 feature maps of width W/2 and height H/2, whose set is denoted F4. The input of the 7th neural network block receives all feature maps in S1; its output produces 128 feature maps of width W/2 and height H/2, whose set is denoted S7. The input of the 5th deconvolution block receives all feature maps in S2; its output produces 128 feature maps of width W/2 and height H/2, whose set is denoted F5. The input of the 3rd fusion layer receives all feature maps in F4, S7 and F5; after elementwise-addition fusion, its output produces 128 feature maps of width W/2 and height H/2, whose set is denoted A3. The input of the 6th deconvolution block receives all feature maps in A3; its output produces 64 feature maps of width W and height H, whose set is denoted F6. The input of the 8th neural network block receives all feature maps in G1; its output produces 64 feature maps of width W and height H, whose set is denoted S8. The input of the 7th deconvolution block receives all feature maps in S1; its output produces 64 feature maps of width W and height H, whose set is denoted F7. The input of the 4th fusion layer receives all feature maps in F6, S8 and F7; after elementwise-addition fusion, its output produces 64 feature maps of width W and height H, whose set is denoted A4. The output of the 4th fusion layer is the output of the hidden layer.
For the output layer, its input receives all feature maps in A4, and its output produces 12 feature maps of width W and height H; the set of these 12 feature maps is denoted O1.
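The spatial sizes traced through the hidden layer above follow from the stride-2 convolutions of the first four neural network blocks; a small sketch that reproduces the (channels, width, height) of G1 and S1 through S4 (W and H are assumed divisible by 16, and the sample resolution is illustrative only):

```python
def encoder_shapes(W, H):
    """Channel count and spatial size after the transition block (G1)
    and the first four neural network blocks (S1..S4): each block
    doubles the channels while halving width and height."""
    shapes = {"G1": (64, W, H)}
    channels = [128, 256, 512, 1024]
    for k, c in enumerate(channels, start=1):
        shapes[f"S{k}"] = (c, W // 2**k, H // 2**k)
    return shapes

print(encoder_shapes(352, 480)["S4"])  # (1024, 22, 30)
```

The seven deconvolution blocks then reverse these halvings step by step until the W x H resolution of the input is restored at A4.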
Step 1_3: Take every original road scene image in the training set as an input image, input it into the fully residual dilated convolutional neural network for training, and obtain the 12 semantic segmentation prediction maps corresponding to each original road scene image in the training set; denote the set constituted by the 12 semantic segmentation prediction maps corresponding to {I_q(i,j)} as P_q.
Step 1_4: Compute the loss function value between the set constituted by the 12 semantic segmentation prediction maps of each original road scene image in the training set and the set of 12 one-hot coded images obtained from the corresponding true semantic segmentation image; denote the loss function value between P_q and J_q as L_q, obtained using the negative log-likelihood (NLL) loss function.
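A minimal per-pixel version of the negative log-likelihood loss of Step 1_4 can be written in plain Python; in practice the loss is averaged over all pixels and images by a framework routine (for example PyTorch's `NLLLoss`), so this is only an illustrative sketch:

```python
import math

def log_softmax(scores):
    """Numerically stable log-softmax over a list of class scores."""
    m = max(scores)
    log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_sum for s in scores]

def nll_loss(log_probs, target_class):
    """Negative log-likelihood for one pixel: pick out the
    log-probability of the true class and negate it."""
    return -log_probs[target_class]

scores = [0.1] * 12                  # raw scores for the 12 classes
scores[4] = 3.0                      # the network favours class 4
lp = log_softmax(scores)
print(nll_loss(lp, 4) < nll_loss(lp, 0))  # True: the favoured class costs less
```

Because the true segmentation is one-hot coded, each pixel contributes exactly one such term, taken at its true class index.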
Step 1_5: Repeat step 1_3 and step 1_4 a total of V times to obtain the trained fully residual dilated convolutional neural network model, together with Q × V loss function values. Then find the smallest loss function value among the Q × V loss function values, and take the weight vector and bias term corresponding to that smallest value as the best weight vector and best bias term of the trained model, correspondingly denoted W_best and b_best; here V > 1.
The specific steps of the test phase process are as follows:
Step 2_1: Let {I^test(i',j')} denote the road scene image to be semantically segmented, where 1 ≤ i' ≤ W' and 1 ≤ j' ≤ H', W' denotes the width of {I^test(i',j')}, H' denotes its height, and I^test(i',j') denotes the pixel value of the pixel at coordinate position (i',j') in {I^test(i',j')}.
Step 2_2: Input the R, G and B channel components of {I^test(i',j')} into the trained fully residual dilated convolutional neural network model and predict using W_best and b_best, obtaining the predicted semantic segmentation image corresponding to {I^test(i',j')}, denoted {I^pred(i',j')}, where I^pred(i',j') denotes the pixel value of the pixel at coordinate position (i',j') in {I^pred(i',j')}.
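Reducing the 12 output maps to a single label per pixel, as the prediction of Step 2_2 requires, is a per-pixel argmax; a plain-Python sketch over toy data (the two small score maps below merely stand in for the 12 real ones):

```python
def predict_labels(class_maps):
    """class_maps: a list of H x W score maps, one per class (the shape
    of the output-layer set O1). Returns an H x W map where each pixel
    holds the index of its highest-scoring class."""
    h, w = len(class_maps[0]), len(class_maps[0][0])
    return [[max(range(len(class_maps)),
                 key=lambda c: class_maps[c][y][x])
             for x in range(w)]
            for y in range(h)]

# Two toy 1x2 score maps: pixel 0 favours class 0, pixel 1 favours class 1.
maps = [[[0.9, 0.2]], [[0.1, 0.8]]]
print(predict_labels(maps))          # [[0, 1]]
```

The resulting index map is then rendered as the predicted semantic segmentation image.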
In step 1_2, the transition convolution block consists of, arranged in sequence, a first convolutional layer, a first batch normalization layer, a first activation layer, a second convolutional layer, a second batch normalization layer, a second activation layer, a third convolutional layer, a third batch normalization layer and a third activation layer. The input of the first convolutional layer is the input of the transition convolution block; the input of the first batch normalization layer receives all feature maps output by the first convolutional layer; the input of the first activation layer receives all feature maps output by the first batch normalization layer; the input of the second convolutional layer receives all feature maps output by the first activation layer; the input of the second batch normalization layer receives all feature maps output by the second convolutional layer; the input of the second activation layer receives all feature maps output by the second batch normalization layer; the input of the third convolutional layer receives all feature maps output by the second activation layer; the input of the third batch normalization layer receives all feature maps output by the third convolutional layer; the input of the third activation layer receives all feature maps output by the third batch normalization layer; and the output of the third activation layer is the output of the transition convolution block. The first, second and third convolutional layers all have convolution kernel size 3 × 3, 64 convolution kernels, zero-padding parameter "same" and stride 1; the activation function of the first, second and third activation layers is ReLU.
In step 1_2, the 1st to 4th neural network blocks have identical structure, each consisting of, arranged in sequence, a fourth convolutional layer, a first R-type neural network block and a first B-type neural network block. The input of the fourth convolutional layer is the input of the neural network block it belongs to; the input of the first R-type neural network block receives all feature maps output by the fourth convolutional layer; the input of the first B-type neural network block receives all feature maps output by the first R-type neural network block; and the output of the first B-type neural network block is the output of the neural network block it belongs to. In every one of these blocks, the fourth convolutional layer has convolution kernel size 3 × 3, zero-padding parameter "same" and stride 2; its number of convolution kernels, which equals the number of convolution kernels of the first R-type and first B-type neural network blocks in the same block, is 128 in the 1st neural network block, 256 in the 2nd, 512 in the 3rd and 1024 in the 4th.
The 5th to 8th neural network blocks have identical structure, each consisting of, arranged in sequence, a second R-type neural network block and a second B-type neural network block. The input of the second R-type neural network block is the input of the neural network block it belongs to; the input of the second B-type neural network block receives all feature maps output by the second R-type neural network block; and the output of the second B-type neural network block is the output of the neural network block it belongs to. The number of convolution kernels of both the second R-type and second B-type neural network blocks is 512 in the 5th neural network block, 256 in the 6th, 128 in the 7th and 64 in the 8th.
The first R-type neural network block and the second R-type neural network block have identical structure, each consisting of, arranged in sequence, a fifth convolutional layer, a fourth batch normalization layer, a fourth activation layer, a first dilated convolutional layer, a fifth batch normalization layer, a fifth activation layer, a sixth convolutional layer, a sixth batch normalization layer and a sixth activation layer. The input of the fifth convolutional layer is the input of the R-type neural network block it belongs to; the input of the fourth batch normalization layer receives all feature maps output by the fifth convolutional layer; the input of the fourth activation layer receives all feature maps output by the fourth batch normalization layer; the input of the first dilated convolutional layer receives all feature maps output by the fourth activation layer; the input of the fifth batch normalization layer receives all feature maps output by the first dilated convolutional layer; the input of the fifth activation layer receives all feature maps output by the fifth batch normalization layer; the input of the sixth convolutional layer receives all feature maps output by the fifth activation layer; the input of the sixth batch normalization layer receives all feature maps output by the sixth convolutional layer; the input of the sixth activation layer receives all feature maps output by the sixth batch normalization layer; and all feature maps received at the input of the fifth convolutional layer are combined through a jump (skip) connection with all feature maps output by the sixth activation layer to give all feature maps output by the R-type neural network block. In every R-type neural network block, the fifth convolutional layer, the sixth convolutional layer and the first dilated convolutional layer all have convolution kernel size 3 × 3, zero-padding parameter "same" and stride 1, and the first dilated convolutional layer has dilation rate 2; their number of convolution kernels equals the number of convolution kernels of the neural network block they belong to, i.e. 128 in the 1st neural network block, 256 in the 2nd, 512 in the 3rd, 1024 in the 4th, 512 in the 5th, 256 in the 6th, 128 in the 7th and 64 in the 8th. The activation function of the fourth, fifth and sixth activation layers is ReLU.
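The jump (skip) connection of the R-type block adds the block's input feature maps elementwise to the output of its convolutional pipeline; in simplified form on nested lists (the name `residual_add` and the toy values are illustrative, standing in for the conv-BN-ReLU-dilated-conv pipeline):

```python
def residual_add(x, fx):
    """Elementwise sum of the block input x and the transformed
    features fx, as produced by the R-type block's jump connection.
    Both arguments are H x W maps of equal size."""
    return [[xi + fi for xi, fi in zip(row_x, row_f)]
            for row_x, row_f in zip(x, fx)]

x = [[1.0, 2.0], [3.0, 4.0]]         # toy 2x2 input feature map
fx = [[0.5, -1.0], [0.0, 2.0]]       # toy output of the conv pipeline
print(residual_add(x, fx))           # [[1.5, 1.0], [3.0, 6.0]]
```

The addition is only well-defined because, with "same" padding and stride 1 throughout, the pipeline preserves the channel count and spatial size of its input; this is the residual structure the "fully residual" name refers to.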
The first B-type neural network block and the second B-type neural network block have the same structure, consisting of, set in sequence, a seventh convolutional layer, a seventh batch normalization layer, a seventh activation layer, a second dilated convolutional layer, an eighth batch normalization layer, an eighth activation layer, an eighth convolutional layer, a ninth batch normalization layer and a ninth activation layer. The input of the seventh convolutional layer is the input of the B-type neural network block it belongs to; the input of the seventh batch normalization layer receives all feature maps output by the seventh convolutional layer; the input of the seventh activation layer receives all feature maps output by the seventh batch normalization layer; the input of the second dilated convolutional layer receives all feature maps output by the seventh activation layer; the input of the eighth batch normalization layer receives all feature maps output by the second dilated convolutional layer; the input of the eighth activation layer receives all feature maps output by the eighth batch normalization layer; the input of the eighth convolutional layer receives all feature maps output by the eighth activation layer; the input of the ninth batch normalization layer receives all feature maps output by the eighth convolutional layer; the input of the ninth activation layer receives all feature maps output by the ninth batch normalization layer; and the output of the ninth activation layer is the output of the B-type neural network block it belongs to. In every B-type neural network block, the seventh and eighth convolutional layers have a convolution kernel size of 3 × 3, zero-padding parameter "same" and stride 1, and the second dilated convolutional layer has a convolution kernel size of 3 × 3, zero-padding parameter "same", stride 1 and dilation parameter 2; only the number of convolution kernels differs between blocks. For the first B-type neural network block, the number of convolution kernels of all three convolutional layers is 128 in the 1st neural network block, 256 in the 2nd, 512 in the 3rd and 1024 in the 4th neural network block; for the second B-type neural network block, it is 512 in the 5th neural network block, 256 in the 6th, 128 in the 7th and 64 in the 8th neural network block. The activation mode of the seventh, eighth and ninth activation layers is "Relu".
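The B-type block just described can be sketched in PyTorch as follows. This is an illustrative reimplementation, not the patent's code: the class name and the fixed channel count are assumptions, and the padding values are chosen to reproduce the "same" behavior at stride 1 (padding 1 for a plain 3 × 3 convolution, padding 2 when the dilation parameter is 2).

```python
import torch
import torch.nn as nn

class BTypeBlock(nn.Module):
    """Conv-BN-ReLU, dilated Conv-BN-ReLU, Conv-BN-ReLU, as in the B-type block."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),              # 7th convolutional layer
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=2, dilation=2),  # 2nd dilated convolutional layer
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),              # 8th convolutional layer
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.body(x)

# "same" zero padding with stride 1 keeps the spatial size unchanged:
x = torch.randn(1, 128, 45, 60)
y = BTypeBlock(128)(x)
print(tuple(y.shape))  # (1, 128, 45, 60)
```

Note that with kernel 3 and dilation 2 the effective kernel span is 5, so padding 2 is what keeps the feature map size constant.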
In step 1_2, the 1st to 7th deconvolution blocks have the same structure, consisting of, set in sequence, a deconvolution layer, a tenth batch normalization layer and a tenth activation layer. The input of the deconvolution layer is the input of the deconvolution block it belongs to; the input of the tenth batch normalization layer receives all feature maps output by the deconvolution layer; the input of the tenth activation layer receives all feature maps output by the tenth batch normalization layer; and the output of the tenth activation layer is the output of the deconvolution block it belongs to. Every deconvolution layer has a convolution kernel size of 4 × 4, zero-padding parameter "same" and stride 2; the number of convolution kernels is 512 in the 1st deconvolution block, 256 in the 2nd and 3rd deconvolution blocks, 128 in the 4th and 5th deconvolution blocks, and 64 in the 6th and 7th deconvolution blocks. The activation mode of the tenth activation layer is "Relu".
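A deconvolution block of this shape might look as follows in PyTorch. This is a sketch under the assumption that "same" padding with a 4 × 4 kernel and stride 2 corresponds to `padding=1` in `ConvTranspose2d`, which yields exact 2× upsampling; the class and argument names are illustrative.

```python
import torch
import torch.nn as nn

class DeconvBlock(nn.Module):
    """Transposed convolution + batch norm + ReLU, doubling width and height."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        # Output size: (in - 1) * stride - 2 * padding + kernel = 2 * in
        self.deconv = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.deconv(x)))

# e.g. the 1st deconvolution block: 1024 maps at W/16 x H/16 -> 512 maps at W/8 x H/8
x = torch.randn(1, 1024, 30, 22)
y = DeconvBlock(1024, 512)(x)
print(tuple(y.shape))  # (1, 512, 60, 44)
```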
In step 1_2, the 4 fusion layers are all Add fusion layers, i.e. they fuse their inputs by element-wise addition.
In step 1_2, the output layer consists of, set in sequence, a ninth convolutional layer, a tenth batch normalization layer and an eleventh activation layer. The input of the ninth convolutional layer is the input of the output layer; the input of the tenth batch normalization layer receives all feature maps output by the ninth convolutional layer; the input of the eleventh activation layer receives all feature maps output by the tenth batch normalization layer; and the output of the eleventh activation layer is the output of the output layer. The ninth convolutional layer has a convolution kernel size of 1 × 1, 12 convolution kernels, zero-padding parameter "same" and stride 1, and the activation mode of the eleventh activation layer is "Relu".
Compared with the prior art, the advantages of the present invention are as follows:
1) the method for the present invention constructs the empty convolutional neural networks of Complete Disability difference, and the convolutional layer for being 2 with step-length is instead of existing rank The common pond layer of section, since pond layer can cause irreversible characteristic loss to image, and semantic segmentation is to precision of prediction It is required that it is very high, therefore step-length has been selected to be substituted for 2 convolutional layer, the available effect identical with pond layer of the convolutional layer Fruit, and can effectively avoid irreversible information loss caused by pond, that is, characteristics of image has been effectively ensured and has not had excessive loss.
2) the method for the present invention expands network receptive field using empty convolutional layer, and due to pond layer the advantages of can more than have Reduction image size is imitated, receptive field can be expanded effectively also to guarantee to extract more global informations, therefore be 2 with step-length When convolutional layer substitutes pond layer, receptive field is not expanded effectively, has lost part global information, therefore empty convolution is added Layer, to guarantee that network receptive field is constant or even increases, empty convolutional layer is combined with the convolutional layer that step-length is 2, it is ensured that complete Residual error cavity convolutional neural networks extract most local feature and global characteristics.
3) the method for the present invention uses jump when building Complete Disability difference cavity convolutional neural networks and is connected to main company Mode is connect, to constitute Complete Disability difference network, residual error network has always very outstanding performance on semantic segmentation direction, therefore at this Jump connection is added in inventive method, it can be with the loss of effective compensation image in an encoding process, to guarantee last prediction essence Degree, and advanced features and low-level features have preferably been merged in connection of jumping, and avoid gradient disappearance or gradient explosion, thus Improve the robustness of the empty convolutional neural networks training pattern of Complete Disability difference.
Description of the Drawings
Fig. 1 is a schematic diagram of the compositional structure of the fully residual dilated convolutional neural network constructed in the method of the present invention;
Fig. 2a is the 1st original road scene image;
Fig. 2b is the predicted semantic segmentation image obtained by predicting the original road scene image shown in Fig. 2a using the method of the present invention;
Fig. 3a is the 2nd original road scene image;
Fig. 3b is the predicted semantic segmentation image obtained by predicting the original road scene image shown in Fig. 3a using the method of the present invention;
Fig. 4a is the 3rd original road scene image;
Fig. 4b is the predicted semantic segmentation image obtained by predicting the original road scene image shown in Fig. 4a using the method of the present invention;
Fig. 5a is the 4th original road scene image;
Fig. 5b is the predicted semantic segmentation image obtained by predicting the original road scene image shown in Fig. 5a using the method of the present invention.
Specific embodiment
The present invention will be described in further detail below with reference to the drawings and embodiments.
The road scene semantic segmentation method based on a fully residual dilated convolutional neural network proposed by the present invention includes two processes: a training stage and a test stage.
The specific steps of the training stage process are as follows:
Step 1_1: Choose Q original road scene images and the true semantic segmentation image corresponding to each original road scene image, and form a training set; denote the q-th original road scene image in the training set as {I_q(i, j)}, together with its corresponding true semantic segmentation image in the training set. Then use the existing one-hot encoding technique (one-hot) to process the true semantic segmentation image corresponding to each original road scene image in the training set into 12 one-hot encoded images, and record the set formed by these 12 one-hot encoded images. Here, each road scene image is an RGB color image; Q is a positive integer with Q ≥ 200, for example Q = 367; q is a positive integer with 1 ≤ q ≤ Q; 1 ≤ i ≤ W and 1 ≤ j ≤ H, where W denotes the width of {I_q(i, j)} and H denotes the height of {I_q(i, j)}, for example W = 480 and H = 360; I_q(i, j) denotes the pixel value of the pixel whose coordinate position is (i, j) in {I_q(i, j)}, and the pixel value of the pixel whose coordinate position is (i, j) in the corresponding true semantic segmentation image is denoted accordingly.
Here, the original road scene images are 367 images selected directly from the training set of the road scene image database CamVid.
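The one-hot step of step 1_1 can be sketched as follows. This is a minimal pure-Python illustration on a toy 2 × 2 label map with 12 classes; the function name and toy sizes are assumptions, and a real pipeline would operate on full-resolution label maps with an array library or framework utility.

```python
def one_hot(label_map, num_classes=12):
    """Turn an H x W map of class indices (0..num_classes-1) into
    num_classes binary maps, one per class."""
    h, w = len(label_map), len(label_map[0])
    maps = [[[0] * w for _ in range(h)] for _ in range(num_classes)]
    for i in range(h):
        for j in range(w):
            maps[label_map[i][j]][i][j] = 1
    return maps

# Toy 2x2 true-segmentation map with class indices:
labels = [[0, 2], [11, 2]]
encoded = one_hot(labels)
print(len(encoded))   # 12
print(encoded[2])     # [[0, 1], [0, 1]]
```

Each of the 12 resulting maps marks with 1 exactly the pixels belonging to the corresponding class, which is the target format compared against the network's 12-channel output.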
Step 1_2: Construct the fully residual dilated convolutional neural network. As shown in Fig. 1, the fully residual dilated convolutional neural network includes an input layer, a hidden layer and an output layer; the hidden layer includes 1 transition convolution block, 8 neural network blocks, 7 deconvolution blocks and 4 fusion layers.
For the input layer, its input receives the R channel component, G channel component and B channel component of an input image, and its output passes the R channel component, G channel component and B channel component of the input image to the hidden layer; the input image received by the input of the input layer is required to have width W and height H.
For the hidden layer, the input of the transition convolution block is the input of the hidden layer and receives the R, G and B channel components of the input image output by the input layer; the output of the transition convolution block outputs 64 feature maps of width W and height H, and the set formed by these 64 feature maps is denoted G1. The input of the 1st neural network block receives all feature maps in G1; its output outputs 128 feature maps of width W/2 and height H/2, whose set is denoted S1. The input of the 2nd neural network block receives all feature maps in S1; its output outputs 256 feature maps of width W/4 and height H/4, whose set is denoted S2. The input of the 3rd neural network block receives all feature maps in S2; its output outputs 512 feature maps of width W/8 and height H/8, whose set is denoted S3. The input of the 4th neural network block receives all feature maps in S3; its output outputs 1024 feature maps of width W/16 and height H/16, whose set is denoted S4. The input of the 1st deconvolution block receives all feature maps in S4; its output outputs 512 feature maps of width W/8 and height H/8, whose set is denoted F1. The input of the 5th neural network block receives all feature maps in S3; its output outputs 512 feature maps of width W/8 and height H/8, whose set is denoted S5. The input of the 1st fusion layer receives all feature maps in F1 and all feature maps in S5; after the addition fusion operation, its output outputs 512 feature maps of width W/8 and height H/8, whose set is denoted A1. The input of the 2nd deconvolution block receives all feature maps in A1; its output outputs 256 feature maps of width W/4 and height H/4, whose set is denoted F2. The input of the 6th neural network block receives all feature maps in S2; its output outputs 256 feature maps of width W/4 and height H/4, whose set is denoted S6. The input of the 3rd deconvolution block receives all feature maps in S3; its output outputs 256 feature maps of width W/4 and height H/4, whose set is denoted F3. The input of the 2nd fusion layer receives all feature maps in F2, S6 and F3; after the addition fusion operation, its output outputs 256 feature maps of width W/4 and height H/4, whose set is denoted A2. The input of the 4th deconvolution block receives all feature maps in A2; its output outputs 128 feature maps of width W/2 and height H/2, whose set is denoted F4. The input of the 7th neural network block receives all feature maps in S1; its output outputs 128 feature maps of width W/2 and height H/2, whose set is denoted S7. The input of the 5th deconvolution block receives all feature maps in S2; its output outputs 128 feature maps of width W/2 and height H/2, whose set is denoted F5. The input of the 3rd fusion layer receives all feature maps in F4, S7 and F5; after the addition fusion operation, its output outputs 128 feature maps of width W/2 and height H/2, whose set is denoted A3. The input of the 6th deconvolution block receives all feature maps in A3; its output outputs 64 feature maps of width W and height H, whose set is denoted F6. The input of the 8th neural network block receives all feature maps in G1; its output outputs 64 feature maps of width W and height H, whose set is denoted S8. The input of the 7th deconvolution block receives all feature maps in S1; its output outputs 64 feature maps of width W and height H, whose set is denoted F7. The input of the 4th fusion layer receives all feature maps in F6, S8 and F7; after the addition fusion operation, its output outputs 64 feature maps of width W and height H, whose set is denoted A4. The output of the 4th fusion layer is the output of the hidden layer.
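The spatial bookkeeping of the encoder path above can be sketched as follows. This is an illustrative helper, not part of the patent; it assumes an input whose sides are divisible by 16, so a hypothetical 480 × 352 input is used rather than the 480 × 360 of the embodiment (where the deepest sizes are no longer exact halvings).

```python
def encoder_shapes(w, h):
    """Channel count and spatial size after the transition block (G1)
    and each of the first four neural network blocks (S1..S4).
    Each block starts with a stride-2 convolution, halving W and H."""
    shapes = [(64, w, h)]              # G1 from the transition convolution block
    for channels in [128, 256, 512, 1024]:  # S1, S2, S3, S4
        w, h = w // 2, h // 2
        shapes.append((channels, w, h))
    return shapes

print(encoder_shapes(480, 352))
# [(64, 480, 352), (128, 240, 176), (256, 120, 88), (512, 60, 44), (1024, 30, 22)]
```

The decoder path then mirrors this: each deconvolution block doubles the spatial size while the fusion layers add the equally-sized encoder and decoder feature maps element-wise.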
For the output layer, its input receives all feature maps in A4, and its output outputs 12 feature maps of width W and height H; the set formed by these 12 feature maps is denoted O1.
Step 1_3: Use each original road scene image in the training set as an input image, input it into the fully residual dilated convolutional neural network for training, and obtain the 12 semantic segmentation prediction maps corresponding to each original road scene image in the training set; the set formed by the 12 semantic segmentation prediction maps corresponding to {I_q(i, j)} is recorded accordingly.
Step 1_4: Calculate the loss function value between the set formed by the 12 semantic segmentation prediction maps corresponding to each original road scene image in the training set and the set formed by the 12 one-hot encoded images obtained from the corresponding true semantic segmentation image; the loss function value is obtained using the negative log-likelihood (NLLLoss) function.
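The negative log-likelihood used in step 1_4 can be illustrated with a toy per-pixel computation: the loss for a pixel is the negative logarithm of the probability the network assigns to the pixel's true class, averaged over pixels. Note that this illustrative version takes probabilities directly, whereas framework implementations such as PyTorch's `NLLLoss` expect log-probabilities; the toy sizes (2 pixels, 3 classes instead of 12) are assumptions.

```python
import math

def nll_loss(probs, targets):
    """probs: per-pixel class probability lists; targets: true class indices.
    Returns the mean of -log(p_true) over all pixels."""
    total = 0.0
    for p, t in zip(probs, targets):
        total += -math.log(p[t])
    return total / len(targets)

# Two pixels, three classes each (the actual network uses 12 classes):
probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
targets = [0, 1]
loss = nll_loss(probs, targets)
print(round(loss, 4))  # 0.2899
```

Confident, correct predictions (probability near 1 on the true class) drive the loss toward 0, which is what training minimizes.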
Step 1_5: Repeat step 1_3 and step 1_4 a total of V times to obtain the trained fully residual dilated convolutional neural network model, along with Q × V loss function values; then find the smallest loss function value among the Q × V loss function values; then take the weight vector and bias term corresponding to the smallest loss function value as the best weight vector and best bias term of the trained fully residual dilated convolutional neural network model, denoted W_best and b_best respectively; where V > 1, and in this embodiment V = 500.
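The selection rule of step 1_5 amounts to keeping the parameters recorded at the minimum loss. A minimal sketch (names and the string stand-ins for weight snapshots are illustrative, not the patent's implementation):

```python
def select_best(history):
    """history: list of (loss_value, weights) pairs, one per recorded step.
    Returns the pair with the smallest loss, i.e. (loss, W_best)."""
    return min(history, key=lambda pair: pair[0])

# Toy training history over three recorded steps:
history = [(0.91, "W_epoch1"), (0.34, "W_epoch2"), (0.47, "W_epoch3")]
best_loss, w_best = select_best(history)
print(best_loss, w_best)  # 0.34 W_epoch2
```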
The specific steps of the test phase process are as follows:
Step 2_1: Let a road scene image to be semantically segmented be given, where 1 ≤ i' ≤ W' and 1 ≤ j' ≤ H'; W' denotes the width of the road scene image to be semantically segmented, H' denotes its height, and each pixel whose coordinate position is (i', j') in that image has a corresponding pixel value.
Step 2_2: Input the R channel component, G channel component and B channel component of the road scene image to be semantically segmented into the trained fully residual dilated convolutional neural network model, and predict using W_best and b_best to obtain the corresponding predicted semantic segmentation image, in which each pixel whose coordinate position is (i', j') has a predicted pixel value.
In this particular embodiment, in step 1_2, the transition convolution block consists of, set in sequence, a first convolutional layer (Convolution, Conv), a first batch normalization layer (Batch Normalization, BN), a first activation layer (Activation, Act), a second convolutional layer, a second batch normalization layer, a second activation layer, a third convolutional layer, a third batch normalization layer and a third activation layer. The input of the first convolutional layer is the input of the transition convolution block; the input of the first batch normalization layer receives all feature maps output by the first convolutional layer; the input of the first activation layer receives all feature maps output by the first batch normalization layer; the input of the second convolutional layer receives all feature maps output by the first activation layer; the input of the second batch normalization layer receives all feature maps output by the second convolutional layer; the input of the second activation layer receives all feature maps output by the second batch normalization layer; the input of the third convolutional layer receives all feature maps output by the second activation layer; the input of the third batch normalization layer receives all feature maps output by the third convolutional layer; the input of the third activation layer receives all feature maps output by the third batch normalization layer; and the output of the third activation layer is the output of the transition convolution block. The first, second and third convolutional layers all have a convolution kernel size (kernel_size) of 3 × 3, 64 convolution kernels (filters), zero-padding (padding) parameter "same" and stride (stride) 1, and the activation mode of the first, second and third activation layers is "Relu".
In this particular embodiment, in step 1_2, the 1st to 4th neural network blocks have the same structure, consisting of, set in sequence, a fourth convolutional layer, a first R-type neural network block and a first B-type neural network block. The input of the fourth convolutional layer is the input of the neural network block it belongs to; the input of the first R-type neural network block receives all feature maps output by the fourth convolutional layer; the input of the first B-type neural network block receives all feature maps output by the first R-type neural network block; and the output of the first B-type neural network block is the output of the neural network block it belongs to. In each of these blocks the fourth convolutional layer has a convolution kernel size of 3 × 3, zero-padding parameter "same" and stride 2; the number of convolution kernels of the fourth convolutional layer, the first R-type neural network block and the first B-type neural network block is 128 in the 1st neural network block, 256 in the 2nd neural network block, 512 in the 3rd neural network block and 1024 in the 4th neural network block.
In this particular embodiment, the 5th to 8th neural network blocks have the same structure, consisting of, set in sequence, a second R-type neural network block and a second B-type neural network block. The input of the second R-type neural network block is the input of the neural network block it belongs to; the input of the second B-type neural network block receives all feature maps output by the second R-type neural network block; and the output of the second B-type neural network block is the output of the neural network block it belongs to. The number of convolution kernels of the second R-type neural network block and the second B-type neural network block is 512 in the 5th neural network block, 256 in the 6th neural network block, 128 in the 7th neural network block and 64 in the 8th neural network block.
In this particular embodiment, the first R-type neural network block and the second R-type neural network block have the same structure, consisting of, set in sequence, a fifth convolutional layer, a fourth batch normalization layer, a fourth activation layer, a first dilated convolutional layer, a fifth batch normalization layer, a fifth activation layer, a sixth convolutional layer, a sixth batch normalization layer and a sixth activation layer. The input of the fifth convolutional layer is the input of the R-type neural network block it belongs to; the input of the fourth batch normalization layer receives all feature maps output by the fifth convolutional layer; the input of the fourth activation layer receives all feature maps output by the fourth batch normalization layer; the input of the first dilated convolutional layer receives all feature maps output by the fourth activation layer; the input of the fifth batch normalization layer receives all feature maps output by the first dilated convolutional layer; the input of the fifth activation layer receives all feature maps output by the fifth batch normalization layer; the input of the sixth convolutional layer receives all feature maps output by the fifth activation layer; the input of the sixth batch normalization layer receives all feature maps output by the sixth convolutional layer; and the input of the sixth activation layer receives all feature maps output by the sixth batch normalization layer. All feature maps input to the fifth convolutional layer and all feature maps output by the sixth activation layer are combined through a skip connection, and the result serves as all feature maps output by the R-type neural network block it belongs to. In every R-type neural network block, the fifth and sixth convolutional layers have a convolution kernel size of 3 × 3, zero-padding parameter "same" and stride 1, and the first dilated convolutional layer has a convolution kernel size of 3 × 3, zero-padding parameter "same", stride 1 and dilation parameter 2; only the number of convolution kernels differs between blocks. For the first R-type neural network block, the number of convolution kernels of all three convolutional layers is 128 in the 1st neural network block, 256 in the 2nd, 512 in the 3rd and 1024 in the 4th neural network block; for the second R-type neural network block, it is 512 in the 5th neural network block, 256 in the 6th, 128 in the 7th and 64 in the 8th neural network block. The activation mode of the fourth, fifth and sixth activation layers is "Relu".
In this particular embodiment, the first B-type neural network block and the second B-type neural network block have the same structure: each consists of, in sequence, a seventh convolutional layer, a seventh batch normalization layer, a seventh activation layer, a second dilated convolutional layer, an eighth batch normalization layer, an eighth activation layer, an eighth convolutional layer, a ninth batch normalization layer and a ninth activation layer. The input end of the seventh convolutional layer is the input end of the B-type neural network block to which it belongs, each subsequent layer receives all feature maps output by the output end of the preceding layer, and the output end of the ninth activation layer is the output end of the B-type neural network block to which it belongs. In the first B-type neural network block of the 1st neural network block, the seventh and eighth convolutional layers each have a convolution kernel size of 3 × 3, 128 convolution kernels, a zero-padding parameter of "same" and a stride of 1, and the second dilated convolutional layer has a convolution kernel size of 3 × 3, 128 convolution kernels, a zero-padding parameter of "same", a stride of 1 and a dilation parameter of 2; the first B-type neural network blocks of the 2nd, 3rd and 4th neural network blocks use the same configuration with 256, 512 and 1024 convolution kernels per layer, respectively; the second B-type neural network blocks of the 5th, 6th, 7th and 8th neural network blocks use the same configuration with 512, 256, 128 and 64 convolution kernels per layer, respectively. The activation function of the seventh, eighth and ninth activation layers is ReLU.
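For the dilated convolutional layers above, the "same" zero-padding parameter has to account for the dilation: a k × k kernel with dilation d covers an effective extent of k + (k − 1)(d − 1) pixels, so at stride 1 the per-side padding is ((k − 1)·d)/2. A small sketch of this arithmetic (the helper names are illustrative, not from the patent):

```python
def effective_kernel(k, d):
    """Spatial extent covered by a k x k kernel with dilation d."""
    return k + (k - 1) * (d - 1)

def same_padding(k, d=1):
    """Per-side zero padding giving 'same' output size at stride 1
    (assumes an odd effective kernel extent)."""
    return ((k - 1) * d) // 2

def conv_out(size, k, stride=1, pad=0, d=1):
    """Output size of a convolution along one spatial dimension."""
    return (size + 2 * pad - d * (k - 1) - 1) // stride + 1

# 3x3 kernel with dilation 2, as in the dilated layers above:
assert effective_kernel(3, 2) == 5                     # covers 5 pixels
assert same_padding(3, d=2) == 2                       # 2 pixels of padding
assert conv_out(224, 3, stride=1, pad=2, d=2) == 224   # size preserved
# a plain 3x3, stride 1, padding 1 convolution also preserves size:
assert conv_out(224, 3, stride=1, pad=1) == 224
```

This is why the dilated layers enlarge the receptive field without changing the feature-map size or adding parameters.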
In this particular embodiment, in step 1_2, the 1st to 7th deconvolution blocks have the same structure: each consists of, in sequence, a deconvolution layer, a tenth batch normalization layer and a tenth activation layer. The input end of the deconvolution layer is the input end of the deconvolution block to which it belongs, the input end of the tenth batch normalization layer receives all feature maps output by the output end of the deconvolution layer, the input end of the tenth activation layer receives all feature maps output by the output end of the tenth batch normalization layer, and the output end of the tenth activation layer is the output end of the deconvolution block to which it belongs. The deconvolution layer in the 1st deconvolution block has a convolution kernel size of 4 × 4, 512 convolution kernels, a zero-padding parameter of "same" and a stride of 2; the deconvolution layers in the 2nd and 3rd deconvolution blocks have 256 convolution kernels, those in the 4th and 5th deconvolution blocks have 128, and those in the 6th and 7th deconvolution blocks have 64, all with a convolution kernel size of 4 × 4, a zero-padding parameter of "same" and a stride of 2. The activation function of the tenth activation layer is ReLU.
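The deconvolution (transposed convolution) layers above all use a 4 × 4 kernel with stride 2, which exactly doubles the feature-map size when one pixel of padding is applied per side, following out = (in − 1)·s − 2p + k. A quick sketch of that arithmetic (the helper name is illustrative):

```python
def deconv_out(size, k=4, stride=2, pad=1):
    """Output size of a transposed convolution along one dimension."""
    return (size - 1) * stride - 2 * pad + k

# a 4x4 / stride-2 deconvolution doubles the spatial size, matching
# e.g. the 1st deconvolution block mapping width W/16 to W/8:
assert deconv_out(14) == 28
assert deconv_out(28) == 56
```

Chaining such blocks therefore walks the decoder back up through W/8, W/4, W/2 and finally the full input size W.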
In this particular embodiment, in step 1_2, the 4 fusion layers are all Add (element-wise addition) fusion layers.
In this particular embodiment, in step 1_2, the output layer consists of, in sequence, a ninth convolutional layer, a tenth batch normalization layer and an eleventh activation layer. The input end of the ninth convolutional layer is the input end of the output layer, the input end of the tenth batch normalization layer receives all feature maps output by the output end of the ninth convolutional layer, the input end of the eleventh activation layer receives all feature maps output by the output end of the tenth batch normalization layer, and the output end of the eleventh activation layer is the output end of the output layer. The ninth convolutional layer has a convolution kernel size of 1 × 1, 12 convolution kernels, a zero-padding parameter of "same" and a stride of 1; the activation function of the eleventh activation layer is ReLU.
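The output layer produces 12 score maps, one per CamVid class, and a predicted semantic segmentation image is read off by taking, at each pixel, the class whose map has the highest response — the inverse of the one-hot encoding applied to the labels. A toy sketch with plain lists (the class count 12 and map sizes come from the patent; the function name and the argmax decoding step are an assumed reading, not spelled out in the text):

```python
def argmax_label_map(score_maps):
    """score_maps: list of C maps, each an H x W list of lists.
    Returns an H x W map of class indices (0..C-1)."""
    h, w = len(score_maps[0]), len(score_maps[0][0])
    return [[max(range(len(score_maps)),
                 key=lambda c: score_maps[c][i][j])
             for j in range(w)] for i in range(h)]

# 2 classes, 1x2 image: pixel 0 scores favour class 1, pixel 1 class 0
maps = [[[0.1, 0.9]],   # class-0 score map
        [[0.8, 0.2]]]   # class-1 score map
assert argmax_label_map(maps) == [[1, 0]]
```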
In order to further verify the feasibility and effectiveness of the method of the present invention, experiments were carried out.
The architecture of the fully residual dilated convolutional neural network was built using the Python-based deep learning framework PyTorch 0.4.1. The test set of the road scene image database CamVid (233 road scene images) was used to analyze the segmentation performance of the method of the present invention on predicted road scene images. Here, 3 objective metrics commonly used to assess semantic segmentation methods were adopted as evaluation indices: class accuracy (Class Accuracy, CA), mean pixel accuracy (Mean Pixel Accuracy, MPA), and the mean ratio of the intersection to the union of the segmented image and the label image (Mean Intersection over Union, MIoU).
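All three metrics can be derived from a per-class confusion matrix: CA is the per-class accuracy, MPA its mean over classes, and MIoU the mean over classes of TP / (TP + FP + FN). A minimal self-contained sketch, assuming flattened integer label maps (the function names are illustrative, not from the patent):

```python
def confusion(pred, truth, num_classes):
    """num_classes x num_classes matrix; rows = true class, cols = predicted."""
    m = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, truth):
        m[t][p] += 1
    return m

def per_class_accuracy(m):
    """CA: correct pixels of each class / pixels of that class."""
    return [row[c] / sum(row) for c, row in enumerate(m) if sum(row)]

def mean_pixel_accuracy(m):
    """MPA: mean of the per-class accuracies."""
    accs = per_class_accuracy(m)
    return sum(accs) / len(accs)

def mean_iou(m):
    """MIoU: mean over classes of TP / (TP + FP + FN)."""
    n = len(m)
    ious = []
    for c in range(n):
        tp = m[c][c]
        fn = sum(m[c]) - tp
        fp = sum(m[r][c] for r in range(n)) - tp
        if tp + fp + fn:
            ious.append(tp / (tp + fp + fn))
    return sum(ious) / len(ious)

pred  = [0, 0, 1, 1, 1, 2]
truth = [0, 1, 1, 1, 2, 2]
m = confusion(pred, truth, 3)
assert m[1][1] == 2   # two pixels of class 1 predicted correctly
```

On real data the same computation is simply run over every pixel of every test image before averaging.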
Each road scene image in the CamVid test set was predicted using the method of the present invention, yielding a corresponding predicted semantic segmentation image. The class accuracy CA, mean pixel accuracy MPA, and mean intersection over union MIoU of the segmented and label images, which reflect the semantic segmentation performance of the method of the present invention, are listed in Table 1. The data in Table 1 show that the segmentation results obtained for road scene images by the method of the present invention are good, indicating that obtaining predicted semantic segmentation images of road scene images with the method of the present invention is feasible and effective.
Table 1. Prediction results of the method of the present invention on the test set
Fig. 2a shows the 1st original road scene image, and Fig. 2b shows the predicted semantic segmentation image obtained by predicting the original road scene image of Fig. 2a with the method of the present invention; Figs. 3a and 3b, Figs. 4a and 4b, and Figs. 5a and 5b likewise show the 2nd, 3rd and 4th original road scene images together with the corresponding predicted semantic segmentation images obtained with the method of the present invention. Comparing Fig. 2a with Fig. 2b, Fig. 3a with Fig. 3b, Fig. 4a with Fig. 4b, and Fig. 5a with Fig. 5b, it can be seen that the predicted semantic segmentation images obtained with the method of the present invention have high segmentation accuracy.

Claims (8)

1. A road scene semantic segmentation method based on a fully residual dilated convolutional neural network, characterized by comprising two processes, a training stage and a test stage;
The specific steps of the training stage process are as follows:
Step 1_1: select Q original road scene images and the true semantic segmentation image corresponding to each original road scene image, and form a training set; denote the q-th original road scene image in the training set as {Iq(i, j)}; then use the one-hot encoding technique to process the true semantic segmentation image corresponding to each original road scene image in the training set into 12 one-hot encoded images, the 12 one-hot encoded images so obtained forming a set; wherein, the road scene images are RGB color images, Q is a positive integer with Q ≥ 200, q is a positive integer with 1 ≤ q ≤ Q, 1 ≤ i ≤ W, 1 ≤ j ≤ H, W denotes the width of {Iq(i, j)}, H denotes the height of {Iq(i, j)}, Iq(i, j) denotes the pixel value of the pixel at coordinate position (i, j) in {Iq(i, j)}, and the pixel value of the pixel at coordinate position (i, j) in the corresponding true semantic segmentation image is defined analogously;
Step 1_2: construct the fully residual dilated convolutional neural network: the fully residual dilated convolutional neural network comprises an input layer, a hidden layer and an output layer, the hidden layer comprising 1 transition convolution block, 8 neural network blocks, 7 deconvolution blocks and 4 fusion layers;
For the input layer, its input end receives the R channel component, G channel component and B channel component of an input image, and its output end outputs the R channel component, G channel component and B channel component of the input image to the hidden layer; wherein, the input image received at the input end of the input layer is required to have a width of W and a height of H;
For the hidden layer, the input end of the transition convolution block is the input end of the hidden layer and receives the R channel component, G channel component and B channel component of the input image output by the output end of the input layer; the output end of the transition convolution block outputs 64 feature maps of width W and height H, and the set formed by these 64 feature maps is denoted as G1. The input end of the 1st neural network block receives all feature maps in G1, and its output end outputs 128 feature maps of width W/2 and height H/2, whose set is denoted as S1; the input end of the 2nd neural network block receives all feature maps in S1, and its output end outputs 256 feature maps of width W/4 and height H/4, whose set is denoted as S2; the input end of the 3rd neural network block receives all feature maps in S2, and its output end outputs 512 feature maps of width W/8 and height H/8, whose set is denoted as S3; the input end of the 4th neural network block receives all feature maps in S3, and its output end outputs 1024 feature maps of width W/16 and height H/16, whose set is denoted as S4. The input end of the 1st deconvolution block receives all feature maps in S4, and its output end outputs 512 feature maps of width W/8 and height H/8, whose set is denoted as F1; the input end of the 5th neural network block receives all feature maps in S3, and its output end outputs 512 feature maps of width W/8 and height H/8, whose set is denoted as S5; the input end of the 1st fusion layer receives all feature maps in F1 and all feature maps in S5, and after the addition fusion operation the output end of the 1st fusion layer outputs 512 feature maps of width W/8 and height H/8, whose set is denoted as A1. The input end of the 2nd deconvolution block receives all feature maps in A1, and its output end outputs 256 feature maps of width W/4 and height H/4, whose set is denoted as F2; the input end of the 6th neural network block receives all feature maps in S2, and its output end outputs 256 feature maps of width W/4 and height H/4, whose set is denoted as S6; the input end of the 3rd deconvolution block receives all feature maps in S3, and its output end outputs 256 feature maps of width W/4 and height H/4, whose set is denoted as F3; the input end of the 2nd fusion layer receives all feature maps in F2, S6 and F3, and after the addition fusion operation the output end of the 2nd fusion layer outputs 256 feature maps of width W/4 and height H/4, whose set is denoted as A2. The input end of the 4th deconvolution block receives all feature maps in A2, and its output end outputs 128 feature maps of width W/2 and height H/2, whose set is denoted as F4; the input end of the 7th neural network block receives all feature maps in S1, and its output end outputs 128 feature maps of width W/2 and height H/2, whose set is denoted as S7; the input end of the 5th deconvolution block receives all feature maps in S2, and its output end outputs 128 feature maps of width W/2 and height H/2, whose set is denoted as F5; the input end of the 3rd fusion layer receives all feature maps in F4, S7 and F5, and after the addition fusion operation the output end of the 3rd fusion layer outputs 128 feature maps of width W/2 and height H/2, whose set is denoted as A3. The input end of the 6th deconvolution block receives all feature maps in A3, and its output end outputs 64 feature maps of width W and height H, whose set is denoted as F6; the input end of the 8th neural network block receives all feature maps in G1, and its output end outputs 64 feature maps of width W and height H, whose set is denoted as S8; the input end of the 7th deconvolution block receives all feature maps in S1, and its output end outputs 64 feature maps of width W and height H, whose set is denoted as F7; the input end of the 4th fusion layer receives all feature maps in F6, S8 and F7, and after the addition fusion operation the output end of the 4th fusion layer outputs 64 feature maps of width W and height H, whose set is denoted as A4; the output end of the 4th fusion layer is the output end of the hidden layer;
For the output layer, its input end receives all feature maps in A4, and its output end outputs 12 feature maps of width W and height H; the set formed by these 12 feature maps is denoted as O1;
Step 1_3: take each original road scene image in the training set as an input image, input it into the fully residual dilated convolutional neural network for training, and obtain the 12 semantic segmentation prediction maps corresponding to each original road scene image in the training set, the 12 semantic segmentation prediction maps corresponding to {Iq(i, j)} forming a set;
Step 1_4: compute the loss function value between the set formed by the 12 semantic segmentation prediction maps corresponding to each original road scene image in the training set and the set formed by the 12 one-hot encoded images into which the corresponding true semantic segmentation image was processed; this loss function value is obtained using the negative log-likelihood function;
Step 1_5: repeat step 1_3 and step 1_4 a total of V times to obtain a trained fully residual dilated convolutional neural network model and Q × V loss function values; then find the smallest loss function value among the Q × V loss function values; then take the weight vector and bias term corresponding to the smallest loss function value as the optimal weight vector and optimal bias term of the trained fully residual dilated convolutional neural network model, correspondingly denoted as Wbest and bbest; wherein, V > 1;
The specific steps of the test phase process are as follows:
Step 2_1: let a road scene image to be semantically segmented be given, of width W' and height H', its pixel value at coordinate position (i', j') defined accordingly; wherein, 1 ≤ i' ≤ W' and 1 ≤ j' ≤ H';
Step 2_2: input the R channel component, G channel component and B channel component of the road scene image to be semantically segmented into the trained fully residual dilated convolutional neural network model, and perform prediction using Wbest and bbest to obtain the corresponding predicted semantic segmentation image, in which the pixel value of the pixel at coordinate position (i', j') is defined accordingly.
2. The road scene semantic segmentation method based on a fully residual dilated convolutional neural network according to claim 1, characterized in that in step 1_2, the transition convolution block consists of, in sequence, a first convolutional layer, a first batch normalization layer, a first activation layer, a second convolutional layer, a second batch normalization layer, a second activation layer, a third convolutional layer, a third batch normalization layer and a third activation layer; the input end of the first convolutional layer is the input end of the transition convolution block, each subsequent layer receives all feature maps output by the output end of the preceding layer, and the output end of the third activation layer is the output end of the transition convolution block; wherein, the first, second and third convolutional layers each have a convolution kernel size of 3 × 3, 64 convolution kernels, a zero-padding parameter of "same" and a stride of 1, and the activation function of the first, second and third activation layers is ReLU.
3. The road scene semantic segmentation method based on a fully residual dilated convolutional neural network according to claim 1, characterized in that in step 1_2, the 1st to 4th neural network blocks have the same structure, each consisting of, in sequence, a fourth convolutional layer, a first R-type neural network block and a first B-type neural network block; the input end of the fourth convolutional layer is the input end of the neural network block to which it belongs, the input end of the first R-type neural network block receives all feature maps output by the output end of the fourth convolutional layer, the input end of the first B-type neural network block receives all feature maps output by the output end of the first R-type neural network block, and the output end of the first B-type neural network block is the output end of the neural network block to which it belongs; wherein, the fourth convolutional layers of the 1st, 2nd, 3rd and 4th neural network blocks each have a convolution kernel size of 3 × 3, a zero-padding parameter of "same" and a stride of 2, with 128, 256, 512 and 1024 convolution kernels, respectively, and the first R-type neural network block and the first B-type neural network block of the 1st, 2nd, 3rd and 4th neural network blocks have 128, 256, 512 and 1024 convolution kernels, respectively;
The 5th to 8th neural network blocks have the same structure, each consisting of, in sequence, a second R-type neural network block and a second B-type neural network block; the input end of the second R-type neural network block is the input end of the neural network block to which it belongs, the input end of the second B-type neural network block receives all feature maps output by the output end of the second R-type neural network block, and the output end of the second B-type neural network block is the output end of the neural network block to which it belongs; wherein, the second R-type neural network block and the second B-type neural network block of the 5th, 6th, 7th and 8th neural network blocks have 512, 256, 128 and 64 convolution kernels, respectively.
4. the road scene semantic segmentation method according to claim 3 based on the empty convolutional neural networks of Complete Disability difference, It is characterized in that the first R type neural network block is identical with the structure of the 2nd R type neural network block, by successively setting The 5th convolutional layer set, the 4th batch normalization layer, the 4th active coating, the first empty convolutional layer, the 5th batch normalization layer, the Five active coatings, the 6th convolutional layer, the 6th batch normalization layer, the 6th active coating composition, the input terminal of the 5th convolutional layer is its institute R type neural network block input terminal, the 4th batch normalization layer input terminal receive the 5th convolutional layer output end output All characteristic patterns, the input terminal of the 4th active coating receives all characteristic patterns of the output end output of the 4th batch normalization layer, The input terminal of first empty convolutional layer receives all characteristic patterns of the output end output of the 4th active coating, the 5th batch normalization layer Input terminal receive the first empty convolutional layer output end output all characteristic patterns, the input terminal of the 5th active coating receives the 5th All characteristic patterns of the output end output of batch normalization layer, the input terminal of the 6th convolutional layer receive the output end of the 5th active coating All characteristic patterns of output, the input terminal of the 6th batch normalization layer receive all features of the output end output of the 6th convolutional layer Figure, the input terminal of the 6th active coating receive all characteristic patterns of the output end output of the 6th batch normalization layer, will enter into the All characteristic patterns of the output end output of all characteristic patterns and the 6th active coating of the input terminal of 
the feature maps obtained after the skip connection with the fifth convolutional layer serve as the output of the R-type neural network block where they are located; wherein, in every R-type neural network block, the fifth convolutional layer, the sixth convolutional layer and the first dilated convolutional layer all use 3 × 3 convolution kernels, "same" zero-padding and a stride of 1, and the first dilated convolutional layer additionally has a dilation rate of 2; the number of convolution kernels in these three layers is 128 in the first R-type neural network block of the 1st neural network block, 256 in that of the 2nd, 512 in that of the 3rd and 1024 in that of the 4th neural network block, and 512 in the second R-type neural network block of the 5th, 256 in that of the 6th, 128 in that of the 7th and 64 in that of the 8th neural network block; the activation function of the fourth, fifth and sixth activation layers is ReLU.
5. The road scene semantic segmentation method based on a full-residual dilated convolutional neural network according to claim 3 or 4, characterized in that the first B-type neural network block and the second B-type neural network block have the same structure, each consisting of, arranged in sequence, a seventh convolutional layer, a seventh batch normalization layer, a seventh activation layer, a second dilated convolutional layer, an eighth batch normalization layer, an eighth activation layer, an eighth convolutional layer, a ninth batch normalization layer and a ninth activation layer; the input of the seventh convolutional layer is the input of the B-type neural network block where it is located, each subsequent layer receives all feature maps output by the layer preceding it, and the output of the ninth activation layer is the output of the B-type neural network block; wherein, in every B-type neural network block, the seventh convolutional layer, the eighth convolutional layer and the second dilated convolutional layer all use 3 × 3 convolution kernels, "same" zero-padding and a stride of 1, and the second dilated convolutional layer additionally has a dilation rate of 2; the number of convolution kernels in these three layers is 128 in the first B-type neural network block of the 1st neural network block, 256 in that of the 2nd, 512 in that of the 3rd and 1024 in that of the 4th neural network block, and 512 in the second B-type neural network block of the 5th, 256 in that of the 6th, 128 in that of the 7th and 64 in that of the 8th neural network block; the activation function of the seventh, eighth and ninth activation layers is ReLU.
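Claim 5's B-type block is purely sequential (no skip connection is recited in this claim), so it can be sketched as a plain PyTorch `Sequential`. A minimal illustration, with a hypothetical helper name; the channel count is the only per-block parameter that varies.

```python
import torch
import torch.nn as nn


def b_type_block(channels: int) -> nn.Sequential:
    """Sketch of one 'B-type' block per claim 5: a 3x3 convolution,
    a 3x3 dilated convolution (dilation rate 2), and a second 3x3
    convolution, each followed by batch normalization and ReLU.
    All strides are 1 and all paddings are "same".
    """
    return nn.Sequential(
        nn.Conv2d(channels, channels, 3, stride=1, padding=1),  # 7th conv
        nn.BatchNorm2d(channels),
        nn.ReLU(inplace=True),
        nn.Conv2d(channels, channels, 3, stride=1,
                  padding=2, dilation=2),                       # 2nd dilated conv
        nn.BatchNorm2d(channels),
        nn.ReLU(inplace=True),
        nn.Conv2d(channels, channels, 3, stride=1, padding=1),  # 8th conv
        nn.BatchNorm2d(channels),
        nn.ReLU(inplace=True),
    )
```

As with the R-type blocks, `channels` would be 128/256/512/1024 in the 1st–4th neural network blocks and 512/256/128/64 in the 5th–8th.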
6. The road scene semantic segmentation method based on a full-residual dilated convolutional neural network according to claim 1, characterized in that in step 1_2, the 1st to 7th deconvolution blocks have the same structure, each consisting of, arranged in sequence, a deconvolution layer, a tenth batch normalization layer and a tenth activation layer; the input of the deconvolution layer is the input of the deconvolution block where it is located, the tenth batch normalization layer receives all feature maps output by the deconvolution layer, the tenth activation layer receives all feature maps output by the tenth batch normalization layer, and the output of the tenth activation layer is the output of the deconvolution block; wherein every deconvolution layer uses 4 × 4 convolution kernels, "same" zero-padding and a stride of 2, the number of convolution kernels being 512 in the 1st deconvolution block, 256 in the 2nd and 3rd, 128 in the 4th and 5th, and 64 in the 6th and 7th deconvolution blocks; the activation function of the tenth activation layer is ReLU.
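A 4 × 4 transposed convolution with stride 2 and "same"-style padding doubles the spatial resolution, which is what each deconvolution block in claim 6 does. A minimal PyTorch sketch (helper name and input channel counts are assumptions, since the claim only states the kernel counts):

```python
import torch
import torch.nn as nn


def deconv_block(in_channels: int, out_channels: int) -> nn.Sequential:
    """Sketch of one deconvolution block per claim 6: a 4x4 transposed
    convolution with stride 2, batch normalization, and ReLU.
    With kernel 4 / stride 2, padding=1 reproduces the "same"
    zero-padding behaviour and exactly doubles height and width:
    out = (in - 1) * 2 - 2 * 1 + 4 = 2 * in.
    """
    return nn.Sequential(
        nn.ConvTranspose2d(in_channels, out_channels,
                           kernel_size=4, stride=2, padding=1),
        nn.BatchNorm2d(out_channels),
        nn.ReLU(inplace=True),
    )
```

Per the claim, `out_channels` would be 512 for the 1st block, 256 for the 2nd and 3rd, 128 for the 4th and 5th, and 64 for the 6th and 7th.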
7. The road scene semantic segmentation method based on a full-residual dilated convolutional neural network according to claim 1, characterized in that in step 1_2, the 4 fusion layers are all element-wise addition (Add) fusion layers.
8. The road scene semantic segmentation method based on a full-residual dilated convolutional neural network according to claim 1, characterized in that in step 1_2, the output layer consists of, arranged in sequence, a ninth convolutional layer, an eleventh batch normalization layer and an eleventh activation layer; the input of the ninth convolutional layer is the input of the output layer, the eleventh batch normalization layer receives all feature maps output by the ninth convolutional layer, the eleventh activation layer receives all feature maps output by the eleventh batch normalization layer, and the output of the eleventh activation layer is the output of the output layer; wherein the ninth convolutional layer uses 1 × 1 convolution kernels, the number of convolution kernels is 12, the zero-padding is "same" and the stride is 1, and the activation function of the eleventh activation layer is ReLU.
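Claims 7 and 8 together describe how the network produces its final per-class score maps: feature maps are merged by element-wise addition, and a 1 × 1 convolution with 12 kernels (one per segmentation class) maps the fused features to class scores. A minimal sketch under assumptions: the helper names are hypothetical, and the input channel count (64, matching the last deconvolution blocks) is not stated in claim 8 itself.

```python
import torch
import torch.nn as nn


def make_output_layer(in_channels: int, num_classes: int = 12) -> nn.Sequential:
    """Sketch of the output layer per claim 8: a 1x1 convolution with
    12 kernels, batch normalization, and ReLU, all at stride 1."""
    return nn.Sequential(
        nn.Conv2d(in_channels, num_classes, kernel_size=1, stride=1, padding=0),
        nn.BatchNorm2d(num_classes),
        nn.ReLU(inplace=True),
    )


def add_fusion(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Claim 7: each of the 4 fusion layers is an element-wise
    addition (Add) of two equally shaped sets of feature maps."""
    return a + b
```

The output thus has 12 channels at the input resolution, from which the per-pixel class label can be taken as the channel-wise argmax.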
CN201910664797.8A 2019-07-23 2019-07-23 Road scene semantic segmentation method based on full-residual-error hole convolutional neural network Active CN110490205B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910664797.8A CN110490205B (en) 2019-07-23 2019-07-23 Road scene semantic segmentation method based on full-residual-error hole convolutional neural network


Publications (2)

Publication Number Publication Date
CN110490205A true CN110490205A (en) 2019-11-22
CN110490205B CN110490205B (en) 2021-10-12

Family

ID=68548010

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910664797.8A Active CN110490205B (en) 2019-07-23 2019-07-23 Road scene semantic segmentation method based on full-residual-error hole convolutional neural network

Country Status (1)

Country Link
CN (1) CN110490205B (en)



Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018128741A1 (en) * 2017-01-06 2018-07-12 Board Of Regents, The University Of Texas System Segmenting generic foreground objects in images and videos
US20190096125A1 (en) * 2017-09-28 2019-03-28 Nec Laboratories America, Inc. Generating occlusion-aware bird eye view representations of complex road scenes
EP3499459A1 (en) * 2017-12-18 2019-06-19 FEI Company Method, device and system for remote deep learning for microscopic image reconstruction and segmentation
CN108776969A (en) * 2018-05-24 2018-11-09 复旦大学 Breast ultrasound image lesion segmentation approach based on full convolutional network
CN109271856A (en) * 2018-08-03 2019-01-25 西安电子科技大学 Remote sensing image object detection method based on expansion residual error convolution
CN108985269A (en) * 2018-08-16 2018-12-11 东南大学 Converged network driving environment sensor model based on convolution sum cavity convolutional coding structure
CN109447994A (en) * 2018-11-05 2019-03-08 陕西师范大学 In conjunction with the remote sensing image segmentation method of complete residual error and Fusion Features

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
CHEN X.等: ""Semantic Segmentation with Modified Deep Residual Networks"", 《CCPR 2016: PATTERN RECOGNITION》 *
TRAN MINH QUAN等: ""FusionNet: A deep fully residual convolutional neural network for image segmentation in connectomics"", 《ARXIV:1612.05360V2》 *
YU, TAO: "Application of deep-learning-based image semantic segmentation in autonomous driving", Wanfang Database *
WANG, ZHAO: "Research on semantic segmentation methods for road scene understanding", China Masters' Theses Full-text Database, Information Science and Technology *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111080646A (en) * 2019-11-25 2020-04-28 杭州电子科技大学 Improved image segmentation method based on wide-activation convolutional neural network
CN111080646B (en) * 2019-11-25 2023-09-05 杭州电子科技大学 Improved image segmentation method based on wide-activation convolutional neural network
CN113099066A (en) * 2019-12-23 2021-07-09 浙江工商大学 Large-capacity image steganography method based on multi-scale fusion cavity convolution residual error network
CN113099066B (en) * 2019-12-23 2022-09-30 浙江工商大学 Large-capacity image steganography method based on multi-scale fusion cavity convolution residual error network
CN111523546A (en) * 2020-04-16 2020-08-11 湖南大学 Image semantic segmentation method, system and computer storage medium
CN111523546B (en) * 2020-04-16 2023-06-16 湖南大学 Image semantic segmentation method, system and computer storage medium
CN111507990A (en) * 2020-04-20 2020-08-07 南京航空航天大学 Tunnel surface defect segmentation method based on deep learning
CN112418228A (en) * 2020-11-02 2021-02-26 暨南大学 Image semantic segmentation method based on multi-feature fusion
CN112418228B (en) * 2020-11-02 2023-07-21 暨南大学 Image semantic segmentation method based on multi-feature fusion
CN112446353B (en) * 2020-12-14 2023-05-02 浙江工商大学 Video image trace line detection method based on depth convolution neural network
CN112446353A (en) * 2020-12-14 2021-03-05 浙江工商大学 Video image trace line detection method based on deep convolutional neural network
WO2022126377A1 (en) * 2020-12-15 2022-06-23 中国科学院深圳先进技术研究院 Traffic lane line detection method and apparatus, and terminal device and readable storage medium
CN113592009A (en) * 2021-08-05 2021-11-02 杭州逗酷软件科技有限公司 Image semantic segmentation method and device, storage medium and electronic equipment

Also Published As

Publication number Publication date
CN110490205B (en) 2021-10-12

Similar Documents

Publication Publication Date Title
CN110490205A (en) Road scene semantic segmentation method based on the empty convolutional neural networks of Complete Disability difference
CN100380396C (en) Object detection apparatus, learning apparatus, object detection system, object detection method
CN110909673B (en) Pedestrian re-identification method based on natural language description
CN105825235B (en) A kind of image-recognizing method based on multi-characteristic deep learning
Zhao et al. Pixel-level semantics guided image colorization
CN109740413A (en) Pedestrian recognition methods, device, computer equipment and computer storage medium again
CN109840471A (en) A kind of connecting way dividing method based on improvement Unet network model
CN110046550B (en) Pedestrian attribute identification system and method based on multilayer feature learning
CN109635642A (en) A kind of road scene dividing method based on residual error network and expansion convolution
CN109146944A (en) A kind of space or depth perception estimation method based on the revoluble long-pending neural network of depth
CN106096661B (en) The zero sample image classification method based on relative priority random forest
CN111191663A (en) License plate number recognition method and device, electronic equipment and storage medium
CN109446933A (en) A kind of road scene semantic segmentation method based on convolutional neural networks
CN112036260B (en) Expression recognition method and system for multi-scale sub-block aggregation in natural environment
CN114373128A (en) Remote sensing monitoring method for four mess of rivers and lakes based on category self-adaptive pseudo label generation
CN108681689B (en) Frame rate enhanced gait recognition method and device based on generation of confrontation network
CN110458178A (en) The multi-modal RGB-D conspicuousness object detection method spliced more
CN110298394A (en) A kind of image-recognizing method and relevant apparatus
CN109460815A (en) A kind of monocular depth estimation method
CN109508639A (en) Road scene semantic segmentation method based on multiple dimensioned convolutional neural networks with holes
CN110222772B (en) Medical image annotation recommendation method based on block-level active learning
CN113870254B (en) Target object detection method and device, electronic equipment and storage medium
CN111428730B (en) Weak supervision fine-grained object classification method
CN109448039A (en) A kind of monocular depth estimation method based on depth convolutional neural networks
CN117372853A (en) Underwater target detection algorithm based on image enhancement and attention mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant