CN110059698A - Semantic segmentation method and system based on edge-based dense reconstruction for street-scene understanding - Google Patents


Info

Publication number
CN110059698A
CN110059698A
Authority
CN
China
Prior art keywords
feature
edge
image
semantic segmentation
coding
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910359119.0A
Other languages
Chinese (zh)
Other versions
CN110059698B (English)
Inventor
陈羽中
林洋洋
柯逍
黄腾达
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fuzhou University
Original Assignee
Fuzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fuzhou University
Priority to CN201910359119.0A
Publication of CN110059698A
Application granted
Publication of CN110059698B
Active legal status
Anticipated expiration

Links

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 — Pattern recognition
    • G06F 18/20 — Analysing
    • G06F 18/21 — Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 — Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 — Arrangements for image or video recognition or understanding
    • G06V 10/20 — Image preprocessing
    • G06V 10/26 — Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V 10/267 — Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds


Abstract

The present invention relates to a semantic segmentation method and system based on edge-based dense reconstruction for street-scene understanding. The method comprises: preprocessing the training-set input images by standardizing them and cropping them to a uniform size; extracting generic features with a convolutional network and deriving from them a three-level context spatial pyramid fusion feature, the cascade of these two parts serving as the encoding network that extracts the encoding feature; enlarging the encoding feature to half the input size, computing edge features from mid-layer features of the convolutional network, and using a dense net that combines the edge features with the half-input-size encoding feature as the decoding network, which reconstructs the image resolution and yields the decoding feature; computing the semantic segmentation loss and an auxiliary edge-supervision loss, and training the deep neural network to minimize their weighted sum; and performing semantic segmentation on images to be segmented with the trained deep neural network model and outputting the segmentation result. The method and system help improve the accuracy and robustness of image semantic segmentation.

Description

Semantic segmentation method and system based on edge-based dense reconstruction for street-scene understanding
Technical field
The present invention relates to the technical field of computer vision, and in particular to a semantic segmentation method and system based on edge-based dense reconstruction for street-scene understanding.
Background art
Image semantic segmentation is an important branch of computer vision within the field of artificial intelligence, and a key part of image understanding in machine vision. Image semantic segmentation assigns each pixel of an image accurately to its category, so that the prediction is consistent with the visual content of the image itself; the task is therefore also called pixel-level image classification.
Since image semantic segmentation shares some similarity with image classification, all kinds of image classification networks are commonly used, often interchangeably, as the backbone of semantic segmentation networks after the final fully connected layer has been removed. Sometimes the pooling layers in the backbone are removed, or atrous (dilated) convolution is used, to obtain larger feature maps, and a final convolution layer with 1×1 kernels produces the semantic segmentation result. Compared with image classification, image semantic segmentation is harder: it needs not only global contextual information but also fine local information to decide the category of each pixel. A backbone is therefore usually used to extract relatively global features, which are then combined with shallow backbone features to reconstruct the feature resolution back to the original image size. Because the feature maps first shrink and then grow, the former part is usually called the encoding network and the latter the decoding network. During encoding, different receptive fields and scales are usually combined to better capture objects of different sizes, for example with atrous spatial pyramid pooling; but that technique enlarges the spacing of the convolution kernel and so ignores interior pixels, and it does not incorporate more global contextual information to make up for its limited expressive power. Meanwhile, in existing semantic segmentation methods, the decoding stage usually just restores resolution from the previous stage's features and then combines shallow features of corresponding size to recover the information lost during encoding; it neither reuses features effectively during resolution reconstruction nor specifically addresses the blurred object boundaries that arise after the image resolution is rebuilt.
Summary of the invention
The purpose of the present invention is to provide a semantic segmentation method and system based on edge-based dense reconstruction for street-scene understanding, which help improve the accuracy and robustness of image semantic segmentation.
To achieve the above object, the technical scheme of the present invention is a semantic segmentation method based on edge-based dense reconstruction for street-scene understanding, comprising the following steps:
Step A: preprocessing the training-set input images: first subtracting the image mean from each image to standardize it, then taking random crops of a uniform size to obtain preprocessed images of identical size;
Step B: extracting the generic feature F_backbone with a convolutional network, then deriving from F_backbone the three-level context spatial pyramid fusion feature F_tspp to capture multi-scale contextual information, and extracting the encoding feature F_encoder with the cascade of these two parts as the encoding network;
Step C: enlarging F_encoder to half the input image size to obtain the half-input-size encoding feature F_us; selecting mid-layer features F_mid^os from the convolutional network and computing edge features F_edge^os; and, combining F_us with the edge features, using a dense net as the decoding network to reconstruct the image resolution and compute the decoding feature F_decoder;
Step D: obtaining the semantic segmentation probability map and the edge probability maps from F_decoder and the edge features F_edge^os respectively; computing edge annotations from the semantic annotations of the training set; computing the semantic segmentation loss and the auxiliary edge-supervision loss from the probability maps and their corresponding annotations; and training the whole deep neural network to minimize the weighted sum of the two losses;
Step E: performing semantic segmentation on the image to be segmented with the trained deep neural network model and outputting the segmentation result.
Further, in step B, the generic feature F_backbone is extracted with a convolutional network, the three-level context spatial pyramid fusion feature F_tspp is derived from F_backbone, and the encoding feature F_encoder is extracted with the cascade of these two parts as the encoding network, comprising the following steps:
Step B1: extracting the generic feature F_backbone from the preprocessed image with a convolutional network;
Step B2: applying a 1×1 convolution to F_backbone for dimensionality reduction to obtain the reduced feature F_1×1;
Step B3: average-pooling the whole F_backbone feature map, upsampling it back to full size with nearest-neighbour interpolation, and applying a 1×1 convolution to obtain the image-level feature F_image;
Step B4: applying atrous convolution with rate r_as to F_backbone to obtain the feature F_as; then splicing the three context levels F_as, F_image and F_1×1 and fusing them with a 1×1 convolution to obtain the three-level context fusion feature for rate r_as; batch normalization is used to keep the input distribution unchanged through the convolutions, and the rectified linear unit is used as the activation function; the atrous convolution is computed as

y_as[m_as] = Σ_{k_as} x_as[m_as + r_as·k_as] · w_as[k_as]

where y_as[m_as] is the result of the atrous convolution with rate r_as at output coordinate m_as, x_as[m_as + r_as·k_as] is the input reference pixel of x_as corresponding to atrous kernel coordinate k_as at position m_as with rate r_as, and w_as[k_as] is the weight at position k_as of the atrous kernel;
Step B5: repeating the previous step with different rates until n_tspp features are obtained, then splicing these n_tspp features with F_1×1 and F_image to obtain the three-level context spatial pyramid fusion feature F_tspp;
Step B6: applying a 1×1 convolution to F_tspp for dimensionality reduction, then regularizing with dropout to obtain the final encoding feature F_encoder.
Further, in step C, the encoding feature F_encoder is enlarged to half the input image size to obtain the half-input-size encoding feature F_us, mid-layer features F_mid^os are selected from the convolutional network and edge features F_edge^os are computed, and, combining F_us with the edge features, the image resolution is reconstructed with a dense net as the decoding network to compute the decoding feature F_decoder, comprising the following steps:
Step C1: defining the output stride of a feature as the ratio of the original input image size to the feature size, and processing the encoding feature F_encoder with nearest-neighbour interpolation to obtain the feature map F_us with output stride 2;
Step C2: selecting from the generic-feature convolutional network the mid-layer feature F_mid^os with output stride os, first reducing its dimensionality with a 1×1 convolution and then enlarging it with bilinear interpolation to obtain the edge feature F_edge^os;
Step C3: splicing F_us and F_edge^os, and after a 1×1 convolution for dimensionality reduction, applying a 3×3 convolution to extract features and obtain the decoding feature F_decoder;
Step C4: choosing an output stride os smaller than that used in the previous pass of step C2; if every output stride has been processed, decoding-feature extraction is complete; otherwise splicing F_us and F_decoder as the new F_us and repeating steps C2 to C3.
Further, in step D, the semantic segmentation probability map and the edge probability maps are obtained from the decoding feature F_decoder and the edge features F_edge^os respectively, edge annotations are computed from the semantic annotations of the training set, the semantic segmentation loss and the auxiliary edge-supervision loss are computed from the probability maps and their corresponding annotations, and the whole deep neural network is trained to minimize the weighted sum of the two losses, comprising the following steps:
Step D1: scaling F_decoder and all edge features F_edge^os to the input image size with bilinear interpolation, and obtaining the semantic segmentation probabilities and edge probabilities with 1×1 convolutions that use softmax as the activation function, where softmax is computed as

σ_c = e^{γ_c} / Σ_{k=1..C} e^{γ_k}

where σ_c is the probability of category c, e is the natural exponent, γ_c and γ_k are the pre-activation feature values of categories c and k respectively, and C is the total number of categories;
Step D2: one-hot encoding the semantic segmentation annotations of the training set and then computing the edge annotations as

y_edge(i, j, c) = sgn( Σ_{(i_u, j_u) ∈ U_8} | y_semantic(i, j, c) − y_semantic(i_u, j_u, c) | )

where y_edge(i, j, c) and y_semantic(i, j, c) are the edge annotation and semantic annotation of class c at coordinate (i, j), (i_u, j_u) ranges over the 8-neighbourhood U_8 of (i, j), and sgn(·) is the sign function;
Step D3: using the probability maps of both semantic segmentation and edges together with their corresponding annotations, computing the pixel-level cross-entropies to obtain the semantic segmentation loss L_s and the auxiliary edge-supervision losses L_edge^os, and then computing the weighted total loss

L = L_s + Σ_os α_os · L_edge^os

where L_edge^os is the loss corresponding to the edge feature F_edge^os and α_os is its weight in the final loss;
finally, updating the model parameters by back-propagation iterations with the stochastic gradient descent optimization method, training the whole deep neural network to minimize the weighted loss L, and obtaining the final deep neural network model.
The present invention also provides a semantic segmentation system based on edge-based dense reconstruction for street-scene understanding, comprising:
a preprocessing module, for preprocessing the training-set input images, including subtracting the image mean from each image to standardize it and taking random crops of a uniform size to obtain preprocessed images of identical size;
an encoding-feature extraction module, for extracting the generic feature F_backbone with a convolutional network, deriving from F_backbone the three-level context spatial pyramid fusion feature F_tspp to capture multi-scale contextual information, and extracting the encoding feature F_encoder with the cascade of these two parts as the encoding network;
a decoding-feature extraction module, for enlarging F_encoder to half the input image size to obtain the half-input-size encoding feature F_us, selecting mid-layer features F_mid^os from the convolutional network, computing edge features F_edge^os, and, combining F_us with the edge features, reconstructing the image resolution with a dense net as the decoding network to extract the decoding feature F_decoder;
a neural network training module, for obtaining the semantic segmentation probability map and the edge probability maps from F_decoder and the edge features respectively, computing edge annotations from the semantic annotations of the training set, computing the semantic segmentation loss and the auxiliary edge-supervision loss from the probability maps and their corresponding annotations, and training the whole deep neural network to minimize the weighted sum of the two losses, obtaining the deep neural network model; and
a semantic segmentation module, for performing semantic segmentation on the image to be segmented with the trained deep neural network model and outputting the segmentation result.
Compared with the prior art, the beneficial effects of the present invention are as follows. In the encoding network, the three-level context spatial pyramid fusion feature is used for multi-scale feature capture after the backbone network, deliberately using internal features and global features to optimize the features of different receptive fields and thereby enriching the expressive power of the encoding feature. In the decoding network, mid-layer features are combined to derive edge features with auxiliary supervision, which specifically adjust the boundary regions that are prone to error during feature resolution reconstruction and so improve the segmentation between different objects; at the same time, the feature resolution is reconstructed in the manner of a dense net so that reconstructed features are better reused. Compared with conventional methods, the present invention obtains stronger contextual expressive power during encoding; the combined edge supervision corrects the boundary blur between objects more effectively during decoding; and the reuse property of the dense net structure exploits features more effectively and makes the network easier to train, so that more accurate semantic segmentation results are finally obtained.
Detailed description of the invention
Fig. 1 is a flowchart of the method according to an embodiment of the present invention.
Fig. 2 is a schematic structural diagram of the system according to an embodiment of the present invention.
Specific embodiment
The present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
The present invention provides a semantic segmentation method based on edge-based dense reconstruction for street-scene understanding, as shown in Fig. 1, comprising the following steps:
Step A: preprocess the training-set input images: first subtract the image mean from each image to standardize it, then take random crops of a uniform size to obtain preprocessed images of identical size.
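As an illustrative sketch of the preprocessing in step A (the mean values and crop size below are placeholder assumptions for illustration, not the embodiment's actual settings):

```python
import numpy as np

def preprocess(image, mean, crop_size, rng):
    """Standardize by per-channel mean subtraction, then take a random fixed-size crop."""
    img = image.astype(np.float32) - mean          # mean subtraction standardizes the input
    h, w = img.shape[:2]
    ch, cw = crop_size
    top = rng.integers(0, h - ch + 1)              # random crop origin
    left = rng.integers(0, w - cw + 1)
    return img[top:top + ch, left:left + cw]

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(512, 1024, 3))    # stand-in for a street-scene image
patch = preprocess(img,
                   mean=np.array([123.7, 116.3, 103.5], dtype=np.float32),
                   crop_size=(321, 321), rng=rng)
```

Every training image is reduced to the same crop size, so batches of identical shape can be formed regardless of the original image dimensions.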
Step B: extract the generic feature F_backbone with a general convolutional network, then derive from F_backbone the three-level context spatial pyramid fusion feature F_tspp to capture multi-scale contextual information, and extract the encoding feature F_encoder with the cascade of these two parts as the encoding network; specifically comprising the following steps:
Step B1: extract the generic feature F_backbone from the preprocessed image with a general convolutional network (this embodiment uses the xception network provided in the deeplabv3+ network);
Step B2: apply a 1×1 convolution to F_backbone for dimensionality reduction to obtain the reduced feature F_1×1;
Step B3: average-pool the whole F_backbone feature map, upsample it back to full size with nearest-neighbour interpolation, and apply a 1×1 convolution to obtain the image-level feature F_image;
Step B4: apply atrous convolution with rate r_as to F_backbone to obtain the feature F_as; then splice the three context levels F_as, F_image and F_1×1 and fuse them with a 1×1 convolution to obtain the three-level context fusion feature for rate r_as; batch normalization is used to keep the input distribution unchanged through the convolutions, and the rectified linear unit is used as the activation function; the atrous convolution is computed as

y_as[m_as] = Σ_{k_as} x_as[m_as + r_as·k_as] · w_as[k_as]

where y_as[m_as] is the result of the atrous convolution with rate r_as at output coordinate m_as, x_as[m_as + r_as·k_as] is the input reference pixel of x_as corresponding to atrous kernel coordinate k_as at position m_as with rate r_as, and w_as[k_as] is the weight at position k_as of the atrous kernel;
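The atrous convolution formula above can be checked with a minimal one-dimensional NumPy sketch. Real implementations operate on 2-D feature maps with learned kernels; this toy version only illustrates how the rate r spreads the kernel taps apart and enlarges the receptive field:

```python
import numpy as np

def atrous_conv1d(x, w, rate):
    """y[m] = sum_k x[m + rate*k] * w[k], over valid output positions only."""
    K = len(w)
    out_len = len(x) - rate * (K - 1)
    return np.array([sum(x[m + rate * k] * w[k] for k in range(K))
                     for m in range(out_len)])

x = np.arange(10, dtype=float)
w = np.array([1.0, 1.0, 1.0])
y1 = atrous_conv1d(x, w, rate=1)   # rate 1: ordinary (correlation-form) convolution
y2 = atrous_conv1d(x, w, rate=2)   # rate 2: taps 2 apart, same kernel size, wider span
```

With rate 2 the three taps cover a span of 5 input samples instead of 3, which is exactly the receptive-field enlargement the patent exploits, at the cost of skipping the interior pixels between taps.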
Step B5: repeat the previous step with different rates until n_tspp features are obtained (3 features in this embodiment, with rates 6, 12 and 18), then splice these n_tspp features with F_1×1 and F_image to obtain the three-level context spatial pyramid fusion feature F_tspp;
Step B6: apply a 1×1 convolution to F_tspp for dimensionality reduction, then regularize with dropout to obtain the final encoding feature F_encoder.
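The splicing and 1×1 fusion of steps B4-B6 amount to channel-wise concatenation followed by a per-pixel linear map. A shape-level NumPy sketch, with random arrays standing in for the actual branch outputs and learned weights (all sizes here are illustrative assumptions):

```python
import numpy as np

def conv1x1(feat, weight):
    """A 1x1 convolution is a per-pixel linear map over channels: (H,W,Cin) -> (H,W,Cout)."""
    return feat @ weight

H, W, C = 4, 4, 8
rng = np.random.default_rng(1)
f_1x1   = rng.standard_normal((H, W, C))                       # reduced backbone feature (B2)
f_image = rng.standard_normal((H, W, C))                       # image-level global feature (B3)
branches = [rng.standard_normal((H, W, C)) for _ in range(3)]  # atrous branches, rates 6/12/18 (B4-B5)

f_tspp = np.concatenate(branches + [f_1x1, f_image], axis=-1)  # splice all five along channels
f_encoder = conv1x1(f_tspp, rng.standard_normal((f_tspp.shape[-1], 256)))  # fuse + reduce (B6)
```

The concatenation stacks 5 × 8 = 40 channels, and the final 1×1 map mixes every branch at each pixel, which is how the local, multi-rate, and global context levels get fused into one encoding feature.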
Step C: enlarge F_encoder to half the input image size to obtain the half-input-size encoding feature F_us; select mid-layer features F_mid^os from the convolutional network and compute edge features F_edge^os; and, combining F_us with the edge features, use a dense net as the decoding network to reconstruct the image resolution and compute the decoding feature F_decoder; specifically comprising the following steps:
Step C1: define the output stride of a feature as the ratio of the original input image size to the feature size, and process the encoding feature F_encoder with nearest-neighbour interpolation to obtain the feature map F_us with output stride 2;
Step C2: select from the generic-feature convolutional network the mid-layer feature F_mid^os with output stride os, first reduce its dimensionality with a 1×1 convolution, then enlarge it with bilinear interpolation to obtain the edge feature F_edge^os;
Step C3: splice F_us and F_edge^os, and after a 1×1 convolution for dimensionality reduction, apply a 3×3 convolution to extract features and obtain the decoding feature F_decoder;
Step C4: choose an output stride os smaller than that used in the previous pass of step C2; if every output stride has been processed, decoding-feature extraction is complete; otherwise splice F_us and F_decoder as the new F_us and repeat steps C2 to C3.
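The decoding loop of steps C1-C4 can be sketched at the level of tensor shapes as follows. Nearest-neighbour upsampling stands in for the bilinear expansion, random matrices stand in for learned convolutions, and the strides and channel counts are illustrative assumptions, not the embodiment's settings:

```python
import numpy as np

def upsample_nn(feat, factor):
    """Nearest-neighbour upsampling by an integer factor (stand-in for interpolation)."""
    return feat.repeat(factor, axis=0).repeat(factor, axis=1)

def conv1x1(feat, c_out, rng):
    """Per-pixel linear map standing in for a learned 1x1 (or 1x1 + 3x3) convolution."""
    return feat @ rng.standard_normal((feat.shape[-1], c_out))

rng = np.random.default_rng(2)
H = 64                                                     # input image side
f_encoder = rng.standard_normal((H // 16, H // 16, 32))    # encoder output, stride 16
f_us = upsample_nn(f_encoder, 8)                           # C1: to output stride 2

f_decoder = None
for os_ in (8, 4):                                         # C2: mid-layer features, decreasing stride
    f_mid = rng.standard_normal((H // os_, H // os_, 16))
    f_edge = upsample_nn(conv1x1(f_mid, 8, rng), os_ // 2) # reduce, then expand to stride 2
    if f_decoder is not None:                              # C4: dense reuse of previous output
        f_us = np.concatenate([f_us, f_decoder], axis=-1)
    spliced = np.concatenate([f_us, f_edge], axis=-1)      # C3: splice F_us with F_edge^os
    f_decoder = conv1x1(spliced, 16, rng)                  # stands in for 1x1 reduce + 3x3 conv
```

The key point of the dense structure is the `concatenate([f_us, f_decoder])` step: each pass keeps the earlier decoding output in the input of the next pass instead of discarding it.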
Step D: obtain the semantic segmentation probability map and the edge probability maps from F_decoder and the edge features F_edge^os respectively; compute edge annotations from the semantic annotations of the training set; compute the semantic segmentation loss and the auxiliary edge-supervision loss from the probability maps and their corresponding annotations; and train the whole deep neural network to minimize the weighted sum of the two losses; specifically comprising the following steps:
Step D1: scale F_decoder and all edge features F_edge^os to the input image size with bilinear interpolation, and obtain the semantic segmentation probabilities and edge probabilities with 1×1 convolutions that use softmax as the activation function, where softmax is computed as

σ_c = e^{γ_c} / Σ_{k=1..C} e^{γ_k}

where σ_c is the probability of category c, e is the natural exponent, γ_c and γ_k are the pre-activation feature values of categories c and k respectively, and C is the total number of categories;
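The softmax of step D1 is standard; a NumPy sketch (the max-subtraction is a common numerical-stability trick, not part of the formula):

```python
import numpy as np

def softmax(gamma):
    """sigma_c = e^{gamma_c} / sum_k e^{gamma_k}, stabilized by subtracting the max."""
    z = np.exp(gamma - gamma.max(axis=-1, keepdims=True))
    return z / z.sum(axis=-1, keepdims=True)

p = softmax(np.array([2.0, 1.0, 0.0]))   # per-pixel pre-activation values for C = 3 classes
```

Applied along the channel axis of a (H, W, C) logit map, this turns the 1×1 convolution outputs into per-pixel category probabilities that sum to 1.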
Step D2: one-hot encode the semantic segmentation annotations of the training set and then compute the edge annotations as

y_edge(i, j, c) = sgn( Σ_{(i_u, j_u) ∈ U_8} | y_semantic(i, j, c) − y_semantic(i_u, j_u, c) | )

where y_edge(i, j, c) and y_semantic(i, j, c) are the edge annotation and semantic annotation of class c at coordinate (i, j), (i_u, j_u) ranges over the 8-neighbourhood U_8 of (i, j), and sgn(·) is the sign function;
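The edge-annotation formula of step D2 is directly computable from a one-hot semantic map: a pixel is an edge pixel for class c exactly when some 8-neighbour differs from it in channel c. A straightforward (unvectorized) NumPy sketch:

```python
import numpy as np

def edge_labels(y_semantic):
    """y_edge(i,j,c) = sgn( sum over the 8-neighbourhood of |y(i,j,c) - y(iu,ju,c)| ),
    for a one-hot semantic map of shape (H, W, C)."""
    H, W, C = y_semantic.shape
    y_edge = np.zeros_like(y_semantic)
    for i in range(H):
        for j in range(W):
            acc = np.zeros(C)
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    if di == 0 and dj == 0:
                        continue                      # U_8 excludes the centre pixel
                    iu, ju = i + di, j + dj
                    if 0 <= iu < H and 0 <= ju < W:   # neighbours outside the image are skipped
                        acc += np.abs(y_semantic[i, j] - y_semantic[iu, ju])
            y_edge[i, j] = np.sign(acc)               # sgn: any difference at all marks an edge
    return y_edge

labels = np.zeros((4, 4), dtype=int)
labels[:, 2:] = 1                                     # left half class 0, right half class 1
edges = edge_labels(np.eye(2)[labels])                # one-hot encode, then derive edges
```

On this toy map the two columns on either side of the class boundary are flagged as edges, and interior pixels are not, which is the supervision signal the auxiliary edge loss uses.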
Step D3: using the probability maps of both semantic segmentation and edges together with their corresponding annotations, compute the pixel-level cross-entropies to obtain the semantic segmentation loss L_s and the auxiliary edge-supervision losses L_edge^os, and then compute the weighted total loss

L = L_s + Σ_os α_os · L_edge^os

where L_edge^os is the loss corresponding to the edge feature F_edge^os and α_os is its weight in the final loss; the weights α_os satisfy a normalization constraint and are all equal;
finally, update the model parameters by back-propagation iterations with the stochastic gradient descent optimization method, training the whole deep neural network to minimize the weighted loss L, and obtaining the final deep neural network model.
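The weighted objective of step D3 can be sketched in NumPy as pixel-level cross-entropies combined by the weights α_os (the weight value below is an illustrative assumption):

```python
import numpy as np

def cross_entropy(p, y, eps=1e-12):
    """Mean pixel-level cross-entropy between a probability map p and one-hot labels y."""
    return -np.mean(np.sum(y * np.log(p + eps), axis=-1))

def total_loss(p_seg, y_seg, edge_terms, alphas):
    """L = L_s + sum_os alpha_os * L_edge^os.
    edge_terms: list of (edge probability map, edge label map) pairs, one per supervised scale."""
    L = cross_entropy(p_seg, y_seg)
    for (p_e, y_e), a in zip(edge_terms, alphas):
        L += a * cross_entropy(p_e, y_e)
    return L

y = np.eye(3)[np.array([[0, 1], [2, 0]])]            # 2x2 one-hot ground truth, C = 3
L0 = total_loss(y, y, [(y, y)], [0.5])               # perfect predictions -> loss ~ 0
p_bad = np.full((2, 2, 3), 1.0 / 3.0)                # uniform predictions
L1 = total_loss(p_bad, y, [], [])                    # -> log(3) per pixel
```

Minimizing L with stochastic gradient descent, as the text describes, drives both the segmentation term and every auxiliary edge term down together; the α_os control how strongly the edge supervision pulls on the shared parameters.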
Step E: perform semantic segmentation on the image to be segmented with the trained deep neural network model and output the segmentation result.
The present invention also provides a semantic segmentation system for street-scene understanding that realizes the above method, as shown in Fig. 2, comprising:
a preprocessing module, for preprocessing the training-set input images, including subtracting the image mean from each image to standardize it and taking random crops of a uniform size to obtain preprocessed images of identical size;
an encoding-feature extraction module, for extracting the generic feature F_backbone with a convolutional network, deriving from F_backbone the three-level context spatial pyramid fusion feature F_tspp to capture multi-scale contextual information, and extracting the encoding feature F_encoder with the cascade of these two parts as the encoding network;
a decoding-feature extraction module, for enlarging F_encoder to half the input image size to obtain the half-input-size encoding feature F_us, selecting mid-layer features F_mid^os from the convolutional network, computing edge features F_edge^os, and, combining F_us with the edge features, reconstructing the image resolution with a dense net as the decoding network to extract the decoding feature F_decoder;
a neural network training module, for obtaining the semantic segmentation probability map and the edge probability maps from F_decoder and the edge features respectively, computing edge annotations from the semantic annotations of the training set, computing the semantic segmentation loss and the auxiliary edge-supervision loss from the probability maps and their corresponding annotations, and training the whole deep neural network to minimize the weighted sum of the two losses, obtaining the deep neural network model; and
a semantic segmentation module, for performing semantic segmentation on the image to be segmented with the trained deep neural network model and outputting the segmentation result.
The above are preferred embodiments of the present invention; any changes made according to the technical solution of the present invention whose effects do not go beyond the scope of the technical solution of the present invention all belong to the protection scope of the present invention.

Claims (5)

1. A semantic segmentation method based on edge-based dense reconstruction for street-scene understanding, characterized by comprising the following steps:
Step A: preprocessing the training-set input images: first subtracting the image mean from each image to standardize it, then taking random crops of a uniform size to obtain preprocessed images of identical size;
Step B: extracting the generic feature F_backbone with a convolutional network, then deriving from F_backbone the three-level context spatial pyramid fusion feature F_tspp to capture multi-scale contextual information, and extracting the encoding feature F_encoder with the cascade of these two parts as the encoding network;
Step C: enlarging F_encoder to half the input image size to obtain the half-input-size encoding feature F_us, selecting mid-layer features F_mid^os from the convolutional network and computing edge features F_edge^os, and, combining F_us with the edge features, using a dense net as the decoding network to reconstruct the image resolution and compute the decoding feature F_decoder;
Step D: obtaining the semantic segmentation probability map and the edge probability maps from F_decoder and the edge features F_edge^os respectively, computing edge annotations from the semantic annotations of the training set, computing the semantic segmentation loss and the auxiliary edge-supervision loss from the probability maps and their corresponding annotations, and training the whole deep neural network to minimize the weighted sum of the two losses;
Step E: performing semantic segmentation on the image to be segmented with the trained deep neural network model and outputting the segmentation result.
2. the semantic segmentation method based on the dense reconstruction in edge according to claim 1 understood for streetscape, feature It is, in the step B, extracts generic features F with convolutional networkbackbone, then it is based on generic features FbackboneIt obtains in three-level Hereafter spatial pyramid fusion feature Ftspp, then cascaded using this two parts as coding network and extract coding characteristic Fencoder, packet Include following steps:
Step B1: generic features F is extracted to pretreatment image using convolutional networkbackbone
Step B2: using 1 × 1 convolution to feature FbackboneFeature Dimension Reduction is carried out, feature is obtained
Step B3: to FbackboneWhole image carries out average pond, then reuses arest neighbors demosaicing to full size, then pass through It crosses 1 × 1 convolution and obtains image level feature Fimage
Step B4: being r with porosityasConvolution kernel to FbackboneIt carries out convolution with holes and obtains featureThen splice in three-level Following traitsFimageWithFusion Features are carried out using 1 × 1 convolution afterwards, obtaining porosity is rasThree-level context fusion FeatureThe same distribution for keeping input in convolution process using batch standardization, uses line rectification function as activation primitive; Wherein, convolutional calculation formula with holes is as follows:
Wherein,It indicates in output coordinate masThe use porosity of position is rasConvolution with holes processing result, xas[mas +ras·kas] indicate input xasIn coordinate masOn position in porosity be rasAnd convolution kernel coordinate with holes is kasWhen it is corresponding defeated Enter reference pixel, was[kas] indicate in convolution kernel with holes as kasThe weight of position;
Step B5: repeat the previous step with different dilation rates until n_tspp features are obtained, then concatenate these n_tspp features with the dimension-reduced feature from step B2 and F_image to obtain the three-level context spatial pyramid fusion feature F_tspp;
Step B6: apply a 1 × 1 convolution to F_tspp for dimension reduction, then regularize with dropout to obtain the final coding feature F_encoder.
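The atrous (dilated) convolution described in step B4 can be illustrated with a minimal 1-D NumPy sketch; the function name and toy shapes below are assumptions for illustration, not the patent's implementation:

```python
import numpy as np

def atrous_conv1d(x, w, rate):
    """1-D atrous convolution: y[m] = sum_k x[m + rate*k] * w[k].

    Only 'valid' output positions are produced; no padding is applied.
    """
    k = len(w)
    span = rate * (k - 1)               # receptive field minus one
    out_len = len(x) - span
    y = np.empty(out_len)
    for m in range(out_len):
        # sample the input every `rate` pixels under the kernel
        y[m] = sum(x[m + rate * ki] * w[ki] for ki in range(k))
    return y

x = np.arange(8.0)                      # toy input signal
w = np.array([1.0, 1.0, 1.0])           # 3-tap kernel
print(atrous_conv1d(x, w, rate=1))      # dense convolution
print(atrous_conv1d(x, w, rate=2))      # dilation 2: wider receptive field, same kernel size
```

Increasing the rate widens the receptive field without adding kernel weights, which is what lets step B5 capture multi-scale context by varying only the dilation rate.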
3. The semantic segmentation method based on edge dense reconstruction for street view understanding according to claim 2, characterized in that, in step C, the coding feature F_encoder is enlarged to half the input image size to obtain the half-input-size coding feature F_us, middle-layer features are chosen from the convolutional network to compute edge features F_edge^os, and a dense net combining F_us with the edge features serves as the decoding network for image resolution reconstruction, computing the decoding feature F_decoder, comprising the following steps:
Step C1: define the ratio of the original input image size to a feature's size as that feature's output stride; process the coding feature F_encoder with nearest-neighbor interpolation to obtain the feature map F_us with output stride 2;
Step C2: choose the middle-layer feature with output stride os from the convolutional network used to extract the generic feature, reduce its dimension with a 1 × 1 convolution, then enlarge it with bilinear interpolation to obtain the edge feature F_edge^os;
Step C3: concatenate F_us and F_edge^os, reduce the dimension with a 1 × 1 convolution, then apply a 3 × 3 convolution to extract features and obtain the decoding feature F_decoder;
Step C4: choose an output stride os smaller than the one used in step C2; if all output strides have been processed, the decoding feature extraction is complete; otherwise concatenate F_us and F_decoder as the new F_us and repeat steps C2 to C3.
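The output-stride bookkeeping of steps C1–C4 can be illustrated with a toy nearest-neighbor resize; the sizes below (a 512 × 512 input with a stride-16 encoder feature) are assumptions for illustration:

```python
import numpy as np

def nearest_upsample(feat, factor):
    """Nearest-neighbor upsampling of an H x W feature map by an integer factor."""
    return np.repeat(np.repeat(feat, factor, axis=0), factor, axis=1)

# Suppose the input image is 512 x 512 and the encoder feature is 32 x 32:
# its output stride is 512 / 32 = 16.
f_encoder = np.ones((32, 32))
# Step C1 brings the feature to output stride 2, i.e. 256 x 256 here,
# so the upsampling factor is 16 / 2 = 8.
f_us = nearest_upsample(f_encoder, 16 // 2)
print(f_us.shape)  # (256, 256)
```

The decoder then repeatedly concatenates F_us with edge features taken at progressively smaller output strides, refining resolution step by step rather than in one jump.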
4. The semantic segmentation method based on edge dense reconstruction for street view understanding according to claim 3, characterized in that, in step D, the decoding feature F_decoder and the edge features F_edge^os are used to obtain the semantic segmentation probability map and the edge probability maps respectively, the edge image labels are computed from the semantic image labels in the training set, the semantic segmentation loss and the auxiliary-supervision edge losses are computed from the semantic segmentation probability map, the edge probability maps and the corresponding labels, and the whole deep neural network is trained with the objective of minimizing the weighted sum of the two losses, comprising the following steps:
Step D1: scale the feature F_decoder and all edge features F_edge^os to the size of the input image with bilinear interpolation, and obtain the semantic segmentation probabilities and edge probabilities through 1 × 1 convolutions with softmax as the activation function, where softmax is computed as:

σ_c = e^{γ_c} / Σ_{k=1}^{C} e^{γ_k}

where σ_c is the probability of class c, e is the base of the natural exponential, γ_c and γ_k denote the unactivated feature values of classes c and k respectively, and C is the total number of classes;
Step D2: one-hot encode the semantic segmentation labels of the training set, then compute the edge labels as:

y_edge(i, j, c) = sgn( Σ_{(i_u, j_u) ∈ U_8} | y_semantic(i, j, c) − y_semantic(i_u, j_u, c) | )

where y_edge(i, j, c) and y_semantic(i, j, c) are the edge label and the one-hot semantic label of class c at coordinate (i, j), (i_u, j_u) denotes a coordinate in the 8-neighborhood U_8 of (i, j), and sgn(·) is the sign function;
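The edge-label rule of step D2 (a pixel is an edge pixel for class c if its one-hot value for c differs from that of at least one 8-neighbor) can be sketched in NumPy; the function and variable names here are illustrative, not from the patent:

```python
import numpy as np

def edge_labels(y_semantic):
    """y_semantic: one-hot labels of shape (H, W, C).

    Returns y_edge of the same shape: 1 where the class value differs
    from at least one 8-neighbor, else 0 (sgn of the summed absolute
    differences reduces to an 'any neighbor differs' test).
    """
    h, w, _ = y_semantic.shape
    y_edge = np.zeros_like(y_semantic)
    shifts = [(di, dj) for di in (-1, 0, 1) for dj in (-1, 0, 1)
              if (di, dj) != (0, 0)]                 # the 8-neighborhood U_8
    for i in range(h):
        for j in range(w):
            for di, dj in shifts:
                iu, ju = i + di, j + dj
                if 0 <= iu < h and 0 <= ju < w:
                    y_edge[i, j] |= (y_semantic[i, j] != y_semantic[iu, ju])
    return y_edge

# Two-class toy image: left half class 0, right half class 1.
labels = np.zeros((3, 4, 2), dtype=int)
labels[:, :2, 0] = 1
labels[:, 2:, 1] = 1
print(edge_labels(labels)[:, :, 0])  # edges appear along the class boundary
```

Deriving edge labels from the existing semantic labels means the auxiliary edge supervision needs no extra annotation effort.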
Step D3: use the probability maps of the semantic segmentation and the edges with their corresponding labels to compute pixel-level cross entropies, obtaining the semantic segmentation loss L_s and the auxiliary-supervision edge losses L_edge^os, then compute the weighted-sum loss L:

L = L_s + Σ_os α_os · L_edge^os

where L_edge^os is the penalty value corresponding to the edge feature F_edge^os, and α_os is its weight in the final loss.
Finally, the model parameters are updated iteratively by back-propagation with the stochastic gradient descent optimizer, training the whole deep neural network to minimize the weighted-sum loss L and obtaining the final deep neural network model.
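Steps D1–D3 amount to a softmax over class scores, pixel-level cross entropies, and a weighted sum of losses. A minimal NumPy sketch, with toy shapes and an assumed auxiliary weight (the patent does not fix these values):

```python
import numpy as np

def softmax(scores):
    """Softmax over the last (class) axis: sigma_c = e^{gamma_c} / sum_k e^{gamma_k}."""
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))  # numerically stabilized
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(probs, onehot):
    """Mean pixel-level cross entropy between probabilities and one-hot labels."""
    return -np.mean(np.sum(onehot * np.log(probs + 1e-12), axis=-1))

# Toy maps: 2x2 pixels, 3 segmentation classes, 2 classes for one edge head.
rng = np.random.default_rng(0)
seg_scores  = rng.standard_normal((2, 2, 3))
edge_scores = rng.standard_normal((2, 2, 2))
seg_onehot  = np.eye(3)[rng.integers(3, size=(2, 2))]
edge_onehot = np.eye(2)[rng.integers(2, size=(2, 2))]

L_s    = cross_entropy(softmax(seg_scores), seg_onehot)    # segmentation loss
L_edge = cross_entropy(softmax(edge_scores), edge_onehot)  # one auxiliary edge loss
alpha  = 0.4                       # illustrative weight alpha_os
L      = L_s + alpha * L_edge      # weighted-sum training objective
```

In the full method there is one such edge term per output stride os, each with its own weight α_os, and L is minimized by SGD with back-propagation.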
5. A semantic segmentation system based on edge dense reconstruction for street view understanding, characterized by comprising:
a preprocessing module for preprocessing the training-set input images, including subtracting the image mean from each image for standardization and randomly cropping the images to a uniform size to obtain preprocessed images of identical size;
a coding feature extraction module for extracting the generic feature F_backbone with a convolutional network, obtaining the three-level context spatial pyramid fusion feature F_tspp based on F_backbone to capture multi-scale contextual information, and cascading the two parts as the coding network to extract the coding feature F_encoder;
a decoding feature extraction module for enlarging the coding feature F_encoder to half the input image size to obtain the half-input-size coding feature F_us, choosing middle-layer features from the convolutional network to compute edge features F_edge^os, and using a dense net combining F_us with the edge features as the decoding network for image resolution reconstruction, extracting the decoding feature F_decoder;
a neural network training module for obtaining the semantic segmentation probability map and the edge probability maps from the decoding feature F_decoder and the edge features F_edge^os respectively, computing the edge image labels from the semantic image labels in the training set, computing the semantic segmentation loss and the auxiliary-supervision edge losses from the probability maps and the corresponding labels, and training the whole deep neural network with the objective of minimizing their weighted-sum loss to obtain the deep neural network model; and
a semantic segmentation module for performing semantic segmentation on the image to be segmented with the trained deep neural network model and outputting the segmentation result.
CN201910359119.0A 2019-04-30 2019-04-30 Semantic segmentation method and system based on edge dense reconstruction for street view understanding Active CN110059698B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910359119.0A CN110059698B (en) 2019-04-30 2019-04-30 Semantic segmentation method and system based on edge dense reconstruction for street view understanding

Publications (2)

Publication Number Publication Date
CN110059698A true CN110059698A (en) 2019-07-26
CN110059698B CN110059698B (en) 2022-12-23

Family

ID=67321810

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910359119.0A Active CN110059698B (en) 2019-04-30 2019-04-30 Semantic segmentation method and system based on edge dense reconstruction for street view understanding

Country Status (1)

Country Link
CN (1) CN110059698B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10095977B1 (en) * 2017-10-04 2018-10-09 StradVision, Inc. Learning method and learning device for improving image segmentation and testing method and testing device using the same
CN109241972A (en) * 2018-08-20 2019-01-18 电子科技大学 Image, semantic dividing method based on deep learning
CN109509192A (en) * 2018-10-18 2019-03-22 天津大学 Merge the semantic segmentation network in Analysis On Multi-scale Features space and semantic space

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
YUZHONG CHEN: "Pyramid Context Contrast for Semantic Segmentation", IEEE Access *
HU, Tai: "Research on semantic segmentation algorithms for small objects based on deep neural networks", China Masters' Theses Full-text Database *

Cited By (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110517278B (en) * 2019-08-07 2022-04-29 北京旷视科技有限公司 Image segmentation and training method and device of image segmentation network and computer equipment
CN110517278A (en) * 2019-08-07 2019-11-29 北京旷视科技有限公司 Image segmentation and the training method of image segmentation network, device and computer equipment
CN110598846A (en) * 2019-08-15 2019-12-20 北京航空航天大学 Hierarchical recurrent neural network decoder and decoding method
CN110598846B (en) * 2019-08-15 2022-05-03 北京航空航天大学 Hierarchical recurrent neural network decoder and decoding method
CN110599514A (en) * 2019-09-23 2019-12-20 北京达佳互联信息技术有限公司 Image segmentation method and device, electronic equipment and storage medium
CN110599514B (en) * 2019-09-23 2022-10-04 北京达佳互联信息技术有限公司 Image segmentation method and device, electronic equipment and storage medium
CN110895814A (en) * 2019-11-30 2020-03-20 南京工业大学 Intelligent segmentation method for aero-engine hole detection image damage based on context coding network
CN113051983B (en) * 2019-12-28 2022-08-23 中移(成都)信息通信科技有限公司 Method for training field crop disease recognition model and field crop disease recognition
CN113051983A (en) * 2019-12-28 2021-06-29 中移(成都)信息通信科技有限公司 Method for training field crop disease recognition model and field crop disease recognition
CN111341438B (en) * 2020-02-25 2023-04-28 中国科学技术大学 Image processing method, device, electronic equipment and medium
CN111341438A (en) * 2020-02-25 2020-06-26 中国科学技术大学 Image processing apparatus, electronic device, and medium
CN111429473A (en) * 2020-02-27 2020-07-17 西北大学 Chest film lung field segmentation model establishment and segmentation method based on multi-scale feature fusion
CN111429473B (en) * 2020-02-27 2023-04-07 西北大学 Chest film lung field segmentation model establishment and segmentation method based on multi-scale feature fusion
CN111340047A (en) * 2020-02-28 2020-06-26 江苏实达迪美数据处理有限公司 Image semantic segmentation method and system based on multi-scale feature and foreground and background contrast
CN112150478A (en) * 2020-08-31 2020-12-29 温州医科大学 Method and system for constructing semi-supervised image segmentation framework
CN112700462A (en) * 2020-12-31 2021-04-23 北京迈格威科技有限公司 Image segmentation method and device, electronic equipment and storage medium
CN113128353B (en) * 2021-03-26 2023-10-24 安徽大学 Emotion perception method and system oriented to natural man-machine interaction
CN113128353A (en) * 2021-03-26 2021-07-16 安徽大学 Emotion sensing method and system for natural human-computer interaction
CN113706545A (en) * 2021-08-23 2021-11-26 浙江工业大学 Semi-supervised image segmentation method based on dual-branch nerve discrimination dimensionality reduction
CN113706545B (en) * 2021-08-23 2024-03-26 浙江工业大学 Semi-supervised image segmentation method based on dual-branch nerve discrimination dimension reduction
CN114627086A (en) * 2022-03-18 2022-06-14 江苏省特种设备安全监督检验研究院 Crane surface damage detection method based on improved feature pyramid network
CN114627086B (en) * 2022-03-18 2023-04-28 江苏省特种设备安全监督检验研究院 Crane surface damage detection method based on characteristic pyramid network
CN115953394A (en) * 2023-03-10 2023-04-11 中国石油大学(华东) Target segmentation-based detection method and system for mesoscale ocean vortexes
CN116978011A (en) * 2023-08-23 2023-10-31 广州新华学院 Image semantic communication method and system for intelligent target recognition
CN116978011B (en) * 2023-08-23 2024-03-15 广州新华学院 Image semantic communication method and system for intelligent target recognition

Also Published As

Publication number Publication date
CN110059698B (en) 2022-12-23

Similar Documents

Publication Publication Date Title
CN110059698A (en) The semantic segmentation method and system based on the dense reconstruction in edge understood for streetscape
CN110059768A (en) The semantic segmentation method and system of the merging point and provincial characteristics that understand for streetscape
CN110059769A (en) The semantic segmentation method and system rebuild are reset based on pixel for what streetscape understood
CN110070091A (en) The semantic segmentation method and system rebuild based on dynamic interpolation understood for streetscape
CN110033410B (en) Image reconstruction model training method, image super-resolution reconstruction method and device
CN115797931A (en) Remote sensing image semantic segmentation method based on double-branch feature fusion
CN110992270A (en) Multi-scale residual attention network image super-resolution reconstruction method based on attention
CN104113789B (en) On-line video abstraction generation method based on depth learning
CN108427920A (en) A kind of land and sea border defense object detection method based on deep learning
CN109598269A (en) A kind of semantic segmentation method based on multiresolution input with pyramid expansion convolution
CN108549893A (en) A kind of end-to-end recognition methods of the scene text of arbitrary shape
CN109410146A (en) A kind of image deblurring algorithm based on Bi-Skip-Net
CN112884073B (en) Image rain removing method, system, terminal and storage medium
CN110097110A (en) A kind of semantic image restorative procedure based on objective optimization
CN111179196A (en) Multi-resolution depth network image highlight removing method based on divide-and-conquer
CN113762265A (en) Pneumonia classification and segmentation method and system
CN111462090A (en) Multi-scale image target detection method
CN111126185B (en) Deep learning vehicle target recognition method for road gate scene
CN115082966A (en) Pedestrian re-recognition model training method, pedestrian re-recognition method, device and equipment
CN113688715A (en) Facial expression recognition method and system
Feng et al. Coal mine image dust and fog clearing algorithm based on deep learning network
Wan et al. Siamese Attentive Convolutional Network for Effective Remote Sensing Image Change Detection
CN110414301A (en) It is a kind of based on double compartment crowd density estimation methods for taking the photograph head
CN116485689B (en) Progressive coupling image rain removing method and system based on CNN and transducer
CN117934327A (en) Reflective image removing method based on characteristic dynamic cross perception

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant