CN110059698B - Semantic segmentation method and system based on edge dense reconstruction for street view understanding - Google Patents

Semantic segmentation method and system based on edge dense reconstruction for street view understanding

Info

Publication number
CN110059698B
CN110059698B (application number CN201910359119.0A)
Authority
CN
China
Prior art keywords
edge
features
feature
semantic segmentation
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910359119.0A
Other languages
Chinese (zh)
Other versions
CN110059698A (en
Inventor
陈羽中
林洋洋
柯逍
黄腾达
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fuzhou University
Original Assignee
Fuzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fuzhou University filed Critical Fuzhou University
Priority to CN201910359119.0A priority Critical patent/CN110059698B/en
Publication of CN110059698A publication Critical patent/CN110059698A/en
Application granted granted Critical
Publication of CN110059698B publication Critical patent/CN110059698B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a semantic segmentation method and a semantic segmentation system based on edge dense reconstruction for street view understanding. The method comprises the following steps: preprocessing the input images of the training set to standardize them and obtain preprocessed images of the same size; extracting general features with a convolutional network, then obtaining three-level context spatial pyramid fusion features, and using the two parts cascaded as the encoding network to extract encoding features; enlarging the encoding features to half the input size, obtaining edge features based on the convolutional network, and, combining the half-input-size encoding features, using a dense network that fuses the edge features as the decoding network to reconstruct the image resolution and obtain the decoding features; computing the semantic segmentation loss and the auxiliary edge loss, and training the deep neural network with the goal of minimizing their weighted sum; and performing semantic segmentation on the image to be segmented with the deep neural network model and outputting the segmentation result. The method and the system help improve the accuracy and robustness of image semantic segmentation.

Description

Semantic segmentation method and system based on edge dense reconstruction for street view understanding
Technical Field
The invention relates to the technical field of computer vision, in particular to a semantic segmentation method and a semantic segmentation system based on edge dense reconstruction for street view understanding.
Background
Image semantic segmentation is an important branch of computer vision in the field of artificial intelligence and an important step in image understanding for machine vision. Image semantic segmentation accurately classifies each pixel in an image into the category to which it belongs, so that the result is consistent with the visual content of the image; the image semantic segmentation task is therefore also called a pixel-level image classification task.
Because image semantic segmentation and image classification share certain similarities, various image classification networks, with their final fully connected layers removed, are often used interchangeably as backbone networks of image semantic segmentation networks. Larger-sized features are sometimes obtained by removing pooling layers in the backbone network or by replacing them with atrous (dilated) convolutions, and the semantic segmentation result is finally obtained with a convolution layer whose kernel size is 1. Compared with image classification, image semantic segmentation is more difficult, because determining the category of each pixel requires combining fine local information; the backbone network is therefore often used to extract more global features, and the shallow features in the backbone network are then combined to reconstruct the feature resolution and restore the original image size. Because the feature size first becomes smaller and then larger, the former part is often called the encoding network and the latter the decoding network. Meanwhile, in the encoding process, different receptive fields and scale information are often combined to better capture objects of different sizes, for example by atrous spatial pyramid pooling; however, this technique enlarges the spacing of the convolution kernel and ignores interior pixels, and it cannot combine more global context information to make up for its limited expressive power. In addition, existing semantic segmentation methods often restore the resolution in the decoding process simply from the features of the previous level and then combine shallow features of the corresponding size to compensate for the information lost during encoding, so the effective features produced during resolution reconstruction cannot be reused effectively, and the problem of blurred object boundaries after image resolution reconstruction is not addressed in a targeted manner.
Disclosure of Invention
The invention aims to provide a semantic segmentation method and a semantic segmentation system based on edge dense reconstruction for street view understanding, and the method and the system are favorable for improving the accuracy and the robustness of image semantic segmentation.
In order to achieve the above purpose, the technical solution of the invention is as follows: a semantic segmentation method based on edge dense reconstruction for street view understanding comprises the following steps:
Step A: preprocessing the input images of the training set, first subtracting the image mean from each input image to standardize it, and then randomly cropping to a uniform size to obtain preprocessed images of the same size;
Step B: extracting the general features F_backbone with a convolutional network, obtaining the three-level context spatial pyramid fusion feature F_tspp from F_backbone to capture multi-scale context information, and using the two cascaded parts as the encoding network to extract the encoding features F_encoder;
Step C: enlarging the encoding feature F_encoder to half the input image size to obtain the half-input-size encoding feature F_us, selecting intermediate-layer features F_mid^os from the convolutional network and computing edge features F_edge^os from them, and, combining the half-input-size encoding feature F_us, using a dense network that fuses the edge features F_edge^os as the decoding network to reconstruct the image resolution and compute the decoding features F_decoder;
Step D: using the decoding features F_decoder and the edge features F_edge^os to obtain a semantic segmentation probability map and edge probability maps respectively, computing edge image labels from the semantic image labels of the training set, computing the semantic segmentation loss and the auxiliary edge losses from the two kinds of probability maps and their corresponding labels, and training the whole deep neural network with the goal of minimizing their weighted sum;
Step E: performing semantic segmentation on the image to be segmented with the trained deep neural network model and outputting the segmentation result.
Further, in the step B, extracting the general features F_backbone with a convolutional network, obtaining the three-level context spatial pyramid fusion feature F_tspp from F_backbone, and using the two cascaded parts as the encoding network to extract the encoding features F_encoder comprises the following steps:
Step B1: extracting the general features F_backbone from the preprocessed image with a convolutional network;
Step B2: applying a 1 × 1 convolution to F_backbone for feature dimension reduction to obtain the feature F_1×1;
Step B3: applying average pooling to F_backbone over the whole image, restoring the original size with nearest-neighbor interpolation, and obtaining the image-level feature F_image with a 1 × 1 convolution;
Step B4: performing atrous convolution on F_backbone with a convolution kernel of dilation rate r_as to obtain the feature F_atrous^(r_as); then concatenating the three-level context features F_1×1, F_image and F_atrous^(r_as) and fusing them with a 1 × 1 convolution to obtain the three-level context fusion feature F_fuse^(r_as) for dilation rate r_as; during convolution, batch normalization is used to keep the input distribution consistent, and the linear rectification function (ReLU) is used as the activation function; the atrous convolution is computed as:
y_as[m_as] = Σ_(k_as) x_as[m_as + r_as · k_as] · w_as[k_as]
where y_as[m_as] denotes the output of the atrous convolution with dilation rate r_as at output coordinate m_as, x_as[m_as + r_as · k_as] denotes the input reference pixel of the input x_as corresponding to output coordinate m_as, dilation rate r_as and kernel coordinate k_as, and w_as[k_as] denotes the weight of the atrous convolution kernel at position k_as;
Step B5: repeating the above steps with different dilation rates until n_tspp fusion features are obtained, then splicing the n_tspp features F_fuse^(r) with F_image to obtain the three-level context spatial pyramid fusion feature F_tspp;
Step B6: reducing the dimension of the feature F_tspp with a 1 × 1 convolution, then applying dropout regularization to obtain the final encoding feature F_encoder.
Further, in the step C, enlarging the encoding feature F_encoder to half the input image size to obtain the half-input-size encoding feature F_us, selecting intermediate-layer features F_mid^os from the convolutional network, computing edge features F_edge^os, and, combining the half-input-size encoding feature F_us, using a dense network that fuses the edge features F_edge^os as the decoding network to reconstruct the image resolution and compute the decoding features F_decoder comprises the following steps:
Step C1: defining the ratio of the original input image size to a feature's size as that feature's output stride, and enlarging the encoding feature F_encoder with nearest-neighbor interpolation to obtain the feature map F_us with output stride 2;
Step C2: selecting the intermediate-layer feature F_mid^os with output stride os from the convolutional network used for extracting the general features, reducing its dimension with a 1 × 1 convolution, then enlarging with bilinear interpolation and multiplying to compute the edge feature F_edge^os;
Step C3: splicing the features F_us and F_edge^os, reducing the dimension with a 1 × 1 convolution, and then extracting features with a 3 × 3 convolution to obtain the decoding feature F_decoder;
Step C4: selecting an output stride os smaller than that used in step C2; if all output strides have been processed, the extraction of decoding features is finished; otherwise, splicing F_us and F_decoder as the new F_us and repeating steps C2 to C3.
Further, in the step D, using the decoding features F_decoder and the edge features F_edge^os to obtain a semantic segmentation probability map and edge probability maps respectively, computing edge image labels from the semantic image labels of the training set, computing the semantic segmentation loss and the auxiliary edge losses from the probability maps and their corresponding labels, and training the whole deep neural network with the goal of minimizing their weighted sum comprises the following steps:
Step D1: scaling the feature F_decoder and all edge features F_edge^os to the same size as the input image with bilinear interpolation, and obtaining the semantic segmentation probabilities and edge probabilities through a 1 × 1 convolution with softmax as the activation function; softmax is computed as:
σ_c = e^(γ_c) / Σ_(k=1..C) e^(γ_k)
where σ_c is the probability of class c, e is the natural constant, γ_c and γ_k denote the unactivated feature values of classes c and k respectively, and C is the total number of classes;
Step D2: one-hot encoding the semantic segmentation labels of the training set, and then computing the edge labels; the edge label is computed as:
y_edge(i, j, c) = sgn( Σ_((i_u, j_u) ∈ U_8(i, j)) | y_seg(i, j, c) - y_seg(i_u, j_u, c) | )
where y_edge(i, j, c) and y_seg(i, j, c) are the edge label and the one-hot semantic label of class c at coordinate (i, j), (i_u, j_u) ranges over the 8-neighborhood U_8 of coordinate (i, j), and sgn() is the sign function;
Step D3: computing the pixel-level cross entropy from the semantic segmentation and edge probability maps and their corresponding labels to obtain the semantic segmentation loss L_s and the auxiliary edge losses L_edge^os, and then computing the weighted sum loss L:
L = L_s + Σ_os α_os · L_edge^os
where L_edge^os is the loss value corresponding to the edge feature F_edge^os, and α_os is the weight of L_edge^os in the final loss;
finally, with the goal of minimizing the weighted sum loss L, the model parameters are updated iteratively by back propagation with a stochastic gradient descent optimizer to train the whole deep neural network and obtain the final deep neural network model.
The invention also provides a semantic segmentation system based on edge dense reconstruction for street view understanding, which comprises:
a preprocessing module, used for preprocessing the input images of the training set, including subtracting the image mean from each image to standardize it and randomly cropping to a uniform size to obtain preprocessed images of the same size;
an encoding feature extraction module, used for extracting the general features F_backbone with a convolutional network, obtaining the three-level context spatial pyramid fusion feature F_tspp from F_backbone to capture multi-scale context information, and using the two cascaded parts as the encoding network to extract the encoding features F_encoder;
a decoding feature extraction module, used for enlarging the encoding feature F_encoder to half the input image size to obtain the half-input-size encoding feature F_us, selecting intermediate-layer features F_mid^os from the convolutional network and computing edge features F_edge^os, and, combining the half-input-size encoding feature F_us, using a dense network that fuses the edge features F_edge^os as the decoding network to reconstruct the image resolution and extract the decoding features F_decoder;
a neural network training module, used for obtaining a semantic segmentation probability map and edge probability maps from the decoding features F_decoder and the edge features F_edge^os respectively, computing edge image labels from the semantic image labels of the training set, computing the semantic segmentation loss and the auxiliary edge losses from the probability maps and their corresponding labels, and training the whole deep neural network with the goal of minimizing their weighted sum to obtain the deep neural network model; and
a semantic segmentation module, used for performing semantic segmentation on the image to be segmented with the trained deep neural network model and outputting the segmentation result.
Compared with the prior art, the invention has the following beneficial effects. First, after the backbone network in the encoding network, three-level context spatial pyramid fusion features are used for multi-scale feature capture, and the internal and global features are exploited in a targeted way to optimize the original features of different receptive fields, enriching the expressive power of the encoding features. Then, in the decoding network, edge features derived from the intermediate-layer features and trained with auxiliary supervision are combined to specifically adjust the edge regions that tend to deviate during feature resolution reconstruction, improving the semantic segmentation between different objects, and the feature resolution is reconstructed in a dense-network manner so that the reconstructed features are better reused. Compared with the prior art, the method obtains stronger context expression capability after encoding, corrects the boundary ambiguity between objects more effectively by combining edge supervision in the decoding process, and exploits the feature reuse of the dense network structure to make the network easier to train, finally obtaining more accurate semantic segmentation results.
Drawings
FIG. 1 is a flow chart of a method implementation of an embodiment of the present invention.
Fig. 2 is a schematic system structure according to an embodiment of the present invention.
Detailed Description
The invention is described in further detail below with reference to the figures and the embodiments.
The invention provides a semantic segmentation method based on edge dense reconstruction for street view understanding, which, as shown in figure 1, comprises the following steps:
Step A: preprocessing the input images of the training set, first subtracting the image mean from each image to standardize it, and then randomly cropping to a uniform size to obtain preprocessed images of the same size.
Step B: extracting the general features F_backbone with a general convolutional network, obtaining the three-level context spatial pyramid fusion feature F_tspp from F_backbone to capture multi-scale context information, and using the two parts cascaded in step B as the encoding network to extract the encoding features F_encoder; this specifically comprises the following steps:
Step B1: extracting the general features F_backbone from the preprocessed image with a general convolutional network (this embodiment adopts the Xception network provided in the DeepLabv3+ network);
Step B2: applying a 1 × 1 convolution to F_backbone for feature dimension reduction to obtain the feature F_1×1;
Step B3: applying average pooling to F_backbone over the whole image, restoring the original size with nearest-neighbor interpolation, and obtaining the image-level feature F_image with a 1 × 1 convolution;
Step B4: performing atrous convolution on F_backbone with a convolution kernel of dilation rate r_as to obtain the feature F_atrous^(r_as); then concatenating the three-level context features F_1×1, F_image and F_atrous^(r_as) and fusing them with a 1 × 1 convolution to obtain the three-level context fusion feature F_fuse^(r_as) for dilation rate r_as; during convolution, batch normalization is used to keep the input distribution consistent, and the linear rectification function (ReLU) is used as the activation function; the atrous convolution is computed as:
y_as[m_as] = Σ_(k_as) x_as[m_as + r_as · k_as] · w_as[k_as]
where y_as[m_as] denotes the output of the atrous convolution with dilation rate r_as at output coordinate m_as, x_as[m_as + r_as · k_as] denotes the input reference pixel of the input x_as corresponding to output coordinate m_as, dilation rate r_as and kernel coordinate k_as, and w_as[k_as] denotes the weight of the atrous convolution kernel at position k_as;
Step B5: repeating the above steps with different dilation rates until n_tspp fusion features are obtained (3 features in this embodiment, with dilation rates 6, 12 and 18 respectively), then splicing the n_tspp features F_fuse^(r) with F_image to obtain the three-level context spatial pyramid fusion feature F_tspp;
Step B6: reducing the dimension of the feature F_tspp with a 1 × 1 convolution, then applying dropout regularization to obtain the final encoding feature F_encoder.
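A minimal PyTorch-style sketch of steps B2 to B6 is given below; it is an assumption of how the three-level context spatial pyramid fusion could be realized, with module and variable names (ThreeLevelContextPyramid, f_1x1, the 0.5 dropout rate, etc.) chosen for illustration rather than taken from the patent:

import torch
import torch.nn as nn
import torch.nn.functional as F

class ThreeLevelContextPyramid(nn.Module):
    # Sketch: per dilation rate, fuse the 1x1-reduced feature, the image-level
    # feature and the atrous feature; then splice all fused features with the
    # image-level feature and project with dropout (steps B2-B6).
    def __init__(self, in_ch, out_ch, rates=(6, 12, 18)):
        super().__init__()
        def conv_bn_relu(i, o, k, dilation=1):
            pad = dilation * (k - 1) // 2
            return nn.Sequential(
                nn.Conv2d(i, o, k, padding=pad, dilation=dilation, bias=False),
                nn.BatchNorm2d(o), nn.ReLU(inplace=True))
        self.reduce = conv_bn_relu(in_ch, out_ch, 1)        # step B2
        self.image_level = conv_bn_relu(in_ch, out_ch, 1)   # step B3 (1x1 after pooling + upsampling)
        self.atrous = nn.ModuleList([conv_bn_relu(in_ch, out_ch, 3, r) for r in rates])   # step B4
        self.fuse = nn.ModuleList([conv_bn_relu(3 * out_ch, out_ch, 1) for _ in rates])
        self.project = nn.Sequential(conv_bn_relu((len(rates) + 1) * out_ch, out_ch, 1),
                                     nn.Dropout2d(0.5))     # step B6

    def forward(self, f_backbone):
        f_1x1 = self.reduce(f_backbone)
        pooled = F.adaptive_avg_pool2d(f_backbone, 1)                               # whole-image pooling
        restored = F.interpolate(pooled, size=f_backbone.shape[2:], mode='nearest')  # restore size
        f_image = self.image_level(restored)                                         # image-level feature
        fused = [fuse(torch.cat([f_1x1, f_image, atr(f_backbone)], dim=1))
                 for atr, fuse in zip(self.atrous, self.fuse)]                        # step B4 per rate
        f_tspp = torch.cat(fused + [f_image], dim=1)                                  # step B5
        return self.project(f_tspp)                                                   # F_encoder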
Step C: enlarging the encoding feature F_encoder to half the input image size to obtain the half-input-size encoding feature F_us, selecting intermediate-layer features F_mid^os from the convolutional network and computing edge features F_edge^os, and, combining the half-input-size encoding feature F_us, using a dense network that fuses the edge features F_edge^os as the decoding network to reconstruct the image resolution and compute the decoding features F_decoder; this specifically comprises the following steps:
Step C1: defining the ratio of the original input image size to a feature's size as that feature's output stride, and enlarging the encoding feature F_encoder with nearest-neighbor interpolation to obtain the feature map F_us with output stride 2;
Step C2: selecting the intermediate-layer feature F_mid^os with output stride os from the convolutional network used for extracting the general features, reducing its dimension with a 1 × 1 convolution, then enlarging with bilinear interpolation and multiplying to compute the edge feature F_edge^os;
Step C3: splicing the features F_us and F_edge^os, reducing the dimension with a 1 × 1 convolution, and then extracting features with a 3 × 3 convolution to obtain the decoding feature F_decoder;
Step C4: selecting an output stride os smaller than that used in step C2; if all output strides have been processed, the extraction of decoding features is finished; otherwise, splicing F_us and F_decoder as the new F_us and repeating steps C2 to C3.
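A rough PyTorch-style sketch of one decoding stage (steps C2 and C3) follows. It is an assumption of one possible realization: in particular, the exact way the edge branch is built from the intermediate-layer feature is only partly specified above, so the 1 × 1 reduction followed by bilinear enlargement shown here, and all names, are illustrative:

import torch
import torch.nn as nn
import torch.nn.functional as F

class EdgeDenseDecodeStage(nn.Module):
    # One edge-fused decoding stage: edge feature from an intermediate-layer
    # feature (step C2), then splice with the running dense feature stack and
    # refine with 1x1 + 3x3 convolutions (step C3).
    def __init__(self, mid_ch, edge_ch, dense_ch, out_ch):
        super().__init__()
        self.edge_reduce = nn.Conv2d(mid_ch, edge_ch, 1)          # step C2: 1x1 dimension reduction
        self.reduce = nn.Conv2d(dense_ch + edge_ch, out_ch, 1)    # step C3: 1x1 after splicing
        self.refine = nn.Conv2d(out_ch, out_ch, 3, padding=1)     # step C3: 3x3 feature extraction

    def forward(self, f_us, f_mid, target_size):
        f_edge = F.interpolate(self.edge_reduce(f_mid), size=target_size,
                               mode='bilinear', align_corners=False)            # step C2
        f_decoder = self.refine(self.reduce(torch.cat([f_us, f_edge], dim=1)))   # step C3
        return f_decoder, f_edge

# Step C4 (caller side, names assumed): for each remaining output stride os,
#   f_decoder, f_edge = stage(f_us, f_mid[os], f_us.shape[2:])
#   f_us = torch.cat([f_us, f_decoder], dim=1)   # dense reuse of reconstructed features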
Step D: using the decoding features F_decoder and the edge features F_edge^os to obtain a semantic segmentation probability map and edge probability maps respectively, computing edge image labels from the semantic image labels of the training set, computing the semantic segmentation loss and the auxiliary edge losses from the probability maps and their corresponding labels, and training the whole deep neural network with the goal of minimizing their weighted sum; this specifically comprises the following steps:
Step D1: scaling the feature F_decoder and all edge features F_edge^os to the same size as the input image with bilinear interpolation, and obtaining the semantic segmentation probabilities and edge probabilities through a 1 × 1 convolution with softmax as the activation function; softmax is computed as:
σ_c = e^(γ_c) / Σ_(k=1..C) e^(γ_k)
where σ_c is the probability of class c, e is the natural constant, γ_c and γ_k denote the unactivated feature values of classes c and k respectively, and C is the total number of classes;
Step D2: one-hot encoding the semantic segmentation labels of the training set, and then computing the edge labels; the edge label is computed as:
y_edge(i, j, c) = sgn( Σ_((i_u, j_u) ∈ U_8(i, j)) | y_seg(i, j, c) - y_seg(i_u, j_u, c) | )
where y_edge(i, j, c) and y_seg(i, j, c) are the edge label and the one-hot semantic label of class c at coordinate (i, j), (i_u, j_u) ranges over the 8-neighborhood U_8 of coordinate (i, j), and sgn() is the sign function;
and D3: respectively calculating the cross entropy of the pixel level by using probability graphs and corresponding labels of semantic segmentation and edges to obtain corresponding semantic segmentation loss L s And edge loss with assistance to supervision
Figure BDA0002046339370000083
The weight sum loss L is then calculated:
Figure BDA0002046339370000084
wherein, the first and the second end of the pipe are connected with each other,
Figure BDA0002046339370000085
as edge features
Figure BDA0002046339370000086
Corresponding loss value, α os Is composed of
Figure BDA0002046339370000087
The weight occupied in the final loss, α os Satisfy the requirement of
Figure BDA0002046339370000088
And each alpha os Equal;
and finally, updating the model parameters by utilizing back propagation iteration through a random gradient descent optimization method to train the whole deep neural network by minimizing weighting and loss L, so as to obtain a final deep neural network model.
Step E: performing semantic segmentation on the image to be segmented with the trained deep neural network model and outputting the segmentation result.
The invention also provides a semantic segmentation system for street view understanding, which is used for implementing the method, and as shown in fig. 2, the semantic segmentation system comprises:
the preprocessing module is used for preprocessing the input images of the training set, and comprises subtracting the image mean value of the images to standardize the images, and randomly shearing the images in uniform size to obtain preprocessed images in the same size;
a coding feature extraction module for extracting general features F by using a convolution network backbone Based on the general feature F backbone Obtaining three-level context space pyramid fusion characteristic F tspp Used for capturing multi-scale context information and then extracting coding features F by using the two parts which are cascaded as a coding network encoder
A decoding feature extraction module for enlarging the encoding feature F encoder The size is half of the size of the input image, and a half-input-size coding feature F is obtained us Selecting intermediate layer features from the convolutional network
Figure BDA0002046339370000089
Computing edge features
Figure BDA00020463393700000810
Combining half-input size coding features F us To fuse edge features
Figure BDA00020463393700000811
The dense network is a decoding network, image resolution reconstruction is carried out, and decoding characteristics F are extracted decoder
Neural network training module for using the decoded features F decoder And edge features
Figure BDA00020463393700000812
Respectively acquiring a semantic segmentation probability map and an edge probability map, calculating edge image labels by using semantic image labels in a training set, respectively calculating semantic segmentation loss and edge loss for auxiliary supervision by using the semantic segmentation probability map and the edge probability map and respective corresponding labels, and training the whole deep neural network by using minimum weighting and loss of the semantic segmentation probability map and the edge probability map as targets to obtain a deep neural network model; and
and the semantic segmentation module is used for performing semantic segmentation on the image to be segmented by using the trained deep neural network model and outputting a segmentation result.
The above are preferred embodiments of the present invention; all changes that are made according to the technical solution of the present invention and produce equivalent functional effects without exceeding the scope of the technical solution belong to the protection scope of the present invention.

Claims (3)

1. A semantic segmentation method based on edge dense reconstruction for street view understanding, characterized by comprising the following steps:
Step A: preprocessing the input images of the training set, first subtracting the image mean from each input image to standardize it, and then randomly cropping to a uniform size to obtain preprocessed images of the same size;
Step B: extracting the general features F_backbone with a convolutional network, obtaining the three-level context spatial pyramid fusion feature F_tspp from F_backbone to capture multi-scale context information, and then extracting the encoding features F_encoder;
Step C: enlarging the encoding feature F_encoder to half the input image size to obtain the half-input-size encoding feature F_us, selecting intermediate-layer features F_mid^os from the convolutional network and computing edge features F_edge^os, and, combining the half-input-size encoding feature F_us, using a dense network that fuses the edge features F_edge^os as the decoding network to reconstruct the image resolution and compute the decoding features F_decoder;
Step D: using the decoding features F_decoder and the edge features F_edge^os to obtain a semantic segmentation probability map and edge probability maps respectively, computing edge image labels from the semantic image labels of the training set, computing the semantic segmentation loss and the auxiliary edge losses from the probability maps and their corresponding labels, and training the whole deep neural network with the goal of minimizing their weighted sum;
Step E: performing semantic segmentation on the image to be segmented with the trained deep neural network model and outputting the segmentation result;
in the step B, extracting the general features F_backbone with a convolutional network, obtaining the three-level context spatial pyramid fusion feature F_tspp from F_backbone, and then extracting the encoding features F_encoder comprises the following steps:
Step B1: extracting the general features F_backbone from the preprocessed image with a convolutional network;
Step B2: applying a 1 × 1 convolution to F_backbone for feature dimension reduction to obtain the feature F_1×1;
Step B3: applying average pooling to F_backbone over the whole image, restoring the original size with nearest-neighbor interpolation, and obtaining the image-level feature F_image with a 1 × 1 convolution;
Step B4: performing atrous convolution on F_backbone with a convolution kernel of dilation rate r_as to obtain the feature F_atrous^(r_as); then concatenating the three-level context features F_1×1, F_image and F_atrous^(r_as) and fusing them with a 1 × 1 convolution to obtain the three-level context fusion feature F_fuse^(r_as) for dilation rate r_as; during convolution, batch normalization is used to keep the input distribution consistent, and the linear rectification function (ReLU) is used as the activation function; the atrous convolution is computed as:
y_as[m_as] = Σ_(k_as) x_as[m_as + r_as · k_as] · w_as[k_as]
where y_as[m_as] denotes the output of the atrous convolution with dilation rate r_as at output coordinate m_as, x_as[m_as + r_as · k_as] denotes the input reference pixel of the input x_as corresponding to output coordinate m_as, dilation rate r_as and kernel coordinate k_as, and w_as[k_as] denotes the weight of the atrous convolution kernel at position k_as;
Step B5: repeating the above steps with different dilation rates until n_tspp fusion features are obtained, then splicing the n_tspp features F_fuse^(r) with F_image to obtain the three-level context spatial pyramid fusion feature F_tspp;
Step B6: reducing the dimension of the feature F_tspp with a 1 × 1 convolution, then applying dropout regularization to obtain the final encoding feature F_encoder;
in the step C, enlarging the encoding feature F_encoder to half the input image size to obtain the half-input-size encoding feature F_us, selecting intermediate-layer features F_mid^os from the convolutional network, computing edge features F_edge^os, and, combining the half-input-size encoding feature F_us, using a dense network that fuses the edge features F_edge^os as the decoding network to reconstruct the image resolution and compute the decoding features F_decoder comprises the following steps:
Step C1: defining the ratio of the original input image size to a feature's size as that feature's output stride, and enlarging the encoding feature F_encoder with nearest-neighbor interpolation to obtain the feature map F_us with output stride 2;
Step C2: selecting the intermediate-layer feature F_mid^os with output stride os from the convolutional network used for extracting the general features, reducing its dimension with a 1 × 1 convolution, then enlarging with bilinear interpolation and multiplying to compute the edge feature F_edge^os;
Step C3: splicing the features F_us and F_edge^os, reducing the dimension with a 1 × 1 convolution, and then extracting features with a 3 × 3 convolution to obtain the decoding feature F_decoder;
Step C4: selecting an output stride os smaller than that used in step C2; if all output strides have been processed, the extraction of decoding features is finished; otherwise, splicing F_us and F_decoder as the new F_us and repeating steps C2 to C3.
2. The method of claim 1, wherein in the step D, using the decoding features F_decoder and the edge features F_edge^os to obtain a semantic segmentation probability map and edge probability maps respectively, computing edge image labels from the semantic image labels of the training set, computing the semantic segmentation loss and the auxiliary edge losses from the probability maps and their corresponding labels, and training the whole deep neural network with the goal of minimizing their weighted sum comprises the following steps:
Step D1: scaling the feature F_decoder and all edge features F_edge^os to the same size as the input image with bilinear interpolation, and obtaining the semantic segmentation probabilities and edge probabilities through a 1 × 1 convolution with softmax as the activation function; softmax is computed as:
σ_c = e^(γ_c) / Σ_(k=1..C) e^(γ_k)
where σ_c is the probability of class c, e is the natural constant, γ_c and γ_k denote the unactivated feature values of classes c and k respectively, and C is the total number of classes;
Step D2: one-hot encoding the semantic segmentation labels of the training set, and then computing the edge labels; the edge label is computed as:
y_edge(i, j, c) = sgn( Σ_((i_u, j_u) ∈ U_8(i, j)) | y_seg(i, j, c) - y_seg(i_u, j_u, c) | )
where y_edge(i, j, c) and y_seg(i, j, c) are the edge label and the one-hot semantic label of class c at coordinate (i, j), (i_u, j_u) ranges over the 8-neighborhood U_8 of coordinate (i, j), and sgn() is the sign function;
Step D3: computing the pixel-level cross entropy from the semantic segmentation and edge probability maps and their corresponding labels to obtain the semantic segmentation loss L_s and the auxiliary edge losses L_edge^os, and then computing the weighted sum loss L:
L = L_s + Σ_os α_os · L_edge^os
where α_os is the weight of L_edge^os in the final loss;
finally, with the goal of minimizing the weighted sum loss L, the model parameters are updated iteratively by back propagation with a stochastic gradient descent optimizer to train the whole deep neural network and obtain the final deep neural network model.
3. A semantic segmentation system based on edge dense reconstruction for street view understanding for implementing the method of claim 1, comprising:
the preprocessing module is used for preprocessing the input images of the training set, and comprises subtracting the image mean value of the images to standardize the images, and randomly shearing the images in uniform size to obtain preprocessed images in the same size;
a coding feature extraction module for extracting general features F by using a convolution network backbone Based on the general feature F backbone Obtaining three-level context space pyramid fusion characteristic F tspp For capturing multi-scale context information and then extracting coding features F encoder
A decoding feature extraction module for enlarging the encoding feature F encoder The size is half of the size of the input image, and a half-input-size coding feature F is obtained us Selecting intermediate layer features from the convolutional network
Figure FDA0003807705320000041
Computing edge features
Figure FDA0003807705320000042
Combining half-input size coding features F us To fuse edge features
Figure FDA0003807705320000043
The dense network is a decoding network, image resolution reconstruction is carried out, and decoding characteristics F are extracted decoder
Neural network training module for using the decoding feature F decoder And edge features
Figure FDA0003807705320000044
Respectively obtaining a semantic segmentation probability map and an edge probability map, calculating edge image labels by using the semantic image labels in a training set, and utilizing the semantic segmentation probability map, the edge probability map and the respective semantic segmentation probability mapCorresponding labels are respectively calculated to obtain semantic segmentation loss and edge loss for auxiliary supervision, and the whole deep neural network is trained by taking minimum weighting and loss of the semantic segmentation loss and the edge loss for auxiliary supervision as targets to obtain a deep neural network model; and
and the semantic segmentation module is used for performing semantic segmentation on the image to be segmented by utilizing the trained deep neural network model and outputting a segmentation result.
CN201910359119.0A 2019-04-30 2019-04-30 Semantic segmentation method and system based on edge dense reconstruction for street view understanding Active CN110059698B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910359119.0A CN110059698B (en) 2019-04-30 2019-04-30 Semantic segmentation method and system based on edge dense reconstruction for street view understanding

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910359119.0A CN110059698B (en) 2019-04-30 2019-04-30 Semantic segmentation method and system based on edge dense reconstruction for street view understanding

Publications (2)

Publication Number Publication Date
CN110059698A CN110059698A (en) 2019-07-26
CN110059698B true CN110059698B (en) 2022-12-23

Family

ID=67321810

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910359119.0A Active CN110059698B (en) 2019-04-30 2019-04-30 Semantic segmentation method and system based on edge dense reconstruction for street view understanding

Country Status (1)

Country Link
CN (1) CN110059698B (en)

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110517278B (en) * 2019-08-07 2022-04-29 北京旷视科技有限公司 Image segmentation and training method and device of image segmentation network and computer equipment
CN110598846B (en) * 2019-08-15 2022-05-03 北京航空航天大学 Hierarchical recurrent neural network decoder and decoding method
CN110599514B (en) * 2019-09-23 2022-10-04 北京达佳互联信息技术有限公司 Image segmentation method and device, electronic equipment and storage medium
CN110895814B (en) * 2019-11-30 2023-04-18 南京工业大学 Aero-engine hole-finding image damage segmentation method based on context coding network
CN113051983B (en) * 2019-12-28 2022-08-23 中移(成都)信息通信科技有限公司 Method for training field crop disease recognition model and field crop disease recognition
CN111341438B (en) * 2020-02-25 2023-04-28 中国科学技术大学 Image processing method, device, electronic equipment and medium
CN111429473B (en) * 2020-02-27 2023-04-07 西北大学 Chest film lung field segmentation model establishment and segmentation method based on multi-scale feature fusion
CN111340047B (en) * 2020-02-28 2021-05-11 江苏实达迪美数据处理有限公司 Image semantic segmentation method and system based on multi-scale feature and foreground and background contrast
CN112150478B (en) * 2020-08-31 2021-06-22 温州医科大学 Method and system for constructing semi-supervised image segmentation framework
CN112700462A (en) * 2020-12-31 2021-04-23 北京迈格威科技有限公司 Image segmentation method and device, electronic equipment and storage medium
CN113128353B (en) * 2021-03-26 2023-10-24 安徽大学 Emotion perception method and system oriented to natural man-machine interaction
CN113706545B (en) * 2021-08-23 2024-03-26 浙江工业大学 Semi-supervised image segmentation method based on dual-branch nerve discrimination dimension reduction
CN114627086B (en) * 2022-03-18 2023-04-28 江苏省特种设备安全监督检验研究院 Crane surface damage detection method based on characteristic pyramid network
CN115953394B (en) * 2023-03-10 2023-06-23 中国石油大学(华东) Ocean mesoscale vortex detection method and system based on target segmentation
CN116978011B (en) * 2023-08-23 2024-03-15 广州新华学院 Image semantic communication method and system for intelligent target recognition

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10095977B1 (en) * 2017-10-04 2018-10-09 StradVision, Inc. Learning method and learning device for improving image segmentation and testing method and testing device using the same
CN109241972A (en) * 2018-08-20 2019-01-18 电子科技大学 Image, semantic dividing method based on deep learning
CN109509192A (en) * 2018-10-18 2019-03-22 天津大学 Merge the semantic segmentation network in Analysis On Multi-scale Features space and semantic space

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Pyramid Context Contrast for Semantic Segmentation; YuZhong Chen; IEEE Access; 2019-11-27; full text *
Research on semantic segmentation algorithms for small objects based on deep neural networks; Hu Tai; China Master's Theses Full-text Database; 2019-01-15; full text *

Also Published As

Publication number Publication date
CN110059698A (en) 2019-07-26

Similar Documents

Publication Publication Date Title
CN110059698B (en) Semantic segmentation method and system based on edge dense reconstruction for street view understanding
CN110059768B (en) Semantic segmentation method and system for fusion point and region feature for street view understanding
CN110059769B (en) Semantic segmentation method and system based on pixel rearrangement reconstruction and used for street view understanding
CN111047551B (en) Remote sensing image change detection method and system based on U-net improved algorithm
CN109446992B (en) Remote sensing image building extraction method and system based on deep learning, storage medium and electronic equipment
CN110322495B (en) Scene text segmentation method based on weak supervised deep learning
CN110889449A (en) Edge-enhanced multi-scale remote sensing image building semantic feature extraction method
CN109919830B (en) Method for restoring image with reference eye based on aesthetic evaluation
CN115797931A (en) Remote sensing image semantic segmentation method based on double-branch feature fusion
CN113850825A (en) Remote sensing image road segmentation method based on context information and multi-scale feature fusion
CN110992270A (en) Multi-scale residual attention network image super-resolution reconstruction method based on attention
CN113888550B (en) Remote sensing image road segmentation method combining super-resolution and attention mechanism
CN110070091A (en) The semantic segmentation method and system rebuild based on dynamic interpolation understood for streetscape
CN111340047B (en) Image semantic segmentation method and system based on multi-scale feature and foreground and background contrast
CN111414923B (en) Indoor scene three-dimensional reconstruction method and system based on single RGB image
CN109886159B (en) Face detection method under non-limited condition
CN112700418B (en) Crack detection method based on improved coding and decoding network model
CN112232351A (en) License plate recognition system based on deep neural network
CN115424017B (en) Building inner and outer contour segmentation method, device and storage medium
CN114283285A (en) Cross consistency self-training remote sensing image semantic segmentation network training method and device
CN114463340B (en) Agile remote sensing image semantic segmentation method guided by edge information
CN111985372A (en) Remote sensing image water body extraction system for deep learning
CN114998671A (en) Visual feature learning device based on convolution mask, acquisition device and storage medium
CN112800851B (en) Water body contour automatic extraction method and system based on full convolution neuron network
Lu et al. Edge-reinforced convolutional neural network for road detection in very-high-resolution remote sensing imagery

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant