CN113449640B - Remote sensing image building semantic segmentation edge optimization method based on multitask CNN + GCN

Info

Publication number
CN113449640B
CN113449640B (application number CN202110725267.7A)
Authority
CN
China
Prior art keywords
building
semantic segmentation
edge
gcn
cnn
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110725267.7A
Other languages
Chinese (zh)
Other versions
CN113449640A (en)
Inventor
刘修国
邓睿哲
陈奇
张丛珊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China University of Geosciences
Original Assignee
China University of Geosciences
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China University of Geosciences filed Critical China University of Geosciences
Priority to CN202110725267.7A priority Critical patent/CN113449640B/en
Publication of CN113449640A publication Critical patent/CN113449640A/en
Application granted granted Critical
Publication of CN113449640B publication Critical patent/CN113449640B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks

Abstract

The invention provides a remote sensing image building semantic segmentation edge optimization method based on multitask CNN + GCN. A CNN is used to extract high-level semantic features of buildings from the remote sensing image, while a GCN performs fast graph reasoning on the high-resolution original image. The low-resolution deep features from the CNN are then remapped to the original image through repeated upsampling, lateral connection and convolution operations, and building edge extraction and primary building semantic segmentation are performed on them. The deep features are integrated with the edge extraction result to constrain the edges of the primary building semantic segmentation result. Finally, a graph feature adaptive optimization module lets the GCN features effectively optimize the constrained building semantic segmentation result, and a building semantic segmentation result with excellent edge performance is output. The beneficial effects of the invention are: the edge details of CNN-based building semantic segmentation results from remote sensing images are adaptively optimized, and the precision and application value of automatic building mapping results are improved.

Description

Remote sensing image building semantic segmentation edge optimization method based on multitask CNN + GCN
Technical Field
The invention relates to the field of surveying and mapping science and technology, and in particular to a method for optimizing the semantic segmentation edges of buildings in remote sensing images using a multitask CNN + GCN semantic segmentation model, that is, a remote sensing image building semantic segmentation edge optimization method based on multitask CNN + GCN.
Background
Accurate building vector contour information obtained from high-resolution remote sensing images provides an important basis for application fields such as urban planning, land survey, illegal-building detection and military reconnaissance. Because manual visual interpretation and labeling of high-resolution remote sensing images is extremely time-consuming, extracting buildings intelligently and quickly with a CNN-based semantic segmentation method and then generating building vector data through contour extraction and regularization is a more economical and efficient way to acquire this information. However, CNN-based remote sensing image semantic segmentation suffers from the loss of local detail texture caused by repeated downsampling, and its receptive field grows only slowly and linearly with network depth, so the network struggles to capture large-scale global semantic information. As a result, the final semantic segmentation results perform poorly at edges, and high-precision automatic vectorization of buildings from remote sensing images remains difficult. Therefore, edge optimization of remote sensing image building semantic segmentation is of great significance for improving the precision, quality and application value of automatic vectorization.
Existing CNN-based edge optimization techniques for remote sensing image building semantic segmentation can be broadly divided into edge optimization based on traditional structural modeling, on CNN feature enhancement, on edge information guidance, and on graph information integration. However, these methods do not account for the invariance of CNNs to scale, translation and rotation introduced by pooling, or for the way convolutions progressively abstract features, so they struggle to extract the fine spatial position of each pixel. How to accurately perceive the precise spatial position of each pixel in a non-CNN manner, in order to optimize CNN-based remote sensing image building semantic segmentation results, currently has no clear solution.
Disclosure of Invention
The technical problem addressed by the invention is that the CNNs used in the prior art have difficulty accurately perceiving the precise spatial position of each pixel. To this end the invention provides a remote sensing image building semantic segmentation edge optimization method based on multitask CNN + GCN, which mainly comprises the following steps:
s1, constructing a remote sensing image building sample set;
s2, constructing a multitask CNN + GCN semantic segmentation model by using the ResNet and the GCN as frameworks according to the building sample set;
s3, remapping the resolution of the high-level semantic features of the building output in the step S2 to the original image, and performing building edge extraction and primary building semantic segmentation to respectively obtain a building edge extraction probability map and a primary building semantic segmentation result;
s4, performing feature integration on the building edge extraction probability graph output in the step S3 and the high-level semantic features of the building output in the step S2, and performing semantic segmentation on the building features based on edge constraint to obtain a building semantic segmentation result based on edge constraint;
s5, adopting a graph feature adaptive optimization module so that the GCN reasoning features output in step S2 optimize the edge-constrained building semantic segmentation result output in step S4, and outputting a building semantic segmentation result with excellent edge performance;
and S6, training the multitask CNN + GCN semantic segmentation model on the three building semantic segmentation results output by steps S3, S4 and S5 and the building edge extraction probability map, using backward propagation and stochastic gradient descent; once the preset precision is reached, the trained multitask CNN + GCN semantic segmentation model is obtained, and inputting the remote sensing image of an actual building into the trained model yields a better-optimized actual building edge semantic segmentation result (a schematic sketch of how these steps fit together is given after this list).
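As an illustration of how steps S2 to S5 could fit together in code, the following PyTorch sketch wires hypothetical sub-modules into one forward pass; every class name, argument and tensor convention here is an assumption made for exposition, not the patented implementation.

```python
# Illustrative PyTorch skeleton of the multitask CNN + GCN pipeline (steps S2-S5).
# All class names and shapes are assumptions for exposition, not the patented code.
import torch
import torch.nn as nn

class MultiTaskCnnGcn(nn.Module):
    def __init__(self, backbone, gcn, edge_head, seg_head, constrained_head, refine_module):
        super().__init__()
        self.backbone = backbone                  # ResNet + pyramid-style remapping (S2/S3)
        self.gcn = gcn                            # fast graph reasoning on the full-resolution image (S2)
        self.edge_head = edge_head                # building edge probability map (S3)
        self.seg_head = seg_head                  # primary semantic segmentation (S3)
        self.constrained_head = constrained_head  # edge-constrained segmentation (S4)
        self.refine_module = refine_module        # graph feature adaptive optimization (S5)

    def forward(self, image):
        cf = self.backbone(image)                       # high-level features remapped to input resolution
        bp = torch.sigmoid(self.edge_head(cf))          # edge probability map BP
        seg1 = self.seg_head(cf)                        # primary segmentation
        seg2 = self.constrained_head(cf * (1.0 + bp))   # edge-constrained segmentation, CF x (1 + BP)
        gf = self.gcn(image, bp)                        # GCN features carrying precise spatial positions
        seg3 = self.refine_module(seg2, gf)             # edge-optimized segmentation
        return seg1, seg2, seg3, bp
```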
Further, the specific steps of constructing the remote sensing image building sample set are as follows:
s11, rasterizing the original building vector data into a binary building semantic label image, and adjusting the remote sensing image extent to match the binary image;
s12, extracting the contour of the binary building label image to obtain a binary building edge label image;
s13, sliding a window of preset size with a preset step simultaneously over the remote sensing image, the building semantic label image and the building edge label image, counting the building pixels within the window to obtain the building proportion, and cropping the image patch if the proportion exceeds a preset threshold;
s14, after cropping is complete, dividing the sample set into a training set and a test set at a 7:3 ratio, with 70% of the data used for training and 30% used for testing the trained model's performance.
Further, ResNet extracts the high-level semantic features of the building through a series of convolution and pooling operations, and the resolution is remapped back to the original image through repeated upsampling, lateral connection and convolution operations.
Further, the edge-constrained building semantic segmentation is obtained through supervised learning on the building semantic labels: the CNN deep feature map is denoted CF and the building edge extraction probability map BP, the deep semantic features are integrated and constrained by the edge information in the form CF × (1 + BP), and a fully convolutional network then completes the edge-constrained building semantic segmentation.
Further, the process of graph reasoning for the original image by using the GCN has the mathematical expression:
H^(l+1) = σ(A·H^(l)·W^(l))
where A is the adjacency matrix encoding the spatial positions and potential mutual relationships of the pixels, H^(l) and H^(l+1) are the vertex features of the l-th and (l+1)-th layers, W^(l) is the learnable training weight of the l-th layer, and σ is a nonlinear activation function. When l = 0, H^(0) is the two-dimensional graph structure obtained by reshaping the input remote sensing image, with dimension (m × m) × n, where m × m is the preset window size, n is the number of input image channels, and m and n are positive integers.
Further, based on the principle that any Laplacian matrix can be diagonalized, the adjacency matrix A is constructed quickly through convolution and matrix operations, with the mathematical expression:
A = φ(BP)·diag(ρ(BP))·φ^T(BP)
where BP is the building edge extraction probability map, φ(·) is a conventional convolution used for dimension change, ρ(·) is adaptive pooling, diag(·) is the diagonalization operation, and (·)^T is the matrix transpose. Substituting A and exploiting the associative law of matrix multiplication avoids storing the huge intermediate adjacency matrix A, which reduces GPU memory overhead and preserves network efficiency, realizing fast GCN reasoning on the high-resolution original image with the mathematical expression:
H^(l+1) = σ(φ(BP)·(diag(ρ(BP))·(φ^T(BP)·H^(l)))·W^(l))
further, the graph feature adaptive optimization module optimizes the CNN building semantic segmentation result by enhancing the spatial detail features, the graph feature adaptive optimization module takes the building semantic segmentation result based on the edge constraint and the cascade feature of the GCN feature as input, generates an attention map with abundant spatial local details by a conventional convolution operation, and optimizes the building semantic segmentation result based on the edge constraint according to the attention map by the GCN feature, and the mathematical expression of the graph feature adaptive optimization module is as follows:
Seg3=δ(Seg2+δ(Concat(GF,Seg2))*GF)
wherein Seg3Optimizing the results for building semantic segmentation, Seg2For the building semantic segmentation result based on edge constraint, GF is GCN feature, δ (-) is conventional convolution operation, and Concat (-) is feature concatenation.
Further, in the training process, a loss function is used to calculate the loss value Loss of the multitask CNN + GCN semantic segmentation model, with the specific formula:
Loss = Σ_{i=1}^{3} L_seg^i + L_edge
where L_seg^i is the loss measuring the difference between the i-th semantic segmentation result and the ground-truth surface class, with the specific formula:
L_seg^i = -(1/N) Σ_{n=1}^{N} [seg*_{n,i}·log(seg_{n,i}) + (1 - seg*_{n,i})·log(1 - seg_{n,i})]
where n is the sample index and N the number of samples, seg_{n,i} is the predicted probability that a pixel in the i-th semantic segmentation result belongs to a building, seg_{n,1} denotes the primary building semantic segmentation result, seg_{n,2} the edge-constrained building semantic segmentation result, and seg_{n,3} the GCN edge-optimized building semantic segmentation result, and seg*_{n,i} is the corresponding true value, equal to 1 where a building is present and 0 otherwise;
L_edge is the loss measuring the difference between the building edge detection result and the true building edge, with the specific formula:
L_edge = -(1/N) Σ_{n=1}^{N} [BP*_n·log(BP_n) + (1 - BP*_n)·log(1 - BP_n)]
where n is the sample index, BP_n is the predicted probability that a pixel in the edge detection is a building edge, and BP*_n is the corresponding true value, equal to 1 for building-edge pixels and 0 otherwise.
Further, a gradient descent algorithm is applied to the loss function; when the loss value Loss approaches convergence, the preset precision is reached and training of the multitask CNN + GCN semantic segmentation model is complete.
Furthermore, during training, hyper-parameters including the positive/negative sample ratio, learning rate, batch size and weight decay coefficient need to be tuned according to the test results.
The technical scheme provided by the invention has the beneficial effects that:
1. The construction of the adjacency matrix is emulated through matrix diagonalization, and the associative law of matrix multiplication removes the need to store the adjacency matrix, so the GCN can act directly on the high-resolution spatial feature map of the CNN. The GCN is then used to explore the potential spatial relationships among pixels and to perceive their precise spatial positions accurately, which is more feasible than acquiring accurate building edge information with a CNN alone;
2. The end-to-end trainable and predictable multitask CNN + GCN semantic segmentation model uses GCN features rich in precise position information to optimize the CNN semantic segmentation result through the graph feature adaptive optimization module, further improving the accuracy and application value of automatic building extraction results.
Drawings
The invention will be further described with reference to the accompanying drawings and examples, in which:
FIG. 1 is a structural diagram of a multitask CNN + GCN semantic segmentation model of a semantic segmentation edge optimization method for a remote sensing image building in an embodiment of the present invention;
FIG. 2 is a flow chart of building sample set construction in a semantic segmentation edge optimization method for a remote sensing image building according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of fast graph reasoning performed by the GCN in the semantic segmentation edge optimization method for the remote sensing image building according to the embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a graph feature adaptive optimization module in the method for optimizing semantic segmentation edges of a remote sensing image building according to the embodiment of the present invention.
Detailed Description
For a more clear understanding of the technical features, objects and effects of the present invention, embodiments of the present invention will now be described in detail with reference to the accompanying drawings.
The embodiment of the invention provides a remote sensing image building semantic segmentation edge optimization method based on multitask CNN + GCN.
Referring to fig. 1, fig. 1 is a schematic structural diagram of a multitask CNN + GCN semantic segmentation model of a method for optimizing semantic segmentation edges of a remote sensing image building according to an embodiment of the present invention, and the specific steps include:
s1, constructing a remote sensing image building sample set, please refer to fig. 2, which specifically includes the following steps:
s11, rasterizing the original building vector data into a binary building semantic label image, and adjusting the high-resolution remote sensing image extent to match the binary image;
s12, extracting the contour of the binary building label image to obtain a binary building edge label image;
s13, sliding a 400 × 400 window with a step of 100 pixels simultaneously over the high-resolution remote sensing image, the building semantic label image and the building edge label image, counting the building pixels within the window to obtain the building proportion, and cropping the patch if the proportion exceeds 10%;
s14, after cropping is complete, dividing the sample set into a training set and a test set at a 7:3 ratio, with 70% of the data used for training and 30% used for evaluating the trained model's performance; a code sketch of this procedure is given below.
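The sample-set construction of steps S11 to S14 can be sketched as follows, assuming the remote sensing image and the rasterized building label are already co-registered numpy arrays (channel-first image); the helper names and the use of a morphological erosion for the edge label are illustrative assumptions, while the 400 × 400 window, 100-pixel stride, 10% threshold and 7:3 split follow this embodiment.

```python
# Sketch of the sample-set construction (S11-S14); file I/O and vector rasterization are omitted.
import random
import numpy as np
from scipy.ndimage import binary_erosion

def edge_label(mask: np.ndarray) -> np.ndarray:
    """Building edge label = binary mask minus its erosion (a one-pixel contour)."""
    return (mask & ~binary_erosion(mask, structure=np.ones((3, 3), bool))).astype(np.uint8)

def cut_samples(image, mask, window=400, stride=100, min_ratio=0.10):
    """Slide a window over the (c, h, w) image and 0/1 label rasters, keep crops with enough building pixels."""
    edges = edge_label(mask.astype(bool))
    samples = []
    h, w = mask.shape
    for y in range(0, h - window + 1, stride):
        for x in range(0, w - window + 1, stride):
            m = mask[y:y + window, x:x + window]
            if m.mean() > min_ratio:                      # building pixel ratio > 10%
                samples.append((image[:, y:y + window, x:x + window],
                                m, edges[y:y + window, x:x + window]))
    random.shuffle(samples)
    split = int(0.7 * len(samples))                       # 7:3 train/test split
    return samples[:split], samples[split:]
```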
S2, constructing a multitask CNN + GCN semantic segmentation model with ResNet and GCN as the framework, using the building sample set constructed in step S1. ResNet extracts high-level semantic features of buildings through a series of convolution and pooling operations, while the GCN performs fast graph reasoning on the high-resolution original image through conventional graph convolution operations to mine the potential spatial correlations among pixels.
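A minimal sketch of the ResNet feature extraction assumed in step S2 is given below; the choice of resnet50 and of the four tapped stages is an assumption for illustration rather than the exact configuration of the invention.

```python
# Minimal sketch of extracting multi-level building features with a torchvision ResNet (S2).
import torch
import torch.nn as nn
import torchvision

class ResNetFeatures(nn.Module):
    def __init__(self):
        super().__init__()
        net = torchvision.models.resnet50(weights=None)
        self.stem = nn.Sequential(net.conv1, net.bn1, net.relu, net.maxpool)
        self.stages = nn.ModuleList([net.layer1, net.layer2, net.layer3, net.layer4])

    def forward(self, x):
        feats = []
        x = self.stem(x)
        for stage in self.stages:      # strides 4, 8, 16, 32 relative to the input
            x = stage(x)
            feats.append(x)
        return feats                   # shallow-to-deep feature maps for later remapping

feats = ResNetFeatures()(torch.randn(1, 3, 400, 400))
```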
Referring to fig. 3, the mathematical expression of the fast graph reasoning process for the high resolution original image by using the GCN is:
H^(l+1) = σ(A·H^(l)·W^(l))
where A is the adjacency matrix encoding the spatial positions and potential mutual relationships of the pixels, H^(l) and H^(l+1) are the vertex features of the l-th and (l+1)-th layers, W^(l) is the learnable training weight of the l-th layer, and σ is a nonlinear activation function. When l = 0, H^(0) is the two-dimensional graph structure obtained by reshaping the input remote sensing image, with dimension (400 × 400) × 3, where 400 × 400 is the cropped image size, i.e. the window size of step S13, and 3 is the number of input image channels.
Based on the principle that any Laplacian matrix can be diagonalized, the adjacency matrix A is constructed quickly through convolution and matrix operations, with the mathematical expression:
A = φ(BP)·diag(ρ(BP))·φ^T(BP)
where BP is the building edge extraction probability map, φ(·) is a conventional convolution used for dimension change, ρ(·) is adaptive pooling, diag(·) is the diagonalization operation, and (·)^T is the matrix transpose. Substituting A and exploiting the associative law of matrix multiplication avoids storing the huge intermediate adjacency matrix A, which reduces GPU memory overhead and preserves network efficiency, realizing fast GCN reasoning on the high-resolution original image with the mathematical expression:
H^(l+1) = σ(φ(BP)·(diag(ρ(BP))·(φ^T(BP)·H^(l)))·W^(l))
where H^(l) and H^(l+1) are the vertex features of the l-th and (l+1)-th layers, W^(l) is the learnable training weight of the l-th layer, and σ is the nonlinear activation function. When l = 0, H^(0) is the two-dimensional graph structure obtained by reshaping the input high-resolution remote sensing image, with dimension (400 × 400) × 3.
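The adjacency-free reasoning above can be sketched as a single graph-convolution layer that evaluates the product from right to left, so the n × n adjacency matrix (n = 400 × 400) is never stored; the channel sizes and the way ρ(·) is assumed to reduce BP to a k-dimensional vector are illustrative choices, not taken from the invention.

```python
# Sketch of one "fast" GCN layer: A = phi(BP) diag(rho(BP)) phi(BP)^T is never materialised.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FastGraphConv(nn.Module):
    def __init__(self, in_ch=3, out_ch=32, k=16):
        super().__init__()
        self.k = k
        self.phi = nn.Conv2d(1, k, kernel_size=1)             # phi(.): dimension change on BP
        self.weight = nn.Linear(in_ch, out_ch, bias=False)    # learnable W^(l)

    def forward(self, image, bp):
        b, c, h, w = image.shape                               # graph vertices = the h*w pixels
        hl = image.flatten(2).transpose(1, 2)                  # H^(l): (b, n, c), n = h*w
        phi_bp = self.phi(bp).flatten(2).transpose(1, 2)       # phi(BP): (b, n, k)
        rho_bp = F.adaptive_avg_pool2d(bp, (self.k, 1)).flatten(1)  # rho(BP): (b, k), assumed pooling shape
        # evaluate A * H^(l) right to left so the n x n adjacency matrix is never built
        msg = torch.bmm(phi_bp.transpose(1, 2), hl)            # phi^T(BP) H^(l): (b, k, c)
        msg = msg * rho_bp.unsqueeze(-1)                       # diag(rho(BP)) applied row-wise
        agg = torch.bmm(phi_bp, msg)                           # phi(BP) (...): (b, n, c)
        out = torch.relu(self.weight(agg))                     # sigma(A H^(l) W^(l)): (b, n, out_ch)
        return out.transpose(1, 2).reshape(b, -1, h, w)        # GCN feature map GF

gf = FastGraphConv()(torch.randn(1, 3, 400, 400), torch.rand(1, 1, 400, 400))
```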
S3, remapping the resolution of the high-level building semantic features output in step S2 back to the original image through repeated upsampling, lateral connection and convolution operations, and performing building edge extraction and primary building semantic segmentation. When the resolution of the high-level semantic features is remapped back to the original image, a feature pyramid structure fuses the deep semantic features and the shallow texture features of buildings in the remote sensing image, and the idea of multi-task learning is adopted to extract building edges and segment building semantics simultaneously.
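One upsampling, lateral-connection and convolution merge of the kind used in step S3 might look like the sketch below; the channel counts are assumptions.

```python
# Minimal sketch of a feature-pyramid merge (S3): upsample the deeper feature, add a 1x1
# "lateral" projection of the shallower one, then smooth with a 3x3 convolution.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FPNMerge(nn.Module):
    def __init__(self, deep_ch, shallow_ch, out_ch=256):
        super().__init__()
        self.reduce = nn.Conv2d(deep_ch, out_ch, kernel_size=1)
        self.lateral = nn.Conv2d(shallow_ch, out_ch, kernel_size=1)
        self.smooth = nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, deep, shallow):
        up = F.interpolate(self.reduce(deep), size=shallow.shape[-2:],
                           mode='bilinear', align_corners=False)   # upsampling
        return self.smooth(up + self.lateral(shallow))             # lateral connection + convolution

# Repeatedly merging deep -> shallow stage outputs remaps the deep semantics toward 400 x 400.
fused = FPNMerge(deep_ch=2048, shallow_ch=1024)(torch.randn(1, 2048, 13, 13),
                                                torch.randn(1, 1024, 25, 25))
```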
And S4, performing feature integration on the building edge extraction probability map output in step S3 and the high-level building semantic features output in step S2, and performing edge-constrained building semantic segmentation.
In this embodiment, the CNN deep feature map is denoted CF and the building edge probability map BP. The edge information is used to integrate and constrain the deep semantic features in the form CF × (1 + BP), after which a fully convolutional network completes the edge-constrained building semantic segmentation; the constraint process is supervised by the building semantic labels.
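A possible sketch of the CF × (1 + BP) constraint followed by a small fully convolutional head is shown below; the channel counts and head depth are assumptions.

```python
# Sketch of the edge-constrained segmentation (S4): responses of CF near predicted building
# edges are amplified by (1 + BP), then a small fully convolutional head predicts Seg_2.
import torch
import torch.nn as nn

class EdgeConstrainedHead(nn.Module):
    def __init__(self, in_ch=256):
        super().__init__()
        self.head = nn.Sequential(
            nn.Conv2d(in_ch, 64, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 1, kernel_size=1))                  # per-pixel building logit

    def forward(self, cf, bp):
        constrained = cf * (1.0 + bp)                         # CF x (1 + BP) edge constraint
        return self.head(constrained)                         # Seg_2 (logits)

seg2_logits = EdgeConstrainedHead()(torch.randn(1, 256, 128, 128), torch.rand(1, 1, 128, 128))
```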
And S5, adopting a graph feature adaptive optimization module so that the GCN reasoning features output in step S2 optimize the edge-constrained building semantic segmentation result output in step S4, and outputting a building semantic segmentation result with excellent edge performance. The graph feature adaptive optimization module takes the concatenation of the edge-constrained building semantic segmentation result and the GCN features as input, generates an attention map rich in local spatial detail through conventional convolution operations, and uses the GCN features, weighted by the attention map, to optimize the edge-constrained segmentation result, with the mathematical expression:
Seg_3 = δ(Seg_2 + δ(Concat(GF, Seg_2)) * GF)
where Seg_3 is the optimized building semantic segmentation result, Seg_2 is the edge-constrained building semantic segmentation result, GF is the GCN feature, Concat(·) is feature concatenation, and δ(·) is a conventional convolution operation, as shown in FIG. 4.
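Under the formula above, the graph feature adaptive optimization module could be sketched as follows; realizing δ(·) as plain 3 × 3 convolutions and adding a 1 × 1 convolution to match channel counts are assumptions made so the sketch type-checks.

```python
# Sketch of the graph feature adaptive optimization module (S5, Fig. 4):
# Seg_3 = delta(Seg_2 + delta(Concat(GF, Seg_2)) * GF)
import torch
import torch.nn as nn

class GraphFeatureRefine(nn.Module):
    def __init__(self, gf_ch=32, seg_ch=1):
        super().__init__()
        self.attn = nn.Conv2d(gf_ch + seg_ch, gf_ch, kernel_size=3, padding=1)  # inner delta(.)
        self.fuse = nn.Conv2d(gf_ch, seg_ch, kernel_size=1)                     # channel matching (assumption)
        self.out = nn.Conv2d(seg_ch, seg_ch, kernel_size=3, padding=1)          # outer delta(.)

    def forward(self, seg2, gf):
        attention = torch.sigmoid(self.attn(torch.cat([gf, seg2], dim=1)))  # attention map with local spatial detail
        refined = seg2 + self.fuse(attention * gf)                           # inject GCN detail into Seg_2
        return self.out(refined)                                             # Seg_3

seg3 = GraphFeatureRefine()(torch.randn(1, 1, 128, 128), torch.randn(1, 32, 128, 128))
```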
S6, the multitask CNN + GCN semantic segmentation model simultaneously supervises the three building semantic segmentation results output by steps S3, S4 and S5 and the primary edge extraction result, and is trained with backward propagation and stochastic gradient descent to obtain the trained multitask CNN + GCN semantic segmentation model; inputting the remote sensing image of an actual building into the trained model yields a better-optimized actual building edge semantic segmentation result.
In this embodiment, according to the model outputs, a loss function is used to calculate the loss value Loss of the multitask CNN + GCN semantic segmentation model, with the specific formula:
Loss = Σ_{i=1}^{3} L_seg^i + L_edge
where L_seg^i is the loss measuring the difference between the i-th semantic segmentation result and the ground-truth surface class, with the specific formula:
L_seg^i = -(1/N) Σ_{n=1}^{N} [seg*_{n,i}·log(seg_{n,i}) + (1 - seg*_{n,i})·log(1 - seg_{n,i})]
where n is the sample index and N the number of samples, seg_{n,i} is the predicted probability that a pixel in the i-th semantic segmentation result belongs to a building, seg_{n,1} denotes the primary building semantic segmentation result, seg_{n,2} the edge-constrained building semantic segmentation result, and seg_{n,3} the GCN edge-optimized building semantic segmentation result, and seg*_{n,i} is the corresponding true value, equal to 1 where a building is present and 0 otherwise.
L_edge is the loss measuring the difference between the building edge detection result and the true building edge, with the specific formula:
L_edge = -(1/N) Σ_{n=1}^{N} [BP*_n·log(BP_n) + (1 - BP*_n)·log(1 - BP_n)]
where n is the sample index, BP_n is the predicted probability that a pixel in the edge detection is a building edge, and BP*_n is the corresponding true value, equal to 1 for building-edge pixels and 0 otherwise.
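Assuming the cross-entropy form of the loss terms above and equal weighting of the four terms (the weighting is an assumption), the total loss could be computed as follows.

```python
# Sketch of the multitask loss (S6): three segmentation terms L_seg^1..L_seg^3 plus L_edge.
import torch
import torch.nn.functional as F

def multitask_loss(seg_logits, bp_logits, seg_gt, edge_gt):
    """seg_logits: list of the three segmentation outputs Seg_1..Seg_3 (logits, shape (b, 1, h, w))."""
    loss = sum(F.binary_cross_entropy_with_logits(s, seg_gt) for s in seg_logits)  # L_seg terms
    loss = loss + F.binary_cross_entropy_with_logits(bp_logits, edge_gt)           # L_edge term
    return loss

loss = multitask_loss([torch.randn(2, 1, 64, 64) for _ in range(3)],
                      torch.randn(2, 1, 64, 64),
                      torch.randint(0, 2, (2, 1, 64, 64)).float(),
                      torch.randint(0, 2, (2, 1, 64, 64)).float())
```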
A gradient descent algorithm is applied to the loss function; training of the multitask CNN + GCN semantic segmentation model is complete when the loss value Loss approaches convergence, after which the model is applied in the actual execution step. During training, hyper-parameters such as the positive/negative sample ratio, learning rate, batch size and weight decay coefficient are tuned according to the test results.
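An illustrative single training update with backward propagation and stochastic gradient descent is sketched below; the stand-in model and the concrete hyper-parameter values are assumptions to be tuned as described above.

```python
# Illustrative one-step training update (S6): backward propagation + stochastic gradient descent.
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Conv2d(3, 2, kernel_size=3, padding=1)   # stand-in producing (segmentation logit, edge logit)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=1e-4)

images = torch.randn(2, 3, 400, 400)                 # a mini-batch from the training split
seg_gt = torch.randint(0, 2, (2, 1, 400, 400)).float()
edge_gt = torch.randint(0, 2, (2, 1, 400, 400)).float()

out = model(images)
loss = (F.binary_cross_entropy_with_logits(out[:, :1], seg_gt)
        + F.binary_cross_entropy_with_logits(out[:, 1:], edge_gt))
optimizer.zero_grad()
loss.backward()                                      # backward propagation
optimizer.step()                                     # stochastic gradient descent update
```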
The invention has the beneficial effects that:
1. The construction of the adjacency matrix is emulated through matrix diagonalization, and the associative law of matrix multiplication removes the need to store the adjacency matrix, so the GCN can act directly on the high-resolution spatial feature map of the CNN. The GCN is then used to explore the potential spatial relationships among pixels and to perceive their precise spatial positions accurately, which is more feasible than acquiring accurate building edge information with a CNN alone;
2. The end-to-end trainable and predictable multitask CNN + GCN semantic segmentation model uses GCN features rich in precise position information to optimize the CNN semantic segmentation result through the graph feature adaptive optimization module, further improving the accuracy and application value of automatic building extraction results.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (10)

1. A remote sensing image building semantic segmentation edge optimization method based on multitask CNN + GCN, characterized by comprising the following steps:
s1, constructing a remote sensing image building sample set;
s2, constructing a multitask CNN + GCN semantic segmentation model by using the ResNet and the GCN as frameworks according to the building sample set;
s3, remapping the resolution of the high-level semantic features of the building extracted by ResNet in the step S2 to the original image, and performing building edge extraction and primary building semantic segmentation to respectively obtain a building edge extraction probability map and a primary building semantic segmentation result;
s4, performing feature integration on the building edge extraction probability graph output in the step S3 and the high-level semantic features of the building output in the step S2, and performing semantic segmentation on the building features based on edge constraint to obtain a building semantic segmentation result based on edge constraint;
s5, adopting a graph feature adaptive optimization module so that the GCN reasoning features optimize the edge-constrained building semantic segmentation result output in step S4, and outputting a building semantic segmentation result with excellent edge performance;
and S6, training the multitask CNN + GCN semantic segmentation model on the three building semantic segmentation results output by steps S3, S4 and S5 and the building edge extraction probability map using backward propagation and stochastic gradient descent, thereby obtaining a trained multitask CNN + GCN semantic segmentation model, and inputting the remote sensing image of an actual building into the trained multitask CNN + GCN semantic segmentation model to obtain a better-optimized actual building edge semantic segmentation result.
2. The method of claim 1, wherein the method comprises the following steps: in step S1, the specific steps of constructing the remote sensing image building sample set are as follows:
s11, converting the original building vector grid into a binary building semantic label image, and adjusting the remote sensing image range to be the same as the binary image;
s12, extracting the outline of the building binary image label to obtain a building edge binary image label;
s13, sliding a window with a preset size and a preset step length on the remote sensing image, the building semantic label image and the building edge label image simultaneously, counting pixel points in the window, marking the proportion of the building by the number of the pixel points, and performing image cutting if the proportion is larger than a preset proportion value;
and S14, after the cutting work is finished, dividing the sample set into a training set and a testing set according to the proportion of 7: 3, wherein 70% of data is used for training, and 30% of data is used for testing the performance after training.
3. The method of claim 1, wherein the method comprises the following steps: the high-level semantic features of the building extracted by ResNet are realized by a series of convolution and pooling operations, and the remapping of the resolution to the original image is realized by a plurality of upsampling, transverse connection and convolution operations.
4. The method of claim 1, wherein the method comprises the following steps: in step S4, the building semantic segmentation based on edge constraint is obtained by building semantic tag supervised learning, where a CNN deep feature map is set as CF, a building edge extraction probability map is set as BP, and deep semantic features are integrated and constrained by edge information in a CF x (1+ BP) manner, and then a full convolution network is used to complete building semantic segmentation based on edge constraint.
5. The method of claim 1, wherein the method comprises the following steps: in step S2, the GCN is used to perform graph reasoning on the original image of the building through a conventional graph convolution operation, with the mathematical expression:
H^(l+1) = σ(A·H^(l)·W^(l))
where A is the adjacency matrix encoding the spatial positions and potential mutual relationships of the pixels, H^(l) and H^(l+1) are the vertex features of the l-th and (l+1)-th layers, W^(l) is the learnable training weight of the l-th layer, and σ is a nonlinear activation function; when l = 0, H^(0) is the two-dimensional graph structure obtained by reshaping the input remote sensing image, with dimension (m × m) × n, where m × m is the preset window size, n is the number of input image channels, and m and n are positive integers.
6. The method of claim 4, wherein the method comprises the following steps: based on the principle that any Laplacian matrix can be diagonalized, the adjacency matrix A is constructed quickly through convolution and matrix operations, with the mathematical expression:
A = φ(BP)·diag(ρ(BP))·φ^T(BP)
where BP is the building edge extraction probability map, φ(·) is a conventional convolution used for dimension change, ρ(·) is adaptive pooling, diag(·) is the diagonalization operation, and (·)^T is the matrix transpose; the associative law of matrix multiplication avoids storing the huge intermediate adjacency matrix A, which reduces GPU memory overhead and preserves network efficiency, realizing fast GCN reasoning on the high-resolution original image with the mathematical expression:
H^(l+1) = σ(φ(BP)·(diag(ρ(BP))·(φ^T(BP)·H^(l)))·W^(l))
7. The method of claim 1, wherein the method comprises the following steps: in step S5, the graph feature adaptive optimization module optimizes the CNN building semantic segmentation result by enhancing spatial detail features; it takes the concatenation of the edge-constrained building semantic segmentation result and the GCN features as input, generates an attention map rich in local spatial detail through conventional convolution operations, and uses the GCN features, weighted by the attention map, to optimize the edge-constrained segmentation result, with the mathematical expression:
Seg_3 = δ(Seg_2 + δ(Concat(GF, Seg_2)) * GF)
where Seg_3 is the optimized building semantic segmentation result, Seg_2 is the edge-constrained building semantic segmentation result, GF is the GCN feature, δ(·) is a conventional convolution operation, and Concat(·) is feature concatenation.
8. The method of claim 1, wherein the method comprises the following steps: in step S6, during the training process a loss function is used to calculate the loss value Loss of the multitask CNN + GCN semantic segmentation model, with the specific formula:
Loss = Σ_{i=1}^{3} L_seg^i + L_edge
where L_seg^i is the loss measuring the difference between the i-th semantic segmentation result and the ground-truth surface class, with the specific formula:
L_seg^i = -(1/N) Σ_{n=1}^{N} [seg*_{n,i}·log(seg_{n,i}) + (1 - seg*_{n,i})·log(1 - seg_{n,i})]
where n is the sample index and N the number of samples, seg_{n,i} is the predicted probability that a pixel in the i-th semantic segmentation result belongs to a building, seg_{n,1} denotes the primary building semantic segmentation result, seg_{n,2} the edge-constrained building semantic segmentation result, and seg_{n,3} the GCN edge-optimized building semantic segmentation result, and seg*_{n,i} is the corresponding true value, equal to 1 where a building is present and 0 otherwise;
L_edge is the loss measuring the difference between the building edge detection result and the true building edge, with the specific formula:
L_edge = -(1/N) Σ_{n=1}^{N} [BP*_n·log(BP_n) + (1 - BP*_n)·log(1 - BP_n)]
where n is the sample index, BP_n is the predicted probability that a pixel in the edge detection is a building edge, and BP*_n is the corresponding true value, equal to 1 for building-edge pixels and 0 otherwise.
9. The method of claim 8, wherein the method for optimizing the semantic segmentation edge of the remote sensing image building based on the multitask CNN + GCN comprises the following steps: and applying a gradient descent algorithm to the Loss function, and finishing training the multitask CNN + GCN semantic segmentation model when the Loss value Loss approaches convergence.
10. The method of claim 1, wherein the method comprises the following steps: in step S6, in the training process, the hyper-parameters including the positive and negative sample ratio, the learning rate, the batch size, and the weight attenuation coefficient need to be adjusted according to the test condition.
CN202110725267.7A 2021-06-29 2021-06-29 Remote sensing image building semantic segmentation edge optimization method based on multitask CNN + GCN Active CN113449640B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110725267.7A CN113449640B (en) 2021-06-29 2021-06-29 Remote sensing image building semantic segmentation edge optimization method based on multitask CNN + GCN

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110725267.7A CN113449640B (en) 2021-06-29 2021-06-29 Remote sensing image building semantic segmentation edge optimization method based on multitask CNN + GCN

Publications (2)

Publication Number Publication Date
CN113449640A CN113449640A (en) 2021-09-28
CN113449640B true CN113449640B (en) 2022-02-11

Family

ID=77813823

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110725267.7A Active CN113449640B (en) 2021-06-29 2021-06-29 Remote sensing image building semantic segmentation edge optimization method based on multitask CNN + GCN

Country Status (1)

Country Link
CN (1) CN113449640B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114821315A (en) * 2022-04-24 2022-07-29 福州大学 Remote sensing image cultivated land plot extraction method combining edge detection and multitask learning
CN116052006B (en) * 2023-03-29 2023-06-16 山东建筑大学 Building edge optimization method based on multitask learning and dual lottery hypothesis
CN116052018B (en) * 2023-03-31 2023-10-27 北京数慧时空信息技术有限公司 Remote sensing image interpretation method based on life learning
CN116434009B (en) * 2023-04-19 2023-10-24 应急管理部国家减灾中心(应急管理部卫星减灾应用中心) Construction method and system for deep learning sample set of damaged building
CN116977750B (en) * 2023-09-25 2023-12-12 中国地质大学(武汉) Construction method and classification method of land covering scene classification model

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109903314A (en) * 2019-03-13 2019-06-18 腾讯科技(深圳)有限公司 A kind of method, the method for model training and the relevant apparatus of image-region positioning
CN110889449A (en) * 2019-11-27 2020-03-17 中国人民解放军国防科技大学 Edge-enhanced multi-scale remote sensing image building semantic feature extraction method
CN111461258A (en) * 2020-04-26 2020-07-28 武汉大学 Remote sensing image scene classification method of coupling convolution neural network and graph convolution network
US10984245B1 (en) * 2018-06-11 2021-04-20 Facebook, Inc. Convolutional neural network based on groupwise convolution for efficient video analysis
CN112712127A (en) * 2021-01-07 2021-04-27 北京工业大学 Image emotion polarity classification method combined with graph convolution neural network

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10984245B1 (en) * 2018-06-11 2021-04-20 Facebook, Inc. Convolutional neural network based on groupwise convolution for efficient video analysis
CN109903314A (en) * 2019-03-13 2019-06-18 腾讯科技(深圳)有限公司 A kind of method, the method for model training and the relevant apparatus of image-region positioning
CN110889449A (en) * 2019-11-27 2020-03-17 中国人民解放军国防科技大学 Edge-enhanced multi-scale remote sensing image building semantic feature extraction method
CN111461258A (en) * 2020-04-26 2020-07-28 武汉大学 Remote sensing image scene classification method of coupling convolution neural network and graph convolution network
CN112712127A (en) * 2021-01-07 2021-04-27 北京工业大学 Image emotion polarity classification method combined with graph convolution neural network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
A CNN-GCN FRAMEWORK FOR MULTI-LABEL AERIAL IMAGE SCENE CLASSIFICATION; Yansheng Li et al.; IGARSS 2020; 2021-02-17; pp. 1353-1356 *
融合图卷积和差异性池化函数的点云数据分类分割模型 (Point cloud data classification and segmentation model fusing graph convolution and difference pooling functions); 张新良 et al.; 《中国图象图形学报》 (Journal of Image and Graphics); Dec. 2020; Vol. 25, No. 6; pp. 1201-1208 *

Also Published As

Publication number Publication date
CN113449640A (en) 2021-09-28


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant