CN113436204A - High-resolution remote sensing image weakly supervised building extraction method - Google Patents
- Publication number: CN113436204A
- Application number: CN202110651041.7A
- Authority: CN (China)
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06T7/11—Region-based segmentation
- G06T7/194—Segmentation; Edge detection involving foreground-background segmentation
- G06N3/045—Combinations of networks
- G06N3/08—Learning methods
- G06T2207/10032—Satellite or aerial image; Remote sensing
- G06T2207/20081—Training; Learning
Abstract
The invention provides a weakly supervised building extraction method for high-resolution remote sensing images, comprising the following steps: iterative optimization of class activation maps based on an iterative adversarial climbing strategy, mining of inter-pixel relations in class activation maps, class boundary detection, building pseudo label generation, semantic segmentation network training, and building region extraction. Compared with existing image-level weakly supervised semantic segmentation methods, the method generates better building class activation maps by using the iterative adversarial climbing strategy, fully mines class activation map information without introducing additional supervision information, and obtains the class equivalence relations between pixels. In addition, the invention uses gated convolution layers to further improve the network's segmentation performance at boundaries, so that the object region can be effectively expanded to cover the area within the boundary, generating high-quality building pseudo labels and enabling the semantic segmentation model to produce building regions with high accuracy and complete segmentation boundaries.
Description
Technical Field
The invention relates to the field of remote sensing, and in particular to a weakly supervised building extraction method for high-resolution remote sensing images.
Background
With the development of satellite remote sensing and aerial photography technologies, various high-resolution images can be obtained faster and more cheaply. Remote sensing images collected from aerial and satellite platforms are widely used in applications such as land use mapping, urban resource management and disaster monitoring. Geographic Object-Based Image Analysis (GEOBIA) is a mainstream method for extracting buildings from high-resolution remote sensing images, but determining an optimal image segmentation scale is difficult, and feature extraction often requires strong domain expertise. Semantic segmentation of high-resolution remote sensing images aims to assign a geographic label to each pixel through an end-to-end mechanism and, driven by deep convolutional neural networks, has been widely used in geographic applications such as cloud detection, automatic building extraction, land cover mapping and urban target localization.
Supervised by rich pixel-level label datasets, deep convolutional networks can exploit the spatial context in images to extract multi-level features under different receptive fields, greatly advancing the semantic segmentation of remote sensing images. However, acquiring large numbers of pixel-level labels is time-consuming, labor-intensive and costly. Weakly supervised semantic segmentation offers a new way to overcome the labeling difficulty for remote sensing images.
Common weak labels include image-level labels, point labels, scribbles, bounding boxes, etc.; image-level labels are widely used because of their low labeling cost. Segmentation models with image-level labels as weak supervision are currently trained mainly with a two-stage strategy. In the first stage, a classification network generates pseudo masks from the image-level labels, typically by means of Class Activation Maps (CAMs). CAMs attend to the most discriminative object parts in an image and can thus roughly localize object regions, but they cannot delineate the whole object with precise boundaries; the resulting pseudo masks are of low quality and the final segmentation is unsatisfactory. Fully exploiting the inter-pixel relations in CAMs is therefore crucial for the subsequent stage. In the second stage, the pseudo masks generated in the first stage are used to train the segmentation network, further improving segmentation performance. In existing work, some methods post-process the segmentation result with conditional random fields or superpixels, but the boundary optimization is inconsistent and the inference process is complex. Moreover, current deep-convolutional-network-based image segmentation methods generally process the color, shape and texture information of a remote sensing image jointly, and this mixture of different kinds of information leads to problems such as blurred building segmentation boundaries.
Disclosure of Invention
In order to solve the technical problems of traditional building extraction methods, namely that labels are difficult to obtain, class activation maps cover objects incompletely and segmentation boundaries are imperfect, the invention provides a weakly supervised building extraction method for high-resolution remote sensing images that focuses on mining the class equivalence relations between pixels and optimizing boundaries.
To this end, the invention provides a weakly supervised building extraction method for high-resolution remote sensing images, comprising the following steps:
S1, inputting the high-resolution remote sensing image and the corresponding label map into a classification network for training to obtain a trained classification network, and performing iterative adversarial climbing on the high-resolution remote sensing image based on the trained classification network to obtain a processed remote sensing image;
s2, inputting the processed remote sensing image into the trained classification network to obtain an iterated class activation map, and normalizing the iterated class activation map to generate an optimized class activation map;
S3, obtaining the class equivalence relations between the pixels in the optimized class activation map, and dividing all pixel pairs in the optimized class activation map into a positive set P+ and a negative set P- according to these relations;
S4, building a class boundary network based on gated convolution layers, and outputting a first class boundary map through the class boundary network;
S5, computing a first semantic correlation matrix from the first class boundary map, computing the class boundary loss from the positive set P+, the negative set P- and the first semantic correlation matrix, and training the class boundary network with this loss to obtain a trained class boundary network;
S6, outputting a second class boundary map through the trained class boundary network, computing a second semantic correlation matrix from it, computing a transition probability matrix from the second semantic correlation matrix, and performing random walk propagation on the optimized class activation map according to the transition probability matrix to obtain a propagated class activation map;
and S7, generating a building pseudo label according to the propagated class activation map, inputting the building pseudo label and the high-resolution remote sensing image into a semantic segmentation model for training, and extracting a building area after training.
Preferably, in step S1, the iterative adversarial climbing of the high-resolution remote sensing image is computed as:

x_t = x_{t-1} + ξ · ∇_{x_{t-1}} s(x_{t-1})   1)

where t (1 ≤ t ≤ T) denotes the adversarial iteration index, x_t is the remote sensing image obtained after the t-th adversarial step, s(x_{t-1}) is the class score assigned by the trained classification network to the input image x_{t-1}, ξ controls the strength of the adversarial perturbation, and ∇ denotes taking the gradient.
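The iterative adversarial climbing of step S1 can be sketched in NumPy as below. This is a minimal illustration, not the patent's implementation: `score_grad` is a hypothetical stand-in for backpropagating the class score through the trained classification network, and the linear "classifier" in the example exists only to make the gradient trivial.

```python
import numpy as np

def adversarial_climb(x0, score_grad, xi=0.008, T=4):
    """Sketch of equation 1): x_t = x_{t-1} + xi * grad_x s(x_{t-1}).
    `score_grad` stands in for autograd through the trained classifier."""
    xs = [x0]
    x = x0
    for _ in range(T):
        x = x + xi * score_grad(x)  # ascend (not descend) the class score
        xs.append(x)
    return xs  # x_0 .. x_T, all later fed into the CAM computation

# Toy example: a linear "classifier" score s(x) = <w, x>, so the gradient is w.
w = np.ones((3, 8, 8))
images = adversarial_climb(np.zeros((3, 8, 8)), lambda x: w, xi=0.5, T=2)
```

Because the score is climbed rather than descended, each iteration perturbs the image toward a higher building-class score, which is what lets later CAMs activate on less discriminative building parts.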
Preferably, step S2 specifically includes:
S21, inputting the processed remote sensing image into the trained classification network, and computing the iterated class activation map CAM(x_t):

CAM(x_t) = w_c^T f(x_t)   2)

where t (1 ≤ t ≤ T) denotes the adversarial iteration index, x_t is the remote sensing image obtained after the t-th adversarial step, w_c represents the weights of the fully connected layer corresponding to the building class, and f(x_t) is the feature map before the global average pooling layer;
S22, normalizing the iterated class activation maps CAM(x_t) to obtain the optimized class activation map M:

M = Σ_{t=0}^{T} CAM(x_t) / max( Σ_{t=0}^{T} CAM(x_t) )   3)

where T is the total number of adversarial iterations, max takes the maximum value over all pixels, and Σ_{t=0}^{T} sums the class activation maps from iteration 0 to iteration T.
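Steps S21 and S22 can be sketched as follows; a minimal NumPy version under the assumption that the per-iteration feature maps and the building-class weights are already available (the toy shapes and random values are illustrative only).

```python
import numpy as np

def cam(features, w_building):
    """Equation 2): class activation map as the weighted sum over channels
    of the feature map before global average pooling (C x H x W)."""
    return np.tensordot(w_building, features, axes=1)  # -> H x W

def aggregate_cams(cams):
    """Equation 3): sum the CAMs of iterations 0..T and divide by the
    maximum, so activation scores fall in [0, 1]."""
    total = np.sum(cams, axis=0)
    return total / total.max()

# Hypothetical feature maps for two adversarial steps (T = 1).
np.random.seed(0)
feats = [np.random.rand(4, 8, 8) for _ in range(2)]
w = np.random.rand(4)
M = aggregate_cams([cam(f, w) for f in feats])
```

Summing over all adversarial iterations is what accumulates the extra object regions discovered by the climbing; the division by the maximum makes the later foreground/background thresholds comparable across images.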
Preferably, step S3 specifically includes:
s31, comparing the activation scores in the optimized class activation image with foreground and background threshold values to obtain a building confidence area and a background confidence area;
S32, for each pixel in the building confidence region and the background confidence region, selecting the category corresponding to the maximum classification score of that pixel, obtaining a pseudo category map ĉ over all pixels;
S33, performing local neighborhood sampling on the pseudo category map ĉ, and comparing each pixel in a neighborhood P with the neighborhood center pixel for class equivalence: if the predicted class of the neighborhood center pixel is the same as that of a pixel in the neighborhood, the pixel pair is class-equivalent, otherwise it is non-equivalent; the class equivalence relations between pixels are obtained from these comparisons;
S34, assigning class-equivalent pixel pairs to the positive set and non-equivalent pixel pairs to the negative set, so that all pixel pairs are divided into a positive set P+ and a negative set P-:

P+ = { (i, j) | ĉ_i = ĉ_j, ||x_i − x_j||_2 < γ }
P- = { (i, j) | ĉ_i ≠ ĉ_j, ||x_i − x_j||_2 < γ }   4)

where γ is the neighborhood radius limiting the maximum distance between pixel pairs, (i, j) are the indices of pixels i and j, ||x_i − x_j||_2 is the distance between pixels x_i and x_j, ĉ is the pseudo category map, ĉ_i = ĉ_j means the pseudo class of pixel i coincides with that of pixel j, and ĉ_i ≠ ĉ_j means they do not coincide.
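The pair division of steps S33-S34 can be sketched as below. This brute-force NumPy version is for illustration only (a practical implementation would sample neighborhoods rather than enumerate all pairs); the radius γ = 1.5 is an assumed value.

```python
import numpy as np

def split_pairs(pseudo, gamma=1.5):
    """Equation 4): divide pixel pairs within radius gamma into the
    positive set P+ (equal pseudo classes) and the negative set P-."""
    H, W = pseudo.shape
    coords = [(i, j) for i in range(H) for j in range(W)]
    P_pos, P_neg = [], []
    for a in range(len(coords)):
        for b in range(a + 1, len(coords)):
            (yi, xi), (yj, xj) = coords[a], coords[b]
            if np.hypot(yi - yj, xi - xj) < gamma:  # ||x_i - x_j||_2 < gamma
                pair = (coords[a], coords[b])
                if pseudo[yi, xi] == pseudo[yj, xj]:
                    P_pos.append(pair)              # class-equivalent
                else:
                    P_neg.append(pair)              # non-equivalent
    return P_pos, P_neg

# 2x2 pseudo category map: left column building (1), right column background (0).
pos, neg = split_pairs(np.array([[1, 0], [1, 0]]))
```

With γ = 1.5 both 4-neighbors and diagonal neighbors qualify, so this tiny map yields two equivalent pairs (within each column) and four non-equivalent pairs (across the columns).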
Preferably, step S4 specifically includes:
S41, constructing the feature extraction stream: it has the same structure as the ResNet50 network and comprises 5 feature extraction layers; each convolution layer in the feature extraction layers is followed by normalization and a ReLU activation function. Let r_t denote the feature map output by the t-th feature extraction layer:

r_t ∈ R^{C × (H/m) × (W/m)}   5)

where R^{C × (H/m) × (W/m)} is the dimension of the feature map, C is the number of channels, H and W are the height and width of the feature map, and m is the stride;
S42, constructing the gated convolution layer: first, the feature map r_t is used to generate an attention map α_t with a single channel:

α_t = σ(C_{1×1}(s_t || r_t))   6)

where s_t and r_t respectively denote the feature map output by the t-th gated convolution layer and the feature map output by the t-th feature extraction layer, || denotes channel-wise concatenation, and the concatenated feature map is passed through a convolution layer C_{1×1} with kernel size 1 × 1 and the sigmoid activation function σ;
S43, taking the dot product of the attention map and the gated convolution layer's feature map s_t, and adding the result to s_t through a residual connection to obtain the output of the gated convolution layer:

ŝ_t^(i,j) = ( (s_t^(i,j) ⊙ α_t^(i,j)) + s_t^(i,j) )^T w_t   7)

where T denotes transposition, ⊙ denotes the gating (dot product) operation, ŝ_t is the output of the gated convolution layer, w_t is a channel weighting, the superscript (i, j) denotes the element of the corresponding feature map at row i and column j, and ŝ_t in turn serves as the input for the next attention map generation;
S44, constructing the shape stream to obtain the first class boundary map B: the input of the shape stream branch is the feature map r_1 output by the first layer in step S41; three gated convolution layers are then cascaded, the output of each gated convolution layer serving as the input of the next; the three gated convolution layers are connected to the third, fourth and last layers of ResNet50, and finally the first class boundary map B is output.
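The gating of steps S42-S43 can be sketched as follows; a minimal NumPy version in which all weights are hypothetical stand-ins for learned parameters, and the channel weighting is a simplification of the transposed weighting in equation 7).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_conv_layer(s_t, r_t, w_1x1, w_t):
    """Sketch of equations 6)-7).
    s_t : shape-stream feature map (C, H, W)
    r_t : feature-extraction-stream feature map (C, H, W)
    w_1x1 : 1x1-conv weights mapping the 2C concatenated channels -> 1 channel
    w_t : per-channel weighting applied after the residual connection
    """
    cat = np.concatenate([s_t, r_t], axis=0)           # channel concat  s_t || r_t
    alpha = sigmoid(np.tensordot(w_1x1, cat, axes=1))  # 6) attention map, H x W
    gated = s_t * alpha + s_t                          # 7) gating + residual
    return w_t[:, None, None] * gated                  # channel-weighted output

np.random.seed(0)
C, H, W = 2, 4, 4
out = gated_conv_layer(np.random.rand(C, H, W), np.random.rand(C, H, W),
                       np.random.rand(2 * C), np.random.rand(C))
```

The attention map α_t acts as a soft gate: where the texture stream r_t agrees with the shape stream s_t about a boundary, α_t approaches 1 and boundary evidence is amplified; elsewhere the residual path keeps s_t unchanged.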
Preferably, step S5 specifically includes:
S51, computing the semantic relevance a_ij between two pixels from the first class boundary map; for a given pair of pixels x_i and x_j:

a_ij = 1 − max_{k∈Π_ij} B_k   8)

where Π_ij is the set of pixels on the straight line between x_i and x_j, x_k is a pixel in that set and B_k its boundary score; the larger a_ij, the higher the semantic relevance between x_i and x_j and the lower the probability of a boundary between them; the smaller a_ij, the lower the semantic relevance and the higher the probability of a boundary between them;
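The line-based relevance of step S51 can be sketched as below, assuming a dense boundary-score map and sampling the line Π_ij by nearest-pixel interpolation (the sampling density is an assumption of this sketch).

```python
import numpy as np

def affinity(boundary, pi, pj, n_samples=20):
    """Equation 8): a_ij = 1 - max boundary score along the straight line
    between pixels i and j, sampled at n_samples nearest pixels."""
    ys = np.linspace(pi[0], pj[0], n_samples).round().astype(int)
    xs = np.linspace(pi[1], pj[1], n_samples).round().astype(int)
    return 1.0 - boundary[ys, xs].max()

B = np.zeros((5, 5))
B[:, 2] = 0.9                             # a strong vertical class boundary in column 2
same_side = affinity(B, (0, 0), (4, 1))   # no boundary crossed -> high relevance
across = affinity(B, (2, 0), (2, 4))      # crosses the boundary -> low relevance
```

The two calls illustrate the intuition stated above: a pair on the same side of the boundary keeps relevance 1.0, while a pair straddling it drops to 1 − 0.9 = 0.1.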
S52, using the inter-pixel class equivalence relations as binary labels for learning the pixel semantic relevance a_ij: if the classes of a pixel pair are equivalent the label is 1, otherwise 0; the semantic relevance is then learned by minimizing the cross-entropy between the binary labels and the semantic relevance values, giving the class boundary loss L:

L = − (1 / 2|P+_fg|) Σ_{(i,j)∈P+_fg} log a_ij − (1 / 2|P+_bg|) Σ_{(i,j)∈P+_bg} log a_ij − (1 / |P-|) Σ_{(i,j)∈P-} log(1 − a_ij)   9)

where |·| is the number of pixel pairs in the current set, Σ denotes summation, log a_ij is the logarithm of a_ij, and P+_fg and P+_bg are the subsets of P+ whose pixel pairs have foreground and background pseudo classes, respectively; the losses of the three subsets are normalized separately and then combined, because the numbers of pixel pairs contained in P+ and P- are unbalanced;
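The balanced loss of step S52 can be sketched as follows; a NumPy version taking the three sets as arrays of predicted relevance values, with an assumed epsilon for numerical stability.

```python
import numpy as np

def class_boundary_loss(a_fg, a_bg, a_neg, eps=1e-7):
    """Equation 9): balanced cross-entropy over the foreground-positive,
    background-positive and negative pair sets (arrays of a_ij values)."""
    l_fg = -np.log(a_fg + eps).mean() / 2.0    # positive pairs: push a_ij -> 1
    l_bg = -np.log(a_bg + eps).mean() / 2.0
    l_neg = -np.log(1.0 - a_neg + eps).mean()  # negative pairs: push a_ij -> 0
    return l_fg + l_bg + l_neg

loss = class_boundary_loss(np.array([0.9, 0.8]),   # P+_fg
                           np.array([0.7]),        # P+_bg
                           np.array([0.2, 0.1]))   # P-
```

Normalizing each subset by its own size before summing is what keeps the much larger positive set from drowning out the negative pairs during back-propagation (step S53).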
and S53, using the class boundary loss for class boundary network back propagation, updating the weight, and training the class boundary network until convergence to obtain the trained class boundary network.
Preferably, step S6 specifically includes:
S61, generating a second class boundary map by prediction with the trained class boundary network, computing semantic relevance to obtain a second semantic correlation matrix A, and computing the transition probability matrix from A:

T = S^{−1} A^{∘β},  with S = diag(Σ_j a_ij^β)   10)

where A is the second semantic correlation matrix with entries a_ij, A^{∘β} is the element-wise (Hadamard) power of A with exponent β, S is the diagonal matrix that normalizes the rows of A^{∘β}, S^{−1} is the inverse of S, and a_ij is the semantic relevance between pixel i and pixel j;
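The row-normalization of step S61 can be sketched as below; β = 2 in the example is an illustrative value (the exponent sharpens the relevance matrix before normalization).

```python
import numpy as np

def transition_matrix(A, beta=8):
    """Equation 10): T = S^{-1} A^(o beta), with S the diagonal matrix of
    row sums of the Hadamard power, so each row of T sums to 1."""
    A_beta = A ** beta                           # element-wise (Hadamard) power
    S_inv = np.diag(1.0 / A_beta.sum(axis=1))    # S^{-1}
    return S_inv @ A_beta

A = np.array([[1.0, 0.9, 0.1],
              [0.9, 1.0, 0.2],
              [0.1, 0.2, 1.0]])
T = transition_matrix(A, beta=2)
```

Because every row sums to 1, T is a valid transition matrix for the random walk of step S62: activation mass flows preferentially between highly relevant pixels.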
S62, performing random walk propagation on the optimized class activation map through the transition probability matrix to obtain the propagated class activation map:

vec(M*) = T^t · vec(M ⊙ (1 − B))   11)

where ⊙ denotes the Hadamard product, vec(·) denotes vectorization, t is the number of iterations, M is the class activation map from step S22 and B the class boundary map; multiplying by (1 − B) penalizes the scores of boundary pixels, which need not be propagated to neighboring pixels, finally yielding the propagated class activation map M*.
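The propagation of step S62 can be sketched as follows; a minimal NumPy version with a toy uniform transition matrix standing in for the one computed in equation 10).

```python
import numpy as np

def random_walk(M, T, B, t=3):
    """Equation 11): vec(M*) = T^t . vec(M * (1 - B)).
    Boundary pixels are damped by (1 - B) before propagation."""
    v = (M * (1.0 - B)).reshape(-1)          # vec(M Hadamard (1 - B))
    v = np.linalg.matrix_power(T, t) @ v     # T^t . vec(...)
    return v.reshape(M.shape)

n = 4                                        # a 2x2 activation map -> 4 pixels
T = np.full((n, n), 1.0 / n)                 # toy uniform transition matrix
M = np.array([[1.0, 0.0], [0.0, 0.0]])
B = np.zeros((2, 2))
M_star = random_walk(M, T, B, t=1)
```

With the uniform toy matrix the single activated pixel spreads its score evenly, giving 0.25 everywhere; with the real transition matrix the spread stops at class boundaries because those rows carry negligible cross-boundary probability.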
Preferably, step S7 specifically includes:
S71, thresholding the propagated class activation map: if the activation score of a pixel in the propagated class activation map is greater than a preset foreground threshold, the pixel is classified as building, otherwise as background, yielding the building pseudo label;
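Step S71 reduces to a single comparison; the sketch below uses an assumed foreground threshold of 0.3, which is illustrative and not a value fixed by the method.

```python
import numpy as np

def pseudo_label(M_star, fg_thresh=0.3):
    """Step S71: pixels whose propagated activation exceeds the foreground
    threshold become building (1), the rest background (0)."""
    return (M_star > fg_thresh).astype(np.uint8)

labels = pseudo_label(np.array([[0.8, 0.1], [0.4, 0.2]]))
```

The resulting binary map is the pseudo label paired with the image when training the semantic segmentation model in step S72.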
and S72, inputting the building pseudo label and the high-resolution remote sensing image into a semantic segmentation model, finishing a training process and realizing building extraction.
Preferably, before step S1, the method further includes:
S01, acquiring a high-resolution remote sensing image and the corresponding label map;
and S02, cropping the high-resolution remote sensing image and the corresponding label map to obtain remote sensing images and corresponding label maps of a certain size.
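The cropping of steps S01-S02 (and of S11 in the embodiment) can be sketched as below; tile size 256 and the overlap rate are illustrative values, not ones fixed by the method, and edge padding is omitted for brevity.

```python
import numpy as np

def crop_tiles(image, tile=256, overlap=0.25):
    """Cut a large scene (or its label map) into fixed-size tiles with a
    given overlap rate between neighboring tiles."""
    stride = int(tile * (1.0 - overlap))
    H, W = image.shape[:2]
    tiles = []
    for y in range(0, max(H - tile, 0) + 1, stride):
        for x in range(0, max(W - tile, 0) + 1, stride):
            tiles.append(image[y:y + tile, x:x + tile])
    return tiles

tiles = crop_tiles(np.zeros((512, 512)), tile=256, overlap=0.5)
```

Applying the same grid to the image and its label map keeps each training pair pixel-aligned; with 50% overlap a 512 x 512 scene yields a 3 x 3 grid of 256 x 256 tiles.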
Preferably, the classification network uses ResNet50 as the backbone network, followed by a global average pooling layer and a fully connected layer.
The invention has the beneficial effects that:
Compared with existing image-level weakly supervised semantic segmentation methods, the method generates better building class activation maps by using an iterative adversarial climbing strategy, fully mines class activation map information without introducing additional supervision information, and obtains the class equivalence relations between pixels. In addition, the invention uses gated convolution layers to further improve the network's segmentation performance at boundaries, so that the object region can be effectively expanded to cover the area within the boundary, generating high-quality building pseudo labels and enabling the semantic segmentation model to produce building regions with high accuracy and complete segmentation boundaries.
Drawings
The invention will be further described with reference to the accompanying drawings and examples, in which:
FIG. 1 is a flowchart of the weakly supervised building extraction method for high-resolution remote sensing images combining inter-pixel class equivalence relations and gated boundary optimization in an embodiment of the present invention;
FIG. 2 is a process diagram of iterative optimization of class activation graphs in an embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating a class boundary detection process performed by a class boundary network according to an embodiment of the present invention;
FIG. 4 is a structural diagram of a gated convolution layer in an embodiment of the present invention;
FIG. 5 is a schematic diagram of a building pseudo tag generation process in an embodiment of the invention;
Detailed Description
For a more clear understanding of the technical features, objects and effects of the present invention, embodiments of the present invention will now be described in detail with reference to the accompanying drawings.
Referring to fig. 1, an embodiment of the present invention provides a weakly supervised building extraction method for high-resolution remote sensing images combining inter-pixel class equivalence relations and gated boundary optimization, comprising the following steps:
S1, inputting the high-resolution remote sensing image and the corresponding label map into a classification network for training to obtain a trained classification network, and performing iterative adversarial climbing on the high-resolution remote sensing image based on the trained classification network to obtain a processed remote sensing image.
In this embodiment, S1 specifically includes:
S11, cutting the large remote sensing image data and the corresponding label maps into tiles of a certain size with a certain overlap rate, obtaining cropped remote sensing images and corresponding label maps to be used as training input.
S12, inputting the cropped remote sensing images and corresponding label maps into a classification network for training to obtain a trained classification network, and performing iterative adversarial climbing on a cropped remote sensing image x_0 based on the trained classification network, which can be done using equation 1):

x_t = x_{t-1} + ξ · ∇_{x_{t-1}} s(x_{t-1})   1)

where t (1 ≤ t ≤ T) denotes the adversarial iteration index, x_t is the remote sensing image obtained after the t-th adversarial step, s(x_{t-1}) is the class score assigned by the trained classification network to the input image x_{t-1}, ξ controls the strength of the adversarial perturbation, and ∇ denotes taking the gradient.
And S2, inputting the processed remote sensing image into the trained classification network to obtain an iterated class activation map, and normalizing the iterated class activation map to generate an optimized class activation map.
In this embodiment, S2 specifically includes:
S21, taking the remote sensing image after iterative adversarial climbing and its class label as the input of the classification network, which adopts ResNet50 as the backbone followed by a global average pooling (GAP) layer and a fully connected layer, and finally computing the iterated class activation map CAM(x_t) through equation 2):

CAM(x_t) = w_c^T f(x_t)   2)

where t (1 ≤ t ≤ T) denotes the adversarial iteration index, x_t is the remote sensing image obtained after the t-th adversarial step, w_c represents the weights of the fully connected layer corresponding to the building class, and f(x_t) is the feature map before the global average pooling layer;
S22, normalizing the iterated class activation maps CAM(x_t) to obtain the final optimized class activation map M (refer to fig. 2), using equation 3):

M = Σ_{t=0}^{T} CAM(x_t) / max( Σ_{t=0}^{T} CAM(x_t) )   3)

where T is the total number of adversarial iterations, max takes the maximum value over all pixels, and Σ_{t=0}^{T} sums the class activation maps from iteration 0 to iteration T.
S3, obtaining the class equivalence relations between the pixels in the optimized class activation map, and dividing all pixel pairs in the optimized class activation map into a positive set P+ and a negative set P- according to these relations.
In this embodiment, S3 specifically includes:
S31, comparing the activation scores in the optimized class activation map with foreground and background thresholds to screen out three regions: a foreground confidence region, a background confidence region and an irrelevant region. The irrelevant region is discarded, and each confidence region is refined with a dense conditional random field to obtain the building confidence region and the background confidence region;
S32, for each pixel in the building confidence region and the background confidence region, selecting the category corresponding to the maximum classification score of that pixel, obtaining a pseudo category map ĉ over all pixels;
S33, performing local neighborhood sampling on the pseudo category map ĉ, and comparing each pixel in a neighborhood P with the neighborhood center pixel for class equivalence: if the predicted class of the neighborhood center pixel is the same as that of a pixel in the neighborhood, the pixel pair is class-equivalent, otherwise it is non-equivalent; the class equivalence relations between pixels are obtained from these comparisons;
S34, assigning class-equivalent pixel pairs to the positive set and non-equivalent pixel pairs to the negative set, so that all pixel pairs are divided into a positive set P+ and a negative set P-:

P+ = { (i, j) | ĉ_i = ĉ_j, ||x_i − x_j||_2 < γ }
P- = { (i, j) | ĉ_i ≠ ĉ_j, ||x_i − x_j||_2 < γ }   4)

where γ is the neighborhood radius limiting the maximum distance between pixel pairs, (i, j) are the indices of pixels i and j, ||x_i − x_j||_2 is the distance between pixels x_i and x_j, ĉ is the pseudo category map, ĉ_i = ĉ_j means the pseudo class of pixel i coincides with that of pixel j, and ĉ_i ≠ ĉ_j means they do not coincide.
S4, building a class boundary network based on gated convolution layers, and outputting a first class boundary map through the class boundary network.
Referring to fig. 3, in this embodiment, S4 specifically includes:
S41, constructing the feature extraction stream: it has the same structure as the ResNet50 network and comprises 5 feature extraction layers; each convolution layer in the feature extraction layers is followed by normalization and a ReLU activation function. Let r_t denote the feature map output by the t-th feature extraction layer:

r_t ∈ R^{C × (H/m) × (W/m)}   5)

where R^{C × (H/m) × (W/m)} is the dimension of the feature map, C is the number of channels, H and W are the height and width of the feature map, and m is the stride;
S42, constructing the gated convolution layer (see fig. 4): first, the feature map r_t is used to generate an attention map α_t with a single channel:

α_t = σ(C_{1×1}(s_t || r_t))   6)

where s_t and r_t respectively denote the feature map output by the t-th gated convolution layer and the feature map output by the t-th feature extraction layer, || denotes channel-wise concatenation, and the concatenated feature map is passed through a convolution layer C_{1×1} with kernel size 1 × 1 and the sigmoid activation function σ;
S43, taking the dot product of the attention map and the gated convolution layer's feature map s_t, and adding the result to s_t through a residual connection to obtain the output of the gated convolution layer:

ŝ_t^(i,j) = ( (s_t^(i,j) ⊙ α_t^(i,j)) + s_t^(i,j) )^T w_t   7)

where T denotes transposition, ⊙ denotes the gating (dot product) operation, ŝ_t is the output of the gated convolution layer, w_t is a channel weighting, the superscript (i, j) denotes the element of the corresponding feature map at row i and column j, and ŝ_t in turn serves as the input for the next attention map generation;
S44, constructing the shape stream to obtain the first class boundary map B: the input of the shape stream branch is the feature map r_1 output by the first layer in step S41; three gated convolution layers are then cascaded, the output of each gated convolution layer serving as the input of the next; the three gated convolution layers are connected to the third, fourth and last layers of ResNet50, and finally the first class boundary map B is output.
S5, computing a first semantic correlation matrix from the first class boundary map, computing the class boundary loss from the positive set P+, the negative set P- and the first semantic correlation matrix, and training the class boundary network with this loss to obtain the trained class boundary network.
In this embodiment, S5 specifically includes:
S51, computing the semantic relevance a_ij between two pixels from the first class boundary map; for a given pair of pixels x_i and x_j:

a_ij = 1 − max_{k∈Π_ij} B_k   8)

where Π_ij is the set of pixels on the straight line between x_i and x_j, x_k is a pixel in that set and B_k its boundary score; the larger a_ij, the higher the semantic relevance between x_i and x_j and the lower the probability of a boundary between them; the smaller a_ij, the lower the semantic relevance and the higher the probability of a boundary between them;
S52, using the inter-pixel class equivalence relation as the supervision for learning the pixel semantic relevance a_ij: if the classes of a pixel pair are equivalent, the class equivalence relation is expressed as 1, otherwise 0; the semantic relevance is then learned by minimizing the cross entropy loss between the binary semantic labels and the semantic relevance, and the class boundary loss L is calculated as:
L = − (1/|P+_fg|) Σ_{(i,j)∈P+_fg} log a_ij − (1/|P+_bg|) Σ_{(i,j)∈P+_bg} log a_ij − (1/|P−|) Σ_{(i,j)∈P−} log(1 − a_ij),
wherein |·| represents the number of pixel pairs belonging to the current set, Σ represents summation, and log a_ij represents taking the logarithm of a_ij; P+_fg and P+_bg are both subsets of the set P+, representing respectively the pixel pairs in the positive set whose pseudo categories are foreground and background; the losses of the three subsets are each normalized before being combined, because the pixel pairs contained in P+ and P− are unbalanced;
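As an illustrative sketch of the class boundary loss of S52 (under the assumption that the three subsets are already separated), normalizing each subset by its own size before summing compensates for the imbalance between P+ and P−:

```python
import numpy as np

def class_boundary_loss(a_fg, a_bg, a_neg, eps=1e-7):
    """Class boundary loss of S52 (numpy sketch).

    a_fg, a_bg : affinities a_ij of positive pairs whose pseudo class is
                 foreground / background (subsets P+_fg, P+_bg).
    a_neg      : affinities of the negative pairs P-.
    eps        : clipping constant to keep the logarithms finite
                 (an implementation convenience, not from the patent).
    """
    a_fg, a_bg, a_neg = (np.clip(np.asarray(a), eps, 1 - eps)
                         for a in (a_fg, a_bg, a_neg))
    return (-np.log(a_fg).mean()          # positive pairs: push a_ij -> 1
            - np.log(a_bg).mean()
            - np.log(1.0 - a_neg).mean())  # negative pairs: push a_ij -> 0
```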
and S53, using the class boundary loss for class boundary network back propagation, updating the weight, and training the class boundary network until convergence to obtain the trained class boundary network.
S6, outputting a second class boundary diagram through the trained class boundary network, calculating according to the second class boundary diagram to obtain a second semantic correlation matrix, calculating according to the second semantic correlation matrix to obtain a transition probability matrix, and performing random walk propagation on the optimized class activation diagram according to the transition probability matrix to obtain a propagated class activation diagram.
In this embodiment, S6 specifically includes:
S61, generating a second class boundary map by prediction with the trained class boundary network, calculating the semantic correlation to obtain a second semantic correlation matrix, and calculating a transition probability matrix from the second semantic correlation matrix:
T = S^(−1) · A^∘β,
wherein A represents the second semantic correlation matrix, A^∘β represents the result of the Hadamard power operation of the matrix A with exponent β, S represents the diagonal matrix that regularizes A^∘β (its diagonal entries are the row sums of A^∘β), S^(−1) represents the inverse of the matrix S, and a_ij represents the semantic correlation between pixel i and pixel j;
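A short numpy sketch of the transition matrix of S61; after row normalization by S^(−1), each row of T sums to one and can serve as random walk probabilities:

```python
import numpy as np

def transition_matrix(A, beta):
    """Transition probability matrix T = S^(-1) A^(∘β) (S61 sketch).

    A    : (N, N) semantic correlation matrix with entries a_ij in [0, 1].
    beta : exponent of the element-wise (Hadamard) power; a beta > 1
           suppresses weak correlations.
    """
    A_beta = np.power(A, beta)                 # Hadamard power A^(∘β)
    S_inv = np.diag(1.0 / A_beta.sum(axis=1))  # S holds the row sums
    return S_inv @ A_beta
```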
s62, carrying out random walk propagation on the optimized class activation graph through the transition probability matrix to obtain the propagated class activation graph, wherein the specific formula is as follows:
vec(M*) = T^t · vec(M ⊙ (1 − B)),
wherein ⊙ represents the Hadamard product operation, vec(·) represents vectorization, t is the number of iterations, M represents the optimized class activation map in step S22, and B represents the class boundary map; the scores of boundary pixels are penalized by multiplying by (1 − B) so that they are not propagated to adjacent pixels, finally obtaining the propagated class activation map M*.
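The random walk propagation of S62 can be sketched directly from the formula; the transition matrix is assumed to be the dense (H·W) × (H·W) matrix of the previous step:

```python
import numpy as np

def propagate_cam(M, B, T, t):
    """Random-walk propagation of the class activation map (S62 sketch).

    M : (H, W) optimized class activation map.
    B : (H, W) class boundary map; multiplying by (1 - B) suppresses the
        scores of boundary pixels so they are not propagated.
    T : (H*W, H*W) transition probability matrix.
    t : number of random-walk iterations.
    """
    v = (M * (1.0 - B)).reshape(-1)              # vec(M ⊙ (1 - B))
    out = np.linalg.matrix_power(T, t) @ v       # T^t · vec(...)
    return out.reshape(M.shape)
```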
And S7, generating a building pseudo label (refer to fig. 5) according to the propagated class activation map, inputting the building pseudo label and the high-resolution remote sensing image into a semantic segmentation model for training, and automatically extracting a building area after training.
In this embodiment, S7 specifically includes:
s71, thresholding the propagated class activation map, namely, if the activation score of a pixel in the propagated class activation map is larger than a preset foreground threshold, classifying the pixel into a building class, otherwise, classifying the pixel into a background class, and obtaining a building pseudo label;
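The thresholding of S71 is a single comparison; the foreground threshold value used below is a hypothetical choice for the sketch, not a value from the patent:

```python
import numpy as np

def building_pseudo_label(cam, fg_threshold=0.3):
    """Threshold the propagated CAM into a building pseudo label (S71).

    Pixels whose activation score exceeds the foreground threshold are
    assigned the building class (1), all others the background class (0).
    """
    return (cam > fg_threshold).astype(np.uint8)
```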
and S72, inputting the building pseudo label and the high-resolution remote sensing image into a semantic segmentation model, completing a training process, and realizing automatic extraction of the building area.
The method has been extensively evaluated on the two benchmark data sets Vaihingen and Potsdam provided by the ISPRS for the remote sensing field, which effectively demonstrates the feasibility and effectiveness of the technical scheme. Table 1 compares the experimental results of the method of the invention with other methods on the data sets Vaihingen and Potsdam:
TABLE 1 comparison of the results of the experiments with other methods on the data sets Vaihingen and Potsdam
The experimental results in Table 1 show that with the method provided by the invention the building segmentation index mIoU reaches 84.1% and 83.9% on the data sets Vaihingen and Potsdam respectively, which, compared with the accuracy of existing weakly supervised building extraction methods on these data sets, is the best segmentation performance at present. In addition, compared with fully supervised building extraction, the accuracy of the method on the two data sets reaches 91.9% and 91.7% of the fully supervised performance respectively.
While the present invention has been described with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, which are illustrative and not restrictive, and it will be apparent to those skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the invention as defined in the appended claims.
Claims (10)
1. A high-resolution remote sensing image weak supervision building extraction method is characterized by comprising the following steps:
s1, inputting the high-resolution remote sensing image and the corresponding label graph into a classification network for training to obtain a trained classification network, and performing iterative adversarial lifting processing on the high-resolution remote sensing image based on the trained classification network to obtain a processed remote sensing image;
s2, inputting the processed remote sensing image into the trained classification network to obtain an iterated class activation map, and normalizing the iterated class activation map to generate an optimized class activation map;
s3, obtaining the equivalence relation among the pixels in the optimized class activation map, and dividing all the pixel pairs in the optimized class activation map into a positive set P+ and a negative set P− according to the equivalence relation among the pixels;
S4, building a class boundary network based on the gated convolution layer, and outputting a first class boundary diagram through the class boundary network;
s5, calculating a first semantic correlation matrix according to the first class boundary map, calculating class boundary loss according to the positive set P+, the negative set P− and the first semantic correlation matrix, and training the class boundary network according to the class boundary loss to obtain a trained class boundary network;
s6, outputting a second class boundary diagram through the trained class boundary network, calculating according to the second class boundary diagram to obtain a second semantic correlation matrix, calculating according to the second semantic correlation matrix to obtain a transition probability matrix, and performing random walk propagation on the optimized class activation diagram according to the transition probability matrix to obtain a propagated class activation diagram;
and S7, generating a building pseudo label according to the propagated class activation map, inputting the building pseudo label and the high-resolution remote sensing image into a semantic segmentation model for training, and extracting a building area after training.
2. The method for extracting the high-resolution remote sensing image weakly supervised building as recited in claim 1, wherein in step S1, the calculation formula for performing the iterative adversarial lifting processing on the high-resolution remote sensing image is as follows:
x_t = x_{t−1} + ξ · ∇_{x_{t−1}} y_c(x_{t−1}),
wherein t (1 ≤ t ≤ T) denotes the number of adversarial iterations, x_t refers to the remote sensing image obtained after the t-th adversarial iteration, y_c(x_{t−1}) refers to the category score the trained classification network assigns to the input remote sensing image x_{t−1}, ξ represents the degree of the adversarial perturbation, and ∇ represents derivation.
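A minimal sketch of the iterative update of claim 2; the gradient function below is a stand-in for the automatic differentiation of the class score through the real classification network:

```python
import numpy as np

def adversarial_iterations(x0, class_score_grad, xi, T):
    """Iterative adversarial lifting of claim 2 (sketch).

    x0               : input remote sensing image (any numpy shape).
    class_score_grad : callable returning the gradient of the building
                       class score y_c with respect to the image
                       (stands in for network autograd).
    xi               : step size (degree of perturbation).
    T                : number of iterations.

    Each step moves the image along the gradient,
    x_t = x_{t-1} + xi * d y_c / d x, increasing the class score.
    """
    x = x0.copy()
    for _ in range(T):
        x = x + xi * class_score_grad(x)
    return x
```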
3. The method for extracting the high-resolution remote sensing image weakly supervised building as recited in claim 1, wherein the step S2 specifically comprises:
s21, inputting the processed remote sensing image into the trained classification network, and calculating the iterated class activation map CAM(x_t), wherein the calculation formula is as follows:
CAM(x_t) = θ_c^T · f(x_t),
wherein t (1 ≤ t ≤ T) denotes the number of adversarial iterations, x_t refers to the remote sensing image obtained after the t-th adversarial iteration, θ_c represents the weights of the fully connected layer corresponding to the building category, and f(x_t) is the feature map before the global average pooling layer;
s22, normalizing the iterated class activation map CAM(x_t) to obtain the optimized class activation map M.
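An illustrative numpy sketch of the CAM computation of claim 3; the rectification and division by the maximum are a common CAM normalization convention assumed here, since the patent's normalization formula is not reproduced in the text:

```python
import numpy as np

def class_activation_map(features, fc_weights):
    """CAM of claim 3 with an assumed max normalization (sketch).

    features   : (C, H, W) feature map f(x_t) before global average pooling.
    fc_weights : (C,) fully connected layer weights theta_c for the
                 building category.
    """
    cam = np.einsum('c,chw->hw', fc_weights, features)  # theta_c^T f(x_t)
    cam = np.maximum(cam, 0)            # keep positive evidence (assumed)
    m = cam.max()
    return cam / m if m > 0 else cam    # scale into [0, 1] (assumed)
```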
4. The method for extracting the high-resolution remote sensing image weakly supervised building as recited in claim 1, wherein the step S3 specifically comprises:
s31, comparing the activation scores in the optimized class activation image with foreground and background threshold values to obtain a building confidence area and a background confidence area;
s32, selecting the category corresponding to the maximum classification score of each pixel point in the building confidence region and the background confidence region to obtain a pseudo category map of all the pixel points;
S33, performing local neighborhood sampling on the pixel points of the pseudo category map, and comparing the category equivalence relation between each pixel point in a neighborhood and the neighborhood center pixel point: if the prediction category of the neighborhood center pixel is the same as that of a certain pixel point in the neighborhood, the categories of the pixel pair are equivalent, otherwise they are not equivalent; the category equivalence relation between pixels is obtained according to the comparison result;
s34, dividing the pixel pairs with equivalent categories into the positive set and the pixel pairs with non-equivalent categories into the negative set, thereby dividing the set of all pixel pairs P into the positive set P+ and the negative set P−; the specific division formula is as follows:
P = {(i, j) | ‖x_i − x_j‖_2 < γ, i ≠ j}, P+ = {(i, j) ∈ P | ŷ_i = ŷ_j}, P− = {(i, j) ∈ P | ŷ_i ≠ ŷ_j},
wherein γ represents the neighborhood radius limiting the maximum distance between the pixels of a pair, (i, j) represents the indices of pixel i and pixel j, ‖x_i − x_j‖_2 represents the distance between pixels x_i and x_j, ŷ represents the pseudo category map, ŷ_i = ŷ_j indicates that the pseudo class of pixel i coincides with the pseudo class of pixel j, and ŷ_i ≠ ŷ_j indicates that the pseudo class of pixel i does not coincide with the pseudo class of pixel j.
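A brute-force numpy sketch of the pair partition of claim 4; it enumerates all pixel pairs within the neighborhood radius γ, which is only practical for small maps but makes the set definitions explicit:

```python
import numpy as np

def partition_pixel_pairs(pseudo_class, gamma):
    """Divide neighbouring pixel pairs into P+ and P- (claim 4 sketch).

    pseudo_class : (H, W) pseudo category map (integer class per pixel).
    gamma        : neighbourhood radius limiting pair distance.

    Returns (P+, P-): lists of ((i1, j1), (i2, j2)) coordinate pairs
    with equal / different pseudo classes respectively.
    """
    H, W = pseudo_class.shape
    coords = [(i, j) for i in range(H) for j in range(W)]
    p_pos, p_neg = [], []
    for a in range(len(coords)):
        for b in range(a + 1, len(coords)):
            (i1, j1), (i2, j2) = coords[a], coords[b]
            if np.hypot(i1 - i2, j1 - j2) < gamma:      # ||x_i - x_j|| < γ
                if pseudo_class[i1, j1] == pseudo_class[i2, j2]:
                    p_pos.append((coords[a], coords[b]))
                else:
                    p_neg.append((coords[a], coords[b]))
    return p_pos, p_neg
```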
5. The method for extracting the high-resolution remote sensing image weakly supervised building as recited in claim 1, wherein the step S4 specifically comprises:
s41, constructing a feature extraction stream: the feature extraction stream has the same structure as the ResNet50 network and comprises 5 feature extraction layers, each convolution layer in the feature extraction layers being followed by regularization and a ReLU activation function; r_t denotes the feature map output by the t-th feature extraction layer:
r_t ∈ R^(C × H/m × W/m),
wherein R^(C × H/m × W/m) represents the dimensions of the feature map, C represents the number of channels, H and W represent the height and width of the feature map, and m represents the stride;
s42, constructing a gated convolutional layer: the feature map r_t is first applied to generate an attention map α_t with channel number 1, wherein the specific calculation formula is as follows:
α_t = σ(C_{1×1}(s_t || r_t)),
wherein s_t and r_t respectively represent the feature map output by the t-th gated convolutional layer and the feature map output by the t-th feature extraction layer, || represents channel-wise concatenation, and the concatenated feature map passes through the convolutional layer C_{1×1} with convolution kernel size 1 × 1 and the sigmoid activation function σ;
s43, performing a dot product operation between the attention map and the feature map s_t output by the gated convolutional layer, and performing a residual connection between the operation result and s_t to obtain the output of the gated convolutional layer, wherein the specific calculation process is:
ŝ_t^(i,j) = (s_t^(i,j) ⊛ α_t^(i,j))^T · w_t, with s_t ⊛ α_t = (s_t ⊙ α_t) + s_t,
wherein T represents a transpose operation, ⊛ represents the gating operation, ⊙ represents a dot product, ŝ_t represents the output of the gated convolutional layer, w_t represents channel weighting, a superscript or subscript (i, j) denotes the element of the corresponding feature map at row i and column j, and ŝ_t will in turn be the input for the next attention map generation;
s44, constructing a shape stream to obtain the first class boundary map: the feature map r_1 output by the first layer in step S41 is used as the input of the shape stream branch; three of the gated convolutional layers are then used in cascade, the output of each gated convolutional layer being the input of the next; the three gated convolutional layers are connected to the third, fourth and last layers of ResNet50, and finally the first class boundary map is output.
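The wiring of the shape stream in S44 can be sketched generically; `gate_fn` below stands in for one gated convolutional layer, so this shows only the cascade structure, not the layer internals:

```python
import numpy as np

def shape_stream(r1, backbone_feats, gate_fn):
    """Cascade of three gated convolutional layers (S44 sketch).

    r1             : feature map of the first backbone layer, the input
                     of the shape stream branch.
    backbone_feats : list of the three backbone feature maps (third,
                     fourth and last ResNet50 layers) feeding the gates.
    gate_fn        : callable (s_t, r_t) -> s_{t+1}, a stand-in for one
                     gated convolutional layer.

    The output of each gated layer is the input of the next; the final
    output plays the role of the first class boundary map.
    """
    s = r1
    for r in backbone_feats:
        s = gate_fn(s, r)
    return s
```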
6. The method for extracting the high-resolution remote sensing image weakly supervised building as recited in claim 1, wherein the step S5 specifically comprises:
s51, calculating the semantic relevance a_ij between two pixels according to the first class boundary map; for a given pair of pixels x_i and x_j, the semantic relevance formula is:
a_ij = 1 − max_{x_k ∈ Π_ij} B(x_k),
wherein Π_ij represents the set of pixel points on the straight line between x_i and x_j, x_k represents a pixel in this set, and B(x_k) is the boundary score of x_k in the first class boundary map; the larger a_ij is, the higher the semantic relevance between x_i and x_j and the lower the possibility of a boundary between the two; the smaller a_ij is, the higher the probability that a boundary exists between the two;
s52, using the inter-pixel class equivalence relation as the supervision for learning the pixel semantic relevance a_ij: if the classes of a pixel pair are equivalent, the class equivalence relation is expressed as 1, otherwise 0; the semantic relevance is then learned by minimizing the cross entropy loss between the binary semantic labels and the semantic relevance, and the class boundary loss L is calculated as:
L = − (1/|P+_fg|) Σ_{(i,j)∈P+_fg} log a_ij − (1/|P+_bg|) Σ_{(i,j)∈P+_bg} log a_ij − (1/|P−|) Σ_{(i,j)∈P−} log(1 − a_ij),
wherein |·| represents the number of pixel pairs belonging to the current set, Σ represents summation, and log a_ij represents taking the logarithm of a_ij; P+_fg and P+_bg are both subsets of the set P+, representing respectively the pixel pairs in the positive set whose pseudo categories are foreground and background; the losses of the three subsets are each normalized before being combined, because the pixel pairs contained in P+ and P− are unbalanced;
and S53, using the class boundary loss for class boundary network back propagation, updating the weight, and training the class boundary network until convergence to obtain the trained class boundary network.
7. The method for extracting the high-resolution remote sensing image weakly supervised building as recited in claim 1, wherein the step S6 specifically comprises:
s61, generating a second class boundary map by prediction with the trained class boundary network, calculating the semantic correlation to obtain a second semantic correlation matrix, and calculating a transition probability matrix from the second semantic correlation matrix:
T = S^(−1) · A^∘β,
wherein A represents the second semantic correlation matrix, A^∘β represents the result of the Hadamard power operation of the matrix A with exponent β, S represents the diagonal matrix that regularizes A^∘β (its diagonal entries are the row sums of A^∘β), S^(−1) represents the inverse of the matrix S, and a_ij represents the semantic correlation between pixel i and pixel j;
s62, carrying out random walk propagation on the optimized class activation graph through the transition probability matrix to obtain the propagated class activation graph, wherein the specific formula is as follows:
vec(M*) = T^t · vec(M ⊙ (1 − B)),
wherein ⊙ represents the Hadamard product operation, vec(·) represents vectorization, t represents the number of iterations, B represents the class boundary map, and M represents the optimized class activation map; the scores of boundary pixels are penalized by multiplying by (1 − B) so that they are not propagated to adjacent pixels, finally obtaining the propagated class activation map M*.
8. The method for extracting the high-resolution remote sensing image weakly supervised building as recited in claim 1, wherein the step S7 specifically comprises:
s71, thresholding the propagated class activation map, namely, if the activation score of a pixel in the propagated class activation map is larger than a preset foreground threshold, classifying the pixel into a building class, otherwise, classifying the pixel into a background class, and obtaining a building pseudo label;
and S72, inputting the building pseudo label and the high-resolution remote sensing image into a semantic segmentation model, finishing a training process and realizing building extraction.
9. The method for extracting the high-resolution remote sensing image weakly supervised building as recited in claim 1, further comprising, before step S1:
s01, acquiring a high-resolution remote sensing image and a corresponding label map;
and S02, cutting the high-resolution remote sensing image and the corresponding label graph to obtain a remote sensing image and a corresponding label graph with a certain size.
10. The method for extracting the high-resolution remote sensing image weakly supervised building as recited in claim 1, wherein the classification network adopts ResNet50 as a backbone network, and is followed by a global average pooling layer and a full connection layer.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110651041.7A CN113436204A (en) | 2021-06-10 | 2021-06-10 | High-resolution remote sensing image weak supervision building extraction method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113436204A true CN113436204A (en) | 2021-09-24 |
Family
ID=77755617
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110651041.7A Pending CN113436204A (en) | 2021-06-10 | 2021-06-10 | High-resolution remote sensing image weak supervision building extraction method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113436204A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114677515A (en) * | 2022-04-25 | 2022-06-28 | 电子科技大学 | Weak supervision semantic segmentation method based on inter-class similarity |
CN114820655A (en) * | 2022-04-26 | 2022-07-29 | 中国地质大学(武汉) | Weak supervision building segmentation method taking reliable area as attention mechanism supervision |
CN115082778A (en) * | 2022-04-28 | 2022-09-20 | 中国农业科学院农业信息研究所 | Multi-branch learning-based homestead identification method and system |
CN115170569A (en) * | 2022-09-07 | 2022-10-11 | 新乡学院 | Failure detection method of high-entropy material coating cutter based on image |
CN116052019A (en) * | 2023-03-31 | 2023-05-02 | 深圳市规划和自然资源数据管理中心(深圳市空间地理信息中心) | High-quality detection method suitable for built-up area of large-area high-resolution satellite image |
CN117079103A (en) * | 2023-10-16 | 2023-11-17 | 暨南大学 | Pseudo tag generation method and system for neural network training |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112052783A (en) * | 2020-09-02 | 2020-12-08 | 中南大学 | High-resolution image weak supervision building extraction method combining pixel semantic association and boundary attention |
CN112183360A (en) * | 2020-09-29 | 2021-01-05 | 上海交通大学 | Lightweight semantic segmentation method for high-resolution remote sensing image |
CN112668579A (en) * | 2020-12-24 | 2021-04-16 | 西安电子科技大学 | Weak supervision semantic segmentation method based on self-adaptive affinity and class distribution |
CN112767407A (en) * | 2021-02-02 | 2021-05-07 | 南京信息工程大学 | CT image kidney tumor segmentation method based on cascade gating 3DUnet model |
Non-Patent Citations (2)
Title |
---|
ALEXEY KURAKIN ET AL.: "ADVERSARIAL MACHINE LEARNING AT SCALE", 《ARXIV.ORG》 * |
ANURAG ARNAB ET AL.: "On the Robustness of Semantic Segmentation Models to Adversarial Attacks", 《IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE》 * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | | Application publication date: 20210924 |