CN113436204A

CN113436204A - High-resolution remote sensing image weak supervision building extraction method

Info

Publication number: CN113436204A
Application number: CN202110651041.7A
Authority: CN
Inventors: 郑道远; 方芳; 刘袁缘; 李圣文; 曾林芸; 张嘉辉
Original assignee: China University of Geosciences
Current assignee: China University of Geosciences
Priority date: 2021-06-10
Filing date: 2021-06-10
Publication date: 2021-09-24

Abstract

The invention provides a method for extracting a high-resolution remote sensing image weakly supervised building, which comprises the following steps: the method comprises the steps of class activation graph iterative optimization based on an iterative counterattack raising strategy, class activation graph inter-pixel relation mining, class boundary detection, building pseudo label generation, semantic segmentation network training and building region extraction. Compared with the existing image-level weak supervision semantic segmentation method, the method can generate a better building class activation map by using an iterative countermeasure ascent strategy, fully excavates class activation map information under the condition of not introducing additional supervision information, and obtains the class equivalence relation among pixels. In addition, the invention further improves the segmentation performance of the network on the boundary processing by using the gated convolution layer, so that the object region can be effectively expanded and covered in the boundary to generate a high-quality building pseudo label, and the semantic segmentation model can generate a building region with high accuracy and complete segmentation boundary.

Description

High-resolution remote sensing image weak supervision building extraction method

Technical Field

The invention relates to the field of remote sensing, in particular to a method for extracting a high-resolution remote sensing image weakly supervised building.

Background

With the development of satellite remote sensing and aerial photography technologies, people can obtain various high-resolution images more quickly and cheaply. Remote sensing images collected from aerial and satellite platforms can be widely used in various applications such as land use mapping, urban resource management and disaster monitoring. Image Analysis (GEOBIA) based on geographic objects is a main method for extracting buildings from high-resolution remote sensing images, but determining an optimal image segmentation scale has certain difficulty, and strong domain expert knowledge is often required when extracting features. Semantic segmentation of high-resolution remote sensing images aims to assign a geographic label to each pixel through an end-to-end mechanism, and has been widely used for many geographic applications such as cloud detection, building automatic extraction, land cover mapping, urban target positioning and the like under the remarkable promotion effect of a deep convolutional neural network.

Under the supervision of rich pixel level label data sets, the deep convolutional network can extract multi-level features under different receptive field visual angles by utilizing spatial context information in the images, and the performance of semantic segmentation of the remote sensing images is greatly promoted. However, acquiring a large number of pixel-level tags is time consuming, labor intensive, and costly. The semantic segmentation method based on weak supervision provides a new idea for overcoming the difficulty in labeling the remote sensing image.

The commonly used weak labeling information includes image-level labels, point labels, graffiti, bounding boxes, etc., wherein the image-level labels are widely used due to low labeling cost. At present, a two-stage training strategy is mainly adopted for training a segmentation model with image-level labels as weak label information. In the first stage, the classification network first generates pseudo masks by using image-level labels, and the pseudo masks are generally generated by using Class Activation Maps (CAMs). CAMs can notice the most distinctive object in the image, thereby roughly positioning the object region, but CAMs cannot display the whole object region with precise boundary, and the generated pseudo mask has low quality, so that the final segmentation result is not ideal. Therefore, fully exploiting the inter-pixel relationships in CAMs is of crucial importance for the subsequent stages. In the second stage of the segmentation network, the pseudo mask generated in the first stage is used for training the model, so that the segmentation performance is further improved. In the existing research work, part of the work uses a conditional random field or a super-pixel method to carry out post-processing on a segmentation result, but the boundary optimization effect is good and bad, and the reasoning process is complex. In addition, the current image segmentation method based on the deep convolutional neural network generally processes information such as color, shape and texture of a remote sensing image together, but different types of information also cause the problems of fuzzy building segmentation boundaries and the like.

Disclosure of Invention

In order to solve the technical problems that labels are difficult to obtain, class activation maps are not completely covered and segmentation boundaries are not complete in the traditional building extraction method, the invention provides a high-resolution remote sensing image weak supervision building extraction method which focuses on excavating class equivalence relations among pixels and optimizing the boundaries.

In order to achieve the purpose, the invention provides a high-resolution remote sensing image weak supervision building extraction method, which specifically comprises the following steps:

s1, inputting the high-resolution remote sensing image and the corresponding label graph into a classification network for training to obtain a trained classification network, and performing iterative confrontation lifting processing on the high-resolution remote sensing image based on the trained classification network to obtain a processed remote sensing image;

s2, inputting the processed remote sensing image into the trained classification network to obtain an iterated class activation map, and normalizing the iterated class activation map to generate an optimized class activation map;

s3, obtaining the equivalence relation among the pixels in the optimized class activation map, and dividing all the pixels in the optimized class activation map into a positive set P according to the equivalence relation among the pixels⁺And negative set P^-；

S4, building a class boundary network based on the gated convolution layer, and outputting a first class boundary diagram through the class boundary network;

s5, calculating to obtain a first semantic correlation matrix according to the first class boundary graph and according to the positive set P⁺Negative set P^-Calculating with the first semantic correlation matrix to obtain class boundary loss, and training the class boundary network according to the class boundary loss to obtain a trained class boundary network;

s6, outputting a second class boundary diagram through the trained class boundary network, calculating according to the second class boundary diagram to obtain a second semantic correlation matrix, calculating according to the second semantic correlation matrix to obtain a transition probability matrix, and performing random walk propagation on the optimized class activation diagram according to the transition probability matrix to obtain a propagated class activation diagram;

and S7, generating a building pseudo label according to the propagated class activation map, inputting the building pseudo label and the high-resolution remote sensing image into a semantic segmentation model for training, and extracting a building area after training.

Preferably, in step S1, the calculation formula for performing the iterative resistance-to-lift processing on the high-resolution remote sensing image is as follows:

wherein T (1. ltoreq. T. ltoreq.T) denotes the number of antagonistic iterations, x^tRefers to a remote sensing image obtained after the t-th iteration countermeasure processing,

finger trained classification network pair input remote sensing image x^t-1Is assigned a class score of, ξ represents the course of countering the interferenceThe degree of the magnetic field is measured,

representing the derivation.

Preferably, step S2 specifically includes:

s21, inputting the processed remote sensing image into the trained classification network, and calculating to obtain an iterated class activation map CAM (x)^t) The calculation formula is as follows:

wherein T (1. ltoreq. T. ltoreq.T) denotes the number of antagonistic iterations, x^tRefers to a remote sensing image obtained after the t-th iterative confrontation,

represents the weight of the full connection layer corresponding to the building category, f (x)^t) Is a feature map before the global average pooling layer;

s22, activating the class after iteration to obtain a CAM (x)^t) Carrying out normalization to obtain an optimized class activation graph M; the specific normalization operation is as follows:

where T refers to the total number of final challenge iterations, max represents the maximum,

representing the summation of all class activation maps from iteration 0 to iteration T.

Preferably, step S3 specifically includes:

s31, comparing the activation scores in the optimized class activation image with foreground and background threshold values to obtain a building confidence area and a background confidence area;

s32, selecting classification scores of each pixel point in the building confidence region and the background confidence regionThe category corresponding to the maximum value of the pixel point is obtained to obtain a pseudo category graph of all the pixel points

S33, matching the pseudo category graph

Local neighborhood sampling is carried out on the pixel point in the neighborhood P, category equivalence relation comparison is carried out on the pixel point in the neighborhood P and the neighborhood center pixel point, namely if the prediction category of the neighborhood center pixel is the same as the prediction category of a certain pixel point in the neighborhood, the category of the pixel pair is equivalent, otherwise, the pixel pair is not equivalent, and the category equivalence relation between the pixels is obtained according to the comparison result;

s34, dividing the pixel pairs with equivalent categories into positive sets and the pixel pairs with non-equivalent categories into negative sets, thereby dividing all the pixel pairs into positive sets P⁺And negative set P^-The concrete division formula is as follows:

where γ represents the neighborhood radius to limit the maximum distance between pixel pairs, (i, j) represents the indices of pixel i and pixel j, | x_i-x_j||₂Representative pixel x_iAnd x_jThe distance of (a) to (b),

represents a pseudo-class diagram of the device,

the pseudo class representing pixel i coincides with the pseudo class of pixel j,

the pseudo class representing pixel i does not coincide with the pseudo class of pixel j.

Preferably, step S4 specifically includes:

s41, constructing a feature extraction flow: the characteristicsThe extraction flow has the same structure as ResNet50 network, and comprises 5 feature extraction layers, each convolution layer in the feature extraction layers is followed by regularization and ReLU activation function, and r is used_tFeature maps representing the output of the t-th feature extraction layer:

wherein the content of the first and second substances,

representing the dimension of the feature diagram, C representing the number of channels, H and W representing the height and width of the feature diagram, and m representing the step length;

s42, constructing a gated convolutional layer: firstly, the characteristic graph r is_tApplication to generating an attention map with channel number 1

The specific calculation formula is as follows:

α_t＝σ(C_1×1(s_t||r_t)) 6)

wherein s is_tAnd r_tRespectively representing a feature graph output by the t-th gated convolution layer and a feature graph output by the t-th feature extraction layer, | | represents channel number splicing, and the spliced feature graphs pass through the convolution layer C with the convolution kernel size of 1 x 1_1×1And sigmoid activation function σ;

s43, outputting the attention drawing and the characteristic drawing S of the gated convolution layer_tPerforming dot product operation, and mixing the operation result with s_tAnd performing residual error connection to obtain the output of the gated convolutional layer, wherein the specific calculation process comprises the following steps:

wherein T represents a transpose operation,

represents a gating operation, <' > representsThe dot-product is obtained by multiplying the points,

representing the output of the gated convolutional layer, w_tRepresenting channel-weighted, superscript or subscript (i, j) each representing an element of its corresponding feature map at row i and column j,

will be the input for the next attention map generation again;

s44, constructing shape flow to obtain a first class boundary diagram

The input of the shape flow branch is the feature map r output in the first layer in the step S41₁Then, three of the gated convolutional layers are used for concatenation, the output of each gated convolutional layer is used as the input of the next gated convolutional layer, the three gated convolutional layers are connected to the third, fourth and last layers of ResNet50, and finally the first class boundary diagram is output

Preferably, step S5 specifically includes:

s51, calculating semantic relevance a between two pixels according to the first class boundary graph_ij(ii) a For a given set of pixels x_iAnd x_jSemantic relevance formula:

wherein, pi_ijRepresentative pixel x_iAnd x_jSet of pixel points on a straight line, x_kRepresents a certain pixel therein, a_ijThe larger, the pixel x is indicated_iAnd x_jHas high semantic relevance, the less the possibility of a boundary between the two, a_ijThe smaller, the pixel x is indicated_iAnd x_jHas low semantic relevance and has the possibility of a boundary between the twoThe greater the sex;

s52, using the inter-pixel class equivalence relation as the semantic relevance a of the learning pixel_ijIf the classes of the pixel pairs are equivalent, the class equivalent relation is expressed as 1, otherwise, the class equivalent relation is 0; then learning semantic relevance, class boundary loss by minimizing cross entropy loss values between binary semantic labels and semantic relevance

The specific calculation formula is as follows:

wherein, | | represents the number of all the pixel points belonging to the current set, Σ represents summation, log a_ijRepresents a pair of_ijThe logarithm is calculated, and the logarithm is calculated,

and

are all sets P⁺The subsets of (1) represent the pseudo categories of the pixels in the positive set as the sets of the foreground and the background respectively, the loss of the three subsets is combined after being regularized, and the total loss is

And P^-The contained pixel pairs are unbalanced;

and S53, using the class boundary loss for class boundary network back propagation, updating the weight, and training the class boundary network until convergence to obtain the trained class boundary network.

Preferably, step S6 specifically includes:

s61, predicting the trained class boundary network to generate a second class boundary graph, calculating semantic correlation to obtain a second semantic correlation matrix, and calculating a transition probability matrix through the second semantic correlation matrix;

wherein A represents a second semantic correlation matrix

A^oβRepresents the result of the Hadamard product operation of matrix a and parameter beta,

represents the pair matrix A^oβRegularization is carried out, S^-1Representing the inverse operation on the matrix S, a_ijRepresenting a semantic correlation between pixel i and pixel j;

s62, carrying out random walk propagation on the optimized class activation graph through the transition probability matrix to obtain the propagated class activation graph, wherein the specific formula is as follows:

vec(M^*)＝T^t·vec(M⊙(1-B)) 11)

wherein, the row represents Hadamardproduct operation, the row (·) represents vectorization, t is iteration number, M represents the class activation graph in step S22, the fraction of boundary pixels is punished by multiplying by (1-B), and the boundary pixels do not need to be propagated to adjacent pixels, and finally the propagated class activation graph M is obtained^*。

Preferably, step S7 specifically includes:

s71, thresholding the propagated class activation map, namely, if the activation score of a pixel in the propagated class activation map is larger than a preset foreground threshold, classifying the pixel into a building class, otherwise, classifying the pixel into a background class, and obtaining a building pseudo label;

and S72, inputting the building pseudo label and the high-resolution remote sensing image into a semantic segmentation model, finishing a training process and realizing building extraction.

Preferably, before step S1, the method further includes:

s01, acquiring a high-resolution remote sensing image and a corresponding label map;

and S02, cutting the high-resolution remote sensing image and the corresponding label graph to obtain a remote sensing image and a corresponding label graph with a certain size.

Preferably, the classification network uses ResNet50 as a backbone network, followed by a global averaging pooling layer and a full connectivity layer.

The invention has the beneficial effects that:

compared with the existing image-level weak supervision semantic segmentation method, the method can generate a better building class activation map by using an iterative countermeasure ascent strategy, fully excavates class activation map information under the condition of not introducing additional supervision information, and obtains the class equivalence relation among pixels. In addition, the invention further improves the segmentation performance of the network on the boundary processing by using the gated convolution layer, so that the object region can be effectively expanded and covered in the boundary to generate a high-quality building pseudo label, and the semantic segmentation model can generate a building region with high accuracy and complete segmentation boundary.

Drawings

The invention will be further described with reference to the accompanying drawings and examples, in which:

FIG. 1 is a flowchart of a high-resolution remote sensing image weak supervised building extraction method combining inter-pixel class equivalence relation and gating boundary optimization in the embodiment of the present invention;

FIG. 2 is a process diagram of iterative optimization of class activation graphs in an embodiment of the present invention;

FIG. 3 is a schematic diagram illustrating a class boundary detection process performed by a class boundary network according to an embodiment of the present invention;

FIG. 4 is a circuit diagram of a gated convolutional layer in an embodiment of the present invention;

FIG. 5 is a schematic diagram of a building pseudo tag generation process in an embodiment of the invention;

Detailed Description

For a more clear understanding of the technical features, objects and effects of the present invention, embodiments of the present invention will now be described in detail with reference to the accompanying drawings.

Referring to fig. 1, an embodiment of the present invention provides a method for extracting a high-resolution remote sensing image weakly supervised building by combining an inter-pixel class equivalence relation and gating boundary optimization, including the following steps:

s1, inputting the high-resolution remote sensing image and the corresponding label graph into a classification network for training to obtain a trained classification network, and carrying out iterative confrontation and ascending processing on the high-resolution remote sensing image based on the trained classification network to obtain a processed remote sensing image.

In this embodiment, S1 specifically includes:

and S11, cutting the large-amplitude remote sensing image data and the corresponding label image into tiles with certain sizes, using the tiles as training input, and cutting the large-amplitude remote sensing image data and the corresponding label image according to a certain superposition rate to obtain the cut remote sensing image and the corresponding label image.

S12, inputting the cut remote sensing image and the corresponding label graph into a classification network for training to obtain a trained classification network, carrying out iterative confrontation ascending processing on the cut remote sensing image based on the trained classification network to obtain a processed remote sensing image, and acquiring a remote sensing image x⁰The iterative countervailing rise of (c) can be performed using equation 1):

finger trained classification network pair input remote sensing image x^t-1Is given as a category score, ξ represents the degree of countering the interference,

representing the derivation.

And S2, inputting the processed remote sensing image into the trained classification network to obtain an iterated class activation map, and normalizing the iterated class activation map to generate an optimized class activation map.

In this embodiment, S2 specifically includes:

s21, taking the remote sensing image after the iterative confrontation ascending processing and the corresponding class label thereof as the input of a classification network, wherein the classification network adopts ResNet50 as a backbone network, is connected with a global average pooling layer (GAP) and a full connection layer, and finally obtains an iterative class activation map CAM (x) through calculation of a formula 2)^t)；

s22, activating the class after iteration to obtain a CAM (x)^t) And (4) carrying out normalization to obtain a final optimized class activation map (refer to fig. 2). Specific optimization formula visible formula 3):

S3, obtaining the equivalence relation of the pixels in the optimized class activation image, and dividing all the pixels in the optimized class activation image into a positive set P + and a negative set P according to the equivalence relation of the pixels.

In this embodiment, S3 specifically includes:

s31, comparing the activation scores in the optimized class activation map with foreground and background thresholds, and screening out three areas: foreground confidence area, background confidence area, irrelevant area. Regardless of the irrelevant areas, each confidence area is optimized by using a dense conditional random field to obtain a building confidence area and a background confidence area;

s32, selecting the category corresponding to the maximum value of the classification score of each pixel point in the building confidence region and the background confidence region to obtain a pseudo category map of all the pixel points

S33, matching the pseudo category graph

represents a pseudo-class diagram of the device,

dummy representing pixel iThe class coincides with the pseudo-class of pixel j,

S4, building a class boundary network based on the gated convolution layer, and outputting a first class boundary diagram through the class boundary network.

Referring to fig. 3, in this embodiment, S4 specifically includes:

s41, constructing a feature extraction flow: the feature extraction flow has the same structure as the ResNet50 network and comprises 5 feature extraction layers, each convolution layer in the feature extraction layers is followed by regularization and a ReLU activation function, and r is used_tFeature maps representing the output of the t-th feature extraction layer:

wherein the content of the first and second substances,

s42, constructing gated convolutional layer (see fig. 4): firstly, the characteristic graph r is_tApplication to generating an attention map with channel number 1

The specific calculation formula is as follows:

α_t＝σ(C_1×1(s_t||r_t)) 6)

s43, outputting the attention drawing and the characteristic drawing S of the gated convolution layer_tGo on pointMultiply the result and sum the result with s_tAnd performing residual error connection to obtain the output of the gated convolutional layer, wherein the specific calculation process comprises the following steps:

wherein T represents a transpose operation,

representing a gate control operation, a representing a dot product,

will be the input for the next attention map generation again;

s44, constructing shape flow to obtain a first class boundary diagram

S5, calculating to obtain a first semantic correlation matrix according to the first class boundary graph and according to the positive set P⁺Negative set P^-And calculating the first semantic correlation matrix to obtain class boundary loss, and training the class boundary network according to the class boundary loss to obtain the trained class boundary network.

In this embodiment, S5 specifically includes:

wherein, II_ijRepresentative pixel x_iAnd x_jSet of pixel points on a straight line, x_kRepresents a certain pixel therein, a_ijThe larger, the pixel x is indicated_iAnd x_jHas high semantic relevance, the less the possibility of a boundary between the two, a_ijThe smaller, the pixel x is indicated_iAnd x_jThe higher the probability that a boundary exists between the two is;

The specific calculation formula is as follows:

and

are all sets P⁺Respectively generation ofThe pseudo categories of the pixels in the positive set are represented as a set of a foreground and a background, the loss of the three subsets is combined after being regularized, and the total loss is

And P^-The contained pixel pairs are unbalanced;

S6, outputting a second class boundary diagram through the trained class boundary network, calculating according to the second class boundary diagram to obtain a second semantic correlation matrix, calculating according to the second semantic correlation matrix to obtain a transition probability matrix, and performing random walk propagation on the optimized class activation diagram according to the transition probability matrix to obtain a propagated class activation diagram.

In this embodiment, S6 specifically includes:

wherein A represents a second semantic correlation matrix

vec(M^*)＝T^t·vec(M⊙(1-B)) 11)

And S7, generating a building pseudo label (refer to fig. 5) according to the propagated class activation map, inputting the building pseudo label and the high-resolution remote sensing image into a semantic segmentation model for training, and automatically extracting a building area after training.

In this embodiment, S7 specifically includes:

and S72, inputting the building pseudo label and the high-resolution remote sensing image into a semantic segmentation model, completing a training process, and realizing automatic extraction of the building area.

According to the method, a large number of experiments are carried out on two competition data sets Vaihingen and Potsdam provided by ISPRS in the remote sensing field, and the feasibility and the effectiveness of the technical scheme are effectively proved. Table 1 compares the results of the experiments on the data sets Vaihingen and Potsdam of the method of the invention with other methods:

TABLE 1 comparison of the results of the experiments with other methods on the data sets Vaihingen and Potsdam

The experimental results in table 1 show that the building segmentation indexes mlou on the data sets Vaihingen and Potsdam respectively reach 84.1% and 83.9% by using the method provided by the invention, and compared with the precision of the existing weak supervision building extraction method on the data sets, the best segmentation performance at present is achieved. In addition, compared with the precision of the fully supervised building extraction method, the precision of the method on the two data sets respectively reaches 91.9 percent and 91.7 percent of the fully supervised performance.

While the present invention has been described with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, which are illustrative and not restrictive, and it will be apparent to those skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the invention as defined in the appended claims.

Claims

1. A high-resolution remote sensing image weak supervision building extraction method is characterized by comprising the following steps:

s5, calculating to obtain a first semantic correlation matrix according to the first class boundary graph and according to the positive set P⁺Negative set P^-And the first semanticCalculating a correlation matrix to obtain class boundary loss, and training the class boundary network according to the class boundary loss to obtain a trained class boundary network;

2. The method for extracting the high-resolution remote sensing image weakly supervised building as recited in claim 1, wherein in step S1, the calculation formula for performing the iterative resistance to lift process on the high-resolution remote sensing image is as follows:

representing the derivation.

3. The method for extracting the high-resolution remote sensing image weakly supervised building as recited in claim 1, wherein the step S2 specifically comprises:

4. The method for extracting the high-resolution remote sensing image weakly supervised building as recited in claim 1, wherein the step S3 specifically comprises:

S33, matching the pseudo category graph

s34, dividing the pixel pairs with equivalent categories into positive sets and the pixel pairs with non-equivalent categories into negative sets, thereby dividing all the pixel pairs P into the positive sets P⁺And negative set P^-The concrete division formula is as follows:

represents a pseudo-class diagram of the device,

5. The method for extracting the high-resolution remote sensing image weakly supervised building as recited in claim 1, wherein the step S4 specifically comprises:

wherein the content of the first and second substances,

The specific calculation formula is as follows:

α_t＝σ(C_1×1(s_t||r_t))，

s43, the attention drawing and the gated volumeFeature map s of the laminated output_tPerforming dot product operation, and mixing the operation result with s_tAnd performing residual error connection to obtain the output of the gated convolutional layer, wherein the specific calculation process comprises the following steps:

wherein T represents a transpose operation,

representing a gate control operation, a representing a dot product,

will be the input for the next attention map generation again;

s44, constructing shape flow to obtain a first class boundary diagram

Outputting the feature map r of the first layer in the step S41₁As input to the shape flow branch, three of said gated convolutional layers are then used in a concatenation, the output of each gated convolutional layer will be input to the next gated convolutional layer, the three gated convolutional layers are connected to the third, fourth and last layers of ResNet50, and finally the first class boundary map is output

6. The method for extracting the high-resolution remote sensing image weakly supervised building as recited in claim 1, wherein the step S5 specifically comprises:

therein, II_ijRepresentative pixel x_iAnd x_jSet of pixel points on a straight line, x_kRepresents a certain pixel therein, a_ijThe larger, the pixel x is indicated_iAnd x_jHas high semantic relevance, the less the possibility of a boundary between the two, a_ijThe smaller, the pixel x is indicated_iAnd x_jThe higher the probability that a boundary exists between the two is;

The specific calculation formula is as follows:

wherein, | | represents the number of all the pixels belonging to the current set, Σ represents summation, loga_ijRepresents a pair of_ijThe logarithm is calculated, and the logarithm is calculated,

and

are all subsets of the set P +, and respectively generateThe pseudo categories of the pixels in the positive set are represented as a set of a foreground and a background, the loss of the three subsets is combined after being regularized, and the total loss is

And P^-The contained pixel pairs are unbalanced;

7. The method for extracting the high-resolution remote sensing image weakly supervised building as recited in claim 1, wherein the step S6 specifically comprises:

wherein A represents a second semantic correlation matrix

υec(M^*)＝T^t·vec(M⊙(1-B))，

wherein, the row represents Hadamard product operation, the vec (·) represents vectorization, t represents iteration times, B represents class boundary diagram, and M represents optimized class activation diagram, penalizing fraction of boundary pixel by multiplying by (1-B), not need to be propagated to adjacent pixel, and finally obtaining propagated class activation diagram M^*。

8. The method for extracting the high-resolution remote sensing image weakly supervised building as recited in claim 1, wherein the step S7 specifically comprises:

9. The method for extracting the high-resolution remote sensing image weakly supervised building as recited in claim 1, further comprising, before step S1:

10. The method for extracting the high-resolution remote sensing image weakly supervised building as recited in claim 1, wherein the classification network adopts ResNet50 as a backbone network, and is followed by a global average pooling layer and a full connection layer.