CN113191213B - High-resolution remote sensing image newly-added building detection method - Google Patents


Info

Publication number
CN113191213B
CN113191213B (application CN202110389794.5A)
Authority
CN
China
Prior art keywords
building
image
resolution
phase
time
Prior art date
Legal status
Active
Application number
CN202110389794.5A
Other languages
Chinese (zh)
Other versions
CN113191213A (en)
Inventor
孙希延
肖钰
纪元法
黄建华
付文涛
白杨
郭宁
Current Assignee
Guilin University of Electronic Technology
Original Assignee
Guilin University of Electronic Technology
Priority date
Filing date
Publication date
Application filed by Guilin University of Electronic Technology filed Critical Guilin University of Electronic Technology
Priority to CN202110389794.5A priority Critical patent/CN113191213B/en
Publication of CN113191213A publication Critical patent/CN113191213A/en
Application granted granted Critical
Publication of CN113191213B publication Critical patent/CN113191213B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06V 20/176 — Scenes; terrestrial scenes; urban or other man-made structures
    • G06F 18/24 — Pattern recognition; classification techniques
    • G06N 3/045 — Neural networks; combinations of networks
    • G06N 3/048 — Neural networks; activation functions
    • G06N 3/084 — Learning methods; backpropagation, e.g. using gradient descent
    • G06V 10/267 — Image preprocessing; segmentation of patterns by performing operations on regions, e.g. growing, shrinking or watersheds
    • G06V 10/44 — Feature extraction; local feature extraction, e.g. edges, contours, corners; connectivity analysis
    • G06V 10/462 — Descriptors; salient features, e.g. scale-invariant feature transforms [SIFT]

Abstract

The invention discloses a method for detecting newly added buildings in high-resolution remote sensing images. Superpixel segmentation is first performed on the second time-phase GF2 image to obtain superpixel objects, and a building data set is constructed from that image. The building extraction data set is fed into a multi-scale constrained codec network for training to obtain a second time-phase building binary map; the network adopts a dual-path architecture that captures global and local information separately, and combining the two better distinguishes buildings from complex backgrounds and refines building details. The superpixel objects are then combined with the building binary map to obtain second time-phase building target objects. Finally, a pixel-level change detection result is obtained with the IRMAD algorithm, and a spatial-position overlay analysis based on the building target objects and the pixel-level change detection result realizes the detection of newly added buildings.

Description

High-resolution remote sensing image newly-added building detection method
Technical Field
The invention relates to the technical field of remote sensing image processing, in particular to a method for detecting a newly added building of a high-resolution remote sensing image.
Background
Buildings are important man-made targets in basic geographic databases, and automatic building change detection has long been a research hotspot and difficulty in artificial intelligence, photogrammetry, remote sensing, and related fields. Fast, accurate building change detection is of great significance for urban planning, post-earthquake disaster relief, and similar applications.
There are three main strategies for building change detection: (1) extract buildings from remote sensing images of different periods and then perform change detection on the extraction results; (2) perform change detection directly on building features; (3) perform change detection first and then identify the parts of the changed area that belong to buildings. Change detection performed directly on building extraction results tends to depend excessively on the extraction accuracy, while change detection based directly on building features hinges on the effectiveness of those features, whose stability and applicability still need improvement. The invention therefore adopts a strategy of extracting buildings and detecting changes in parallel: building extraction and whole-scene change detection are completed independently, and the two results are then combined to obtain the building change detection result.
Traditional building extraction methods for remote sensing images express "what a building is" by empirically designing suitable features, and build feature sets for automatic identification and extraction. Common feature indicators include spectrum, length, edges, shape, texture, and shadow, but these vary markedly with season, illumination, atmospheric conditions, sensor quality, scale, building style, and environment. Empirically designed features can only handle specific data and cannot achieve true automation, so deep learning is now applied to building extraction from remote sensing images, replacing hand-crafted features with automatically learned multi-level feature representations. On the other hand, deep-learning-based methods depend strongly on large, high-quality sample databases; with only open-source data sets, theories and methods can be compared quantitatively but the methods cannot be put to practical use.
Disclosure of Invention
The invention aims to provide a method for detecting newly added buildings in high-resolution remote sensing images, so as to solve two technical problems of the prior art: the low accuracy of building feature extraction when deep learning is used, and the absence of a dedicated data set for newly-added-building detection in high-resolution remote sensing imagery.
In order to achieve the purpose, the invention adopts a method for detecting the newly added building by using the high-resolution remote sensing image, which comprises the following steps:
selecting high-resolution GF2 images of two available time phases, retaining the first time-phase high-resolution GF2 image for comparison, and performing superpixel segmentation on the second time-phase high-resolution GF2 image to obtain a superpixel object;
constructing a building extraction dataset using the second temporal high-resolution GF2 images;
inputting the building extraction data set into a multi-scale constraint coding and decoding network for training to obtain a second time-phase building binary image;
the super pixel object is combined with the second time-phase building binary image to obtain a second time-phase building target object;
performing differential processing on the first time-phase high-resolution GF2 image and the second time-phase high-resolution GF2 image to obtain a pixel-level change detection result;
and carrying out spatial position analysis on the pixel level change detection result and the second time-phase building target object to realize detection of the newly added building.
Wherein the construction of the building extraction data set using the second-phase high-resolution GF2 image comprises the following steps:
processing the second time-phase high-resolution GF2 image to obtain a morphological building index gray-scale map;
calculating the mean building index within each superpixel object, setting a threshold, and obtaining suspected building patches after segmentation;
converting the suspected building patches into final building labels through manual modification;
randomly cutting and expanding the high-resolution GF2 image and the final building label to obtain a building extraction data set;
the building extraction data set is divided into a training set, a verification set and a test set.
And in the process of randomly cropping and expanding the high-resolution GF2 image and the final building label to obtain the building extraction data set, the large image is divided into 512 × 512 tiles, and the data are expanded through horizontal, vertical and diagonal flipping.
Wherein the proportion of the training set, the verification set and the test set in the building extraction data set is 6:2:2.
The multi-scale constrained codec network comprises an encoder and a decoder. The encoder consists of a dual-path architecture and multi-scale branches; the dual-path architecture comprises a local information path and a global information path, the local information path uses dilated convolution to extract features, the global information path uses VGG16 to extract features, and the multi-scale branches acquire multi-scale information through different down-sampling factors.
The decoder adopts a multipath feature fusion module, which assigns different weights to feature maps with different receptive fields.
And performing differentiation processing by using an IRMAD algorithm in the process of performing differentiation processing on the first time-phase high-resolution GF2 image and the second time-phase high-resolution GF2 image to obtain a pixel-level change detection result.
And in the process of analyzing the spatial positions of the pixel-level change detection result and the second time-phase building target object to realize detection of the newly added building, the judgment specifically uses the intersection obtained by superimposing the two in spatial position.
The high-resolution remote sensing image newly-added building detection method of the invention performs superpixel segmentation on the second time-phase GF2 image to obtain superpixel objects, and constructs a building data set from that image. The building extraction data set is fed into a multi-scale constrained codec network for training to obtain a second time-phase building binary map; the network adopts a dual-path architecture that captures global and local information separately, and combining the two better distinguishes buildings from complex backgrounds and refines building details. The superpixel objects are combined with the building binary map to obtain second time-phase building target objects. A pixel-level change detection result is then obtained with the IRMAD algorithm, and a spatial-position overlay analysis based on the building target objects and the pixel-level change detection result realizes the detection of newly added buildings. This solves the technical problems of the prior art, namely the low accuracy of building feature extraction with deep learning and the absence of a dedicated data set for newly-added-building detection in high-resolution remote sensing imagery.
Drawings
In order to illustrate the embodiments of the invention or the prior art more clearly, the drawings used in their description are briefly introduced below. Obviously, the drawings described below show only some embodiments of the invention; those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flow chart of a high-resolution remote sensing image newly-added building detection method of the invention.
Fig. 2 is a diagram of a second time-phase image segmentation result according to an embodiment of the invention.
FIG. 3 is a morphological building index feature map of an embodiment of the present invention.
FIG. 4 is a suspected building patch map according to an embodiment of the invention.
Fig. 5 is a final building label diagram of an embodiment of the present invention.
Fig. 6 is a network structure diagram of the multi-scale constraint codec network of the present invention.
FIG. 7 is a schematic diagram of the convolutional expansion of the multi-scale constrained codec network of the present invention.
FIG. 8 is a diagram of a semantic information path structure of the multi-scale constrained codec network of the present invention.
FIG. 9 is a schematic diagram of a feature fusion module of the multi-scale constrained codec network according to the present invention.
Fig. 10 is a second phase building object diagram of an embodiment of the present invention.
FIG. 11 is a diagram of pixel-level change detection results according to an embodiment of the invention.
Fig. 12 is a diagram showing the detection result of the newly added building according to the embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative and are intended to be illustrative of the invention and should not be construed as limiting the invention.
In the description of the present invention, it is to be understood that the terms "length," "width," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," and the like are used in the orientations and positional relationships indicated in the drawings only for the convenience of description and simplicity of description, and are not intended to indicate or imply that the referenced devices or elements must have a particular orientation, be constructed in a particular orientation, and be operated in a particular manner, and thus, are not to be construed as limiting the present invention. In addition, in the description of the present invention, "a plurality" means two or more unless specifically defined otherwise.
Referring to fig. 1, the present invention provides a method for detecting a new building with a high resolution remote sensing image, which includes the following steps:
s1: selecting high-resolution GF2 images of two available time phases, retaining the first time-phase high-resolution GF2 image for comparison, and performing superpixel segmentation on the second time-phase high-resolution GF2 image to obtain a superpixel object;
s2: constructing a building extraction dataset using the second temporal high-resolution GF2 images;
s3: inputting the building extraction data set into a multi-scale constraint coding and decoding network for training to obtain a second time-phase building binary image;
s4: the super pixel object and the second time-phase building binary image are combined to obtain a second time-phase building target object;
s5: performing differential processing on the first time-phase high-resolution GF2 image and the second time-phase high-resolution GF2 image to obtain a pixel-level change detection result;
s6: and carrying out spatial position analysis on the pixel level change detection result and the second time-phase building target object to realize detection of the newly added building.
Constructing a building extraction dataset using the second-phase high-resolution GF2 images, comprising the steps of:
s21, processing the second time-phase high-resolution GF2 image to obtain a morphological building index gray scale image;
s22, calculating the mean building index within each superpixel object, setting a threshold, and segmenting to obtain suspected building patches;
s23, converting the suspected building patches into final building labels through manual modification;
s24, randomly cutting and expanding the high-resolution GF2 image and the final building label to obtain a building extraction data set;
and S25, dividing the building extraction data set into a training set, a verification set and a test set.
And in the process of randomly cropping and expanding the high-resolution GF2 image and the final building label to obtain the building extraction data set, the large image is divided into 512 × 512 tiles, and the data are expanded through horizontal, vertical and diagonal flipping.
The proportion of the training set, the verification set and the test set in the building extraction data set is 6:2:2.
The multi-scale constrained codec network comprises an encoder and a decoder. The encoder is composed of a dual-path architecture and multi-scale branches; the dual-path architecture comprises a local information path and a global information path, the local information path uses dilated convolution to extract features, the global information path uses VGG16 to extract features, and the multi-scale branches acquire multi-scale information through different down-sampling factors.
The decoder adopts a multipath feature fusion module, which assigns different weights to feature maps with different receptive fields.
And performing differentiation processing by using an IRMAD algorithm in the process of performing differentiation processing on the first time-phase high-resolution GF2 image and the second time-phase high-resolution GF2 image to obtain a pixel-level change detection result.
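The IRMAD differencing step is not spelled out in this text. As a hedged sketch, the following implements one unweighted iteration of the scheme — i.e. the classical MAD transform: canonical correlation analysis between the two dates, MAD variates as differences of canonical variates, and a chi-square change score. Full IRMAD would additionally reweight each pixel by its no-change probability and iterate to convergence; the function name and toy data are illustrative only.

```python
import numpy as np

def mad_change(X, Y):
    """One unweighted iteration of the IR-MAD scheme (the classical MAD
    transform) on co-registered images flattened to (bands, pixels)."""
    Xc = X - X.mean(axis=1, keepdims=True)
    Yc = Y - Y.mean(axis=1, keepdims=True)
    n = X.shape[1]
    Sxx, Syy, Sxy = Xc @ Xc.T / n, Yc @ Yc.T / n, Xc @ Yc.T / n
    # canonical vectors from the standard CCA eigenproblem
    M = np.linalg.solve(Sxx, Sxy) @ np.linalg.solve(Syy, Sxy.T)
    rho2, A = np.linalg.eig(M)
    rho2, A = np.real(rho2), np.real(A)
    order = np.argsort(rho2)[::-1]
    rho2, A = rho2[order], A[:, order]
    B = np.linalg.solve(Syy, Sxy.T) @ A
    # normalise canonical variates to unit variance
    A /= np.sqrt(np.diag(A.T @ Sxx @ A))
    B /= np.sqrt(np.diag(B.T @ Syy @ B))
    mad = A.T @ Xc - B.T @ Yc                     # MAD variates
    sigma2 = np.maximum(2.0 * (1.0 - np.sqrt(np.clip(rho2, 0.0, 1.0))), 1e-12)
    return np.sum(mad ** 2 / sigma2[:, None], axis=0)  # chi-square change score

# toy scene: 3 bands, 100 pixels; pixel 0 changes strongly between the dates
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 100))
Y = X + rng.normal(scale=0.05, size=(3, 100))     # mostly unchanged
Y[:, 0] += 5.0                                    # one changed pixel
score = mad_change(X, Y)
```

Pixels with a large chi-square score are the candidate changed pixels that the method then intersects with the building objects.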
And in the process of carrying out spatial position analysis on the pixel-level change detection result and the second time-phase building target object to realize detection of the newly added building, the judgment specifically uses the intersection obtained by superimposing the two in spatial position.
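The spatial overlay can be sketched as follows. The 0.5 overlap ratio below is an assumed decision rule for illustration only, since the text specifies just that the intersection of the two layers is used for the judgment.

```python
import numpy as np
from scipy import ndimage

def new_buildings(change_mask, building_mask, min_overlap=0.5):
    """Overlay analysis: a second-date building object is flagged as newly
    added when the changed-pixel fraction inside it exceeds min_overlap
    (the 0.5 threshold is an assumption, not from the patent)."""
    labels, n = ndimage.label(building_mask)
    out = np.zeros_like(building_mask, dtype=bool)
    for obj in range(1, n + 1):
        obj_mask = labels == obj
        overlap = (change_mask & obj_mask).sum() / obj_mask.sum()
        if overlap >= min_overlap:
            out |= obj_mask
    return out

# toy masks: two building objects; only the left one intersects the change map
buildings = np.zeros((10, 10), bool)
buildings[1:4, 1:4] = True          # object A
buildings[6:9, 6:9] = True          # object B
change = np.zeros((10, 10), bool)
change[0:5, 0:5] = True             # change covers object A only
added = new_buildings(change, buildings)
```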
Referring to fig. 2 to 12, the present invention provides a specific embodiment of a method for detecting a new building with high resolution remote sensing images, which comprises:
the research area is a color-stacking area of Guilin city of the Guangxi Zhuang nationality autonomous area, a typical sub-area of the area is selected for analysis, data come from 2016 and 2018 two-phase GF2 images, the second-phase images, namely the 2018-year GF2 images, are subjected to superpixel segmentation, a data set is constructed, a multi-scale constraint coding and decoding network (MSCNet) extracts a second-phase building target object, the two-phase image images, namely the 2016 and 2018-year GF2 images, are subjected to change detection to obtain a pixel level change detection result, and the pixel level change detection result and the building target object are subjected to spatial position analysis to realize new building detection.
(A) Superpixel segmentation
The Mean Shift algorithm comprises two steps: initial superpixel segmentation and merging of the segmented regions.
The initial superpixel segmentation is realized through a mode-point (mode-seeking) search, as follows: (1) set the kernel-function bandwidths of the coordinate space and the spectral space; (2) compute the Mean-Shift vector using a Gaussian kernel; (3) check whether the magnitude of the vector exceeds the specified threshold, and iterate accordingly to locate the mode points.
The merging of the segmentation regions is to merge the spatially adjacent and spectrally similar regions into the same object after the primary segmentation of the superpixel is completed, so as to implement image segmentation, and the segmentation result is shown in fig. 2.
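The mode-seeking iteration described above can be sketched in a few lines of NumPy. This is a toy illustration on a synthetic two-dimensional feature space, not the segmentation code of the invention; in practice the feature vector concatenates pixel coordinates and spectral values, each with its own kernel bandwidth, and the region-merging pass follows.

```python
import numpy as np

def mean_shift_modes(features, bandwidth=1.0, tol=1e-3, max_iter=100):
    """Shift every feature vector toward its local density mode with a
    Gaussian kernel; iteration stops when the largest shift magnitude
    falls below tol (step (3) in the text)."""
    points = features.astype(float).copy()
    for _ in range(max_iter):
        shifted = np.empty_like(points)
        for i, p in enumerate(points):
            d2 = np.sum((features - p) ** 2, axis=1)
            w = np.exp(-d2 / (2.0 * bandwidth ** 2))  # Gaussian kernel weights
            shifted[i] = (w[:, None] * features).sum(0) / w.sum()
        delta = np.linalg.norm(shifted - points, axis=1).max()
        points = shifted
        if delta < tol:
            break
    return points

# two well-separated clusters in a toy (x, y) feature space
rng = np.random.default_rng(0)
feats = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(5, 0.1, (20, 2))])
modes = mean_shift_modes(feats, bandwidth=0.5)
```

Points converge to the two density modes; pixels sharing a mode form one initial superpixel before the merging step.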
(B) Building a data set
(1) Calculating a morphological building index feature map:
the Morphological Building Index (MBI) is based on the large spectral variation at the edge of the building and the small spectral variation at its interior. The index is constructed in consideration of the shape, direction, brightness, contrast, and other characteristics of the building. The steps for establishing MBI are as follows:
1) Calculating a luminance value
b(x) = \max_{1 \le k \le K} \mathrm{band}_k(x)    (1)
In the formula: K is the number of visible-light bands and \mathrm{band}_k(x) is the value of the k-th band at pixel x. The maximum over the visible bands is taken as the brightness of each pixel because the visible bands carry most of the spectral information of buildings.
2) Morphological white top-hat reconstruction
\mathrm{W\text{-}TH}(d,s) = b - \gamma_b^{re}(d,s)    (2)
In the formula: \gamma_b^{re}(d,s) denotes the morphological opening-by-reconstruction of the brightness image b, and d and s are the direction and scale of the linear structuring element. Because the spectral responses of buildings and roads are similar, and roads generally extend along only one or two directions while buildings span many, linear structuring elements with multiple directions and scales are chosen to separate roads from buildings.
3) Calculating the morphological Profile MP
\mathrm{MP}_{\mathrm{W\text{-}TH}}(d,s) = \mathrm{W\text{-}TH}(d,s)    (3)
4) Calculating a differential morphology Profile DMP
\mathrm{DMP}_{\mathrm{W\text{-}TH}}(d,s) = \left| \mathrm{MP}_{\mathrm{W\text{-}TH}}(d, s+\Delta s) - \mathrm{MP}_{\mathrm{W\text{-}TH}}(d,s) \right|    (4)
In the formula: S_{\min} \le s \le S_{\max}.
5) Calculating a morphological building index MBI
\mathrm{MBI} = \frac{\sum_{d,s} \mathrm{DMP}_{\mathrm{W\text{-}TH}}(d,s)}{D \times S}    (5)
In the formula: S = (S_{\max} - S_{\min}) / \Delta s + 1, and D is the number of directions used when computing the profiles.
The rationale for the MBI is that the differential morphological profiles of buildings have high local contrast, so pixels with large MBI values appear as buildings, as shown in FIG. 3.
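Steps 1) through 5) can be sketched as follows. This is a hedged simplification: it uses SciPy's plain grey-scale opening (`scipy.ndimage.grey_opening`) in place of the opening-by-reconstruction called for by the white top-hat, and toy scale parameters, so the values only approximate the true MBI.

```python
import numpy as np
from scipy import ndimage

def linear_footprint(length, direction):
    """Linear structuring element in one of four directions."""
    if direction == 'h':
        return np.ones((1, length), bool)
    if direction == 'v':
        return np.ones((length, 1), bool)
    eye = np.eye(length, dtype=bool)
    return eye if direction == 'd1' else eye[:, ::-1]

def mbi(bands, s_min=3, s_max=11, ds=2, directions=('h', 'v', 'd1', 'd2')):
    b = bands.max(axis=0)            # (1) brightness: max over visible bands
    scales = list(range(s_min, s_max + 1, ds))
    # (2) white top-hat per direction/scale (plain opening stands in for
    #     opening-by-reconstruction here, as a simplification)
    wth = {(d, s): b - ndimage.grey_opening(b, footprint=linear_footprint(s, d))
           for d in directions for s in scales}
    # (3)-(4) morphological profiles and their differentials
    dmp_sum = sum(np.abs(wth[(d, scales[k + 1])] - wth[(d, scales[k])])
                  for d in directions for k in range(len(scales) - 1))
    S = (s_max - s_min) // ds + 1
    return dmp_sum / (len(directions) * S)   # (5) MBI

# toy image: a bright 10x10 "building" on a dark background, 3 bands
img = np.zeros((3, 40, 40))
img[:, 15:25, 15:25] = 1.0
score = mbi(img)
```

On the toy scene the building interior scores higher than the flat background, matching the intuition that large MBI values indicate buildings.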
(2) The MBI mean within each superpixel is calculated; when the mean exceeds a specified threshold, the superpixel is regarded as a suspected building patch, producing Fig. 4, and the final building labels are generated after manual modification, as in Fig. 5.
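The per-superpixel thresholding just described can be sketched with `np.bincount`; the segment labels, MBI values and threshold below are toy stand-ins, not values from the patent.

```python
import numpy as np

def suspected_buildings(mbi_map, segments, threshold):
    """Mean MBI per superpixel via bincount; superpixels whose mean
    exceeds the threshold become suspected building patches."""
    n = segments.max() + 1
    sums = np.bincount(segments.ravel(), weights=mbi_map.ravel(), minlength=n)
    counts = np.bincount(segments.ravel(), minlength=n)
    means = sums / np.maximum(counts, 1)
    return means[segments] > threshold        # binary suspected-building mask

# toy example: two superpixels, only segment 1 has a high mean MBI
segments = np.array([[0, 0, 1, 1],
                     [0, 0, 1, 1]])
mbi_map = np.array([[0.1, 0.2, 0.8, 0.9],
                    [0.0, 0.1, 0.7, 1.0]])
mask = suspected_buildings(mbi_map, segments, threshold=0.5)
```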
(3) Random cropping and expansion: the invention crops the preprocessed images and label samples randomly, divides the large image into 512 × 512 tiles, expands the data through horizontal, vertical and diagonal flipping, and finally divides them into a training set, a validation set and a test set at a ratio of 6:2:2 for subsequent training and test evaluation.
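A minimal sketch of the cropping, flipping and 6:2:2 split might look as follows. The tile and image sizes are shrunk from the 512 × 512 of the text so the example stays small, and a non-overlapping tiling stands in for the random cropping.

```python
import numpy as np

def tile(image, label, size):
    """Cut an image/label pair into non-overlapping size x size tiles."""
    h, w = image.shape[:2]
    pairs = []
    for y in range(0, h - size + 1, size):
        for x in range(0, w - size + 1, size):
            pairs.append((image[y:y+size, x:x+size], label[y:y+size, x:x+size]))
    return pairs

def augment(img, lab):
    """Horizontal, vertical and diagonal (transpose) flips."""
    return [(img, lab),
            (np.flip(img, 1), np.flip(lab, 1)),        # horizontal
            (np.flip(img, 0), np.flip(lab, 0)),        # vertical
            (img.swapaxes(0, 1), lab.swapaxes(0, 1))]  # diagonal

def split(samples, ratios=(6, 2, 2), seed=0):
    """Shuffle and split into train/val/test at the 6:2:2 ratio."""
    idx = np.random.default_rng(seed).permutation(len(samples))
    total = sum(ratios)
    a = len(samples) * ratios[0] // total
    b = a + len(samples) * ratios[1] // total
    return ([samples[i] for i in idx[:a]],
            [samples[i] for i in idx[a:b]],
            [samples[i] for i in idx[b:]])

image = np.zeros((32, 32, 3))
label = np.zeros((32, 32), np.uint8)
tiles = tile(image, label, 8)                       # 16 tiles
samples = [p for t in tiles for p in augment(*t)]   # 64 after augmentation
train, val, test = split(samples)
```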
(C) Building extraction with the multi-scale constrained codec network (MSCNet)
The network includes an encoder and a decoder; the encoder is built from a dual-path architecture and multi-scale branches. The dual-path architecture handles local and global information separately, along what are defined here as the local information path and the global information path. Specifically, the local information path adopts dilated convolution to counter the resolution loss of down-sampling and retains more local information. The global information path uses VGG16 to extract features, enlarging the receptive field to obtain global information; multi-scale branches are also designed on this path, acquiring multi-scale information through different down-sampling factors.
A dedicated feature fusion module is introduced in the decoder to fuse the features obtained above, and two strategies constrain the parameter updates on each path. First, a constraint (here, an optimization objective between the prediction of a path and the corresponding ground truth) is added after each path in the up-sampling step. Updating the parameters through multi-path constraints lets multi-resolution labels participate in back-propagation and weight updating, prevents the parameters from being biased toward a single constraint, and further strengthens the feature representation of each path. Specifically, a 1 × 1 convolutional layer followed by a sigmoid is used to obtain a dense prediction from the feature map at each resolution, and the corresponding label is down-sampled from the ground truth by bilinear interpolation; the overall loss is then used so that the network learns to integrate the multi-resolution label maps into the final prediction. Second, the final feature maps of all paths are fused into a feature map of 4 channels, from which the final prediction is obtained through a 1 × 1 convolution and a sigmoid. Under this strategy the final prediction aggregates the multi-path features, the multi-path constraints regularize the parameter updates, and every path plays an active role in back-propagation and model training. A schematic of the network structure is shown in Fig. 6.
(1) An encoder:
In order to preserve the original input image size and encode rich local information while maintaining an adequate receptive field, the algorithm uses dilated convolution in the local information path. The path contains three layers: the first two each comprise a stride-1 convolution followed by batch normalization and ReLU, and the third is a dilated convolution. The output feature map of this path therefore keeps the size of the original image, and because its spatial size is large it encodes rich local information. An ordinary convolution is expressed as
F(x,y) = \sum_{i=0}^{h-1} \sum_{j=0}^{w-1} O(x+i,\, y+j) \cdot H(i,j)    (6)
where O(x,y) is the pixel value of the input image at point (x,y) and H is the w \times h convolution kernel.
The dilated convolution is computed as
F(x,y) = \sum_{i} \sum_{j} O(x+i,\, y+j) \cdot H'(i,j)    (7)
where l is the dilation factor and H' is the dilated kernel obtained by inserting l-1 zeros between adjacent taps of H.
As equations (6) and (7) show, the dilated convolution is equivalent to filling the convolution kernel with zeros, which enlarges the receptive field of the kernel while retaining the original pixel information and the feature-map resolution. If the kernel size is k and the dilation rate is l, the effective size of the dilated kernel is k + (k − 1) × (l − 1). Compared with an ordinary convolution of the same size, the dilated convolution not only enlarges the receptive field but also maintains the same resolution, as illustrated schematically in Fig. 7.
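The zero-filling view of dilation and the effective-size formula k + (k − 1) × (l − 1) can be checked directly in NumPy; the helper names below are illustrative, not from the patent.

```python
import numpy as np

def dilate_kernel(H, l):
    """Insert l-1 zeros between kernel taps: the zero-filled kernel H'."""
    k = H.shape[0]
    Hp = np.zeros((k + (k - 1) * (l - 1),) * 2)
    Hp[::l, ::l] = H
    return Hp

def conv2d_valid(O, H):
    """Plain 'valid' sliding-window correlation, as in equation (6)."""
    kh, kw = H.shape
    oh, ow = O.shape[0] - kh + 1, O.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for y in range(oh):
        for x in range(ow):
            out[y, x] = np.sum(O[y:y+kh, x:x+kw] * H)
    return out

k, l = 3, 2
H = np.arange(1.0, 10.0).reshape(3, 3)
Hp = dilate_kernel(H, l)
effective = k + (k - 1) * (l - 1)   # 3 + 2*1 = 5
# dilated convolution == ordinary convolution with the zero-filled kernel
O = np.random.default_rng(1).random((8, 8))
dilated = conv2d_valid(O, Hp)
```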
The global information path uses VGG16 to extract features, enlarging the receptive field to capture global information; multi-scale branches are designed on this path, acquiring multi-scale information through different down-sampling factors. Specifically, after block 1 the path splits into three parallel, independent branches that rapidly down-sample the feature map to obtain large receptive fields. Each branch encodes features at a particular resolution, making full use of semantic information at different scales. Referring to Fig. 8, which gives a simple representation of the global information path and the multi-scale branch structure, let N_{s,r} denote the feature layer of a stage, where s indexes the branch and r the number of down-sampling steps; the resolution of the feature map of the s-th branch is the original size divided by 2^r. The full-resolution original image is the input to block 1; after the parallel branches, the output feature maps N_{1,2}, N_{2,3} and N_{3,4} are 1/4, 1/8 and 1/16 of the original size, respectively.
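Under the stated resolutions, the branch outputs relate to the input as sketched below; average pooling here is only a stand-in for the strided convolutions an actual implementation would use, and the channel count is arbitrary.

```python
import numpy as np

def avg_pool(x, factor):
    """Average-pool an (H, W, C) feature map by an integer factor."""
    h, w, c = x.shape
    return x[:h - h % factor, :w - w % factor].reshape(
        h // factor, factor, w // factor, factor, c).mean(axis=(1, 3))

x = np.random.default_rng(0).random((64, 64, 16))  # stand-in for block-1 output
N12 = avg_pool(x, 4)    # N_{1,2}: 1/4 of the original size
N23 = avg_pool(x, 8)    # N_{2,3}: 1/8
N34 = avg_pool(x, 16)   # N_{3,4}: 1/16
```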
(2) A decoder:
Buildings of different scales call for different receptive-field sizes. For larger buildings, features acquired with a large receptive field are important, while for smaller ones a large receptive field may capture excessive surrounding context and introduce errors. Traditional feature fusion is usually concatenation or addition; such simple schemes ignore the different receptive fields of the feature maps and neglect the specificity among features. In contrast, the multipath feature fusion module adopted by the invention assigns different weights to feature maps with different receptive fields and achieves better fusion, as shown in Fig. 9.
First, two or more input feature maps are concatenated along the channel dimension. Second, the concatenated maps pass through 3 × 3 convolution kernels for a preliminary fusion of feature-map information, and a global pooling operation extracts the overall information of each feature map. Then a 1 × 1 convolution lets the network learn weights from the overall information of each map. Finally, a sigmoid function yields the final weights, which are multiplied with the original feature maps. Through this fusion module, weights are assigned to feature maps with different receptive fields, reflecting the specificity of features at each receptive field and fusing the features better.
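The weighting idea can be sketched in a few lines of numpy. This is a simplified illustration under stated assumptions: the convolution weights are random stand-ins rather than trained values, and the 3 × 3 preliminary convolution is omitted for brevity.

```python
import numpy as np

# Minimal numpy sketch of the multipath fusion idea: channel concatenation ->
# global average pooling -> learned 1x1 weighting -> sigmoid -> reweighting.
rng = np.random.default_rng(0)

def fuse(maps):
    """maps: list of (C, H, W) feature maps with equal H and W."""
    x = np.concatenate(maps, axis=0)            # concat along channel dimension
    c = x.shape[0]
    pooled = x.mean(axis=(1, 2))                # global average pooling -> (C,)
    w_1x1 = rng.standard_normal((c, c)) * 0.1   # stand-in for 1x1 conv weights
    weights = 1.0 / (1.0 + np.exp(-(w_1x1 @ pooled)))  # sigmoid -> (0, 1)
    return x * weights[:, None, None]           # reweight each channel

fused = fuse([rng.random((8, 32, 32)), rng.random((8, 32, 32))])
print(fused.shape)  # (16, 32, 32)
```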
Since the target output is a binary classification of building and non-building pixels, the sigmoid function is chosen to generate the prediction for each layer:
y_{i,j} = σ(w · x_{i,j} + b)

σ(z) = 1 / (1 + e^{−z})

where w ∈ R^c and b ∈ R^1 represent the weight and the bias, respectively. The prediction y_{i,j} is limited to [0, 1].
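The per-pixel sigmoid prediction can be sketched with numpy; the feature map, weight vector and bias below are random stand-ins for illustration, not values from the patent.

```python
import numpy as np

# Per-pixel prediction y_{i,j} = sigmoid(w . x_{i,j} + b) over a (c, h, w)
# feature map, producing a probability in (0, 1) at every pixel.
rng = np.random.default_rng(1)
c, h, w_ = 4, 8, 8
features = rng.standard_normal((c, h, w_))
weight = rng.standard_normal(c)
bias = 0.0

logits = np.tensordot(weight, features, axes=1) + bias   # shape (h, w)
pred = 1.0 / (1.0 + np.exp(-logits))                     # each y_{i,j} in (0, 1)
print(pred.shape)  # (8, 8)
```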
For better convergence during training iterations, binary cross-entropy is chosen, rather than a simple mean squared error (MSE), to compute the k-th constraint C_k between each prediction and the corresponding ground truth. The formula is:
C_k = −(1 / (h_k · w_k)) Σ_{i=1}^{h_k} Σ_{j=1}^{w_k} [ g^k_{i,j} log y^k_{i,j} + (1 − g^k_{i,j}) log(1 − y^k_{i,j}) ]

where h_k and w_k are the height and width of the k-th prediction y^k and ground truth g^k. g^k_{i,j} has a value of 1 if the observation belongs to class 1, and 0 otherwise; y^k_{i,j} is the predicted probability that the pixel belongs to class 1.
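The per-layer constraint above is a standard pixel-averaged binary cross-entropy; a minimal numpy sketch with toy prediction and ground-truth maps:

```python
import numpy as np

# Binary cross-entropy constraint C_k, averaged over all h_k x w_k pixels.
def constraint(pred, gt, eps=1e-12):
    pred = np.clip(pred, eps, 1.0 - eps)   # avoid log(0)
    return -np.mean(gt * np.log(pred) + (1.0 - gt) * np.log(1.0 - pred))

gt = np.array([[1.0, 0.0], [0.0, 1.0]])
pred = np.full((2, 2), 0.5)                # maximally uncertain prediction
print(round(constraint(pred, gt), 4))      # 0.6931, i.e. -log(0.5)
```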
The MSCNet model generates a pyramid of feature maps layer by layer through sequential convolution and upsampling blocks and skip connections.
In strategy one, for each feature layer in the feature pyramid, a single 1 × 1 convolution kernel followed by sigmoid activation is applied to generate a prediction for that layer. The constraint of each layer can then be computed between that prediction and the associated ground truth. Ordered by distance from the final convolution layer, these constraints are denoted C_main, C′_main, C_sub1, C_sub2 and C_sub3. Thus, the final loss for MSCNet strategy one can be expressed as:
Loss = α·C_main + γ·C_sub1 + λ·C_sub2 + σ·C_sub3    (11)
wherein the sum of α, γ, λ and σ is set to 1.0.
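The strategy-one loss is a convex combination of the per-layer constraints; the coefficient values in this sketch are illustrative assumptions, not values given in the patent.

```python
# Deep-supervision loss of strategy one: a weighted sum of the per-layer
# constraints, with the four coefficients constrained to sum to 1.0.
def strategy_one_loss(c_main, c_sub1, c_sub2, c_sub3,
                      alpha=0.4, gamma=0.3, lam=0.2, sigma=0.1):
    assert abs(alpha + gamma + lam + sigma - 1.0) < 1e-9  # weights sum to 1.0
    return alpha * c_main + gamma * c_sub1 + lam * c_sub2 + sigma * c_sub3

print(strategy_one_loss(1.0, 1.0, 1.0, 1.0))  # 1.0 when all constraints are equal
```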
For strategy two, the feature layers in the feature pyramid are fused into a feature map containing 4 channels, and the final prediction map is obtained through a 1 × 1 convolution and a sigmoid function. Thus, the final loss for MSCNet strategy two can be expressed as:
Loss′ = C′_main    (12)
All layers are trained by mini-batch stochastic gradient descent (SGD) and back-propagation (BP) to minimize the final loss, so the MSCNet model learns to map the input multichannel remote sensing images to equal-size binary segmentation maps. Finally, the predicted binary image is combined with the superpixel objects to obtain the building target objects, as shown in FIG. 10.
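One simple way to combine a binary prediction with superpixel objects is a per-superpixel majority vote; the exact combination rule is not specified here, so this is a hedged sketch of one plausible scheme, with toy arrays.

```python
import numpy as np

# Hypothetical combination rule: a superpixel is kept as a building object
# when the majority of its pixels are predicted as building.
def superpixel_vote(binary_map, labels):
    """binary_map: (H, W) 0/1 prediction; labels: (H, W) superpixel ids."""
    out = np.zeros_like(binary_map)
    for sp in np.unique(labels):
        mask = labels == sp
        if binary_map[mask].mean() > 0.5:
            out[mask] = 1
    return out

labels = np.array([[0, 0, 1, 1], [0, 0, 1, 1]])
pred = np.array([[1, 1, 0, 1], [1, 0, 0, 0]])
print(superpixel_vote(pred, labels))  # superpixel 0 -> building, 1 -> background
```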
(D) IRMAD-based pixel level change detection
A difference image of the two time-phase remote sensing images is obtained with the IRMAD algorithm; a suitable threshold is then selected, and the pixels of the difference image are divided into changed and unchanged pixels. The basic principle of IRMAD-based pixel-level change detection is as follows: a random variable T related to the MAD components is introduced, and the pixels are iteratively re-weighted through a chi-square distribution probability function, so that unchanged pixels obtain larger weights during the iterations; the next iteration then runs with the new weights until convergence, producing a difference map in which brighter areas are more likely to have changed; finally, a threshold is applied to judge whether each pixel has changed, generating the pixel-level change detection result, please refer to FIG. 11.
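The re-weighting idea can be illustrated with a heavily simplified toy: the sum of squared, variance-normalised band differences stands in for the true MAD variate T, and for two bands the chi-square tail probability has the closed form P(χ²₂ > T) = exp(−T/2). This is a sketch of the weighting principle only, not the full IRMAD canonical-correlation computation.

```python
import numpy as np

# Toy IRMAD-style re-weighting for a simulated 2-band image pair: unchanged
# pixels (small T) keep weights near 1, changed pixels are driven toward 0.
rng = np.random.default_rng(2)
img1 = rng.normal(size=(2, 16, 16))
img2 = img1 + rng.normal(scale=0.1, size=(2, 16, 16))
img2[:, :4, :4] += 3.0                        # simulate a changed region

diff = img1 - img2
weights = np.ones((16, 16))
for _ in range(5):                            # a few weighting iterations
    var = np.average(diff.reshape(2, -1) ** 2, axis=1,
                     weights=weights.ravel())  # weighted band variances
    T = (diff ** 2 / var[:, None, None]).sum(axis=0)
    weights = np.exp(-T / 2.0)                # chi-square(2) tail probability

print(weights[:4, :4].mean() < weights[8:, 8:].mean())  # changed block downweighted
```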
(E) The specific discrimination process of the spatial position superposition method is as follows:
(1) Judge the intersection between each building object extracted from the later-phase image and the changed pixels, as follows:

n_i = | B_i^{t2} ∩ C |

where B_i^{t2} denotes the i-th building object of the later-phase image, C is the pixel-level change detection result, and n_i is the number of pixels contained in the intersection of the object and the change detection result.

(2) According to the size of n_i, a rule is set to judge newly added buildings: if n_i exceeds half the area of the building object B_i^{t2}, the object is a newly added building; otherwise, the building object is unchanged. FIG. 12 shows the detection result of the newly added buildings.
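The spatial-overlay rule above reduces to counting, per object, the pixels that fall inside the changed-pixel mask; a minimal sketch with toy label and change arrays:

```python
import numpy as np

# An object is flagged as a newly added building when more than half of its
# pixels lie inside the changed-pixel mask.
def new_buildings(objects, change_mask):
    """objects: (H, W) int labels (0 = background); change_mask: (H, W) 0/1."""
    added = []
    for obj in np.unique(objects):
        if obj == 0:
            continue
        mask = objects == obj
        n_i = int(change_mask[mask].sum())    # intersection pixel count
        if n_i > mask.sum() / 2:              # more than half of the object area
            added.append(int(obj))
    return added

objects = np.array([[1, 1, 0], [1, 1, 2], [0, 2, 2]])
change = np.array([[1, 1, 0], [1, 0, 0], [0, 0, 1]])
print(new_buildings(objects, change))  # [1]: object 1 has 3 of 4 pixels changed
```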
While the invention has been described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention.

Claims (6)

1. A method for detecting a newly added building by using a high-resolution remote sensing image is characterized by comprising the following steps:
selecting available high-resolution GF2 images of two time phases, retaining the first time-phase high-resolution GF2 image for comparison, and performing superpixel segmentation on the second time-phase high-resolution GF2 image to obtain superpixel objects;
constructing a building extraction data set by using the second time-phase high-resolution GF2 image;
inputting the building extraction data set into a multi-scale constraint coding and decoding network for training to obtain a second time-phase building binary image;
the multi-scale constraint coding and decoding network comprises a coder and a decoder, wherein the coder consists of a dual-path architecture and multi-scale branches, the dual-path architecture comprises a local information path and a global information path, the local information path uses expansion convolution to extract features, the global information path adopts VGG16 to extract features, and the multi-scale branches acquire multi-scale information through different down-sampling multiples;
the decoder adopts a multipath feature fusion module which allocates different weights to feature maps of different perception fields;
the super pixel object is combined with the second time-phase building binary image to obtain a second time-phase building target object;
performing differential processing on the first time-phase high-resolution GF2 image and the second time-phase high-resolution GF2 image to obtain a pixel-level change detection result;
and carrying out spatial position analysis on the pixel level change detection result and the second time-phase building target object to realize detection of the newly added building.
2. The method for detecting the newly added building by using the high-resolution remote sensing image as claimed in claim 1, wherein the building extraction data set is constructed by using the second-time-phase high-resolution GF2 image, and the method comprises the following steps:
processing the second time-phase high-resolution GF2 image to obtain a morphological building index gray-scale map;
calculating the building index mean value in the super-pixel object, setting a threshold value and obtaining a suspected building pattern spot after segmentation;
the suspected building pattern spots are converted into final building labels through manual modification;
randomly cutting and expanding the high-resolution GF2 image and the final building label to obtain a building extraction data set;
the building extraction data set is divided into a training set, a verification set and a test set.
3. The method for detecting the newly added building by using the high-resolution remote sensing image according to claim 2, wherein in the process of randomly cutting and expanding the high-resolution GF2 image and the final building label to obtain the building extraction data set, the large-size image is cut into tiles of the specified size 512 × 512, and data expansion is carried out through horizontal flipping, vertical flipping and diagonal flipping.
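The tiling-plus-flip expansion in claim 3 can be sketched as follows; the helper name and non-overlapping crop layout are illustrative assumptions.

```python
import numpy as np

# Cut an image into 512x512 tiles, then expand each tile with horizontal,
# vertical and diagonal (both-axis) flips, quadrupling the data set.
def tile_and_augment(image, tile=512):
    tiles = []
    h, w = image.shape[:2]
    for y in range(0, h - tile + 1, tile):
        for x in range(0, w - tile + 1, tile):
            t = image[y:y + tile, x:x + tile]
            tiles += [t, t[:, ::-1], t[::-1, :], t[::-1, ::-1]]
    return tiles

img = np.arange(1024 * 1024).reshape(1024, 1024)
tiles = tile_and_augment(img)
print(len(tiles), tiles[0].shape)  # 16 tiles: 4 crops x 4 orientations, each 512x512
```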
4. The method for detecting the newly added building in the high-resolution remote sensing image as claimed in claim 3, wherein the proportion of the training set, the verification set and the test set in the building extraction data set is 6:2:2.
5. The method for detecting the newly added building by using the high-resolution remote sensing image according to claim 4, wherein an IRMAD algorithm is used for the differencing in the process of obtaining a pixel-level change detection result by differencing the first time-phase high-resolution GF2 image and the second time-phase high-resolution GF2 image.
6. The method for detecting the newly added building by using the high-resolution remote sensing image as set forth in claim 5, wherein in the process of analyzing the spatial position of the pixel-level change detection result and the second time-phase building target object to detect the newly added building, the judgment is made from the intersection obtained by superimposing the pixel-level change detection result on the spatial position of the second time-phase building target object.
CN202110389794.5A 2021-04-12 2021-04-12 High-resolution remote sensing image newly-added building detection method Active CN113191213B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110389794.5A CN113191213B (en) 2021-04-12 2021-04-12 High-resolution remote sensing image newly-added building detection method

Publications (2)

Publication Number Publication Date
CN113191213A CN113191213A (en) 2021-07-30
CN113191213B true CN113191213B (en) 2023-01-03

Family

ID=76975505



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant