CN113269717A - Building detection method and device based on remote sensing image - Google Patents

Building detection method and device based on remote sensing image

Info

Publication number
CN113269717A
Authority
CN
China
Prior art keywords
remote sensing
sensing image
building
building detection
real
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110382233.2A
Other languages
Chinese (zh)
Inventor
魏永明
高锦风
陈玉
李剑南
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Aerospace Information Research Institute of CAS
Original Assignee
Aerospace Information Research Institute of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Aerospace Information Research Institute of CAS filed Critical Aerospace Information Research Institute of CAS
Priority to CN202110382233.2A priority Critical patent/CN113269717A/en
Publication of CN113269717A publication Critical patent/CN113269717A/en
Pending legal-status Critical Current

Classifications

    • G06T 7/00 Image analysis; G06T 7/0002 Inspection of images, e.g. flaw detection
    • G06N 3/02 Neural networks; G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods; G06N 3/084 Backpropagation, e.g. using gradient descent
    • G06T 7/10 Segmentation; G06T 7/11 Region-based segmentation
    • G06T 2207/10 Image acquisition modality; G06T 2207/10032 Satellite or aerial image; Remote sensing
    • G06T 2207/20 Special algorithmic details; G06T 2207/20021 Dividing image into blocks, subimages or windows
    • G06T 2207/20 Special algorithmic details; G06T 2207/20081 Training; Learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a building detection method and a device based on a remote sensing image, wherein the method comprises the following steps: inputting a remote sensing image to be detected into a first network in a building detection model, and outputting the characteristics of the remote sensing image to be detected; inputting the characteristics of the remote sensing image to be detected into a second network in the building detection model, and outputting a prediction frame of a building in the remote sensing image to be detected; wherein the first network comprises a plurality of portions, each portion comprising a downsampling layer, and one or more SE-ResNeXt layers; the building detection model is obtained by training by taking a remote sensing image sample containing a building as a sample and taking a real frame of the building in the remote sensing image sample as a label. The invention improves the recall rate of building detection, so that the building detection is more accurate.

Description

Building detection method and device based on remote sensing image
Technical Field
The invention relates to the technical field of image processing, in particular to a building detection method and device based on remote sensing images.
Background
The detection of specific buildings such as gas stations, schools and airports is of great importance in smart-city and military applications. Although traditional surveying and mapping techniques are highly precise, they are time-consuming and labor-intensive, their update cycles are long, and they cannot keep pace with rapidly changing urban construction.
With the rapid development of sensor and aerospace technologies, the temporal, spatial and spectral resolution of remote sensing images keeps improving. Remote sensing can obtain detailed information about ground features in a short time, which makes it possible to detect a given type of building from a remote sensing image.
Traditionally, the detection of a particular building in a remote sensing image has been based primarily on hand-crafted features such as corners, edges and textures. Although methods based on these features are easy to understand, their detection accuracy is often low because hand-crafted features carry limited information and lack spatial structure information. Furthermore, these methods transfer poorly and are difficult to generalize across different types of buildings.
Although a convolutional neural network (CNN) can mine spatial structure information effectively and generalizes well thanks to its automatic learning mechanism, existing CNN-based approaches still suffer from low recall and inaccurate detection of buildings.
Disclosure of Invention
The invention provides a building detection method and device based on remote sensing images, which address the low recall rate and inaccurate detection of prior-art remote sensing building detection and achieve accurate detection of buildings in remote sensing images.
The invention provides a building detection method based on a remote sensing image, which comprises the following steps:
inputting a remote sensing image to be detected into a first network in a building detection model, and outputting the characteristics of the remote sensing image to be detected;
inputting the characteristics of the remote sensing image to be detected into a second network in the building detection model, and outputting a prediction frame of a building in the remote sensing image to be detected;
wherein the first network comprises a plurality of portions, each portion comprising a downsampling layer, and one or more SE-ResNeXt layers;
the building detection model is obtained by training by taking a remote sensing image sample containing a building as a sample and taking a real frame of the building in the remote sensing image sample as a label.
According to the building detection method based on the remote sensing image provided by the invention, each SE-ResNeXt layer comprises a preset number of weight-sharing transformation layers;
each transformation layer comprises two CBL structures;
each CBL structure comprises a convolution layer, a batch normalization layer and a LeakyReLU layer;
wherein the convolution kernels of the convolution layers in the two CBL structures have different sizes.
According to the building detection method based on the remote sensing image, each SE-ResNeXt layer further comprises an SEnet layer, and each SE-ResNeXt layer uses one skip connection.
According to the building detection method based on the remote sensing image provided by the invention, the remote sensing image to be detected is input into a first network in a building detection model, and the characteristics of the remote sensing image to be detected are output, and the method also comprises the following steps:
calculating a first intersection ratio between a prediction frame and a real frame according to the overlapping area between the prediction frame and the real frame of the building in the remote sensing image sample, the distance between the center point of the prediction frame and the center point of the real frame, and the consistency between the aspect ratio of the prediction frame and the aspect ratio of the real frame;
calculating a position loss between the prediction frame and the real frame according to the first intersection ratio;
and training the building detection model according to the position loss.
According to the building detection method based on the remote sensing image, provided by the invention, the first intersection ratio between the prediction frame and the real frame is calculated through the following formula:
CIOU = IOU - l²(Op, Ol)/c² - α×ν
wherein CIOU is the first intersection ratio, IOU is the second intersection ratio calculated from the overlapping area between the prediction box and the real box, Op represents the center point of the prediction box, Ol represents the center point of the real box, l(Op, Ol) represents the distance between Op and Ol, c represents the diagonal length of the minimum bounding rectangle that encloses both the prediction box and the real box, ν represents the consistency between the aspect ratio of the prediction box and the aspect ratio of the real box, and α represents the coefficient of ν.
According to the building detection method based on the remote sensing image provided by the invention, the consistency between the aspect ratio of the prediction frame and the aspect ratio of the real frame is calculated by the following formula:
ν = (4/π²) × (arctan(wt/ht) - arctan(wp/hp))²
wherein wt represents the width of the real box, ht represents the height of the real box, wp represents the width of the prediction box, and hp represents the height of the prediction box.
According to the building detection method based on the remote sensing image, provided by the invention, the coefficient is calculated through the following formula:
α = ν / ((1 - IOU) + ν)
according to the building detection method based on the remote sensing image, provided by the invention, the position loss between the prediction frame and the real frame is calculated according to the first intersection ratio through the following formula:
CIOU_LOSS = Confidence × (2 - wt×ht) × (1 - CIOU);
wherein CIOU_LOSS is the position loss, CIOU is the first intersection ratio, wt represents the width of the real box, ht represents the height of the real box, and Confidence represents the confidence of the prediction box.
The invention also provides a building detection device based on the remote sensing image, which comprises:
the extraction module is used for inputting the remote sensing image to be detected into a first network in a building detection model and outputting the characteristics of the remote sensing image to be detected;
the detection module is used for inputting the characteristics of the remote sensing image to be detected into a second network in the building detection model and outputting a prediction frame of a building in the remote sensing image to be detected;
wherein the first network comprises a plurality of portions, each portion comprising a downsampling layer, and one or more SE-ResNeXt layers;
the building detection model is obtained by training by taking a remote sensing image sample containing a building as a sample and taking a real frame of the building in the remote sensing image sample as a label.
The invention also provides an electronic device, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor executes the program to realize the steps of the building detection method based on the remote sensing image.
According to the building detection method and device based on remote sensing images provided by the invention, the first network in the building detection model is used as the feature extraction network; the first network comprises a plurality of parts, each comprising a downsampling layer and one or more SE-ResNeXt layers, so that the first network maintains a large depth. This improves the recall rate of building detection and makes building detection more accurate, faster and more robust.
Drawings
In order to more clearly illustrate the technical solutions of the present invention or the prior art, the drawings needed for the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 is a schematic flow chart of a method for building detection based on remote sensing images provided by the present invention;
FIG. 2 is a schematic structural diagram of a building detection model in the building detection method based on remote sensing images provided by the invention;
FIG. 3 is a schematic structural diagram of a building detection device based on remote sensing images provided by the invention;
fig. 4 is a schematic structural diagram of an electronic device provided in the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The building detection method based on remote sensing images of the invention is described below with reference to fig. 1, and comprises the following steps: step 101, inputting a remote sensing image to be detected into a first network in a building detection model, and outputting the characteristics of the remote sensing image to be detected; wherein the first network comprises a plurality of portions, each portion comprising a downsampling layer, and one or more SE-ResNeXt layers;
the remote sensing image to be detected is a remote sensing image needing building detection.
The building detection model includes a first network and a second network. The first network is used for extracting the characteristics of the remote sensing image to be detected. The second network is used for building detection according to the features extracted by the first network.
Each part of the first network is treated as one nSR module. Optionally, the first network comprises 5 nSR modules, with n equal to 1, 2, 8, 8 and 4, respectively, as shown in Fig. 2.
The number of nSR modules and the value of n included in the first network may be set as desired.
Each nSR module is formed by stacking one down-sampling layer and n SE-ResNeXt layers, as shown in Fig. 2, so that the first network maintains a large depth.
Optionally, the first layer of the first network is a CBL structure, which includes one convolution layer (Conv2D), one batch normalization layer (BN) and one LeakyReLU layer, as indicated in Fig. 2.
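As an illustration of the CBL structure just described, here is a minimal sketch assuming PyTorch (the patent does not name a framework); the 32 output channels and the LeakyReLU slope of 0.1 are illustrative choices, not taken from the patent.

```python
import torch
import torch.nn as nn

class CBL(nn.Module):
    """Conv2D -> batch normalization -> LeakyReLU, the basic block described above."""
    def __init__(self, in_ch, out_ch, kernel_size, stride=1):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, stride=stride,
                              padding=kernel_size // 2, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.LeakyReLU(0.1, inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

# Example: the first layer of the first network applied to a 416 x 416 RGB block.
x = torch.randn(1, 3, 416, 416)
y = CBL(3, 32, kernel_size=3)(x)    # 32 output channels is an illustrative choice
print(y.shape)                      # torch.Size([1, 32, 416, 416])
```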
Optionally, the remote sensing image to be detected is preprocessed before the remote sensing image to be detected is subjected to building detection. The pre-processing includes geometric correction and image registration. The geometric correction is used for eliminating deformation of the geometric position, shape and other characteristics of the building on the remote sensing image caused by factors such as atmospheric refraction, earth curvature, topographic relief and the like.
Optionally, the preprocessed remote sensing image to be detected is cut into blocks at a preset resolution, for example 416 × 416. The resulting image blocks (Input) are fed into the trained building detection model for detection.
The image blocks, together with the building prediction boxes output by the building detection model, are then stitched back according to their original positions in the image to be detected, yielding a complete remote sensing image with the detection results.
Optionally, if the remote sensing image to be detected is an RGB image, the image block Input is an image of 416 × 416 × 3.
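The tiling and stitching workflow just described can be sketched as follows. This is a simplified illustration using NumPy: the 416 tile size comes from the description above, zero-padding at the right and bottom edges is an assumption, and the per-tile detection call is left as a placeholder.

```python
import numpy as np

TILE = 416  # preset resolution mentioned above

def split_into_tiles(image):
    """Cut an (H, W, C) remote sensing image into TILE x TILE blocks,
    zero-padding the right/bottom edges, and record each block's origin."""
    h, w, _ = image.shape
    padded = np.pad(image, ((0, -h % TILE), (0, -w % TILE), (0, 0)))
    tiles = []
    for y in range(0, padded.shape[0], TILE):
        for x in range(0, padded.shape[1], TILE):
            tiles.append(((y, x), padded[y:y + TILE, x:x + TILE]))
    return tiles

def stitch_detections(tiles_with_boxes):
    """Shift per-tile boxes (x1, y1, x2, y2) back to full-image coordinates."""
    boxes = []
    for (oy, ox), tile_boxes in tiles_with_boxes:
        for x1, y1, x2, y2 in tile_boxes:
            boxes.append((x1 + ox, y1 + oy, x2 + ox, y2 + oy))
    return boxes

# Usage: detect(tile) stands in for running the trained building detection model.
# results = [(origin, detect(tile)) for origin, tile in split_into_tiles(rgb_image)]
# all_boxes = stitch_detections(results)
```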
Step 102, inputting the characteristics of the remote sensing image to be detected into a second network in the building detection model, and outputting a prediction frame of a building in the remote sensing image to be detected;
optionally, as shown in fig. 2, the output of the second network includes a small-scale prediction result output, a medium-scale prediction result output, and a large-scale prediction result output.
For the small-scale prediction output, the remote sensing image to be detected is down-sampled several times by the first network to obtain a small-scale feature map; for example, a 416 × 416 remote sensing image is reduced by the first network to a 13 × 13 feature map, which is then post-processed by the second network to obtain prediction boxes at the 13 × 13 scale.
For the medium-scale prediction output, the output add1 of the i-th part of the first network (e.g., the second 8SR module) is concatenated, through a concat layer, with the up-sampled final output of the first network; the concatenated result is then post-processed by the second network to obtain prediction boxes at the 26 × 26 scale.
For the large-scale prediction output, the output add2 of the j-th part of the first network (e.g., the first 8SR module) is concatenated, through a concat layer, with the up-sampled version of the previous concatenation result; this result is then post-processed by the second network to obtain prediction boxes at the 52 × 52 scale. Here i > j.
The second network in the present embodiment is not limited to the structure shown in fig. 2.
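To make the three output scales concrete, the following sketch traces only the feature-map sizes for a 416 × 416 input, assuming each of the five nSR parts halves the spatial resolution (overall strides of 32, 16 and 8) and that each prediction branch combines a 2× upsampling with the concat described above; everything beyond these sizes is illustrative.

```python
# Shape walk-through for the three prediction scales (illustrative only).
input_size = 416
num_parts = 5                               # five nSR parts, each down-sampling by 2
small = input_size // 2 ** num_parts        # 416 / 32 = 13 -> small-scale grid
medium = small * 2                          # 2x upsample + concat with add1 -> 26
large = medium * 2                          # 2x upsample + concat with add2 -> 52
print(small, medium, large)                 # 13 26 52
```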
The building detection model is obtained by training by taking a remote sensing image sample containing a building as a sample and taking a real frame of the building in the remote sensing image sample as a label.
When the building detection model is trained, remote sensing image samples containing buildings are collected and marked with real frames and categories of the buildings.
All remote sensing image samples contain the same or different classes of buildings.
When the remote sensing image samples containing different types of buildings are used for training the building detection model, the building detection model can detect the different types of buildings.
Optionally, the remote sensing image sample is a 416 x 416 image block.
And training the building detection model by using the collected remote sensing image samples, thereby optimizing the weight parameters in the building detection model.
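The training procedure above (remote sensing image samples as inputs, real boxes as labels, weights optimized by training) can be sketched as a generic loop. This is a highly simplified illustration: the model, data loader and loss function are placeholders, the loop glosses over anchor assignment and the confidence and class loss terms, and the Adam optimizer with a 1e-3 learning rate is an assumed choice, not taken from the patent.

```python
import torch

def train(model, data_loader, loss_fn, epochs=50, lr=1e-3, device="cuda"):
    """Generic training loop: forward pass, loss against the labelled real boxes,
    then backpropagation to update the weight parameters of the detection model."""
    model.to(device).train()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for images, real_boxes in data_loader:   # 416 x 416 sample blocks and labels
            images, real_boxes = images.to(device), real_boxes.to(device)
            predictions = model(images)
            loss = loss_fn(predictions, real_boxes)
            optimizer.zero_grad()
            loss.backward()                      # backpropagation
            optimizer.step()
    return model
```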
In this method, the first network in the building detection model is used as the feature extraction network; it comprises multiple parts, each comprising a down-sampling layer and one or more SE-ResNeXt layers, so that the first network maintains a large depth. This improves the recall rate of building detection and makes the detection more accurate, fast and robust.
On the basis of the above embodiment, each SE-ResNeXt layer in this embodiment includes a preset number of weight-sharing transformation layers; each transformation layer comprises two CBL structures; each CBL structure comprises a convolution layer, a batch normalization layer and a LeakyReLU layer; and the convolution kernels of the convolution layers in the two CBL structures have different sizes.
Optionally, as shown in Fig. 2, the preset number (cardinality) is 16. The 16 transformation layers are arranged in parallel.
The two CBL structures in each transformation layer are connected in series.
Optionally, the convolution kernel sizes of the two CBL structures in each transformation layer are 1 × 1 and 3 × 3, respectively.
The CBL structure is the most frequently used basic component of the building detection model.
In this embodiment, using multiple weight-sharing transformation layers improves the learning capability of the building detection model, while the weight-sharing strategy keeps the number of model parameters from growing as model performance improves; in addition, this design effectively avoids the gradient-propagation problems caused by deepening or widening the model to improve accuracy.
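A minimal sketch of the weight-sharing transformation layers follows, assuming PyTorch. The channel widths are illustrative, and splitting the input channels into cardinality groups that are all processed by a single shared two-CBL transformation is one reasonable reading of the description above, not necessarily the patent's exact construction.

```python
import torch
import torch.nn as nn

def cbl(in_ch, out_ch, k):
    """Conv2D -> batch normalization -> LeakyReLU (the CBL structure described above)."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, k, padding=k // 2, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.LeakyReLU(0.1, inplace=True))

class SplitTransformMerge(nn.Module):
    """Split the channels into `cardinality` groups, run every group through the
    same (weight-shared) transformation of two CBLs (1x1 then 3x3), and concatenate."""
    def __init__(self, channels, cardinality=16):
        super().__init__()
        assert channels % cardinality == 0
        self.cardinality = cardinality
        g = channels // cardinality
        self.shared = nn.Sequential(cbl(g, g, 1), cbl(g, g, 3))  # one weight set reused by all paths

    def forward(self, x):
        groups = torch.chunk(x, self.cardinality, dim=1)
        return torch.cat([self.shared(grp) for grp in groups], dim=1)

# Example: 64 channels split into 16 parallel paths of 4 channels each.
y = SplitTransformMerge(64)(torch.randn(1, 64, 52, 52))
print(y.shape)   # torch.Size([1, 64, 52, 52])
```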
On the basis of the above embodiment, each SE-ResNeXt layer in this embodiment further includes an SEnet layer, and each SE-ResNeXt layer uses one skip connection.
Optionally, as shown in Fig. 2, after being concatenated by the concat layer, the outputs of all the transformation layers pass in sequence through a convolution layer, a batch normalization layer and the SEnet layer, so that the dominant features learned in the current SE-ResNeXt layer are preferentially applied to the learning of the next SE-ResNeXt layer. In addition, SEnet suppresses the interference of useless features on learning.
The structure of SEnet is shown in Fig. 2.
One skip connection is used in each SE-ResNeXt layer to avoid gradient problems during backpropagation.
This embodiment introduces an attention mechanism into the model through SEnet, so that the learned features are used selectively: dominant features are exploited preferentially while the interference of useless features is avoided.
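The SEnet layer and the skip connection can be sketched as follows, again assuming PyTorch; the reduction ratio of 16 is the common default for squeeze-and-excitation blocks and is an assumption here, and the residual add at the end stands in for the skip connection around one SE-ResNeXt layer (whose full body is only indicated in the comment).

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """SEnet layer: squeeze (global average pooling), excitation (two fully
    connected layers + sigmoid), and channel-wise re-weighting of the features."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid())

    def forward(self, x):
        b, c, _, _ = x.shape
        weights = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * weights          # emphasize dominant features, suppress useless ones

# Skip connection around one SE-ResNeXt layer; the layer body is schematically
# out = x + se(bn(conv(concat(transformation_outputs)))).
x = torch.randn(1, 64, 52, 52)
se = SEBlock(64)
out = x + se(x)                     # the residual add is the skip connection
print(out.shape)                    # torch.Size([1, 64, 52, 52])
```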
On the basis of the foregoing embodiments, in this embodiment, the inputting of a remote sensing image to be detected into a first network in a building detection model and outputting of the characteristics of the remote sensing image to be detected further includes: calculating a first intersection ratio between a prediction frame and a real frame according to the overlapping area between the prediction frame and the real frame of the building in the remote sensing image sample, the distance between the center point of the prediction frame and the center point of the real frame, and the consistency between the aspect ratio of the prediction frame and the aspect ratio of the real frame; calculating a position loss between the prediction frame and the real frame according to the first intersection ratio; and training the building detection model according to the position loss.
In the prior art, the box regression of buildings is mainly guided by the intersection-over-union ratio (IOU). However, the IOU considers only the overlapping region between the prediction box and the real box; when one box contains the other or the two boxes do not intersect, it provides no optimization direction.
The first intersection ratio in this embodiment considers the overlapping area between the prediction box and the real box, the distance between the center points of the two boxes, and the aspect ratio consistency of the two boxes. It can therefore provide an optimization direction under various positional relationships between the prediction box and the real box, which makes the regression of building boxes more accurate and further improves building detection accuracy.
Optionally, a loss function is constructed based on the position loss and confidence loss between the predicted frame and the real frame of the building in the remote sensing image sample, and the class loss between the predicted class and the real class of the building in the remote sensing image sample.
Optionally, the second network of the building detection model outputs a prediction box, but the prediction box may not contain the target building, and the confidence loss of the prediction box is calculated according to whether the target is contained in the prediction box.
Optionally, the second network in the building detection model further outputs the category of the building, and the category loss is determined according to the predicted category and the real category of the building in the remote sensing image sample.
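As a sketch of how the three loss terms described above might be combined (the patent does not state the exact weighting; equal weights and standard cross-entropy terms for the confidence and class losses are assumptions of this illustration):

```python
import torch
import torch.nn.functional as F

def total_loss(position_loss, conf_pred, conf_target, class_logits, class_target):
    """Combine position, confidence and class losses (equal weights assumed)."""
    confidence_loss = F.binary_cross_entropy_with_logits(conf_pred, conf_target)
    class_loss = F.cross_entropy(class_logits, class_target)
    return position_loss + confidence_loss + class_loss
```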
On the basis of the above embodiment, in this embodiment, the first intersection ratio between the prediction frame and the real frame is calculated by the following formula:
CIOU = IOU - l²(Op, Ol)/c² - α×ν
wherein CIOU is the first intersection ratio, IOU is the second intersection ratio calculated from the overlapping area between the prediction box and the real box, Op represents the center point of the prediction box, Ol represents the center point of the real box, l(Op, Ol) represents the distance between Op and Ol, c represents the diagonal length of the minimum bounding rectangle that encloses both the prediction box and the real box, ν represents the consistency between the aspect ratio of the prediction box and the aspect ratio of the real box, and α represents the coefficient of ν.
The present embodiment is not limited to a specific method of calculating the second intersection ratio according to the overlapping area between the prediction box and the real box.
On the basis of the above embodiment, in the present embodiment, the consistency between the aspect ratio of the prediction box and the aspect ratio of the real box is calculated by the following formula:
ν = (4/π²) × (arctan(wt/ht) - arctan(wp/hp))²
wherein wt represents the width of the real box, ht represents the height of the real box, wp represents the width of the prediction box, and hp represents the height of the prediction box.
On the basis of the above embodiment, the coefficient is calculated in the present embodiment by the following formula:
α = ν / ((1 - IOU) + ν)
wherein α is used to balance the proportion of ν in the first intersection ratio.
On the basis of the above embodiment, in the present embodiment, the position loss between the prediction frame and the real frame is calculated according to the first intersection ratio by the following formula:
CIOU_LOSS = Confidence × (2 - wt×ht) × (1 - CIOU);
wherein CIOU_LOSS is the position loss, CIOU is the first intersection ratio, wt represents the width of the real box, ht represents the height of the real box, and Confidence represents the confidence of the prediction box.
Optionally, when there is a building in the prediction box, the Confidence is 1, otherwise, the Confidence is 0.
Here wt and ht are the normalized width and height of the real box, respectively, each ranging between 0 and 1.
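Putting the formulas above together, the following sketch computes the IOU, ν, α, CIOU and the position loss CIOU_LOSS for a single prediction/real box pair. The corner-coordinate convention (x1, y1, x2, y2) in normalized units and the small epsilon terms are assumptions of this illustration.

```python
import math

def ciou_loss(pred, real, confidence):
    """pred, real: boxes as (x1, y1, x2, y2) with normalized coordinates in [0, 1]."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = real
    wp, hp = px2 - px1, py2 - py1
    wt, ht = tx2 - tx1, ty2 - ty1

    # Second intersection ratio (plain IOU) from the overlapping area.
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih
    iou = inter / (wp * hp + wt * ht - inter + 1e-9)

    # Squared distance between the two center points.
    d2 = ((px1 + px2) / 2 - (tx1 + tx2) / 2) ** 2 + ((py1 + py2) / 2 - (ty1 + ty2) / 2) ** 2
    # Squared diagonal length of the minimum rectangle enclosing both boxes.
    c2 = (max(px2, tx2) - min(px1, tx1)) ** 2 + (max(py2, ty2) - min(py1, ty1)) ** 2

    # Aspect-ratio consistency v and its coefficient alpha.
    v = (4 / math.pi ** 2) * (math.atan(wt / ht) - math.atan(wp / hp)) ** 2
    alpha = v / ((1 - iou) + v + 1e-9)

    ciou = iou - d2 / (c2 + 1e-9) - alpha * v
    return confidence * (2 - wt * ht) * (1 - ciou)   # position loss CIOU_LOSS

# Example: a slightly offset prediction around a building's real box.
print(ciou_loss((0.10, 0.10, 0.42, 0.30), (0.12, 0.12, 0.40, 0.32), confidence=1))
```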
The method of this embodiment was run on a Windows 10 system equipped with an RTX 2080 Ti discrete graphics card (11 GB of memory) and an i9-9900K processor. Experiments on remote sensing image samples show that, for gas station detection, this embodiment improves precision by 40% on average, improves recall by 50% on average, and reduces the number of parameters by 9 MB. The method achieves accurate detection of specific buildings in remote sensing images and has high application value.
The building detection device based on the remote sensing image provided by the invention is described below, and the building detection device based on the remote sensing image described below and the building detection method based on the remote sensing image described above can be referred to correspondingly.
As shown in fig. 3, the apparatus comprises an extraction module 301 and a detection module 302, wherein:
the extraction module 301 is configured to input a remote sensing image to be detected into a first network in a building detection model, and output characteristics of the remote sensing image to be detected;
the remote sensing image to be detected is a remote sensing image needing building detection.
The building detection model includes a first network and a second network. The first network is used for extracting the characteristics of the remote sensing image to be detected. The second network is used for building detection according to the features extracted by the first network.
The detection module 302 is configured to input the features of the remote sensing image to be detected into a second network in the building detection model, and output a prediction frame of a building in the remote sensing image to be detected;
the present embodiment is not limited to the structure of the second network.
Wherein the first network comprises a plurality of portions, each portion comprising a downsampling layer, and one or more SE-ResNeXt layers;
each part of the first network is treated as one nSR module. Each nSR module is formed by the superposition of one down-sampling layer down-sampling and n SE-resenext layers.
The number of nSR modules and the value of n included in the first network may be set as desired.
The building detection model is obtained by training by taking a remote sensing image sample containing a building as a sample and taking a real frame of the building in the remote sensing image sample as a label.
When the building detection model is trained, remote sensing image samples containing buildings are collected and marked with real frames and categories of the buildings.
All remote sensing image samples contain the same or different classes of buildings.
When the remote sensing image samples containing different types of buildings are used for training the building detection model, the building detection model can detect the different types of buildings.
And training the building detection model by using the collected remote sensing image samples, thereby optimizing the weight parameters in the building detection model.
In this device, the first network in the building detection model is used as the feature extraction network; it comprises multiple parts, each comprising a down-sampling layer and one or more SE-ResNeXt layers, so that the first network maintains a large depth. This improves the recall rate of building detection and makes the detection more accurate, fast and robust.
On the basis of the above embodiment, each SE-ResNeXt layer in this embodiment includes a preset number of weight-sharing transformation layers; each transformation layer comprises two CBL structures; each CBL structure comprises a convolution layer, a batch normalization layer and a LeakyReLU layer; and the convolution kernels of the convolution layers in the two CBL structures have different sizes.
On the basis of the above embodiment, each SE-ResNeXt layer in this embodiment further includes an SEnet layer, and each SE-ResNeXt layer uses one skip connection.
On the basis of the foregoing embodiments, this embodiment further includes a training module, configured to calculate a first intersection ratio between a prediction frame and a real frame of a building in the remote sensing image sample according to the overlapping area between the prediction frame and the real frame, the distance between the center point of the prediction frame and the center point of the real frame, and the consistency between the aspect ratio of the prediction frame and the aspect ratio of the real frame; to calculate a position loss between the prediction frame and the real frame according to the first intersection ratio; and to train the building detection model according to the position loss.
On the basis of the above embodiment, in this embodiment, the training module calculates a first intersection ratio between the prediction box and the real box by using the following formula:
CIOU = IOU - l²(Op, Ol)/c² - α×ν
wherein CIOU is the first intersection ratio, IOU is the second intersection ratio calculated from the overlapping area between the prediction box and the real box, Op represents the center point of the prediction box, Ol represents the center point of the real box, l(Op, Ol) represents the distance between Op and Ol, c represents the diagonal length of the minimum bounding rectangle that encloses both the prediction box and the real box, ν represents the consistency between the aspect ratio of the prediction box and the aspect ratio of the real box, and α represents the coefficient of ν.
On the basis of the above embodiment, in this embodiment, the training module calculates the consistency between the aspect ratio of the prediction box and the aspect ratio of the real box by the following formula:
ν = (4/π²) × (arctan(wt/ht) - arctan(wp/hp))²
wherein wt represents the width of the real box, ht represents the height of the real box, wp represents the width of the prediction box, and hp represents the height of the prediction box.
On the basis of the above embodiment, in this embodiment, the training module calculates the coefficient by the following formula:
α = ν / ((1 - IOU) + ν)
on the basis of the above embodiment, in this embodiment, the training module calculates the position loss between the prediction frame and the real frame according to the first intersection ratio by the following formula:
CIOU_LOSS = Confidence × (2 - wt×ht) × (1 - CIOU);
wherein CIOU_LOSS is the position loss, CIOU is the first intersection ratio, wt represents the width of the real box, ht represents the height of the real box, and Confidence represents the confidence of the prediction box.
Fig. 4 illustrates the physical structure of an electronic device, which, as shown in Fig. 4, may include: a processor 410, a communication interface 420, a memory 430 and a communication bus 440, where the processor 410, the communication interface 420 and the memory 430 communicate with each other via the communication bus 440. The processor 410 may invoke logic instructions in the memory 430 to perform a method for remote sensing image-based building detection, the method comprising: inputting a remote sensing image to be detected into a first network in a building detection model, and outputting the characteristics of the remote sensing image to be detected; inputting the characteristics of the remote sensing image to be detected into a second network in the building detection model, and outputting a prediction frame of a building in the remote sensing image to be detected; wherein the first network comprises a plurality of portions, each portion comprising a downsampling layer, and one or more SE-ResNeXt layers; the building detection model is obtained by training by taking a remote sensing image sample containing a building as a sample and taking a real frame of the building in the remote sensing image sample as a label.
In addition, the logic instructions in the memory 430 may be implemented in the form of software functional units and stored in a computer readable storage medium when the software functional units are sold or used as independent products. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
In another aspect, the present invention also provides a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform the method for remote sensing image-based building detection provided by the above methods, the method comprising: inputting a remote sensing image to be detected into a first network in a building detection model, and outputting the characteristics of the remote sensing image to be detected; inputting the characteristics of the remote sensing image to be detected into a second network in the building detection model, and outputting a prediction frame of a building in the remote sensing image to be detected; wherein the first network comprises a plurality of portions, each portion comprising a downsampling layer, and one or more SE-ResNeXt layers; the building detection model is obtained by training by taking a remote sensing image sample containing a building as a sample and taking a real frame of the building in the remote sensing image sample as a label.
In yet another aspect, the present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, is implemented to perform the provided remote sensing image based building detection methods described above, the method comprising: inputting a remote sensing image to be detected into a first network in a building detection model, and outputting the characteristics of the remote sensing image to be detected; inputting the characteristics of the remote sensing image to be detected into a second network in the building detection model, and outputting a prediction frame of a building in the remote sensing image to be detected; wherein the first network comprises a plurality of portions, each portion comprising a downsampling layer, and one or more SE-ResNeXt layers; the building detection model is obtained by training by taking a remote sensing image sample containing a building as a sample and taking a real frame of the building in the remote sensing image sample as a label.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A building detection method based on remote sensing images is characterized by comprising the following steps:
inputting a remote sensing image to be detected into a first network in a building detection model, and outputting the characteristics of the remote sensing image to be detected;
inputting the characteristics of the remote sensing image to be detected into a second network in the building detection model, and outputting a prediction frame of a building in the remote sensing image to be detected;
wherein the first network comprises a plurality of portions, each portion comprising a downsampling layer, and one or more SE-ResNeXt layers;
the building detection model is obtained by training by taking a remote sensing image sample containing a building as a sample and taking a real frame of the building in the remote sensing image sample as a label.
2. The remote sensing image-based building detection method according to claim 1, wherein each SE-ResNeXt layer comprises a preset number of weight-sharing transformation layers;
each transformation layer comprises two CBL structures;
each CBL structure comprises a convolution layer, a batch normalization layer and a LeakyReLU layer;
wherein the convolution kernels of the convolution layers in the two CBL structures have different sizes.
3. The remote sensing image based building detection method of claim 2, wherein each SE-ResNeXt layer further comprises an SEnet layer, and each SE-ResNeXt layer uses one skip connection.
4. The remote sensing image-based building detection method according to any one of claims 1-3, wherein the inputting the remote sensing image to be detected into a first network in a building detection model and outputting the characteristics of the remote sensing image to be detected further comprises:
calculating a first intersection ratio between a prediction frame and a real frame according to the overlapping area between the prediction frame and the real frame of the building in the remote sensing image sample, the distance between the center point of the prediction frame and the center point of the real frame, and the consistency between the aspect ratio of the prediction frame and the aspect ratio of the real frame;
calculating a position loss between the prediction frame and the real frame according to the first intersection ratio;
and training the building detection model according to the position loss.
5. The remote sensing image-based building detection method according to claim 4, wherein the first intersection ratio between the prediction frame and the real frame is calculated by the following formula:
CIOU = IOU - l²(Op, Ol)/c² - α×ν
wherein CIOU is the first intersection ratio, IOU is the second intersection ratio calculated from the overlapping area between the prediction box and the real box, Op represents the center point of the prediction box, Ol represents the center point of the real box, l(Op, Ol) represents the distance between Op and Ol, c represents the diagonal length of the minimum rectangle that surrounds both the prediction box and the real box, ν represents the consistency between the aspect ratio of the prediction box and the aspect ratio of the real box, and α represents the coefficient of ν.
6. The remote sensing image-based building detection method according to claim 5, wherein the consistency between the aspect ratio of the prediction box and the aspect ratio of the real box is calculated by the following formula:
ν = (4/π²) × (arctan(wt/ht) - arctan(wp/hp))²
wherein wt represents the width of the real box, ht represents the height of the real box, wp represents the width of the prediction box, and hp represents the height of the prediction box.
7. The remote sensing image based building detection method of claim 5, wherein the coefficients are calculated by the formula:
α = ν / ((1 - IOU) + ν)
8. the remote sensing image-based building detection method according to claim 4, wherein the position loss between the prediction frame and the real frame is calculated from the first intersection ratio by the following formula:
CIOU_LOSS = Confidence × (2 - wt×ht) × (1 - CIOU);
wherein CIOU_LOSS is the position loss, CIOU is the first intersection ratio, wt represents the width of the real box, ht represents the height of the real box, and Confidence represents the confidence of the prediction box.
9. A building detection device based on remote sensing images is characterized by comprising:
the extraction module is used for inputting the remote sensing image to be detected into a first network in a building detection model and outputting the characteristics of the remote sensing image to be detected;
the detection module is used for inputting the characteristics of the remote sensing image to be detected into a second network in the building detection model and outputting a prediction frame of a building in the remote sensing image to be detected;
wherein the first network comprises a plurality of portions, each portion comprising a downsampling layer, and one or more SE-ResNeXt layers;
the building detection model is obtained by training by taking a remote sensing image sample containing a building as a sample and taking a real frame of the building in the remote sensing image sample as a label.
10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the program, carries out the steps of the method for remote sensing image based building detection according to any of claims 1 to 8.
CN202110382233.2A 2021-04-09 2021-04-09 Building detection method and device based on remote sensing image Pending CN113269717A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110382233.2A CN113269717A (en) 2021-04-09 2021-04-09 Building detection method and device based on remote sensing image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110382233.2A CN113269717A (en) 2021-04-09 2021-04-09 Building detection method and device based on remote sensing image

Publications (1)

Publication Number Publication Date
CN113269717A true CN113269717A (en) 2021-08-17

Family

ID=77228583

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110382233.2A Pending CN113269717A (en) 2021-04-09 2021-04-09 Building detection method and device based on remote sensing image

Country Status (1)

Country Link
CN (1) CN113269717A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114239755A (en) * 2022-02-25 2022-03-25 北京智弘通达科技有限公司 Intelligent identification method for color steel tile buildings along railway based on deep learning
CN117115641A (en) * 2023-07-20 2023-11-24 中国科学院空天信息创新研究院 Building information extraction method and device, electronic equipment and storage medium
CN117115641B (en) * 2023-07-20 2024-03-22 中国科学院空天信息创新研究院 Building information extraction method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN108647585B (en) Traffic identifier detection method based on multi-scale circulation attention network
CN111178206B (en) Building embedded part detection method and system based on improved YOLO
CN113076871B (en) Fish shoal automatic detection method based on target shielding compensation
CN112418212B (en) YOLOv3 algorithm based on EIoU improvement
CN114155244B (en) Defect detection method, device, equipment and storage medium
CN113052109A (en) 3D target detection system and 3D target detection method thereof
CN113269717A (en) Building detection method and device based on remote sensing image
CN112668461B (en) Intelligent supervision system with wild animal identification function
CN114089330B (en) Indoor mobile robot glass detection and map updating method based on depth image restoration
CN112489099A (en) Point cloud registration method and device, storage medium and electronic equipment
CN111611925A (en) Building detection and identification method and device
CN113569852A (en) Training method and device of semantic segmentation model, electronic equipment and storage medium
CN114565842A (en) Unmanned aerial vehicle real-time target detection method and system based on Nvidia Jetson embedded hardware
CN115115601A (en) Remote sensing ship target detection method based on deformation attention pyramid
CN114943870A (en) Training method and device of line feature extraction model and point cloud matching method and device
RU2612571C1 (en) Method and system for recognizing urban facilities
CN113743346A (en) Image recognition method and device, electronic equipment and storage medium
CN116486285B (en) Aerial image target detection method based on class mask distillation
CN111428191A (en) Antenna downward inclination angle calculation method and device based on knowledge distillation and storage medium
CN116612382A (en) Urban remote sensing image target detection method and device
CN110188682A (en) Remote sensing image object detection method based on geometry two-way convolutional network
CN114821192A (en) Remote sensing image elevation prediction method combining semantic information
CN115170978A (en) Vehicle target detection method and device, electronic equipment and storage medium
CN114627371A (en) Bridge health monitoring method based on attention mechanism
CN114463503A (en) Fusion method and device of three-dimensional model and geographic information system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination