CN111161250A

CN111161250A - Multi-scale remote sensing image dense house detection method and device

Info

Publication number: CN111161250A
Application number: CN201911403728.8A
Authority: CN
Inventors: 徐其志; 齐子鹏
Original assignee: Beijing Yunzhi Aerospace Technology Co Ltd
Current assignee: Nanyao Technology Guangdong Co ltd
Priority date: 2019-12-31
Filing date: 2019-12-31
Publication date: 2020-05-15
Anticipated expiration: 2039-12-31
Also published as: CN111161250B

Abstract

The invention discloses a multi-scale remote sensing image dense house detection method and device. The method comprises the following steps: acquiring remote sensing images of different levels that contain houses, and extracting features of the remote sensing images by convolution to obtain feature maps of the remote sensing images; adding a sobel operator to the feature maps to obtain remote sensing image feature maps with enhanced house line features; connecting the remote sensing image feature maps through the direct connection structures in the U-Net network to obtain a final remote sensing image feature map; using an up-sampling operation to keep the size of the final remote sensing image feature map consistent with that of the original image, and using a convolution operation to decode the house features in the up-sampled feature map; and performing binary classification on each pixel of the house feature image by a convolution operation, and comparing the classification result with a preset threshold to obtain house output results with different colors. The remote sensing images processed by the embodiment are large in scale, and houses in them are detected with high precision.

Description

Multi-scale remote sensing image dense house detection method and device

Technical Field

The invention relates to the technical field of computers, in particular to the technical field of computer image processing, and particularly relates to a method and a device for detecting a dense house in a multi-scale remote sensing image.

Background

In remote sensing images, houses present narrow-band-shaped features, and their color features follow a fixed pattern that differs clearly from the background color. Google Earth provides remote sensing images at up to 20 levels with a highest spatial resolution of 0.27 meter, which can supply clear training samples for a deep model.

Deep learning has developed rapidly and performs excellently in computer vision fields such as image classification, image segmentation, target recognition and target tracking. In the field of image segmentation, the Turing Award winner Hinton proposed a deep model with an encoding-decoding structure. U-Net is the classic encoding-decoding structure, and the network performs excellently in segmenting medical images. However, the complex background of remote sensing images and the varying sizes of houses make the house target extraction task more difficult.

Disclosure of Invention

In order to solve the technical problems, the invention aims to provide a multi-scale remote sensing image dense house detection method and device based on a U-Net network, so as to solve the problem of poor house detection accuracy in the existing remote sensing image processing method.

According to one embodiment of the invention, the invention provides a multi-scale remote sensing image dense house detection method based on a U-Net network, which comprises the following steps:

acquiring remote sensing images containing different levels of a house, and extracting the characteristics of the remote sensing images based on a convolution mode to obtain a characteristic diagram of the remote sensing images;

adding a sobel operator into the characteristic diagram to obtain a remote sensing image characteristic diagram with enhanced house line characteristics;

connecting the remote sensing image characteristic diagrams based on a direct connection structure in the U-Net network to obtain a final remote sensing image characteristic diagram;

using up-sampling operation to keep the size of the final remote sensing image characteristic image consistent with that of the original image, and using convolution operation to decode the house characteristic image in the characteristic image after the up-sampling operation; and

performing binary classification on each pixel in the house feature image by a convolution operation, and comparing the classification result with a preset threshold value to obtain house output results with different colors.

Further, the step of obtaining the remote sensing image containing houses of different levels and extracting the remote sensing image features based on a convolution mode to obtain the feature map of the remote sensing image specifically comprises:

making an annotation image of the remote sensing image, wherein the result is that the house target pixel is marked as white, and the rest background pixels are marked as black, then preprocessing the annotation image to smooth the annotation result, and then sending the remote sensing image and the corresponding annotation image into a network for training;

respectively obtaining a specific area of a single channel in the remote sensing image characteristic diagram and calculation results of all areas through convolution operation; wherein, the convolution formula is:

$$G = \sum_{c}\sum_{i}\sum_{j} A(i, j, c)\, w(i, j, c)$$

wherein i, j and c are the indices along the length, width and channel directions of the image respectively, A is the original image, and w is the parameter of the convolution kernel;

and repeating the previous step of operation on all the convolution kernels to finally obtain the calculation results of all the channels of the remote sensing image characteristic diagram, thereby obtaining the characteristic diagram of the remote sensing image.

Further, the step of adding a sobel operator into the feature map to obtain the remote sensing image feature map with the enhanced house line feature further comprises the step of performing convolution operation on the 3 x 3 area of each pixel and the sobel operator to enhance the line feature in the remote sensing image feature map; and parameters in the sobel operator are dynamically changed according to the gradient value of the loss function in each training process.

Further, the step of connecting the remote sensing image feature maps based on the direct connection structure in the U-Net network to obtain a final remote sensing image feature map comprises:

the direct connection structure stacks the feature maps of the remote sensing images in the same level and then carries out convolution operation, or the direct connection structure firstly carries out up-sampling operation on the feature maps of the remote sensing images in different levels and then carries out convolution operation after stacking the new feature maps, so that the final remote sensing image feature map is obtained.

Further, the upsampling operation uses bilinear interpolation, wherein the interpolation formula is:

$$F(x, y) = (j+1-y)(i+1-x)F(i, j) + (j+1-y)(x-i)F(i+1, j) + (y-j)(i+1-x)F(i, j+1) + (y-j)(x-i)F(i+1, j+1)$$

where F(x, y) is the pixel value of the image after interpolation and F(i, j) is the pixel value of the image before interpolation.

Further, the feature map after interpolation is twice the size of the original feature map in both the length and width dimensions.

Further, the step of performing binary classification on each pixel in the house feature image by a convolution operation, and comparing the classification result with a preset threshold value to obtain house output images with different colors includes: the binary classification yields, for each pixel, the probability that the pixel belongs to a house; if the probability value is larger than the threshold value, the pixel is considered to be a pixel in a house and is white in the final output image; and if the probability value is smaller than the threshold value, the pixel is regarded as a pixel in the background and is black in the final output image.

Further, the convolution mode is linear summation of pixel values in the remote sensing image and the characteristic image.

Further, the step of obtaining the calculation results of the specific area and all areas of a single channel in the remote sensing image feature map by convolution operation respectively comprises:

carrying out convolution operation on the convolution kernel and a specific area in each channel of the remote sensing image, and finally adding convolution results of all the channels to obtain a calculation result of the specific area of a single channel in the remote sensing image characteristic diagram;

and performing sliding operation on the length dimension and the width dimension of each channel of the remote sensing image by using the same convolution kernel to repeat the operation, so as to obtain the calculation results of all regions of a single channel in the characteristic diagram of the remote sensing image.

Further, the step of performing sliding operation on the same convolution kernel in the length dimension and the width dimension of each channel of the remote sensing image to repeat the operation to obtain the calculation results of all the areas of a single channel in the remote sensing image characteristic diagram further comprises the step of performing repeated convolution operation on the center of the convolution kernel at intervals of a plurality of pixel points in the length direction and the width direction of the remote sensing image characteristic diagram.

Further, the step of convolving the 3 × 3 region of each pixel with the sobel operator includes: and the sobel operator slides in the length dimension and the width dimension of the remote sensing image, and finally, the house edge characteristics in the characteristic diagram of the remote sensing image are extracted.

Further, the step of performing convolution operation after stacking the feature maps of the remote sensing images of the same hierarchy by the direct connection structure includes stacking the feature maps of the remote sensing images of the same hierarchy on a channel dimension, wherein the channel number of the new feature map is the sum of the channel numbers of the previous feature maps.

Or the direct connection structure firstly carries out up-sampling operation on the feature maps of the remote sensing images of different levels, and then carries out convolution operation after stacking the new feature maps, thereby obtaining the final remote sensing image feature map.

Further, the binary classification uses a sigmoid function, and the sigmoid function maps its input values into the [0,1] interval, corresponding to the probability value.

Further, the sobel operator updates its parameter values by a gradient descent formula according to the gradient back-propagated through the network.

According to the method, features of house targets in the remote sensing image are extracted through convolution operations to obtain the feature map of the remote sensing image, and a dynamic sobel operator is added among the multiple convolution operations to enhance the line features in the feature map.

According to the method, the feature maps of remote sensing images of different levels are connected by several kinds of direct connection structures, which improves the adaptability of the method to multi-scale remote sensing images; the upsampling operation keeps the size of the result map consistent with that of the original image, and convolution operations decode the house features of the up-sampled feature map. Binary classification is performed on each pixel in the last-layer feature map. Each pixel that belongs to a house in the original image is white in the result image, and each pixel that belongs to the background is black, so that a result image of the same size as the original image is finally obtained.

According to another embodiment of the invention, there is also provided a multi-scale remote sensing image dense house detection device for performing the detection method, the device including:

the first characteristic diagram acquisition module is used for acquiring remote sensing images containing different levels of a house, and extracting the characteristics of the remote sensing images based on a convolution mode to obtain a characteristic diagram of the remote sensing images;

the second characteristic diagram acquisition module is used for adding a sobel operator into the characteristic diagram to obtain a remote sensing image characteristic diagram with enhanced house line characteristics;

the third characteristic diagram acquisition module is used for connecting the characteristic diagrams of the remote sensing images based on a direct connection structure in the U-Net network to obtain a final characteristic diagram of the remote sensing images;

the up-sampling module is used for keeping the size of the final remote sensing image feature map consistent with that of the original image;

the first convolution module is used for performing convolution operation on the feature map subjected to the upsampling operation to decode the house feature image in the feature map; and

the binary classification module is used for performing binary classification on each pixel in the house feature image by a convolution operation, and comparing the classification result with a preset threshold value so as to obtain house output results with different colors.

According to another embodiment of the invention, there is also provided a multi-scale remote sensing image dense house detection device, including: a memory having computer instructions stored therein; and the processor is in data connection with the memory and executes the computer instructions so as to execute the multi-scale remote sensing image dense house detection method.

By adopting the technical scheme, the remote sensing image processed by the method has large scale, and the house detection precision in the remote sensing image is high.

Drawings

FIG. 1 is a network framework diagram in the multi-scale remote sensing image dense house detection method based on U-Net network of the present invention;

FIG. 2 is a flow chart of a multi-scale remote sensing image dense house detection method based on a U-Net network provided by the invention;

FIG. 3 is a grayscale image of a level-18 remote sensing image containing houses to be detected, in an embodiment of the U-Net network-based multi-scale remote sensing image dense house detection method provided by the invention;

FIG. 4 is a detection result diagram for the grayscale image of the level-18 remote sensing image containing houses to be detected, in an embodiment of the U-Net network-based multi-scale remote sensing image dense house detection method provided by the invention;

FIG. 5 is a grayscale image of a level-19 remote sensing image containing houses to be detected, in an embodiment of the U-Net network-based multi-scale remote sensing image dense house detection method provided by the invention;

FIG. 6 is a detection result diagram for the grayscale image of the level-19 remote sensing image containing houses to be detected, in an embodiment of the U-Net network-based multi-scale remote sensing image dense house detection method provided by the invention.

Detailed Description

For the convenience of understanding, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It should be noted that the embodiments and features of the embodiments may be combined with each other without conflict. The present invention will be described in detail below with reference to the embodiments with reference to the attached drawings.

In this embodiment, the electronic device on which the remote sensing image processing method is executed may extract features from the remote sensing image.

FIG. 1 shows a network framework diagram in a multi-scale remote sensing image dense house detection method based on a U-Net network, and FIG. 2 shows a flow chart of the multi-scale remote sensing image dense house detection method based on the U-Net network in the invention.

As shown in fig. 2, the method comprises the steps of:

S1, obtaining remote sensing images of different levels that contain houses, and extracting features of the remote sensing images by convolution to obtain a feature map of the remote sensing images.

According to the embodiment of the present invention, step S1 specifically includes:

S101, making an annotation map of the remote sensing image, in which house target pixels are marked white and the remaining background pixels are marked black. A series of preprocessing operations is then applied to the annotation map to smooth the annotation result, and finally the remote sensing image and the corresponding annotation map are sent into the network for training.

According to the embodiment of the invention, obtaining remote sensing images of different levels containing house targets comprises: acquiring remote sensing images of different levels containing house targets; making an annotation map of each remote sensing image; selecting the first channel of the Baidu pure map of the same area as the house annotation image; applying several rounds of preprocessing to the house annotation image, including repeated dilation, erosion, denoising and other operations; combining the remote sensing images and the house annotation images into a training set, and selecting part of the training set as a verification set; and grouping every 20 images with their corresponding annotation maps into a batch that is sent into the network structure designed by the invention.
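The smoothing of the annotation map can be illustrated with a minimal sketch in plain Python. It assumes that the "expansion" and "contraction" operations are the usual 3 × 3 morphological dilation and erosion on a binary mask (1 for house, 0 for background); the function names `dilate` and `erode` and the sample mask are hypothetical and not taken from the patent.

```python
def _neighbourhood(mask, y, x):
    """Yield the in-bounds values of the 3x3 window centred on (y, x)."""
    h, w = len(mask), len(mask[0])
    for yy in range(max(0, y - 1), min(h, y + 2)):
        for xx in range(max(0, x - 1), min(w, x + 2)):
            yield mask[yy][xx]


def dilate(mask):
    """Binary dilation: a pixel becomes 1 if any pixel in its window is 1."""
    return [[max(_neighbourhood(mask, y, x)) for x in range(len(mask[0]))]
            for y in range(len(mask))]


def erode(mask):
    """Binary erosion: a pixel stays 1 only if every pixel in its window is 1."""
    return [[min(_neighbourhood(mask, y, x)) for x in range(len(mask[0]))]
            for y in range(len(mask))]


# A 7x7 annotation with a one-pixel hole inside the house region.
label = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0, 0],
    [0, 0, 1, 0, 1, 0, 0],
    [0, 0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]

# Dilation followed by erosion fills the hole without growing the outline.
smoothed = erode(dilate(label))
```

In this example the hole at the centre of the house is filled while the outer 3 × 3 outline is preserved, which is one plausible reading of "smoothing the annotation result".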

S102, obtaining, through convolution operations, the calculation result of a specific area of a single channel in the remote sensing image feature map and then the calculation results of all areas.

The convolution formula is:

$$G = \sum_{c}\sum_{i}\sum_{j} A(i, j, c)\, w(i, j, c)$$

wherein i, j and c are the indices along the length, width and channel directions of the image respectively, A is the original image, and w is the parameter of the convolution kernel.
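As a small worked illustration of the formula, the following plain-Python sketch evaluates G for one kernel-sized window. The layout `A[row][col][channel]` and the name `conv_at` are assumptions made for the example only.

```python
def conv_at(A, w, top, left):
    """Compute G = sum_c sum_i sum_j A(i, j, c) * w(i, j, c) for one window.

    A is indexed A[row][col][channel]; w is a kernel indexed the same way.
    (top, left) is the upper-left corner of the window inside A.
    """
    total = 0.0
    for i in range(len(w)):                # length (row) direction
        for j in range(len(w[0])):         # width (column) direction
            for c in range(len(w[0][0])):  # channel direction
                total += A[top + i][left + j][c] * w[i][j][c]
    return total


# A 3x3 RGB patch whose channels hold the constant values 1, 2 and 3.
A = [[[float(c + 1) for c in range(3)] for _ in range(3)] for _ in range(3)]
# A 3x3x3 kernel of ones, so G is simply the sum of all 27 pixel values.
w = [[[1.0 for _ in range(3)] for _ in range(3)] for _ in range(3)]

G = conv_at(A, w, 0, 0)  # 9 * (1 + 2 + 3) = 54.0
```

The three channels of the window collapse into a single number, which is why one kernel produces one output channel.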

According to the embodiment of the invention, the convolution operation extracts the features of the training image, the 3-channel image is expanded into a 64-channel tensor, and the new feature map is sent into four sets of ResNet network modules.

S103, repeating the operation of step S102 for all convolution kernels to finally obtain the calculation results of all channels of the remote sensing image feature map, thereby obtaining the feature map of the remote sensing image.

According to the embodiment of the invention, one part of the convolution operations extracts features of the training image and expands the number of channels of the feature map, with the convolution kernel moving with a step length of 2 in the length and width dimensions of the feature map; the other part of the convolution operations extracts features of the feature map and expands its number of channels, with the convolution kernel moving with a step length of 1 in the length and width dimensions of the feature map.

The area covered by a convolution operation in this embodiment is usually a 3 × 3 pixel area, and as can be seen from the convolution formula, the 3 × 3 pixel areas of all channels are finally reduced to a single value. The remote sensing image initially has three RGB channels, and its feature map acquires more channels after successive convolution operations. Since each convolution operation adds the convolution values of all channels, the entire convolution operation can be viewed as a linear sum of the regional pixel values.

The corresponding region of the convolution operation is a 3 × 3 pixel region in this embodiment. After convolution operation, the central point of convolution kernel moves a certain step length in the image length and width directions to calculate the adjacent area.

The step size of the shift in this embodiment is usually two options, including shifting by one pixel or shifting by two pixels. The case of moving one pixel is to move the center point of the convolution kernel to an adjacent pixel point. The case of moving two pixels is to move the center point of the convolution kernel to the spaced pixel points. The operation of shifting one pixel can obtain more detailed feature information, but at the same time, the amount of calculation and parameter storage of the convolution operation are increased. The operation of moving two pixels can effectively increase the calculation area of the convolution operation and reduce redundant information of the convolution result, but at the same time, the operation loses certain detailed information.
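The trade-off between the two step lengths can be seen in a minimal single-channel sketch. It assumes no padding at the image border, which the text does not specify; `conv2d` is a hypothetical helper name.

```python
def conv2d(image, kernel, stride):
    """Slide `kernel` over `image` (no padding), moving `stride` pixels per step."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    out = []
    for top in range(0, h - kh + 1, stride):
        row = []
        for left in range(0, w - kw + 1, stride):
            row.append(sum(image[top + i][left + j] * kernel[i][j]
                           for i in range(kh) for j in range(kw)))
        out.append(row)
    return out


# A 7x7 test image whose pixel value encodes its position.
image = [[float(r * 7 + c) for c in range(7)] for r in range(7)]
box = [[1.0] * 3 for _ in range(3)]  # 3x3 kernel of ones

dense = conv2d(image, box, stride=1)   # 5x5 output: every position is visited
sparse = conv2d(image, box, stride=2)  # 3x3 output: every second position
```

With a step of 2 the output holds 9 values instead of 25; each retained value is identical to the corresponding step-1 value (for example `sparse[1][1] == dense[2][2]`), so the saving comes purely from skipping positions.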

In the embodiment of the invention, convolution operation is carried out on the convolution kernel and the specific area in each channel of the remote sensing image, and finally the convolution results of all the channels are added to obtain the calculation result of the specific area of a single channel in the remote sensing image characteristic diagram, wherein the convolution operation can be regarded as linear summation of pixel values in the remote sensing image and the characteristic diagram, and complex linear characteristics of the remote sensing image can be extracted through multiple times of convolution operation.

The same convolution kernel is slid along the length and width dimensions of each channel of the remote sensing image to obtain the calculation results of all regions of a single channel in the remote sensing image feature map. These two steps are repeated for all convolution kernels to finally obtain the calculation results of all channels of the remote sensing image feature map. The center of the convolution kernel repeats the convolution operation at intervals of several pixels in the length and width directions of the remote sensing image feature map.

Since the convolution operation is a linear addition of the regional pixel values, the problem of the non-linear space cannot be handled. For this reason, a non-linear activation function is usually added after the convolution operation. The nonlinear activation function has the functions of improving the expression capability of the model, better processing the nonlinear problem and improving the robustness of the model.
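The text does not name the activation function. As an illustration only, the sketch below assumes the widely used ReLU, applied element by element to the output of a convolution.

```python
def relu(feature_map):
    """Element-wise ReLU: negative responses are clipped to zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]


# Hypothetical convolution output containing positive and negative responses.
conv_output = [[-1.5, 0.0, 2.0],
               [3.0, -0.5, 1.0]]

activated = relu(conv_output)
```

Because the clipping depends on the sign of each value, a stack of convolutions with such activations is no longer a single linear sum of pixel values.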

S2, adding a sobel operator to the feature map to obtain a remote sensing image feature map with enhanced house line features.

The equation for the sobel operation here is: [equation omitted in the source]

wherein A is the original image and G is the result.

In this embodiment, a dynamic sobel operation is added after the feature map is obtained in step S1. The sobel operator performs excellently in extracting the line texture features of an image. After the sobel operator is added, the line contour features of houses in the feature map of the remote sensing image are highlighted.
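For reference, the conventional fixed Sobel operator can be sketched as follows. The kernel values and the magnitude formula G = sqrt(Gx² + Gy²) are the textbook definition and are assumed here rather than taken from this document; the sliding window is applied without kernel flipping, and border pixels are simply left at zero.

```python
import math

# Textbook Sobel kernels for the horizontal and vertical gradients.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def window_sum(A, kernel, y, x):
    """Weighted sum of the 3x3 window of A centred on (y, x)."""
    return sum(A[y - 1 + i][x - 1 + j] * kernel[i][j]
               for i in range(3) for j in range(3))


def sobel_magnitude(A):
    """Return G, the gradient magnitude of A (border pixels stay 0)."""
    h, w = len(A), len(A[0])
    G = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = window_sum(A, SOBEL_X, y, x)
            gy = window_sum(A, SOBEL_Y, y, x)
            G[y][x] = math.hypot(gx, gy)
    return G


# A 4x4 image with a vertical edge between the second and third columns.
A = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
G = sobel_magnitude(A)
```

On this image the interior pixels next to the edge receive a response of 4.0, which is the kind of line enhancement the step relies on.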

In order to adapt to different degrees of house feature extraction of feature maps of different levels, the invention designs a sobel operator capable of dynamically changing parameters along with the levels of the feature maps and the training process. Because of the fixed proportional relation between the transverse gradient and the longitudinal gradient in the sobel operator, the sobel cannot adapt to the remote sensing image targets of different levels. The parameters of the dynamic sobel operator of the invention participate in the optimization process of the optimization function along with the network parameters.

In the embodiment of the invention, after the first two convolution operations, the 3 × 3 region of each pixel is convolved with the sobel operator to enhance the line features in the remote sensing image feature map. The parameters of the sobel operator change dynamically according to the gradient of the loss function during training: the sobel operator updates its parameter values by a gradient descent formula using the gradient back-propagated through the network. Because the method can dynamically change the proportional relation between the horizontal and vertical gradients in the sobel operator, the adaptability of the network to house targets in images of different levels is improved. The gradient descent formula used in the invention is stochastic gradient descent, with which a local optimum point in the feature space can be found more readily.
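A minimal sketch of the idea of a trainable sobel kernel follows. It is not the patent's training procedure: the kernel is initialised with the textbook horizontal Sobel values, a squared-error loss on a single 3 × 3 window stands in for the network loss, and the learning rate, window and target are made-up values.

```python
def response(kernel, window):
    """Output of the 3x3 kernel on one 3x3 window."""
    return sum(kernel[i][j] * window[i][j] for i in range(3) for j in range(3))


def sgd_step(kernel, window, target, lr):
    """One gradient descent update w <- w - lr * dL/dw for L = (g - target)^2.

    Since g = sum_ij w_ij * a_ij, the gradient is dL/dw_ij = 2 * (g - target) * a_ij.
    """
    err = response(kernel, window) - target
    return [[kernel[i][j] - lr * 2.0 * err * window[i][j] for j in range(3)]
            for i in range(3)]


# Start from the fixed Sobel values and let the update change them.
kernel = [[-1.0, 0.0, 1.0], [-2.0, 0.0, 2.0], [-1.0, 0.0, 1.0]]
window = [[0.0, 0.5, 1.0], [0.0, 0.5, 1.0], [0.0, 0.5, 1.0]]
target = 1.0  # hypothetical desired response for this window

loss_before = (response(kernel, window) - target) ** 2
for _ in range(20):
    kernel = sgd_step(kernel, window, target, lr=0.05)
loss_after = (response(kernel, window) - target) ** 2
```

After the updates the nine kernel entries no longer keep the fixed 1 : 2 : 1 proportions, which mirrors the stated motivation for making the operator's parameters learnable.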

S3, connecting the remote sensing image feature maps through the direct connection structures in the U-Net network to obtain a final remote sensing image feature map.

In the embodiment of the invention, a plurality of direct connection structures are used for connecting the characteristic diagrams of the remote sensing images of different layers, so that the adaptivity of the model for processing the multi-scale remote sensing images is improved. The convolution kernel at the shallow level extracts local features of the remote sensing image, and the operator receptive field of the convolution operation at the shallow level is smaller. The convolution kernel at the deep level extracts the global features of the remote sensing image, and the operator receptive field of the convolution operation at the deep level is larger.

By comparison, the original U-Net network structure only has direct connection structures at the same level. A same-level direct connection stacks feature images of the same level in the channel dimension to increase the expressive power of the feature images for house features. However, a same-level direct connection cannot represent the detail information in the upper-layer feature image or the global information in the lower-layer feature image.

Aiming at the problem, the invention integrates the upper layer feature map, the same layer feature map and the lower layer feature map into a new feature map together, and retains the detail information and the global information of the remote sensing image feature map. The step of performing convolution operation after stacking the feature maps of the remote sensing images of the same level by the direct connection structure comprises the step of stacking the feature maps of the remote sensing images of the same level on a channel dimension, wherein the channel number of the new feature map is the sum of the channel numbers of the previous feature maps.

Where the direct connection structure first up-samples the feature maps of the remote sensing images of different levels and then performs a convolution operation on the stacked new feature maps to obtain the final remote sensing image feature map, the steps are as follows:

the upsampling operation uses bilinear interpolation to bring feature maps of different levels to the same size;

carrying out up-sampling operation on the feature map of the low-level remote sensing image, wherein the length and the width of a new feature map are doubled;

stacking the remote sensing image feature maps of different levels in the channel direction, wherein the channel number of the new feature map is the sum of the channel numbers of the previous feature maps.
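The shape bookkeeping of these steps can be sketched as below. Feature maps are stored as `[channel][row][col]` lists, and, to keep the example short, nearest-neighbour doubling is used as a stand-in for the interpolation described in the text; `upsample_nearest_2x` and `fuse` are hypothetical names.

```python
def upsample_nearest_2x(channel):
    """Double the length and width of one channel by repeating each pixel."""
    out = []
    for row in channel:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def fuse(same_level, lower_level):
    """Up-sample the lower-level maps, then stack everything along channels."""
    upsampled = [upsample_nearest_2x(ch) for ch in lower_level]
    if (len(upsampled[0]), len(upsampled[0][0])) != \
            (len(same_level[0]), len(same_level[0][0])):
        raise ValueError("feature maps must share one spatial size before stacking")
    return same_level + upsampled  # channel count = sum of both channel counts


# 2 channels of size 4x4 at the current level, 3 channels of size 2x2 below it.
same_level = [[[1.0] * 4 for _ in range(4)] for _ in range(2)]
lower_level = [[[float(c)] * 2 for _ in range(2)] for c in range(3)]

fused = fuse(same_level, lower_level)  # 5 channels, each 4x4
```

The fused map would then be passed to a convolution, as the steps above describe.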

S4, using an up-sampling operation to make the size of the final remote sensing image feature map consistent with that of the original image, and using a convolution operation to decode the house features in the up-sampled feature map.

In an embodiment of the invention, an upsampling operation is used to keep the final result map consistent with the original image in size, and a convolution operation is used to decode the house features in the up-sampled feature map. Binary classification is then performed on each pixel in the last-layer feature map.

In the present embodiment, the upsampling operation uses bilinear interpolation.

The interpolation formula is:

$$F(x, y) = (j+1-y)(i+1-x)F(i, j) + (j+1-y)(x-i)F(i+1, j) + (y-j)(i+1-x)F(i, j+1) + (y-j)(x-i)F(i+1, j+1)$$

where F(x, y) is the pixel value of the image after interpolation, and F(i, j) is the pixel value of the image before interpolation.

In this embodiment, the feature map after interpolation is twice as large as the original feature map in both the length and width dimensions.
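The interpolation formula above can be evaluated directly, as in the following sketch. Two points are assumptions not stated in the text: (i, j) is taken as the grid point with i ≤ x ≤ i + 1 and j ≤ y ≤ j + 1, and the doubled grid is mapped back onto the original so that the corner pixels coincide.

```python
import math


def bilinear(F, x, y):
    """Evaluate the interpolation formula at real-valued (x, y); F is F[i][j]."""
    # Clamp so that (i + 1, j + 1) always stays inside the grid.
    i = min(int(math.floor(x)), len(F) - 2)
    j = min(int(math.floor(y)), len(F[0]) - 2)
    return ((j + 1 - y) * (i + 1 - x) * F[i][j]
            + (j + 1 - y) * (x - i) * F[i + 1][j]
            + (y - j) * (i + 1 - x) * F[i][j + 1]
            + (y - j) * (x - i) * F[i + 1][j + 1])


def upsample_2x(F):
    """Return a map twice as long and twice as wide as F (F must be >= 2x2)."""
    h, w = len(F), len(F[0])
    H, W = 2 * h, 2 * w
    return [[bilinear(F, u * (h - 1) / (H - 1), v * (w - 1) / (W - 1))
             for v in range(W)]
            for u in range(H)]


F = [[0.0, 1.0],
     [2.0, 3.0]]
centre = bilinear(F, 0.5, 0.5)  # equal weights of 0.25 on all four pixels
up = upsample_2x(F)             # 4x4 map
```

At the centre of the four pixels every weight is 0.25, so the result is their average; at the grid points the formula returns the original pixel values unchanged.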

S5, performing binary classification on each pixel in the house feature image by a convolution operation, and comparing the classification result with a preset threshold value to obtain an output result.

In this embodiment, convolution operations are used to reduce the channel dimension of the feature image. The dimension reduction continuously reduces the number of channels of the feature image, and the convolution operation that follows the last up-sampling and feature integration step performs binary classification on each pixel of the feature image. The function used for the binary classification is a sigmoid function.

The sigmoid function is of the form:

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$

The sigmoid function maps its input values into the [0,1] interval, corresponding to probability values.

The sigmoid function result is the probability that each pixel point is the target pixel. And finally, judging the relationship between the probability value and the artificially set threshold value. If the probability value is larger than the threshold value, the pixel point is considered as a pixel point in the house and is white in the final output image; and if the probability value is smaller than the threshold value, the pixel point is regarded as the pixel point in the background and is black in the final output image.
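The classification step can be sketched as follows. The sketch assumes a 1 × 1 convolution (one weight per channel) for the final channel reduction and a threshold of 0.5; neither value is given in the text, and the feature values are invented for the example.

```python
import math


def sigmoid(z):
    """Map a real-valued score into the (0, 1) interval."""
    return 1.0 / (1.0 + math.exp(-z))


def classify(features, weights, bias=0.0, threshold=0.5):
    """Reduce [C][H][W] features to one channel, then threshold per pixel.

    Returns (probabilities, mask); the mask uses 255 (white) for house pixels
    and 0 (black) for background pixels.
    """
    h, w = len(features[0]), len(features[0][0])
    probs = [[0.0] * w for _ in range(h)]
    mask = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            score = bias + sum(wc * ch[y][x] for wc, ch in zip(weights, features))
            probs[y][x] = sigmoid(score)
            mask[y][x] = 255 if probs[y][x] > threshold else 0
    return probs, mask


features = [
    [[2.0, -1.0], [0.5, -3.0]],  # channel 0
    [[1.0, -2.0], [0.5, 1.0]],   # channel 1
]
probs, mask = classify(features, weights=[1.0, 1.0])
```

Here the per-pixel scores are 3, −3, 1 and −2, so the left column is output as white and the right column as black.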

In the embodiment of the invention, the method further comprises the step of calculating the loss of the generated segmentation image and the annotation image of the original image by using a cross entropy loss function so as to finally obtain an optimized detection model, and storing the model for later use.
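A minimal sketch of a per-pixel cross entropy loss for the two-class case follows. It assumes the loss is averaged over all pixels and that predictions are clipped away from 0 and 1 for numerical safety; both choices are illustrative rather than taken from the patent.

```python
import math


def binary_cross_entropy(pred, label, eps=1e-7):
    """Mean of -(t*log(p) + (1-t)*log(1-p)) over all pixels.

    pred holds per-pixel house probabilities; label holds 1 (house) or 0.
    """
    total, count = 0.0, 0
    for pred_row, label_row in zip(pred, label):
        for p, t in zip(pred_row, label_row):
            p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
            total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
            count += 1
    return total / count


label = [[1, 0], [1, 0]]
good = [[0.9, 0.1], [0.8, 0.2]]  # close to the annotation
bad = [[0.2, 0.7], [0.4, 0.9]]   # far from the annotation

loss_good = binary_cross_entropy(good, label)
loss_bad = binary_cross_entropy(bad, label)
uncertain = binary_cross_entropy([[0.5]], [[1]])  # equals ln 2
```

A segmentation that agrees with the annotation map yields a lower loss than one that disagrees, which is the signal used to optimize the model.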

Meanwhile, Figs. 3-6 show comparisons between remote sensing images actually used and their detection results under the scheme of the invention. The remote sensing images processed by this embodiment are large in scale, and the houses in them are detected with high precision.

It will be evident to those skilled in the art that the embodiments of the present invention are not limited to the details of the foregoing illustrative embodiments, and that the embodiments of the present invention are capable of being embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the embodiments being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. Several units, modules or means recited in the system, apparatus or terminal claims may also be implemented by one and the same unit, module or means in software or hardware.

Finally, it should be noted that the above embodiments are only used for illustrating the technical solutions of the embodiments of the present invention and not for limiting, and although the embodiments of the present invention are described in detail with reference to the above preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions can be made on the technical solutions of the embodiments of the present invention without departing from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A multi-scale remote sensing image dense house detection method is characterized by comprising the following steps:

2. The method for detecting the multi-scale remote sensing image dense house according to claim 1, wherein the step of obtaining the remote sensing image containing houses of different levels and extracting the remote sensing image features based on a convolution mode to obtain the feature map of the remote sensing image specifically comprises the following steps:

G = ∑_c ∑_i ∑_j A(i, j, c) * W(i, j, c)

3. The method for detecting the dense house of the multi-scale remote sensing image according to claim 2, wherein the step of adding a sobel operator to the feature map to obtain the feature map of the remote sensing image with the enhanced house line features further comprises the step of performing convolution operation on a 3 x 3 area of each pixel and the sobel operator to enhance the line features in the feature map of the remote sensing image; and parameters in the sobel operator are dynamically changed according to the gradient value of the loss function in each training process.

4. The method for detecting the multi-scale remote sensing image dense house according to claim 1, wherein the step of connecting the remote sensing image feature maps based on a direct connection structure in a U-Net network to obtain a final remote sensing image feature map comprises the following steps:

5. The method for detecting the dense houses in the multi-scale remote sensing images according to claim 1, wherein the step of performing binary classification on each pixel point in the house characteristic image by adopting a convolution operation and comparing the classification result with a preset threshold value to obtain house output images with different colors comprises the following steps: the binary classification result of each pixel point is the probability value that the pixel point belongs to a house; if the probability value is larger than the threshold value, the pixel point is regarded as a pixel point in the house and is white in the final output image; and if the probability value is smaller than the threshold value, the pixel point is regarded as a pixel point in the background and is black in the final output image.

6. The method for detecting the dense houses in the multi-scale remote sensing images as claimed in claim 2, wherein the step of obtaining the calculation results of the specific area and all areas of a single channel in the remote sensing image feature map by convolution operation comprises the following steps:

7. The method for detecting the multi-scale remote sensing image dense house according to claim 4, wherein the step of performing convolution operation after stacking feature maps of the remote sensing images at the same level by the direct connection structure comprises stacking the feature maps of the remote sensing images at the same level on a channel dimension, and the channel number of the new feature map is the sum of the channel numbers of the previous feature maps.

8. The method for detecting the multi-scale remote sensing image dense house according to claim 4, wherein the direct connection structure performs upsampling operation on the feature maps of the remote sensing images of different levels, and performs convolution operation after stacking the new feature maps, so as to obtain a final remote sensing image feature map.

9. A multi-scale remote sensing image dense house detection device is characterized by comprising:

a memory having computer instructions stored therein;

a processor in data connection with the memory for executing the computer instructions to perform the method for detecting a dense house according to any one of claims 1-8.

10. A multi-scale remote sensing image dense house detection device is characterized by comprising: