Disclosure of Invention
In order to solve the technical problems, the invention aims to provide a multi-scale remote sensing image dense house detection method and device based on a U-Net network, so as to solve the problem of poor house detection precision in the existing remote sensing image processing method.
According to one embodiment of the invention, the invention provides a multi-scale remote sensing image dense house detection method based on a U-Net network, which comprises the following steps:
acquiring remote sensing images containing different levels of houses, and extracting the characteristics of the remote sensing images based on a convolution mode to obtain a characteristic map of the remote sensing images;
adding a sobel operator into the feature map to obtain a remote sensing image feature map with enhanced house line features;
connecting the remote sensing image feature images based on a direct connection structure in the U-Net network to obtain a final remote sensing image feature image;
keeping the final remote sensing image feature map consistent in size with the original image by an up-sampling operation, and decoding the house feature map from the feature map by a convolution operation after the up-sampling operation; and
performing binary classification on each pixel in the house feature map by a convolution operation, and comparing the result with a preset threshold value, thereby obtaining a house output result in which houses and background have different colors.
Further, the step of obtaining the remote sensing image including the houses with different levels and extracting the characteristics of the remote sensing image based on the convolution mode to obtain the characteristic map of the remote sensing image specifically comprises the following steps:
making a labeling graph of the remote sensing image, wherein the result is that the house target pixel is marked as white, the rest background pixels are marked as black, then preprocessing the labeling image to smooth the labeling result, and then sending the remote sensing image and the corresponding labeling image into a network for training;
respectively obtaining the calculation results of the specific areas and all areas of a single channel in the remote sensing image feature map through convolution operation; wherein, the convolution formula is:
$$G=\sum_{c}\sum_{i}\sum_{j}A(i,j,c)\,w(i,j,c)$$
wherein i, j and c are the indices along the image length, width and channel directions respectively, A is the original image, and w is a parameter in the convolution kernel;
repeating the operation of the previous step by all convolution kernels to finally obtain the calculation results of all channels of the characteristic map of the remote sensing image, thereby obtaining the characteristic map of the remote sensing image.
Further, the step of adding a sobel operator into the feature map to obtain a remote sensing image feature map with enhanced house line features further comprises convolving the 3*3 area of each pixel with the sobel operator to enhance the line features in the remote sensing image feature map; the parameters in the sobel operator are dynamically changed according to the gradient value of the loss function during training.
Further, the step of connecting the remote sensing image feature map based on the direct connection structure in the U-Net network to obtain a final remote sensing image feature map comprises the following steps:
the direct connection structure stacks the characteristic images of the remote sensing images of the same level and then carries out convolution operation, or the direct connection structure carries out up-sampling operation on the characteristic images of the remote sensing images of different levels, stacks the new characteristic images and then carries out convolution operation, so that the final characteristic image of the remote sensing image is obtained.
Further, the up-sampling operation uses bilinear interpolation, wherein the bilinear interpolation formula is:
$$F(x,y)=(j+1-y)(i+1-x)F(i,j)+(j+1-y)(x-i)F(i+1,j)+(y-j)(i+1-x)F(i,j+1)+(y-j)(x-i)F(i+1,j+1)$$
wherein F(x, y) is the pixel value of the image after interpolation and F(i, j) is the pixel value of the image before interpolation.
Further, the feature map after the bilinear interpolation is twice its original size in both the length and width dimensions.
Further, the step of performing binary classification on each pixel in the house feature map by a convolution operation and comparing the result with a preset threshold value so as to obtain a house output image with different colors comprises the following steps: the value of each pixel after the binary classification is the probability that the pixel belongs to a house; if the probability value is larger than the threshold value, the pixel is considered to be a house pixel and is white in the final output image; and if the probability value is smaller than the threshold value, the pixel is considered to be a background pixel and is black in the final output image.
Further, the convolution mode is linear addition of pixel values in the remote sensing image and the feature image.
Further, the step of obtaining the calculation results of the specific area and all areas of the single channel in the remote sensing image feature map through convolution operation respectively includes:
carrying out convolution operation on the convolution kernel and a specific region in each channel of the remote sensing image, and finally adding convolution results of all channels to obtain a calculation result of the specific region of a single channel in the characteristic diagram of the remote sensing image;
and the same convolution kernel slides along the length and width dimensions of each channel of the remote sensing image to repeat the above operation, so that the calculation results of all areas of a single channel in the feature map of the remote sensing image are obtained.
Further, the step of performing sliding operation on the long dimension and the wide dimension of each channel of the remote sensing image by the same convolution kernel to repeat the above operation, and obtaining the calculation results of all areas of a single channel in the remote sensing image feature map further comprises the step of performing repeated convolution operation on the center of the convolution kernel at intervals of a plurality of pixel points in the long direction and the wide direction of the remote sensing image feature map.
Further, the step of convolving the 3*3 region of each pixel with a sobel operator includes: and the sobel operator slides in the length and width dimensions of the remote sensing image, and finally, the house edge features in the feature map of the remote sensing image are extracted.
Further, the step of stacking the feature images of the same-level remote sensing images by the direct connection structure and then performing convolution operation includes stacking the feature images of the same-level remote sensing images on a channel dimension, wherein the number of channels of the new feature images is the sum of the number of channels of the previous feature images.
Alternatively, the direct connection structure performs an up-sampling operation on the feature maps of remote sensing images of different levels, stacks the new feature maps and then performs a convolution operation, so that the final feature map of the remote sensing image is obtained.
Further, the binary classification uses a sigmoid function, which maps real values into the [0,1] interval so that the output corresponds to a probability value.
Further, the sobel operator updates its parameter values by a gradient descent formula according to the gradient back-propagated through the network.
The method extracts the features of house targets in the remote sensing image through convolution operations to obtain the feature map of the remote sensing image, and adds a dynamic sobel operator among the multiple convolution operations to enhance the linear features in the feature map.
According to the method, feature maps of the remote sensing image at different levels are connected by a plurality of direct connection structures, which improves the adaptability of the method to multi-scale remote sensing images; the up-sampling operation keeps the result map consistent in size with the original image, and a convolution operation decodes the house features of the feature map after the up-sampling operation. Binary classification is performed on each pixel of the final-layer feature map: a pixel that belongs to a house in the original image is white in the result image, and a pixel that belongs to the background in the original image is black in the result image, so that a result image of the same size as the original image is finally obtained.
According to another embodiment of the present invention, there is also provided a multi-scale remote sensing image dense house detection apparatus for performing the above detection method, the apparatus including:
the first feature map acquisition module is used for acquiring remote sensing images containing different levels of houses, and extracting features of the remote sensing images based on a convolution mode so as to obtain feature maps of the remote sensing images;
the second feature map acquisition module is used for adding a sobel operator into the feature map to obtain a remote sensing image feature map with enhanced house line features;
the third feature map acquisition module is used for connecting the feature maps of the remote sensing images based on a direct connection structure in the U-Net network to obtain a final feature map of the remote sensing images;
the up-sampling module is used for keeping the final remote sensing image feature map consistent with the original map in size;
the first convolution module is used for carrying out convolution operation on the feature images after the up-sampling operation to decode house feature images in the feature images; and
the classification module is used for performing binary classification on each pixel in the house feature map by a convolution operation and comparing the result with a preset threshold value so as to obtain a house output result in which houses and background have different colors.
According to another embodiment of the present invention, there is also provided a multi-scale remote sensing image dense house detection apparatus, including: a memory having stored therein computer instructions; and the processor is in data connection with the memory, and executes the computer instructions so as to execute the multi-scale remote sensing image dense house detection method.
By adopting the above technical scheme, large-scale remote sensing images can be processed, and houses in the remote sensing images are detected with high precision.
Detailed Description
For ease of understanding, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is apparent that the described embodiments are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be noted that, without conflict, the embodiments of the present invention and features of the embodiments may be combined with each other. The invention will be described in detail below with reference to the drawings in connection with embodiments.
In this embodiment, the electronic device on which the remote sensing image processing method operates may extract features from the remote sensing image.
Fig. 1 shows the network framework of the multi-scale remote sensing image dense house detection method based on a U-Net network, and Fig. 2 shows a flow chart of the method.
As shown in fig. 2, the method comprises the steps of:
S1, acquiring remote sensing images containing houses of different levels, and extracting the features of the remote sensing images by convolution to obtain a feature map of the remote sensing images.
According to an embodiment of the present invention, step S1 specifically includes:
S101, making a label map of the remote sensing image, in which house target pixels are marked white and the remaining background pixels are marked black. A series of preprocessing operations is then performed on the label map to smooth the labeling result, and finally the remote sensing image and the corresponding label map are sent into the network for training.
According to an embodiment of the present invention, acquiring remote sensing images of different levels containing house targets includes: acquiring remote sensing images of different levels containing house targets; making a label map of each remote sensing image; selecting the first channel of the Baidu plain map of the same area as the house labeling image; performing several preprocessing operations on the house labeling image, including multiple dilation, erosion and denoising operations; combining the remote sensing images and the house labeling images into a training set, and selecting a part of the training set as a verification set; and combining every 20 images and their corresponding label maps into one batch that is sent into the network structure designed by the invention.
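As a non-limiting illustration, the smoothing of a label map by dilation followed by erosion can be sketched as below. The 3*3 neighbourhood, the order of the two operations, the function names `dilate` and `erode`, and the toy 9*9 label are all assumptions made for this sketch; the embodiment does not specify them.

```python
import numpy as np

def dilate(mask):
    """3x3 binary dilation: a pixel is set if any pixel in its 3x3 window is set."""
    h, w = mask.shape
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    out = np.zeros_like(mask)
    for di in range(3):
        for dj in range(3):
            out |= padded[di:di + h, dj:dj + w]
    return out

def erode(mask):
    """3x3 binary erosion: a pixel stays set only if its whole 3x3 window is set."""
    h, w = mask.shape
    padded = np.pad(mask, 1, mode="constant", constant_values=True)
    out = np.ones_like(mask)
    for di in range(3):
        for dj in range(3):
            out &= padded[di:di + h, dj:dj + w]
    return out

# Toy label map (assumed): a 5x5 "house" with a one-pixel labeling hole.
label = np.zeros((9, 9), dtype=bool)
label[2:7, 2:7] = True
label[4, 4] = False

# Dilation fills the hole; the following erosion restores the outline.
smoothed = erode(dilate(label))
```

In this toy case the hole at the center is filled while the 5*5 outline of the house is unchanged.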
S102, respectively obtaining the calculation results of the specific areas and all areas of a single channel in the remote sensing image feature map through convolution operation.
Wherein, the convolution formula is:
$$G=\sum_{c}\sum_{i}\sum_{j}A(i,j,c)\,w(i,j,c),$$
wherein i, j and c are the indices along the image length, width and channel directions respectively, A is the original image, and w is a parameter in the convolution kernel.
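As a non-limiting illustration, the convolution formula for one specific region can be written directly in NumPy. The function name `conv_at`, the array layout (length, width, channel) and the toy values are assumptions made for this sketch.

```python
import numpy as np

def conv_at(A, w, top, left):
    """Evaluate G = sum_c sum_i sum_j A(i,j,c) * w(i,j,c) for one region.

    A: image or feature map, shape (H, W, C)
    w: one convolution kernel, shape (k, k, C)
    (top, left): upper-left corner of the k x k region inside A
    """
    k = w.shape[0]
    region = A[top:top + k, left:left + k, :]
    # Element-wise products summed over length, width and all channels.
    return float(np.sum(region * w))

# Toy 3-channel image and an averaging kernel (illustrative values only).
A = np.arange(4 * 4 * 3, dtype=float).reshape(4, 4, 3)
w = np.full((3, 3, 3), 1.0 / 27.0)
G = conv_at(A, w, 0, 0)
```

With the averaging kernel chosen here, G is simply the mean of the 3*3 region over all three channels, which makes the "linear addition" character of the operation easy to check.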
According to the embodiment of the invention, the convolution operation extracts the characteristics of the training image, expands the 3-channel image into a 64-channel tensor form, and sends the new characteristic diagram into four groups of network modules.
S103, repeating the operation of the step S102 on all convolution kernels, and finally obtaining the calculation results of all channels of the remote sensing image feature map, so as to obtain the remote sensing image feature map.
According to the embodiment of the invention, a part of convolution operation extracts the characteristics of the training image, the number of channels of the characteristic image is expanded, and the step length of the convolution kernel moving in the length and the width of the characteristic image is 2; the other part of convolution operation improves the characteristic image characteristics, expands the channel quantity of the characteristic image, and the step length of the convolution kernel moving in the length and width dimensions of the characteristic image is 1.
The area covered by the convolution operation in this embodiment is typically a 3*3 pixel area, and, as the convolution formula shows, the 3*3 pixel area of each channel is finally reduced to a single numerical value. The remote sensing image initially has three channels (RGB), and the feature map of the remote sensing image acquires more channels after successive convolution operations. Since the convolution values of all channels are added in each convolution operation, the entire convolution operation can be regarded as a linear addition of the pixel values in the region.
The corresponding area of the convolution operation in this embodiment is the 3*3 pixel area. After the convolution operation, the center point of the convolution kernel moves by a certain step length in the length and width directions of the image to calculate the adjacent area.
The step size of the movement in this embodiment is generally two options, including one pixel movement or two pixels movement. A case of shifting one pixel is to shift the center point of the convolution kernel to an adjacent pixel point. The case of moving two pixels is to move the center point of the convolution kernel to the spaced pixel point. The operation of shifting by one pixel can obtain finer feature information, but at the same time, the calculation amount of the convolution operation and the parameter storage amount are increased. The operation of shifting two pixels can effectively increase the calculation area of the convolution operation and reduce the redundant information of the convolution result, but at the same time, the operation can lose information of a certain detail.
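As a non-limiting illustration, the effect of the two step sizes can be sketched as follows. The function name `conv2d_single`, the absence of padding and the 9*9 toy input are assumptions made for this sketch.

```python
import numpy as np

def conv2d_single(A, w, stride=1):
    """Slide one kernel over A (shape H x W x C) without padding.

    Returns one output channel; every output value is the sum over the
    k x k window and over all input channels.
    """
    H, W, _ = A.shape
    k = w.shape[0]
    out_h = (H - k) // stride + 1
    out_w = (W - k) // stride + 1
    out = np.zeros((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            top, left = r * stride, c * stride
            out[r, c] = np.sum(A[top:top + k, left:left + k, :] * w)
    return out

A = np.random.default_rng(0).random((9, 9, 3))
w = np.ones((3, 3, 3))
fine = conv2d_single(A, w, stride=1)    # shape (7, 7): every window
coarse = conv2d_single(A, w, stride=2)  # shape (4, 4): every second window
```

The stride-2 result contains exactly every second value of the stride-1 result in each direction, which shows both the reduced amount of computation and the detail that is skipped.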
In the embodiment of the invention, convolution operation is carried out on the convolution kernel and the specific area in each channel of the remote sensing image, and finally, the convolution results of all channels are added to obtain the calculation result of the specific area of a single channel in the characteristic image of the remote sensing image, wherein the convolution operation can be regarded as linear addition of the pixel values in the remote sensing image and the characteristic image, and complex linear characteristics of the remote sensing image can be extracted through multiple convolution operations.
The same convolution kernel slides along the length and width dimensions of each channel of the remote sensing image to obtain the calculation results for all areas of a single channel of the remote sensing image feature map. Repeating these two steps for all convolution kernels finally yields the calculation results for all channels of the remote sensing image feature map. The center of the convolution kernel repeats the convolution operation at intervals of several pixels along the length and width directions of the remote sensing image feature map.
Since the convolution operation is a linear addition of regional pixel values, the problem of non-linear space cannot be addressed. For this purpose, a nonlinear activation function is typically added after the convolution operation. The nonlinear activation function has the function of improving the expression capacity of the model, better processing nonlinear problems and improving the robustness of the model.
S2, adding a sobel operator into the feature map to obtain a remote sensing image feature map with enhanced house line features.
The equation for the sobel operation here is: [equation omitted in source], wherein A is the original image and G is the result.
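As a non-limiting illustration, the conventional fixed Sobel operator is sketched below, with A as the input and G as the result. The kernels and the gradient-magnitude form are the standard textbook ones and are assumed here; they are not taken from the equation of this embodiment, and the helper names `filter3x3` and `sobel_magnitude` are hypothetical.

```python
import numpy as np

# Standard Sobel kernels (assumed, not taken from the source equation).
SOBEL_X = np.array([[-1.0, 0.0, 1.0],
                    [-2.0, 0.0, 2.0],
                    [-1.0, 0.0, 1.0]])
SOBEL_Y = SOBEL_X.T

def filter3x3(A, K):
    """Slide a 3x3 kernel over a 2-D array without padding."""
    H, W = A.shape
    out = np.zeros((H - 2, W - 2))
    for r in range(H - 2):
        for c in range(W - 2):
            out[r, c] = np.sum(A[r:r + 3, c:c + 3] * K)
    return out

def sobel_magnitude(A):
    """G = sqrt(Gx^2 + Gy^2) from the transverse and longitudinal gradients."""
    gx = filter3x3(A, SOBEL_X)
    gy = filter3x3(A, SOBEL_Y)
    return np.sqrt(gx ** 2 + gy ** 2)

# Toy input with one vertical edge, such as the side of a roof outline.
A = np.zeros((5, 6))
A[:, 3:] = 1.0
G = sobel_magnitude(A)
```

The response is non-zero only in the columns whose 3*3 window straddles the edge, which is the line-enhancing behaviour the step relies on.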
In this embodiment, a dynamic sobel operation is added after the feature map is obtained in step S1. The sobel operator performs very well at extracting line texture features from images. After the sobel operator is added, the line contour features of houses in the remote sensing image feature map are highlighted.
In order to adapt to different degrees of feature extraction of feature graphs of different levels on house features, the invention designs a sobel operator capable of dynamically changing parameters along with the levels of the feature graphs and the training process. Because of the fixed proportional relation between the transverse gradient and the longitudinal gradient in the sobel operator, sobel cannot adapt to remote sensing map targets of different levels. The parameters of the dynamic sobel operator of the invention will participate in the optimization process of the optimization function along with the network parameters.
In the embodiment of the invention, after the first two convolution operations, the 3*3 area of each pixel is convolved with a sobel operator to enhance the line features in the remote sensing image feature map. The parameters in the sobel operator change dynamically according to the gradient value of the loss function during training: the sobel operator updates its parameter values by the gradient descent formula, using the gradient back-propagated through the network. The method can thus dynamically change the proportional relation between the transverse gradient and the longitudinal gradient in the sobel operator, which improves the adaptability of the network to house targets in images of different levels. The gradient descent formula used in the invention is the stochastic gradient descent formula, which helps in finding a better local optimum in the feature space.
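As a non-limiting illustration, one gradient descent update of a sobel kernel whose nine parameters are trainable can be sketched as follows. The squared-error loss, the learning rate of 0.1, the random toy data and the helper names are assumptions made so that the sketch is self-contained; in the embodiment the gradient would instead be back-propagated from the loss of the whole network.

```python
import numpy as np

def filter3x3(A, K):
    """Slide a 3x3 kernel over a 2-D array without padding."""
    H, W = A.shape
    out = np.zeros((H - 2, W - 2))
    for r in range(H - 2):
        for c in range(W - 2):
            out[r, c] = np.sum(A[r:r + 3, c:c + 3] * K)
    return out

def loss_and_grad(A, K, target):
    """Illustrative loss 0.5 * mean((out - target)^2) and its gradient w.r.t. K."""
    err = filter3x3(A, K) - target
    loss = 0.5 * np.mean(err ** 2)
    h, w = err.shape
    grad = np.zeros_like(K)
    for i in range(3):
        for j in range(3):
            # d(out[r, c]) / d(K[i, j]) = A[r + i, c + j]
            grad[i, j] = np.mean(err * A[i:i + h, j:j + w])
    return loss, grad

rng = np.random.default_rng(0)
A = rng.random((8, 8))        # toy single-channel feature map
target = rng.random((6, 6))   # toy training signal (assumed)

# Start from the standard transverse Sobel kernel, then let training move it.
K = np.array([[-1.0, 0.0, 1.0],
              [-2.0, 0.0, 2.0],
              [-1.0, 0.0, 1.0]])
lr = 0.1
loss_before, grad = loss_and_grad(A, K, target)
K = K - lr * grad             # gradient descent update of the sobel parameters
loss_after, _ = loss_and_grad(A, K, target)
```

After the update the kernel no longer keeps the fixed 1:2:1 proportions, which is the sense in which the operator is dynamic.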
S3, connecting the remote sensing image feature images based on a direct connection structure in the U-Net network to obtain a final remote sensing image feature image.
In the embodiment of the invention, the feature images of the remote sensing images of different layers are connected by using a plurality of direct connection structures, so that the adaptability of the model to process the multi-scale remote sensing images is improved. The shallow convolution kernel extracts local features of the remote sensing image because the operator receptive field of the convolution operation at the shallow level is smaller. The deep level convolution kernel extracts global features of the remote sensing image because the operator receptive field of the convolution operation at the deep level is larger.
In contrast, the original U-Net network structure has only same-level direct connection structures. A same-level direct connection structure stacks feature maps of the same level in the channel dimension to increase the expressive power of the feature maps for house features. However, the same-level direct connection structure cannot represent the detail information in the upper-level feature map or the global information in the lower-level feature map.
Aiming at the problem, the invention integrates the upper layer feature map, the same layer feature map and the lower layer feature map into the new feature map together, and reserves the detail information and the global information of the remote sensing image feature map. The step of performing convolution operation after stacking the feature images of the same-level remote sensing images by the direct connection structure comprises stacking the feature images of the same-level remote sensing images on a channel dimension, wherein the number of channels of the new feature images is the sum of the number of channels of the previous feature images.
The step of performing up-sampling operation on the feature images of the remote sensing images of different levels by the direct connection structure, stacking new feature images, and then performing convolution operation to obtain a final feature image of the remote sensing image comprises the following steps:
the up-sampling operation uses bilinear interpolation to sample the feature maps of different levels to the same size;
performing up-sampling operation on the feature images of the low-level remote sensing images, wherein the length and the width of the new feature images are doubled;
and stacking the remote sensing image feature images of different levels in the channel direction, wherein the number of channels of the new feature image is the sum of the number of channels of the previous feature image.
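As a non-limiting illustration, the stacking of a same-level feature map with an up-sampled lower-level feature map can be sketched as follows. The channel counts and sizes are assumed, and nearest-neighbour repetition is used here only as a simple stand-in for the interpolation of the embodiment.

```python
import numpy as np

def upsample2x_nearest(F):
    """Double length and width by repeating pixels (stand-in for interpolation)."""
    return F.repeat(2, axis=0).repeat(2, axis=1)

# Assumed shapes: (length, width, channels).
same_level = np.zeros((16, 16, 64))
lower_level = np.zeros((8, 8, 128))

# Bring the lower-level map to the same size, then stack along the channel axis.
lower_up = upsample2x_nearest(lower_level)
stacked = np.concatenate([same_level, lower_up], axis=-1)
```

The stacked map has 64 + 128 = 192 channels, the sum of the channel counts of its inputs, and would then be passed to a convolution operation.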
S4, enabling the final remote sensing image feature image to be consistent with the original image in size through up-sampling operation, and decoding house feature images in the feature image through convolution operation after the up-sampling operation.
In an embodiment of the invention, an up-sampling operation is used so that the final result map is consistent in size with the original image, and a convolution operation is used to decode the house features in the feature map after the up-sampling operation. Binary classification is then performed on each pixel of the final-layer feature map.
In this embodiment, the up-sampling operation uses bilinear interpolation.
The formula of the bilinear interpolation is:
$$F(x,y)=(j+1-y)(i+1-x)F(i,j)+(j+1-y)(x-i)F(i+1,j)+(y-j)(i+1-x)F(i,j+1)+(y-j)(x-i)F(i+1,j+1)$$
wherein F(x, y) is the pixel value of the image after interpolation, and F(i, j) is the pixel value of the image before interpolation.
In this embodiment, the feature map after the bilinear interpolation is doubled in the length and width dimensions.
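As a non-limiting illustration, the interpolation formula above, which has the form of bilinear interpolation over the four neighbouring grid points, can be sketched as follows. Taking i and j as the integer parts of x and y, and placing the output samples evenly between the first and last input pixels, are assumptions made for this sketch; the function names are hypothetical.

```python
import numpy as np

def interpolate_at(F, x, y):
    """Evaluate the interpolation formula at a fractional position (x, y)."""
    # i, j: grid point at or before (x, y), clamped so that i+1 and j+1 exist.
    i = min(int(np.floor(x)), F.shape[0] - 2)
    j = min(int(np.floor(y)), F.shape[1] - 2)
    return ((j + 1 - y) * (i + 1 - x) * F[i, j]
            + (j + 1 - y) * (x - i) * F[i + 1, j]
            + (y - j) * (i + 1 - x) * F[i, j + 1]
            + (y - j) * (x - i) * F[i + 1, j + 1])

def upsample2x(F):
    """Double length and width of a 2-D map using interpolate_at."""
    H, W = F.shape
    xs = np.linspace(0.0, H - 1, 2 * H)  # assumed sampling grid
    ys = np.linspace(0.0, W - 1, 2 * W)
    out = np.zeros((2 * H, 2 * W))
    for u, x in enumerate(xs):
        for v, y in enumerate(ys):
            out[u, v] = interpolate_at(F, x, y)
    return out

F = np.array([[0.0, 1.0],
              [2.0, 3.0]])
center = interpolate_at(F, 0.5, 0.5)
up = upsample2x(F)
```

At the midpoint of the four pixels each weight equals 0.25, so the interpolated value is their average.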
S5, performing binary classification on each pixel in the house feature map by a convolution operation, and comparing the result with a preset threshold value to obtain an output result.
In this embodiment, convolution operations are used to reduce the channel dimension of the feature map. The dimension reduction continuously reduces the number of channels of the feature map, and the convolution operation that follows the last up-sampling and feature map integration performs binary classification on each pixel of the feature map. The function used for the binary classification is a sigmoid function.
The form of the sigmoid function is:
$$\sigma(x)=\frac{1}{1+e^{-x}}$$
The sigmoid function maps real values into the [0,1] interval, so that its output corresponds to a probability value.
The sigmoid function results in a probability that each pixel point is the target pixel. Finally, the relation between the probability value and the manually set threshold value is judged. If the probability value is greater than the threshold value, the pixel point is considered to be a pixel point in the house, and the pixel point is considered to be white in the final output image; if the probability value is smaller than the threshold value, the pixel is considered to be a pixel in the background, and black in the final output image.
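As a non-limiting illustration, the sigmoid and threshold comparison can be sketched as follows. The threshold of 0.5, the pixel values 255 and 0 for white and black, and the treatment of a probability exactly equal to the threshold as background are assumptions made for this sketch.

```python
import numpy as np

def sigmoid(z):
    """Map real-valued network outputs to probabilities."""
    return 1.0 / (1.0 + np.exp(-z))

def to_mask(logits, threshold=0.5):
    """White (255) where the house probability exceeds the threshold, else black (0)."""
    prob = sigmoid(logits)
    return np.where(prob > threshold, 255, 0).astype(np.uint8)

# Toy per-pixel outputs of the last convolution (assumed values).
logits = np.array([[-2.0, 0.0],
                   [3.0, 0.1]])
mask = to_mask(logits)
```

Raising the threshold makes the output more conservative, marking fewer pixels as house.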
In the embodiment of the invention, the method further comprises calculating the loss between the generated segmentation image and the label map of the original image by using a cross-entropy loss function, so as to finally obtain an optimized detection model, and storing the model for later use.
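As a non-limiting illustration, a cross-entropy loss between predicted house probabilities and a label map can be sketched as follows. The binary form of the loss, the averaging over pixels, the clipping constant `eps` and the toy values are assumptions made for this sketch.

```python
import numpy as np

def binary_cross_entropy(prob, label, eps=1e-7):
    """Mean binary cross-entropy between probabilities and 0/1 labels."""
    p = np.clip(prob, eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(label * np.log(p) + (1.0 - label) * np.log(1.0 - p)))

# Toy label map: 1 = house (white), 0 = background (black).
label = np.array([[1.0, 0.0],
                  [1.0, 0.0]])
good = np.array([[0.9, 0.1],
                 [0.8, 0.2]])
bad = np.array([[0.2, 0.7],
                [0.4, 0.9]])

loss_good = binary_cross_entropy(good, label)
loss_bad = binary_cross_entropy(bad, label)
```

A prediction that agrees with the label map yields the smaller loss, which is what the optimization drives toward.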
Meanwhile, Figs. 3-6 show remote sensing images in actual use alongside the detection results obtained for them according to the scheme of the invention. The remote sensing images processed in this embodiment are large in scale, and houses in them are detected with relatively high precision.
According to another embodiment of the present invention, there is also provided a multi-scale remote sensing image dense house detection apparatus for performing the above detection method, the apparatus including:
the first feature map acquisition module is used for acquiring remote sensing images containing different levels of houses, and extracting features of the remote sensing images based on a convolution mode so as to obtain feature maps of the remote sensing images;
the second feature map acquisition module is used for adding a sobel operator into the feature map to obtain a remote sensing image feature map with enhanced house line features;
the third feature map acquisition module is used for connecting the feature maps of the remote sensing images based on a direct connection structure in the U-Net network to obtain a final feature map of the remote sensing images;
the up-sampling module is used for keeping the final remote sensing image feature map consistent with the original map in size;
the first convolution module is used for carrying out convolution operation on the feature images after the up-sampling operation to decode house feature images in the feature images; and
the classification module is used for performing binary classification on each pixel in the house feature map by a convolution operation and comparing the result with a preset threshold value so as to obtain a house output result in which houses and background have different colors.
According to another embodiment of the present invention, there is also provided a multi-scale remote sensing image dense house detection apparatus, including: a memory having stored therein computer instructions; and the processor is in data connection with the memory, and executes the computer instructions so as to execute the multi-scale remote sensing image dense house detection method.
It will be evident to those skilled in the art that the embodiments of the invention are not limited to the details of the foregoing illustrative embodiments, and that the embodiments of the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of embodiments being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is evident that the word "comprising" does not exclude other elements or steps, and that the singular does not exclude a plurality. A plurality of units, modules or means recited in a system, means or terminal claim may also be implemented by means of software or hardware by means of one and the same unit, module or means.
Finally, it should be noted that the above-mentioned embodiments are merely for illustrating the technical solution of the embodiment of the present invention, and not for limiting, and although the embodiment of the present invention has been described in detail with reference to the above-mentioned preferred embodiments, it should be understood by those skilled in the art that modifications and equivalent substitutions can be made to the technical solution of the embodiment of the present invention without departing from the spirit and scope of the technical solution of the embodiment of the present invention.