Disclosure of Invention
In order to solve the technical problems, the invention aims to provide a multi-scale remote sensing image dense house detection method and device based on a U-Net network, so as to solve the problem of poor house detection accuracy in the existing remote sensing image processing method.
According to one embodiment of the invention, the invention provides a multi-scale remote sensing image dense house detection method based on a U-Net network, which comprises the following steps:
acquiring remote sensing images containing different levels of a house, and extracting the characteristics of the remote sensing images based on a convolution mode to obtain a characteristic diagram of the remote sensing images;
adding a sobel operator into the characteristic diagram to obtain a remote sensing image characteristic diagram with enhanced house line characteristics;
connecting the remote sensing image characteristic diagrams based on a direct connection structure in the U-Net network to obtain a final remote sensing image characteristic diagram;
using up-sampling operation to keep the size of the final remote sensing image characteristic image consistent with that of the original image, and using convolution operation to decode the house characteristic image in the characteristic image after the up-sampling operation; and
performing binary classification on each pixel point in the house characteristic image by adopting convolution operation, and comparing the classification result with a preset threshold value to obtain house output results with different colors.
Further, the step of obtaining the remote sensing image containing houses of different levels and extracting the remote sensing image features based on a convolution mode to obtain the feature map of the remote sensing image specifically comprises:
making an annotation image of the remote sensing image, wherein the result is that the house target pixel is marked as white, and the rest background pixels are marked as black, then preprocessing the annotation image to smooth the annotation result, and then sending the remote sensing image and the corresponding annotation image into a network for training;
respectively obtaining a specific area of a single channel in the remote sensing image characteristic diagram and calculation results of all areas through convolution operation; wherein, the convolution formula is:
$$G = \sum_{c}\sum_{i}\sum_{j} A(i,j,c)\, w(i,j,c)$$
wherein i, j and c are indices along the length, the width and the channel direction of the image respectively, A is the original image, w is a parameter in the convolution kernel, and G is the convolution result;
and repeating the previous step of operation on all the convolution kernels to finally obtain the calculation results of all the channels of the remote sensing image characteristic diagram, thereby obtaining the characteristic diagram of the remote sensing image.
Further, the step of adding a sobel operator into the feature map to obtain the remote sensing image feature map with the enhanced house line feature further comprises the step of performing convolution operation on the 3 x 3 area of each pixel and the sobel operator to enhance the line feature in the remote sensing image feature map; and parameters in the sobel operator are dynamically changed according to the gradient value of the loss function in each training process.
Further, the step of connecting the remote sensing image feature maps based on the direct connection structure in the U-Net network to obtain a final remote sensing image feature map comprises:
the direct connection structure stacks the feature maps of the remote sensing images in the same level and then carries out convolution operation, or the direct connection structure firstly carries out up-sampling operation on the feature maps of the remote sensing images in different levels and then carries out convolution operation after stacking the new feature maps, so that the final remote sensing image feature map is obtained.
Further, the upsampling operation uses bilinear interpolation, which is formulated as:
$$F(x,y) = (j+1-y)(i+1-x)\,F(i,j) + (j+1-y)(x-i)\,F(i+1,j) + (y-j)(i+1-x)\,F(i,j+1) + (y-j)(x-i)\,F(i+1,j+1)$$
where F(x, y) is the pixel point value of the image after interpolation and F(i, j) is the pixel point value of the image before interpolation.
Further, the feature map after the interpolation is twice the size of the original feature map in both the length and width dimensions.
Further, the step of performing binary classification on each pixel point in the house characteristic image by using convolution operation, and comparing the classification result with a preset threshold value to obtain house output images with different colors includes: the binary classification assigns each pixel point a probability value of being a house; if the probability value is larger than the threshold value, the pixel point is considered to be a pixel point in a house and is white in the final output image; and if the probability value is smaller than the threshold value, the pixel point is regarded as a pixel point in the background and is black in the final output image.
Further, the convolution mode is linear summation of pixel values in the remote sensing image and the characteristic image.
Further, the step of obtaining the calculation results of the specific area and all areas of a single channel in the remote sensing image feature map by convolution operation respectively comprises:
carrying out convolution operation on the convolution kernel and a specific area in each channel of the remote sensing image, and finally adding convolution results of all the channels to obtain a calculation result of the specific area of a single channel in the remote sensing image characteristic diagram;
and performing sliding operation on the length dimension and the width dimension of each channel of the remote sensing image by using the same convolution kernel to repeat the operation, so as to obtain the calculation results of all regions of a single channel in the characteristic diagram of the remote sensing image.
Further, the step of performing sliding operation on the same convolution kernel in the length dimension and the width dimension of each channel of the remote sensing image to repeat the operation to obtain the calculation results of all the areas of a single channel in the remote sensing image characteristic diagram further comprises the step of performing repeated convolution operation on the center of the convolution kernel at intervals of a plurality of pixel points in the length direction and the width direction of the remote sensing image characteristic diagram.
Further, the step of convolving the 3 × 3 region of each pixel with the sobel operator includes: and the sobel operator slides in the length dimension and the width dimension of the remote sensing image, and finally, the house edge characteristics in the characteristic diagram of the remote sensing image are extracted.
Further, the step of performing convolution operation after stacking the feature maps of the remote sensing images of the same hierarchy by the direct connection structure includes stacking the feature maps of the remote sensing images of the same hierarchy on a channel dimension, wherein the channel number of the new feature map is the sum of the channel numbers of the previous feature maps.
Alternatively, the direct connection structure first performs an up-sampling operation on the feature maps of the remote sensing images of different levels, and then performs a convolution operation after stacking the new feature maps, thereby obtaining the final remote sensing image feature map.
Further, the binary classification uses a sigmoid function, which maps the output values into the [0,1] interval so that they correspond to probability values.
Further, the parameter values of the sobel operator are updated by a gradient descent formula according to the gradient back-propagated through the network.
The method extracts the features of house targets in a remote sensing image through convolution operations to obtain a feature map of the remote sensing image, and adds a dynamic sobel operator among the convolution operations to enhance the line features in the feature map.
According to the method, the feature maps of remote sensing images of different levels are connected by using several direct connection structures, which improves the adaptability of the method to multi-scale remote sensing images; the upsampling operation keeps the size of the result image consistent with that of the original image, and convolution operations decode the house features from the feature map after the upsampling operation. Binary classification is then performed on each pixel point in the last-layer feature map: a pixel belonging to a house in the original image is white in the result image, and a pixel belonging to the background is black, so that a result image with the same size as the original image is finally obtained.
According to another embodiment of the invention, there is also provided a multi-scale remote sensing image dense house detection device for performing the detection method, the device including:
the first characteristic diagram acquisition module is used for acquiring remote sensing images containing different levels of a house, and extracting the characteristics of the remote sensing images based on a convolution mode to obtain a characteristic diagram of the remote sensing images;
the second characteristic diagram acquisition module is used for adding a sobel operator into the characteristic diagram to obtain a remote sensing image characteristic diagram with enhanced house line characteristics;
the third characteristic diagram acquisition module is used for connecting the characteristic diagrams of the remote sensing images based on a direct connection structure in the U-Net network to obtain a final characteristic diagram of the remote sensing images;
the up-sampling module is used for keeping the size of the final remote sensing image feature map consistent with that of the original image;
the first convolution module is used for performing convolution operation on the feature map subjected to the upsampling operation to decode the house feature image in the feature map; and
the binary classification module is used for performing binary classification on each pixel point in the house characteristic image by adopting convolution operation, and comparing the classification result with a preset threshold value so as to obtain house output results with different colors.
According to another embodiment of the invention, there is also provided a multi-scale remote sensing image dense house detection device, including: a memory having computer instructions stored therein; and the processor is in data connection with the memory and executes the computer instructions so as to execute the multi-scale remote sensing image dense house detection method.
By adopting the technical scheme, the remote sensing image processed by the method has large scale, and the house detection precision in the remote sensing image is high.
Detailed Description
For the convenience of understanding, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the embodiments and features of the embodiments may be combined with each other without conflict. The present invention will be described in detail below with reference to the embodiments with reference to the attached drawings.
In this embodiment, the electronic device on which the remote sensing image processing method is executed may extract features from the remote sensing image.
FIG. 1 shows a network framework diagram in a multi-scale remote sensing image dense house detection method based on a U-Net network, and FIG. 2 shows a flow chart of the multi-scale remote sensing image dense house detection method based on the U-Net network in the invention.
As shown in fig. 2, the method comprises the steps of:
S1, obtaining remote sensing images containing different levels of a house, and extracting the characteristics of the remote sensing images based on a convolution mode to obtain a characteristic diagram of the remote sensing images.
According to the embodiment of the present invention, step S1 specifically includes:
S101, making an annotation image of the remote sensing image, in which the house target pixels are marked as white and the remaining background pixels are marked as black. A series of preprocessing operations is then carried out on the annotation image to smooth the annotation result, and finally the remote sensing image and the corresponding annotation image are sent into the network for training.
According to the embodiment of the invention, obtaining the remote sensing images of different levels containing house targets comprises the following: acquiring remote sensing images of different levels containing house targets; making an annotation image of each remote sensing image by selecting the first channel of the Baidu pure map of the same area as the house annotation image; carrying out several preprocessing operations on the house annotation image, including repeated dilation, erosion, denoising and other operations; combining the remote sensing images and the house annotation images into a training set, and selecting a part of the training set as a verification set; and grouping every 20 images with their corresponding annotation images into a batch and sending the batch into the network structure designed by the invention.
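The smoothing of the annotation image by expansion (dilation) and contraction (erosion) can be sketched as follows. This is a minimal illustration only: the 3 × 3 neighbourhood, the closing-then-opening order and the function names `dilate`, `erode` and `smooth_annotation` are assumptions, not details given in the embodiment.

```python
import numpy as np

def dilate(mask: np.ndarray) -> np.ndarray:
    """3x3 binary dilation: a pixel becomes 1 if any pixel in its 3x3 window is 1."""
    h, w = mask.shape
    padded = np.pad(mask, 1, mode="constant", constant_values=0)
    out = np.zeros_like(mask)
    for di in range(3):
        for dj in range(3):
            out |= padded[di:di + h, dj:dj + w]
    return out

def erode(mask: np.ndarray) -> np.ndarray:
    """3x3 binary erosion: a pixel stays 1 only if its whole 3x3 window is 1.
    Pixels outside the image are treated as background (0)."""
    h, w = mask.shape
    padded = np.pad(mask, 1, mode="constant", constant_values=0)
    out = np.ones_like(mask)
    for di in range(3):
        for dj in range(3):
            out &= padded[di:di + h, dj:dj + w]
    return out

def smooth_annotation(mask: np.ndarray, rounds: int = 1) -> np.ndarray:
    """Assumed smoothing recipe: closing fills pinholes, opening removes specks."""
    m = (mask > 0).astype(np.uint8)
    for _ in range(rounds):
        m = erode(dilate(m))   # closing: dilation followed by erosion
        m = dilate(erode(m))   # opening: erosion followed by dilation
    return m
```

With this recipe, a one-pixel hole inside a labelled house is filled and an isolated one-pixel label is removed, while the outline of a larger rectangular house is left unchanged.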
S102, obtaining, through convolution operation, the calculation result of a specific area of a single channel in the remote sensing image feature map and then the calculation results of all areas.
The convolution formula is:
$$G = \sum_{c}\sum_{i}\sum_{j} A(i,j,c)\, w(i,j,c)$$
wherein i, j and c are indices along the image length, width and channel directions respectively, A is the original image, and w is a parameter in the convolution kernel.
According to the embodiment of the invention, the convolution operation extracts the training image features and expands the 3-channel image into a 64-channel tensor, and the new feature map is sent into four sets of ResNet network modules.
S103, repeating the operation of step S102 for all convolution kernels to obtain the calculation results of all channels of the remote sensing image feature map, thereby obtaining the feature map of the remote sensing image.
According to the embodiment of the invention, one part of the convolution operations extracts the training image features and expands the channel number of the feature map, with the convolution kernel moving with a step length of 2 in the length and width dimensions of the feature map; the other part of the convolution operations extracts features from the feature map and expands its channel number, with the convolution kernel moving with a step length of 1 in the length and width dimensions of the feature map.
The area covered by the convolution operation in this embodiment is usually a 3 × 3 pixel area; as can be seen from the convolution formula, the 3 × 3 pixel areas of all channels are finally reduced to one value. The initial remote sensing image has three RGB channels, and the feature map of the remote sensing image has more channels after successive convolution operations. Since each convolution operation adds the convolution values of all channels, the entire convolution operation can be viewed as a linear sum of the regional pixel values.
The corresponding region of the convolution operation is a 3 × 3 pixel region in this embodiment. After convolution operation, the central point of convolution kernel moves a certain step length in the image length and width directions to calculate the adjacent area.
The step size of the shift in this embodiment is usually one of two options: one pixel or two pixels. Moving by one pixel moves the center point of the convolution kernel to the adjacent pixel point. Moving by two pixels moves the center point of the convolution kernel to every other pixel point. Moving by one pixel obtains more detailed feature information, but at the same time increases the amount of calculation and storage of the convolution operation. Moving by two pixels effectively increases the area covered by the convolution operation and reduces redundant information in the convolution result, but at the same time loses certain detailed information.
In the embodiment of the invention, convolution operation is carried out on the convolution kernel and the specific area in each channel of the remote sensing image, and finally the convolution results of all the channels are added to obtain the calculation result of the specific area of a single channel in the remote sensing image characteristic diagram, wherein the convolution operation can be regarded as linear summation of pixel values in the remote sensing image and the characteristic diagram, and complex linear characteristics of the remote sensing image can be extracted through multiple times of convolution operation.
The same convolution kernel is then slid along the length dimension and the width dimension of each channel of the remote sensing image to obtain the calculation results of all regions of a single channel in the remote sensing image feature map. The above two steps are repeated for all convolution kernels to finally obtain the calculation results of all channels of the remote sensing image feature map. When the step length is 2, the center of the convolution kernel performs the convolution operation at every other pixel point in the length direction and the width direction of the remote sensing image feature map.
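The region-wise sum, the sliding over length and width, and the repetition over all kernels described above can be sketched as follows. This is a minimal illustration under stated assumptions: the (H, W, C) array layout, the absence of padding and the function name `conv2d` are hypothetical choices, not details from the embodiment.

```python
import numpy as np

def conv2d(image: np.ndarray, kernels: np.ndarray, stride: int = 1) -> np.ndarray:
    """image: (H, W, C); kernels: (K, kh, kw, C). Returns (H_out, W_out, K).

    Each output value is G = sum over c, i, j of A(i, j, c) * w(i, j, c)
    for one region of the image and one kernel. No padding is applied.
    """
    H, W, C = image.shape
    K, kh, kw, _ = kernels.shape
    h_out = (H - kh) // stride + 1
    w_out = (W - kw) // stride + 1
    out = np.zeros((h_out, w_out, K))
    for k in range(K):                      # one output channel per kernel
        for y in range(h_out):              # slide along the length dimension
            for x in range(w_out):          # slide along the width dimension
                top, left = y * stride, x * stride
                patch = image[top:top + kh, left:left + kw, :]
                out[y, x, k] = np.sum(patch * kernels[k])
    return out
```

For example, a 7 × 7 three-channel image convolved with two 3 × 3 kernels gives a 5 × 5 × 2 result with a step length of 1 and a 3 × 3 × 2 result with a step length of 2, which illustrates how the larger step trades detail for a smaller output.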
Since the convolution operation is a linear summation of regional pixel values, it cannot by itself handle non-linear problems. For this reason, a non-linear activation function is usually added after the convolution operation. The non-linear activation function improves the expressive capability of the model, allows non-linear problems to be handled better, and improves the robustness of the model.
S2, adding a sobel operator into the feature map to obtain the remote sensing image feature map with enhanced house line features.
In the equation for the sobel operation, A is the original image and G is the result.
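For reference, the conventional fixed sobel operator is usually written as below. This is the textbook form, given here as an assumed starting point with A and G used as in the text; it is not necessarily the exact equation of the embodiment, whose dynamic operator lets the kernel entries change during training.

```latex
G_x = \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix} * A,
\qquad
G_y = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{bmatrix} * A,
\qquad
G = \sqrt{G_x^{2} + G_y^{2}}
```

Here G_x and G_y are the transverse and longitudinal gradient responses and * denotes sliding the 3 × 3 kernel over A.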
In this embodiment, a dynamic sobel operation is applied to the feature map obtained in step S1. The sobel operator performs very well at extracting the line texture features of an image, and after the sobel operator is added, the line contour features of houses in the feature map of the remote sensing image are highlighted.
In order to adapt to the different degrees of house feature extraction in feature maps of different levels, the invention designs a sobel operator whose parameters change dynamically with the level of the feature map and with the training process. Because the proportional relation between the transverse gradient and the longitudinal gradient in a conventional sobel operator is fixed, the conventional sobel operator cannot adapt to remote sensing image targets of different levels. The parameters of the dynamic sobel operator of the invention take part in the optimization process together with the network parameters.
In the embodiment of the invention, after the first two convolution operations, a convolution operation is performed between the 3 × 3 region of each pixel and the sobel operator to enhance the line features in the remote sensing image feature map. The parameters in the sobel operator change dynamically according to the gradient of the loss function during training: the sobel operator updates its parameter values through a gradient descent formula according to the gradient back-propagated by the network. Because the method can dynamically change the proportional relation between the transverse gradient and the longitudinal gradient in the sobel operator, the adaptability of the network to house targets in images of different levels is improved. The gradient descent formula used in the present invention is stochastic gradient descent, with which a local optimum point in the feature space can be found more readily.
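The idea of a sobel kernel whose entries are updated by gradient descent can be sketched as follows. This is an illustration only: the mean-squared-error loss, the full-batch update, the learning rate and the names `correlate3x3` and `sgd_step` are assumptions made to keep the example self-contained, whereas the embodiment trains the operator inside the network with the network's own loss and stochastic batches.

```python
import numpy as np

def correlate3x3(feature: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Slide a 3x3 kernel over a 2-D feature map (no padding)."""
    h, w = feature.shape
    out = np.zeros((h - 2, w - 2))
    for y in range(h - 2):
        for x in range(w - 2):
            out[y, x] = np.sum(feature[y:y + 3, x:x + 3] * kernel)
    return out

def sgd_step(feature, kernel, target, lr=0.05):
    """One gradient-descent update of the kernel entries.

    Assumed loss: mean squared error between the kernel response and a target
    response. Returns the updated kernel and the loss before the update.
    """
    err = correlate3x3(feature, kernel) - target
    loss = float(np.mean(err ** 2))
    grad = np.zeros_like(kernel)
    for y in range(err.shape[0]):
        for x in range(err.shape[1]):
            # d(loss)/d(kernel) accumulates err * input patch over all positions
            grad += 2.0 * err[y, x] * feature[y:y + 3, x:x + 3] / err.size
    return kernel - lr * grad, loss

# The kernel starts from the conventional transverse sobel weights and is then
# free to drift away from the fixed 1:2:1 proportions as training proceeds.
SOBEL_X = np.array([[-1.0, 0.0, 1.0],
                    [-2.0, 0.0, 2.0],
                    [-1.0, 0.0, 1.0]])
```

Starting from `SOBEL_X`, repeated calls to `sgd_step` move the nine kernel entries independently, which is one way the fixed proportion between entries can be relaxed.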
S3, connecting the remote sensing image feature maps based on a direct connection structure in the U-Net network to obtain a final remote sensing image feature map.
In the embodiment of the invention, a plurality of direct connection structures are used for connecting the characteristic diagrams of the remote sensing images of different layers, so that the adaptivity of the model for processing the multi-scale remote sensing images is improved. The convolution kernel at the shallow level extracts local features of the remote sensing image, and the operator receptive field of the convolution operation at the shallow level is smaller. The convolution kernel at the deep level extracts the global features of the remote sensing image, and the operator receptive field of the convolution operation at the deep level is larger.
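The growth of the receptive field with depth can be illustrated with the standard recurrence for stacked convolutions. The helper name `receptive_field` and the example layer lists are hypothetical and do not describe the actual layer configuration of the network.

```python
def receptive_field(layers):
    """layers: list of (kernel_size, stride) pairs, ordered from input to output.

    Returns the side length, in input pixels, of the region that influences
    one output value of the last layer.
    """
    rf, jump = 1, 1
    for kernel_size, stride in layers:
        rf += (kernel_size - 1) * jump   # each layer widens the field by (k - 1) * jump
        jump *= stride                   # strides multiply the spacing between samples
    return rf
```

For instance, `receptive_field([(3, 1)])` is 3, two stacked 3 × 3 convolutions with a step length of 1 give 5, and `receptive_field([(3, 2), (3, 2), (3, 1)])` is 15, showing how deeper and strided layers see a larger part of the image.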
By comparison, the original U-Net network structure only has a direct connection structure at the same level. The same-level direct connection structure stacks feature images of the same level in the channel dimension to increase the expressive power of the feature images for house features. However, the same-level direct connection structure cannot represent the detail information in the upper-layer feature image or the global information in the lower-layer feature image.
Aiming at the problem, the invention integrates the upper layer feature map, the same layer feature map and the lower layer feature map into a new feature map together, and retains the detail information and the global information of the remote sensing image feature map. The step of performing convolution operation after stacking the feature maps of the remote sensing images of the same level by the direct connection structure comprises the step of stacking the feature maps of the remote sensing images of the same level on a channel dimension, wherein the channel number of the new feature map is the sum of the channel numbers of the previous feature maps.
The step in which the direct connection structure first performs an up-sampling operation on the feature maps of the remote sensing images of different levels and then performs a convolution operation after stacking the new feature maps, thereby obtaining the final remote sensing image feature map, comprises:
the upsampling operation adopts bilinear interpolation to sample feature maps of different levels to the same size;
carrying out up-sampling operation on the feature map of the low-level remote sensing image, wherein the length and the width of a new feature map are doubled;
stacking the remote sensing image feature maps of different levels in the channel direction, wherein the channel number of the new feature map is the sum of the channel numbers of the previous feature maps.
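The stacking steps above can be sketched as follows. This is a simplified illustration: nearest-neighbour repetition is used for the upsampling purely for brevity (the embodiment uses interpolation), and the (H, W, C) layout and the names `upsample2x_nearest` and `fuse_levels` are assumptions.

```python
import numpy as np

def upsample2x_nearest(fmap: np.ndarray) -> np.ndarray:
    """Double the length and width of an (H, W, C) map by repeating pixels."""
    return fmap.repeat(2, axis=0).repeat(2, axis=1)

def fuse_levels(same_level_maps, lower_level_map):
    """Stack same-level maps and an upsampled lower-level map along the channel axis.

    The channel number of the result is the sum of the channel numbers
    of all the stacked maps.
    """
    up = upsample2x_nearest(lower_level_map)
    return np.concatenate(list(same_level_maps) + [up], axis=-1)
```

For example, stacking two 16 × 16 × 64 same-level maps with an 8 × 8 × 128 lower-level map gives a 16 × 16 × 256 map, which a subsequent convolution can then reduce to the desired channel number.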
S4, using an up-sampling operation to make the size of the final remote sensing image feature map consistent with that of the original image, and using a convolution operation to decode the house feature image from the feature map after the up-sampling operation.
In an embodiment of the invention, an upsampling operation is used to keep the final result image consistent with the original image in size, and a convolution operation is used to decode the house features in the feature map after the upsampling operation. Binary classification is then performed on each pixel point in the last-layer feature map.
In the present embodiment, the upsampling operation uses bilinear interpolation.
The formula of the bilinear interpolation is:
$$F(x,y) = (j+1-y)(i+1-x)\,F(i,j) + (j+1-y)(x-i)\,F(i+1,j) + (y-j)(i+1-x)\,F(i,j+1) + (y-j)(x-i)\,F(i+1,j+1)$$
where F(x, y) is the pixel point value of the image after interpolation, and F(i, j) is the pixel point value of the image before interpolation.
In this embodiment, the feature map after the interpolation is twice as large as the original feature map in both the length and width dimensions.
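The interpolation formula above can be evaluated directly, as in the following sketch. The mapping from output coordinates to input coordinates (corners aligned), the indexing convention F[i, j] and the names `interpolate_at` and `upsample2x` are assumptions; the embodiment does not state how output positions are mapped back to the original grid.

```python
import numpy as np

def interpolate_at(F: np.ndarray, x: float, y: float) -> float:
    """Evaluate the four-neighbour interpolation formula at real-valued (x, y).

    F is indexed as F[i, j]; (i, j) is the integer grid point at or below (x, y).
    Requires at least 2 samples along each axis.
    """
    n_i, n_j = F.shape
    i = min(int(np.floor(x)), n_i - 2)   # clamp so that i + 1 stays inside the map
    j = min(int(np.floor(y)), n_j - 2)
    return ((j + 1 - y) * (i + 1 - x) * F[i, j]
            + (j + 1 - y) * (x - i) * F[i + 1, j]
            + (y - j) * (i + 1 - x) * F[i, j + 1]
            + (y - j) * (x - i) * F[i + 1, j + 1])

def upsample2x(F: np.ndarray) -> np.ndarray:
    """Double both dimensions of a 2-D map using the formula above."""
    n_i, n_j = F.shape
    out = np.zeros((2 * n_i, 2 * n_j))
    for u in range(2 * n_i):
        for v in range(2 * n_j):
            # Assumed mapping: first and last output samples coincide with the input corners.
            x = u * (n_i - 1) / (2 * n_i - 1)
            y = v * (n_j - 1) / (2 * n_j - 1)
            out[u, v] = interpolate_at(F, x, y)
    return out
```

At the midpoint of a 2 × 2 map the four weights are all 0.25, so the interpolated value is the average of the four neighbours.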
S5, performing binary classification on each pixel point in the house feature image by adopting a convolution operation, and comparing the result with a preset threshold value to obtain an output result.
In this embodiment, convolution operations are used to reduce the channel dimension of the feature image. The dimension reduction successively reduces the channel number of the feature image, and the convolution operation following the last up-sampling and feature-map integration performs binary classification on each pixel of the feature image. The function used for the binary classification is a sigmoid function.
The sigmoid function is of the form:
$$\sigma(z) = \frac{1}{1 + e^{-z}}$$
The sigmoid function maps the output values into the [0,1] interval, corresponding to probability values.
The sigmoid function result is the probability that each pixel point is the target pixel. And finally, judging the relationship between the probability value and the artificially set threshold value. If the probability value is larger than the threshold value, the pixel point is considered as a pixel point in the house and is white in the final output image; and if the probability value is smaller than the threshold value, the pixel point is regarded as the pixel point in the background and is black in the final output image.
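The thresholding rule above can be sketched as follows. The threshold of 0.5, the 8-bit values 255 (white) and 0 (black), and the names `sigmoid` and `to_mask` are assumptions for illustration; the embodiment only states that the threshold is set manually.

```python
import numpy as np

def sigmoid(z: np.ndarray) -> np.ndarray:
    """Map real-valued network outputs into the [0, 1] interval."""
    return 1.0 / (1.0 + np.exp(-z))

def to_mask(logits: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """White (255) where the house probability exceeds the threshold, else black (0)."""
    prob = sigmoid(logits)
    return np.where(prob > threshold, 255, 0).astype(np.uint8)
```

Raising the threshold makes the detector more conservative, marking only pixels with a high house probability as white.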
In the embodiment of the invention, the method further comprises the step of calculating the loss of the generated segmentation image and the annotation image of the original image by using a cross entropy loss function so as to finally obtain an optimized detection model, and storing the model for later use.
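A per-pixel cross entropy between the predicted probabilities and the annotation image can be sketched as follows. The binary form of the loss, the clipping constant `eps` and the name `binary_cross_entropy` are assumptions consistent with the binary classification described above; the embodiment does not give the exact formula.

```python
import numpy as np

def binary_cross_entropy(prob: np.ndarray, label: np.ndarray, eps: float = 1e-7) -> float:
    """Mean cross entropy over all pixels.

    prob:  predicted house probability per pixel, in [0, 1]
    label: annotation per pixel, 1 for house (white) and 0 for background (black)
    """
    p = np.clip(prob, eps, 1.0 - eps)   # avoid log(0)
    return float(np.mean(-(label * np.log(p) + (1.0 - label) * np.log(1.0 - p))))
```

The loss approaches zero when the predicted probabilities match the annotation and equals ln 2 when every pixel is predicted with probability 0.5.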
Meanwhile, fig. 3-6 show comparison graphs of the actually used remote sensing image and the detection result thereof in the scheme of the invention, and obviously, the remote sensing image processed by the embodiment has large scale and the house in the remote sensing image is detected with higher precision.
According to another embodiment of the invention, there is also provided a multi-scale remote sensing image dense house detection device for performing the detection method, the device including:
the first characteristic diagram acquisition module is used for acquiring remote sensing images containing different levels of a house, and extracting the characteristics of the remote sensing images based on a convolution mode to obtain a characteristic diagram of the remote sensing images;
the second characteristic diagram acquisition module is used for adding a sobel operator into the characteristic diagram to obtain a remote sensing image characteristic diagram with enhanced house line characteristics;
the third characteristic diagram acquisition module is used for connecting the characteristic diagrams of the remote sensing images based on a direct connection structure in the U-Net network to obtain a final characteristic diagram of the remote sensing images;
the up-sampling module is used for keeping the size of the final remote sensing image feature map consistent with that of the original image;
the first convolution module is used for performing convolution operation on the feature map subjected to the upsampling operation to decode the house feature image in the feature map; and
the binary classification module is used for performing binary classification on each pixel point in the house characteristic image by adopting convolution operation, and comparing the classification result with a preset threshold value so as to obtain house output results with different colors.
According to another embodiment of the invention, there is also provided a multi-scale remote sensing image dense house detection device, including: a memory having computer instructions stored therein; and the processor is in data connection with the memory and executes the computer instructions so as to execute the multi-scale remote sensing image dense house detection method.
It will be evident to those skilled in the art that the embodiments of the present invention are not limited to the details of the foregoing illustrative embodiments, and that the embodiments of the present invention are capable of being embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the embodiments being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. Several units, modules or means recited in the system, apparatus or terminal claims may also be implemented by one and the same unit, module or means in software or hardware.
Finally, it should be noted that the above embodiments are only used for illustrating the technical solutions of the embodiments of the present invention and not for limiting, and although the embodiments of the present invention are described in detail with reference to the above preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions can be made on the technical solutions of the embodiments of the present invention without departing from the spirit and scope of the technical solutions of the embodiments of the present invention.