CN115909088A - Optical remote sensing image target detection method based on super-resolution feature aggregation

Info

Publication number: CN115909088A
Application number: CN202211628874.2A
Authority: CN
Inventors: 徐洋; 壮志岩; 吴泽彬; 韦志辉
Original assignee: Nanjing University of Science and Technology
Current assignee: Nanjing University of Science and Technology
Priority date: 2022-12-18
Filing date: 2022-12-18
Publication date: 2023-04-04

Abstract

The invention discloses an optical remote sensing image target detection method based on super-resolution feature aggregation, comprising the following steps: an optical remote sensing image is downsampled and input into an image super-resolution network, which is trained to serve as the super-resolution network of a single-image super-resolution (SISR) module; the optical remote sensing image is input into the SISR module to obtain a super-resolution image; the optical remote sensing image is input into a backbone module that extracts image feature information, generating multi-scale, multi-level features; an auxiliary feature super-resolution module is introduced to reconstruct a high-resolution image; the auxiliary feature super-resolution module fuses the multi-scale features to obtain super-resolution features for detection; and the super-resolution features are fused with the feature map used for detecting small targets in the feature pyramid and passed through a feature aggregation module to obtain the super-resolution features for detection.

Description

Optical remote sensing image target detection method based on super-resolution feature aggregation

Technical Field

The invention relates to the technical field of image processing, in particular to an optical remote sensing image target detection method based on super-resolution feature aggregation.

Background

Traditional optical remote sensing image target detection algorithms follow a region selection, feature extraction, and classification pipeline. However, as the number of satellites grows, revisit periods shorten, and image resolution improves, the volume of remote sensing data keeps increasing, and it grows further with every new satellite launched. Traditional detection algorithms struggle to meet real-time and accuracy requirements. With the development of deep learning, deep neural networks have been widely applied to remote sensing target detection because of their strong ability to extract features automatically. Compared with traditional optical remote sensing target detection algorithms, deep learning methods overcome their weaknesses: low adaptability, heavy demands for background model updates, poorly robust extracted features, and poor real-time detection performance.

Optical remote sensing images differ from ordinary optical images mainly in the sensor and the shooting angle. Objects in ordinary optical images are mostly photographed from the side and appear upright; the target often occupies a large part of the image, and the semantic content of the whole image is simple. Optical remote sensing images are far more complicated, because imaging from above the Earth's surface causes the following problems: target orientations and sizes vary widely, targets are densely arranged, and complex background regions occupy a large part of the image. Because of these characteristics, classic target detection algorithms based on convolutional neural networks do not perform well on remote sensing images.

Disclosure of Invention

The invention aims to provide an optical remote sensing image target detection method based on super-resolution feature aggregation. A convolutional neural network extracts features from the optical remote sensing image, and shallow and deep features are fused for super-resolution, yielding a restored super-resolution image and super-resolution features. The super-resolution image serves as auxiliary information, and the super-resolution features are aggregated with the detection-layer features to improve detection precision.

The purpose of the invention can be achieved by adopting the following technical scheme: in a first aspect, the invention provides a method for detecting an optical remote sensing image target based on super-resolution feature aggregation, which comprises the following steps:

S1, downsampling an optical remote sensing image, inputting it into an image super-resolution network, and training the network to serve as the super-resolution network of a single-image super-resolution (SISR) module;

S2, inputting the optical remote sensing image into the SISR module to obtain a super-resolution image;

S3, inputting the optical remote sensing image into a backbone module that extracts image feature information, to generate multi-scale, multi-level features;

S4, introducing an auxiliary feature super-resolution module that uses the multi-scale features to reconstruct a high-resolution image, and comparing the reconstructed high-resolution image with the super-resolution image obtained by the SISR module in S2, so as to guide the learning of the backbone network in the spatial dimension;

S5, fusing the multi-scale features with the auxiliary feature super-resolution module to obtain super-resolution features for detection;

S6, introducing a feature pyramid network on the multi-scale features generated by the backbone module to obtain feature maps for detecting large, medium, and small targets; fusing the super-resolution features with the feature map for detecting small targets in the feature pyramid and obtaining the super-resolution features for detection through a feature aggregation module; and inputting the feature maps for detecting large, medium, and small targets together with the super-resolution features into a detection head for target detection to obtain the final detection result.

In a second aspect, the present invention provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method of the first aspect when executing the program.

In a third aspect, the invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method of the first aspect.

In a fourth aspect, the invention provides a computer program product comprising a computer program which, when executed by a processor, performs the steps of the method of the first aspect.

The method makes full use of the information in different feature maps of the convolutional neural network. It designs an image super-resolution module and a feature super-resolution aggregation module to enrich the feature information of small targets in optical remote sensing images, uses the super-resolution image to provide auxiliary supervision that adjusts the network and yields enhanced feature maps, and finally performs detection on the multi-scale feature maps to obtain the final detection result. The method is suited to optical remote sensing target detection and has the following advantages:

(1) Because target sizes in optical remote sensing images vary widely and targets are densely packed, an additional detection layer for small targets is added through the image feature super-resolution aggregation module, which improves detection precision.

(2) Because the backgrounds of optical remote sensing images are complex and small targets carry few features, the auxiliary image super-resolution module enhances small-target features while reducing computational cost to some extent.

Drawings

Fig. 1 is a flow chart of the optical remote sensing image target detection method based on super-resolution feature aggregation.

Fig. 2 is a diagram of the model architecture of the present invention.

Fig. 3 is a flow chart of the single-image super-resolution (SISR) module of the present invention.

Fig. 4 is a flow chart of the target detection module of the present invention.

Fig. 5 shows target detection results of the present invention.

Detailed Description

To make the technical solutions of the present invention clearer to those skilled in the art, the invention is described in further detail below with reference to examples and the accompanying drawings, but the embodiments of the present invention are not limited thereto.

As shown in Figs. 1 to 5, the optical remote sensing image target detection method based on super-resolution feature aggregation provided in this embodiment includes the following steps:

S1, downsampling an optical remote sensing image, inputting it into an image super-resolution network, training the network, and using the trained network as the super-resolution network of a SISR module;

S2, inputting the optical remote sensing image into the SISR module to obtain a super-resolution image;

S3, inputting the optical remote sensing image into a backbone module that extracts image feature information, to generate multi-scale, multi-level features;

S4, introducing an auxiliary feature super-resolution module that uses the multi-scale features to reconstruct a high-resolution image, and comparing it with the super-resolution image obtained by the SISR module in S2 to guide the learning of the backbone network in the spatial dimension;

S5, fusing the multi-scale features with the auxiliary feature super-resolution module to obtain super-resolution features for detection; and

S6, introducing a feature pyramid network on the multi-scale features generated by the backbone module to obtain feature maps for detecting large, medium, and small targets, fusing the super-resolution features with the feature map for detecting small targets in the feature pyramid, and obtaining the super-resolution features for detection through the feature aggregation module; then inputting the feature maps for detecting large, medium, and small targets together with the super-resolution features into a detection head for target detection to obtain the final detection result, which is suited to target detection in optical remote sensing images.

In this embodiment, as shown in Fig. 1, step S1 specifically includes the following steps:

Step S1.1, the input optical remote sensing image is downsampled with bilinear interpolation to obtain an image at one half of the original resolution, denoted LR; the original image is denoted HR; both are input into the image super-resolution network.

Step S1.2, the single-image super-resolution module uses EDSR (Enhanced Deep Residual Networks), a simple and fast network, as its image super-resolution network. EDSR consists of convolutional layers and a number of residual blocks, and produces the super-resolution image through upsampling and convolution. The LR image is input into the EDSR network to obtain its super-resolution output Low_SR, and the L1 loss between Low_SR and HR is computed to train the super-resolution network.
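
The LR/HR pair construction in step S1.1 can be sketched in plain Python. This is a minimal sketch, not the patent's implementation: it assumes a single-channel image stored as a list of rows and half-pixel-centre alignment (the convention usually called align_corners=False); the function names are hypothetical.

```python
def bilinear_resize(img, out_h, out_w):
    """Resize a 2-D grayscale image (list of rows) with bilinear interpolation.

    Output pixel centres are mapped back to input coordinates with
    half-pixel alignment, then the four nearest input pixels are blended.
    """
    in_h, in_w = len(img), len(img[0])
    scale_y, scale_x = in_h / out_h, in_w / out_w
    out = []
    for i in range(out_h):
        # Fractional input row for this output row, clamped to the image.
        y = min(max((i + 0.5) * scale_y - 0.5, 0.0), in_h - 1)
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        wy = y - y0
        row = []
        for j in range(out_w):
            x = min(max((j + 0.5) * scale_x - 0.5, 0.0), in_w - 1)
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            wx = x - x0
            # Interpolate horizontally on two rows, then vertically.
            top = img[y0][x0] * (1 - wx) + img[y0][x1] * wx
            bottom = img[y1][x0] * (1 - wx) + img[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out


def make_lr_hr_pair(hr):
    """S1.1 sketch: LR is HR bilinearly downsampled to half resolution."""
    lr = bilinear_resize(hr, len(hr) // 2, len(hr[0]) // 2)
    return lr, hr
```

With this alignment and an exact factor of 2, each LR pixel reduces to the mean of a 2 × 2 block of HR pixels.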
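
The L1 training objective of step S1.2 is sketched below. A real implementation would train EDSR in a deep learning framework with back-propagation; here nearest-neighbour upsampling is swapped in as a hypothetical stand-in for the network, so that only the loss between Low_SR and HR is shown.

```python
def upsample_nearest_x2(img):
    """Hypothetical stand-in for EDSR: nearest-neighbour x2 upsampling."""
    return [[v for v in row for _ in range(2)] for row in img for _ in range(2)]


def l1_loss(pred, target):
    """Mean absolute error between two equally sized 2-D images."""
    if len(pred) != len(target) or len(pred[0]) != len(target[0]):
        raise ValueError("pred and target must have the same shape")
    total = sum(abs(p - t) for rp, rt in zip(pred, target) for p, t in zip(rp, rt))
    return total / (len(pred) * len(pred[0]))


# One evaluation of the training signal: Low_SR = SR_net(LR), compared with HR.
lr = [[1.0, 2.0], [3.0, 4.0]]
hr = [[1.0, 1.0, 2.0, 2.0], [1.0, 1.0, 2.0, 2.0],
      [3.0, 3.0, 4.0, 4.0], [3.0, 3.0, 4.0, 4.0]]
low_sr = upsample_nearest_x2(lr)
print(l1_loss(low_sr, hr))  # prints 0.0
```

L1 (rather than L2) is the loss the text names for EDSR training; it penalises all pixel errors linearly.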

In this embodiment, as shown in Fig. 1, step S2 specifically includes the following:

The input image is fed into the trained EDSR network, i.e., the single-image super-resolution module, and the resulting super-resolution image is output and denoted SR.

In this embodiment, as shown in Fig. 1, step S3 specifically includes the following steps:

Step S3.1, CSPNet is used as the backbone network for extracting feature information; it is composed of CBS components and CSP modules. A CBS component consists of convolution, batch normalization, and the SiLU activation function. A CSP module copies the feature map of the previous layer into two branches and halves the number of channels in each with a 1 × 1 convolution to reduce computation. Of the two resulting feature maps, one is routed directly to the end of the stage, and the other is fed as input into ResNet blocks or CBS modules. Finally, the two feature maps are concatenated and merged. The remote sensing image is then input into the backbone network for feature extraction.

Step S3.2, the image is input into the backbone network, and feature extraction through successive convolutions generates multi-scale, multi-level features. Feature maps of suitable, different scales are selected from the upper, middle, and lower layers, denoted F3, F2, and F1, and serve as inputs to the subsequent auxiliary super-resolution module and detection module.
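
The arithmetic behind the CSP channel halving, together with the SiLU activation used in CBS, can be checked with a small calculation. This sketch assumes 3 × 3 convolutions inside the CSP branch and ignores the 1 × 1 transition convolutions and biases; the helper names are hypothetical. Because convolution weights scale with c_in × c_out, halving the channels cuts the weights (and multiply-accumulates per output pixel) of each inner convolution by a factor of four.

```python
import math


def silu(x):
    """SiLU activation used in the CBS component: x * sigmoid(x)."""
    # Two algebraically equal forms, chosen so math.exp never overflows.
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)
    return x * e / (1.0 + e)


def conv_params(k, c_in, c_out):
    """Weight count of a k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def csp_saving(c, k=3):
    """Ratio of inner-conv weights at full width vs. after the 1x1 halving.

    Assumes the blocks inside the CSP branch keep the halved width c // 2.
    """
    full = conv_params(k, c, c)
    half = conv_params(k, c // 2, c // 2)
    return full / half
```

For example, a 128-channel 3 × 3 convolution has 147456 weights, while its 64-channel counterpart inside the branch has 36864.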
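
The multi-scale nature of F3, F2, and F1 follows from repeated stride-2 convolutions. The sketch below traces the spatial size through the backbone under stated assumptions that the patent does not give: 3 × 3 kernels with stride 2 and padding 1, and F3, F2, F1 tapped at strides 8, 16, and 32, as in common YOLO-style backbones. The function names are hypothetical.

```python
def conv_out(size, k=3, s=2, p=1):
    """Output length of a convolution along one axis: floor((n + 2p - k) / s) + 1."""
    return (size + 2 * p - k) // s + 1


def backbone_scales(h, w, n_down=5, taps=(3, 4, 5)):
    """Spatial sizes after each stride-2 stage; returns the tapped levels.

    Assumption: F3, F2 and F1 are taken after the 3rd, 4th and 5th
    downsampling stages (strides 8, 16 and 32).
    """
    sizes = []
    for _ in range(n_down):
        h, w = conv_out(h), conv_out(w)
        sizes.append((h, w))
    return {name: sizes[t - 1] for name, t in zip(("F3", "F2", "F1"), taps)}


print(backbone_scales(640, 640))
# {'F3': (80, 80), 'F2': (40, 40), 'F1': (20, 20)}
```

Under these assumptions F3 keeps the most spatial detail (useful for small targets), while F1 is the coarsest and most semantic level.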

In this embodiment, as shown in Fig. 1, step S4 specifically includes the following steps:

Step S4.1, F3 is selected as the shallow feature and F1 as the deep feature, and both are input into the auxiliary super-resolution module. The module can be regarded as a simple encoder-decoder model that fuses and encodes the shallow and deep features, so that the fused features carry both the texture and fine-grained information of the shallow layers and the semantic information of the deep layers. In the encoder, the shallow features first pass through a CBR module and an upsampling operation to match the deep features, and the two are then merged by concatenation followed by two CBRD modules. CBR consists of convolution, batch normalization, and ReLU; CBRD adds a dropout layer. The decoder decodes the fused features through three deconvolution operations to obtain a super-resolution image SR' corresponding to the input image, thereby reconstructing a high-resolution image.

Step S4.2, the mean squared error (MSE) between the image SR' obtained by the auxiliary super-resolution module and the image SR obtained by the single-image super-resolution module in step S2 is computed as the SR reconstruction loss, denoted Ls. Ls is introduced into the detection loss to guide learning in the spatial dimension.
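
How much the decoder's three deconvolutions enlarge the fused feature depends on their kernel, stride, and padding, which the patent does not state. The sketch below assumes kernel 4, stride 2, padding 1, a common choice that exactly doubles the size, so three layers give an 8× enlargement. The output-size formula is the one documented for transposed convolution in PyTorch; the helper names are hypothetical.

```python
def deconv_out(size, k=4, s=2, p=1, output_padding=0, dilation=1):
    """Output length of a transposed convolution along one axis.

    (n - 1) * s - 2p + dilation * (k - 1) + output_padding + 1
    """
    return (size - 1) * s - 2 * p + dilation * (k - 1) + output_padding + 1


def decoder_size(size, n_deconv=3, **kw):
    """Apply n_deconv transposed convolutions in sequence (hypothetical helper)."""
    for _ in range(n_deconv):
        size = deconv_out(size, **kw)
    return size


# An 80 x 80 fused feature grows 80 -> 160 -> 320 -> 640 under these settings.
print(decoder_size(80))
```

Other settings, such as kernel 3, stride 2, padding 1, output_padding 1, also double the size per layer; the choice mainly affects checkerboard artefacts, not the output size.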
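
Step S4.2 adds the reconstruction loss Ls to the detection loss. A minimal sketch follows, assuming a hypothetical balancing factor `weight` (the patent does not give one) and treating SR from the already-trained SISR module as a fixed target.

```python
def mse_loss(pred, target):
    """Mean squared error between two equally sized 2-D images."""
    diffs = [(p - t) ** 2 for rp, rt in zip(pred, target) for p, t in zip(rp, rt)]
    return sum(diffs) / len(diffs)


def total_loss(det_loss, sr_prime, sr, weight=0.1):
    """Detection loss plus the weighted SR reconstruction loss Ls.

    `weight` is a hypothetical balancing factor. SR comes from the frozen
    SISR module and acts as a fixed target (assumption), so gradients
    flow only through SR' into the backbone.
    """
    ls = mse_loss(sr_prime, sr)
    return det_loss + weight * ls
```

Because the auxiliary branch only contributes a training loss, it can be dropped at inference time without changing the detection path.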

In this embodiment, as shown in Fig. 1, step S5 specifically includes the following:

The shallow feature F3 and the deep feature F1 are fused, and a CBRD module followed by a deconvolution produces a super-resolution feature with a larger spatial size than the original shallow feature, denoted SR feature.
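
The fusion in step S5 and the dropout inside each CBRD module can be sketched on nested-list feature maps (channels × height × width). This assumes fusion by channel concatenation, as in the encoder of step S4.1, and that F3 and F1 have already been resampled to a common spatial size; convolution, batch normalization, and deconvolution are omitted, and the helper names are hypothetical.

```python
import random


def concat_channels(a, b):
    """Channel-wise concatenation of two C x H x W feature maps (nested lists)."""
    if (len(a[0]), len(a[0][0])) != (len(b[0]), len(b[0][0])):
        raise ValueError("spatial sizes must match before concatenation")
    return a + b


def dropout(x, p, training=True, rng=None):
    """Inverted dropout on a C x H x W map: the 'D' in a CBRD module.

    During training each value is zeroed with probability p and survivors
    are scaled by 1 / (1 - p), keeping the expected value unchanged; at
    inference the input passes through untouched.
    """
    if not training or p == 0.0:
        return [[row[:] for row in ch] for ch in x]
    rng = rng or random.Random()
    keep = 1.0 - p
    return [[[v / keep if rng.random() < keep else 0.0 for v in row]
             for row in ch] for ch in x]
```

Scaling at training time (inverted dropout) means nothing has to be rescaled when the model is switched to inference.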

In this embodiment, as shown in Fig. 1, step S6 specifically includes the following steps:

Step S6.1, to combine effectively the multi-scale, multi-level features generated by the backbone network, the head module integrates FPN and PANet. A feature pyramid network is built from the high-, medium-, and low-level features F3, F2, and F1 obtained in step S3, yielding three levels of features for detecting large, medium, and small targets, denoted N1, N2, and N3, respectively.

Step S6.2, the super-resolution feature SR feature obtained in S5 is fused with N3 and passed through a feature aggregation module (FAM). The FAM takes the fused features as input and maps them into several scale spaces: the fused feature map is fed into average pooling layers with different downsampling rates, the upsampled feature maps from the different sub-branches are combined, and a single convolution layer then produces the super-resolution feature for detection, denoted N4. This reduces the aliasing effect introduced by feature fusion.

Step S6.3, the feature maps N1, N2, and N3 for detecting large, medium, and small targets and the super-resolution feature N4 are input together into the detection head for target detection to obtain the final detection result.
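
The information flow of the FPN + PANet neck can be illustrated on single-channel maps. This is a simplified sketch: element-wise addition replaces the concatenation and CSP blocks of typical implementations, nearest-neighbour upsampling and 2 × 2 average pooling replace learned resampling, and it assumes adjacent levels of F3, F2, F1 differ in size by a factor of 2. The helper names are hypothetical.

```python
def up2(m):
    """Nearest-neighbour x2 upsampling of a 2-D map."""
    return [[v for v in row for _ in range(2)] for row in m for _ in range(2)]


def down2(m):
    """2 x 2 average pooling; stands in for a stride-2 convolution."""
    return [[(m[2 * i][2 * j] + m[2 * i][2 * j + 1]
              + m[2 * i + 1][2 * j] + m[2 * i + 1][2 * j + 1]) / 4
             for j in range(len(m[0]) // 2)] for i in range(len(m) // 2)]


def add(a, b):
    """Element-wise sum of two equally sized 2-D maps."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def fpn_pan(f3, f2, f1):
    """Top-down (FPN) then bottom-up (PAN) fusion; F3 finest, F1 coarsest."""
    # Top-down: deep semantics flow towards the high-resolution level.
    p1 = f1
    p2 = add(f2, up2(p1))
    p3 = add(f3, up2(p2))
    # Bottom-up: fine localisation detail flows back towards coarse levels.
    n3 = p3
    n2 = add(p2, down2(n3))
    n1 = add(p1, down2(n2))
    return n1, n2, n3  # maps for large, medium and small targets
```

The top-down pass gives the fine N3 level deep semantics for small targets, and the bottom-up pass shortens the path from fine detail to the coarse N1 level.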
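
The multi-rate average pooling of the feature aggregation module resembles a pyramid pooling block. The sketch below works on a single-channel map and assumes pooling rates 1, 2, and 4 and nearest-neighbour upsampling; the patent specifies neither, and the helper names are hypothetical. Averaging over progressively larger windows before recombining smooths high-frequency artefacts, which is one way to read the stated reduction of aliasing.

```python
def avg_pool(m, r):
    """Non-overlapping r x r average pooling (size assumed divisible by r)."""
    return [[sum(m[i * r + di][j * r + dj] for di in range(r) for dj in range(r)) / (r * r)
             for j in range(len(m[0]) // r)] for i in range(len(m) // r)]


def upsample(m, r):
    """Nearest-neighbour upsampling by an integer factor r."""
    return [[v for v in row for _ in range(r)] for row in m for _ in range(r)]


def fam(x, rates=(1, 2, 4), weights=None, bias=0.0):
    """Pool at several rates, upsample back, and combine the branches.

    For single-channel maps, concatenating the branches and applying a
    1 x 1 convolution equals a weighted sum of the branches plus a bias,
    which is what this sketch computes. Default weights are uniform.
    """
    weights = weights or [1.0 / len(rates)] * len(rates)
    branches = [upsample(avg_pool(x, r), r) for r in rates]
    h, w = len(x), len(x[0])
    return [[bias + sum(wt * b[i][j] for wt, b in zip(weights, branches))
             for j in range(w)] for i in range(h)]
```

For example, with rates (1, 2) and equal weights, a single bright pixel of value 4 in a 2 × 2 map becomes 2.5, and its zero-valued neighbours become 0.5.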

Claims

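1. An optical remote sensing image target detection method based on super-resolution feature aggregation, characterized by comprising the following steps:

S1, downsampling an optical remote sensing image, inputting it into an image super-resolution network, and training the network to serve as the super-resolution network of a SISR module;

S2, inputting the optical remote sensing image into the SISR module to obtain a super-resolution image;

S3, inputting the optical remote sensing image into a backbone module that extracts image feature information, to generate multi-scale, multi-level features;

S4, introducing an auxiliary feature super-resolution module that uses the multi-scale features to reconstruct a high-resolution image, and comparing the reconstructed high-resolution image with the super-resolution image obtained by the SISR module in S2, so as to guide the learning of the backbone network in the spatial dimension;

S5, fusing the multi-scale features with the auxiliary feature super-resolution module to obtain super-resolution features for detection;

S6, introducing a feature pyramid network on the multi-scale features generated by the backbone module to obtain feature maps for detecting large, medium, and small targets, fusing the super-resolution features with the feature map for detecting small targets in the feature pyramid, and obtaining the super-resolution features for detection through a feature aggregation module; and inputting the feature maps for detecting large, medium, and small targets together with the super-resolution features into a detection head for target detection to obtain a final detection result.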

2. The optical remote sensing image target detection method based on super-resolution feature aggregation according to claim 1, characterized in that step S1 specifically comprises the following steps:

S1.1, downsampling the input optical remote sensing image with bilinear interpolation to obtain an image at one half of the original resolution, denoted LR, denoting the original image HR, and inputting both into the image super-resolution network;

S1.2, using an EDSR network as the image super-resolution network of the single-image super-resolution module, the network consisting of convolutional layers and a plurality of residual blocks and producing a super-resolution image through upsampling and convolution; inputting the LR image into the EDSR network to obtain its super-resolution output Low_SR; and computing the L1 loss between Low_SR and HR to train the super-resolution network.

3. The optical remote sensing image target detection method based on super-resolution feature aggregation according to claim 2, characterized in that step S2 specifically comprises: inputting the input image into the trained EDSR network, i.e., the single-image super-resolution module, and outputting the resulting super-resolution image, denoted SR.

4. The optical remote sensing image target detection method based on super-resolution feature aggregation, characterized in that step S3 specifically comprises the following steps:

S3.1, using CSPNet as the backbone network for extracting feature information, the backbone being composed of CBS components and CSP modules; a CBS component consists of convolution, batch normalization, and the SiLU activation function; a CSP module copies the feature map of the previous layer into two branches and halves the number of channels with a 1 × 1 convolution; of the two resulting feature maps, one is connected to the end of the stage and the other is fed as input into ResNet blocks or CBS modules; and the two feature maps are concatenated and merged;

S3.2, inputting the image into the backbone network, performing feature extraction through successive convolutions to generate multi-scale, multi-level features, selecting feature maps of suitable, different scales from three levels, namely high, middle, and low, denoted F3, F2, and F1, and using them as inputs to the subsequent auxiliary super-resolution module and detection module.

5. The optical remote sensing image target detection method based on super-resolution feature aggregation according to claim 4, characterized in that step S4 specifically comprises the following steps:

S4.1, selecting F3 as the shallow feature and F1 as the deep feature and inputting them into the auxiliary super-resolution module; the auxiliary super-resolution module is an encoder-decoder model that fuses and encodes the shallow and deep features, so that the fused features carry shallow texture and fine-grained information as well as deep semantic information; in the encoder, the shallow features first pass through a CBR module and an upsampling operation to match the deep features, and are then merged by concatenation and two CBRD modules; CBR comprises convolution, batch normalization, and ReLU, and CBRD additionally comprises a dropout layer; the decoder decodes the fused features through three deconvolution operations to obtain a super-resolution image SR' corresponding to the input image, thereby reconstructing a high-resolution image;

S4.2, computing the mean squared error between the image SR' obtained by the auxiliary super-resolution module and the image SR obtained by the single-image super-resolution module in step S2 as the SR reconstruction loss Ls, and introducing Ls into the detection loss to guide learning in the spatial dimension.

6. The optical remote sensing image target detection method based on super-resolution feature aggregation according to claim 5, characterized in that step S5 specifically comprises: fusing the shallow feature F3 and the deep feature F1, and obtaining, through a CBRD module and a deconvolution operation, a super-resolution feature larger than the original shallow feature, denoted SR feature.

7. The optical remote sensing image target detection method based on super-resolution feature aggregation according to claim 6, characterized in that step S6 specifically comprises the following steps:

S6.1, building a feature pyramid network from the high-, medium-, and low-level features F3, F2, and F1 obtained in step S3, and obtaining three levels of features for detecting large, medium, and small targets, denoted N1, N2, and N3, respectively;

S6.2, fusing the super-resolution feature SR feature obtained in step S5 with N3 and passing the result through a feature aggregation module; the feature aggregation module takes the fused features as input and maps them into several scale spaces by feeding the fused feature map into average pooling layers with different downsampling rates, combines the upsampled feature maps from the different sub-branches, and then applies one convolution layer to obtain the super-resolution feature for detection, denoted N4;

S6.3, inputting the feature maps N1, N2, and N3 for detecting large, medium, and small targets and the super-resolution feature N4 together into the detection head for target detection to obtain the final detection result.

8. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method according to any of claims 1-7 are implemented when the processor executes the program.

9. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.

10. A computer program product comprising a computer program, characterized in that the computer program realizes the steps of the method of any one of claims 1-7 when executed by a processor.