CN111709307B - Resolution enhancement-based remote sensing image small target detection method

Resolution enhancement-based remote sensing image small target detection method

Info

Publication number
CN111709307B
CN111709307B
Authority
CN
China
Prior art keywords
image
remote sensing
resolution
sensing image
super
Prior art date
Legal status
Active
Application number
CN202010444356.XA
Other languages
Chinese (zh)
Other versions
CN111709307A
Inventor
谷延锋 (Gu Yanfeng)
叶树嘉 (Ye Shujia)
高国明 (Gao Guoming)
Current Assignee
Harbin Institute of Technology
Original Assignee
Harbin Institute of Technology
Priority date: 2020-05-22
Filing date: 2020-05-22
Publication date: 2022-08-30
Application filed by Harbin Institute of Technology
Priority to CN202010444356.XA
Publication of CN111709307A
Application granted
Publication of CN111709307B

Classifications

    • G06V 20/13 (Scenes; Terrestrial scenes; Satellite images)
    • G06F 18/2411 (Pattern recognition; Classification techniques based on the proximity to a decision surface, e.g. support vector machines)
    • G06N 3/045 (Neural networks; Architecture; Combinations of networks)
    • G06N 3/08 (Neural networks; Learning methods)

Abstract

A resolution enhancement-based method for detecting small targets in remote sensing images, belonging to the technical field of target detection in remote sensing images. The method addresses the poor small-target detection performance of existing methods, which stems from the scarce usable feature information of small targets in remote sensing images and the geometric deformation of small-target regions. The invention applies super-resolution processing to remote sensing images containing small targets before performing target detection, which broadens the applicable range of deep-learning target detection models and makes deeper use of remote sensing images of higher spatial resolution. To counter the scarce usable feature information and geometric deformation of small targets in remote sensing images, the method enriches the detailed feature information of small targets through super-resolution processing, fully exploits their limited feature information with a region-based deformable convolutional network, and thereby improves the detection of small targets in remote sensing images. The method can be applied to small target detection in remote sensing images.

Description

Resolution enhancement-based remote sensing image small target detection method
Technical Field
The invention belongs to the technical field of target detection in remote sensing images, and particularly relates to a method for detecting a small target in a remote sensing image.
Background
The spatial resolution of an optical remote sensing image depends mainly on the satellite's instantaneous field of view and its distance from the Earth's surface, so image quality is largely determined by satellite performance. With the continued development of remote sensing technology, more and more remote sensing satellites, such as the WorldView series and China's GF series, have been developed and launched, yielding remote sensing images of ever higher spatial resolution. Small targets in these images (such as small vehicles) now carry richer texture feature information, which makes it feasible to attack the small-target detection problem in remote sensing images with deep learning methods.
Existing small target detection in remote sensing images faces two main difficulties:
first, although the spatial resolution of remote sensing images keeps increasing, a small target (such as a small vehicle) still occupies very few pixels and offers very little usable feature information, so directly applying detection algorithms designed for normal-sized targets yields very unsatisfactory results;
second, because the position and motion state of the imaging satellite change continuously during acquisition, small targets in optical remote sensing images undergo geometric deformation. Since the pixel information within a small-target region is very limited, even slight geometric deformation can strongly affect the detection result.
Disclosure of Invention
The invention aims to solve the poor small-target detection performance of existing methods, caused by the scarce usable feature information of small targets in remote sensing images and the geometric deformation of small-target regions, and provides a resolution enhancement-based remote sensing image small target detection method.
The technical scheme adopted by the invention to solve this technical problem is as follows. The resolution enhancement-based remote sensing image small target detection method comprises the following specific process:
Step 1: an original remote sensing image X is given, together with the label vector Y corresponding to the small targets it contains, a small target being a target whose pixel count lies in the range (0, 100);
the original remote sensing image X is up-sampled and pre-segmented to obtain a set of equal-size images X_1, X_2, ..., X_m and their corresponding label vectors Y_1, Y_2, ..., Y_m;
Step 2: the super-resolution images S_1, S_2, ..., S_m corresponding to X_1, X_2, ..., X_m are generated, and the label vectors K_1, K_2, ..., K_m corresponding to S_1, S_2, ..., S_m are obtained from the label vectors Y_1, Y_2, ..., Y_m;
Step 3: the region-based deformable fully convolutional network is trained with the super-resolution images S_1, S_2, ..., S_m and their corresponding label vectors K_1, K_2, ..., K_m until the set maximum number of iterations is reached, giving the trained region-based deformable fully convolutional network;
Step 4: an original remote sensing image M to be detected is processed through steps 1 and 2 to obtain a set of equal-size super-resolution images M_1, M_2, ..., M_m;
the super-resolution images M_1, M_2, ..., M_m are fed into the region-based deformable fully convolutional network trained in step 3 to obtain the target detection results for M_1, M_2, ..., M_m;
Step 5: the target detection results obtained in step 4 are merged to obtain the target detection result N of the original remote sensing image M.
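Purely for orientation, the five steps above can be summarized in the following minimal Python sketch; every function name in it is a hypothetical placeholder (two of them, upsample_and_tile and merge_detections, are sketched in the embodiments below), not an interface defined by the invention.

    def detect_small_targets(image_M):
        # Step 1 analogue: up-sample the input and split it into equal-size
        # tiles, remembering each tile's position in the up-sampled image.
        tiles, positions = upsample_and_tile(image_M)
        # Step 2 analogue: super-resolve every tile with the trained generator.
        sr_tiles = [sr_generator(t) for t in tiles]
        # Step 4: run the trained region-based deformable fully convolutional
        # network on each super-resolved tile.
        per_tile_detections = [rfcn_detect(t) for t in sr_tiles]
        # Step 5: invert the preprocessing, rescaling each tile's boxes and
        # shifting them by the tile's saved position in the original image.
        return merge_detections(per_tile_detections, positions)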
The beneficial effects of the invention are as follows. Aiming at the scarce usable feature information and the geometric deformation of small targets in remote sensing images, the proposed resolution enhancement-based detection method enriches the detailed feature information of small targets through super-resolution processing and fully exploits their limited feature information with a region-based deformable convolutional network, improving both the detection capability and the detection results for small targets in remote sensing images.
To verify the performance of the proposed method, experiments were run on DOTA data sets drawn from the China Centre for Resources Satellite Data and Application, Google Earth, the JL-1 satellite, the GF-2 satellite, and other sources. The experimental results verify the effectiveness of the super-resolution-based small-target detection algorithm for remote sensing images. The experimental data set was randomly split into training, validation and test sets in the ratio 2:1:1, and the single-class detection precision reached about 80%.
Drawings
FIG. 1 is a schematic flow chart of an implementation of the present invention;
FIG. 2a shows an image from the DOTA data set (containing small vehicles) and the corresponding small-target detection-box annotations after label processing (only small vehicles retained);
FIG. 2b shows the set of equal-size image blocks obtained from a remote sensing image after up-sampling and pre-segmentation;
FIG. 3a is a basic schematic diagram of a generative adversarial network;
FIG. 3b is a block diagram of the network structure of the generative model G (generator sub-network) in FIG. 3a;
the generator mainly adopts a block layout: the core of the network consists of several identical residual blocks, and at the output stage two deconvolution operations with stride 0.5 raise the resolution of the output image;
n64 denotes the number of convolution kernel filters, i.e. the dimension of the output feature map, and s denotes the convolution stride;
FIG. 3c is a block diagram of the network structure of the discriminant model D (discriminator sub-network) in FIG. 3a;
its main structure is a VGG network comprising 8 convolutional layers; at the output stage two fully connected layers map the feature map to a probability value, the confidence score with which the discriminator distinguishes super-resolution images from real images;
FIG. 3d shows the visual change of a remote sensing image before and after super-resolution processing, with the original remote sensing image on the left and the super-resolved image on the right;
FIG. 4a illustrates a training network framework diagram of a region-based deformable convolutional network;
FIG. 4b shows a shape diagram of a normal convolution kernel;
FIG. 4c illustrates a shape diagram of a deformable convolution kernel;
these are obtained by adding displacements (arrows) to the normal sampling coordinates, showing that a deformable convolution kernel can fit severe deformations of the target;
FIG. 4d shows a special case of a deformable convolution as a scale transform;
FIG. 4e shows a special case of a deformable convolution as a rotational transformation;
FIG. 5 is the result obtained by merging the detection results of the segmented image blocks; this inverts the preprocessing and yields the final target detection result of the original remote sensing image.
Detailed Description
The first embodiment: this embodiment is described with reference to FIG. 1. The resolution enhancement-based remote sensing image small target detection method of this embodiment comprises the following specific process:
Step 1: an original remote sensing image X to be trained on is given, together with the label vector Y corresponding to the small targets it contains, a small target being a target whose pixel count lies in the range (0, 100);
the original remote sensing image X is up-sampled and pre-segmented to obtain a set of equal-size images X_1, X_2, ..., X_m and their corresponding label vectors Y_1, Y_2, ..., Y_m;
Step 2: the super-resolution images S_1, S_2, ..., S_m corresponding to X_1, X_2, ..., X_m are generated, and the label vectors K_1, K_2, ..., K_m corresponding to S_1, S_2, ..., S_m are obtained from the label vectors Y_1, Y_2, ..., Y_m;
Step 3: the region-based deformable fully convolutional network is trained with the super-resolution images S_1, S_2, ..., S_m and their corresponding label vectors K_1, K_2, ..., K_m until the set maximum number of iterations is reached, giving the trained region-based deformable fully convolutional network;
the region-based deformable fully convolutional network comprises ResNet-101, a convolutional layer, an RPN, an RoI pooling layer and a softmax classifier;
the super-resolution images S_1, S_2, ..., S_m pass through ResNet-101 to extract feature images, whose dimensionality is then reduced by the convolutional layer; in parallel, the RPN outputs regions of interest (RoIs) for S_1, S_2, ..., S_m, the RoIs are mapped onto the dimension-reduced feature images, the RoI pooling layer pools the mapped images, and the averaged pooling results are fed into the softmax classifier to obtain the target classification results;
Step 4: an original remote sensing image M to be detected is processed through steps 1 and 2 to obtain a set of equal-size super-resolution images M_1, M_2, ..., M_m;
the super-resolution images M_1, M_2, ..., M_m are fed into the region-based deformable fully convolutional network trained in step 3 to obtain the target detection results for M_1, M_2, ..., M_m;
Step 5: the target detection results obtained in step 4 are merged to obtain the target detection result N of the original remote sensing image M;
the target detection results for M_1, M_2, ..., M_m are inverted back into target detection results in the original image M.
The invention performs super-resolution processing on remote sensing images containing small targets (such as small vehicles) before carrying out target detection, which broadens the applicable range of deep-learning target detection models and makes deeper use of remote sensing images of higher spatial resolution.
The second embodiment: this embodiment is described with reference to FIGS. 2a and 2b. It differs from the first embodiment in that the specific process of step 1 is as follows:
up-sample the original remote sensing image X (the adjustment range is small) to obtain an up-sampled image; pre-segment the up-sampled image, i.e. divide it into a set of equal-size images X_1, X_2, ..., X_m, storing the position of each X_1, X_2, ..., X_m within the up-sampled image during segmentation, where m is the total number of segmented images;
while the image is segmented, the label vector Y corresponding to the small targets contained in the original remote sensing image X is segmented as well, and each image X_1, X_2, ..., X_m is assigned its corresponding segmented label vector Y_1, Y_2, ..., Y_m;
that is, segmenting the label vector Y yields the segmented label vectors Y_1, Y_2, ..., Y_m.
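A minimal sketch of this up-sampling and pre-segmentation step, assuming NumPy; the up-sampling factor, the nearest-neighbour interpolation and the tile size are illustrative assumptions (the invention only states that the up-sampling adjustment range is small), and the parallel splitting of the label vector Y is omitted for brevity.

    import numpy as np

    def upsample_and_tile(image, up=2, tile=512):
        """Up-sample `image`, then split it into equal-size tiles, keeping each
        tile's top-left coordinate so the segmentation can be inverted in
        step 5. `up` and `tile` are illustrative values only."""
        # Nearest-neighbour up-sampling by integer repetition, a simple
        # stand-in for the unspecified interpolation of the original method.
        big = image.repeat(up, axis=0).repeat(up, axis=1)
        H, W = big.shape[:2]
        tiles, positions = [], []
        for y in range(0, H - tile + 1, tile):          # edge remainders are
            for x in range(0, W - tile + 1, tile):      # ignored for brevity
                tiles.append(big[y:y + tile, x:x + tile])
                positions.append((x, y))                # saved position info
        return tiles, positions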
Other steps and parameters are the same as those in the first embodiment.
The third embodiment: this embodiment is described with reference to FIGS. 3a, 3b, 3c and 3d. It differs from the first or second embodiment in that the specific process of step 2 is as follows:
Step 2.1: establish a generator sub-network $G_{\theta_G}$ and a discriminator sub-network $D_{\theta_D}$, which together form a generative adversarial network;
Step 2.2: given a set of remote sensing images of higher resolution $I_n^{HR}$, n = 1, ..., N, down-sample the remote sensing images $I_n^{HR}$ to obtain a corresponding set of remote sensing images of lower resolution $I_n^{LR}$, n = 1, ..., N, where N is the number of remote sensing images contained in each set;
here "higher resolution" means that the resolution of $I_n^{HR}$ is higher relative to that of $I_n^{LR}$, and "lower resolution" means that the resolution of $I_n^{LR}$ is lower relative to that of $I_n^{HR}$.
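A minimal sketch of building one such training pair, assuming NumPy and a simple average-pooling down-sampler; the invention does not specify the down-sampling operator, and r = 4 is chosen here only to match the 4x factor used in step 2.4.

    import numpy as np

    def make_training_pair(hr_image, r=4):
        """Build an (LR, HR) training pair by down-sampling, as in step 2.2.
        r-fold average pooling stands in for the unspecified operator."""
        H, W = hr_image.shape[:2]
        H, W = H - H % r, W - W % r                # crop to a multiple of r
        hr = hr_image[:H, :W].astype(np.float32)
        lr = hr.reshape(H // r, r, W // r, r, -1).mean(axis=(1, 3))
        return lr, hr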
Step 2.3: train the generator sub-network $G_{\theta_G}$ with the remote sensing images $I_n^{LR}$. The problem the generator sub-network must solve is described by equation (1):

$$\hat{\theta}_G = \arg\min_{\theta_G} \frac{1}{N} \sum_{n=1}^{N} l^{SR}\!\left(G_{\theta_G}(I_n^{LR}),\, I_n^{HR}\right) \tag{1}$$

where $\theta_G = \{W_{1:L}; b_{1:L}\}$ is the set of all weights and biases of the generator sub-network $G_{\theta_G}$; $G_{\theta_G}(I_n^{LR})$ is the reconstructed image output by the generator sub-network when the remote sensing image $I_n^{LR}$ is input; and $l^{SR}$ is the loss function of the generator sub-network;
the loss function $l^{SR}$ consists of the following three parts:

$$l^{SR} = \gamma_1\, l_{content}^{SR} + \gamma_2\, l_{adv}^{SR} + \gamma_3\, l_{reg}^{SR} \tag{2}$$

where $l_{content}^{SR}$ is the content loss with weight parameter $\gamma_1$, $l_{adv}^{SR}$ is the adversarial loss with weight parameter $\gamma_2$, and $l_{reg}^{SR}$ is the regularization loss with weight parameter $\gamma_3$;
$$l_{content}^{SR} = \frac{1}{W_{i,j} H_{i,j}} \sum_{x=1}^{W_{i,j}} \sum_{y=1}^{H_{i,j}} \left( \phi_{i,j}\!\left(I_n^{HR}\right)_{x,y} - \phi_{i,j}\!\left(G_{\theta_G}(I_n^{LR})\right)_{x,y} \right)^2 \tag{3}$$

where $W_{i,j}$ and $H_{i,j}$ respectively denote the width and height of the feature map output by the j-th convolutional layer before the i-th max-pooling layer of the discriminator sub-network $D_{\theta_D}$ (in the invention i = 5, j = 4; in practice they may be chosen arbitrarily); $\phi_{i,j}(I_n^{HR})_{x,y}$ is the value of pixel (x, y) in that feature map when the discriminator sub-network input is $I_n^{HR}$, and $\phi_{i,j}(G_{\theta_G}(I_n^{LR}))_{x,y}$ is the value of pixel (x, y) in that feature map when the input is the reconstructed image $G_{\theta_G}(I_n^{LR})$; x = 1, 2, ..., $W_{i,j}$; y = 1, 2, ..., $H_{i,j}$;
$$l_{adv}^{SR} = \sum_{n=1}^{N} -\log D_{\theta_D}\!\left(G_{\theta_G}(I_n^{LR})\right) \tag{4}$$

where $D_{\theta_D}(G_{\theta_G}(I_n^{LR}))$ denotes the output of the discriminator sub-network $D_{\theta_D}$ when the reconstructed image $G_{\theta_G}(I_n^{LR})$ is input;
$$l_{reg}^{SR} = \frac{1}{r^2 W H} \sum_{x'=1}^{rW} \sum_{y'=1}^{rH} \left\| \nabla G_{\theta_G}\!\left(I_n^{LR}\right)_{x',y'} \right\| \tag{5}$$

where $\|\cdot\|$ denotes the 1-norm, r denotes the scale factor of the reconstructed image $G_{\theta_G}(I_n^{LR})$, whose width and height are rW and rH, $\nabla G_{\theta_G}(I_n^{LR})_{x',y'}$ denotes the pixel-wise gradient of the reconstructed image, and (x', y') is a pixel in the reconstructed image, x' = 1, 2, ..., rW, y' = 1, 2, ..., rH;
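For illustration, equations (2) to (5) can be sketched as follows, assuming PyTorch. Here `phi` stands for the feature extractor $\phi_{i,j}$, `disc_prob` for $D_{\theta_D}(G_{\theta_G}(I_n^{LR}))$, the weights g1, g2, g3 are illustrative values not taken from the invention, and the TV term is a discrete approximation of equation (5).

    import torch
    import torch.nn.functional as F

    def sr_loss(sr, hr, disc_prob, phi, g1=1.0, g2=1e-3, g3=2e-8):
        """Sketch of l_SR = g1*content + g2*adversarial + g3*regularization."""
        # (3) content loss: MSE between feature maps of real and SR images.
        content = F.mse_loss(phi(sr), phi(hr))
        # (4) adversarial loss: -log D(G(I_LR)).
        adversarial = -torch.log(disc_prob + 1e-8).mean()
        # (5) regularization: 1-norm of the discrete image gradients of the
        # reconstructed image (sr has shape [batch, channels, H, W]).
        tv = (sr[:, :, 1:, :] - sr[:, :, :-1, :]).abs().mean() + \
             (sr[:, :, :, 1:] - sr[:, :, :, :-1]).abs().mean()
        return g1 * content + g2 * adversarial + g3 * tv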
the discriminator sub-network $D_{\theta_D}$, n = 1, ..., N, is trained with the remote sensing images $I_n^{HR}$ and the reconstructed images $G_{\theta_G}(I_n^{LR})$; the problem the discriminator sub-network must solve is described by equation (6):

$$\hat{\theta}_D = \arg\max_{\theta_D}\; \mathbb{E}\!\left[\log D_{\theta_D}\!\left(I_n^{HR}\right)\right] + \mathbb{E}\!\left[\log\!\left(1 - D_{\theta_D}\!\left(G_{\theta_G}(I_n^{LR})\right)\right)\right] \tag{6}$$

where $D_{\theta_D}(I_n^{HR})$ denotes the probability value output by the discriminator sub-network when the input is $I_n^{HR}$; $D_{\theta_D}(G_{\theta_G}(I_n^{LR}))$ denotes the probability value output when the input is the reconstructed image $G_{\theta_G}(I_n^{LR})$; E[·] denotes taking the expectation; and $\theta_D$ is the set of all weights and biases of the discriminator sub-network $D_{\theta_D}$;
solving for the $\theta_G$ and $\theta_D$ that satisfy equation (1) and equation (6) yields the well-trained generative adversarial network;
training the generative adversarial network requires the discriminator sub-network, i.e. the two networks must cooperate during training; once trained, inputting the images X_1, X_2, ..., X_m into the generative adversarial network produces the super-resolution images S_1, S_2, ..., S_m;
Step 2.4: input the images X_1, X_2, ..., X_m into the trained generative adversarial network; its generator sub-network outputs the corresponding super-resolution images S_1, S_2, ..., S_m;
multiply the coordinates of the label vectors Y_1, Y_2, ..., Y_m by 4, via a MATLAB or Python program, to obtain the processed label vectors K_1, K_2, ..., K_m;
multiplying the label vectors Y_1, Y_2, ..., Y_m by 4 means the following: a label vector is essentially the vector of the four vertex coordinates of a small target's annotation box, so each element of the vector is multiplied by 4, giving the processed label vectors K_1, K_2, ..., K_m, which correspond to the images X_1, X_2, ..., X_m. In fact, the resolution of the super-resolution images S_1, S_2, ..., S_m can be set to any multiple of that of X_1, X_2, ..., X_m by modifying the parameters of the generative adversarial network, with the label vectors Y_1, Y_2, ..., Y_m multiplied by the corresponding factor; in the present invention, the resolution of S_1, S_2, ..., S_m is 4 times that of X_1, X_2, ..., X_m.
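A minimal sketch of this label scaling in Python (plain lists of vertex coordinates stand in for the label vectors):

    def scale_labels(label_vectors, factor=4):
        """Step 2.4 label handling: each label vector holds the four vertex
        coordinates of a small target's annotation box, so multiplying every
        element by the super-resolution factor (4 here) maps it onto the
        super-resolved image."""
        return [[coord * factor for coord in vec] for vec in label_vectors]

    # e.g. K = scale_labels(Y) turns Y_1, ..., Y_m into K_1, ..., K_m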
Other steps and parameters are the same as those in the first or second embodiment.
As shown in FIG. 3b, the generator sub-network $G_{\theta_G}$ mainly adopts a block layout: the core of the network consists of several identical residual blocks, each comprising convolutional layers, batch normalization (BN) layers and linear rectification (ReLU) layers. Specifically, every convolutional layer uses 3 x 3 kernels, each residual block outputs a 64-dimensional feature map, and padding keeps the feature-map resolution unchanged through the convolutions. At the output stage of the network, two deconvolution operations with stride 0.5 raise the resolution of the output image.
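A sketch of one residual block and the up-sampling output stage under the constraints just stated, assuming PyTorch; the exact layer ordering inside the block and the rendering of the stride-0.5 deconvolution as a stride-2 transposed convolution are assumptions.

    import torch.nn as nn

    class ResidualBlock(nn.Module):
        """One residual block as described for FIG. 3b: 3x3 convolutions with
        padding (so the feature-map resolution is unchanged), batch
        normalization and ReLU, 64-channel output."""
        def __init__(self, ch=64):
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch), nn.ReLU(),
                nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch))
        def forward(self, x):
            return x + self.body(x)   # identity shortcut

    # Output stage: two stride-1/2 "deconvolutions", each doubling the
    # resolution, rendered as transposed convolutions (4x in total).
    upsample = nn.Sequential(
        nn.ConvTranspose2d(64, 64, 4, stride=2, padding=1), nn.ReLU(),
        nn.ConvTranspose2d(64, 64, 4, stride=2, padding=1), nn.ReLU())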
As shown in FIG. 3c, the main structure of the discriminator sub-network $D_{\theta_D}$ is a VGG network; each convolutional layer ends with LeakyReLU activation, and a BN layer normalizes the feature map before it is output. The whole network comprises 8 convolutional layers; as the network deepens, the stride and the number of filters keep increasing, so the feature-map resolution keeps falling while its dimensionality keeps growing, and the last convolutional layer outputs a 512-dimensional low-resolution feature map. At the output stage of the network, two fully connected layers map the feature map to a probability value, the confidence score with which the discriminator distinguishes the super-resolution image from the real image.
The fourth embodiment: this embodiment is described with reference to FIG. 4a. It differs from the first to third embodiments in that the specific process of step 3 is as follows:
Step 3.1: input the super-resolution images S_1, S_2, ..., S_m and their corresponding label vectors K_1, K_2, ..., K_m into the region-based deformable fully convolutional network; within it, extract the feature images of S_1, S_2, ..., S_m with ResNet-101 and output the regions of interest (RoIs) of S_1, S_2, ..., S_m with the RPN;
Following the R-FCN idea, the invention adopts ResNet-101 as the feature-extraction network. ResNet-101 has 100 convolutional layers followed by a global average-pooling layer and a 1000-class fully connected layer; the invention deletes the average-pooling and fully connected layers and computes feature maps with the convolutional layers only. A transfer-learning approach is used: a ResNet-101 classification network is first pre-trained on ImageNet, its classification layer and loss-computation part are removed so that only the feature-extraction part remains, and a randomly initialized 1024-dimensional 1 x 1 convolutional layer is inserted into the network to reduce the dimensionality of the 2048-dimensional output.
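A sketch of this transfer-learning setup, assuming PyTorch and torchvision; the pretrained ResNet-101 of torchvision stands in for the network pre-trained on ImageNet described above.

    import torch.nn as nn
    import torchvision.models as models

    # Take an ImageNet-pretrained ResNet-101, drop the global average pooling
    # and the 1000-class fully connected layer, keep only the convolutional
    # feature extractor, and append a randomly initialized 1x1 convolution
    # that reduces the 2048-dimensional output to 1024 dimensions.
    backbone = models.resnet101(pretrained=True)
    feature_extractor = nn.Sequential(
        *list(backbone.children())[:-2],           # remove avgpool + fc
        nn.Conv2d(2048, 1024, kernel_size=1))      # dimension reduction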
Step 3.2: map the regions of interest RoI onto the feature images of the super-resolution images S_1, S_2, ..., S_m to obtain the mapped feature images;
Step 3.3: pool the mapped feature images to obtain pooling results, then average the pooling results to obtain the target classification results;
for example, for the (i, j)-th bin (0 ≤ i, j ≤ k − 1), a position-sensitive RoI pooling operation is defined that pools only over the (i, j)-th bin:

$$r_c(i, j \mid \Theta) = \frac{1}{n} \sum_{(x,y) \in \mathrm{bin}(i,j)} z_{i,j,c}\!\left(x + x_0,\, y + y_0 \mid \Theta\right)$$

where $r_c(i, j \mid \Theta)$ is the pooled value of the c-th class in the (i, j)-th bin, $z_{i,j,c}$ is the score map of the c-th class, $(x_0, y_0)$ denotes the top-left element of the RoI region, n is the number of elements in the bin, and $\Theta$ denotes all learnable parameters of the network. The (i, j)-th bin covers

$$\left\lfloor i\,\frac{w}{k} \right\rfloor \le x < \left\lceil (i+1)\,\frac{w}{k} \right\rceil \quad\text{and}\quad \left\lfloor j\,\frac{h}{k} \right\rfloor \le y < \left\lceil (j+1)\,\frac{h}{k} \right\rceil,$$

where w and h respectively denote the width and height of each RoI region, which is divided into $k^2$ bins.

The pooled results are then averaged to obtain the final classification result: after averaging, each RoI produces a (C + 1)-dimensional vector $r_c(\Theta) = \sum_{i,j} r_c(i, j \mid \Theta)$, whose softmax response is computed as

$$s_c(\Theta) = \frac{e^{r_c(\Theta)}}{\sum_{c'=0}^{C} e^{r_{c'}(\Theta)}}.$$
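A minimal NumPy sketch of the position-sensitive pooling, voting and softmax steps above; the channel layout of the score maps is an assumption, since the invention does not state it.

    import numpy as np

    def ps_roi_pool(score_maps, x0, y0, w, h, k, C):
        """Position-sensitive RoI pooling, voting r_c and softmax s_c.
        `score_maps` has shape (k*k*(C+1), H, W); the map for class c and
        bin (i, j) is assumed to sit at index (i*k + j)*(C+1) + c."""
        r = np.zeros(C + 1)
        for i in range(k):
            for j in range(k):
                # bin (i, j) covers [i*w/k, (i+1)*w/k) x [j*h/k, (j+1)*h/k)
                xa = x0 + int(np.floor(i * w / k))
                xb = x0 + int(np.ceil((i + 1) * w / k))
                ya = y0 + int(np.floor(j * h / k))
                yb = y0 + int(np.ceil((j + 1) * h / k))
                for c in range(C + 1):
                    bin_map = score_maps[(i * k + j) * (C + 1) + c, ya:yb, xa:xb]
                    if bin_map.size:
                        r[c] += bin_map.mean()     # pooled value r_c(i,j|Theta)
        e = np.exp(r - r.max())                    # numerically stable softmax
        return e / e.sum()                         # s_c(Theta)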
Step 3.4: while extracting the feature images of the super-resolution images S_1, S_2, ..., S_m, the coordinates of the target bounding boxes are learned at the same time; while pooling the mapped feature images, a position vector is also generated for each RoI region, and the bounding-box coordinates and position vectors are aggregated by an averaging operation into a 4-dimensional vector $t = (t_x, t_y, t_w, t_h)$, in which $t_x$ is the x coordinate of the top-left corner of the target detection box, $t_y$ is the y coordinate of the top-left corner of the target detection box, $t_w$ is the width of the target detection box, and $t_h$ is the height of the target detection box.
The fifth embodiment: this embodiment is described with reference to FIGS. 4b to 4e. It differs from the first to fourth embodiments in that the convolutions in the region-based deformable fully convolutional network are deformable convolutions, which add an offset $\Delta p_n$ at each grid point of the convolution kernel; the modified formula is expressed as follows:

$$y(p_0) = \sum_{p_n \in \mathcal{R}} w(p_n)\, x(p_0 + p_n + \Delta p_n)$$

where $p_0$ denotes the point at the top-left corner of the convolution receptive field, $p_n$ is the relative offset of the other points in the field with respect to the top-left corner, $\Delta p_n$ is the offset learned during convolution, $w(p_n)$ denotes the convolution kernel weight, $\mathcal{R}$ denotes the sampling grid, i.e. the size of the convolution receptive field, $y(p_0)$ is the output of the convolution, and $x(p_0 + p_n + \Delta p_n)$ denotes the pixel value of the point $p_0 + p_n$ in the input feature image after the offset is introduced.
Other steps and parameters are the same as in one of the first to fourth embodiments.
The invention replaces the conventional convolution structure with a deformable one. The conventional convolution structure is defined as follows, where $p_n$ is the offset of each point of the receptive field relative to $p_0$:

$$y(p_0) = \sum_{p_n \in \mathcal{R}} w(p_n)\, x(p_0 + p_n)$$

where $p_0$ is the point at the top-left corner of the convolution receptive field and $\mathcal{R}$ is the convolution sampling grid.
The sixth embodiment: this embodiment is described with reference to FIG. 5. It differs from the first to fifth embodiments in that the specific process of step 5 is as follows:
scale the target detection results of the images M_1, M_2, ..., M_m, then merge the scaled results according to the position information of M_1, M_2, ..., M_m in the original remote sensing image M to obtain the target detection result N.
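A minimal sketch of this inversion, assuming the tile positions saved during pre-segmentation (see the sketch in the second embodiment); the up-sampling and super-resolution factors are illustrative values.

    def merge_detections(per_tile_boxes, positions, up=2, r=4):
        """Step 5: invert the preprocessing. Boxes were predicted on tiles
        that are r-times super-resolved crops of an up-sampled image, so
        divide the coordinates by r, shift by the tile's saved top-left
        corner, then divide by the up-sampling factor `up`."""
        merged = []
        for boxes, (tx, ty) in zip(per_tile_boxes, positions):
            for (x, y, w, h, score) in boxes:
                merged.append(((tx + x / r) / up, (ty + y / r) / up,
                               w / (r * up), h / (r * up), score))
        return merged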
Other steps and parameters are the same as those in one of the first to fifth embodiments.
The following example demonstrates the beneficial effects of the invention.
Example 1:
The resolution enhancement-based remote sensing image small target detection method is carried out according to the following steps.
The experimental data are DOTA data sets of different resolutions obtained from open satellite sources. FIG. 2a shows an image from the DOTA data set (containing small vehicles) and the corresponding small-target detection-box annotations after label processing (only small vehicles retained); FIG. 2b shows the set of equal-size image blocks obtained by up-sampling and pre-segmenting a remote sensing image, with the position information in the original image saved at cutting time.
Table 1 gives the single-class detection precision (AP) obtained by training and testing three target detection models under different training-sample conditions (remote sensing image data sets before and after super-resolution processing).
TABLE 1 Single-class detection precision (unit: %)
FIG. 5 shows the result of merging the detection results of the segmented image blocks, inverting the preprocessing to yield the final target detection result of the original remote sensing image. Table 2 gives the results of training and testing six target detection models on the super-resolved remote sensing image data set; the last row is the target detection algorithm used in the invention, reported with three indexes: single-class detection precision (AP), training time (train_time) and detection time (test_time). By comparison, the proposed algorithm performs best on all indexes, and its single-class detection precision on the DOTA data set reaches about 80%.
TABLE 2
The present invention is capable of other embodiments and its several details are capable of modifications in various obvious respects, all without departing from the spirit and scope of the present invention.

Claims (4)

1. A resolution enhancement-based remote sensing image small target detection method, characterized by specifically comprising the following steps:
Step 1: giving an original remote sensing image X and the label vector Y corresponding to the small targets contained in the original remote sensing image X, a small target being a target whose pixel count lies in the range (0, 100);
up-sampling and pre-segmenting the original remote sensing image X to obtain a set of equal-size images X_1, X_2, ..., X_m and their corresponding label vectors Y_1, Y_2, ..., Y_m;
Step 2: generating the super-resolution images S_1, S_2, ..., S_m corresponding to the images X_1, X_2, ..., X_m, and obtaining from the label vectors Y_1, Y_2, ..., Y_m the label vectors K_1, K_2, ..., K_m corresponding to S_1, S_2, ..., S_m;
the specific process of step 2 being as follows:
Step 2.1: establishing a generator sub-network $G_{\theta_G}$ and a discriminator sub-network $D_{\theta_D}$, which together form a generative adversarial network;
Step 2.2: given a set of remote sensing images of higher resolution $I_n^{HR}$, n = 1, ..., N, down-sampling the remote sensing images $I_n^{HR}$ to obtain a corresponding set of remote sensing images of lower resolution $I_n^{LR}$, n = 1, ..., N, N being the number of remote sensing images contained in each set;
Step 2.3: training the generator sub-network $G_{\theta_G}$ with the remote sensing images $I_n^{LR}$, the problem the generator sub-network must solve being described by equation (1):

$$\hat{\theta}_G = \arg\min_{\theta_G} \frac{1}{N} \sum_{n=1}^{N} l^{SR}\!\left(G_{\theta_G}(I_n^{LR}),\, I_n^{HR}\right) \tag{1}$$

where $\theta_G$ is the set of all weights and biases of the generator sub-network $G_{\theta_G}$; $G_{\theta_G}(I_n^{LR})$ is the reconstructed image output by the generator sub-network when the remote sensing image $I_n^{LR}$ is input; and $l^{SR}$ is the loss function of the generator sub-network;
the loss function $l^{SR}$ consists of the following three parts:

$$l^{SR} = \gamma_1\, l_{content}^{SR} + \gamma_2\, l_{adv}^{SR} + \gamma_3\, l_{reg}^{SR} \tag{2}$$

where $l_{content}^{SR}$ is the content loss with weight parameter $\gamma_1$, $l_{adv}^{SR}$ is the adversarial loss with weight parameter $\gamma_2$, and $l_{reg}^{SR}$ is the regularization loss with weight parameter $\gamma_3$;
$$l_{content}^{SR} = \frac{1}{W_{i,j} H_{i,j}} \sum_{x=1}^{W_{i,j}} \sum_{y=1}^{H_{i,j}} \left( \phi_{i,j}\!\left(I_n^{HR}\right)_{x,y} - \phi_{i,j}\!\left(G_{\theta_G}(I_n^{LR})\right)_{x,y} \right)^2 \tag{3}$$

where $W_{i,j}$ and $H_{i,j}$ respectively denote the width and height of the feature map output by the j-th convolutional layer before the i-th max-pooling layer of the discriminator sub-network $D_{\theta_D}$; $\phi_{i,j}(I_n^{HR})_{x,y}$ is the value of pixel (x, y) in that feature map when the discriminator sub-network input is $I_n^{HR}$, and $\phi_{i,j}(G_{\theta_G}(I_n^{LR}))_{x,y}$ is the value of pixel (x, y) in that feature map when the input is the reconstructed image $G_{\theta_G}(I_n^{LR})$; x = 1, 2, ..., $W_{i,j}$; y = 1, 2, ..., $H_{i,j}$;
$$l_{adv}^{SR} = \sum_{n=1}^{N} -\log D_{\theta_D}\!\left(G_{\theta_G}(I_n^{LR})\right) \tag{4}$$

where $D_{\theta_D}(G_{\theta_G}(I_n^{LR}))$ denotes the output of the discriminator sub-network $D_{\theta_D}$ when the reconstructed image $G_{\theta_G}(I_n^{LR})$ is input;
$$l_{reg}^{SR} = \frac{1}{r^2 W H} \sum_{x'=1}^{rW} \sum_{y'=1}^{rH} \left\| \nabla G_{\theta_G}\!\left(I_n^{LR}\right)_{x',y'} \right\| \tag{5}$$

where $\|\cdot\|$ denotes the 1-norm, r denotes the scale factor of the reconstructed image $G_{\theta_G}(I_n^{LR})$, whose width and height are rW and rH, $\nabla G_{\theta_G}(I_n^{LR})_{x',y'}$ denotes the pixel-wise gradient of the reconstructed image, and (x', y') is a pixel in the reconstructed image, x' = 1, 2, ..., rW, y' = 1, 2, ..., rH;
the discriminator sub-network $D_{\theta_D}$ is trained with the remote sensing images $I_n^{HR}$ and the reconstructed images $G_{\theta_G}(I_n^{LR})$, the problem the discriminator sub-network must solve being described by equation (6):

$$\hat{\theta}_D = \arg\max_{\theta_D}\; \mathbb{E}\!\left[\log D_{\theta_D}\!\left(I_n^{HR}\right)\right] + \mathbb{E}\!\left[\log\!\left(1 - D_{\theta_D}\!\left(G_{\theta_G}(I_n^{LR})\right)\right)\right] \tag{6}$$

where $D_{\theta_D}(I_n^{HR})$ denotes the probability value output by the discriminator sub-network when the input is $I_n^{HR}$; $D_{\theta_D}(G_{\theta_G}(I_n^{LR}))$ denotes the probability value output when the input is the reconstructed image $G_{\theta_G}(I_n^{LR})$; E[·] denotes taking the expectation; and $\theta_D$ is the set of all weights and biases of the discriminator sub-network $D_{\theta_D}$;
solving for the $\theta_G$ and $\theta_D$ that satisfy equation (1) and equation (6) yields the well-trained generative adversarial network;
Step 2.4: inputting the images X_1, X_2, ..., X_m into the trained generative adversarial network, the generator sub-network of which outputs the corresponding super-resolution images S_1, S_2, ..., S_m;
multiplying the coordinates of the label vectors Y_1, Y_2, ..., Y_m by 4, via a MATLAB or Python program, to obtain the processed label vectors K_1, K_2, ..., K_m;
Step 3: training the region-based deformable fully convolutional network with the super-resolution images S_1, S_2, ..., S_m and their corresponding label vectors K_1, K_2, ..., K_m until the set maximum number of iterations is reached, giving the trained region-based deformable fully convolutional network;
the specific process of step 3 being as follows:
Step 3.1: inputting the super-resolution images S_1, S_2, ..., S_m and the corresponding label vectors K_1, K_2, ..., K_m into the region-based deformable fully convolutional network, extracting the feature images of S_1, S_2, ..., S_m with the ResNet-101 within it, and outputting the regions of interest (RoIs) of S_1, S_2, ..., S_m with the RPN;
Step 3.2: mapping the RoIs onto the feature images of S_1, S_2, ..., S_m to obtain the mapped feature images;
Step 3.3: pooling the mapped feature images to obtain pooling results, then averaging the pooling results to obtain the target classification results;
Step 3.4: while extracting the feature images of the super-resolution images S_1, S_2, ..., S_m, learning the coordinates of the target bounding boxes at the same time; while pooling the mapped feature images, also generating a position vector for each RoI region, and aggregating the bounding-box coordinates and position vectors by an averaging operation into a 4-dimensional vector $t = (t_x, t_y, t_w, t_h)$, in which $t_x$ is the x coordinate of the top-left corner of the target detection box, $t_y$ is the y coordinate of the top-left corner of the target detection box, $t_w$ is the width of the target detection box, and $t_h$ is the height of the target detection box;
Step 4: for an original remote sensing image M to be detected, processing the original remote sensing image M through step 1 and step 2 to obtain a set of equal-size super-resolution images M_1, M_2, ..., M_m;
inputting the super-resolution images M_1, M_2, ..., M_m into the region-based deformable fully convolutional network trained in step 3 to obtain the target detection results for M_1, M_2, ..., M_m;
Step 5: merging the target detection results obtained in step 4 to obtain the target detection result N of the original remote sensing image M.
2. The resolution enhancement-based remote sensing image small target detection method according to claim 1, characterized in that the specific process of step 1 is as follows:
up-sampling the original remote sensing image X to obtain an up-sampled image; pre-segmenting the up-sampled image, i.e. dividing it into a set of equal-size images X_1, X_2, ..., X_m, storing the position of each X_1, X_2, ..., X_m within the up-sampled image during segmentation, m being the total number of segmented images;
while the image is segmented, the label vector Y corresponding to the small targets contained in the original remote sensing image X is segmented as well, and each image X_1, X_2, ..., X_m is assigned its corresponding segmented label vector Y_1, Y_2, ..., Y_m.
3. The resolution enhancement-based remote sensing image small target detection method according to claim 2, characterized in that the convolutions in the region-based deformable fully convolutional network are deformable convolutions, which add an offset $\Delta p_n$ at each grid point of the convolution kernel, the modified formula being expressed as follows:

$$y(p_0) = \sum_{p_n \in \mathcal{R}} w(p_n)\, x(p_0 + p_n + \Delta p_n)$$

where $p_0$ denotes the point at the top-left corner of the convolution receptive field, $p_n$ is the relative offset of the other points in the field with respect to the top-left corner, $\Delta p_n$ is the offset learned during convolution, $w(p_n)$ denotes the weight, $\mathcal{R}$ denotes the size of the convolution receptive field, $y(p_0)$ is the output of the convolution, and $x(p_0 + p_n + \Delta p_n)$ denotes the pixel value of the point $p_0 + p_n$ in the input feature image after the offset is introduced.
4. The resolution enhancement-based remote sensing image small target detection method according to claim 3, characterized in that the specific process of step 5 is as follows:
scaling the target detection results of the images M_1, M_2, ..., M_m, then merging the scaled results according to the position information of M_1, M_2, ..., M_m in the original remote sensing image M to obtain the target detection result N.
CN202010444356.XA 2020-05-22 2020-05-22 Resolution enhancement-based remote sensing image small target detection method Active CN111709307B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010444356.XA CN111709307B (en) 2020-05-22 2020-05-22 Resolution enhancement-based remote sensing image small target detection method


Publications (2)

Publication Number Publication Date
CN111709307A (en) 2020-09-25
CN111709307B (en) 2022-08-30

Family

ID=72537713

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010444356.XA Active CN111709307B (en) 2020-05-22 2020-05-22 Resolution enhancement-based remote sensing image small target detection method

Country Status (1)

Country Link
CN (1) CN111709307B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113420745B (en) * 2021-08-25 2021-12-24 江西中业智能科技有限公司 Image-based target identification method, system, storage medium and terminal equipment
CN114663671B (en) * 2022-02-21 2023-07-18 佳都科技集团股份有限公司 Target detection method, device, equipment and storage medium
CN115984846B (en) * 2023-02-06 2023-10-10 山东省人工智能研究院 Intelligent recognition method for small targets in high-resolution image based on deep learning
CN115953453B (en) * 2023-03-03 2023-08-15 国网吉林省电力有限公司信息通信公司 Substation geological deformation monitoring method based on image dislocation analysis and Beidou satellite

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108427920A (en) * 2018-02-26 2018-08-21 杭州电子科技大学 A kind of land and sea border defense object detection method based on deep learning
CN108510467A (en) * 2018-03-28 2018-09-07 西安电子科技大学 SAR image target recognition method based on variable depth shape convolutional neural networks
CN108596101A (en) * 2018-04-25 2018-09-28 上海交通大学 A kind of remote sensing images multi-target detection method based on convolutional neural networks
CN109299688A (en) * 2018-09-19 2019-02-01 厦门大学 Ship Detection based on deformable fast convolution neural network
CN110197255A (en) * 2019-04-29 2019-09-03 杰创智能科技股份有限公司 A kind of deformable convolutional network based on deep learning
CN110458166A (en) * 2019-08-19 2019-11-15 广东工业大学 A kind of hazardous material detection method, device and equipment based on deformable convolution
CN110728658A (en) * 2019-09-16 2020-01-24 武汉大学 High-resolution remote sensing image weak target detection method based on deep learning
CN111126385A (en) * 2019-12-13 2020-05-08 哈尔滨工程大学 Deep learning intelligent identification method for deformable living body small target

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10019631B2 (en) * 2015-11-05 2018-07-10 Qualcomm Incorporated Adapting to appearance variations when tracking a target object in video sequence

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108427920A (en) * 2018-02-26 2018-08-21 杭州电子科技大学 A kind of land and sea border defense object detection method based on deep learning
CN108510467A (en) * 2018-03-28 2018-09-07 西安电子科技大学 SAR image target recognition method based on variable depth shape convolutional neural networks
CN108596101A (en) * 2018-04-25 2018-09-28 上海交通大学 A kind of remote sensing images multi-target detection method based on convolutional neural networks
CN109299688A (en) * 2018-09-19 2019-02-01 厦门大学 Ship Detection based on deformable fast convolution neural network
CN110197255A (en) * 2019-04-29 2019-09-03 杰创智能科技股份有限公司 A kind of deformable convolutional network based on deep learning
CN110458166A (en) * 2019-08-19 2019-11-15 广东工业大学 A kind of hazardous material detection method, device and equipment based on deformable convolution
CN110728658A (en) * 2019-09-16 2020-01-24 武汉大学 High-resolution remote sensing image weak target detection method based on deep learning
CN111126385A (en) * 2019-12-13 2020-05-08 哈尔滨工程大学 Deep learning intelligent identification method for deformable living body small target

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Vehicle detection method for dense areas of remote sensing images based on deformable convolutional neural networks; Gao Xin et al.; Journal of Electronics & Information Technology; 2018-09-13 (No. 12); full text *

Also Published As

Publication number Publication date
CN111709307A (en) 2020-09-25


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant