CN111709307A - Resolution enhancement-based remote sensing image small target detection method - Google Patents

Resolution enhancement-based remote sensing image small target detection method

Info

Publication number
CN111709307A
CN111709307A
Authority
CN
China
Prior art keywords
image
remote sensing
resolution
sensing image
super
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010444356.XA
Other languages
Chinese (zh)
Other versions
CN111709307B (en)
Inventor
Gu Yanfeng (谷延锋)
Ye Shujia (叶树嘉)
Gao Guoming (高国明)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Institute of Technology
Original Assignee
Harbin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Institute of Technology
Priority to CN202010444356.XA
Publication of CN111709307A
Application granted
Publication of CN111709307B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • G06V20/13Satellite images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Astronomy & Astrophysics (AREA)
  • Remote Sensing (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

A resolution enhancement-based method for detecting small targets in remote sensing images, belonging to the technical field of target detection in remote sensing images. The method solves the problem that existing methods detect small targets in remote sensing images poorly, because small targets offer little usable feature information and their regions undergo geometric deformation. The invention applies super-resolution processing to a remote sensing image containing small targets before performing target detection, which broadens the applicability of deep-learning target detection models and makes deeper use of remote sensing images of higher spatial resolution. To address the scarcity of usable feature information and the geometric deformation of small targets in remote sensing images, a super-resolution processing technique is used to enrich the detail features of small targets, and a region-based deformable convolutional network is applied to make full use of their limited feature information, improving the detection of small targets in remote sensing images. The method can be applied to small target detection in remote sensing images.

Description

Resolution enhancement-based remote sensing image small target detection method
Technical Field
The invention belongs to the technical field of target detection in remote sensing images, and particularly relates to a method for detecting a small target in a remote sensing image.
Background
The spatial resolution of an optical remote sensing image depends mainly on the satellite's instantaneous field of view and its distance from the earth's surface, so image quality is largely determined by satellite performance. With the continuing development of remote sensing technology, more and more remote sensing satellites, such as the WorldView series and China's GF (Gaofen) series, have been developed and launched, yielding remote sensing images of ever higher spatial resolution. Small targets in these images (such as small vehicles) now carry richer texture feature information, which makes it possible to address the small target detection problem in remote sensing images with deep learning methods.
Existing small target detection in remote sensing images faces two major difficulties:
First, although the spatial resolution of remote sensing images keeps increasing, a small target (such as a small vehicle) still occupies a very small pixel range and offers very little usable feature information; directly applying a detection algorithm designed for normally sized targets yields very unsatisfactory results.
Second, because the position and motion state of the monitoring satellite change continuously during imaging, small targets in optical remote sensing images undergo geometric deformation. Since the pixel information in a small target region is very limited, even slight geometric deformation strongly affects the detection result.
Disclosure of Invention
The invention aims to solve the problem that existing methods detect small targets in remote sensing images poorly, because such targets offer little usable feature information and their regions undergo geometric deformation, and provides a resolution enhancement-based method for detecting small targets in remote sensing images.
The technical scheme adopted by the invention to solve this problem is as follows. A resolution enhancement-based remote sensing image small target detection method comprises the following specific process:
Step 1: given an original remote sensing image X and the label vector Y corresponding to the small targets it contains, where a small target is a target whose pixel count lies in the range (0, 100):
up-sample and pre-segment the original remote sensing image X to obtain a group of equal-size images $X_1, X_2, \ldots, X_m$ and their corresponding label vectors $Y_1, Y_2, \ldots, Y_m$;
Step 2: generate the super-resolution images $S_1, S_2, \ldots, S_m$ corresponding to the images $X_1, X_2, \ldots, X_m$, and from the label vectors $Y_1, Y_2, \ldots, Y_m$ obtain the corresponding label vectors $K_1, K_2, \ldots, K_m$ of the super-resolution images;
Step 3: train a region-based deformable fully convolutional network with the super-resolution images $S_1, S_2, \ldots, S_m$ and their corresponding label vectors $K_1, K_2, \ldots, K_m$ until the set maximum number of iterations is reached, obtaining the trained region-based deformable fully convolutional network;
Step 4: for an original remote sensing image M to be detected, process M through steps 1 and 2 to obtain a group of equal-size super-resolution images $M_1, M_2, \ldots, M_m$;
input the super-resolution images $M_1, M_2, \ldots, M_m$ into the region-based deformable fully convolutional network trained in step 3 to obtain target detection results for the images $M_1, M_2, \ldots, M_m$;
Step 5: merge the target detection results obtained in step 4 to obtain the target detection result N of the original remote sensing image M.
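As an illustration of how the five steps fit together, the following Python sketch implements the pipeline end to end. The nearest-neighbour up-sampling, the tile size, and the `sr_model`/`detector` callables (standing in for the trained generator subnetwork of step 2 and the trained region-based deformable fully convolutional network of step 3) are assumptions of the sketch, not details fixed by the invention.

```python
import numpy as np

def upsample_and_tile(img, factor=2, size=512):
    # Step 1: nearest-neighbour up-sampling followed by pre-segmentation into
    # equal-size tiles; each tile's origin in the up-sampled image is stored
    # so that detections can be mapped back later.
    up = img.repeat(factor, axis=0).repeat(factor, axis=1)
    tiles, origins = [], []
    for oy in range(0, up.shape[0] - size + 1, size):
        for ox in range(0, up.shape[1] - size + 1, size):
            tiles.append(up[oy:oy + size, ox:ox + size])
            origins.append((ox, oy))
    return tiles, origins

def detect_small_targets(M, sr_model, detector, up=2, sr_scale=4, size=512):
    # Steps 1-5 of the method; sr_model and detector are placeholders for the
    # trained generator subnetwork and region-based deformable R-FCN.
    tiles, origins = upsample_and_tile(M, factor=up, size=size)
    results = []
    for tile, (ox, oy) in zip(tiles, origins):
        sr_tile = sr_model(tile)                       # step 2: resolution x4
        for (x, y, w, h, score) in detector(sr_tile):  # steps 3-4
            f = up * sr_scale                          # step 5: invert scaling
            results.append(((x + ox * sr_scale) / f, (y + oy * sr_scale) / f,
                            w / f, h / f, score))
    return results
```

Each box returned is expressed in the coordinates of the original image M, which is exactly the inversion of the preprocessing that step 5 requires.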
The invention has the following beneficial effects. The invention provides a resolution enhancement-based remote sensing image small target detection method. To address the scarcity of usable feature information and the geometric deformation of small targets in remote sensing images, super-resolution processing is used to enrich the detail features of small targets, and a region-based deformable convolutional network is applied to make full use of their limited feature information, improving both the detection capability and the detection results for small targets in remote sensing images.
To verify the performance of the proposed method, it was evaluated on DOTA datasets derived from the China Centre for Resources Satellite Data and Application, Google Earth, the JL-1 satellite, the GF-2 satellite, and other sources. The experimental results verify the effectiveness of the super-resolution-based small target detection algorithm. The experimental dataset was randomly divided into training, validation, and test sets in a 2:1:1 ratio, and the single-class detection accuracy reached about 80%.
Drawings
FIG. 1 is a schematic flow chart of an implementation of the present invention;
FIG. 2a is a schematic diagram showing an image from the DOTA dataset (containing small vehicles) and the corresponding small target detection box annotations after label processing (only small vehicles retained);
FIG. 2b is a schematic diagram of a set of equal-size image blocks obtained by up-sampling and pre-segmenting a remote sensing image;
FIG. 3a is a basic schematic diagram of the generative adversarial network;
FIG. 3b is a block diagram of the network structure of the generative model G (generator subnetwork) in FIG. 3a;
the network mainly adopts a block layout: its core consists of several identical residual blocks, and in the output stage two deconvolution operations with a stride of 0.5 raise the resolution of the output image;
n64 denotes the number of convolution kernel filters, i.e., the dimension of the output feature map, and s denotes the convolution stride;
FIG. 3c is a block diagram of the network structure of the discriminative model D (discriminator subnetwork) in FIG. 3a;
its backbone is a VGG structure; the network contains 8 convolutional layers in total, and in the output stage two fully connected layers map the feature map to a probability value, namely the confidence score with which the discriminator distinguishes the super-resolution image from the real image;
FIG. 3d shows the visual change of a remote sensing image before and after super-resolution processing, with the original remote sensing image on the left and the super-resolution remote sensing image on the right;
FIG. 4a illustrates a training network framework diagram of a region-based deformable convolutional network;
FIG. 4b shows a shape diagram of a normal convolution kernel;
FIG. 4c illustrates a shape diagram of a deformable convolution kernel;
the deformable sampling positions are obtained by adding displacements (arrows) to the regular sampling coordinates, showing that a deformable convolution kernel can fit severe deformation of the target;
FIG. 4d shows a special case of a deformable convolution as a scale transform;
FIG. 4e shows a special case of a deformable convolution as a rotational transformation;
FIG. 5 is a result diagram obtained by merging the detection results of the segmented image blocks; this inverts the preprocessing and finally yields the target detection result of the original remote sensing image.
Detailed Description
The first embodiment: this embodiment is described with reference to FIG. 1. The resolution enhancement-based remote sensing image small target detection method of this embodiment comprises the following specific process:
Step 1: given an original remote sensing image X to be trained on and the label vector Y corresponding to the small targets it contains, where a small target is a target whose pixel count lies in the range (0, 100):
up-sample and pre-segment the original remote sensing image X to obtain a group of equal-size images $X_1, X_2, \ldots, X_m$ and their corresponding label vectors $Y_1, Y_2, \ldots, Y_m$;
Step 2: generate the super-resolution images $S_1, S_2, \ldots, S_m$ corresponding to the images $X_1, X_2, \ldots, X_m$, and from the label vectors $Y_1, Y_2, \ldots, Y_m$ obtain the corresponding label vectors $K_1, K_2, \ldots, K_m$;
Step 3: train a region-based deformable fully convolutional network with the super-resolution images $S_1, S_2, \ldots, S_m$ and their corresponding label vectors $K_1, K_2, \ldots, K_m$ until the set maximum number of iterations is reached, obtaining the trained network;
the region-based deformable fully convolutional network comprises ResNet-101, a convolutional layer, an RPN, an RoI pooling layer, and a softmax classifier;
the super-resolution images $S_1, S_2, \ldots, S_m$ pass through ResNet-101, which extracts feature images, and the extracted feature images are dimension-reduced by the convolutional layer to obtain reduced feature images; meanwhile, the RPN outputs regions of interest (RoIs) for the super-resolution images $S_1, S_2, \ldots, S_m$; the RoIs are mapped into the reduced feature images, the RoI pooling layer pools the mapped images, the pooled results are averaged, and the averages are input to the softmax classifier to obtain the target classification result;
Step 4: for an original remote sensing image M to be detected, process M through steps 1 and 2 to obtain a group of equal-size super-resolution images $M_1, M_2, \ldots, M_m$;
input the super-resolution images $M_1, M_2, \ldots, M_m$ into the region-based deformable fully convolutional network trained in step 3 to obtain target detection results for the images $M_1, M_2, \ldots, M_m$;
Step 5: merge the target detection results obtained in step 4 to obtain the target detection result N of the original remote sensing image M;
the target detection results for the images $M_1, M_2, \ldots, M_m$ are thereby inverted back to detection results in the original image M.
The invention performs super-resolution processing on remote sensing images containing small targets (such as small vehicles) before target detection, which broadens the applicability of deep-learning target detection models and makes deeper use of remote sensing images of higher spatial resolution.
The second embodiment is as follows: this embodiment will be described with reference to fig. 2a and 2 b. The first difference between the present embodiment and the specific embodiment is: the specific process of the step one is as follows:
carrying out up-sampling (the adjustment range is small) processing on an original remote sensing image X to obtain an image subjected to up-sampling processing; pre-dividing the up-sampled image, i.e. dividing the up-sampled image into a group of images X with the same size1,X2,...,XmWhen image segmentation is performed, X is stored1,X2,...,XmPosition information in the up-sampled image, m representing the total number of the divided images;
when the image is divided, the label vector Y corresponding to the small target contained in the original remote sensing image X is also divided and is used as the image X1,X2,...,XmAssigning corresponding segmented label vectors Y1,Y2,…,Ym
Segmenting the label vector Y to obtain a segmented label vector Y1,Y2,…,Ym
Other steps and parameters are the same as those in the first embodiment.
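As a concrete illustration of the step 1 just described, the sketch below extends the tiling so that each label box follows its tile. It assumes labels are given as (x0, y0, x1, y1) box corners in the coordinates of X and that a box belongs to the tile containing its centre; both conventions are assumptions of the sketch rather than requirements of the embodiment.

```python
import numpy as np

def presegment_with_labels(X, labels, factor=2, size=512):
    # Up-sample X, cut it into equal-size tiles, remember tile positions,
    # and assign each (scaled) label box to the tile that contains it.
    up = X.repeat(factor, axis=0).repeat(factor, axis=1)
    tiles, positions, tile_labels = [], [], []
    for oy in range(0, up.shape[0] - size + 1, size):
        for ox in range(0, up.shape[1] - size + 1, size):
            tiles.append(up[oy:oy + size, ox:ox + size])
            positions.append((ox, oy))
            kept = []
            for (x0, y0, x1, y1) in labels:
                # Scale the box into the up-sampled frame; keep it if its
                # centre falls inside this tile, in tile-local coordinates.
                cx = factor * (x0 + x1) / 2
                cy = factor * (y0 + y1) / 2
                if ox <= cx < ox + size and oy <= cy < oy + size:
                    kept.append((factor * x0 - ox, factor * y0 - oy,
                                 factor * x1 - ox, factor * y1 - oy))
            tile_labels.append(kept)
    return tiles, positions, tile_labels
```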
The third embodiment: this embodiment is described with reference to FIGs. 3a, 3b, 3c, and 3d. It differs from the first or second embodiment in that the specific process of step 2 is as follows:

Step 2.1: establish a generator subnetwork $G_{\theta_G}$ and a discriminator subnetwork $D_{\theta_D}$, which together form a generative adversarial network;

Step 2.2: given a group of higher-resolution remote sensing images $I_n^{HR}$, $n = 1, \ldots, N$, down-sample the images $I_n^{HR}$ to obtain a corresponding group of lower-resolution remote sensing images $I_n^{LR}$, where N is the number of remote sensing images in each group;

here, "higher resolution" means that the resolution of $I_n^{HR}$ is higher than that of $I_n^{LR}$, and "lower resolution" means that the resolution of $I_n^{LR}$ is lower than that of $I_n^{HR}$.

Step 2.3: train the generator subnetwork $G_{\theta_G}$ with the remote sensing images $I_n^{LR}$; the problem the generator subnetwork needs to solve is described by equation (1):

$$\hat{\theta}_G = \arg\min_{\theta_G} \frac{1}{N} \sum_{n=1}^{N} l^{SR}\!\left(G_{\theta_G}\!\left(I_n^{LR}\right),\, I_n^{HR}\right) \quad (1)$$

where $\theta_G = \{W_{1:L}; b_{1:L}\}$ is the set of all weight and bias parameters of the generator subnetwork $G_{\theta_G}$, $G_{\theta_G}(I_n^{LR})$ is the reconstructed image output by the generator subnetwork when the remote sensing image $I_n^{LR}$ is input, and $l^{SR}$ is the loss function of the generator subnetwork.

The loss function $l^{SR}$ consists of the following three parts:

$$l^{SR} = \gamma_1\, l^{SR}_{content} + \gamma_2\, l^{SR}_{adv} + \gamma_3\, l^{SR}_{reg} \quad (2)$$

where $l^{SR}_{content}$ is the content loss function with weight parameter $\gamma_1$, $l^{SR}_{adv}$ is the adversarial loss function with weight parameter $\gamma_2$, and $l^{SR}_{reg}$ is the regularization loss function with weight parameter $\gamma_3$.

$$l^{SR}_{content} = \frac{1}{W_{i,j} H_{i,j}} \sum_{x=1}^{W_{i,j}} \sum_{y=1}^{H_{i,j}} \left( \phi_{i,j}\!\left(I_n^{HR}\right)_{x,y} - \phi_{i,j}\!\left(G_{\theta_G}\!\left(I_n^{LR}\right)\right)_{x,y} \right)^{2} \quad (3)$$

where $W_{i,j}$ and $H_{i,j}$ are the width and height of the feature map output by the j-th convolutional layer before the i-th max pooling layer of the discriminator subnetwork $D_{\theta_D}$ (in the invention i = 5 and j = 4, though in practice they may be chosen arbitrarily); $\phi_{i,j}(I_n^{HR})_{x,y}$ is the value of pixel point $(x, y)$ in that feature map when the discriminator input is $I_n^{HR}$, and $\phi_{i,j}(G_{\theta_G}(I_n^{LR}))_{x,y}$ is the value of pixel point $(x, y)$ in that feature map when the input is the reconstructed image $G_{\theta_G}(I_n^{LR})$; $x = 1, 2, \ldots, W_{i,j}$ and $y = 1, 2, \ldots, H_{i,j}$.

$$l^{SR}_{adv} = \sum_{n=1}^{N} -\log D_{\theta_D}\!\left(G_{\theta_G}\!\left(I_n^{LR}\right)\right) \quad (4)$$

where $D_{\theta_D}(G_{\theta_G}(I_n^{LR}))$ is the output of the discriminator subnetwork $D_{\theta_D}$ when the reconstructed image $G_{\theta_G}(I_n^{LR})$ is input.

$$l^{SR}_{reg} = \frac{1}{r^{2} W H} \sum_{x'=1}^{rW} \sum_{y'=1}^{rH} \left\| \nabla G_{\theta_G}\!\left(I_n^{LR}\right)_{x',y'} \right\|_{1} \quad (5)$$

where $\|\cdot\|_1$ denotes the 1-norm, r is the magnification factor of the reconstructed image $G_{\theta_G}(I_n^{LR})$, whose width and height are rW and rH, $\nabla G_{\theta_G}(I_n^{LR})$ is the pixel-wise gradient of the reconstructed image, and $(x', y')$ is a pixel point in the reconstructed image, with $x' = 1, 2, \ldots, rW$ and $y' = 1, 2, \ldots, rH$.

The remote sensing images $I_n^{HR}$ and the reconstructed images $G_{\theta_G}(I_n^{LR})$, $n = 1, \ldots, N$, are then used to train the discriminator subnetwork $D_{\theta_D}$; the problem the discriminator subnetwork needs to solve is described by equation (6):

$$\hat{\theta}_D = \arg\max_{\theta_D}\; \mathbb{E}\!\left[\log D_{\theta_D}\!\left(I_n^{HR}\right)\right] + \mathbb{E}\!\left[\log\!\left(1 - D_{\theta_D}\!\left(G_{\theta_G}\!\left(I_n^{LR}\right)\right)\right)\right] \quad (6)$$

where $D_{\theta_D}(I_n^{HR})$ is the probability value output by the discriminator subnetwork when the input is $I_n^{HR}$; $D_{\theta_D}(G_{\theta_G}(I_n^{LR}))$ is the probability value output when the input is the reconstructed image $G_{\theta_G}(I_n^{LR})$; $\mathbb{E}[\cdot]$ denotes expectation; and $\theta_D$ is the set of all weight and bias parameters of the discriminator subnetwork $D_{\theta_D}$.

Solving for the $\theta_G$ and $\theta_D$ that satisfy equations (1) and (6) yields a well-trained generative adversarial network.

After the generative adversarial network is trained, the images $X_1, X_2, \ldots, X_m$ can be input into it to obtain the super-resolution images $S_1, S_2, \ldots, S_m$; note that training the generative adversarial network requires the discriminator subnetwork, i.e., the two subnetworks must cooperate during training.

Step 2.4: input the images $X_1, X_2, \ldots, X_m$ into the trained generative adversarial network; the generator subnetwork then outputs the corresponding super-resolution images $S_1, S_2, \ldots, S_m$.

Using a MATLAB or Python program, the coordinates of the label vectors $Y_1, Y_2, \ldots, Y_m$ are multiplied by 4 to obtain the processed label vectors $K_1, K_2, \ldots, K_m$.

Multiplying the label vectors $Y_1, Y_2, \ldots, Y_m$ by 4 means the following: a label vector is essentially a vector consisting of the four vertex coordinates of a small target's bounding box, so multiplying the coordinates of $Y_1, Y_2, \ldots, Y_m$ by 4 means multiplying each element of each vector by 4, which yields the processed label vectors $K_1, K_2, \ldots, K_m$; the processed label vectors correspond one-to-one with the images $X_1, X_2, \ldots, X_m$. In fact, the resolution of the super-resolution images $S_1, S_2, \ldots, S_m$ can be set to any multiple of the resolution of the images $X_1, X_2, \ldots, X_m$; other multiples are obtained by modifying the parameters of the generative adversarial network, in which case the label vectors $Y_1, Y_2, \ldots, Y_m$ are multiplied by the corresponding factor. In the invention, the resolution of the super-resolution images $S_1, S_2, \ldots, S_m$ is 4 times that of the images $X_1, X_2, \ldots, X_m$.
Other steps and parameters are the same as those in the first or second embodiment.
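A compact PyTorch sketch of the objectives in equations (1) through (6) may help fix the notation. Here `feat` stands for the discriminator feature map $\phi_{i,j}$ (i = 5, j = 4 in the patent), the expectations of equation (6) are approximated by batch means, and the γ weights are illustrative defaults rather than values disclosed by the patent.

```python
import torch

def generator_loss(sr, hr, disc, feat, g1=1.0, g2=1e-3, g3=2e-8):
    # Composite loss l_SR of equations (2)-(5). `sr` is a batch of
    # reconstructed images G(I_LR) in NCHW layout, `hr` the matching batch
    # of I_HR images, and `disc` the discriminator subnetwork.
    # Equation (3): content loss, MSE between feature maps.
    l_content = torch.mean((feat(hr) - feat(sr)) ** 2)
    # Equation (4): adversarial loss, -log D(G(I_LR)).
    l_adv = torch.mean(-torch.log(disc(sr) + 1e-8))
    # Equation (5): total-variation regularization (1-norm of gradients).
    l_reg = (torch.abs(sr[:, :, 1:, :] - sr[:, :, :-1, :]).mean()
             + torch.abs(sr[:, :, :, 1:] - sr[:, :, :, :-1]).mean())
    return g1 * l_content + g2 * l_adv + g3 * l_reg

def discriminator_loss(disc, hr, sr):
    # Equation (6) written as a loss to minimize: the negative of the
    # objective max E[log D(I_HR)] + E[log(1 - D(G(I_LR)))].
    return -(torch.log(disc(hr) + 1e-8).mean()
             + torch.log(1 - disc(sr.detach()) + 1e-8).mean())
```

In practice the two losses are minimized alternately, which is the cooperation between the two subnetworks mentioned above.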
The generator subnetwork $G_{\theta_G}$ mainly adopts a block layout, as shown in FIG. 3b. The core of the network consists of several identical residual blocks, each comprising a convolutional layer, a batch normalization (BN) layer, and a rectified linear unit (ReLU) layer. Specifically, each convolutional layer uses 3 × 3 kernels, each residual block outputs 64-dimensional feature maps, and padding keeps the feature map resolution unchanged during convolution.
The discriminator subnetwork $D_{\theta_D}$ is shown in FIG. 3c. Its backbone is a VGG structure: each convolutional layer ends with a LeakyReLU activation, and a BN layer normalizes the feature map before it is output. The network contains 8 convolutional layers in total; as the network deepens, the convolution stride and the number of filters keep increasing, so the feature map resolution keeps decreasing while its dimension keeps increasing, and the last convolutional layer outputs a 512-dimensional low-resolution feature map. In the output stage, two fully connected layers map the feature map to a probability value, namely the confidence score with which the discriminator distinguishes the super-resolution image from the real image.
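A minimal PyTorch sketch of one such residual block follows (one 3 × 3 convolution, one BN layer, one ReLU, 64 channels, padding preserving resolution). The placement of the skip connection relative to the activation is an assumption of the sketch, since FIG. 3b is not reproduced here.

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    # One residual block of the generator subnetwork: conv(3x3) -> BN -> ReLU
    # on 64-channel feature maps, padding=1 so the resolution is unchanged,
    # with an identity skip connection around the block.
    def __init__(self, channels=64):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return x + self.relu(self.bn(self.conv(x)))
```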
The fourth embodiment: this embodiment is described with reference to FIG. 4a. It differs from the first to third embodiments in that the specific process of step 3 is as follows:
Step 3.1: input the super-resolution images $S_1, S_2, \ldots, S_m$ and their corresponding label vectors $K_1, K_2, \ldots, K_m$ into the region-based deformable fully convolutional network; use the ResNet-101 in this network to extract the feature images of $S_1, S_2, \ldots, S_m$, and use the RPN to output the RoIs of $S_1, S_2, \ldots, S_m$;
following the R-FCN approach, the invention adopts ResNet-101 as the feature extraction network. ResNet-101 consists of 100 convolutional layers followed by a global average pooling layer and a 1000-class fully connected layer; the invention removes the final average pooling and fully connected layers, computes the feature maps with the convolutional layers only, and applies transfer learning: ResNet-101 is first pre-trained on ImageNet to obtain a trained classification network, the classification layer and loss computation are removed so that only the feature extraction part remains, and a randomly initialized 1 × 1 convolutional layer with 1024 filters is appended to reduce the dimension of the 2048-dimensional output.
Step 3.2: map each region of interest (RoI) into the feature images of the super-resolution images $S_1, S_2, \ldots, S_m$ to obtain the mapped feature images;
Step 3.3: pool the mapped feature images to obtain pooled results, then average the pooled results to obtain the target classification result;
specifically, for the $(i, j)$-th bin ($0 \le i, j \le k - 1$), a position-sensitive RoI pooling operation is defined that pools only over the $(i, j)$-th bin:

$$r_c(i, j \mid \Theta) = \frac{1}{n} \sum_{(x, y) \in \mathrm{bin}(i, j)} z_{i, j, c}\!\left(x + x_0,\, y + y_0 \mid \Theta\right)$$

where $r_c(i, j \mid \Theta)$ is the pooled value for the c-th category in the $(i, j)$-th bin, $z_{i,j,c}$ is the score map for the c-th category, $(x_0, y_0)$ is the top-left element of the RoI region, n is the number of elements in the bin, and $\Theta$ denotes all learnable parameters of the network. The $(i, j)$-th bin occupies the region

$$\left\lfloor i \frac{w}{k} \right\rfloor \le x < \left\lceil (i + 1) \frac{w}{k} \right\rceil \quad \text{and} \quad \left\lfloor j \frac{h}{k} \right\rfloor \le y < \left\lceil (j + 1) \frac{h}{k} \right\rceil$$

where w and h are the width and height of each RoI region, which is divided into $k^2$ bins.
Next, the pooled results are averaged to obtain the final classification result: after averaging, each RoI yields a $(C + 1)$-dimensional vector $r_c(\Theta) = \sum_{i,j} r_c(i, j \mid \Theta)$. Its softmax response is then computed:

$$s_c(\Theta) = \frac{e^{r_c(\Theta)}}{\sum_{c'=0}^{C} e^{r_{c'}(\Theta)}}$$

Step 3.4: while extracting the feature images of the super-resolution images $S_1, S_2, \ldots, S_m$, the coordinates of the target bounding boxes are learned at the same time; when the mapped feature images are pooled, a position vector is also generated for each RoI region, and by averaging, the bounding box coordinates and position vectors are aggregated into a 4-dimensional vector $t = (t_x, t_y, t_w, t_h)$, where $t_x$ is the top-left x coordinate of the target detection box, $t_y$ its top-left y coordinate, $t_w$ its width, and $t_h$ its height.
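The position-sensitive pooling of step 3.3 can be made concrete with a short NumPy sketch. The channel ordering of the score maps, the integer RoI coordinates, and the assumption w, h ≥ k (so no bin is empty) are conventions of the sketch, not of the patent.

```python
import numpy as np

def ps_roi_pool(score_maps, roi, k=3, num_classes=1):
    # Position-sensitive RoI pooling: `score_maps` has shape
    # (k*k*(C+1), H, W), one score map z_{i,j,c} per (bin, class) pair;
    # `roi` = (x0, y0, w, h) in integer feature-map coordinates.
    # Returns r_c(Theta) and its softmax response s_c(Theta).
    x0, y0, w, h = roi
    C1 = num_classes + 1
    r = np.zeros(C1)
    for i in range(k):
        for j in range(k):
            xs = slice(x0 + int(np.floor(i * w / k)),
                       x0 + int(np.ceil((i + 1) * w / k)))
            ys = slice(y0 + int(np.floor(j * h / k)),
                       y0 + int(np.ceil((j + 1) * h / k)))
            for c in range(C1):
                z = score_maps[(i * k + j) * C1 + c]  # map for bin (i,j), class c
                r[c] += z[ys, xs].mean()              # pool only inside bin (i,j)
    s = np.exp(r - r.max())
    return r, s / s.sum()
```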
The fifth embodiment: this embodiment is described with reference to FIGs. 4b to 4e. It differs from the first to fourth embodiments in that the convolutions in the region-based deformable fully convolutional network are deformable convolutions. A deformable convolution adds an offset $\Delta p_n$ to each convolution grid point; the modified formula is expressed as follows:

$$y(p_0) = \sum_{p_n \in \mathcal{R}} w(p_n)\, x\!\left(p_0 + p_n + \Delta p_n\right)$$

where $p_0$ is the top-left point of the convolution receptive field, $p_n$ is the relative offset of the other points in the receptive field with respect to the top-left corner, $\Delta p_n$ is the offset learned during convolution, $w(p_n)$ is the convolution kernel weight, $\mathcal{R}$ is the sampling grid, i.e., the convolution receptive field, $y(p_0)$ is the convolution output, and $x(p_0 + p_n + \Delta p_n)$ is the pixel value at point $p_0 + p_n + \Delta p_n$ of the input feature image after the offset is introduced.
Other steps and parameters are the same as in one of the first to fourth embodiments.
The invention replaces the conventional convolution structure with a deformable convolution structure. The conventional convolution structure is defined as follows, where $p_n$ is the offset of each point of the receptive field relative to $p_0$:

$$y(p_0) = \sum_{p_n \in \mathcal{R}} w(p_n)\, x\!\left(p_0 + p_n\right)$$

where $p_0$ is the top-left point of the convolution receptive field and $\mathcal{R}$ is the convolution sampling grid.
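The sketch below evaluates the deformable convolution at a single output location so the roles of $p_0$, $p_n$, and $\Delta p_n$ are explicit; nearest-neighbour sampling is an assumption made for brevity (the real operator interpolates bilinearly so the offsets stay differentiable).

```python
import numpy as np

def deformable_conv_point(x, w, p0, offsets):
    # Output y(p0) of a 3x3 deformable convolution at one location: each
    # grid point p_n is displaced by its learned offset before sampling.
    # `x` is the input feature map, `w` a (3, 3) kernel, and `offsets` a
    # (3, 3, 2) array of learned (dy, dx) displacements.
    y = 0.0
    for dy, dx in np.ndindex(3, 3):                  # p_n over the grid R
        off_y, off_x = offsets[dy, dx]
        py = int(round(p0[0] + dy + off_y))          # p0 + p_n + dp_n
        px = int(round(p0[1] + dx + off_x))
        if 0 <= py < x.shape[0] and 0 <= px < x.shape[1]:
            y += w[dy, dx] * x[py, px]               # w(p_n) * x(...)
    return y
```

Setting all offsets to zero recovers the conventional convolution defined above.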
The sixth embodiment: this embodiment is described with reference to FIG. 5. It differs from the first to fifth embodiments in that the specific process of step 5 is as follows:
rescale the target detection results for the images $M_1, M_2, \ldots, M_m$, then merge the rescaled results according to the position information of $M_1, M_2, \ldots, M_m$ in the original remote sensing image M to obtain the target detection result N.
Other steps and parameters are the same as those in one of the first to fifth embodiments.
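A short sketch of this inversion follows, under the same tiling conventions as the earlier sketches (2× pre-segmentation up-sampling and 4× super-resolution, both adjustable assumptions):

```python
def merge_tile_detections(tile_results, positions, up=2, sr_scale=4):
    # Step 5: invert the preprocessing. `tile_results[i]` holds
    # (x, y, w, h, score) boxes in the coordinates of super-resolution
    # tile M_i; `positions[i]` is that tile's origin in the up-sampled
    # image. All boxes are mapped back to the original image M and pooled.
    merged = []
    for boxes, (ox, oy) in zip(tile_results, positions):
        for (x, y, w, h, score) in boxes:
            f = sr_scale * up
            merged.append(((x / sr_scale + ox) / up,
                           (y / sr_scale + oy) / up,
                           w / f, h / f, score))
    return merged  # a cross-tile NMS pass could follow to remove seam duplicates
```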
The following examples were used to demonstrate the beneficial effects of the present invention:
Example 1:
The resolution enhancement-based remote sensing image small target detection method is carried out according to the following steps:
the experimental data are DOTA datasets of different resolutions derived from publicly available satellite imagery. FIG. 2a shows an image from the DOTA dataset (containing small vehicles) and the corresponding small target detection box annotations after label processing (only small vehicles retained); FIG. 2b shows a group of equal-size image blocks obtained by up-sampling and pre-segmenting a remote sensing image, with the position information in the original image saved at cutting time;
Table 1 shows the single-class detection accuracy (AP) obtained by training and testing three target detection models under different training-sample conditions (remote sensing image datasets before and after super-resolution).
TABLE 1. Single-class detection accuracy (unit: %)
[Table 1 appears as an image in the original document; its values are not recoverable here.]
FIG. 5 shows the result of merging the detection results of the segmented image blocks; this inverts the preprocessing and finally yields the target detection result of the original remote sensing image. Table 2 shows the experimental results of training and testing six target detection models on the super-resolved remote sensing image dataset; the last row is the target detection algorithm used in the invention, reported on three indexes: single-class detection accuracy (AP), training time (train_time), and detection time (test_time). By comparison, the proposed algorithm performs best on all indexes, and its single-class detection accuracy on the DOTA dataset reaches about 80%.
TABLE 2
[Table 2 appears as an image in the original document; its values are not recoverable here.]
The present invention is capable of other embodiments and its several details are capable of modifications in various obvious respects, all without departing from the spirit and scope of the present invention.

Claims (6)

1. A resolution enhancement-based remote sensing image small target detection method, characterized by comprising the following specific steps:
Step 1: given an original remote sensing image X and the label vector Y corresponding to the small targets it contains, where a small target is a target whose pixel count lies in the range (0, 100):
up-sample and pre-segment the original remote sensing image X to obtain a group of equal-size images $X_1, X_2, \ldots, X_m$ and their corresponding label vectors $Y_1, Y_2, \ldots, Y_m$;
Step 2: generate the super-resolution images $S_1, S_2, \ldots, S_m$ corresponding to the images $X_1, X_2, \ldots, X_m$, and from the label vectors $Y_1, Y_2, \ldots, Y_m$ obtain the corresponding label vectors $K_1, K_2, \ldots, K_m$ of the super-resolution images;
Step 3: train a region-based deformable fully convolutional network with the super-resolution images $S_1, S_2, \ldots, S_m$ and their corresponding label vectors $K_1, K_2, \ldots, K_m$ until the set maximum number of iterations is reached, obtaining the trained region-based deformable fully convolutional network;
Step 4: for an original remote sensing image M to be detected, process M through steps 1 and 2 to obtain a group of equal-size super-resolution images $M_1, M_2, \ldots, M_m$;
input the super-resolution images $M_1, M_2, \ldots, M_m$ into the region-based deformable fully convolutional network trained in step 3 to obtain target detection results for the images $M_1, M_2, \ldots, M_m$;
Step 5: merge the target detection results obtained in step 4 to obtain the target detection result N of the original remote sensing image M.
2. The resolution enhancement-based remote sensing image small target detection method according to claim 1, characterized in that the specific process of step 1 is as follows:
up-sample the original remote sensing image X to obtain an up-sampled image; pre-segment the up-sampled image, i.e., divide it into a group of equal-size images $X_1, X_2, \ldots, X_m$, storing the position of each of $X_1, X_2, \ldots, X_m$ in the up-sampled image, where m is the total number of segmented images;
when the image is segmented, segment the label vector Y corresponding to the small targets in the original remote sensing image X as well, and assign the segmented label vectors $Y_1, Y_2, \ldots, Y_m$ to the images $X_1, X_2, \ldots, X_m$.
3. The resolution enhancement-based remote sensing image small target detection method according to claim 2, characterized in that the specific process of step 2 is as follows:
Step 2.1: establish a generator subnetwork $G_{\theta_G}$ and a discriminator subnetwork $D_{\theta_D}$, which together form a generative adversarial network;
Step 2.2: given a group of higher-resolution remote sensing images $I_n^{HR}$, down-sample the remote sensing images $I_n^{HR}$ to obtain a corresponding group of lower-resolution remote sensing images $I_n^{LR}$, where N is the number of remote sensing images in each group;
Step 2.3: train the generator subnetwork $G_{\theta_G}$ with the remote sensing images $I_n^{LR}$; the problem the generator subnetwork needs to solve is described by equation (1):

$$\hat{\theta}_G = \arg\min_{\theta_G} \frac{1}{N} \sum_{n=1}^{N} l^{SR}\!\left(G_{\theta_G}\!\left(I_n^{LR}\right),\, I_n^{HR}\right) \quad (1)$$

where $\theta_G$ is the set of all weight and bias parameters of the generator subnetwork $G_{\theta_G}$, $G_{\theta_G}(I_n^{LR})$ is the reconstructed image output by the generator subnetwork when the remote sensing image $I_n^{LR}$ is input, and $l^{SR}$ is the loss function of the generator subnetwork;
the loss function $l^{SR}$ consists of the following three parts:

$$l^{SR} = \gamma_1\, l^{SR}_{content} + \gamma_2\, l^{SR}_{adv} + \gamma_3\, l^{SR}_{reg} \quad (2)$$

where $l^{SR}_{content}$ is the content loss function with weight parameter $\gamma_1$, $l^{SR}_{adv}$ is the adversarial loss function with weight parameter $\gamma_2$, and $l^{SR}_{reg}$ is the regularization loss function with weight parameter $\gamma_3$;

$$l^{SR}_{content} = \frac{1}{W_{i,j} H_{i,j}} \sum_{x=1}^{W_{i,j}} \sum_{y=1}^{H_{i,j}} \left( \phi_{i,j}\!\left(I_n^{HR}\right)_{x,y} - \phi_{i,j}\!\left(G_{\theta_G}\!\left(I_n^{LR}\right)\right)_{x,y} \right)^{2} \quad (3)$$

where $W_{i,j}$ and $H_{i,j}$ are the width and height of the feature map output by the j-th convolutional layer before the i-th max pooling layer of the discriminator subnetwork $D_{\theta_D}$; $\phi_{i,j}(I_n^{HR})_{x,y}$ is the value of pixel point $(x, y)$ in that feature map when the discriminator input is $I_n^{HR}$, and $\phi_{i,j}(G_{\theta_G}(I_n^{LR}))_{x,y}$ is the value of pixel point $(x, y)$ in that feature map when the input is the reconstructed image $G_{\theta_G}(I_n^{LR})$; $x = 1, 2, \ldots, W_{i,j}$ and $y = 1, 2, \ldots, H_{i,j}$;

$$l^{SR}_{adv} = \sum_{n=1}^{N} -\log D_{\theta_D}\!\left(G_{\theta_G}\!\left(I_n^{LR}\right)\right) \quad (4)$$

where $D_{\theta_D}(G_{\theta_G}(I_n^{LR}))$ is the output of the discriminator subnetwork $D_{\theta_D}$ when the reconstructed image $G_{\theta_G}(I_n^{LR})$ is input;

$$l^{SR}_{reg} = \frac{1}{r^{2} W H} \sum_{x'=1}^{rW} \sum_{y'=1}^{rH} \left\| \nabla G_{\theta_G}\!\left(I_n^{LR}\right)_{x',y'} \right\|_{1} \quad (5)$$

where $\|\cdot\|_1$ denotes the 1-norm, r is the magnification factor of the reconstructed image $G_{\theta_G}(I_n^{LR})$, whose width and height are rW and rH, $\nabla G_{\theta_G}(I_n^{LR})$ is the pixel-wise gradient of the reconstructed image, and $(x', y')$ is a pixel point in the reconstructed image, with $x' = 1, 2, \ldots, rW$ and $y' = 1, 2, \ldots, rH$;
use the remote sensing images $I_n^{HR}$ and the reconstructed images $G_{\theta_G}(I_n^{LR})$ to train the discriminator subnetwork $D_{\theta_D}$; the problem the discriminator subnetwork needs to solve is described by equation (6):

$$\hat{\theta}_D = \arg\max_{\theta_D}\; \mathbb{E}\!\left[\log D_{\theta_D}\!\left(I_n^{HR}\right)\right] + \mathbb{E}\!\left[\log\!\left(1 - D_{\theta_D}\!\left(G_{\theta_G}\!\left(I_n^{LR}\right)\right)\right)\right] \quad (6)$$

where $D_{\theta_D}(I_n^{HR})$ is the probability value output by the discriminator subnetwork when the input is $I_n^{HR}$; $D_{\theta_D}(G_{\theta_G}(I_n^{LR}))$ is the probability value output when the input is the reconstructed image $G_{\theta_G}(I_n^{LR})$; $\mathbb{E}[\cdot]$ denotes expectation; and $\theta_D$ is the set of all weight and bias parameters of the discriminator subnetwork $D_{\theta_D}$;
solving for the $\theta_G$ and $\theta_D$ that satisfy equations (1) and (6) yields a well-trained generative adversarial network;
Step 2.4: input the images $X_1, X_2, \ldots, X_m$ into the trained generative adversarial network; the generator subnetwork then outputs the corresponding super-resolution images $S_1, S_2, \ldots, S_m$;
using a MATLAB or Python program, multiply the coordinates of the label vectors $Y_1, Y_2, \ldots, Y_m$ by 4 to obtain the processed label vectors $K_1, K_2, \ldots, K_m$.
4. The resolution enhancement-based remote sensing image small target detection method is characterized in that the specific process of step 3 is as follows:
Step 3.1: input the super-resolution images $S_1, S_2, \ldots, S_m$ and their corresponding label vectors $K_1, K_2, \ldots, K_m$ into the region-based deformable fully convolutional network; use the ResNet-101 in this network to extract the feature images of $S_1, S_2, \ldots, S_m$, and use the RPN to output the RoIs of $S_1, S_2, \ldots, S_m$;
Step 3.2: map each region of interest (RoI) into the feature images of the super-resolution images $S_1, S_2, \ldots, S_m$ to obtain the mapped feature images;
Step 3.3: pool the mapped feature images to obtain pooled results, then average the pooled results to obtain the target classification result;
Step 3.4: while extracting the feature images of the super-resolution images $S_1, S_2, \ldots, S_m$, learn the coordinates of the target bounding boxes at the same time; when the mapped feature images are pooled, also generate a position vector for each RoI region, and by averaging, aggregate the bounding box coordinates and position vectors into a 4-dimensional vector $t = (t_x, t_y, t_w, t_h)$, where $t_x$ is the top-left x coordinate of the target detection box, $t_y$ its top-left y coordinate, $t_w$ its width, and $t_h$ its height.
5. The resolution enhancement-based remote sensing image small target detection method according to claim 4, characterized in that the convolutions in the region-based deformable fully convolutional network are deformable convolutions; a deformable convolution adds an offset $\Delta p_n$ to each convolution grid point, and the modified formula is expressed as follows:

$$y(p_0) = \sum_{p_n \in \mathcal{R}} w(p_n)\, x\!\left(p_0 + p_n + \Delta p_n\right)$$

where $p_0$ is the top-left point of the convolution receptive field, $p_n$ is the relative offset of the other points in the receptive field with respect to the top-left corner, $\Delta p_n$ is the offset learned during convolution, $w(p_n)$ is the weight, $\mathcal{R}$ is the convolution receptive field, $y(p_0)$ is the convolution output, and $x(p_0 + p_n + \Delta p_n)$ is the pixel value at point $p_0 + p_n + \Delta p_n$ of the input feature image after the offset is introduced.
6. The resolution enhancement-based remote sensing image small target detection method according to claim 5, characterized in that the specific process of step 5 is as follows:
rescale the target detection results for the images $M_1, M_2, \ldots, M_m$, then merge the rescaled results according to the position information of $M_1, M_2, \ldots, M_m$ in the original remote sensing image M to obtain the target detection result N.
CN202010444356.XA 2020-05-22 2020-05-22 Resolution enhancement-based remote sensing image small target detection method Active CN111709307B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010444356.XA CN111709307B (en) 2020-05-22 2020-05-22 Resolution enhancement-based remote sensing image small target detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010444356.XA CN111709307B (en) 2020-05-22 2020-05-22 Resolution enhancement-based remote sensing image small target detection method

Publications (2)

Publication Number Publication Date
CN111709307A (en) 2020-09-25
CN111709307B CN111709307B (en) 2022-08-30

Family

ID=72537713

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010444356.XA Active CN111709307B (en) 2020-05-22 2020-05-22 Resolution enhancement-based remote sensing image small target detection method

Country Status (1)

Country Link
CN (1) CN111709307B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113420745A (en) * 2021-08-25 2021-09-21 江西中业智能科技有限公司 Image-based target identification method, system, storage medium and terminal equipment
CN114663671A (en) * 2022-02-21 2022-06-24 佳都科技集团股份有限公司 Target detection method, device, equipment and storage medium
CN115953453A (en) * 2023-03-03 2023-04-11 国网吉林省电力有限公司信息通信公司 Transformer substation geological deformation monitoring method based on image dislocation analysis and Beidou satellite
CN115984846A (en) * 2023-02-06 2023-04-18 山东省人工智能研究院 Intelligent identification method for small target in high-resolution image based on deep learning

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170132472A1 (en) * 2015-11-05 2017-05-11 Qualcomm Incorporated Generic mapping for tracking target object in video sequence
CN108427920A (en) * 2018-02-26 2018-08-21 杭州电子科技大学 A kind of land and sea border defense object detection method based on deep learning
CN108510467A (en) * 2018-03-28 2018-09-07 西安电子科技大学 SAR image target recognition method based on variable depth shape convolutional neural networks
CN108596101A (en) * 2018-04-25 2018-09-28 上海交通大学 A kind of remote sensing images multi-target detection method based on convolutional neural networks
CN109299688A (en) * 2018-09-19 2019-02-01 厦门大学 Ship Detection based on deformable fast convolution neural network
CN110197255A (en) * 2019-04-29 2019-09-03 杰创智能科技股份有限公司 A kind of deformable convolutional network based on deep learning
CN110458166A (en) * 2019-08-19 2019-11-15 广东工业大学 A kind of hazardous material detection method, device and equipment based on deformable convolution
CN110728658A (en) * 2019-09-16 2020-01-24 武汉大学 High-resolution remote sensing image weak target detection method based on deep learning
CN111126385A (en) * 2019-12-13 2020-05-08 哈尔滨工程大学 Deep learning intelligent identification method for deformable living body small target

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170132472A1 (en) * 2015-11-05 2017-05-11 Qualcomm Incorporated Generic mapping for tracking target object in video sequence
CN108427920A (en) * 2018-02-26 2018-08-21 杭州电子科技大学 A kind of land and sea border defense object detection method based on deep learning
CN108510467A (en) * 2018-03-28 2018-09-07 西安电子科技大学 SAR image target recognition method based on variable depth shape convolutional neural networks
CN108596101A (en) * 2018-04-25 2018-09-28 上海交通大学 A kind of remote sensing images multi-target detection method based on convolutional neural networks
CN109299688A (en) * 2018-09-19 2019-02-01 厦门大学 Ship Detection based on deformable fast convolution neural network
CN110197255A (en) * 2019-04-29 2019-09-03 杰创智能科技股份有限公司 A kind of deformable convolutional network based on deep learning
CN110458166A (en) * 2019-08-19 2019-11-15 广东工业大学 A kind of hazardous material detection method, device and equipment based on deformable convolution
CN110728658A (en) * 2019-09-16 2020-01-24 武汉大学 High-resolution remote sensing image weak target detection method based on deep learning
CN111126385A (en) * 2019-12-13 2020-05-08 哈尔滨工程大学 Deep learning intelligent identification method for deformable living body small target

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
GAO Xin et al.: "Vehicle detection method for dense regions of remote sensing images based on deformable convolutional neural networks" (基于可变形卷积神经网络的遥感影像密集区域车辆检测方法), Journal of Electronics & Information Technology (电子与信息学报) *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113420745A (en) * 2021-08-25 2021-09-21 江西中业智能科技有限公司 Image-based target identification method, system, storage medium and terminal equipment
CN114663671A (en) * 2022-02-21 2022-06-24 佳都科技集团股份有限公司 Target detection method, device, equipment and storage medium
CN114663671B (en) * 2022-02-21 2023-07-18 佳都科技集团股份有限公司 Target detection method, device, equipment and storage medium
CN115984846A (en) * 2023-02-06 2023-04-18 山东省人工智能研究院 Intelligent identification method for small target in high-resolution image based on deep learning
CN115984846B (en) * 2023-02-06 2023-10-10 山东省人工智能研究院 Intelligent recognition method for small targets in high-resolution image based on deep learning
CN115953453A (en) * 2023-03-03 2023-04-11 国网吉林省电力有限公司信息通信公司 Transformer substation geological deformation monitoring method based on image dislocation analysis and Beidou satellite
CN115953453B (en) * 2023-03-03 2023-08-15 国网吉林省电力有限公司信息通信公司 Substation geological deformation monitoring method based on image dislocation analysis and Beidou satellite

Also Published As

Publication number Publication date
CN111709307B (en) 2022-08-30

Similar Documents

Publication Publication Date Title
CN111709307B (en) Resolution enhancement-based remote sensing image small target detection method
CN107330439B (en) Method for determining posture of object in image, client and server
CN111524135B (en) Method and system for detecting defects of tiny hardware fittings of power transmission line based on image enhancement
Zhou et al. Scale adaptive image cropping for UAV object detection
CN110517306B (en) Binocular depth vision estimation method and system based on deep learning
CN111582339B (en) Vehicle detection and recognition method based on deep learning
CN111368769A (en) Ship multi-target detection method based on improved anchor point frame generation model
CN107516322A (en) A kind of image object size based on logarithm pole space and rotation estimation computational methods
CN109712149B (en) Image segmentation method based on wavelet energy and fuzzy C-means
CN111310609B (en) Video target detection method based on time sequence information and local feature similarity
CN112418165B (en) Small-size target detection method and device based on improved cascade neural network
CN113516693B (en) Rapid and universal image registration method
CN110633640A (en) Method for identifying complex scene by optimizing PointNet
CN113052057A (en) Traffic sign identification method based on improved convolutional neural network
CN112883971A (en) SAR image ship target detection method based on deep learning
CN114299405A (en) Unmanned aerial vehicle image real-time target detection method
CN114677479A (en) Natural landscape multi-view three-dimensional reconstruction method based on deep learning
CN114519819B (en) Remote sensing image target detection method based on global context awareness
Gao et al. Bayesian image super-resolution with deep modeling of image statistics
CN114782417A (en) Real-time detection method for digital twin characteristics of fan based on edge enhanced image segmentation
CN113657225B (en) Target detection method
CN111292308A (en) Convolutional neural network-based infrared defect detection method for photovoltaic solar panel
CN114119621A (en) SAR remote sensing image water area segmentation method based on depth coding and decoding fusion network
CN117011648A (en) Haptic image dataset expansion method and device based on single real sample
CN117392545A (en) SAR image target detection method based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant