CN111709307B - Resolution enhancement-based remote sensing image small target detection method

Resolution enhancement-based remote sensing image small target detection method

Info

Publication number
CN111709307B
CN111709307B
Authority
CN
China
Prior art keywords
image
remote sensing
resolution
sensing image
super
Prior art date
Legal status
Active
Application number
CN202010444356.XA
Other languages
Chinese (zh)
Other versions
CN111709307A
Inventor
谷延锋 (Gu Yanfeng)
叶树嘉 (Ye Shujia)
高国明 (Gao Guoming)
Current Assignee
Harbin Institute of Technology
Original Assignee
Harbin Institute of Technology
Priority date: 2020-05-22
Filing date: 2020-05-22
Publication date: 2022-08-30
Application filed by Harbin Institute of Technology
Priority to CN202010444356.XA
Publication of CN111709307A
Application granted
Publication of CN111709307B

Classifications

    • G06V 20/13 (Scenes; Terrestrial scenes; Satellite images)
    • G06F 18/2411 (Pattern recognition; Classification techniques based on the proximity to a decision surface, e.g. support vector machines)
    • G06N 3/045 (Neural networks; Architecture; Combinations of networks)
    • G06N 3/08 (Neural networks; Learning methods)

Abstract

A resolution enhancement-based method for detecting small targets in remote sensing images, belonging to the technical field of target detection in remote sensing images. The method addresses the poor small-target detection performance of existing methods, which stems from the scarce usable feature information of small targets in remote sensing images and the geometric deformation of small-target regions. The invention applies super-resolution processing to remote sensing images containing small targets before performing target detection, which broadens the applicable range of deep-learning target detection models and makes deeper use of remote sensing images of higher spatial resolution. To counter the scarce usable feature information and geometric deformation of small targets in remote sensing images, the method enriches the detailed feature information of small targets through super-resolution processing, fully exploits their limited feature information with a region-based deformable convolutional network, and thereby improves the detection of small targets in remote sensing images. The method can be applied to small target detection in remote sensing images.

Description

Resolution enhancement-based remote sensing image small target detection method
Technical Field
The invention belongs to the technical field of target detection in remote sensing images, and particularly relates to a method for detecting a small target in a remote sensing image.
Background
The spatial resolution of an optical remote sensing image depends mainly on the satellite's instantaneous field of view and its distance from the Earth's surface, so image quality is largely determined by satellite performance. With the continued development of remote sensing technology, more and more remote sensing satellites, such as the WorldView series and China's GF series, have been developed and launched, yielding remote sensing images of ever higher spatial resolution. Small targets in these images (such as small vehicles) now carry richer texture feature information, which makes it feasible to attack the small-target detection problem in remote sensing images with deep learning methods.
Existing small target detection in remote sensing images faces two main difficulties:
first, although the spatial resolution of remote sensing images keeps increasing, a small target (such as a small vehicle) still occupies very few pixels and offers very little usable feature information, so directly applying detection algorithms designed for normal-sized targets yields very unsatisfactory results;
second, because the position and motion state of the imaging satellite change continuously during acquisition, small targets in optical remote sensing images undergo geometric deformation. Since the pixel information within a small-target region is very limited, even slight geometric deformation can strongly affect the detection result.
Disclosure of Invention
The invention aims to solve the poor small-target detection performance of existing methods, caused by the scarce usable feature information of small targets in remote sensing images and the geometric deformation of small-target regions, and provides a resolution enhancement-based remote sensing image small target detection method.
The technical scheme adopted by the invention to solve this technical problem is as follows. The resolution enhancement-based remote sensing image small target detection method comprises the following specific process:
Step 1: an original remote sensing image X is given, together with the label vector Y corresponding to the small targets it contains, a small target being a target whose pixel count lies in the range (0, 100);
the original remote sensing image X is up-sampled and pre-segmented to obtain a set of equal-size images X_1, X_2, ..., X_m and their corresponding label vectors Y_1, Y_2, ..., Y_m;
Step 2: the super-resolution images S_1, S_2, ..., S_m corresponding to X_1, X_2, ..., X_m are generated, and the label vectors K_1, K_2, ..., K_m corresponding to S_1, S_2, ..., S_m are obtained from the label vectors Y_1, Y_2, ..., Y_m;
Step 3: the region-based deformable fully convolutional network is trained with the super-resolution images S_1, S_2, ..., S_m and their corresponding label vectors K_1, K_2, ..., K_m until the set maximum number of iterations is reached, giving the trained region-based deformable fully convolutional network;
Step 4: an original remote sensing image M to be detected is processed through steps 1 and 2 to obtain a set of equal-size super-resolution images M_1, M_2, ..., M_m;
the super-resolution images M_1, M_2, ..., M_m are fed into the region-based deformable fully convolutional network trained in step 3 to obtain the target detection results for M_1, M_2, ..., M_m;
Step 5: the target detection results obtained in step 4 are merged to obtain the target detection result N of the original remote sensing image M.
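Purely for orientation, the five steps above can be summarized in the following minimal Python sketch; every function name in it is a hypothetical placeholder (two of them, upsample_and_tile and merge_detections, are sketched in the embodiments below), not an interface defined by the invention.

    def detect_small_targets(image_M):
        # Step 1 analogue: up-sample the input and split it into equal-size
        # tiles, remembering each tile's position in the up-sampled image.
        tiles, positions = upsample_and_tile(image_M)
        # Step 2 analogue: super-resolve every tile with the trained generator.
        sr_tiles = [sr_generator(t) for t in tiles]
        # Step 4: run the trained region-based deformable fully convolutional
        # network on each super-resolved tile.
        per_tile_detections = [rfcn_detect(t) for t in sr_tiles]
        # Step 5: invert the preprocessing, rescaling each tile's boxes and
        # shifting them by the tile's saved position in the original image.
        return merge_detections(per_tile_detections, positions)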
The beneficial effects of the invention are as follows. Aiming at the scarce usable feature information and the geometric deformation of small targets in remote sensing images, the proposed resolution enhancement-based detection method enriches the detailed feature information of small targets through super-resolution processing and fully exploits their limited feature information with a region-based deformable convolutional network, improving both the detection capability and the detection results for small targets in remote sensing images.
To verify the performance of the proposed method, experiments were run on DOTA data sets drawn from the China Centre for Resources Satellite Data and Application, Google Earth, the JL-1 satellite, the GF-2 satellite, and other sources. The experimental results verify the effectiveness of the super-resolution-based small-target detection algorithm for remote sensing images. The experimental data set was randomly split into training, validation and test sets in the ratio 2:1:1, and the single-class detection precision reached about 80%.
Drawings
FIG. 1 is a schematic flow chart of an implementation of the present invention;
FIG. 2a shows an image from the DOTA data set (containing small vehicles) and the corresponding small-target detection-box annotations after label processing (only small vehicles retained);
FIG. 2b shows the set of equal-size image blocks obtained from a remote sensing image after up-sampling and pre-segmentation;
FIG. 3a is a basic schematic diagram of a generative adversarial network;
FIG. 3b is a block diagram of the network structure of the generative model G (generator sub-network) in FIG. 3a;
the generator mainly adopts a block layout: the core of the network consists of several identical residual blocks, and at the output stage two deconvolution operations with stride 0.5 raise the resolution of the output image;
n64 denotes the number of convolution kernel filters, i.e. the dimension of the output feature map, and s denotes the convolution stride;
FIG. 3c is a block diagram of the network structure of the discriminant model D (discriminator sub-network) in FIG. 3a;
its main structure is a VGG network comprising 8 convolutional layers; at the output stage two fully connected layers map the feature map to a probability value, the confidence score with which the discriminator distinguishes super-resolution images from real images;
FIG. 3d shows the visual change of a remote sensing image before and after super-resolution processing, with the original remote sensing image on the left and the super-resolved image on the right;
FIG. 4a illustrates a training network framework diagram of a region-based deformable convolutional network;
FIG. 4b shows a shape diagram of a normal convolution kernel;
FIG. 4c illustrates a shape diagram of a deformable convolution kernel;
these are obtained by adding displacements (arrows) to the normal sampling coordinates, showing that a deformable convolution kernel can fit severe deformations of the target;
FIG. 4d shows a special case of a deformable convolution as a scale transform;
FIG. 4e shows a special case of a deformable convolution as a rotational transformation;
FIG. 5 is the result obtained by merging the detection results of the segmented image blocks; this inverts the preprocessing and yields the final target detection result of the original remote sensing image.
Detailed Description
The first embodiment: this embodiment is described with reference to FIG. 1. The resolution enhancement-based remote sensing image small target detection method of this embodiment comprises the following specific process:
Step 1: an original remote sensing image X to be trained on is given, together with the label vector Y corresponding to the small targets it contains, a small target being a target whose pixel count lies in the range (0, 100);
the original remote sensing image X is up-sampled and pre-segmented to obtain a set of equal-size images X_1, X_2, ..., X_m and their corresponding label vectors Y_1, Y_2, ..., Y_m;
Step 2: the super-resolution images S_1, S_2, ..., S_m corresponding to X_1, X_2, ..., X_m are generated, and the label vectors K_1, K_2, ..., K_m corresponding to S_1, S_2, ..., S_m are obtained from the label vectors Y_1, Y_2, ..., Y_m;
Step 3: the region-based deformable fully convolutional network is trained with the super-resolution images S_1, S_2, ..., S_m and their corresponding label vectors K_1, K_2, ..., K_m until the set maximum number of iterations is reached, giving the trained region-based deformable fully convolutional network;
the region-based deformable fully convolutional network comprises ResNet-101, a convolutional layer, an RPN, an RoI pooling layer and a softmax classifier;
the super-resolution images S_1, S_2, ..., S_m pass through ResNet-101 to extract feature images, whose dimensionality is then reduced by the convolutional layer; in parallel, the RPN outputs regions of interest (RoIs) for S_1, S_2, ..., S_m, the RoIs are mapped onto the dimension-reduced feature images, the RoI pooling layer pools the mapped images, and the averaged pooling results are fed into the softmax classifier to obtain the target classification results;
Step 4: an original remote sensing image M to be detected is processed through steps 1 and 2 to obtain a set of equal-size super-resolution images M_1, M_2, ..., M_m;
the super-resolution images M_1, M_2, ..., M_m are fed into the region-based deformable fully convolutional network trained in step 3 to obtain the target detection results for M_1, M_2, ..., M_m;
Step 5: the target detection results obtained in step 4 are merged to obtain the target detection result N of the original remote sensing image M;
the target detection results for M_1, M_2, ..., M_m are inverted back into target detection results in the original image M.
The invention performs super-resolution processing on remote sensing images containing small targets (such as small vehicles) before carrying out target detection, which broadens the applicable range of deep-learning target detection models and makes deeper use of remote sensing images of higher spatial resolution.
The second embodiment: this embodiment is described with reference to FIGS. 2a and 2b. It differs from the first embodiment in that the specific process of step 1 is as follows:
up-sample the original remote sensing image X (the adjustment range is small) to obtain an up-sampled image; pre-segment the up-sampled image, i.e. divide it into a set of equal-size images X_1, X_2, ..., X_m, storing the position of each X_1, X_2, ..., X_m within the up-sampled image during segmentation, where m is the total number of segmented images;
while the image is segmented, the label vector Y corresponding to the small targets contained in the original remote sensing image X is segmented as well, and each image X_1, X_2, ..., X_m is assigned its corresponding segmented label vector Y_1, Y_2, ..., Y_m;
that is, segmenting the label vector Y yields the segmented label vectors Y_1, Y_2, ..., Y_m.
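A minimal sketch of this up-sampling and pre-segmentation step, assuming NumPy; the up-sampling factor, the nearest-neighbour interpolation and the tile size are illustrative assumptions (the invention only states that the up-sampling adjustment range is small), and the parallel splitting of the label vector Y is omitted for brevity.

    import numpy as np

    def upsample_and_tile(image, up=2, tile=512):
        """Up-sample `image`, then split it into equal-size tiles, keeping each
        tile's top-left coordinate so the segmentation can be inverted in
        step 5. `up` and `tile` are illustrative values only."""
        # Nearest-neighbour up-sampling by integer repetition, a simple
        # stand-in for the unspecified interpolation of the original method.
        big = image.repeat(up, axis=0).repeat(up, axis=1)
        H, W = big.shape[:2]
        tiles, positions = [], []
        for y in range(0, H - tile + 1, tile):          # edge remainders are
            for x in range(0, W - tile + 1, tile):      # ignored for brevity
                tiles.append(big[y:y + tile, x:x + tile])
                positions.append((x, y))                # saved position info
        return tiles, positions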
Other steps and parameters are the same as those in the first embodiment.
The third embodiment: this embodiment is described with reference to FIGS. 3a, 3b, 3c and 3d. It differs from the first or second embodiment in that the specific process of step 2 is as follows:
Step 2.1: establish a generator sub-network $G_{\theta_G}$ and a discriminator sub-network $D_{\theta_D}$, which together form a generative adversarial network;
Step 2.2: given a set of remote sensing images of higher resolution $I_n^{HR}$, n = 1, ..., N, down-sample the remote sensing images $I_n^{HR}$ to obtain a corresponding set of remote sensing images of lower resolution $I_n^{LR}$, n = 1, ..., N, where N is the number of remote sensing images contained in each set;
here "higher resolution" means that the resolution of $I_n^{HR}$ is higher relative to that of $I_n^{LR}$, and "lower resolution" means that the resolution of $I_n^{LR}$ is lower relative to that of $I_n^{HR}$.
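A minimal sketch of building one such training pair, assuming NumPy and a simple average-pooling down-sampler; the invention does not specify the down-sampling operator, and r = 4 is chosen here only to match the 4x factor used in step 2.4.

    import numpy as np

    def make_training_pair(hr_image, r=4):
        """Build an (LR, HR) training pair by down-sampling, as in step 2.2.
        r-fold average pooling stands in for the unspecified operator."""
        H, W = hr_image.shape[:2]
        H, W = H - H % r, W - W % r                # crop to a multiple of r
        hr = hr_image[:H, :W].astype(np.float32)
        lr = hr.reshape(H // r, r, W // r, r, -1).mean(axis=(1, 3))
        return lr, hr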
Step 2.3: train the generator sub-network $G_{\theta_G}$ with the remote sensing images $I_n^{LR}$. The problem the generator sub-network must solve is described by equation (1):

$$\hat{\theta}_G = \arg\min_{\theta_G} \frac{1}{N} \sum_{n=1}^{N} l^{SR}\!\left(G_{\theta_G}(I_n^{LR}),\, I_n^{HR}\right) \tag{1}$$

where $\theta_G = \{W_{1:L}; b_{1:L}\}$ is the set of all weights and biases of the generator sub-network $G_{\theta_G}$; $G_{\theta_G}(I_n^{LR})$ is the reconstructed image output by the generator sub-network when the remote sensing image $I_n^{LR}$ is input; and $l^{SR}$ is the loss function of the generator sub-network;
the loss function $l^{SR}$ consists of the following three parts:

$$l^{SR} = \gamma_1\, l_{content}^{SR} + \gamma_2\, l_{adv}^{SR} + \gamma_3\, l_{reg}^{SR} \tag{2}$$

where $l_{content}^{SR}$ is the content loss with weight parameter $\gamma_1$, $l_{adv}^{SR}$ is the adversarial loss with weight parameter $\gamma_2$, and $l_{reg}^{SR}$ is the regularization loss with weight parameter $\gamma_3$;
$$l_{content}^{SR} = \frac{1}{W_{i,j} H_{i,j}} \sum_{x=1}^{W_{i,j}} \sum_{y=1}^{H_{i,j}} \left( \phi_{i,j}\!\left(I_n^{HR}\right)_{x,y} - \phi_{i,j}\!\left(G_{\theta_G}(I_n^{LR})\right)_{x,y} \right)^2 \tag{3}$$

where $W_{i,j}$ and $H_{i,j}$ respectively denote the width and height of the feature map output by the j-th convolutional layer before the i-th max-pooling layer of the discriminator sub-network $D_{\theta_D}$ (in the invention i = 5, j = 4; in practice they may be chosen arbitrarily); $\phi_{i,j}(I_n^{HR})_{x,y}$ is the value of pixel (x, y) in that feature map when the discriminator sub-network input is $I_n^{HR}$, and $\phi_{i,j}(G_{\theta_G}(I_n^{LR}))_{x,y}$ is the value of pixel (x, y) in that feature map when the input is the reconstructed image $G_{\theta_G}(I_n^{LR})$; x = 1, 2, ..., $W_{i,j}$; y = 1, 2, ..., $H_{i,j}$;
$$l_{adv}^{SR} = \sum_{n=1}^{N} -\log D_{\theta_D}\!\left(G_{\theta_G}(I_n^{LR})\right) \tag{4}$$

where $D_{\theta_D}(G_{\theta_G}(I_n^{LR}))$ denotes the output of the discriminator sub-network $D_{\theta_D}$ when the reconstructed image $G_{\theta_G}(I_n^{LR})$ is input;
$$l_{reg}^{SR} = \frac{1}{r^2 W H} \sum_{x'=1}^{rW} \sum_{y'=1}^{rH} \left\| \nabla G_{\theta_G}\!\left(I_n^{LR}\right)_{x',y'} \right\| \tag{5}$$

where $\|\cdot\|$ denotes the 1-norm, r denotes the scale factor of the reconstructed image $G_{\theta_G}(I_n^{LR})$, whose width and height are rW and rH, $\nabla G_{\theta_G}(I_n^{LR})_{x',y'}$ denotes the pixel-wise gradient of the reconstructed image, and (x', y') is a pixel in the reconstructed image, x' = 1, 2, ..., rW, y' = 1, 2, ..., rH;
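For illustration, equations (2) to (5) can be sketched as follows, assuming PyTorch. Here `phi` stands for the feature extractor $\phi_{i,j}$, `disc_prob` for $D_{\theta_D}(G_{\theta_G}(I_n^{LR}))$, the weights g1, g2, g3 are illustrative values not taken from the invention, and the TV term is a discrete approximation of equation (5).

    import torch
    import torch.nn.functional as F

    def sr_loss(sr, hr, disc_prob, phi, g1=1.0, g2=1e-3, g3=2e-8):
        """Sketch of l_SR = g1*content + g2*adversarial + g3*regularization."""
        # (3) content loss: MSE between feature maps of real and SR images.
        content = F.mse_loss(phi(sr), phi(hr))
        # (4) adversarial loss: -log D(G(I_LR)).
        adversarial = -torch.log(disc_prob + 1e-8).mean()
        # (5) regularization: 1-norm of the discrete image gradients of the
        # reconstructed image (sr has shape [batch, channels, H, W]).
        tv = (sr[:, :, 1:, :] - sr[:, :, :-1, :]).abs().mean() + \
             (sr[:, :, :, 1:] - sr[:, :, :, :-1]).abs().mean()
        return g1 * content + g2 * adversarial + g3 * tv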
the discriminator sub-network $D_{\theta_D}$, n = 1, ..., N, is trained with the remote sensing images $I_n^{HR}$ and the reconstructed images $G_{\theta_G}(I_n^{LR})$; the problem the discriminator sub-network must solve is described by equation (6):

$$\hat{\theta}_D = \arg\max_{\theta_D}\; \mathbb{E}\!\left[\log D_{\theta_D}\!\left(I_n^{HR}\right)\right] + \mathbb{E}\!\left[\log\!\left(1 - D_{\theta_D}\!\left(G_{\theta_G}(I_n^{LR})\right)\right)\right] \tag{6}$$

where $D_{\theta_D}(I_n^{HR})$ denotes the probability value output by the discriminator sub-network when the input is $I_n^{HR}$; $D_{\theta_D}(G_{\theta_G}(I_n^{LR}))$ denotes the probability value output when the input is the reconstructed image $G_{\theta_G}(I_n^{LR})$; E[·] denotes taking the expectation; and $\theta_D$ is the set of all weights and biases of the discriminator sub-network $D_{\theta_D}$;
solving for the $\theta_G$ and $\theta_D$ that satisfy equation (1) and equation (6) yields the well-trained generative adversarial network;
training the generative adversarial network requires the discriminator sub-network, i.e. the two networks must cooperate during training; once trained, inputting the images X_1, X_2, ..., X_m into the generative adversarial network produces the super-resolution images S_1, S_2, ..., S_m;
Step 2.4: input the images X_1, X_2, ..., X_m into the trained generative adversarial network; its generator sub-network outputs the corresponding super-resolution images S_1, S_2, ..., S_m;
multiply the coordinates of the label vectors Y_1, Y_2, ..., Y_m by 4, via a MATLAB or Python program, to obtain the processed label vectors K_1, K_2, ..., K_m;
multiplying the label vectors Y_1, Y_2, ..., Y_m by 4 means the following: a label vector is essentially the vector of the four vertex coordinates of a small target's annotation box, so each element of the vector is multiplied by 4, giving the processed label vectors K_1, K_2, ..., K_m, which correspond to the images X_1, X_2, ..., X_m. In fact, the resolution of the super-resolution images S_1, S_2, ..., S_m can be set to any multiple of that of X_1, X_2, ..., X_m by modifying the parameters of the generative adversarial network, with the label vectors Y_1, Y_2, ..., Y_m multiplied by the corresponding factor; in the present invention, the resolution of S_1, S_2, ..., S_m is 4 times that of X_1, X_2, ..., X_m.
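A minimal sketch of this label scaling in Python (plain lists of vertex coordinates stand in for the label vectors):

    def scale_labels(label_vectors, factor=4):
        """Step 2.4 label handling: each label vector holds the four vertex
        coordinates of a small target's annotation box, so multiplying every
        element by the super-resolution factor (4 here) maps it onto the
        super-resolved image."""
        return [[coord * factor for coord in vec] for vec in label_vectors]

    # e.g. K = scale_labels(Y) turns Y_1, ..., Y_m into K_1, ..., K_m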
Other steps and parameters are the same as those in the first or second embodiment.
As shown in FIG. 3b, the generator sub-network $G_{\theta_G}$ mainly adopts a block layout: the core of the network consists of several identical residual blocks, each comprising convolutional layers, batch normalization (BN) layers and linear rectification (ReLU) layers. Specifically, every convolutional layer uses 3 x 3 kernels, each residual block outputs a 64-dimensional feature map, and padding keeps the feature-map resolution unchanged through the convolutions. At the output stage of the network, two deconvolution operations with stride 0.5 raise the resolution of the output image.
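A sketch of one residual block and the up-sampling output stage under the constraints just stated, assuming PyTorch; the exact layer ordering inside the block and the rendering of the stride-0.5 deconvolution as a stride-2 transposed convolution are assumptions.

    import torch.nn as nn

    class ResidualBlock(nn.Module):
        """One residual block as described for FIG. 3b: 3x3 convolutions with
        padding (so the feature-map resolution is unchanged), batch
        normalization and ReLU, 64-channel output."""
        def __init__(self, ch=64):
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch), nn.ReLU(),
                nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch))
        def forward(self, x):
            return x + self.body(x)   # identity shortcut

    # Output stage: two stride-1/2 "deconvolutions", each doubling the
    # resolution, rendered as transposed convolutions (4x in total).
    upsample = nn.Sequential(
        nn.ConvTranspose2d(64, 64, 4, stride=2, padding=1), nn.ReLU(),
        nn.ConvTranspose2d(64, 64, 4, stride=2, padding=1), nn.ReLU())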
As shown in FIG. 3c, the main structure of the discriminator sub-network $D_{\theta_D}$ is a VGG network; each convolutional layer ends with LeakyReLU activation, and a BN layer normalizes the feature map before it is output. The whole network comprises 8 convolutional layers; as the network deepens, the stride and the number of filters keep increasing, so the feature-map resolution keeps falling while its dimensionality keeps growing, and the last convolutional layer outputs a 512-dimensional low-resolution feature map. At the output stage of the network, two fully connected layers map the feature map to a probability value, the confidence score with which the discriminator distinguishes the super-resolution image from the real image.
The fourth embodiment: this embodiment is described with reference to FIG. 4a. It differs from the first to third embodiments in that the specific process of step 3 is as follows:
Step 3.1: input the super-resolution images S_1, S_2, ..., S_m and their corresponding label vectors K_1, K_2, ..., K_m into the region-based deformable fully convolutional network; within it, extract the feature images of S_1, S_2, ..., S_m with ResNet-101 and output the regions of interest (RoIs) of S_1, S_2, ..., S_m with the RPN;
Following the R-FCN idea, the invention adopts ResNet-101 as the feature-extraction network. ResNet-101 has 100 convolutional layers followed by a global average-pooling layer and a 1000-class fully connected layer; the invention deletes the average-pooling and fully connected layers and computes feature maps with the convolutional layers only. A transfer-learning approach is used: a ResNet-101 classification network is first pre-trained on ImageNet, its classification layer and loss-computation part are removed so that only the feature-extraction part remains, and a randomly initialized 1024-dimensional 1 x 1 convolutional layer is inserted into the network to reduce the dimensionality of the 2048-dimensional output.
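A sketch of this transfer-learning setup, assuming PyTorch and torchvision; the pretrained ResNet-101 of torchvision stands in for the network pre-trained on ImageNet described above.

    import torch.nn as nn
    import torchvision.models as models

    # Take an ImageNet-pretrained ResNet-101, drop the global average pooling
    # and the 1000-class fully connected layer, keep only the convolutional
    # feature extractor, and append a randomly initialized 1x1 convolution
    # that reduces the 2048-dimensional output to 1024 dimensions.
    backbone = models.resnet101(pretrained=True)
    feature_extractor = nn.Sequential(
        *list(backbone.children())[:-2],           # remove avgpool + fc
        nn.Conv2d(2048, 1024, kernel_size=1))      # dimension reduction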
Step 3.2: map the regions of interest RoI onto the feature images of the super-resolution images S_1, S_2, ..., S_m to obtain the mapped feature images;
Step 3.3: pool the mapped feature images to obtain pooling results, then average the pooling results to obtain the target classification results;
for example, for the (i, j)-th bin (0 ≤ i, j ≤ k − 1), a position-sensitive RoI pooling operation is defined that pools only over the (i, j)-th bin:

$$r_c(i, j \mid \Theta) = \frac{1}{n} \sum_{(x,y) \in \mathrm{bin}(i,j)} z_{i,j,c}\!\left(x + x_0,\, y + y_0 \mid \Theta\right)$$

where $r_c(i, j \mid \Theta)$ is the pooled value of the c-th class in the (i, j)-th bin, $z_{i,j,c}$ is the score map of the c-th class, $(x_0, y_0)$ denotes the top-left element of the RoI region, n is the number of elements in the bin, and $\Theta$ denotes all learnable parameters of the network. The (i, j)-th bin covers

$$\left\lfloor i\,\frac{w}{k} \right\rfloor \le x < \left\lceil (i+1)\,\frac{w}{k} \right\rceil \quad\text{and}\quad \left\lfloor j\,\frac{h}{k} \right\rfloor \le y < \left\lceil (j+1)\,\frac{h}{k} \right\rceil,$$

where w and h respectively denote the width and height of each RoI region, which is divided into $k^2$ bins.

The pooled results are then averaged to obtain the final classification result: after averaging, each RoI produces a (C + 1)-dimensional vector $r_c(\Theta) = \sum_{i,j} r_c(i, j \mid \Theta)$, whose softmax response is computed as

$$s_c(\Theta) = \frac{e^{r_c(\Theta)}}{\sum_{c'=0}^{C} e^{r_{c'}(\Theta)}}.$$
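A minimal NumPy sketch of the position-sensitive pooling, voting and softmax steps above; the channel layout of the score maps is an assumption, since the invention does not state it.

    import numpy as np

    def ps_roi_pool(score_maps, x0, y0, w, h, k, C):
        """Position-sensitive RoI pooling, voting r_c and softmax s_c.
        `score_maps` has shape (k*k*(C+1), H, W); the map for class c and
        bin (i, j) is assumed to sit at index (i*k + j)*(C+1) + c."""
        r = np.zeros(C + 1)
        for i in range(k):
            for j in range(k):
                # bin (i, j) covers [i*w/k, (i+1)*w/k) x [j*h/k, (j+1)*h/k)
                xa = x0 + int(np.floor(i * w / k))
                xb = x0 + int(np.ceil((i + 1) * w / k))
                ya = y0 + int(np.floor(j * h / k))
                yb = y0 + int(np.ceil((j + 1) * h / k))
                for c in range(C + 1):
                    bin_map = score_maps[(i * k + j) * (C + 1) + c, ya:yb, xa:xb]
                    if bin_map.size:
                        r[c] += bin_map.mean()     # pooled value r_c(i,j|Theta)
        e = np.exp(r - r.max())                    # numerically stable softmax
        return e / e.sum()                         # s_c(Theta)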
Step 3.4: while extracting the feature images of the super-resolution images S_1, S_2, ..., S_m, the coordinates of the target bounding boxes are learned at the same time; while pooling the mapped feature images, a position vector is also generated for each RoI region, and the bounding-box coordinates and position vectors are aggregated by an averaging operation into a 4-dimensional vector $t = (t_x, t_y, t_w, t_h)$, in which $t_x$ is the x coordinate of the top-left corner of the target detection box, $t_y$ is the y coordinate of the top-left corner of the target detection box, $t_w$ is the width of the target detection box, and $t_h$ is the height of the target detection box.
The fifth embodiment: this embodiment is described with reference to FIGS. 4b to 4e. It differs from the first to fourth embodiments in that the convolutions in the region-based deformable fully convolutional network are deformable convolutions, which add an offset $\Delta p_n$ at each grid point of the convolution kernel; the modified formula is expressed as follows:

$$y(p_0) = \sum_{p_n \in \mathcal{R}} w(p_n)\, x(p_0 + p_n + \Delta p_n)$$

where $p_0$ denotes the point at the top-left corner of the convolution receptive field, $p_n$ is the relative offset of the other points in the field with respect to the top-left corner, $\Delta p_n$ is the offset learned during convolution, $w(p_n)$ denotes the convolution kernel weight, $\mathcal{R}$ denotes the sampling grid, i.e. the size of the convolution receptive field, $y(p_0)$ is the output of the convolution, and $x(p_0 + p_n + \Delta p_n)$ denotes the pixel value of the point $p_0 + p_n$ in the input feature image after the offset is introduced.
Other steps and parameters are the same as in one of the first to fourth embodiments.
The invention replaces the conventional convolution structure with a deformable one. The conventional convolution structure is defined as follows, where $p_n$ is the offset of each point of the receptive field relative to $p_0$:

$$y(p_0) = \sum_{p_n \in \mathcal{R}} w(p_n)\, x(p_0 + p_n)$$

where $p_0$ is the point at the top-left corner of the convolution receptive field and $\mathcal{R}$ is the convolution sampling grid.
The sixth embodiment: this embodiment is described with reference to FIG. 5. It differs from the first to fifth embodiments in that the specific process of step 5 is as follows:
scale the target detection results of the images M_1, M_2, ..., M_m, then merge the scaled results according to the position information of M_1, M_2, ..., M_m in the original remote sensing image M to obtain the target detection result N.
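A minimal sketch of this inversion, assuming the tile positions saved during pre-segmentation (see the sketch in the second embodiment); the up-sampling and super-resolution factors are illustrative values.

    def merge_detections(per_tile_boxes, positions, up=2, r=4):
        """Step 5: invert the preprocessing. Boxes were predicted on tiles
        that are r-times super-resolved crops of an up-sampled image, so
        divide the coordinates by r, shift by the tile's saved top-left
        corner, then divide by the up-sampling factor `up`."""
        merged = []
        for boxes, (tx, ty) in zip(per_tile_boxes, positions):
            for (x, y, w, h, score) in boxes:
                merged.append(((tx + x / r) / up, (ty + y / r) / up,
                               w / (r * up), h / (r * up), score))
        return merged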
Other steps and parameters are the same as those in one of the first to fifth embodiments.
The following example demonstrates the beneficial effects of the invention.
Example 1:
The resolution enhancement-based remote sensing image small target detection method is carried out according to the following steps.
The experimental data are DOTA data sets of different resolutions obtained from open satellite sources. FIG. 2a shows an image from the DOTA data set (containing small vehicles) and the corresponding small-target detection-box annotations after label processing (only small vehicles retained); FIG. 2b shows the set of equal-size image blocks obtained by up-sampling and pre-segmenting a remote sensing image, with the position information in the original image saved at cutting time.
Table 1 gives the single-class detection precision (AP) obtained by training and testing three target detection models under different training-sample conditions (remote sensing image data sets before and after super-resolution processing).
TABLE 1 Single-class detection precision (unit: %)
FIG. 5 shows the result of merging the detection results of the segmented image blocks, inverting the preprocessing to yield the final target detection result of the original remote sensing image. Table 2 gives the results of training and testing six target detection models on the super-resolved remote sensing image data set; the last row is the target detection algorithm used in the invention, reported with three indexes: single-class detection precision (AP), training time (train_time) and detection time (test_time). By comparison, the proposed algorithm performs best on all indexes, and its single-class detection precision on the DOTA data set reaches about 80%.
TABLE 2
The present invention is capable of other embodiments and its several details are capable of modifications in various obvious respects, all without departing from the spirit and scope of the present invention.

Claims (4)

1. A resolution enhancement-based remote sensing image small target detection method, characterized by specifically comprising the following steps:
Step 1: giving an original remote sensing image X and the label vector Y corresponding to the small targets contained in the original remote sensing image X, a small target being a target whose pixel count lies in the range (0, 100);
up-sampling and pre-segmenting the original remote sensing image X to obtain a set of equal-size images X_1, X_2, ..., X_m and their corresponding label vectors Y_1, Y_2, ..., Y_m;
Step 2: generating the super-resolution images S_1, S_2, ..., S_m corresponding to the images X_1, X_2, ..., X_m, and obtaining from the label vectors Y_1, Y_2, ..., Y_m the label vectors K_1, K_2, ..., K_m corresponding to S_1, S_2, ..., S_m;
the specific process of step 2 being as follows:
Step 2.1: establishing a generator sub-network $G_{\theta_G}$ and a discriminator sub-network $D_{\theta_D}$, which together form a generative adversarial network;
Step 2.2: given a set of remote sensing images of higher resolution $I_n^{HR}$, n = 1, ..., N, down-sampling the remote sensing images $I_n^{HR}$ to obtain a corresponding set of remote sensing images of lower resolution $I_n^{LR}$, n = 1, ..., N, N being the number of remote sensing images contained in each set;
Step 2.3: training the generator sub-network $G_{\theta_G}$ with the remote sensing images $I_n^{LR}$, the problem the generator sub-network must solve being described by equation (1):

$$\hat{\theta}_G = \arg\min_{\theta_G} \frac{1}{N} \sum_{n=1}^{N} l^{SR}\!\left(G_{\theta_G}(I_n^{LR}),\, I_n^{HR}\right) \tag{1}$$

where $\theta_G$ is the set of all weights and biases of the generator sub-network $G_{\theta_G}$; $G_{\theta_G}(I_n^{LR})$ is the reconstructed image output by the generator sub-network when the remote sensing image $I_n^{LR}$ is input; and $l^{SR}$ is the loss function of the generator sub-network;
the loss function $l^{SR}$ consists of the following three parts:

$$l^{SR} = \gamma_1\, l_{content}^{SR} + \gamma_2\, l_{adv}^{SR} + \gamma_3\, l_{reg}^{SR} \tag{2}$$

where $l_{content}^{SR}$ is the content loss with weight parameter $\gamma_1$, $l_{adv}^{SR}$ is the adversarial loss with weight parameter $\gamma_2$, and $l_{reg}^{SR}$ is the regularization loss with weight parameter $\gamma_3$;
$$l_{content}^{SR} = \frac{1}{W_{i,j} H_{i,j}} \sum_{x=1}^{W_{i,j}} \sum_{y=1}^{H_{i,j}} \left( \phi_{i,j}\!\left(I_n^{HR}\right)_{x,y} - \phi_{i,j}\!\left(G_{\theta_G}(I_n^{LR})\right)_{x,y} \right)^2 \tag{3}$$

where $W_{i,j}$ and $H_{i,j}$ respectively denote the width and height of the feature map output by the j-th convolutional layer before the i-th max-pooling layer of the discriminator sub-network $D_{\theta_D}$; $\phi_{i,j}(I_n^{HR})_{x,y}$ is the value of pixel (x, y) in that feature map when the discriminator sub-network input is $I_n^{HR}$, and $\phi_{i,j}(G_{\theta_G}(I_n^{LR}))_{x,y}$ is the value of pixel (x, y) in that feature map when the input is the reconstructed image $G_{\theta_G}(I_n^{LR})$; x = 1, 2, ..., $W_{i,j}$; y = 1, 2, ..., $H_{i,j}$;
$$l_{adv}^{SR} = \sum_{n=1}^{N} -\log D_{\theta_D}\!\left(G_{\theta_G}(I_n^{LR})\right) \tag{4}$$

where $D_{\theta_D}(G_{\theta_G}(I_n^{LR}))$ denotes the output of the discriminator sub-network $D_{\theta_D}$ when the reconstructed image $G_{\theta_G}(I_n^{LR})$ is input;
$$l_{reg}^{SR} = \frac{1}{r^2 W H} \sum_{x'=1}^{rW} \sum_{y'=1}^{rH} \left\| \nabla G_{\theta_G}\!\left(I_n^{LR}\right)_{x',y'} \right\| \tag{5}$$

where $\|\cdot\|$ denotes the 1-norm, r denotes the scale factor of the reconstructed image $G_{\theta_G}(I_n^{LR})$, whose width and height are rW and rH, $\nabla G_{\theta_G}(I_n^{LR})_{x',y'}$ denotes the pixel-wise gradient of the reconstructed image, and (x', y') is a pixel in the reconstructed image, x' = 1, 2, ..., rW, y' = 1, 2, ..., rH;
the discriminator sub-network $D_{\theta_D}$ is trained with the remote sensing images $I_n^{HR}$ and the reconstructed images $G_{\theta_G}(I_n^{LR})$, the problem the discriminator sub-network must solve being described by equation (6):

$$\hat{\theta}_D = \arg\max_{\theta_D}\; \mathbb{E}\!\left[\log D_{\theta_D}\!\left(I_n^{HR}\right)\right] + \mathbb{E}\!\left[\log\!\left(1 - D_{\theta_D}\!\left(G_{\theta_G}(I_n^{LR})\right)\right)\right] \tag{6}$$

where $D_{\theta_D}(I_n^{HR})$ denotes the probability value output by the discriminator sub-network when the input is $I_n^{HR}$; $D_{\theta_D}(G_{\theta_G}(I_n^{LR}))$ denotes the probability value output when the input is the reconstructed image $G_{\theta_G}(I_n^{LR})$; E[·] denotes taking the expectation; and $\theta_D$ is the set of all weights and biases of the discriminator sub-network $D_{\theta_D}$;
solving for the $\theta_G$ and $\theta_D$ that satisfy equation (1) and equation (6) yields the well-trained generative adversarial network;
Step 2.4: inputting the images X_1, X_2, ..., X_m into the trained generative adversarial network, the generator sub-network of which outputs the corresponding super-resolution images S_1, S_2, ..., S_m;
multiplying the coordinates of the label vectors Y_1, Y_2, ..., Y_m by 4, via a MATLAB or Python program, to obtain the processed label vectors K_1, K_2, ..., K_m;
Step 3: training the region-based deformable fully convolutional network with the super-resolution images S_1, S_2, ..., S_m and their corresponding label vectors K_1, K_2, ..., K_m until the set maximum number of iterations is reached, giving the trained region-based deformable fully convolutional network;
the specific process of step 3 being as follows:
Step 3.1: inputting the super-resolution images S_1, S_2, ..., S_m and the corresponding label vectors K_1, K_2, ..., K_m into the region-based deformable fully convolutional network, extracting the feature images of S_1, S_2, ..., S_m with the ResNet-101 within it, and outputting the regions of interest (RoIs) of S_1, S_2, ..., S_m with the RPN;
Step 3.2: mapping the RoIs onto the feature images of S_1, S_2, ..., S_m to obtain the mapped feature images;
Step 3.3: pooling the mapped feature images to obtain pooling results, then averaging the pooling results to obtain the target classification results;
Step 3.4: while extracting the feature images of the super-resolution images S_1, S_2, ..., S_m, learning the coordinates of the target bounding boxes at the same time; while pooling the mapped feature images, also generating a position vector for each RoI region, and aggregating the bounding-box coordinates and position vectors by an averaging operation into a 4-dimensional vector $t = (t_x, t_y, t_w, t_h)$, in which $t_x$ is the x coordinate of the top-left corner of the target detection box, $t_y$ is the y coordinate of the top-left corner of the target detection box, $t_w$ is the width of the target detection box, and $t_h$ is the height of the target detection box;
Step 4: for an original remote sensing image M to be detected, processing the original remote sensing image M through step 1 and step 2 to obtain a set of equal-size super-resolution images M_1, M_2, ..., M_m;
inputting the super-resolution images M_1, M_2, ..., M_m into the region-based deformable fully convolutional network trained in step 3 to obtain the target detection results for M_1, M_2, ..., M_m;
Step 5: merging the target detection results obtained in step 4 to obtain the target detection result N of the original remote sensing image M.
2. The resolution enhancement-based remote sensing image small target detection method according to claim 1, characterized in that the specific process of step 1 is as follows:
up-sampling the original remote sensing image X to obtain an up-sampled image; pre-segmenting the up-sampled image, i.e. dividing it into a set of equal-size images X_1, X_2, ..., X_m, storing the position of each X_1, X_2, ..., X_m within the up-sampled image during segmentation, m being the total number of segmented images;
while the image is segmented, the label vector Y corresponding to the small targets contained in the original remote sensing image X is segmented as well, and each image X_1, X_2, ..., X_m is assigned its corresponding segmented label vector Y_1, Y_2, ..., Y_m.
3. The resolution enhancement-based remote sensing image small target detection method according to claim 2, characterized in that the convolutions in the region-based deformable fully convolutional network are deformable convolutions, which add an offset $\Delta p_n$ at each grid point of the convolution kernel, the modified formula being expressed as follows:

$$y(p_0) = \sum_{p_n \in \mathcal{R}} w(p_n)\, x(p_0 + p_n + \Delta p_n)$$

where $p_0$ denotes the point at the top-left corner of the convolution receptive field, $p_n$ is the relative offset of the other points in the field with respect to the top-left corner, $\Delta p_n$ is the offset learned during convolution, $w(p_n)$ denotes the weight, $\mathcal{R}$ denotes the size of the convolution receptive field, $y(p_0)$ is the output of the convolution, and $x(p_0 + p_n + \Delta p_n)$ denotes the pixel value of the point $p_0 + p_n$ in the input feature image after the offset is introduced.
4. The resolution enhancement-based remote sensing image small target detection method according to claim 3, characterized in that the specific process of step 5 is as follows:
scaling the target detection results of the images M_1, M_2, ..., M_m, then merging the scaled results according to the position information of M_1, M_2, ..., M_m in the original remote sensing image M to obtain the target detection result N.
CN202010444356.XA 2020-05-22 2020-05-22 Resolution enhancement-based remote sensing image small target detection method Active CN111709307B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010444356.XA CN111709307B (en) 2020-05-22 2020-05-22 Resolution enhancement-based remote sensing image small target detection method


Publications (2)

Publication Number Publication Date
CN111709307A (en) 2020-09-25
CN111709307B (en) 2022-08-30

Family

ID=72537713

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010444356.XA Active CN111709307B (en) 2020-05-22 2020-05-22 Resolution enhancement-based remote sensing image small target detection method

Country Status (1)

Country Link
CN (1) CN111709307B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113420745B (en) * 2021-08-25 2021-12-24 江西中业智能科技有限公司 Image-based target identification method, system, storage medium and terminal equipment
CN114663671B (en) * 2022-02-21 2023-07-18 佳都科技集团股份有限公司 Target detection method, device, equipment and storage medium
CN115984846B (en) * 2023-02-06 2023-10-10 山东省人工智能研究院 Intelligent recognition method for small targets in high-resolution image based on deep learning
CN115953453B (en) * 2023-03-03 2023-08-15 国网吉林省电力有限公司信息通信公司 Substation geological deformation monitoring method based on image dislocation analysis and Beidou satellite

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108427920A (en) * 2018-02-26 2018-08-21 杭州电子科技大学 A kind of land and sea border defense object detection method based on deep learning
CN108510467A (en) * 2018-03-28 2018-09-07 西安电子科技大学 SAR image target recognition method based on variable depth shape convolutional neural networks
CN108596101A (en) * 2018-04-25 2018-09-28 上海交通大学 A kind of remote sensing images multi-target detection method based on convolutional neural networks
CN109299688A (en) * 2018-09-19 2019-02-01 厦门大学 Ship Detection based on deformable fast convolution neural network
CN110197255A (en) * 2019-04-29 2019-09-03 杰创智能科技股份有限公司 A kind of deformable convolutional network based on deep learning
CN110458166A (en) * 2019-08-19 2019-11-15 广东工业大学 A kind of hazardous material detection method, device and equipment based on deformable convolution
CN110728658A (en) * 2019-09-16 2020-01-24 武汉大学 High-resolution remote sensing image weak target detection method based on deep learning
CN111126385A (en) * 2019-12-13 2020-05-08 哈尔滨工程大学 Deep learning intelligent identification method for deformable living body small target

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10019631B2 (en) * 2015-11-05 2018-07-10 Qualcomm Incorporated Adapting to appearance variations when tracking a target object in video sequence

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108427920A (en) * 2018-02-26 2018-08-21 杭州电子科技大学 A kind of land and sea border defense object detection method based on deep learning
CN108510467A (en) * 2018-03-28 2018-09-07 西安电子科技大学 SAR image target recognition method based on variable depth shape convolutional neural networks
CN108596101A (en) * 2018-04-25 2018-09-28 上海交通大学 A kind of remote sensing images multi-target detection method based on convolutional neural networks
CN109299688A (en) * 2018-09-19 2019-02-01 厦门大学 Ship Detection based on deformable fast convolution neural network
CN110197255A (en) * 2019-04-29 2019-09-03 杰创智能科技股份有限公司 A kind of deformable convolutional network based on deep learning
CN110458166A (en) * 2019-08-19 2019-11-15 广东工业大学 A kind of hazardous material detection method, device and equipment based on deformable convolution
CN110728658A (en) * 2019-09-16 2020-01-24 武汉大学 High-resolution remote sensing image weak target detection method based on deep learning
CN111126385A (en) * 2019-12-13 2020-05-08 哈尔滨工程大学 Deep learning intelligent identification method for deformable living body small target

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Vehicle detection method for dense areas of remote sensing images based on deformable convolutional neural networks; Gao Xin et al.; Journal of Electronics & Information Technology; 2018-09-13 (No. 12); full text *

Also Published As

Publication number Publication date
CN111709307A (en) 2020-09-25


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant