CN111709307A - Resolution enhancement-based remote sensing image small target detection method - Google Patents

Resolution enhancement-based remote sensing image small target detection method

Info

Publication number
CN111709307A
CN111709307A
Authority
CN
China
Prior art keywords
image
remote sensing
resolution
sensing image
super
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010444356.XA
Other languages
Chinese (zh)
Other versions
CN111709307B (en)
Inventor
Gu Yanfeng (谷延锋)
Ye Shujia (叶树嘉)
Gao Guoming (高国明)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Institute of Technology
Original Assignee
Harbin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Institute of Technology
Priority to CN202010444356.XA
Publication of CN111709307A
Application granted
Publication of CN111709307B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • G06V20/13Satellite images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Astronomy & Astrophysics (AREA)
  • Remote Sensing (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

A resolution enhancement-based method for detecting small targets in remote sensing images, belonging to the technical field of target detection in remote sensing images. The method solves the problem that existing methods detect small targets in remote sensing images poorly, because small targets offer little usable feature information and their regions undergo geometric deformation. The invention applies super-resolution processing to a remote sensing image containing small targets before performing target detection, which broadens the applicability of deep-learning target detection models and makes deeper use of remote sensing images of higher spatial resolution. To address the scarcity of usable feature information and the geometric deformation of small targets in remote sensing images, a super-resolution processing technique is used to enrich the detail features of small targets, and a region-based deformable convolutional network is applied to make full use of their limited feature information, improving the detection of small targets in remote sensing images. The method can be applied to small target detection in remote sensing images.

Description

Resolution enhancement-based remote sensing image small target detection method
Technical Field
The invention belongs to the technical field of target detection in remote sensing images, and particularly relates to a method for detecting a small target in a remote sensing image.
Background
The spatial resolution of an optical remote sensing image depends mainly on the satellite's instantaneous field of view and its distance from the earth's surface, so image quality is largely determined by satellite performance. With the continuing development of remote sensing technology, more and more remote sensing satellites, such as the WorldView series and China's GF (Gaofen) series, have been developed and launched, yielding remote sensing images of ever higher spatial resolution. Small targets in these images (such as small vehicles) now carry richer texture feature information, which makes it possible to address the small target detection problem in remote sensing images with deep learning methods.
Existing small target detection in remote sensing images faces two major difficulties:
First, although the spatial resolution of remote sensing images keeps increasing, a small target (such as a small vehicle) still occupies a very small pixel range and offers very little usable feature information; directly applying a detection algorithm designed for normally sized targets yields very unsatisfactory results.
Second, because the position and motion state of the monitoring satellite change continuously during imaging, small targets in optical remote sensing images undergo geometric deformation. Since the pixel information in a small target region is very limited, even slight geometric deformation strongly affects the detection result.
Disclosure of Invention
The invention aims to solve the problem that existing methods detect small targets in remote sensing images poorly, because such targets offer little usable feature information and their regions undergo geometric deformation, and provides a resolution enhancement-based method for detecting small targets in remote sensing images.
The technical scheme adopted by the invention to solve this problem is as follows. A resolution enhancement-based remote sensing image small target detection method comprises the following specific process:
Step 1: given an original remote sensing image X and the label vector Y corresponding to the small targets it contains, where a small target is a target whose pixel count lies in the range (0, 100):
up-sample and pre-segment the original remote sensing image X to obtain a group of equal-size images $X_1, X_2, \ldots, X_m$ and their corresponding label vectors $Y_1, Y_2, \ldots, Y_m$;
Step 2: generate the super-resolution images $S_1, S_2, \ldots, S_m$ corresponding to the images $X_1, X_2, \ldots, X_m$, and from the label vectors $Y_1, Y_2, \ldots, Y_m$ obtain the corresponding label vectors $K_1, K_2, \ldots, K_m$ of the super-resolution images;
Step 3: train a region-based deformable fully convolutional network with the super-resolution images $S_1, S_2, \ldots, S_m$ and their corresponding label vectors $K_1, K_2, \ldots, K_m$ until the set maximum number of iterations is reached, obtaining the trained region-based deformable fully convolutional network;
Step 4: for an original remote sensing image M to be detected, process M through steps 1 and 2 to obtain a group of equal-size super-resolution images $M_1, M_2, \ldots, M_m$;
input the super-resolution images $M_1, M_2, \ldots, M_m$ into the region-based deformable fully convolutional network trained in step 3 to obtain target detection results for the images $M_1, M_2, \ldots, M_m$;
Step 5: merge the target detection results obtained in step 4 to obtain the target detection result N of the original remote sensing image M.
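As an illustration of how the five steps fit together, the following Python sketch implements the pipeline end to end. The nearest-neighbour up-sampling, the tile size, and the `sr_model`/`detector` callables (standing in for the trained generator subnetwork of step 2 and the trained region-based deformable fully convolutional network of step 3) are assumptions of the sketch, not details fixed by the invention.

```python
import numpy as np

def upsample_and_tile(img, factor=2, size=512):
    # Step 1: nearest-neighbour up-sampling followed by pre-segmentation into
    # equal-size tiles; each tile's origin in the up-sampled image is stored
    # so that detections can be mapped back later.
    up = img.repeat(factor, axis=0).repeat(factor, axis=1)
    tiles, origins = [], []
    for oy in range(0, up.shape[0] - size + 1, size):
        for ox in range(0, up.shape[1] - size + 1, size):
            tiles.append(up[oy:oy + size, ox:ox + size])
            origins.append((ox, oy))
    return tiles, origins

def detect_small_targets(M, sr_model, detector, up=2, sr_scale=4, size=512):
    # Steps 1-5 of the method; sr_model and detector are placeholders for the
    # trained generator subnetwork and region-based deformable R-FCN.
    tiles, origins = upsample_and_tile(M, factor=up, size=size)
    results = []
    for tile, (ox, oy) in zip(tiles, origins):
        sr_tile = sr_model(tile)                       # step 2: resolution x4
        for (x, y, w, h, score) in detector(sr_tile):  # steps 3-4
            f = up * sr_scale                          # step 5: invert scaling
            results.append(((x + ox * sr_scale) / f, (y + oy * sr_scale) / f,
                            w / f, h / f, score))
    return results
```

Each box returned is expressed in the coordinates of the original image M, which is exactly the inversion of the preprocessing that step 5 requires.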
The invention has the following beneficial effects. The invention provides a resolution enhancement-based remote sensing image small target detection method. To address the scarcity of usable feature information and the geometric deformation of small targets in remote sensing images, super-resolution processing is used to enrich the detail features of small targets, and a region-based deformable convolutional network is applied to make full use of their limited feature information, improving both the detection capability and the detection results for small targets in remote sensing images.
To verify the performance of the proposed method, it was evaluated on DOTA datasets derived from the China Centre for Resources Satellite Data and Application, Google Earth, the JL-1 satellite, the GF-2 satellite, and other sources. The experimental results verify the effectiveness of the super-resolution-based small target detection algorithm. The experimental dataset was randomly divided into training, validation, and test sets in a 2:1:1 ratio, and the single-class detection accuracy reached about 80%.
Drawings
FIG. 1 is a schematic flow chart of an implementation of the present invention;
FIG. 2a is a schematic diagram showing an image from the DOTA dataset (containing small vehicles) and the corresponding small target detection box annotations after label processing (only small vehicles retained);
FIG. 2b is a schematic diagram of a set of equal-size image blocks obtained by up-sampling and pre-segmenting a remote sensing image;
FIG. 3a is a basic schematic diagram of the generative adversarial network;
FIG. 3b is a block diagram of the network structure of the generative model G (generator subnetwork) in FIG. 3a;
the network mainly adopts a block layout: its core consists of several identical residual blocks, and in the output stage two deconvolution operations with a stride of 0.5 raise the resolution of the output image;
n64 denotes the number of convolution kernel filters, i.e., the dimension of the output feature map, and s denotes the convolution stride;
FIG. 3c is a block diagram of the network structure of the discriminative model D (discriminator subnetwork) in FIG. 3a;
its backbone is a VGG structure; the network contains 8 convolutional layers in total, and in the output stage two fully connected layers map the feature map to a probability value, namely the confidence score with which the discriminator distinguishes the super-resolution image from the real image;
FIG. 3d shows the visual change of a remote sensing image before and after super-resolution processing, with the original remote sensing image on the left and the super-resolution remote sensing image on the right;
FIG. 4a illustrates a training network framework diagram of a region-based deformable convolutional network;
FIG. 4b shows a shape diagram of a normal convolution kernel;
FIG. 4c illustrates a shape diagram of a deformable convolution kernel;
the deformable sampling positions are obtained by adding displacements (arrows) to the regular sampling coordinates, showing that a deformable convolution kernel can fit severe deformation of the target;
FIG. 4d shows a special case of a deformable convolution as a scale transform;
FIG. 4e shows a special case of a deformable convolution as a rotational transformation;
FIG. 5 is a result diagram obtained by merging the detection results of the segmented image blocks; this inverts the preprocessing and finally yields the target detection result of the original remote sensing image.
Detailed Description
The first embodiment: this embodiment is described with reference to FIG. 1. The resolution enhancement-based remote sensing image small target detection method of this embodiment comprises the following specific process:
Step 1: given an original remote sensing image X to be trained on and the label vector Y corresponding to the small targets it contains, where a small target is a target whose pixel count lies in the range (0, 100):
up-sample and pre-segment the original remote sensing image X to obtain a group of equal-size images $X_1, X_2, \ldots, X_m$ and their corresponding label vectors $Y_1, Y_2, \ldots, Y_m$;
Step 2: generate the super-resolution images $S_1, S_2, \ldots, S_m$ corresponding to the images $X_1, X_2, \ldots, X_m$, and from the label vectors $Y_1, Y_2, \ldots, Y_m$ obtain the corresponding label vectors $K_1, K_2, \ldots, K_m$;
Step 3: train a region-based deformable fully convolutional network with the super-resolution images $S_1, S_2, \ldots, S_m$ and their corresponding label vectors $K_1, K_2, \ldots, K_m$ until the set maximum number of iterations is reached, obtaining the trained network;
the region-based deformable fully convolutional network comprises ResNet-101, a convolutional layer, an RPN, an RoI pooling layer, and a softmax classifier;
the super-resolution images $S_1, S_2, \ldots, S_m$ pass through ResNet-101, which extracts feature images, and the extracted feature images are dimension-reduced by the convolutional layer to obtain reduced feature images; meanwhile, the RPN outputs regions of interest (RoIs) for the super-resolution images $S_1, S_2, \ldots, S_m$; the RoIs are mapped into the reduced feature images, the RoI pooling layer pools the mapped images, the pooled results are averaged, and the averages are input to the softmax classifier to obtain the target classification result;
Step 4: for an original remote sensing image M to be detected, process M through steps 1 and 2 to obtain a group of equal-size super-resolution images $M_1, M_2, \ldots, M_m$;
input the super-resolution images $M_1, M_2, \ldots, M_m$ into the region-based deformable fully convolutional network trained in step 3 to obtain target detection results for the images $M_1, M_2, \ldots, M_m$;
Step 5: merge the target detection results obtained in step 4 to obtain the target detection result N of the original remote sensing image M;
the target detection results for the images $M_1, M_2, \ldots, M_m$ are thereby inverted back to detection results in the original image M.
The invention performs super-resolution processing on remote sensing images containing small targets (such as small vehicles) before target detection, which broadens the applicability of deep-learning target detection models and makes deeper use of remote sensing images of higher spatial resolution.
The second embodiment is as follows: this embodiment will be described with reference to fig. 2a and 2 b. The first difference between the present embodiment and the specific embodiment is: the specific process of the step one is as follows:
carrying out up-sampling (the adjustment range is small) processing on an original remote sensing image X to obtain an image subjected to up-sampling processing; pre-dividing the up-sampled image, i.e. dividing the up-sampled image into a group of images X with the same size1,X2,...,XmWhen image segmentation is performed, X is stored1,X2,...,XmPosition information in the up-sampled image, m representing the total number of the divided images;
when the image is divided, the label vector Y corresponding to the small target contained in the original remote sensing image X is also divided and is used as the image X1,X2,...,XmAssigning corresponding segmented label vectors Y1,Y2,…,Ym
Segmenting the label vector Y to obtain a segmented label vector Y1,Y2,…,Ym
Other steps and parameters are the same as those in the first embodiment.
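As a concrete illustration of the step 1 just described, the sketch below extends the tiling so that each label box follows its tile. It assumes labels are given as (x0, y0, x1, y1) box corners in the coordinates of X and that a box belongs to the tile containing its centre; both conventions are assumptions of the sketch rather than requirements of the embodiment.

```python
import numpy as np

def presegment_with_labels(X, labels, factor=2, size=512):
    # Up-sample X, cut it into equal-size tiles, remember tile positions,
    # and assign each (scaled) label box to the tile that contains it.
    up = X.repeat(factor, axis=0).repeat(factor, axis=1)
    tiles, positions, tile_labels = [], [], []
    for oy in range(0, up.shape[0] - size + 1, size):
        for ox in range(0, up.shape[1] - size + 1, size):
            tiles.append(up[oy:oy + size, ox:ox + size])
            positions.append((ox, oy))
            kept = []
            for (x0, y0, x1, y1) in labels:
                # Scale the box into the up-sampled frame; keep it if its
                # centre falls inside this tile, in tile-local coordinates.
                cx = factor * (x0 + x1) / 2
                cy = factor * (y0 + y1) / 2
                if ox <= cx < ox + size and oy <= cy < oy + size:
                    kept.append((factor * x0 - ox, factor * y0 - oy,
                                 factor * x1 - ox, factor * y1 - oy))
            tile_labels.append(kept)
    return tiles, positions, tile_labels
```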
The third embodiment: this embodiment is described with reference to FIGs. 3a, 3b, 3c, and 3d. It differs from the first or second embodiment in that the specific process of step 2 is as follows:

Step 2.1: establish a generator subnetwork $G_{\theta_G}$ and a discriminator subnetwork $D_{\theta_D}$, which together form a generative adversarial network;

Step 2.2: given a group of higher-resolution remote sensing images $I_n^{HR}$, $n = 1, \ldots, N$, down-sample the images $I_n^{HR}$ to obtain a corresponding group of lower-resolution remote sensing images $I_n^{LR}$, where N is the number of remote sensing images in each group;

here, "higher resolution" means that the resolution of $I_n^{HR}$ is higher than that of $I_n^{LR}$, and "lower resolution" means that the resolution of $I_n^{LR}$ is lower than that of $I_n^{HR}$.

Step 2.3: train the generator subnetwork $G_{\theta_G}$ with the remote sensing images $I_n^{LR}$; the problem the generator subnetwork needs to solve is described by equation (1):

$$\hat{\theta}_G = \arg\min_{\theta_G} \frac{1}{N} \sum_{n=1}^{N} l^{SR}\!\left(G_{\theta_G}\!\left(I_n^{LR}\right),\, I_n^{HR}\right) \quad (1)$$

where $\theta_G = \{W_{1:L}; b_{1:L}\}$ is the set of all weight and bias parameters of the generator subnetwork $G_{\theta_G}$, $G_{\theta_G}(I_n^{LR})$ is the reconstructed image output by the generator subnetwork when the remote sensing image $I_n^{LR}$ is input, and $l^{SR}$ is the loss function of the generator subnetwork.

The loss function $l^{SR}$ consists of the following three parts:

$$l^{SR} = \gamma_1\, l^{SR}_{content} + \gamma_2\, l^{SR}_{adv} + \gamma_3\, l^{SR}_{reg} \quad (2)$$

where $l^{SR}_{content}$ is the content loss function with weight parameter $\gamma_1$, $l^{SR}_{adv}$ is the adversarial loss function with weight parameter $\gamma_2$, and $l^{SR}_{reg}$ is the regularization loss function with weight parameter $\gamma_3$.

$$l^{SR}_{content} = \frac{1}{W_{i,j} H_{i,j}} \sum_{x=1}^{W_{i,j}} \sum_{y=1}^{H_{i,j}} \left( \phi_{i,j}\!\left(I_n^{HR}\right)_{x,y} - \phi_{i,j}\!\left(G_{\theta_G}\!\left(I_n^{LR}\right)\right)_{x,y} \right)^{2} \quad (3)$$

where $W_{i,j}$ and $H_{i,j}$ are the width and height of the feature map output by the j-th convolutional layer before the i-th max pooling layer of the discriminator subnetwork $D_{\theta_D}$ (in the invention i = 5 and j = 4, though in practice they may be chosen arbitrarily); $\phi_{i,j}(I_n^{HR})_{x,y}$ is the value of pixel point $(x, y)$ in that feature map when the discriminator input is $I_n^{HR}$, and $\phi_{i,j}(G_{\theta_G}(I_n^{LR}))_{x,y}$ is the value of pixel point $(x, y)$ in that feature map when the input is the reconstructed image $G_{\theta_G}(I_n^{LR})$; $x = 1, 2, \ldots, W_{i,j}$ and $y = 1, 2, \ldots, H_{i,j}$.

$$l^{SR}_{adv} = \sum_{n=1}^{N} -\log D_{\theta_D}\!\left(G_{\theta_G}\!\left(I_n^{LR}\right)\right) \quad (4)$$

where $D_{\theta_D}(G_{\theta_G}(I_n^{LR}))$ is the output of the discriminator subnetwork $D_{\theta_D}$ when the reconstructed image $G_{\theta_G}(I_n^{LR})$ is input.

$$l^{SR}_{reg} = \frac{1}{r^{2} W H} \sum_{x'=1}^{rW} \sum_{y'=1}^{rH} \left\| \nabla G_{\theta_G}\!\left(I_n^{LR}\right)_{x',y'} \right\|_{1} \quad (5)$$

where $\|\cdot\|_1$ denotes the 1-norm, r is the magnification factor of the reconstructed image $G_{\theta_G}(I_n^{LR})$, whose width and height are rW and rH, $\nabla G_{\theta_G}(I_n^{LR})$ is the pixel-wise gradient of the reconstructed image, and $(x', y')$ is a pixel point in the reconstructed image, with $x' = 1, 2, \ldots, rW$ and $y' = 1, 2, \ldots, rH$.

The remote sensing images $I_n^{HR}$ and the reconstructed images $G_{\theta_G}(I_n^{LR})$, $n = 1, \ldots, N$, are then used to train the discriminator subnetwork $D_{\theta_D}$; the problem the discriminator subnetwork needs to solve is described by equation (6):

$$\hat{\theta}_D = \arg\max_{\theta_D}\; \mathbb{E}\!\left[\log D_{\theta_D}\!\left(I_n^{HR}\right)\right] + \mathbb{E}\!\left[\log\!\left(1 - D_{\theta_D}\!\left(G_{\theta_G}\!\left(I_n^{LR}\right)\right)\right)\right] \quad (6)$$

where $D_{\theta_D}(I_n^{HR})$ is the probability value output by the discriminator subnetwork when the input is $I_n^{HR}$; $D_{\theta_D}(G_{\theta_G}(I_n^{LR}))$ is the probability value output when the input is the reconstructed image $G_{\theta_G}(I_n^{LR})$; $\mathbb{E}[\cdot]$ denotes expectation; and $\theta_D$ is the set of all weight and bias parameters of the discriminator subnetwork $D_{\theta_D}$.

Solving for the $\theta_G$ and $\theta_D$ that satisfy equations (1) and (6) yields a well-trained generative adversarial network.

After the generative adversarial network is trained, the images $X_1, X_2, \ldots, X_m$ can be input into it to obtain the super-resolution images $S_1, S_2, \ldots, S_m$; note that training the generative adversarial network requires the discriminator subnetwork, i.e., the two subnetworks must cooperate during training.

Step 2.4: input the images $X_1, X_2, \ldots, X_m$ into the trained generative adversarial network; the generator subnetwork then outputs the corresponding super-resolution images $S_1, S_2, \ldots, S_m$.

Using a MATLAB or Python program, the coordinates of the label vectors $Y_1, Y_2, \ldots, Y_m$ are multiplied by 4 to obtain the processed label vectors $K_1, K_2, \ldots, K_m$.

Multiplying the label vectors $Y_1, Y_2, \ldots, Y_m$ by 4 means the following: a label vector is essentially a vector consisting of the four vertex coordinates of a small target's bounding box, so multiplying the coordinates of $Y_1, Y_2, \ldots, Y_m$ by 4 means multiplying each element of each vector by 4, which yields the processed label vectors $K_1, K_2, \ldots, K_m$; the processed label vectors correspond one-to-one with the images $X_1, X_2, \ldots, X_m$. In fact, the resolution of the super-resolution images $S_1, S_2, \ldots, S_m$ can be set to any multiple of the resolution of the images $X_1, X_2, \ldots, X_m$; other multiples are obtained by modifying the parameters of the generative adversarial network, in which case the label vectors $Y_1, Y_2, \ldots, Y_m$ are multiplied by the corresponding factor. In the invention, the resolution of the super-resolution images $S_1, S_2, \ldots, S_m$ is 4 times that of the images $X_1, X_2, \ldots, X_m$.
Other steps and parameters are the same as those in the first or second embodiment.
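A compact PyTorch sketch of the objectives in equations (1) through (6) may help fix the notation. Here `feat` stands for the discriminator feature map $\phi_{i,j}$ (i = 5, j = 4 in the patent), the expectations of equation (6) are approximated by batch means, and the γ weights are illustrative defaults rather than values disclosed by the patent.

```python
import torch

def generator_loss(sr, hr, disc, feat, g1=1.0, g2=1e-3, g3=2e-8):
    # Composite loss l_SR of equations (2)-(5). `sr` is a batch of
    # reconstructed images G(I_LR) in NCHW layout, `hr` the matching batch
    # of I_HR images, and `disc` the discriminator subnetwork.
    # Equation (3): content loss, MSE between feature maps.
    l_content = torch.mean((feat(hr) - feat(sr)) ** 2)
    # Equation (4): adversarial loss, -log D(G(I_LR)).
    l_adv = torch.mean(-torch.log(disc(sr) + 1e-8))
    # Equation (5): total-variation regularization (1-norm of gradients).
    l_reg = (torch.abs(sr[:, :, 1:, :] - sr[:, :, :-1, :]).mean()
             + torch.abs(sr[:, :, :, 1:] - sr[:, :, :, :-1]).mean())
    return g1 * l_content + g2 * l_adv + g3 * l_reg

def discriminator_loss(disc, hr, sr):
    # Equation (6) written as a loss to minimize: the negative of the
    # objective max E[log D(I_HR)] + E[log(1 - D(G(I_LR)))].
    return -(torch.log(disc(hr) + 1e-8).mean()
             + torch.log(1 - disc(sr.detach()) + 1e-8).mean())
```

In practice the two losses are minimized alternately, which is the cooperation between the two subnetworks mentioned above.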
The generator subnetwork $G_{\theta_G}$ mainly adopts a block layout, as shown in FIG. 3b. The core of the network consists of several identical residual blocks, each comprising a convolutional layer, a batch normalization (BN) layer, and a rectified linear unit (ReLU) layer. Specifically, each convolutional layer uses 3 × 3 kernels, each residual block outputs 64-dimensional feature maps, and padding keeps the feature map resolution unchanged during convolution.
The discriminator subnetwork $D_{\theta_D}$ is shown in FIG. 3c. Its backbone is a VGG structure: each convolutional layer ends with a LeakyReLU activation, and a BN layer normalizes the feature map before it is output. The network contains 8 convolutional layers in total; as the network deepens, the convolution stride and the number of filters keep increasing, so the feature map resolution keeps decreasing while its dimension keeps increasing, and the last convolutional layer outputs a 512-dimensional low-resolution feature map. In the output stage, two fully connected layers map the feature map to a probability value, namely the confidence score with which the discriminator distinguishes the super-resolution image from the real image.
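A minimal PyTorch sketch of one such residual block follows (one 3 × 3 convolution, one BN layer, one ReLU, 64 channels, padding preserving resolution). The placement of the skip connection relative to the activation is an assumption of the sketch, since FIG. 3b is not reproduced here.

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    # One residual block of the generator subnetwork: conv(3x3) -> BN -> ReLU
    # on 64-channel feature maps, padding=1 so the resolution is unchanged,
    # with an identity skip connection around the block.
    def __init__(self, channels=64):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return x + self.relu(self.bn(self.conv(x)))
```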
The fourth embodiment: this embodiment is described with reference to FIG. 4a. It differs from the first to third embodiments in that the specific process of step 3 is as follows:
Step 3.1: input the super-resolution images $S_1, S_2, \ldots, S_m$ and their corresponding label vectors $K_1, K_2, \ldots, K_m$ into the region-based deformable fully convolutional network; use the ResNet-101 in this network to extract the feature images of $S_1, S_2, \ldots, S_m$, and use the RPN to output the RoIs of $S_1, S_2, \ldots, S_m$;
following the R-FCN approach, the invention adopts ResNet-101 as the feature extraction network. ResNet-101 consists of 100 convolutional layers followed by a global average pooling layer and a 1000-class fully connected layer; the invention removes the final average pooling and fully connected layers, computes the feature maps with the convolutional layers only, and applies transfer learning: ResNet-101 is first pre-trained on ImageNet to obtain a trained classification network, the classification layer and loss computation are removed so that only the feature extraction part remains, and a randomly initialized 1 × 1 convolutional layer with 1024 filters is appended to reduce the dimension of the 2048-dimensional output.
Step 3.2: map each region of interest (RoI) into the feature images of the super-resolution images $S_1, S_2, \ldots, S_m$ to obtain the mapped feature images;
Step 3.3: pool the mapped feature images to obtain pooled results, then average the pooled results to obtain the target classification result;
specifically, for the $(i, j)$-th bin ($0 \le i, j \le k - 1$), a position-sensitive RoI pooling operation is defined that pools only over the $(i, j)$-th bin:

$$r_c(i, j \mid \Theta) = \frac{1}{n} \sum_{(x, y) \in \mathrm{bin}(i, j)} z_{i, j, c}\!\left(x + x_0,\, y + y_0 \mid \Theta\right)$$

where $r_c(i, j \mid \Theta)$ is the pooled value for the c-th category in the $(i, j)$-th bin, $z_{i,j,c}$ is the score map for the c-th category, $(x_0, y_0)$ is the top-left element of the RoI region, n is the number of elements in the bin, and $\Theta$ denotes all learnable parameters of the network. The $(i, j)$-th bin occupies the region

$$\left\lfloor i \frac{w}{k} \right\rfloor \le x < \left\lceil (i + 1) \frac{w}{k} \right\rceil \quad \text{and} \quad \left\lfloor j \frac{h}{k} \right\rfloor \le y < \left\lceil (j + 1) \frac{h}{k} \right\rceil$$

where w and h are the width and height of each RoI region, which is divided into $k^2$ bins.
Next, the pooled results are averaged to obtain the final classification result: after averaging, each RoI yields a $(C + 1)$-dimensional vector $r_c(\Theta) = \sum_{i,j} r_c(i, j \mid \Theta)$. Its softmax response is then computed:

$$s_c(\Theta) = \frac{e^{r_c(\Theta)}}{\sum_{c'=0}^{C} e^{r_{c'}(\Theta)}}$$

Step 3.4: while extracting the feature images of the super-resolution images $S_1, S_2, \ldots, S_m$, the coordinates of the target bounding boxes are learned at the same time; when the mapped feature images are pooled, a position vector is also generated for each RoI region, and by averaging, the bounding box coordinates and position vectors are aggregated into a 4-dimensional vector $t = (t_x, t_y, t_w, t_h)$, where $t_x$ is the top-left x coordinate of the target detection box, $t_y$ its top-left y coordinate, $t_w$ its width, and $t_h$ its height.
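The position-sensitive pooling of step 3.3 can be made concrete with a short NumPy sketch. The channel ordering of the score maps, the integer RoI coordinates, and the assumption w, h ≥ k (so no bin is empty) are conventions of the sketch, not of the patent.

```python
import numpy as np

def ps_roi_pool(score_maps, roi, k=3, num_classes=1):
    # Position-sensitive RoI pooling: `score_maps` has shape
    # (k*k*(C+1), H, W), one score map z_{i,j,c} per (bin, class) pair;
    # `roi` = (x0, y0, w, h) in integer feature-map coordinates.
    # Returns r_c(Theta) and its softmax response s_c(Theta).
    x0, y0, w, h = roi
    C1 = num_classes + 1
    r = np.zeros(C1)
    for i in range(k):
        for j in range(k):
            xs = slice(x0 + int(np.floor(i * w / k)),
                       x0 + int(np.ceil((i + 1) * w / k)))
            ys = slice(y0 + int(np.floor(j * h / k)),
                       y0 + int(np.ceil((j + 1) * h / k)))
            for c in range(C1):
                z = score_maps[(i * k + j) * C1 + c]  # map for bin (i,j), class c
                r[c] += z[ys, xs].mean()              # pool only inside bin (i,j)
    s = np.exp(r - r.max())
    return r, s / s.sum()
```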
The fifth embodiment: this embodiment is described with reference to FIGs. 4b to 4e. It differs from the first to fourth embodiments in that the convolutions in the region-based deformable fully convolutional network are deformable convolutions. A deformable convolution adds an offset $\Delta p_n$ to each convolution grid point; the modified formula is expressed as follows:

$$y(p_0) = \sum_{p_n \in \mathcal{R}} w(p_n)\, x\!\left(p_0 + p_n + \Delta p_n\right)$$

where $p_0$ is the top-left point of the convolution receptive field, $p_n$ is the relative offset of the other points in the receptive field with respect to the top-left corner, $\Delta p_n$ is the offset learned during convolution, $w(p_n)$ is the convolution kernel weight, $\mathcal{R}$ is the sampling grid, i.e., the convolution receptive field, $y(p_0)$ is the convolution output, and $x(p_0 + p_n + \Delta p_n)$ is the pixel value at point $p_0 + p_n + \Delta p_n$ of the input feature image after the offset is introduced.
Other steps and parameters are the same as in one of the first to fourth embodiments.
The invention replaces the conventional convolution structure with a deformable convolution structure. The conventional convolution structure is defined as follows, where $p_n$ is the offset of each point of the receptive field relative to $p_0$:

$$y(p_0) = \sum_{p_n \in \mathcal{R}} w(p_n)\, x\!\left(p_0 + p_n\right)$$

where $p_0$ is the top-left point of the convolution receptive field and $\mathcal{R}$ is the convolution sampling grid.
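The sketch below evaluates the deformable convolution at a single output location so the roles of $p_0$, $p_n$, and $\Delta p_n$ are explicit; nearest-neighbour sampling is an assumption made for brevity (the real operator interpolates bilinearly so the offsets stay differentiable).

```python
import numpy as np

def deformable_conv_point(x, w, p0, offsets):
    # Output y(p0) of a 3x3 deformable convolution at one location: each
    # grid point p_n is displaced by its learned offset before sampling.
    # `x` is the input feature map, `w` a (3, 3) kernel, and `offsets` a
    # (3, 3, 2) array of learned (dy, dx) displacements.
    y = 0.0
    for dy, dx in np.ndindex(3, 3):                  # p_n over the grid R
        off_y, off_x = offsets[dy, dx]
        py = int(round(p0[0] + dy + off_y))          # p0 + p_n + dp_n
        px = int(round(p0[1] + dx + off_x))
        if 0 <= py < x.shape[0] and 0 <= px < x.shape[1]:
            y += w[dy, dx] * x[py, px]               # w(p_n) * x(...)
    return y
```

Setting all offsets to zero recovers the conventional convolution defined above.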
The sixth embodiment: this embodiment is described with reference to FIG. 5. It differs from the first to fifth embodiments in that the specific process of step 5 is as follows:
rescale the target detection results for the images $M_1, M_2, \ldots, M_m$, then merge the rescaled results according to the position information of $M_1, M_2, \ldots, M_m$ in the original remote sensing image M to obtain the target detection result N.
Other steps and parameters are the same as those in one of the first to fifth embodiments.
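A short sketch of this inversion follows, under the same tiling conventions as the earlier sketches (2× pre-segmentation up-sampling and 4× super-resolution, both adjustable assumptions):

```python
def merge_tile_detections(tile_results, positions, up=2, sr_scale=4):
    # Step 5: invert the preprocessing. `tile_results[i]` holds
    # (x, y, w, h, score) boxes in the coordinates of super-resolution
    # tile M_i; `positions[i]` is that tile's origin in the up-sampled
    # image. All boxes are mapped back to the original image M and pooled.
    merged = []
    for boxes, (ox, oy) in zip(tile_results, positions):
        for (x, y, w, h, score) in boxes:
            f = sr_scale * up
            merged.append(((x / sr_scale + ox) / up,
                           (y / sr_scale + oy) / up,
                           w / f, h / f, score))
    return merged  # a cross-tile NMS pass could follow to remove seam duplicates
```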
The following examples were used to demonstrate the beneficial effects of the present invention:
Example 1:
The resolution enhancement-based remote sensing image small target detection method is carried out according to the following steps:
the experimental data are DOTA datasets of different resolutions derived from publicly available satellite imagery. FIG. 2a shows an image from the DOTA dataset (containing small vehicles) and the corresponding small target detection box annotations after label processing (only small vehicles retained); FIG. 2b shows a group of equal-size image blocks obtained by up-sampling and pre-segmenting a remote sensing image, with the position information in the original image saved at cutting time;
Table 1 shows the single-class detection accuracy (AP) obtained by training and testing three target detection models under different training-sample conditions (remote sensing image datasets before and after super-resolution).
TABLE 1. Single-class detection accuracy (unit: %)
[Table 1 appears as an image in the original document; its values are not recoverable here.]
FIG. 5 shows the result of merging the detection results of the segmented image blocks; this inverts the preprocessing and finally yields the target detection result of the original remote sensing image. Table 2 shows the experimental results of training and testing six target detection models on the super-resolved remote sensing image dataset; the last row is the target detection algorithm used in the invention, reported on three indexes: single-class detection accuracy (AP), training time (train_time), and detection time (test_time). By comparison, the proposed algorithm performs best on all indexes, and its single-class detection accuracy on the DOTA dataset reaches about 80%.
TABLE 2
[Table 2 appears as an image in the original document; its values are not recoverable here.]
The present invention is capable of other embodiments and its several details are capable of modifications in various obvious respects, all without departing from the spirit and scope of the present invention.

Claims (6)

1. A resolution enhancement-based remote sensing image small target detection method, characterized by comprising the following specific steps:
Step 1: given an original remote sensing image X and the label vector Y corresponding to the small targets it contains, where a small target is a target whose pixel count lies in the range (0, 100):
up-sample and pre-segment the original remote sensing image X to obtain a group of equal-size images $X_1, X_2, \ldots, X_m$ and their corresponding label vectors $Y_1, Y_2, \ldots, Y_m$;
Step 2: generate the super-resolution images $S_1, S_2, \ldots, S_m$ corresponding to the images $X_1, X_2, \ldots, X_m$, and from the label vectors $Y_1, Y_2, \ldots, Y_m$ obtain the corresponding label vectors $K_1, K_2, \ldots, K_m$ of the super-resolution images;
Step 3: train a region-based deformable fully convolutional network with the super-resolution images $S_1, S_2, \ldots, S_m$ and their corresponding label vectors $K_1, K_2, \ldots, K_m$ until the set maximum number of iterations is reached, obtaining the trained region-based deformable fully convolutional network;
Step 4: for an original remote sensing image M to be detected, process M through steps 1 and 2 to obtain a group of equal-size super-resolution images $M_1, M_2, \ldots, M_m$;
input the super-resolution images $M_1, M_2, \ldots, M_m$ into the region-based deformable fully convolutional network trained in step 3 to obtain target detection results for the images $M_1, M_2, \ldots, M_m$;
Step 5: merge the target detection results obtained in step 4 to obtain the target detection result N of the original remote sensing image M.
2. The resolution enhancement-based remote sensing image small target detection method according to claim 1, characterized in that the specific process of step 1 is as follows:
up-sample the original remote sensing image X to obtain an up-sampled image; pre-segment the up-sampled image, i.e., divide it into a group of equal-size images $X_1, X_2, \ldots, X_m$, storing the position of each of $X_1, X_2, \ldots, X_m$ in the up-sampled image, where m is the total number of segmented images;
when the image is segmented, segment the label vector Y corresponding to the small targets in the original remote sensing image X as well, and assign the segmented label vectors $Y_1, Y_2, \ldots, Y_m$ to the images $X_1, X_2, \ldots, X_m$.
3. The resolution enhancement-based remote sensing image small target detection method according to claim 2, characterized in that the specific process of step 2 is as follows:
Step 2.1: establish a generator subnetwork $G_{\theta_G}$ and a discriminator subnetwork $D_{\theta_D}$, which together form a generative adversarial network;
Step 2.2: given a group of higher-resolution remote sensing images $I_n^{HR}$, down-sample the remote sensing images $I_n^{HR}$ to obtain a corresponding group of lower-resolution remote sensing images $I_n^{LR}$, where N is the number of remote sensing images in each group;
Step 2.3: train the generator subnetwork $G_{\theta_G}$ with the remote sensing images $I_n^{LR}$; the problem the generator subnetwork needs to solve is described by equation (1):

$$\hat{\theta}_G = \arg\min_{\theta_G} \frac{1}{N} \sum_{n=1}^{N} l^{SR}\!\left(G_{\theta_G}\!\left(I_n^{LR}\right),\, I_n^{HR}\right) \quad (1)$$

where $\theta_G$ is the set of all weight and bias parameters of the generator subnetwork $G_{\theta_G}$, $G_{\theta_G}(I_n^{LR})$ is the reconstructed image output by the generator subnetwork when the remote sensing image $I_n^{LR}$ is input, and $l^{SR}$ is the loss function of the generator subnetwork;
the loss function $l^{SR}$ consists of the following three parts:

$$l^{SR} = \gamma_1\, l^{SR}_{content} + \gamma_2\, l^{SR}_{adv} + \gamma_3\, l^{SR}_{reg} \quad (2)$$

where $l^{SR}_{content}$ is the content loss function with weight parameter $\gamma_1$, $l^{SR}_{adv}$ is the adversarial loss function with weight parameter $\gamma_2$, and $l^{SR}_{reg}$ is the regularization loss function with weight parameter $\gamma_3$;

$$l^{SR}_{content} = \frac{1}{W_{i,j} H_{i,j}} \sum_{x=1}^{W_{i,j}} \sum_{y=1}^{H_{i,j}} \left( \phi_{i,j}\!\left(I_n^{HR}\right)_{x,y} - \phi_{i,j}\!\left(G_{\theta_G}\!\left(I_n^{LR}\right)\right)_{x,y} \right)^{2} \quad (3)$$

where $W_{i,j}$ and $H_{i,j}$ are the width and height of the feature map output by the j-th convolutional layer before the i-th max pooling layer of the discriminator subnetwork $D_{\theta_D}$; $\phi_{i,j}(I_n^{HR})_{x,y}$ is the value of pixel point $(x, y)$ in that feature map when the discriminator input is $I_n^{HR}$, and $\phi_{i,j}(G_{\theta_G}(I_n^{LR}))_{x,y}$ is the value of pixel point $(x, y)$ in that feature map when the input is the reconstructed image $G_{\theta_G}(I_n^{LR})$; $x = 1, 2, \ldots, W_{i,j}$ and $y = 1, 2, \ldots, H_{i,j}$;

$$l^{SR}_{adv} = \sum_{n=1}^{N} -\log D_{\theta_D}\!\left(G_{\theta_G}\!\left(I_n^{LR}\right)\right) \quad (4)$$

where $D_{\theta_D}(G_{\theta_G}(I_n^{LR}))$ is the output of the discriminator subnetwork $D_{\theta_D}$ when the reconstructed image $G_{\theta_G}(I_n^{LR})$ is input;

$$l^{SR}_{reg} = \frac{1}{r^{2} W H} \sum_{x'=1}^{rW} \sum_{y'=1}^{rH} \left\| \nabla G_{\theta_G}\!\left(I_n^{LR}\right)_{x',y'} \right\|_{1} \quad (5)$$

where $\|\cdot\|_1$ denotes the 1-norm, r is the magnification factor of the reconstructed image $G_{\theta_G}(I_n^{LR})$, whose width and height are rW and rH, $\nabla G_{\theta_G}(I_n^{LR})$ is the pixel-wise gradient of the reconstructed image, and $(x', y')$ is a pixel point in the reconstructed image, with $x' = 1, 2, \ldots, rW$ and $y' = 1, 2, \ldots, rH$;
use the remote sensing images $I_n^{HR}$ and the reconstructed images $G_{\theta_G}(I_n^{LR})$ to train the discriminator subnetwork $D_{\theta_D}$; the problem the discriminator subnetwork needs to solve is described by equation (6):

$$\hat{\theta}_D = \arg\max_{\theta_D}\; \mathbb{E}\!\left[\log D_{\theta_D}\!\left(I_n^{HR}\right)\right] + \mathbb{E}\!\left[\log\!\left(1 - D_{\theta_D}\!\left(G_{\theta_G}\!\left(I_n^{LR}\right)\right)\right)\right] \quad (6)$$

where $D_{\theta_D}(I_n^{HR})$ is the probability value output by the discriminator subnetwork when the input is $I_n^{HR}$; $D_{\theta_D}(G_{\theta_G}(I_n^{LR}))$ is the probability value output when the input is the reconstructed image $G_{\theta_G}(I_n^{LR})$; $\mathbb{E}[\cdot]$ denotes expectation; and $\theta_D$ is the set of all weight and bias parameters of the discriminator subnetwork $D_{\theta_D}$;
solving for the $\theta_G$ and $\theta_D$ that satisfy equations (1) and (6) yields a well-trained generative adversarial network;
Step 2.4: input the images $X_1, X_2, \ldots, X_m$ into the trained generative adversarial network; the generator subnetwork then outputs the corresponding super-resolution images $S_1, S_2, \ldots, S_m$;
using a MATLAB or Python program, multiply the coordinates of the label vectors $Y_1, Y_2, \ldots, Y_m$ by 4 to obtain the processed label vectors $K_1, K_2, \ldots, K_m$.
4. The resolution enhancement-based remote sensing image small target detection method is characterized in that the specific process of step 3 is as follows:
Step 3.1: input the super-resolution images $S_1, S_2, \ldots, S_m$ and their corresponding label vectors $K_1, K_2, \ldots, K_m$ into the region-based deformable fully convolutional network; use the ResNet-101 in this network to extract the feature images of $S_1, S_2, \ldots, S_m$, and use the RPN to output the RoIs of $S_1, S_2, \ldots, S_m$;
Step 3.2: map each region of interest (RoI) into the feature images of the super-resolution images $S_1, S_2, \ldots, S_m$ to obtain the mapped feature images;
Step 3.3: pool the mapped feature images to obtain pooled results, then average the pooled results to obtain the target classification result;
Step 3.4: while extracting the feature images of the super-resolution images $S_1, S_2, \ldots, S_m$, learn the coordinates of the target bounding boxes at the same time; when the mapped feature images are pooled, also generate a position vector for each RoI region, and by averaging, aggregate the bounding box coordinates and position vectors into a 4-dimensional vector $t = (t_x, t_y, t_w, t_h)$, where $t_x$ is the top-left x coordinate of the target detection box, $t_y$ its top-left y coordinate, $t_w$ its width, and $t_h$ its height.
5. The resolution enhancement-based remote sensing image small target detection method according to claim 4, characterized in that the convolutions in the region-based deformable fully convolutional network are deformable convolutions; a deformable convolution adds an offset $\Delta p_n$ to each convolution grid point, and the modified formula is expressed as follows:

$$y(p_0) = \sum_{p_n \in \mathcal{R}} w(p_n)\, x\!\left(p_0 + p_n + \Delta p_n\right)$$

where $p_0$ is the top-left point of the convolution receptive field, $p_n$ is the relative offset of the other points in the receptive field with respect to the top-left corner, $\Delta p_n$ is the offset learned during convolution, $w(p_n)$ is the weight, $\mathcal{R}$ is the convolution receptive field, $y(p_0)$ is the convolution output, and $x(p_0 + p_n + \Delta p_n)$ is the pixel value at point $p_0 + p_n + \Delta p_n$ of the input feature image after the offset is introduced.
6. The resolution enhancement-based remote sensing image small target detection method according to claim 5, characterized in that the specific process of step 5 is as follows:
rescale the target detection results for the images $M_1, M_2, \ldots, M_m$, then merge the rescaled results according to the position information of $M_1, M_2, \ldots, M_m$ in the original remote sensing image M to obtain the target detection result N.
CN202010444356.XA 2020-05-22 2020-05-22 Resolution enhancement-based remote sensing image small target detection method Active CN111709307B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010444356.XA CN111709307B (en) 2020-05-22 2020-05-22 Resolution enhancement-based remote sensing image small target detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010444356.XA CN111709307B (en) 2020-05-22 2020-05-22 Resolution enhancement-based remote sensing image small target detection method

Publications (2)

Publication Number Publication Date
CN111709307A (en) 2020-09-25
CN111709307B CN111709307B (en) 2022-08-30

Family

ID=72537713

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010444356.XA Active CN111709307B (en) 2020-05-22 2020-05-22 Resolution enhancement-based remote sensing image small target detection method

Country Status (1)

Country Link
CN (1) CN111709307B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113420745A (en) * 2021-08-25 2021-09-21 江西中业智能科技有限公司 Image-based target identification method, system, storage medium and terminal equipment
CN114663671A (en) * 2022-02-21 2022-06-24 佳都科技集团股份有限公司 Target detection method, device, equipment and storage medium
CN115953453A (en) * 2023-03-03 2023-04-11 国网吉林省电力有限公司信息通信公司 Transformer substation geological deformation monitoring method based on image dislocation analysis and Beidou satellite
CN115984846A (en) * 2023-02-06 2023-04-18 山东省人工智能研究院 Intelligent identification method for small target in high-resolution image based on deep learning

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170132472A1 (en) * 2015-11-05 2017-05-11 Qualcomm Incorporated Generic mapping for tracking target object in video sequence
CN108427920A (en) * 2018-02-26 2018-08-21 杭州电子科技大学 A kind of land and sea border defense object detection method based on deep learning
CN108510467A (en) * 2018-03-28 2018-09-07 西安电子科技大学 SAR image target recognition method based on variable depth shape convolutional neural networks
CN108596101A (en) * 2018-04-25 2018-09-28 上海交通大学 A kind of remote sensing images multi-target detection method based on convolutional neural networks
CN109299688A (en) * 2018-09-19 2019-02-01 厦门大学 Ship Detection based on deformable fast convolution neural network
CN110197255A (en) * 2019-04-29 2019-09-03 杰创智能科技股份有限公司 A kind of deformable convolutional network based on deep learning
CN110458166A (en) * 2019-08-19 2019-11-15 广东工业大学 A kind of hazardous material detection method, device and equipment based on deformable convolution
CN110728658A (en) * 2019-09-16 2020-01-24 武汉大学 High-resolution remote sensing image weak target detection method based on deep learning
CN111126385A (en) * 2019-12-13 2020-05-08 哈尔滨工程大学 Deep learning intelligent identification method for deformable living body small target

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170132472A1 (en) * 2015-11-05 2017-05-11 Qualcomm Incorporated Generic mapping for tracking target object in video sequence
CN108427920A (en) * 2018-02-26 2018-08-21 杭州电子科技大学 A kind of land and sea border defense object detection method based on deep learning
CN108510467A (en) * 2018-03-28 2018-09-07 西安电子科技大学 SAR image target recognition method based on variable depth shape convolutional neural networks
CN108596101A (en) * 2018-04-25 2018-09-28 上海交通大学 A kind of remote sensing images multi-target detection method based on convolutional neural networks
CN109299688A (en) * 2018-09-19 2019-02-01 厦门大学 Ship Detection based on deformable fast convolution neural network
CN110197255A (en) * 2019-04-29 2019-09-03 杰创智能科技股份有限公司 A kind of deformable convolutional network based on deep learning
CN110458166A (en) * 2019-08-19 2019-11-15 广东工业大学 A kind of hazardous material detection method, device and equipment based on deformable convolution
CN110728658A (en) * 2019-09-16 2020-01-24 武汉大学 High-resolution remote sensing image weak target detection method based on deep learning
CN111126385A (en) * 2019-12-13 2020-05-08 哈尔滨工程大学 Deep learning intelligent identification method for deformable living body small target

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
GAO Xin et al.: "Vehicle detection method for dense regions of remote sensing images based on deformable convolutional neural networks" (基于可变形卷积神经网络的遥感影像密集区域车辆检测方法), Journal of Electronics & Information Technology (电子与信息学报) *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113420745A (en) * 2021-08-25 2021-09-21 江西中业智能科技有限公司 Image-based target identification method, system, storage medium and terminal equipment
CN114663671A (en) * 2022-02-21 2022-06-24 佳都科技集团股份有限公司 Target detection method, device, equipment and storage medium
CN114663671B (en) * 2022-02-21 2023-07-18 佳都科技集团股份有限公司 Target detection method, device, equipment and storage medium
CN115984846A (en) * 2023-02-06 2023-04-18 山东省人工智能研究院 Intelligent identification method for small target in high-resolution image based on deep learning
CN115984846B (en) * 2023-02-06 2023-10-10 山东省人工智能研究院 Intelligent recognition method for small targets in high-resolution image based on deep learning
CN115953453A (en) * 2023-03-03 2023-04-11 国网吉林省电力有限公司信息通信公司 Transformer substation geological deformation monitoring method based on image dislocation analysis and Beidou satellite
CN115953453B (en) * 2023-03-03 2023-08-15 国网吉林省电力有限公司信息通信公司 Substation geological deformation monitoring method based on image dislocation analysis and Beidou satellite

Also Published As

Publication number Publication date
CN111709307B (en) 2022-08-30

Similar Documents

Publication Publication Date Title
CN111709307B (en) Resolution enhancement-based remote sensing image small target detection method
CN107330439B (en) Method for determining posture of object in image, client and server
CN111524135B (en) Method and system for detecting defects of tiny hardware fittings of power transmission line based on image enhancement
Zhou et al. Scale adaptive image cropping for UAV object detection
CN110517306B (en) Binocular depth vision estimation method and system based on deep learning
CN111582339B (en) Vehicle detection and recognition method based on deep learning
CN111368769A (en) Ship multi-target detection method based on improved anchor point frame generation model
CN107516322A (en) A kind of image object size based on logarithm pole space and rotation estimation computational methods
CN109712149B (en) Image segmentation method based on wavelet energy and fuzzy C-means
CN111310609B (en) Video target detection method based on time sequence information and local feature similarity
CN112418165B (en) Small-size target detection method and device based on improved cascade neural network
CN113516693B (en) Rapid and universal image registration method
CN110633640A (en) Method for identifying complex scene by optimizing PointNet
CN113052057A (en) Traffic sign identification method based on improved convolutional neural network
CN112883971A (en) SAR image ship target detection method based on deep learning
CN114299405A (en) Unmanned aerial vehicle image real-time target detection method
CN114677479A (en) Natural landscape multi-view three-dimensional reconstruction method based on deep learning
CN114519819B (en) Remote sensing image target detection method based on global context awareness
Gao et al. Bayesian image super-resolution with deep modeling of image statistics
CN114782417A (en) Real-time detection method for digital twin characteristics of fan based on edge enhanced image segmentation
CN113657225B (en) Target detection method
CN111292308A (en) Convolutional neural network-based infrared defect detection method for photovoltaic solar panel
CN114119621A (en) SAR remote sensing image water area segmentation method based on depth coding and decoding fusion network
CN117011648A (en) Haptic image dataset expansion method and device based on single real sample
CN117392545A (en) SAR image target detection method based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant