CN111709307B - Resolution enhancement-based remote sensing image small target detection method - Google Patents
- Publication number: CN111709307B
- Application number: CN202010444356.XA
- Authority: CN (China)
- Prior art keywords: image, remote sensing, resolution, sensing image, super
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06V20/13 — Scenes; terrestrial scenes; satellite images
- G06F18/2411 — Pattern recognition; classification based on the proximity to a decision surface, e.g. support vector machines
- G06N3/045 — Neural networks; combinations of networks
- G06N3/08 — Neural networks; learning methods
Abstract
A resolution enhancement-based remote sensing image small target detection method, belonging to the technical field of target detection in remote sensing images. The method addresses the poor performance of existing methods on small targets in remote sensing images, which stems from the scarce feature information available for such targets and from the geometric deformation of small target regions. The invention applies super-resolution processing to a remote sensing image containing small targets before performing target detection, thereby extending the applicability of deep learning detection models and making deeper use of high-spatial-resolution remote sensing imagery. To counter the scarce feature information and geometric deformation of small targets, the method enriches their detailed feature information through super-resolution processing and fully exploits their limited features with a region-based deformable convolutional network, improving the detection of small targets in remote sensing images. The method can be applied to small target detection in remote sensing images.
Description
Technical Field
The invention belongs to the technical field of target detection in remote sensing images, and particularly relates to a method for detecting a small target in a remote sensing image.
Background
The spatial resolution of an optical remote sensing image depends mainly on the satellite's instantaneous field of view and its distance from the Earth's surface, so image quality is largely determined by satellite performance. With the continued development of remote sensing technology, more and more remote sensing satellites (e.g. the Worldview series and the domestic GF series) have been developed and launched, yielding imagery of ever higher spatial resolution. Small targets in such images (e.g. small vehicles) now carry richer texture feature information, which makes it feasible to attack the small target detection problem in remote sensing images with deep learning methods.
Existing small target detection in remote sensing images faces two main difficulties:
First, although the spatial resolution of remote sensing images keeps increasing, a small target (such as a small vehicle) still occupies very few pixels, so very little feature information is available; directly applying algorithms designed for normal-sized targets yields unsatisfactory detection results.
Second, because the position and motion state of the imaging satellite change continuously during acquisition, small targets in optical remote sensing images undergo geometric deformation. Since the pixel information in a small target region is very limited, even slight geometric deformation strongly affects the detection result.
Disclosure of Invention
The invention aims to solve the poor small target detection performance of existing methods, caused by the scarce available feature information of small targets in remote sensing images and the geometric deformation of small target regions, and to this end provides a resolution enhancement-based remote sensing image small target detection method.
The technical scheme adopted by the invention for solving the technical problems is as follows: a remote sensing image small target detection method based on resolution enhancement comprises the following specific processes:
Step one, giving an original remote sensing image X and the label vector Y corresponding to the small targets it contains, where a small target is one whose pixel count lies in the range (0, 100);
the original remote sensing image X is up-sampled and pre-segmented to obtain a group of equally sized images X_1, X_2, ..., X_m and their corresponding label vectors Y_1, Y_2, ..., Y_m;
Step two, super-resolution images S_1, S_2, ..., S_m corresponding to X_1, X_2, ..., X_m are generated respectively, and from the label vectors Y_1, Y_2, ..., Y_m the label vectors K_1, K_2, ..., K_m corresponding to S_1, S_2, ..., S_m are obtained;
Step three, the region-based deformable fully convolutional network is trained with the super-resolution images S_1, S_2, ..., S_m and their corresponding label vectors K_1, K_2, ..., K_m until the set maximum number of iterations is reached, yielding the trained region-based deformable fully convolutional network;
Step four, the original remote sensing image M to be detected is processed as in steps one and two to obtain a group of equally sized super-resolution images M_1, M_2, ..., M_m;
the super-resolution images M_1, M_2, ..., M_m are input into the region-based deformable fully convolutional network trained in step three, to obtain target detection results for M_1, M_2, ..., M_m;
Step five, the target detection results obtained in step four are merged to obtain the target detection result N of the original remote sensing image M.
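As an illustration only, the five steps can be sketched as a small orchestration function. Every component name below (tiler, sr_model, detector, merger) is a hypothetical placeholder standing in for the patent's modules (up-sampling/pre-segmentation, GAN super-resolver, region-based deformable fully convolutional detector, result merger), not actual code of the invention.

```python
def run_pipeline(image, tiler, sr_model, detector, merger):
    """Sketch of steps one to five; each stage is a pluggable callable."""
    tiles, positions = tiler(image)               # step one: up-sample + pre-segment
    sr_tiles = [sr_model(t) for t in tiles]       # step two: super-resolution (GAN)
    detections = [detector(t) for t in sr_tiles]  # steps three/four: per-tile detection
    return merger(detections, positions)          # step five: merged result N
```

With stub callables the function simply threads data through the stages, which is all the sketch is meant to show.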
The beneficial effects of the invention are as follows: aiming at the scarce feature information and geometric deformation of small targets in remote sensing images, the proposed resolution enhancement-based detection method enriches the detailed feature information of small targets through super-resolution processing and fully exploits their limited features with a region-based deformable convolutional network, improving both the detection capability and the detection effect for small targets in remote sensing images.
To verify the performance of the proposed method, experiments were conducted on DOTA data sets sourced from the China Centre for Resources Satellite Data and Application, Google Earth, the JL-1 satellite, the GF-2 satellite, and others. The experimental results verify the effectiveness of the super-resolution-based remote sensing image small target detection algorithm. The experimental data set was randomly split into training, validation, and test sets in a 2:1:1 ratio, and the single-class detection precision reached about 80%.
Drawings
FIG. 1 is a schematic flow chart of an implementation of the present invention;
FIG. 2a is a schematic diagram showing an image (including a small vehicle) in a DOTA dataset and a corresponding label of a small target detection box after label processing (only the small vehicle is reserved);
FIG. 2b is a schematic diagram of a set of image blocks of the same size obtained from a remote sensing image after upsampling and pre-segmentation processing;
FIG. 3a is a basic schematic diagram of a generative adversarial network;
FIG. 3b is a block diagram of the network structure of the generative model G (generator subnetwork) corresponding to FIG. 3a;
the method mainly adopts block layout, the core part of the network consists of a plurality of same residual blocks, and two deconvolution operations with the step length of 0.5 are used for improving the resolution of an output image at the output stage of the network;
n64 represents the number of convolution kernel filters, i.e., the dimension of the output feature map, and s represents the convolution step size;
FIG. 3c is a block diagram of the network structure of the discriminative model D (discriminator subnetwork) corresponding to FIG. 3a;
the main network structure is VGG, the whole network comprises 8 convolutional layers, and two full-connection layers are used for mapping the characteristic graph into a probability value in the output stage of the network, wherein the probability value is the confidence score of whether the discriminator can discriminate the super-resolution image and the real image;
FIG. 3d shows the visual change of the remote sensing images before and after super resolution processing, where the left side is the original remote sensing image before super resolution processing and the right side is the super resolution remote sensing image after super resolution processing;
FIG. 4a illustrates a training network framework diagram of a region-based deformable convolutional network;
FIG. 4b shows a shape diagram of a normal convolution kernel;
FIG. 4c illustrates a shape diagram of a deformable convolution kernel;
they are obtained by adding a displacement (arrow) to the normal sample coordinates, indicating that the deformable convolution kernel can fit the severe deformation of the target;
FIG. 4d shows a special case of a deformable convolution as a scale transform;
FIG. 4e shows a special case of a deformable convolution as a rotational transformation;
fig. 5 is a result diagram obtained by combining the detection results of the segmented image blocks, and is an inversion of the preprocessing process to finally obtain the target detection result of the original remote sensing image.
Detailed Description
The first embodiment is as follows: this embodiment will be described with reference to fig. 1. The resolution enhancement-based remote sensing image small target detection method in the embodiment comprises the following specific processes:
Step one, giving an original remote sensing image X to be trained and the label vector Y corresponding to the small targets it contains, where a small target is one whose pixel count lies in the range (0, 100);
the original remote sensing image X is up-sampled and pre-segmented to obtain a group of equally sized images X_1, X_2, ..., X_m and their corresponding label vectors Y_1, Y_2, ..., Y_m;
Step two, super-resolution images S_1, S_2, ..., S_m corresponding to X_1, X_2, ..., X_m are generated respectively, and from the label vectors Y_1, Y_2, ..., Y_m the label vectors K_1, K_2, ..., K_m corresponding to S_1, S_2, ..., S_m are obtained;
Step three, the region-based deformable fully convolutional network is trained with the super-resolution images S_1, S_2, ..., S_m and their corresponding label vectors K_1, K_2, ..., K_m until the set maximum number of iterations is reached, yielding the trained region-based deformable fully convolutional network;
the region-based deformable fully convolutional network comprises ResNet-101, a convolutional layer, an RPN, an RoI pooling layer, and a softmax classifier;
the super-resolution images S_1, S_2, ..., S_m pass through ResNet-101 to extract feature images, whose dimension is then reduced by the convolutional layer to obtain dimension-reduced feature images; at the same time, the RPN outputs regions of interest (RoIs) for S_1, S_2, ..., S_m, which are mapped onto the dimension-reduced feature images; the RoI pooling layer pools the mapped images, the pooling results are averaged, and the averages are input into the softmax classifier to obtain the target classification result;
Step four, the original remote sensing image M to be detected is processed as in steps one and two to obtain a group of equally sized super-resolution images M_1, M_2, ..., M_m;
the super-resolution images M_1, M_2, ..., M_m are input into the region-based deformable fully convolutional network trained in step three, to obtain target detection results for M_1, M_2, ..., M_m;
Step five, the target detection results obtained in step four are merged to obtain the target detection result N of the original remote sensing image M.
The detection results for the images M_1, M_2, ..., M_m are inverted back into detection results in the original image M.
The invention performs super-resolution processing on remote sensing images containing small targets (such as small vehicles) before target detection, which extends the applicability of deep learning detection models and makes deeper use of high-spatial-resolution remote sensing imagery.
The second embodiment is as follows: this embodiment will be described with reference to fig. 2a and 2 b. The first difference between the present embodiment and the specific embodiment is: the specific process of the step one is as follows:
The original remote sensing image X is up-sampled (with a small adjustment range) to obtain an up-sampled image; the up-sampled image is then pre-segmented, i.e. divided into a group of equally sized images X_1, X_2, ..., X_m. During segmentation, the position of each of X_1, X_2, ..., X_m within the up-sampled image is stored; m denotes the total number of segmented images.
When the image is segmented, the label vector Y corresponding to the small targets contained in the original remote sensing image X is segmented likewise, assigning to the images X_1, X_2, ..., X_m their corresponding segmented label vectors Y_1, Y_2, ..., Y_m. That is, segmenting the label vector Y yields the segmented label vectors Y_1, Y_2, ..., Y_m.
Other steps and parameters are the same as those in the first embodiment.
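A minimal sketch of the up-sampling and pre-segmentation step, assuming nearest-neighbour up-sampling and non-overlapping equal tiles; the function names and the interpolation choice are illustrative assumptions, not the patent's implementation.

```python
import numpy as np

def upsample_nn(img, scale=2):
    # Nearest-neighbour up-sampling; the patent only says the adjustment
    # range is small, so the interpolation method here is an assumption.
    return np.kron(img, np.ones((scale, scale), dtype=img.dtype))

def pre_segment(img, tile_h, tile_w):
    """Divide an image into equally sized tiles X_1..X_m, storing each
    tile's top-left position in the up-sampled image."""
    H, W = img.shape[:2]
    tiles, positions = [], []
    for y in range(0, H - tile_h + 1, tile_h):
        for x in range(0, W - tile_w + 1, tile_w):
            tiles.append(img[y:y + tile_h, x:x + tile_w])
            positions.append((y, x))
    return tiles, positions
```

The stored positions are exactly what step five later needs to invert the preprocessing.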
The third concrete implementation mode: this embodiment will be described with reference to fig. 3a, 3b, 3c and 3 d. The present embodiment differs from the first or second embodiment in that: the specific process of the second step is as follows:
Step 2.1: a generator subnetwork G_{θ_G} and a discriminator subnetwork D_{θ_D} are established, forming a generative adversarial network;
Step 2.2: given a group of higher-resolution remote sensing images I_n^{HR}, n = 1, ..., N, the images I_n^{HR} are down-sampled to obtain a corresponding group of lower-resolution remote sensing images I_n^{LR}, n = 1, ..., N, where N is the number of remote sensing images in each group;
here, "higher resolution" means the resolution of I_n^{HR} is higher than that of I_n^{LR}, and "lower resolution" means the resolution of I_n^{LR} is lower than that of I_n^{HR}.
Step 2.3: the generator subnetwork G_{θ_G} is trained with the remote sensing images I_n^{LR}; the problem the generator subnetwork must solve is described by equation (1):

  θ̂_G = arg min_{θ_G} (1/N) Σ_{n=1}^{N} l^{SR}( G_{θ_G}(I_n^{LR}), I_n^{HR} )   (1)

where θ_G = {W_{1:L}; b_{1:L}} is the set of all weight and bias parameters of the generator subnetwork G_{θ_G}; G_{θ_G}(I_n^{LR}) is the reconstructed image output by the generator subnetwork when I_n^{LR} is its input; and l^{SR} is the loss function of the generator subnetwork;
the loss function l^{SR} consists of the following three parts:

  l^{SR} = γ_1 l_X^{SR} + γ_2 l_{Gen}^{SR} + γ_3 l_{TV}^{SR}   (2)

where l_X^{SR} is the content loss function with weight parameter γ_1, l_{Gen}^{SR} is the adversarial loss function with weight parameter γ_2, and l_{TV}^{SR} is the regularization loss function with weight parameter γ_3;
the content loss is

  l_X^{SR} = (1 / (W_{i,j} H_{i,j})) Σ_{x=1}^{W_{i,j}} Σ_{y=1}^{H_{i,j}} ( φ_{i,j}(I^{HR})_{x,y} − φ_{i,j}( G_{θ_G}(I^{LR}) )_{x,y} )²   (3)

where W_{i,j} and H_{i,j} are the width and height of the feature map output by the j-th convolutional layer before the i-th max-pooling layer of the discriminator subnetwork D_{θ_D} (i = 5, j = 4 in the present invention; in practice they may be chosen arbitrarily); φ_{i,j}(I^{HR})_{x,y} is the value of pixel (x, y) in that feature map when the discriminator input is I^{HR}, and φ_{i,j}(G_{θ_G}(I^{LR}))_{x,y} is the corresponding value when the input is the reconstructed image G_{θ_G}(I^{LR}); x = 1, 2, ..., W_{i,j}, y = 1, 2, ..., H_{i,j};
the adversarial loss is

  l_{Gen}^{SR} = Σ_{n=1}^{N} − log D_{θ_D}( G_{θ_G}(I_n^{LR}) )   (4)

where D_{θ_D}(G_{θ_G}(I_n^{LR})) is the output of the discriminator subnetwork D_{θ_D} when the reconstructed image G_{θ_G}(I_n^{LR}) is its input;
the regularization loss is a total-variation term

  l_{TV}^{SR} = (1 / (r² W H)) Σ_{x'=1}^{rW} Σ_{y'=1}^{rH} || ∇ G_{θ_G}(I^{LR})_{x',y'} ||   (5)

where || · || denotes the 1-norm, r is the up-scaling factor of the reconstructed image G_{θ_G}(I^{LR}), W and H are the width and height of the low-resolution image (so the reconstructed image has size rW × rH), ∇ G_{θ_G}(I^{LR})_{x',y'} denotes the pixel-wise gradient of the reconstructed image, and (x', y') are pixels of the reconstructed image with x' = 1, 2, ..., rW and y' = 1, 2, ..., rH;
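The total-variation regularizer can be sketched numerically. This is a simplified single-channel version using forward differences, with the normalization reduced to the image area (an assumption: the scaling factor is folded into the pixel count once the image is already reconstructed).

```python
import numpy as np

def tv_loss(img):
    """1-norm total variation of a 2-D image, averaged over its pixels:
    sums |vertical gradient| + |horizontal gradient| at every pixel."""
    dy = np.abs(np.diff(img, axis=0)).sum()  # vertical differences
    dx = np.abs(np.diff(img, axis=1)).sum()  # horizontal differences
    h, w = img.shape
    return (dx + dy) / (h * w)
```

A flat image scores zero, so minimizing this term pushes the generator toward spatially smooth reconstructions.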
the discriminator subnetwork D_{θ_D} is trained with the remote sensing images I_n^{HR} and the reconstructed images G_{θ_G}(I_n^{LR}), n = 1, ..., N; the problem the discriminator subnetwork must solve is described by equation (6):

  θ̂_D = arg max_{θ_D}  E_{I^{HR}} [ log D_{θ_D}(I^{HR}) ] + E_{I^{LR}} [ log( 1 − D_{θ_D}( G_{θ_G}(I^{LR}) ) ) ]   (6)

where D_{θ_D}(I^{HR}) is the probability value output by the discriminator subnetwork when its input is I^{HR}; D_{θ_D}(G_{θ_G}(I^{LR})) is the probability value output when the input is the reconstructed image G_{θ_G}(I^{LR}); E[·] denotes expectation; and θ_D is the set of all weight and bias parameters of the discriminator subnetwork D_{θ_D};
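For a single real/reconstructed pair, the two adversarial objectives reduce to simple log losses. The sketch below omits the batch/expectation handling of a real trainer and is illustrative only.

```python
import math

def discriminator_loss(d_real, d_fake):
    """Negated objective of equation (6) for one (real, fake) pair:
    the discriminator maximizes log D(I_HR) + log(1 - D(G(I_LR)))."""
    return -(math.log(d_real) + math.log(1.0 - d_fake))

def generator_adv_loss(d_fake):
    """One term of the generator's adversarial loss: -log D(G(I_LR))."""
    return -math.log(d_fake)
```

Both losses shrink as the respective network gets what it wants: the discriminator as D(real) → 1 and D(fake) → 0, the generator as D(fake) → 1.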
Solving for the θ_G and θ_D satisfying equations (1) and (6) yields a well-trained generative adversarial network;
after the generative adversarial network is trained, inputting the images X_1, X_2, ..., X_m into it yields the super-resolution images S_1, S_2, ..., S_m. Training the generative adversarial network requires the discriminator subnetwork; the two subnetworks must cooperate during training;
Step 2.4: the images X_1, X_2, ..., X_m are input into the trained generative adversarial network, and the generator subnetwork outputs the corresponding super-resolution images S_1, S_2, ..., S_m;
the coordinates in the label vectors Y_1, Y_2, ..., Y_m are multiplied by 4 (e.g. with a MATLAB or Python script) to obtain the processed label vectors K_1, K_2, ..., K_m.
Multiplying the label vectors Y_1, Y_2, ..., Y_m by 4 means the following: a label vector is essentially a vector of the four vertex coordinates of a small target's bounding box, and each element of the vector is multiplied by 4, giving the processed label vectors K_1, K_2, ..., K_m, which correspond to the images X_1, X_2, ..., X_m. In fact, the resolution of the super-resolution images S_1, S_2, ..., S_m can be set to any multiple of that of X_1, X_2, ..., X_m by modifying the parameters of the generative adversarial network, with the label vectors Y_1, Y_2, ..., Y_m multiplied by the corresponding factor; in the present invention, the resolution of S_1, S_2, ..., S_m is 4 times that of X_1, X_2, ..., X_m.
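The coordinate scaling is trivial to express in code. In this sketch each label is taken to be the four (x, y) vertices of a bounding box flattened into eight numbers; the exact storage layout is an assumption.

```python
def scale_labels(labels, factor=4):
    """Multiply every vertex coordinate of every box label by the
    super-resolution factor (4x in the patent)."""
    return [[coord * factor for coord in box] for box in labels]
```

Any other super-resolution multiple just swaps the `factor` argument.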
Other steps and parameters are the same as those in the first or second embodiment.
The generator subnetwork G_{θ_G} is shown in FIG. 3b. The generator subnetwork G mainly adopts a block layout: the core of the network consists of several identical residual blocks, each composed of a convolutional layer, a batch normalization (BN) layer, and a rectified linear unit (ReLU) layer. Specifically, every convolutional layer uses 3 × 3 kernels, every residual block outputs a 64-dimensional feature map, and padding keeps the feature-map resolution unchanged through the convolutions. In the output stage of the network, two deconvolution operations with a step size of 0.5 are used to increase the resolution of the output image.
The discriminator subnetwork D_{θ_D} is shown in FIG. 3c. The main network structure of the discriminator subnetwork D is VGG: each convolutional layer ends with a LeakyReLU activation, and a BN layer normalizes the feature map before it is output. The whole network contains 8 convolutional layers; as the network deepens, the stride and the number of filters keep increasing, so the feature-map resolution keeps decreasing while its dimension keeps growing, and the last convolutional layer outputs a 512-dimensional low-resolution feature map. In the output stage of the network, two fully connected layers map the feature map to a probability value, which is the confidence score with which the discriminator distinguishes a super-resolution image from a real image.
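To make the "resolution shrinks, dimension grows" behaviour concrete, the following traces feature-map shapes through the 8 convolutional layers. The SRGAN-style stride/filter pattern used here is an assumption, since the patent does not list the strides explicitly.

```python
def discriminator_shapes(h, w):
    """Return the (H, W, C) shape after each of the 8 conv layers,
    assuming filters double every two layers and stride alternates 1/2."""
    channels = [64, 64, 128, 128, 256, 256, 512, 512]
    strides = [1, 2, 1, 2, 1, 2, 1, 2]
    shapes = []
    for c, s in zip(channels, strides):
        h, w = -(-h // s), -(-w // s)  # ceil division: 'same'-padded stride-s conv
        shapes.append((h, w, c))
    return shapes
```

For a 96 × 96 input the last layer yields a 6 × 6 × 512 map, consistent with the 512-dimensional low-resolution feature map described above.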
The fourth concrete implementation mode: this embodiment will be described with reference to fig. 4 a. The difference between this embodiment mode and one of the first to third embodiment modes is: the specific process of the third step is as follows:
Step 3.1: the super-resolution images S_1, S_2, ..., S_m and their corresponding label vectors K_1, K_2, ..., K_m are input into the region-based deformable fully convolutional network; within it, ResNet-101 extracts the feature images of S_1, S_2, ..., S_m, and the RPN outputs the RoIs of S_1, S_2, ..., S_m;
following the R-FCN approach, the invention adopts ResNet-101 as the feature extraction network. ResNet-101 has 100 leading convolutional layers followed by a global average pooling layer and a 1000-class fully connected layer; the invention removes the trailing average pooling and fully connected layers and computes feature maps with the convolutional layers only. A transfer learning strategy is used: ResNet-101 is first pre-trained on ImageNet to obtain a trained classification network, its classification layer and loss computation are removed, only the feature extraction part is kept, and a randomly initialized 1024 × 1 × 1 convolutional layer is inserted into the network to reduce the dimension of the 2048-dimensional convolutional output.
Step 3.2: the RoIs are mapped onto the feature images of S_1, S_2, ..., S_m to obtain the mapped feature images;
Step 3.3: the mapped feature images are pooled to obtain pooling results, and the pooling results are averaged to obtain the target classification results;
for example, for the (i, j)-th bin (0 ≤ i, j ≤ k − 1), a position-sensitive RoI pooling operation is defined that pools only over the (i, j)-th bin:

  r_c(i, j | Θ) = (1/n) Σ_{(x,y) ∈ bin(i,j)} z_{i,j,c}(x + x_0, y + y_0 | Θ)

where r_c(i, j | Θ) is the pooled response of the c-th category in the (i, j)-th bin, z_{i,j,c} is the score map for the c-th category, (x_0, y_0) is the top-left element of the RoI region, n is the number of elements in the bin, and Θ denotes all learnable parameters of the network. The (i, j)-th bin spans ⌊i w/k⌋ ≤ x < ⌈(i + 1) w/k⌉ and ⌊j h/k⌋ ≤ y < ⌈(j + 1) h/k⌉, where w and h are the width and height of each RoI region, which is divided into k² parts.
The pooled results are then averaged to obtain the final classification result: after averaging (voting), each RoI produces a (C + 1)-dimensional vector r_c(Θ) = Σ_{i,j} r_c(i, j | Θ). Its softmax response is then computed:

  s_c(Θ) = e^{r_c(Θ)} / Σ_{c'=0}^{C} e^{r_{c'}(Θ)}
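A numpy sketch of position-sensitive RoI pooling for the simplified case of a single RoI covering the whole score map; the function name and the uniform bin splitting are illustrative assumptions.

```python
import numpy as np

def psroi_softmax(score_maps, k, num_classes):
    """score_maps: array of shape (k*k*(C+1), H, W). Bin (i, j) of class c
    averages only its own dedicated map over the (i, j)-th spatial bin;
    the k*k bin responses are summed (voted) and passed through softmax."""
    _, H, W = score_maps.shape
    maps = score_maps.reshape(k, k, num_classes + 1, H, W)
    r = np.zeros(num_classes + 1)
    for i in range(k):
        for j in range(k):
            ys = slice(i * H // k, (i + 1) * H // k)
            xs = slice(j * W // k, (j + 1) * W // k)
            r += maps[i, j][:, ys, xs].mean(axis=(1, 2))
    e = np.exp(r - r.max())  # numerically stable softmax
    return e / e.sum()       # s_c(Theta)
```

Because each bin reads a different score map, the pooled vector stays sensitive to where within the RoI each class response occurs, which is the point of the construction.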
Step 3.4: while the feature images of the super-resolution images S_1, S_2, ..., S_m are extracted, the bounding-box coordinates of the targets are learned simultaneously; during pooling of the mapped feature images, a position vector is generated for each RoI region, and through the averaging operation the bounding-box coordinates and position vectors are aggregated into a 4-dimensional vector t = (t_x, t_y, t_w, t_h), where t_x is the x coordinate of the top-left corner of the target detection box, t_y is its y coordinate, t_w is the width of the target detection box, and t_h is its height.
The fifth concrete implementation mode: this embodiment will be described with reference to FIGS. 4b to 4e. The difference between this embodiment and one of the first to fourth embodiments is: the convolutions in the region-based deformable fully convolutional network are deformable convolutions, which add an offset Δp_n at each convolution grid point; the modified formula is expressed as follows:

  y(p_0) = Σ_{p_n ∈ R} w(p_n) · x(p_0 + p_n + Δp_n)

where p_0 is the point at the top-left corner of the convolution receptive field, p_n is the relative offset of the other points in the receptive field with respect to the top-left corner, Δp_n is the offset learned during convolution, w(p_n) is the convolution kernel weight at p_n, R is the sampling grid, i.e. the extent of the receptive field, y(p_0) is the convolution output, and x(p_0 + p_n + Δp_n) is the pixel value at position p_0 + p_n in the input feature image after the offset is introduced.
Other steps and parameters are the same as in one of the first to fourth embodiments.
The invention replaces the conventional convolution structure with a deformable convolution structure. The conventional convolution is defined as follows, where p_n is the offset of each point on the receptive field with respect to p_0:

  y(p_0) = Σ_{p_n ∈ R} w(p_n) · x(p_0 + p_n)

where p_0 is the point at the top-left corner of the convolution receptive field and R is the convolution sampling grid.
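The two formulas can be checked against each other numerically: with all Δp_n = 0 the deformable convolution must reduce to the conventional one. The bilinear sampler below is needed because p_0 + p_n + Δp_n is generally fractional; this is an illustrative sketch, not the patent's implementation.

```python
import numpy as np

def bilinear(x, py, px):
    """Bilinearly sample 2-D map x at the fractional location (py, px)."""
    y0, x0 = int(np.floor(py)), int(np.floor(px))
    dy, dx = py - y0, px - x0
    H, W = x.shape

    def at(yy, xx):  # zero padding outside the map
        return x[yy, xx] if 0 <= yy < H and 0 <= xx < W else 0.0

    return ((1 - dy) * (1 - dx) * at(y0, x0) + (1 - dy) * dx * at(y0, x0 + 1)
            + dy * (1 - dx) * at(y0 + 1, x0) + dy * dx * at(y0 + 1, x0 + 1))

def deform_conv_at(x, w, p0, offsets):
    """y(p0) = sum over p_n in R of w(p_n) * x(p0 + p_n + dp_n),
    with R the 3x3 grid anchored at the top-left corner p0."""
    return sum(w[i, j] * bilinear(x, p0[0] + i + offsets[i, j, 0],
                                     p0[1] + j + offsets[i, j, 1])
               for i in range(3) for j in range(3))
```

Learned non-zero offsets let the nine sampling points drift off the regular grid, which is how the kernel adapts to a deformed target.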
The sixth specific implementation mode is as follows: this embodiment will be described with reference to fig. 5. The difference between this embodiment and one of the first to fifth embodiments is: the concrete process of the step five is as follows:
the detection results for the images M_1, M_2, ..., M_m are rescaled, and the rescaled results are merged according to the position information of M_1, M_2, ..., M_m within the original remote sensing image M, yielding the target detection result N.
Other steps and parameters are the same as those in one of the first to fifth embodiments.
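Step five can be sketched as coordinate bookkeeping: each tile's boxes are scaled back down by the super-resolution factor and shifted by the tile's stored position. The box layout (x, y, w, h, class) and the helper name are assumptions.

```python
def merge_detections(tile_detections, positions, scale=4):
    """Invert the preprocessing: map per-tile boxes back into the
    coordinate frame of the (up-sampled) original image M."""
    merged = []
    for dets, (off_y, off_x) in zip(tile_detections, positions):
        for (x, y, w, h, cls) in dets:
            merged.append((x / scale + off_x, y / scale + off_y,
                           w / scale, h / scale, cls))
    return merged
```

The positions are exactly those recorded during pre-segmentation, so merging needs no extra state beyond the scale factor.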
The following examples were used to demonstrate the beneficial effects of the present invention:
the first embodiment is as follows:
the resolution enhancement-based remote sensing image small target detection method is specifically carried out according to the following steps:
The data used in the experiments are DOTA data sets of different resolutions obtained from open satellite sources. FIG. 2a shows an image in the DOTA data set (containing small vehicles) and the corresponding small target detection-box labels after label processing (only small vehicles retained); FIG. 2b shows a group of equally sized image blocks obtained by up-sampling and pre-segmenting a remote sensing image; the position information within the original image is saved when the image is cut;
table 1 shows that three target detection models are trained and tested under different training sample (remote sensing image data sets before and after the super-resolution) data conditions to obtain a single-class detection precision (AP) experimental result;
TABLE 1 Single class detection accuracy (% Unit)
FIG. 5 shows the result obtained by merging the detection results of the segmented image blocks, an inversion of the preprocessing that finally yields the target detection result for the original remote sensing image. Table 2 gives the results of training and testing six target detection models on the super-resolved remote sensing image data set; the last row corresponds to the target detection algorithm used in the invention and covers three indexes: single-class detection precision (AP), training time (train_time), and detection time (test_time). By comparison, the proposed algorithm performs best on all indexes, reaching a single-class detection precision of about 80% on the DOTA data set.
TABLE 2
The present invention is capable of other embodiments and its several details are capable of modifications in various obvious respects, all without departing from the spirit and scope of the present invention.
Claims (4)
1. A resolution enhancement-based remote sensing image small target detection method, characterized by comprising the following steps:
Step one, giving an original remote sensing image X and the label vector Y corresponding to the small targets it contains, where a small target is one whose pixel count lies in the range (0, 100);
the original remote sensing image X is up-sampled and pre-segmented to obtain a group of equally sized images X_1, X_2, ..., X_m and their corresponding label vectors Y_1, Y_2, ..., Y_m;
Step two, super-resolution images S_1, S_2, ..., S_m corresponding to X_1, X_2, ..., X_m are generated respectively, and from the label vectors Y_1, Y_2, ..., Y_m the label vectors K_1, K_2, ..., K_m corresponding to S_1, S_2, ..., S_m are obtained;
The specific process of the second step is as follows:
step two-one, establishing a generator sub-network G and a discriminator sub-network D, which together form a generative adversarial network;
step two-two, giving a group of higher-resolution remote sensing images I^HR_1, …, I^HR_N, and down-sampling them to obtain a corresponding group of lower-resolution remote sensing images I^LR_1, …, I^LR_N, where N is the number of remote sensing images contained in each group;
step two-three, training the generator sub-network G using the image pairs (I^LR_n, I^HR_n); the problem that the generator sub-network needs to solve is described by equation (1):

θ̂_G = arg min_{θ_G} (1/N) Σ_{n=1}^{N} l^SR( G_{θ_G}(I^LR_n), I^HR_n )    (1)

where θ_G is the set of parameters of all weights and biases of the generator sub-network G, G_{θ_G}(I^LR_n) is the reconstructed image output by the generator sub-network when I^LR_n is input, and l^SR is the loss function of the generator sub-network;
the loss function l^SR consists of the following three parts:

l^SR = γ_1 l_content + γ_2 l_adv + γ_3 l_reg    (2)

where l_content is the content loss function with weight parameter γ_1, l_adv is the adversarial loss function with weight parameter γ_2, and l_reg is the regularization loss function with weight parameter γ_3;

l_content = (1/(W_{i,j} H_{i,j})) Σ_{x=1}^{W_{i,j}} Σ_{y=1}^{H_{i,j}} ( φ_{i,j}(I^HR_n)_{x,y} − φ_{i,j}(G_{θ_G}(I^LR_n))_{x,y} )²    (3)

where W_{i,j} and H_{i,j} respectively denote the width and height of the feature map output by the j-th convolutional layer before the i-th maximum pooling layer of the discriminator sub-network D; φ_{i,j}(I^HR_n)_{x,y} denotes the value of pixel point (x, y) in that feature map when the input of D is I^HR_n, and φ_{i,j}(G_{θ_G}(I^LR_n))_{x,y} denotes the value of pixel point (x, y) in that feature map when the input of D is the reconstructed image G_{θ_G}(I^LR_n); x = 1, 2, …, W_{i,j}, y = 1, 2, …, H_{i,j};

l_adv = −log D_{θ_D}( G_{θ_G}(I^LR_n) )    (4)

where D_{θ_D}(G_{θ_G}(I^LR_n)) denotes the output of the discriminator sub-network D when the reconstructed image G_{θ_G}(I^LR_n) is input;

l_reg = (1/(r²WH)) Σ_{x'=1}^{rW} Σ_{y'=1}^{rH} || ∇G_{θ_G}(I^LR_n)_{x',y'} ||    (5)

where || · || denotes the 1-norm, r denotes the magnification factor of the reconstructed image G_{θ_G}(I^LR_n), W and H are such that the reconstructed image has width rW and height rH, ∇G_{θ_G}(I^LR_n)_{x',y'} denotes the pixel-by-pixel gradient of the reconstructed image at pixel point (x', y'), and x' = 1, 2, …, rW, y' = 1, 2, …, rH;
the discriminator sub-network D is trained using the remote sensing images I^HR_n and the reconstructed images G_{θ_G}(I^LR_n); the problem that the discriminator sub-network needs to solve is described by equation (6):

θ̂_D = arg max_{θ_D} E[ log D_{θ_D}(I^HR_n) ] + E[ log( 1 − D_{θ_D}( G_{θ_G}(I^LR_n) ) ) ]    (6)

where D_{θ_D}(I^HR_n) is the probability value output by the discriminator sub-network when the input is I^HR_n, D_{θ_D}(G_{θ_G}(I^LR_n)) is the probability value output when the input is the reconstructed image G_{θ_G}(I^LR_n), E[·] denotes the expectation, and θ_D is the set of parameters of all weights and biases of the discriminator sub-network D;
after solving for the θ_G satisfying equation (1) and the θ_D satisfying equation (6), a well-trained generative adversarial network is obtained;
step two-four, inputting the images X_1, X_2, …, X_m into the trained generative adversarial network; the generator sub-network G outputs the super-resolution images S_1, S_2, …, S_m corresponding to X_1, X_2, …, X_m;
multiplying the coordinates in the label vectors Y_1, Y_2, …, Y_m by 4 using a MATLAB or Python program to obtain the processed label vectors K_1, K_2, …, K_m;
step three, training the region-based deformable full convolution network with the super-resolution images S_1, S_2, …, S_m and their corresponding label vectors K_1, K_2, …, K_m until the set maximum number of iterations is reached, obtaining the trained region-based deformable full convolution network;
the third step comprises the following specific processes:
step three-one, inputting the super-resolution images S_1, S_2, …, S_m and their corresponding label vectors K_1, K_2, …, K_m into the region-based deformable full convolution network, extracting the feature images of S_1, S_2, …, S_m using the ResNet-101 in the network, and outputting the regions of interest (RoI) of S_1, S_2, …, S_m using the feature images;
step three-two, mapping the regions of interest RoI onto the feature images of S_1, S_2, …, S_m to obtain the mapped feature images;
step three-three, performing a pooling operation on the mapped feature images to obtain pooling results, and performing an averaging operation on the pooling results to obtain the target classification results;
step three-four, while extracting the feature images of S_1, S_2, …, S_m, simultaneously learning the coordinates of the bounding boxes of the targets; while performing the pooling operation on the mapped feature images, simultaneously generating a position vector for each RoI region; the bounding-box coordinates and the position vectors are aggregated by the averaging operation into a 4-dimensional vector t = (t_x, t_y, t_w, t_h), where t_x is the x coordinate of the upper-left corner of the target detection box, t_y is the y coordinate of the upper-left corner of the target detection box, t_w is the width of the target detection box, and t_h is the height of the target detection box;
step four, for an original remote sensing image M to be detected, after processing M through step one and step two, obtaining a group of super-resolution images M_1, M_2, …, M_m of the same size;
inputting the super-resolution images M_1, M_2, …, M_m into the region-based deformable full convolution network trained in step three to obtain the target detection results for the images M_1, M_2, …, M_m;
step five, merging the target detection results obtained in step four to obtain the target detection result N of the original remote sensing image M.
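The three-part generator loss described in step two-three (a content term computed on discriminator feature maps, an adversarial term, and a gradient-based regularization term) can be sketched numerically as follows. This is an illustrative NumPy sketch, not the patented implementation; the weight values γ_1, γ_2, γ_3 and all array shapes are assumptions for illustration.

```python
import numpy as np

def content_loss(feat_hr, feat_sr):
    # content term: mean squared difference between the feature maps of
    # the real high-resolution image and of the reconstructed image
    wij, hij = feat_hr.shape
    return float(np.sum((feat_hr - feat_sr) ** 2) / (wij * hij))

def adversarial_loss(d_sr):
    # adversarial term: -log D(G(I_LR)), where d_sr is the discriminator's
    # probability that the reconstructed image is real
    return float(-np.log(d_sr))

def tv_regularization(sr):
    # regularization term: 1-norm of the pixel-by-pixel gradient of the
    # reconstructed image, normalized by its pixel count
    gy = np.abs(np.diff(sr, axis=0)).sum()
    gx = np.abs(np.diff(sr, axis=1)).sum()
    return float((gx + gy) / sr.size)

def generator_loss(feat_hr, feat_sr, d_sr, sr, g1=1.0, g2=1e-3, g3=2e-8):
    # weighted sum of the three parts; the gamma values here are
    # illustrative assumptions, not taken from the patent
    return (g1 * content_loss(feat_hr, feat_sr)
            + g2 * adversarial_loss(d_sr)
            + g3 * tv_regularization(sr))
```

With γ_2 and γ_3 kept small, the content term dominates early training, which is the usual practice for super-resolution GANs of this kind.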
2. The method for detecting the small target of the remote sensing image based on the resolution enhancement as claimed in claim 1, wherein the specific process of the first step is as follows:
carrying out up-sampling processing on the original remote sensing image X to obtain an up-sampled image; pre-segmenting the up-sampled image, namely dividing it into a group of images X_1, X_2, …, X_m of the same size, and storing the position information of X_1, X_2, …, X_m in the up-sampled image during segmentation, where m denotes the total number of segmented images;
during segmentation, the label vector Y corresponding to the small targets contained in the original remote sensing image X is also segmented, assigning to the images X_1, X_2, …, X_m the corresponding segmented label vectors Y_1, Y_2, …, Y_m.
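The pre-segmentation of claim 2 can be sketched as follows; the tile size, the zero-padding at the image border, and the function name are illustrative assumptions, not taken from the patent.

```python
import numpy as np

def presegment(image, tile=256):
    """Split an up-sampled single-channel image into equal-size tiles,
    recording each tile's top-left (x, y) position so that per-tile
    detections can later be mapped back into the full image.
    Zero-pads the border so the image divides evenly into tiles
    (the padding policy is an assumption)."""
    h, w = image.shape[:2]
    pad_h, pad_w = (-h) % tile, (-w) % tile
    padded = np.pad(image, ((0, pad_h), (0, pad_w)))
    tiles, positions = [], []
    for y in range(0, padded.shape[0], tile):
        for x in range(0, padded.shape[1], tile):
            tiles.append(padded[y:y + tile, x:x + tile])
            positions.append((x, y))  # position info stored, as in claim 2
    return tiles, positions
```

The saved `positions` list is the position information that step five later uses to merge the per-tile detection results back into the original image frame.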
3. The method for detecting the small target of the remote sensing image based on the resolution enhancement as claimed in claim 2, wherein the convolutions in the region-based deformable full convolution network are deformable convolutions, a deformable convolution adds an offset Δp_n at each convolution grid point, and the modified formula is expressed as follows:

y(p_0) = Σ_{p_n ∈ R} w(p_n) · x(p_0 + p_n + Δp_n)    (7)

where p_0 denotes the point at the upper-left corner of the convolution field, p_n is the relative offset of the other points in the convolution field with respect to the upper-left corner, Δp_n is the offset learned during convolution, w(p_n) denotes the weight, R denotes the convolution field and determines its size, y(p_0) is the convolution output, and x(p_0 + p_n + Δp_n) denotes the pixel value of the point p_0 + p_n in the input feature image after the offset is introduced.
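A minimal NumPy sketch of the deformable-convolution formula in claim 3, evaluated at a single output location. Bilinear interpolation handles the fractional sampling positions produced by the offsets Δp_n; here the offsets are supplied by the caller rather than learned, and the border handling is an illustrative assumption.

```python
import numpy as np

def bilinear(x, py, px):
    """Bilinearly sample feature map x at the fractional location (py, px)."""
    y0, x0 = int(np.floor(py)), int(np.floor(px))
    y1, x1 = min(y0 + 1, x.shape[0] - 1), min(x0 + 1, x.shape[1] - 1)
    wy, wx = py - y0, px - x0
    return ((1 - wy) * (1 - wx) * x[y0, x0] + (1 - wy) * wx * x[y0, x1]
            + wy * (1 - wx) * x[y1, x0] + wy * wx * x[y1, x1])

def deformable_conv_at(x, w, offsets, p0):
    """Compute y(p0) = sum over p_n in R of w(p_n) * x(p0 + p_n + delta_p_n),
    where R is the k x k convolution grid anchored at p0 and offsets holds
    the (here caller-supplied) delta_p_n for each grid point."""
    k = w.shape[0]
    out = 0.0
    for i in range(k):               # p_n = (i, j): relative grid position
        for j in range(k):
            dy, dx = offsets[i, j]   # delta_p_n for this grid point
            out += w[i, j] * bilinear(x, p0[0] + i + dy, p0[1] + j + dx)
    return out
```

With all offsets zero the computation reduces to an ordinary convolution over the k × k grid, which is a useful sanity check.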
4. The method for detecting the small target of the remote sensing image based on the resolution enhancement as claimed in claim 3, wherein the concrete process of the fifth step is as follows:
after scaling the target detection results for the images M_1, M_2, …, M_m, the scaled results are merged according to the position information of M_1, M_2, …, M_m in the original remote sensing image M to obtain the target detection result N.
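The scaling-and-merging of claim 4 can be sketched as follows, assuming each tile's saved position and its detections are expressed at the same scale and that the overall enlargement factor (up-sampling times super-resolution) is 4; the box format (x, y, w, h, label) is an assumption for illustration.

```python
def merge_detections(per_tile_boxes, positions, scale=4.0):
    """Map per-tile detection boxes back into the original image frame:
    offset each box by its tile's saved top-left position, then divide
    by the total enlargement factor.  Boxes are (x, y, w, h, label)
    with (x, y) the upper-left corner."""
    merged = []
    for boxes, (ox, oy) in zip(per_tile_boxes, positions):
        for x, y, w, h, label in boxes:
            merged.append(((x + ox) / scale, (y + oy) / scale,
                           w / scale, h / scale, label))
    return merged
```

A full implementation would also de-duplicate boxes that straddle tile borders (e.g. with non-maximum suppression), which this sketch omits.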
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010444356.XA CN111709307B (en) | 2020-05-22 | 2020-05-22 | Resolution enhancement-based remote sensing image small target detection method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111709307A CN111709307A (en) | 2020-09-25 |
CN111709307B true CN111709307B (en) | 2022-08-30 |
Family
ID=72537713
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113420745B (en) * | 2021-08-25 | 2021-12-24 | 江西中业智能科技有限公司 | Image-based target identification method, system, storage medium and terminal equipment |
CN114663671B (en) * | 2022-02-21 | 2023-07-18 | 佳都科技集团股份有限公司 | Target detection method, device, equipment and storage medium |
CN115984846B (en) * | 2023-02-06 | 2023-10-10 | 山东省人工智能研究院 | Intelligent recognition method for small targets in high-resolution image based on deep learning |
CN115953453B (en) * | 2023-03-03 | 2023-08-15 | 国网吉林省电力有限公司信息通信公司 | Substation geological deformation monitoring method based on image dislocation analysis and Beidou satellite |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108427920A (en) * | 2018-02-26 | 2018-08-21 | 杭州电子科技大学 | A kind of land and sea border defense object detection method based on deep learning |
CN108510467A (en) * | 2018-03-28 | 2018-09-07 | 西安电子科技大学 | SAR image target recognition method based on variable depth shape convolutional neural networks |
CN108596101A (en) * | 2018-04-25 | 2018-09-28 | 上海交通大学 | A kind of remote sensing images multi-target detection method based on convolutional neural networks |
CN109299688A (en) * | 2018-09-19 | 2019-02-01 | 厦门大学 | Ship Detection based on deformable fast convolution neural network |
CN110197255A (en) * | 2019-04-29 | 2019-09-03 | 杰创智能科技股份有限公司 | A kind of deformable convolutional network based on deep learning |
CN110458166A (en) * | 2019-08-19 | 2019-11-15 | 广东工业大学 | A kind of hazardous material detection method, device and equipment based on deformable convolution |
CN110728658A (en) * | 2019-09-16 | 2020-01-24 | 武汉大学 | High-resolution remote sensing image weak target detection method based on deep learning |
CN111126385A (en) * | 2019-12-13 | 2020-05-08 | 哈尔滨工程大学 | Deep learning intelligent identification method for deformable living body small target |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10019631B2 (en) * | 2015-11-05 | 2018-07-10 | Qualcomm Incorporated | Adapting to appearance variations when tracking a target object in video sequence |
Non-Patent Citations (1)
Title |
---|
Vehicle Detection Method for Dense Regions in Remote Sensing Images Based on Deformable Convolutional Neural Networks; Gao Xin et al.; Journal of Electronics & Information Technology; 2018-09-13 (No. 12); full text *
Also Published As
Publication number | Publication date |
---|---|
CN111709307A (en) | 2020-09-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111709307B (en) | Resolution enhancement-based remote sensing image small target detection method | |
CN107330439B (en) | Method for determining posture of object in image, client and server | |
CN111524135B (en) | Method and system for detecting defects of tiny hardware fittings of power transmission line based on image enhancement | |
Zhou et al. | Scale adaptive image cropping for UAV object detection | |
CN110517306B (en) | Binocular depth vision estimation method and system based on deep learning | |
CN111582339B (en) | Vehicle detection and recognition method based on deep learning | |
CN111368769A (en) | Ship multi-target detection method based on improved anchor point frame generation model | |
CN111310609B (en) | Video target detection method based on time sequence information and local feature similarity | |
CN112418165B (en) | Small-size target detection method and device based on improved cascade neural network | |
CN107516322A (en) | A kind of image object size based on logarithm pole space and rotation estimation computational methods | |
CN110633640A (en) | Method for identifying complex scene by optimizing PointNet | |
CN113052057A (en) | Traffic sign identification method based on improved convolutional neural network | |
CN112883971A (en) | SAR image ship target detection method based on deep learning | |
CN114299405A (en) | Unmanned aerial vehicle image real-time target detection method | |
CN114677479A (en) | Natural landscape multi-view three-dimensional reconstruction method based on deep learning | |
CN114782417A (en) | Real-time detection method for digital twin characteristics of fan based on edge enhanced image segmentation | |
CN114119621A (en) | SAR remote sensing image water area segmentation method based on depth coding and decoding fusion network | |
CN113436251A (en) | Pose estimation system and method based on improved YOLO6D algorithm | |
CN113111740A (en) | Characteristic weaving method for remote sensing image target detection | |
CN117011648A (en) | Haptic image dataset expansion method and device based on single real sample | |
CN114743023B (en) | Wheat spider image detection method based on RetinaNet model | |
CN116129234A (en) | Attention-based 4D millimeter wave radar and vision fusion method | |
CN114519819B (en) | Remote sensing image target detection method based on global context awareness | |
CN113657225B (en) | Target detection method | |
CN115861595A (en) | Multi-scale domain self-adaptive heterogeneous image matching method based on deep learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||