CN110728658A - High-resolution remote sensing image weak target detection method based on deep learning - Google Patents


Info

Publication number
CN110728658A
CN110728658A (application CN201910870991.1A)
Authority
CN
China
Prior art keywords
data set
resolution
image
target detection
remote sensing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910870991.1A
Other languages
Chinese (zh)
Inventor
Wang Chao (王超)
Zhang Hongyan (张洪艳)
Zhang Liangpei (张良培)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University (WHU)
Original Assignee
Wuhan University (WHU)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University (WHU) filed Critical Wuhan University (WHU)
Priority claimed from CN201910870991.1A
Publication of CN110728658A
Legal status: Pending


Classifications

    • G06T 7/0002 Inspection of images, e.g. flaw detection
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N 3/045 Combinations of networks
    • G06T 3/4053 Scaling of whole images or parts thereof based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G06V 10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V 20/13 Satellite images
    • G06T 2207/10032 Satellite or aerial image; Remote sensing
    • G06T 2207/20081 Training; Learning
    • G06T 2207/20104 Interactive definition of region of interest [ROI]
    • G06T 2207/30168 Image quality inspection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Astronomy & Astrophysics (AREA)
  • Remote Sensing (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a deep-learning-based method for detecting weak targets in high-resolution remote sensing images. For remote sensing imagery with low resolution, small target sizes and blurred quality, the resolution is first improved with a WGAN-based super-resolution reconstruction method. The quality-enhanced image is then input into a target detection framework, where a residual network extracts deep features; the extracted low-level and high-level features are fused so that the fused multi-level feature map carries both rich detail information and high-level semantic information. Using the fused multi-level features, a region proposal network coarsely extracts regions of interest from the feature map, a region-of-interest alignment method maps the extracted regions to the same dimensions, and subsequent accurate target classification and position refinement produce the final detection result. The method effectively improves the detection precision and recall of weak and small targets under the low-resolution, complex-background conditions of remote sensing images.

Description

High-resolution remote sensing image weak target detection method based on deep learning
Technical Field
The invention relates to the technical field of video image processing and target detection, and in particular to a deep-learning-based method for detecting weak targets in high-resolution remote sensing images.
Background
Remote sensing Earth-observation technology has steadily improved in spatial, spectral and temporal resolution and can collect multi-sensor, multi-source observation data. Through efficient multi-source data fusion, remote sensing satellites have acquired high-resolution, all-weather, around-the-clock, wide-area observation and imaging capabilities, achieving transformative results in fields such as military affairs, agriculture and urban planning, providing strategic support for sustainable-development problems such as climate-change monitoring and environmental-pollution control, and greatly advancing social development and progress. In particular, with the iterative upgrading of high-resolution satellite series around the world, high-spatial-resolution remote sensing images can now reach sub-meter level. The resulting images contain rich texture and detail information and, compared with the lower-resolution images produced by other sensors, can finely reflect the state of ground objects, providing a rich and reliable data source for applications such as land-use survey, urban management and battlefield decision-making. High-resolution remote sensing imagery has thus greatly promoted the development and application of the remote sensing discipline. Among methods for interpreting and processing remote sensing image information, target detection and recognition is a popular research direction owing to its practicality and wide range of applications. Target detection comprises target classification and target localization: it determines the category of a ground object while outputting its position information, and can therefore reflect the state of a target more completely and finely.
With improvements in computer software and hardware, computing power and mass-data storage have developed at an unprecedented pace, making intelligent operation and management of massive data possible. Against this background, deep learning and neural network models, whose development had previously been slowed by hardware limitations, have received renewed attention, and an increasing number of artificial intelligence vision tasks are performed excellently by deep neural networks, in particular Convolutional Neural Networks (CNNs). Compared with traditional methods, neural networks have strong feature expression, nonlinear fitting and generalization capabilities and strong model learning ability; the model is trained end to end without excessive expert prior knowledge, making it well suited to target classification and regression tasks. Because such networks exhibit local perception and parameter sharing during image processing, they preserve the local relational and spatial structure between neighboring regions, so processing high-dimensional images poses little difficulty. Thanks to the strong feature extraction and representation capabilities of deep neural networks, deep-learning-based target detection and recognition algorithms have developed rapidly, producing a variety of models with good detection accuracy and speed and achieving a leap in application effect and quality over traditional models.
The inventor of the present application finds that the method of the prior art has at least the following technical problems in the process of implementing the present invention:
Although deep-learning-based target detection models are developing vigorously, applying them to remote sensing images still faces many difficulties. The large imaging area and long imaging distance of remote sensing images lead to low spatial resolution and complex backgrounds; the top-down imaging geometry of satellites makes targets easy to occlude; and surface reflection, atmospheric refraction and similar effects can introduce image distortion. This distinctive imaging mode leaves ground-object targets small in size and low in resolution, lacking sufficient texture information, subject to strong background-noise interference and low in contrast, which makes detection difficult and greatly limits the use of remote sensing images in many fine-grained applications.
Therefore, the method in the prior art has the technical problem of low detection precision.
Disclosure of Invention
The invention aims to provide a deep-learning-based method for detecting weak targets in high-resolution remote sensing images, and proposes a multi-layer feature-fusion weak-target detection algorithm combined with a super-resolution reconstruction method, addressing the relatively low imaging resolution and small target sizes of remote sensing images. Considering the influence of image resolution on target detection, a WGAN-based super-resolution reconstruction method improves image quality and enriches detail information so that a residual network can extract more effective target features; the low-level and high-level features extracted by the residual network are fused to ensure that the fused feature map carries both abundant semantic information and detail information; and an RPN performs coarse foreground-target extraction and region-of-interest generation, followed by accurate target detection, improving the precision and recall of target detection and thereby the target detection accuracy of remote sensing images.
In order to achieve the purpose, the invention adopts the technical scheme that: a high-resolution remote sensing image weak target detection method based on deep learning comprises the following steps:
step S1: based on the ground feature type and characteristics in the high-resolution remote sensing image, an initial data set is constructed, the initial data set is divided into a training data set, a verification data set and a test data set, and type information and position information labeling is carried out on ground feature samples in the initial data set;
step S2: taking the training data set and the verification data set as high-resolution samples, performing down-sampling on the high-resolution samples to obtain low-resolution samples, and then performing training and verification evaluation on the WGAN-based image super-resolution reconstruction model by using the low-resolution samples to obtain a trained super-resolution reconstruction model;
step S3: inputting the training data set, the verification data set and the test data set into the trained super-resolution reconstruction model to obtain a reconstructed training data set, verification data set and test data set;
step S4: constructing a remote sensing image target detection model based on multi-layer feature fusion by adopting a target detection basic framework based on region suggestion, and setting parameters of the target detection model according to the characteristics of the reconstructed training data set;
step S5: inputting the reconstructed training data set into a target detection model, training the target detection model, calculating an optimized target function loss value, and performing training and updating on the model weight parameters through a back propagation algorithm;
step S6: verifying the model trained in the step S5 by using the reconstructed verification data set to obtain an optimized target detection model;
step S7: and inputting the reconstructed test data set into an optimized target detection model for target detection and identification, determining the category and position information of the ground object target, and detecting the weak and small target of the remote sensing image.
In one embodiment, when the step S1 labels the feature samples in the initial data set with the category information and the location information, spatial context information is introduced for the weak targets in the initial data set.
In one embodiment, the optimization objective function of the WGAN-based image super-resolution reconstruction model in step S2 is as follows:

L = Loss_G + Loss_D,
Loss_D = E_{x∼P_g}[f_w(x)] − E_{x∼P_r}[f_w(x)],
Loss_G = −E_{x∼P_g}[f_w(x)]        (1)

where L denotes the total loss function of the WGAN, Loss_G the generator loss function, and Loss_D the discriminator loss function; f_w(x) denotes a continuous mapping function that maps samples so that its derivative values stay within a bounded range; E denotes taking the expectation (mean) of the bracketed expression; P_r denotes the real distribution and P_g the generated distribution.
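As an illustrative sketch (not part of the patent text), the generator and discriminator losses described above can be computed from critic scores f_w(x); the helper name `wgan_losses` is hypothetical.

```python
import numpy as np

def wgan_losses(critic_real, critic_fake):
    """WGAN losses from critic scores f_w(x).

    critic_real: scores on real high-resolution samples (x ~ P_r)
    critic_fake: scores on generated super-resolved samples (x ~ P_g)
    """
    # the critic (discriminator) is trained to score real samples high
    # and generated samples low
    loss_d = np.mean(critic_fake) - np.mean(critic_real)
    # the generator is trained to raise the critic's score on its outputs
    loss_g = -np.mean(critic_fake)
    return loss_d, loss_g
```

In the alternating training described later, the critic minimizes `loss_d` (with its weights clipped to keep f_w bounded) while the generator minimizes `loss_g`.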
In one embodiment, step S3 specifically includes:

performing image reconstruction on the remote sensing images contained in the training, verification and test data sets using the trained super-resolution reconstruction model. Quality evaluation of the super-resolution images usually adopts the peak signal-to-noise ratio (PSNR) and the structural similarity index (SSIM): PSNR is computed with formula (2) and SSIM with formula (3), where MSE denotes the mean square error, M and N denote the image length and width, m and n denote the pixel values of the images before and after reconstruction, and s denotes the number of gray levels; the larger the PSNR value, the smaller the distortion;

PSNR = 10 · log10( (s − 1)² / MSE ),  MSE = (1 / (M·N)) Σ_{i=1}^{M} Σ_{j=1}^{N} (m_{ij} − n_{ij})²        (2)

SSIM = ( (2·μ_x·μ_y + C1)(2·σ_xy + C2) ) / ( (μ_x² + μ_y² + C1)(σ_x² + σ_y² + C2) )        (3)

where C1 and C2 are constants, and the statistics of an image x are

μ_x = (1/N) Σ_{i=1}^{N} x_i,   σ_x = ( (1/(N − 1)) Σ_{i=1}^{N} (x_i − μ_x)² )^{1/2}

The values μ_y and σ_y are defined analogously to μ_x and σ_x, and the parameters are C1 = K1 × L and C2 = K2 × L, where L denotes the maximum pixel value of the grayscale image and K1, K2 are constants. SSIM measures inter-image similarity in terms of brightness, contrast and texture; larger values indicate less distortion and higher image quality.
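A minimal sketch of the two quality metrics, formulas (2) and (3), follows. It evaluates SSIM globally over the whole image with population statistics (the common SSIM uses local windows and C = (K·L)²; here the patent's stated C1 = K1·L, C2 = K2·L are followed), so the function names and defaults are assumptions for illustration only.

```python
import numpy as np

def psnr(m, n, s=256):
    # peak signal-to-noise ratio, formula (2); s is the number of gray levels
    mse = np.mean((m.astype(np.float64) - n.astype(np.float64)) ** 2)
    if mse == 0.0:
        return float("inf")  # identical images
    return 10.0 * np.log10((s - 1) ** 2 / mse)

def ssim_global(x, y, K1=0.01, K2=0.02, L=255):
    # structural similarity, formula (3), over the whole image
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()          # population variances
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    C1, C2 = K1 * L, K2 * L                  # patent's stated parameterization
    return ((2 * mu_x * mu_y + C1) * (2 * cov_xy + C2)) / \
           ((mu_x ** 2 + mu_y ** 2 + C1) * (var_x + var_y + C2))
```

For identical inputs, `psnr` returns infinity and `ssim_global` returns 1.0, matching the "larger is better" reading of both indices.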
In one implementation, the remote sensing image target detection model based on multi-layer feature fusion in step S4 includes a feature extraction and fusion module, a region-of-interest generation module, and a target detection module. The feature extraction and fusion module extracts ground-object features and fuses the extracted low-level and high-level features; the region-of-interest generation module extracts regions of interest from the fused feature map; and the target detection module performs target identification and detection on the generated regions of interest.
In one embodiment, the feature extraction and fusion module includes a down-sampling process and an up-sampling process when fusing the low-level features and the high-level features.
In one embodiment, the formula for the optimization objective function in step S5 is:

L({p_i}, {t_i}) = (1/N_cls) Σ_i L_cls(p_i, p_i*) + λ · (1/N_loc) Σ_i p_i* · L_loc(t_i, t_i*)        (4)

In formula (4), L_cls = −log(p_i) denotes the classification error, where p_i is the probability value of the ground-object class output by the classifier; L_loc(t_i, t_i*) denotes the localization error of the model, computed over the box parameters

t_x = (x − x_a)/w_a,  t_y = (y − y_a)/h_a,  t_w = log(w/w_a),  t_h = log(h/h_a).

Quantities with subscript a are parameter values of the bounding box output by an anchor box, quantities marked with an asterisk are ground-truth values, and the remainder are model prediction outputs; λ is a balance parameter, N_cls is the training batch size, and N_loc is the number of generated anchor boxes.
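The figure carrying the localization term is not rendered in this copy; as a hedged sketch, the anchor-relative box encoding quoted above, together with the smooth-L1 loss commonly used for L_loc in region-proposal detectors (an assumption, since the patent's own figure is missing), can be written as:

```python
import numpy as np

def encode_box(box, anchor):
    # regression targets (t_x, t_y, t_w, t_h) relative to an anchor box,
    # using the parameterization stated for formula (4)
    x, y, w, h = box
    xa, ya, wa, ha = anchor
    return np.array([(x - xa) / wa, (y - ya) / ha,
                     np.log(w / wa), np.log(h / ha)])

def smooth_l1(t, t_star):
    # robust localization loss: quadratic near zero, linear for large errors
    d = np.abs(np.asarray(t, dtype=np.float64) -
               np.asarray(t_star, dtype=np.float64))
    return float(np.where(d < 1.0, 0.5 * d ** 2, d - 0.5).sum())
```

A perfect prediction (box equal to the ground truth) encodes to the same targets and contributes zero localization loss.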
One or more technical solutions in the embodiments of the present application have at least one or more of the following technical effects:
the invention provides a high-resolution remote sensing image weak target detection method based on deep learning, which is characterized by comprising the steps of firstly, carrying out class information and position information labeling on a ground object sample in an initial data set based on the class and the characteristics of a ground object in a high-resolution remote sensing image, then, taking a training data set and a verification data set as high-resolution samples, carrying out down-sampling on the high-resolution samples to obtain low-resolution samples, then, utilizing the low-resolution samples to train, verify and evaluate an image super-resolution reconstruction model based on WGAN, and obtaining the trained super-resolution reconstruction model; inputting the training data set, the verification data set and the test data set into a trained super-resolution reconstruction model to obtain a reconstructed training data set, a reconstructed verification data set and a reconstructed test data set; then, a target detection basic framework based on region suggestion is adopted, a remote sensing image target detection model based on multi-layer feature fusion is constructed, and parameters of the target detection model are set according to the characteristics of a reconstructed training data set; inputting the reconstructed training data set into a target detection model, training the target detection model, calculating an optimized target function loss value, and performing training and updating on the model weight parameters through a back propagation algorithm; then, verifying the trained model by using the reconstructed verification data set to obtain an optimized target detection model; and finally, inputting the reconstructed test data set into an optimized target detection model for target detection and identification, determining the category and position information of the ground object target, and detecting the remote sensing image weak and 
small target.
In the method provided by the invention, the WGAN-based image super-resolution reconstruction model reconstructs the remote sensing images in the training, verification and test data sets, improving image quality and enriching detail information so that the multi-layer feature-fusion target detection model can extract more effective target features. The extracted low-level and high-level features are fused to ensure that the fused feature map carries abundant semantic and detail information, and a region proposal network (RPN) uses the fused feature map to perform coarse foreground-target extraction and region-of-interest generation for subsequent accurate target detection, improving the precision and recall of target detection and thereby the target detection accuracy of remote sensing images.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a schematic flow chart of a method for detecting a weak target in a high-resolution remote sensing image based on deep learning according to an embodiment of the present invention;
FIG. 2 is an overall frame diagram of a high-resolution remote sensing image weak target detection method based on deep learning.
Detailed Description
The invention aims to provide a deep-learning-based method for detecting weak targets in high-resolution remote sensing images, addressing the low detection accuracy of prior-art methods. Weak-target detection and recognition are carried out with a multi-layer feature-fusion model combined with a super-resolution reconstruction method; by improving image quality and fusing effective features across layers, the precision and recall of high-resolution remote sensing image target detection can be improved.
In order to achieve the above purpose, the main concept of the invention is as follows:
For remote sensing images with low resolution, small target sizes and blurred quality, the resolution is first improved with a WGAN-based super-resolution reconstruction method, enriching the detail information of weak ground-object targets. The quality-enhanced image is input into a target detection framework, where a residual network extracts deep features; the extracted low-level and high-level features are then fused so that the fused multi-level feature map carries both rich detail information and high-level semantic information, ensuring accurate target localization and classification. Using the fused multi-layer features, the region proposal network RPN (i.e. the region-of-interest generation module) coarsely extracts regions of interest from the feature map; a region-of-interest alignment method maps the extracted regions to the same dimensions, and subsequent accurate target classification and position refinement yield the final detection result. The method effectively improves the detection precision and recall of weak and small targets under the low-resolution, complex-background conditions of remote sensing images.
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment of the invention detects targets in video imagery acquired by an unmanned aerial vehicle. Unmanned aerial vehicles are now widely used in military affairs, urban management and agricultural production, but are limited by the performance of the onboard image acquisition sensor: the resolution of the acquired imagery is low, making high detection accuracy difficult to obtain, while high-performance sensors are extremely expensive. The invention therefore describes the specific process using unmanned aerial vehicle imagery as the data source.
The embodiment provides a method for detecting a weak target of a high-resolution remote sensing image based on deep learning, please refer to fig. 1, and the method comprises the following steps:
step S1: based on the ground feature type and characteristics in the high-resolution remote sensing image, an initial data set is constructed, the initial data set is divided into a training data set, a verification data set and a test data set, and type information and position information labeling is carried out on ground feature samples in the initial data set.
Specifically, in the present invention, weak targets mainly refer to targets in remote sensing imagery of low resolution, small size and blurred quality. Video footage shot by the unmanned aerial vehicle is cut into frames to obtain image-format data; the image data must fully cover multi-modal ground-object targets to ensure the validity of the data set. In the specific implementation, the data set comprises 800 images, divided into training, verification and test sets at a ratio of 2:1:1, and effective targets are selected for labeling; in this embodiment, the collected video is mainly aimed at detecting automobile targets.
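The 2:1:1 split of the embodiment can be sketched as follows; the helper name and fixed seed are illustrative, not from the patent.

```python
import random

def split_2_1_1(image_paths, seed=0):
    # shuffle and split into train/validation/test at the 2:1:1 ratio
    # used in the embodiment (800 images -> 400/200/200)
    paths = list(image_paths)
    random.Random(seed).shuffle(paths)
    n_train = len(paths) // 2
    n_val = len(paths) // 4
    return (paths[:n_train],
            paths[n_train:n_train + n_val],
            paths[n_train + n_val:])
```

Shuffling before splitting keeps the three subsets statistically similar, which matters when the frames come from continuous video.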
In one embodiment, when the step S1 labels the feature samples in the initial data set with the category information and the location information, spatial context information is introduced for the weak targets in the initial data set.
Specifically, when the initial data set is labeled, the label format must meet the input format required by the model, and the category and position of each target must be labeled correctly to prevent false and missed detections caused by annotation errors; spatial context information is added to assist the model in target detection. During data set construction, weak and small targets, blurred targets and highly inclined targets are easily missed, which harms the generalization ability of the trained model, so ground objects must be labeled in detail and the multi-scale (size and angle) information of the data set increased. When labeling targets such as bridges, spatial context information is introduced so that the model is assisted in extracting and recognizing the target, improving the detection accuracy of remote sensing targets in complex scenes.
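One labeled sample in the sense described above pairs a category with bounding-box position information. All field names and values here are illustrative; the patent does not define a concrete file format.

```python
# A hypothetical annotation record: category plus bounding box per
# ground-object instance (values are made up for illustration).
annotation = {
    "image": "frame_000123.jpg",
    "objects": [
        {"category": "car",              # the embodiment targets automobiles
         "bbox": [120, 84, 152, 110]},   # [x_min, y_min, x_max, y_max], pixels
    ],
}
```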
Step S2: and taking the training data set and the verification data set as high-resolution samples, performing down-sampling on the high-resolution samples to obtain low-resolution samples, and performing training, verification and evaluation on the WGAN-based image super-resolution reconstruction model by using the low-resolution samples to obtain a trained super-resolution reconstruction model.
Specifically, the training and verification data sets are down-sampled to generate a low-resolution data set, which serves as the input samples for training the super-resolution reconstruction process.
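The down-sampling step can be sketched with simple block averaging; this kernel choice is an assumption, as the patent does not fix a specific down-sampling method.

```python
import numpy as np

def block_downsample(img, factor=4):
    # synthesize a low-resolution input by averaging factor x factor blocks
    h, w = img.shape[:2]
    h2, w2 = h - h % factor, w - w % factor   # crop to a multiple of factor
    img = img[:h2, :w2].astype(np.float64)
    return img.reshape(h2 // factor, factor,
                       w2 // factor, factor).mean(axis=(1, 3))
```

Each high-resolution image then yields a (low-resolution input, high-resolution target) pair for training the reconstruction model.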
In one embodiment, the optimization objective function of the WGAN-based image super-resolution reconstruction model in step S2 is as follows:

L = Loss_G + Loss_D,
Loss_D = E_{x∼P_g}[f_w(x)] − E_{x∼P_r}[f_w(x)],
Loss_G = −E_{x∼P_g}[f_w(x)]        (1)

where L denotes the total loss function of the WGAN, Loss_G the generator loss function, and Loss_D the discriminator loss function; f_w(x) denotes a continuous mapping function that maps samples so that its derivative values stay within a bounded range; E denotes taking the expectation (mean) of the bracketed expression; P_r denotes the real distribution and P_g the generated distribution.
Specifically, the WGAN-based image super-resolution reconstruction model is used to reconstruct the images. The invention adopts an image super-resolution reconstruction method based on the Wasserstein generative adversarial network (WGAN); the Wasserstein distance is given by formula (5), where x and y denote samples drawn from the two distributions, γ ranges over the joint distributions Π(P_r, P_g) whose marginals are P_r and P_g, and inf denotes the infimum (greatest lower bound); the distance measures the similarity between the two distributions:

W(P_r, P_g) = inf_{γ ∈ Π(P_r, P_g)} E_{(x, y) ∼ γ}[ ‖x − y‖ ]        (5)
The WGAN uses the Wasserstein distance in place of the earlier K-L and J-S divergences, which guarantees effective training of the model even when the low-resolution image and the high-resolution real image are only weakly correlated.
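For one-dimensional empirical samples of equal size, the infimum in formula (5) is attained by matching samples in sorted order, which gives a tiny illustrative computation (the function name is hypothetical; real WGAN training estimates this distance via the critic, not directly):

```python
import numpy as np

def wasserstein_1d(x, y):
    # empirical Wasserstein-1 distance between two equal-size 1-D samples:
    # the optimal coupling pairs the sorted values rank by rank
    x = np.sort(np.asarray(x, dtype=np.float64))
    y = np.sort(np.asarray(y, dtype=np.float64))
    return float(np.mean(np.abs(x - y)))
```

Unlike the K-L or J-S divergences, this distance stays finite and informative even when the two sample sets do not overlap, which is the property the text credits for stable training.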
In the optimization objective function of the WGAN-based image super-resolution reconstruction model, f_w(x) denotes a continuous mapping function that simply maps samples so that its derivative values stay within a certain range, and E denotes taking the expectation of the bracketed expression. By adjusting and optimizing the generator, the Wasserstein distance between generated and real samples is shortened and the image quality approaches that of the real image; alternating training of the generator and discriminator improves the generation quality. WGAN training is stable, the model converges quickly, and the quality of the training result reflects the quality of the generated images well. Quality evaluation of super-resolution images is usually performed quantitatively with the peak signal-to-noise ratio (PSNR) and the structural similarity index (SSIM). In the PSNR formula, MSE denotes the mean square error, M and N the image length and width, m and n the pixel values of the images before and after reconstruction, and s the number of gray levels, generally 256; a larger PSNR value indicates smaller distortion. When evaluating image quality before and after reconstruction of remote sensing images, a PSNR value above roughly 30 generally indicates that the image quality has been effectively improved.
Step S3: the training data set, verification data set and test data set are input into the trained super-resolution reconstruction model to obtain the reconstructed training data set, verification data set and test data set.
Specifically, the trained super-resolution reconstruction model can be used for performing super-resolution reconstruction on images in the training data set, the verification data set and the test data set, so that the resolution of the remote sensing image is improved, and the detail texture information of the image is enriched.
In one embodiment, step S3 specifically includes:
performing image reconstruction on the remote sensing images contained in the training data set, verification data set and test data set by using the trained super-resolution reconstruction model, wherein quality evaluation of the super-resolution images is usually performed using the peak signal-to-noise ratio and the structural similarity index, wherein the peak signal-to-noise ratio (PSNR) is analyzed using formula (2) and the structural similarity index (SSIM) is given by formula (3); MSE represents the mean square error, M and N represent the image length and width, m and n represent the pixel values of corresponding positions in the images before and after reconstruction, and s represents the number of grey levels; a larger PSNR value indicates smaller distortion;
$$\mathrm{PSNR} = 10\,\lg\frac{(s-1)^2}{\mathrm{MSE}},\qquad \mathrm{MSE} = \frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\left(m_{ij}-n_{ij}\right)^2 \tag{2}$$

$$\mathrm{SSIM}(x,y) = \frac{\left(2\mu_x\mu_y + C_1\right)\left(2\sigma_{xy} + C_2\right)}{\left(\mu_x^2+\mu_y^2+C_1\right)\left(\sigma_x^2+\sigma_y^2+C_2\right)} \tag{3}$$
wherein C is a constant; the values of μ_y and σ_y are defined analogously to μ_x and σ_x; the parameters are C_1 = K_1 × L and C_2 = K_2 × L, where L represents the maximum pixel value in the grey-scale image and K_1 and K_2 are constants. SSIM measures the similarity between images in terms of brightness, contrast and texture; a larger value indicates less distortion and higher image quality.
In a specific implementation, L is the maximum pixel value in the grey-scale image, for example L = 255; K_1 and K_2 are small constants that can be preset by those skilled in the art, in this embodiment K_1 = 0.01 and K_2 = 0.02. SSIM measures inter-image similarity in terms of brightness, contrast and texture; likewise, a larger value indicates smaller distortion and higher image quality. Similarly, when image quality is evaluated before and after reconstruction, the remote sensing image reconstruction effect is considered good when the SSIM value in the experiment reaches 0.85 or above.
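The two measures above can be sketched as follows. This is a simplified single-window SSIM (library implementations average over local windows), and it uses the linear constants C_1 = K_1 × L, C_2 = K_2 × L exactly as stated in the text, although the widely used SSIM definition squares these constants; function names are editorial assumptions.

```python
import numpy as np

def psnr(ref, rec, s=256):
    """PSNR = 10·lg((s-1)^2 / MSE); s is the number of grey levels, peak = s-1."""
    mse = np.mean((ref.astype(float) - rec.astype(float)) ** 2)
    return 10.0 * np.log10((s - 1) ** 2 / mse)

def ssim_global(x, y, K1=0.01, K2=0.02, L=255):
    """Single-window SSIM per formula (3), with the text's C1 = K1*L, C2 = K2*L."""
    x = x.astype(float)
    y = y.astype(float)
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()          # sigma_x^2, sigma_y^2
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()  # sigma_xy
    C1, C2 = K1 * L, K2 * L
    return ((2 * mu_x * mu_y + C1) * (2 * cov_xy + C2)) / \
           ((mu_x ** 2 + mu_y ** 2 + C1) * (var_x + var_y + C2))

ref = np.zeros((8, 8))
rec = np.full((8, 8), 5.0)
p = psnr(ref, rec)            # MSE = 25, so PSNR = 10·lg(255^2/25) ≈ 34.15 dB
sim = ssim_global(ref, ref)   # identical images give SSIM = 1
```

On identical images SSIM is exactly 1, and PSNR grows without bound as MSE tends to zero, matching the "larger is less distorted" reading in the text.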
Step S4: a remote sensing image target detection model based on multi-layer feature fusion is constructed using a region-proposal-based target detection framework, and the parameters of the target detection model are set according to the characteristics of the reconstructed training data set.
Specifically, the characteristics of the data set mainly include the size, resolution, feature types and number of images in the data set, and the model parameters include the batch size, the IOU threshold, the learning rate, and so on.
In one implementation, the remote sensing image target detection model based on multi-layer feature fusion in step S4 includes a feature extraction and fusion module, a region-of-interest generation module and a target detection module, where the feature extraction and fusion module is configured to extract ground-object features and fuse the extracted low-level and high-level features, the region-of-interest generation module is configured to extract regions of interest based on the fused feature map, and the target detection module is configured to perform target identification and detection according to the generated regions of interest.
In a specific implementation process, deep features are extracted from the input image using a residual network, and multi-layer feature fusion is performed on the extracted features using a feature pyramid network, which ensures that the fused features carry both high-level semantic information and low-level detail texture information. The fused features are input into a region proposal layer, which extracts all foreground objects that may contain targets together with their approximate regions (region proposal is a process of coarse classification and coarse localization: it separates target ground objects from the background without distinguishing specific categories, and determines the approximate position of each target through a window). Because these regions differ in size, a region alignment method maps the regions of interest into feature maps of the same size, which are input into the subsequent classification regressor for specific category identification and target position refinement, thereby realizing accurate detection of the target.
More specifically, when extracting ground-object features, an efficient residual network is used as the basic network framework (i.e. the feature extraction and fusion module adopts a residual network structure), so that deeper features can be extracted and the characterization capability of the features is improved. For the multi-level feature maps generated by the deep residual network, a feature fusion network fuses low-level and high-level features, so that the fused feature map carries rich detail information and semantic information, improving the model's classification and localization accuracy on weak targets. The region-of-interest generation module may employ a region proposal network (RPN): the multi-layer features are input into the RPN for layer-by-layer processing, anchor boxes of different ratios and scales are generated, classification and coarse localization separate the foreground from the background, regions of interest (ROI) are screened by confidence, and redundant regions are removed. A region alignment (ROI Align) method accurately maps the regions of interest extracted from the fused feature maps and projects each ROI to the same size, which keeps back-propagation efficient; the rest of the model is consistent with a classical region-proposal-based target detection framework.
In one embodiment, the feature extraction and fusion module includes a down-sampling process and an up-sampling process when fusing the low-level features and the high-level features.
Specifically, the image down-sampling process is similar to the convolution process of a convolutional neural network: by setting the convolution kernel size, each of the n feature layers shrinks step by step like the levels of a pyramid. In the up-sampling process, each feature map i generated by down-sampling undergoes a 1×1 convolution (to reduce the number of feature channels), is interpolated to the same size as layer i−1, and is fused with it by superposition; each fused feature then undergoes a 3×3 convolution (to eliminate the aliasing effect caused by up-sampling). This produces n−1 fused feature maps plus a feature map obtained by max-pooling the highest-level feature map (used to generate larger-scale proposal regions). The n−1 fused feature maps and the max-pooled top feature map are input into the subsequent RPN for layer-by-layer processing to generate regions of interest, while the fused feature maps (excluding the pooled one) are input into the region alignment (ROI Align) module, which maps the regions of interest on each feature layer to the same dimension.
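The top-down fusion just described can be sketched numerically. The channel counts and map sizes below are illustrative assumptions (not those of an actual ResNet-101), the 1×1 convolution is modelled as a per-pixel channel mixing, and the anti-aliasing 3×3 convolution is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1x1(x, w):
    """1x1 convolution = per-pixel linear map over channels: (C,H,W) -> (D,H,W)."""
    return np.einsum('dc,chw->dhw', w, x)

def upsample2x(x):
    """Nearest-neighbour x2 upsampling of a (C,H,W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def maxpool2x(x):
    """2x2 max pooling with stride 2."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

# Backbone maps C2..C5: channels grow while spatial size halves at each stage.
specs = [(256, 64), (512, 32), (1024, 16), (2048, 8)]
C = [rng.normal(size=(c, s, s)) for c, s in specs]

d = 256  # common channel width of the fused maps
lateral = [conv1x1(f, 0.01 * rng.normal(size=(d, f.shape[0]))) for f in C]

# Top-down pathway: upsample the coarser fused map and add the lateral map.
P = [None] * 4
P[3] = lateral[3]
for i in (2, 1, 0):
    P[i] = lateral[i] + upsample2x(P[i + 1])
P6 = maxpool2x(P[3])  # extra coarse level for the largest proposal regions
```

The resulting pyramid has one fused map per backbone stage, all with the same channel width, plus the pooled top level; the proposal network then runs on every level.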
Referring to fig. 2, the target detection model is described concretely: first, the input image is fed into the WGAN-based image super-resolution reconstruction model for super-resolution reconstruction, and then feature extraction and fusion, region-of-interest generation and target detection are performed by the target detection model.
The target detection model is realized by the following steps:
and substep 1, feature extraction. A skip connection mode is adopted by a residual error network (ResNet), the method is different from the traditional convolution neural network in that input is convoluted to obtain output, and ResNet adds the input characteristics of the layer with the characteristics after convolution and outputs the sum to the next layer, so that the gradient disappearance phenomenon of the network can be avoided along with the deepening of the network layer number when back propagation training is carried out, the extracted detailed characteristic information is enriched, the performance of the model and the training convergence speed are improved, and the deep characteristic extraction and characteristic expression capacity of the model is greatly improved. The basic neural network adopted in the experiment is resnet101, the convolution of 3x3 and 7x7 is mainly used, and the deepening of the layer number can extract deep features of the target
Substep 2: multi-layer fusion of the extracted features. In the multi-layer feature fusion algorithm, the image down-sampling process is set to 5 layers, each feature map shrinking step by step like the levels of a pyramid. In the up-sampling process, each feature map i generated by down-sampling undergoes a 1×1 convolution (to reduce the number of feature channels), is interpolated to the same size as layer i−1 and fused with it by superposition, and each fused feature then undergoes a 3×3 convolution (to eliminate the aliasing effect caused by up-sampling). This yields 4 fused feature maps P2–P5 and the map P6 obtained by max-pooling the highest-level feature map (to generate larger-scale proposal regions). The feature maps P2–P5 (excluding the pooled P6) are input into the region alignment (ROI Align) module, and the regions of interest on each feature layer are mapped to the same dimension.
Substep 3: region-of-interest generation. In the RPN, the input multi-layer features are first processed layer by layer; at each position on the feature map, the Anchor method generates 15 candidate boxes of different scales and aspect ratios as candidate target regions. A ground-object classification layer computes whether the content of each candidate box belongs to the foreground or the background, and a window regression layer yields the centre coordinates and the length and width of the corresponding region, thereby coarsely extracting regions of interest. The anchor boxes cover bounding boxes of different scales and aspect ratios on the image and can detect multi-scale, multi-modal ground-object targets. This example produces 15 (3×5) anchor boxes of different sizes (32, 64, 128, 256, 512) and different aspect ratios (1:1, 1:2, 2:1). During model training, the generated anchor boxes are coarsely classified to separate foreground from background, and the anchor-box bounding boxes are translated and scaled to obtain the final coarse-extraction confidence; the 3000 anchor boxes with the highest confidence are retained, and the remaining redundant bounding boxes are deleted.
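The 15-anchor scheme above can be sketched as follows: one (width, height) shape per (scale, ratio) pair, where each anchor keeps the area of its scale while its width/height ratio matches the given aspect ratio. The function name is an editorial assumption.

```python
import numpy as np

def make_anchors(scales=(32, 64, 128, 256, 512), ratios=(1.0, 0.5, 2.0)):
    """One (w, h) anchor shape per (scale, ratio) pair: area = scale^2 and
    aspect ratio w/h = ratio, matching the 3 ratios x 5 sizes = 15 anchors."""
    shapes = []
    for s in scales:
        for r in ratios:
            shapes.append((s * np.sqrt(r), s / np.sqrt(r)))
    return np.array(shapes)

anchors = make_anchors()  # 15 anchor shapes generated at every anchor point
```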
Substep 4: region-of-interest alignment. When ROI Align maps a region of interest, the original length of the bounding box is preserved; the region is divided into four parts, the value at the centre point of each part is obtained by bilinear interpolation, and the mapped pixel value is obtained by max pooling or mean pooling, so that every region of interest is mapped to a scale of the same size for the subsequent classification and regression operations. For small targets in remote sensing images, this avoids the low detection confidence and the poor precision and recall caused by localization deviation. The feature map after region-of-interest alignment is input into the subsequent classifier and regressor, which accurately classify the target and refine its position to determine the specific category and precise location information of the target.
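The bilinear interpolation at the heart of ROI Align (sampling a fractional position instead of rounding it) can be sketched for a single 2-D feature map; names are editorial assumptions.

```python
import numpy as np

def bilinear(feat, y, x):
    """Bilinearly interpolate a 2-D feature map at a fractional (y, x) location,
    as ROI Align does instead of quantising coordinates to integers."""
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1 = min(y0 + 1, feat.shape[0] - 1)
    x1 = min(x0 + 1, feat.shape[1] - 1)
    dy, dx = y - y0, x - x0
    top = feat[y0, x0] * (1 - dx) + feat[y0, x1] * dx
    bot = feat[y1, x0] * (1 - dx) + feat[y1, x1] * dx
    return top * (1 - dy) + bot * dy

f = np.array([[0.0, 1.0],
              [2.0, 3.0]])
v = bilinear(f, 0.5, 0.5)  # centre of the 2x2 cell: mean of the four corners = 1.5
```

A full ROI Align would sample several such points per output bin and then max- or mean-pool them, as described above.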
Substep 5: training setup. The initial experimental weights use parameters pre-trained on the COCO data set rather than fully initialized weights, which achieves a better transfer-training effect. Anchor boxes adopt three ratios (0.5, 1, 2) and five sizes (32, 64, 128, 256, 512), i.e. 15 anchor boxes are generated at each anchor point, so that multi-scale, multi-angle remote sensing image targets can be effectively detected. A value of 0.5 is selected as the intersection-over-union (IOU) threshold, i.e. a detection whose intersection over union with the real target region exceeds 0.5 is judged valid; for remote sensing images with small target sizes this avoids target omission caused by an excessively large threshold, and when the threshold is set to 0.75 the target detection precision drops sharply. The NMS threshold for removing redundant bounding boxes is 0.7, i.e. redundant bounding boxes of the same target are removed while the bounding boxes of surrounding ground objects are preserved as far as possible. The initial learning rate is 0.001, the learning momentum is 0.9, and the weight decay is 0.0001. Limited by GPU hardware, the training batch size is 8, i.e. eight images are processed per iteration. Considering the complexity and target diversity of remote sensing images, the ratio of the training, verification and test sets is set to 2:1:1.
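The IOU criterion and the greedy NMS step referred to above can be sketched as follows; the corner-coordinate box format and function names are editorial assumptions.

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, thresh=0.7):
    """Greedy non-maximum suppression: scan boxes by descending score and
    drop any box overlapping an already kept box by more than thresh."""
    order = np.argsort(scores)[::-1]
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= thresh for j in keep):
            keep.append(i)
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 10, 10], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
kept = nms(boxes, scores, thresh=0.7)  # the two heavily overlapping boxes collapse to one
```

With the 0.7 threshold from the text, the two boxes covering the same target (IOU = 0.81) are merged while the distant box survives.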
Step S5: the reconstructed training data set is input into the target detection model, the target detection model is trained, the loss value of the optimized objective function is calculated, and the model weight parameters are trained and updated through a back-propagation algorithm.
In one embodiment, the formula for optimizing the objective function in step S5 is:
$$L\left(\{p_i\},\{t_i\}\right) = \frac{1}{N_{cls}}\sum_i L_{cls}\left(p_i\right) + \lambda\,\frac{1}{N_{loc}}\sum_i p_i^{*}\,L_{loc}\left(t_i,t_i^{*}\right) \tag{4}$$
In formula (4), L_cls = −log(p_i) represents the classification error, where p_i represents the probability value of the ground-object class output by the classifier;
$$L_{loc}\left(t_i,t_i^{*}\right) = \sum_{j\in\{x,y,w,h\}} \mathrm{smooth}_{L_1}\!\left(t_j - t_j^{*}\right)$$

represents the positioning error of the model, where t_x = (x − x_a)/w_a, t_y = (y − y_a)/h_a,

$$t_w = \log\left(w/w_a\right),\qquad t_h = \log\left(h/h_a\right),\qquad \mathrm{smooth}_{L_1}(d)=\begin{cases}0.5\,d^2, & |d| < 1\\ |d| - 0.5, & \text{otherwise}\end{cases}$$
Quantities with subscript a are the parameter values of the bounding box output by the anchor box, quantities with an asterisk superscript are ground-truth values, and the rest are model prediction outputs; λ is a balance parameter, N_cls is the training batch size, and N_loc is the number of generated anchor boxes.
In the specific implementation process, λ takes the value 10, N_cls is the training batch size, and N_loc, the number of generated anchor boxes, may be 2400.
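A numeric sketch of this two-term loss follows, using the embodiment's λ = 10, N_cls = 8 and N_loc = 2400. Here p_true is taken as the probability the classifier assigns to the correct class of each anchor (so L_cls = −log(p_i)), and only positive anchors (p* = 1) contribute to the localization term; all names are editorial assumptions.

```python
import numpy as np

def smooth_l1(d):
    """Smooth L1: quadratic near zero, linear beyond |d| = 1."""
    d = np.abs(d)
    return np.where(d < 1.0, 0.5 * d ** 2, d - 0.5)

def detection_loss(p_true, p_star, t, t_star, lam=10.0, n_cls=8, n_loc=2400):
    """Classification term sum(-log p_i) / N_cls plus lambda-weighted smooth-L1
    localization term over positive anchors / N_loc, as in formula (4)."""
    l_cls = -np.log(p_true).sum()
    l_loc = (p_star[:, None] * smooth_l1(t - t_star)).sum()
    return l_cls / n_cls + lam * l_loc / n_loc

# Two anchors: a positive one predicted perfectly and a negative one with p = 0.5.
p_true = np.array([1.0, 0.5])
p_star = np.array([1.0, 0.0])
t = np.zeros((2, 4))
t_star = np.zeros((2, 4))
loss = detection_loss(p_true, p_star, t, t_star)
```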
Step S6: the model trained in step S5 is verified using the reconstructed verification data set to obtain an optimized target detection model.
Specifically, the reconstructed verification data set is used to verify the model trained at this stage, and the training parameters are adjusted according to the actual target detection effect and the batch verification results until the loss functions of the model on the training and verification sets stabilize, ground-object target detection reaches high precision and high recall, and the model attains its optimal solution on the data set.
In the specific implementation process, the experiment adopts a staged training mode: a training model is output after every 5000 iterations in the early stage (every 1000–2000 iterations in the later stage) and verified on the verification set, the training parameters are adjusted according to the verification results, and training continues until the training results meet the requirements.
Step S7: the reconstructed test data set is input into the optimized target detection model for target detection and identification, the category and position information of ground-object targets is determined, and weak and small targets in the remote sensing image are detected.
Specifically, when evaluating the target extraction result, it is unlikely that all ground objects will be correctly extracted owing to the limitation of model performance; this involves the concept of Recall, which represents the proportion of target ground objects the model extracts out of the total number of targets. Likewise, each extraction result output by the model contains both correct and incorrect predictions; the ratio of the number of correct predictions (i.e. both positive and negative samples predicted correctly) to the total number of predictions is the Precision of the target detection. AP (Average Precision) represents the mean precision over different recall values, and mAP (mean Average Precision) represents the mean AP over multiple ground-object classes. The AP and mAP values measure the detection precision of the model and are the most common target detection metrics, but the recall value is equally important and cannot be ignored: it reflects whether the model has been trained sufficiently and can fully and completely detect multi-modal target ground objects without missed detections. In this embodiment, the detection precision of the proposed image weak target detection method on automobiles is 0.9347 and the recall is 0.9439, showing that the detection result for weak targets in remote sensing images can be improved.
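The metrics just defined can be sketched as follows; the illustrative counts and the simple AP estimator (mean precision at each true-positive position) are editorial assumptions, not the paper's experimental numbers.

```python
import numpy as np

def precision_recall(tp, fp, fn):
    """Precision = TP/(TP+FP); Recall = TP/(TP+FN)."""
    return tp / (tp + fp), tp / (tp + fn)

def average_precision(scores, labels):
    """Single-class AP: sort detections by confidence, then average the
    precision values observed at each true-positive position (one sample
    per recall step)."""
    order = np.argsort(scores)[::-1]
    labels = np.asarray(labels, dtype=float)[order]
    tp = np.cumsum(labels)
    fp = np.cumsum(1.0 - labels)
    precision = tp / (tp + fp)
    return float(precision[labels == 1].mean())

prec, rec = precision_recall(tp=87, fp=6, fn=5)     # illustrative counts only
ap = average_precision([0.9, 0.8, 0.7], [1, 0, 1])  # 3 detections, 2 of them correct
```

mAP would then simply average such per-class AP values over all ground-object classes.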
Generally, reconstructing the remote sensing images in the training, verification and test data sets with the WGAN-based image super-resolution reconstruction model improves image quality and enriches image detail information, so that the remote sensing image target detection model based on multi-layer feature fusion can extract more effective target features. Fusing the extracted low-level and high-level features ensures that the fused feature map carries both semantic information and detail information; the fused feature map is used by the RPN for coarse foreground target extraction and region-of-interest generation, followed by accurate target detection, which improves the precision and recall of target detection and thus the detection accuracy for remote sensing image targets.
The advantage of the method is that it constructs a multi-layer feature fusion detection algorithm for weak targets in remote sensing images combined with an image super-resolution reconstruction method: for images characterized by low imaging resolution and small target sizes, the super-resolution reconstruction method raises the image resolution and enriches the detail texture information; the multi-feature fusion network fuses the extracted low-level and high-level features, and constructing a target detection framework based on multi-level feature fusion improves the detection precision for weak targets in remote sensing images.
The embodiment of the present invention uses remote sensing images, but the method is not limited to them: it is widely applicable to other images, such as video images, natural images and medical images, is little constrained by objective factors, and has a wide application range. Experimental verification shows that the multi-layer feature fusion target detection algorithm combined with the image super-resolution reconstruction method constructed in this patent improves the detection precision and recall of weak and small targets in images.
It is to be noted and understood that various modifications and improvements can be made to the invention described in detail above without departing from the spirit and scope of the invention as claimed in the appended claims. Accordingly, the scope of the claimed subject matter is not limited by any of the specific exemplary teachings provided.

Claims (7)

1. A high-resolution remote sensing image weak target detection method based on deep learning is characterized by comprising the following steps:
step S1: based on the ground feature type and characteristics in the high-resolution remote sensing image, an initial data set is constructed, the initial data set is divided into a training data set, a verification data set and a test data set, and type information and position information labeling is carried out on ground feature samples in the initial data set;
step S2: taking the training data set and the verification data set as high-resolution samples, performing down-sampling on the high-resolution samples to obtain low-resolution samples, and then performing training and verification evaluation on the WGAN-based image super-resolution reconstruction model by using the low-resolution samples to obtain a trained super-resolution reconstruction model;
step S3: inputting the training data set, the verification data set and the test data set into the trained super-resolution reconstruction model to obtain a reconstructed training data set, verification data set and test data set;
step S4: constructing a remote sensing image target detection model based on multi-layer feature fusion by adopting a target detection basic framework based on region suggestion, and setting parameters of the target detection model according to the characteristics of the reconstructed training data set;
step S5: inputting the reconstructed training data set into a target detection model, training the target detection model, calculating an optimized target function loss value, and performing training and updating on the model weight parameters through a back propagation algorithm;
step S6: verifying the model trained in the step S5 by using the reconstructed verification data set to obtain an optimized target detection model;
step S7: and inputting the reconstructed test data set into an optimized target detection model for target detection and identification, determining the category and position information of the ground object target, and detecting the weak and small target of the remote sensing image.
2. The method as claimed in claim 1, wherein the step S1 introduces spatial context information for weak targets in the initial data set when labeling the ground feature samples in the initial data set with category information and location information.
3. The method of claim 1, wherein the optimization objective function of the WGAN-based super resolution reconstruction model in step S2 is:
$$L = Loss\_G + Loss\_D,\qquad Loss\_D = \mathop{\mathbb{E}}_{x\sim P_g}\left[f_w(x)\right] - \mathop{\mathbb{E}}_{x\sim P_r}\left[f_w(x)\right],\qquad Loss\_G = -\mathop{\mathbb{E}}_{x\sim P_g}\left[f_w(x)\right]$$
wherein L represents the total loss function of the WGAN, Loss_G represents the generator loss function, Loss_D represents the discriminator loss function, f_w(x) represents a continuous mapping function used to map samples so that its derivative values lie within a certain range; E represents taking the expectation of the expression; P_r represents the real distribution and P_g represents the generated distribution.
4. The method according to claim 1, wherein step S3 specifically comprises:
performing image reconstruction on the remote sensing images contained in the training data set, verification data set and test data set by using the trained super-resolution reconstruction model, wherein quality evaluation of the super-resolution images is usually performed using the peak signal-to-noise ratio and the structural similarity index, wherein the peak signal-to-noise ratio (PSNR) is analyzed using formula (2) and the structural similarity index (SSIM) is given by formula (3); MSE represents the mean square error, M and N represent the image length and width, m and n represent the pixel values of corresponding positions in the images before and after reconstruction, and s represents the number of grey levels; a larger PSNR value indicates smaller distortion;
$$\mathrm{PSNR} = 10\,\lg\frac{(s-1)^2}{\mathrm{MSE}},\qquad \mathrm{MSE} = \frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\left(m_{ij}-n_{ij}\right)^2 \tag{2}$$

$$\mathrm{SSIM}(x,y) = \frac{\left(2\mu_x\mu_y + C_1\right)\left(2\sigma_{xy} + C_2\right)}{\left(\mu_x^2+\mu_y^2+C_1\right)\left(\sigma_x^2+\sigma_y^2+C_2\right)} \tag{3}$$
wherein C is a constant,

$$\mu_x = \frac{1}{N}\sum_{i=1}^{N} x_i,\qquad \sigma_x^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(x_i-\mu_x\right)^2$$

the values of μ_y and σ_y are defined analogously to μ_x and σ_x; the parameters are C_1 = K_1 × L and C_2 = K_2 × L, where L represents the maximum pixel value in the grey-scale image and K_1 and K_2 are constants; SSIM is used to measure the similarity between images in terms of brightness, contrast and texture, and a larger value indicates less distortion and higher image quality.
5. The method according to claim 1, wherein the remote sensing image target detection model based on multi-layer feature fusion in step S4 includes a feature extraction and fusion module, a region-of-interest generation module and a target detection module, wherein the feature extraction and fusion module is used for extracting ground-object features and fusing the extracted low-level and high-level features, the region-of-interest generation module is used for extracting regions of interest based on the fused feature map, and the target detection module is used for performing target identification and detection according to the generated regions of interest.
6. The method of claim 5, wherein the feature extraction and fusion module includes a down-sampling process and an up-sampling process when fusing the low-level features and the high-level features.
7. The method of claim 1, wherein the formula of the optimization objective function in step S5 is:
$$L\left(\{p_i\},\{t_i\}\right) = \frac{1}{N_{cls}}\sum_i L_{cls}\left(p_i\right) + \lambda\,\frac{1}{N_{loc}}\sum_i p_i^{*}\,L_{loc}\left(t_i,t_i^{*}\right) \tag{4}$$
In formula (4), L_cls = −log(p_i) represents the classification error, where p_i represents the probability value of the ground-object class output by the classifier;
$$L_{loc}\left(t_i,t_i^{*}\right) = \sum_{j\in\{x,y,w,h\}} \mathrm{smooth}_{L_1}\!\left(t_j - t_j^{*}\right)$$

represents the positioning error of the model, where t_x = (x − x_a)/w_a, t_y = (y − y_a)/h_a,

$$t_w = \log\left(w/w_a\right),\qquad t_h = \log\left(h/h_a\right),\qquad \mathrm{smooth}_{L_1}(d)=\begin{cases}0.5\,d^2, & |d| < 1\\ |d| - 0.5, & \text{otherwise}\end{cases}$$
Quantities with subscript a are the parameter values of the bounding box output by the anchor box, quantities with an asterisk superscript are ground-truth values, and the rest are model prediction outputs; λ is a balance parameter, N_cls is the training batch size, and N_loc is the number of generated anchor boxes.
CN201910870991.1A 2019-09-16 2019-09-16 High-resolution remote sensing image weak target detection method based on deep learning Pending CN110728658A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910870991.1A CN110728658A (en) 2019-09-16 2019-09-16 High-resolution remote sensing image weak target detection method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910870991.1A CN110728658A (en) 2019-09-16 2019-09-16 High-resolution remote sensing image weak target detection method based on deep learning

Publications (1)

Publication Number Publication Date
CN110728658A true CN110728658A (en) 2020-01-24

Family

ID=69219060

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910870991.1A Pending CN110728658A (en) 2019-09-16 2019-09-16 High-resolution remote sensing image weak target detection method based on deep learning

Country Status (1)

Country Link
CN (1) CN110728658A (en)

Cited By (32)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111340700A (en) * 2020-02-21 2020-06-26 北京中科虹霸科技有限公司 Model generation method, resolution improvement method, image identification method and device
CN111460060A (en) * 2020-03-04 2020-07-28 华中科技大学 User region-of-interest remote sensing image space indexing method
CN111709307A (en) * 2020-05-22 2020-09-25 哈尔滨工业大学 Resolution enhancement-based remote sensing image small target detection method
CN111709291A (en) * 2020-05-18 2020-09-25 杭州电子科技大学 Takeaway personnel identity identification method based on fusion information
CN111861884A (en) * 2020-07-15 2020-10-30 南京信息工程大学 Satellite cloud image super-resolution reconstruction method based on deep learning
CN111899172A (en) * 2020-07-16 2020-11-06 武汉大学 Vehicle target detection method oriented to remote sensing application scene
CN111968064A (en) * 2020-10-22 2020-11-20 成都睿沿科技有限公司 Image processing method and device, electronic equipment and storage medium
CN112098997A (en) * 2020-09-18 2020-12-18 欧必翼太赫兹科技(北京)有限公司 Three-dimensional holographic imaging security inspection radar image foreign matter detection method
CN112329852A (en) * 2020-11-05 2021-02-05 西安泽塔云科技股份有限公司 Classification method and device for earth surface coverage images and electronic equipment
CN112395958A (en) * 2020-10-29 2021-02-23 中国地质大学(武汉) Remote sensing image small target detection method based on four-scale depth and shallow layer feature fusion
CN112907485A (en) * 2021-03-18 2021-06-04 国家海洋信息中心 Remote sensing image batch color matching method based on l alpha beta space color mapping
CN113033714A (en) * 2021-05-24 2021-06-25 华中师范大学 Object-oriented automatic machine learning method and system for multi-mode multi-granularity remote sensing image
CN113033446A (en) * 2021-04-01 2021-06-25 辽宁工程技术大学 Transmission tower identification and positioning method based on high-resolution remote sensing image
CN113052868A (en) * 2021-03-11 2021-06-29 奥比中光科技集团股份有限公司 Cutout model training and image cutout method and device
CN113111727A (en) * 2021-03-19 2021-07-13 西北工业大学 Method for detecting rotating target in remote sensing scene based on feature alignment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1831556A (en) * 2006-04-14 2006-09-13 武汉大学 Single satellite remote sensing image small target super resolution ratio reconstruction method
CN105389797A (en) * 2015-10-16 2016-03-09 西安电子科技大学 Unmanned aerial vehicle video small-object detecting method based on super-resolution reconstruction
CN109344821A (en) * 2018-08-30 2019-02-15 西安电子科技大学 Small target detecting method based on Fusion Features and deep learning
CN109816593A (en) * 2019-01-18 2019-05-28 大连海事大学 A kind of super-resolution image reconstruction method of the generation confrontation network based on attention mechanism

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Peng Xiaofei et al.: "Research on Small Target Detection Algorithms under Complex Conditions", Intelligent Computer and Applications (《智能计算机与应用》) *
Chang Qing et al.: "Target Detection Algorithm Based on an Image Super-Resolution Network", Graphics and Image (《图形图像》) *

Cited By (50)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111340700A (en) * 2020-02-21 2020-06-26 北京中科虹霸科技有限公司 Model generation method, resolution improvement method, image identification method and device
CN111340700B (en) * 2020-02-21 2023-04-25 北京中科虹霸科技有限公司 Model generation method, resolution improvement method, image recognition method and device
CN111460060A (en) * 2020-03-04 2020-07-28 华中科技大学 User region-of-interest remote sensing image space indexing method
CN111460060B (en) * 2020-03-04 2023-03-28 华中科技大学 User region-of-interest remote sensing image space indexing method
CN111709291B (en) * 2020-05-18 2023-05-26 杭州电子科技大学 Takeaway personnel identity recognition method based on fusion information
CN111709291A (en) * 2020-05-18 2020-09-25 杭州电子科技大学 Takeaway personnel identity identification method based on fusion information
CN111709307B (en) * 2020-05-22 2022-08-30 哈尔滨工业大学 Resolution enhancement-based remote sensing image small target detection method
CN111709307A (en) * 2020-05-22 2020-09-25 哈尔滨工业大学 Resolution enhancement-based remote sensing image small target detection method
CN111861884A (en) * 2020-07-15 2020-10-30 南京信息工程大学 Satellite cloud image super-resolution reconstruction method based on deep learning
CN111861884B (en) * 2020-07-15 2023-05-16 南京信息工程大学 Satellite cloud image super-resolution reconstruction method based on deep learning
CN111899172A (en) * 2020-07-16 2020-11-06 武汉大学 Vehicle target detection method oriented to remote sensing application scene
CN112098997A (en) * 2020-09-18 2020-12-18 欧必翼太赫兹科技(北京)有限公司 Three-dimensional holographic imaging security inspection radar image foreign matter detection method
CN112098997B (en) * 2020-09-18 2021-10-15 欧必翼太赫兹科技(北京)有限公司 Three-dimensional holographic imaging security inspection radar image foreign matter detection method
CN111968064B (en) * 2020-10-22 2021-01-15 成都睿沿科技有限公司 Image processing method and device, electronic equipment and storage medium
CN111968064A (en) * 2020-10-22 2020-11-20 成都睿沿科技有限公司 Image processing method and device, electronic equipment and storage medium
CN112395958A (en) * 2020-10-29 2021-02-23 中国地质大学(武汉) Remote sensing image small target detection method based on four-scale depth and shallow layer feature fusion
CN112329852B (en) * 2020-11-05 2022-04-05 Xi'an Ruisi Shuzhi Technology Co., Ltd. Classification method and device for earth surface coverage images and electronic equipment
CN112329852A (en) * 2020-11-05 2021-02-05 Xi'an Zeta Cloud Technology Co., Ltd. Classification method and device for earth surface coverage images and electronic equipment
CN113052868A (en) * 2021-03-11 2021-06-29 奥比中光科技集团股份有限公司 Cutout model training and image cutout method and device
WO2022188886A1 (en) * 2021-03-11 2022-09-15 奥比中光科技集团股份有限公司 Image matting model training method and apparatus, and image matting method and apparatus
CN112907485A (en) * 2021-03-18 2021-06-04 国家海洋信息中心 Remote sensing image batch color matching method based on l alpha beta space color mapping
CN113111727A (en) * 2021-03-19 2021-07-13 西北工业大学 Method for detecting rotating target in remote sensing scene based on feature alignment
CN113111727B (en) * 2021-03-19 2024-05-31 西北工业大学 Feature alignment-based method for detecting rotating target in remote sensing scene
CN113033446A (en) * 2021-04-01 2021-06-25 辽宁工程技术大学 Transmission tower identification and positioning method based on high-resolution remote sensing image
CN113033446B (en) * 2021-04-01 2024-02-02 辽宁工程技术大学 Transmission tower identification and positioning method based on high-resolution remote sensing image
CN113033714A (en) * 2021-05-24 2021-06-25 华中师范大学 Object-oriented automatic machine learning method and system for multi-mode multi-granularity remote sensing image
CN113033714B (en) * 2021-05-24 2021-08-03 华中师范大学 Object-oriented full-automatic machine learning method and system for multi-mode multi-granularity remote sensing image
CN113378905B (en) * 2021-06-04 2022-06-03 武汉大学 Small target detection method based on distribution distance
CN113378905A (en) * 2021-06-04 2021-09-10 武汉大学 Small target detection method based on distribution distance
CN113591608A (en) * 2021-07-12 2021-11-02 浙江大学 High-resolution remote sensing image impervious surface extraction method based on deep learning
CN113487579A (en) * 2021-07-14 2021-10-08 广州柏视医疗科技有限公司 Multi-mode migration method for automatically sketching model
CN113487579B (en) * 2021-07-14 2022-04-01 广州柏视医疗科技有限公司 Multi-mode migration method for automatically sketching model
CN113537214A (en) * 2021-07-16 2021-10-22 浙江工业大学 Automatic phase singularity identification method based on fast R-CNN and SRGAN
CN113538247A (en) * 2021-08-12 2021-10-22 中国科学院空天信息创新研究院 Super-resolution generation and conditional countermeasure network remote sensing image sample generation method
CN114331849A (en) * 2022-03-15 2022-04-12 之江实验室 Cross-mode nuclear magnetic resonance hyper-resolution network and image super-resolution method
CN114998242A (en) * 2022-05-26 2022-09-02 电子科技大学 Method for detecting power transmission line pole tower in satellite image
CN114972737B (en) * 2022-06-08 2024-03-15 湖南大学 Remote sensing image target detection system and method based on prototype contrast learning
CN114972737A (en) * 2022-06-08 2022-08-30 湖南大学 Remote sensing image target detection system and method based on prototype comparison learning
CN114782255A (en) * 2022-06-16 2022-07-22 武汉大学 Semantic-based noctilucent remote sensing image high-resolution reconstruction method
WO2024020676A1 (en) * 2022-07-26 2024-02-01 Deeplite Inc. System and method to utilize a reduced image resolution for computer vision applications
CN115063428A (en) * 2022-08-18 2022-09-16 中国科学院国家空间科学中心 Spatial dim small target detection method based on deep reinforcement learning
CN115457396B (en) * 2022-09-26 2023-06-23 河北省科学院地理科学研究所 Surface target ground object detection method based on remote sensing image
CN115457396A (en) * 2022-09-26 2022-12-09 河北省科学院地理科学研究所 Surface target ground object detection method based on remote sensing image
CN116309274A (en) * 2022-12-12 2023-06-23 湖南红普创新科技发展有限公司 Method and device for detecting small target in image, computer equipment and storage medium
CN116309274B (en) * 2022-12-12 2024-01-30 湖南红普创新科技发展有限公司 Method and device for detecting small target in image, computer equipment and storage medium
CN116862317A (en) * 2023-08-08 2023-10-10 广西壮族自治区自然资源遥感院 Satellite remote sensing monitoring system based on project full life cycle performance evaluation management
CN117095299B (en) * 2023-10-18 2024-01-26 浙江省测绘科学技术研究院 Grain crop extraction method, system, equipment and medium for crushing cultivation area
CN117095299A (en) * 2023-10-18 2023-11-21 浙江省测绘科学技术研究院 Grain crop extraction method, system, equipment and medium for crushing cultivation area
CN117253155A (en) * 2023-11-17 2023-12-19 山东大学 Human activity detection method and system based on deep learning
CN117253155B (en) * 2023-11-17 2024-03-15 山东大学 Human activity detection method and system based on deep learning

Similar Documents

Publication Publication Date Title
CN110728658A (en) High-resolution remote sensing image weak target detection method based on deep learning
CN110738697B (en) Monocular depth estimation method based on deep learning
CN113298818B (en) Remote sensing image building segmentation method based on attention mechanism and multi-scale features
CN110472627B (en) End-to-end SAR image recognition method, device and storage medium
CN110363215B (en) Method for converting SAR image into optical image based on generating type countermeasure network
CN112488210A (en) Three-dimensional point cloud automatic classification method based on graph convolution neural network
CN111161218A (en) High-resolution remote sensing image change detection method based on twin convolutional neural network
CN110111345B (en) Attention network-based 3D point cloud segmentation method
CN112836713A (en) Image anchor-frame-free detection-based mesoscale convection system identification and tracking method
CN110598613B (en) Expressway agglomerate fog monitoring method
Chawan et al. Automatic detection of flood using remote sensing images
CN109635726B (en) Landslide identification method based on combination of symmetric deep network and multi-scale pooling
CN112487900B (en) SAR image ship target detection method based on feature fusion
CN112950780B (en) Intelligent network map generation method and system based on remote sensing image
CN114913498A (en) Parallel multi-scale feature aggregation lane line detection method based on key point estimation
CN116279592A (en) Method for dividing travelable area of unmanned logistics vehicle
CN113610905A (en) Deep learning remote sensing image registration method based on subimage matching and application
CN113469097B (en) Multi-camera real-time detection method for water surface floaters based on SSD network
CN114266947A (en) Classification method and device based on fusion of laser point cloud and visible light image
CN114612315A (en) High-resolution image missing region reconstruction method based on multi-task learning
CN116665153A (en) Road scene segmentation method based on improved deep bv3+ network model
CN115456957B (en) Method for detecting change of remote sensing image by full-scale feature aggregation
CN110910497A (en) Method and system for realizing augmented reality map
CN116129234A (en) Attention-based 4D millimeter wave radar and vision fusion method
CN113762271A (en) SAR image semantic segmentation method and system based on irregular convolution kernel neural network model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200124