CN112199984A - Target rapid detection method of large-scale remote sensing image - Google Patents

Target rapid detection method of large-scale remote sensing image

Info

Publication number
CN112199984A
CN112199984A (application number CN202010664095.2A)
Authority
CN
China
Prior art keywords
target
detection
scale
remote sensing
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010664095.2A
Other languages
Chinese (zh)
Other versions
CN112199984B (en)
Inventor
Wu Zeliang (吴则良)
Jia Zikai (贾自凯)
Gu Xuechen (谷雪晨)
Tao Hong (陶宏)
Chen Qi (陈祺)
Jin Ren (金忍)
Song Tao (宋韬)
Lin Defu (林德福)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT filed Critical Beijing Institute of Technology BIT
Priority to CN202010664095.2A
Publication of CN112199984A
Application granted
Publication of CN112199984B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • G06V20/13Satellite images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02ATECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Astronomy & Astrophysics (AREA)
  • Remote Sensing (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method for rapid target detection in large-scale remote sensing images. Based on deep learning, the method adopts a target detection strategy that combines dense-region pre-estimation with block-wise detection to rapidly and accurately detect and classify targets in large-scale, high-resolution images. The method can accurately detect large-scale targets and small targets in dense regions at the same time, and has high detection efficiency.

Description

Target rapid detection method of large-scale remote sensing image
Technical Field
The invention relates to the technical field of image recognition, in particular to a method for rapid target detection in large-scale remote sensing images.
Background
Most current remote sensing target detection works on images smaller than 1920 × 1080. As camera resolution keeps improving, high-resolution, large-scale images such as 2K and 4K images are becoming increasingly common; they allow higher detection precision but also bring a huge computational load.
In the prior art, the mainstream detection strategy is to segment the image uniformly into smaller-scale sub-images and then detect after reducing the resolution. Such methods can achieve higher detection accuracy but increase the computational cost. Moreover, because targets are unevenly distributed in remote sensing images, part of the segmented sub-images contain no target to be detected, which silently adds computation and detection time. At the same time, because the image is cut uniformly, small targets in densely populated areas are detected poorly.
Therefore, a method for rapid target detection in large-scale remote sensing images is needed, one that detects and classifies large-scale, high-resolution images quickly and accurately while handling uneven target distribution and small targets.
Disclosure of Invention
In order to overcome the above problems, the present inventors have conducted intensive studies and found that, based on deep learning, a target detection strategy that fuses dense-region pre-estimation with block-wise detection can perform rapid and accurate multi-target detection and classification on large-scale, high-resolution images, handle uneven target distribution and small targets in a targeted manner, and significantly improve both detection accuracy and detection speed. The present invention has thereby been completed.
Specifically, the present invention aims to provide the following:
in a first aspect, a method for rapid target detection in a large-scale remote sensing image is provided, wherein the method comprises the following steps:
step 1, preparing a training data set and training to obtain a detection network;
step 2, obtaining a large-scale remote sensing image to be detected and detecting it with the trained detection network.
Wherein the training comprises the steps of:
step 1-1, preprocessing the images in the training data set;
step 1-2, training a large-scale target detection sub-network;
step 1-3, training a dense-region small-target detection sub-network.
Wherein step 1-2 comprises the following substeps:
step 1-2-1, constructing a convolutional neural network;
step 1-2-2, training to obtain a large-scale target suggestion region network and a dense-region small-target suggestion network;
step 1-2-3, training the neural network and updating the network parameters to obtain a converged large-scale target detection sub-network.
Wherein, in the step 1-2-2, a large-scale target and a dense region small target are distinguished by adopting metric learning,
preferably, the distinguishing of the dense area small targets includes the steps of calculating a small target occurrence probability, and comparing the probability value with a threshold value.
In step 1-2-2, the trained dense-region small-target suggestion network is evaluated using the following loss:

L = Σ_i [ L_BCE(p_i, p_i*) + 1(p_i* = 1) · L_IOU(t_i, t_i*) ]

where L_BCE denotes the binary cross-entropy loss and L_IOU denotes the GIOU loss; p_i is the probability that a small target appears in the target region, and t_i is the parameterized vector of the bounding-box prediction, the same as in the RPN; p_i* indicates whether the region is a small-target region, t_i* denotes the corresponding regression target, and 1(p_i* = 1) is an indicator function that returns 1 when p_i* = 1 and 0 otherwise.
Wherein step 2 comprises the following substeps:
step 2-1, obtaining a large-scale remote sensing image and preprocessing it;
step 2-2, extracting features from the preprocessed image to obtain a feature map;
step 2-3, extracting large-scale target regions and dense small-target regions;
step 2-4, performing large-scale target detection and dense-region small-target detection.
After step 2-4, the method further comprises a step of fusing the large-scale target detection results with the dense-region small-target detection results.
In a second aspect, a computer-readable storage medium is provided, wherein the storage medium stores a program for quickly detecting an object in a large-scale remote sensing image, and the program, when executed by a processor, causes the processor to execute the steps of the method for quickly detecting an object in a large-scale remote sensing image.
In a third aspect, a computer device is provided, wherein the device includes a memory and a processor, the memory stores a program for quickly detecting an object in a large-scale remote sensing image, and the program, when executed by the processor, causes the processor to execute the steps of the method for quickly detecting an object in a large-scale remote sensing image.
The invention has the advantages that:
(1) The method for rapid target detection in large-scale remote sensing images adopts a target detection strategy that fuses dense-region pre-estimation with block-wise detection; it handles uneven target distribution and small targets in a targeted manner and, being based on deep-learning detection, significantly improves detection accuracy and detection speed;
(2) in the method, the two detection sub-networks run in parallel, so large-scale targets and small targets in dense regions can be detected accurately at the same time, and detection efficiency is high;
(3) the method achieves accurate detection of small targets in high-resolution images at low computational cost;
(4) the convolutional neural network uses only 3 × 3 and 1 × 1 convolution kernels, which greatly reduces the amount of computation and improves detection speed.
Drawings
FIG. 1 illustrates a schematic diagram of image annotation in accordance with a preferred embodiment of the present invention;
FIG. 2 is a diagram illustrating bounding box prediction in accordance with a preferred embodiment of the present invention;
fig. 3 is a diagram showing the picture detection effect in embodiment 1 of the present invention.
Detailed Description
The present invention will be described in further detail below with reference to the accompanying drawings and embodiments. The features and advantages of the present invention will become more apparent from the description. In which, although various aspects of the embodiments are shown in the drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The invention provides a method for rapid target detection in large-scale remote sensing images, which comprises the following steps:
Step 1, preparing a training data set and training to obtain a detection network.
Step 2, obtaining a large-scale remote sensing image to be detected and detecting it with the trained detection network.
In the present invention, the large-scale remote sensing image is preferably a remote sensing image with a resolution of 1920 × 1080 and above, such as 2K and 4K images.
The detection method of the present invention is further described below.
Step 1, preparing a training data set and training to obtain a detection network.
The training data set consists of large-scale remote sensing images. Before training, the images are annotated, as shown in fig. 1; an annotation record looks like: { "area":169, "bbox": 102,81,13, "category_name": "car" },
where the value of area is the pixel area of the annotated region (the rectangular box); the first value of bbox is the horizontal pixel coordinate of the top-left corner of the rectangular box relative to the top-left corner of the picture, the second value is the vertical pixel coordinate of that corner relative to the top-left corner of the picture, the third value is the width of the box, and the fourth value is the height of the box; category_name is the target category.
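As an illustration, the sketch below shows how such an annotation record could be read and converted to corner coordinates for training. The field values and the helper name are hypothetical examples following the [x, y, width, height] convention described above, not values taken from the patent's data set.

```python
# Hypothetical annotation record in the format described above:
# bbox = [x_top_left, y_top_left, width, height] in pixels.
annotation = {"area": 169, "bbox": [102, 81, 13, 13], "category_name": "car"}

def bbox_to_corners(bbox):
    """Convert [x, y, w, h] (top-left corner + size) to (x1, y1, x2, y2)."""
    x, y, w, h = bbox
    return x, y, x + w, y + h

x1, y1, x2, y2 = bbox_to_corners(annotation["bbox"])
print(annotation["category_name"], (x1, y1, x2, y2), "area:", annotation["area"])
```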
Preferably, the training comprises the steps of:
step 1-1, preprocessing images in a training data set.
Since remote sensing images are high-resolution, to reduce the influence of target scale variation on the detection result, each image is preferably scaled and then cropped into image blocks of equal size, which realizes multi-scale sampling of the input image.
In the invention, the images are preferably preprocessed into 512 × 512 pixel blocks, which effectively improves GPU utilization.
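A minimal sketch of this preprocessing step is given below, assuming OpenCV-style image arrays; the scale factor and tile size default are illustrative choices, not values fixed by the patent.

```python
import cv2  # opencv-python
import numpy as np

def preprocess(image: np.ndarray, scale: float = 0.5, tile: int = 512):
    """Scale the image, then cut it into equal-sized tile x tile blocks.

    Returns a list of (block, (x_offset, y_offset)) pairs so that detections
    in a block can later be mapped back to the original image.
    """
    resized = cv2.resize(image, None, fx=scale, fy=scale)
    h, w = resized.shape[:2]
    blocks = []
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            block = resized[y:y + tile, x:x + tile]
            # Pad edge blocks so every network input is exactly tile x tile.
            pad_y, pad_x = tile - block.shape[0], tile - block.shape[1]
            if pad_y or pad_x:
                block = cv2.copyMakeBorder(block, 0, pad_y, 0, pad_x,
                                           cv2.BORDER_CONSTANT, value=0)
            blocks.append((block, (x, y)))
    return blocks
```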
Step 1-2, training a large-scale target detection subnetwork.
Wherein step 1-2 comprises the following substeps:
Step 1-2-1, constructing a convolutional neural network.
The constructed convolutional neural network is a multi-layer deep residual network, i.e., a deep convolutional network. In the present invention, the pooling layers and fully connected layers are preferably omitted from the network structure, and a new feature extraction network is built from convolutional layers only, for example with ResNet101 as the base network.
In a convolutional neural network, the fully connected layer suffers from parameter redundancy; the convolutional layers have excellent localization ability, but that ability is lost once a fully connected classification layer is appended, so only convolutional layers are preferably used in the present invention.
Preferably, the convolution kernels are 3 × 3 and 1 × 1, which is beneficial to greatly reducing the operation amount and improving the detection speed.
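The PyTorch sketch below illustrates the kind of convolution-only residual bottleneck block this implies: only 1 × 1 and 3 × 3 kernels, stride-2 convolution instead of pooling, and no fully connected layer. It is an assumption-level illustration (channel sizes and strides are arbitrary), not the exact network of the patent.

```python
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    """Residual block built only from 1x1 and 3x3 convolutions."""
    def __init__(self, in_ch, mid_ch, out_ch, stride=1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, mid_ch, kernel_size=1, bias=False),
            nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, mid_ch, kernel_size=3, stride=stride,
                      padding=1, bias=False),
            nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, out_ch, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_ch),
        )
        # Downsampling / channel matching done with a 1x1 conv, not pooling.
        self.shortcut = (nn.Identity() if stride == 1 and in_ch == out_ch else
                         nn.Conv2d(in_ch, out_ch, kernel_size=1, stride=stride,
                                   bias=False))

    def forward(self, x):
        return torch.relu(self.body(x) + self.shortcut(x))

# Example: a 512x512 RGB block produces a downsampled feature map.
feat = Bottleneck(3, 16, 64, stride=2)(torch.randn(1, 3, 512, 512))
```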
Step 1-2-2, training to obtain a large-scale target suggestion region network and a dense-region small-target suggestion network.
In the invention, to address the problems of uneven target distribution, segmented sub-images that contain no target to be detected, and poor detection of small targets in dense areas of the remote sensing image, the proposal stage is preferably divided, before target detection, into a large-scale target suggestion region network and a dense-region small-target suggestion network; this makes target detection more targeted and significantly improves detection accuracy and detection speed.
According to a preferred embodiment of the present invention, metric learning is used to distinguish large-scale targets from small targets in dense areas: a metric space over normal samples (large-scale targets) and small samples (small targets in dense areas) is obtained so that the small samples can be separated. In other words, a metric learning method is used to train a scale estimation network for large and small targets.
In the present invention, the learned feature space is required to separate the two sample types: the distance between samples from different classes satisfies D(x_i, x_j) ≥ m_1 (y_ij = 0), while the distance between samples from the same class satisfies D(x_i, x_j) ≤ m_2 (y_ij = 1), where m_1 is the first sample-distance threshold, m_2 is the second sample-distance threshold, 0 ≤ m_2 ≤ m_1, and y_ij = 0 and y_ij = 1 indicate that the two samples come from different classes and from the same class, respectively.
Preferably, the scale estimation network for large and small targets is evaluated with a hinge-based loss over these pairwise distances (the formula is given only as an image in the original), where [·]_+ denotes the hinge function, α is a first coefficient controlling the weighting of difficult samples with a preferred value of 10, and β is a second coefficient controlling the weighting of difficult samples with a preferred value of 0.
Training stops when the loss function converges, at which point small targets in dense areas can be effectively distinguished.
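As an illustration of this kind of training objective, the PyTorch sketch below implements a generic double-margin contrastive loss with margins m1 and m2. It follows the pairwise constraints above but does not reproduce the patent's exact hinge weighting with α and β, whose formula is not reproduced here.

```python
import torch

def double_margin_loss(emb_a, emb_b, same_class, m1=1.0, m2=0.5):
    """Generic double-margin contrastive loss (0 <= m2 <= m1).

    Pairs from the same class are pushed to distance <= m2, pairs from
    different classes to distance >= m1. same_class is a 0/1 tensor (y_ij).
    """
    d = torch.norm(emb_a - emb_b, dim=1)                   # Euclidean distance D_ij
    pull = same_class * torch.clamp(d - m2, min=0)          # same class, too far apart
    push = (1 - same_class) * torch.clamp(m1 - d, min=0)    # different classes, too close
    return (pull + push).mean()

# Toy usage: embeddings of paired target regions and their pair labels.
loss = double_margin_loss(torch.randn(8, 128), torch.randn(8, 128),
                          torch.randint(0, 2, (8,)).float())
```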
In a further preferred embodiment, discriminating dense-area small targets comprises calculating a small-target occurrence probability and comparing the probability value with a threshold.
Preferably, for an input target region, let E denote the output of the metric network (i.e., the depth feature vector), let c_i^j denote the feature metric center j of the i-th target class, and let ||E − c_i^j|| denote the Euclidean distance from E to that center. If the features of the i-th class follow a Gaussian distribution with variance σ_i², the probability that small targets of class i appear in the target region is

p_i(E) = exp( −||E − c_i^j||² / (2 σ_i²) )

In a further preferred embodiment, a target region with p_i(E) > T, where T is a threshold value, is judged to contain small targets.
Preferably, the threshold T takes a value between 0 and 1.
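A minimal numpy sketch of this probability test follows; the feature dimension, class center, variance, and threshold are illustrative values under the Gaussian assumption stated above.

```python
import numpy as np

def small_target_probability(E, center, sigma):
    """p_i(E) = exp(-||E - c_i^j||^2 / (2 * sigma_i^2)) under a Gaussian model."""
    d2 = np.sum((E - center) ** 2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

E = np.random.randn(128)      # depth feature vector of a target region
center = np.zeros(128)        # feature metric center c_i^j (illustrative)
p = small_target_probability(E, center, sigma=1.0)
T = 0.5                       # threshold in (0, 1)
is_small_target_region = p > T
```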
In the invention, through the training, a large-scale target suggestion area network and a dense area small-target suggestion network can be obtained.
Step 1-2-3, training the neural network and updating the network parameters to obtain a converged large-scale target detection sub-network.
In the invention, large-scale targets and small-scale targets are distinguished by metric learning; the large-scale targets are then used to train the network and update its parameters, which yields the large-scale target detection sub-network.
Step 1-3, training a dense-region small-target detection sub-network.
Wherein step 1-3 comprises the following substeps:
Step 1-3-1, training the dense-region small-target suggestion network.
The small targets obtained in step 1-2 are used to train the dense-region small-target suggestion network.
According to a preferred embodiment of the present invention, the trained dense-region small-target suggestion network is evaluated using the following loss:

L = Σ_i [ L_BCE(p_i, p_i*) + 1(p_i* = 1) · L_IOU(t_i, t_i*) ]

where L_BCE denotes the binary cross-entropy loss and L_IOU denotes the GIOU loss; p_i is the probability that a small target appears in the target region, and t_i is the parameterized vector of the bounding-box prediction, the same as in the RPN; p_i* indicates whether the region is a small-target region, t_i* denotes the corresponding regression target, and 1(p_i* = 1) is an indicator function that returns 1 when p_i* = 1 and 0 otherwise.
In a further preferred embodiment, a ResNet-based detection algorithm is adopted: the gradient of the loss function is back-propagated through the whole dense-region small-target suggestion network and the network parameters are updated, finally giving a converged dense-region small-target suggestion network.
Step 1-3-2, training the dense-region small-target detection sub-network with the obtained small targets.
Preferably, a ResNet-based detection algorithm is adopted: the gradient of the loss function is back-propagated through the whole dense-region small-target detection sub-network and the network parameters are updated, giving a converged dense-region small-target detection sub-network.
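A sketch of such a combined objective is given below in PyTorch, following the reconstructed loss above: binary cross-entropy on the small-target probability plus a GIOU term applied only to positive regions. The GIOU implementation is a standard one and is not taken verbatim from the patent.

```python
import torch
import torch.nn.functional as F

def giou_loss(pred, target):
    """GIOU loss for boxes given as (x1, y1, x2, y2); returns 1 - GIoU per box."""
    x1 = torch.max(pred[:, 0], target[:, 0]); y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2]); y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    union = area_p + area_t - inter
    # Smallest enclosing box for the GIoU penalty term.
    ex1 = torch.min(pred[:, 0], target[:, 0]); ey1 = torch.min(pred[:, 1], target[:, 1])
    ex2 = torch.max(pred[:, 2], target[:, 2]); ey2 = torch.max(pred[:, 3], target[:, 3])
    enclose = (ex2 - ex1) * (ey2 - ey1)
    giou = inter / union - (enclose - union) / enclose
    return 1.0 - giou

def proposal_loss(p, p_star, t, t_star):
    """L = sum_i [ L_BCE(p_i, p_i*) + 1[p_i* = 1] * L_GIOU(t_i, t_i*) ]."""
    l_bce = F.binary_cross_entropy(p, p_star, reduction="none")
    l_iou = giou_loss(t, t_star) * (p_star == 1).float()
    return (l_bce + l_iou).sum()
```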
Step 2, obtaining the large-scale remote sensing image to be detected and detecting it with the trained detection network.
Wherein, step 2 comprises the following substeps:
and 2-1, obtaining a large-scale remote sensing image and preprocessing the large-scale remote sensing image.
The large-scale remote sensing image can be obtained by a common method in the prior art, such as an unmanned aerial vehicle, a satellite and the like.
Preferably, the preprocessing reduces the resolution of the image, for example to 512 × 512 pixels, before the image is input into the trained detection network.
Step 2-2, performing feature extraction on the preprocessed image to obtain a feature map.
The preprocessed image is passed through the backbone and feature extraction network of the convolutional neural network to generate a feature map.
Step 2-3, extracting large-scale target regions and dense small-target regions.
Specifically, based on the feature map, the large-scale target suggestion region network and the dense-region small-target suggestion network trained in step 1 are used to obtain the large-scale target regions and the dense small-target regions of the image.
Step 2-4, performing large-scale target detection and dense-region small-target detection.
The trained large-scale target detection sub-network and dense-region small-target detection sub-network are used, respectively, to detect large-scale targets and small targets within the extracted large-scale target regions and dense small-target regions.
According to a preferred embodiment of the present invention, in the process of large-scale target detection and dense-region small-target detection, as shown in fig. 2, the bounding box of an image target is predicted as follows:
for each bounding box the network outputs five parameters t_x, t_y, t_w, t_h, t_o. The top-left corner of the grid cell containing the target is offset from the top-left corner of the whole image by (c_x, c_y), and the prior (anchor) box has width p_w and height p_h.
The absolute position of the predicted bounding box is:

b_x = σ(t_x) + c_x
b_y = σ(t_y) + c_y
b_w = p_w · e^(t_w)
b_h = p_h · e^(t_h)

The prediction box is scored using:

Score = Pr(object) × IOU

where Pr(object) is the probability that a target exists: it is 1 if the grid cell contains a target and 0 otherwise; the IOU is the ratio of the intersection to the union of the prediction box and the ground-truth box, and its value indicates how accurate the prediction box is, which is an important index for judging the detection effect.
In a further preferred embodiment, the best target candidate boxes are obtained using non-maximum suppression (NMS).
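The numpy sketch below illustrates the decoding, scoring, and NMS steps described above; grid offsets, anchor priors, and the suppression threshold are illustrative, and the confidence used for ranking is assumed to be the Pr(object) × IOU score.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode_box(t, cell_xy, prior_wh):
    """Decode (t_x, t_y, t_w, t_h) into an absolute box (in grid units):
    b_x = sigmoid(t_x) + c_x,  b_y = sigmoid(t_y) + c_y,
    b_w = p_w * exp(t_w),      b_h = p_h * exp(t_h).
    """
    tx, ty, tw, th = t
    cx, cy = cell_xy
    pw, ph = prior_wh
    return sigmoid(tx) + cx, sigmoid(ty) + cy, pw * np.exp(tw), ph * np.exp(th)

def iou(a, b):
    """IOU of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def nms(boxes, scores, thresh=0.5):
    """Greedy non-maximum suppression on corner-format boxes."""
    order = list(np.argsort(scores)[::-1])   # highest score first
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) < thresh]
    return keep
```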
Step 2-5, obtaining the target detection result of the large-scale remote sensing image.
In the invention, after the detection results for the extracted dense regions are obtained, the position information of the small regions where objects are relatively dense is converted into positions relative to the original image, so that the large-scale target detection results and the dense-region small-target detection results can be fused into the final target detection result of the remote sensing image.
The target detection result comprises category information, coordinate information, a score and the like of the target candidate frame.
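A sketch of this fusion step follows: each dense-region detection is shifted by the region's offset in the original image and the two result lists are concatenated. The record layout (box, score, category) and the optional scale factor are assumptions for illustration.

```python
def fuse_results(large_scale_dets, dense_region_dets, region_offsets, scale=1.0):
    """Merge detections from the whole image and from cropped dense regions.

    Each detection is (x1, y1, x2, y2, score, category); region_offsets[k]
    gives the (x_off, y_off) of dense region k in the original image, and
    `scale` undoes any resizing applied before detection.
    """
    fused = list(large_scale_dets)
    for k, dets in enumerate(dense_region_dets):
        x_off, y_off = region_offsets[k]
        for (x1, y1, x2, y2, score, cat) in dets:
            fused.append((x1 / scale + x_off, y1 / scale + y_off,
                          x2 / scale + x_off, y2 / scale + y_off, score, cat))
    return fused
```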
The method for rapid target detection in large-scale remote sensing images thus detects large-scale targets and dense-area small targets separately: before detection, the dense-region small-target suggestion network first extracts the dense regions of the image, and small-target detection is then performed within those regions. This enables rapid and accurate detection on large-scale, high-resolution images and solves the problem of detecting small targets under uneven target distribution.
The invention also provides a computer readable storage medium, which stores a target rapid detection program of the large-scale remote sensing image, and when the program is executed by a processor, the program causes the processor to execute the steps of the target rapid detection method of the large-scale remote sensing image.
The method for rapidly detecting the target of the large-scale remote sensing image can be realized by means of software and a necessary general hardware platform, wherein the software is stored in a computer-readable storage medium (comprising a ROM/RAM, a magnetic disk and an optical disk) and comprises a plurality of instructions for enabling a terminal device (which can be a mobile phone, a computer, a server, a network device and the like) to execute the method.
The invention also provides computer equipment which comprises a memory and a processor, wherein the memory stores a target rapid detection program of the large-scale remote sensing image, and the program causes the processor to execute the steps of the target rapid detection method of the large-scale remote sensing image when being executed by the processor.
Examples
The present invention is further described below by way of specific examples, which are merely exemplary and do not limit the scope of the present invention in any way.
Example 1
1. Data set
The Vis-Drone data set is used to evaluate the proposed method for rapid target detection in large-scale remote sensing images. The data set contains 10209 fully annotated static images covering 10 categories, with 6471 images for training, 548 images for validation, and 3190 images for testing. The image resolution is around 2000 × 1500 pixels.
2. The detection strategy used in example 1 was compared to the results of the Faster R-CNN method on the Vis-Drone dataset, as shown in Table 1.
The Faster R-CNN method first uses a set of basic convolution + activation + pooling layers to extract a feature map from the image; the feature map is shared by the subsequent RPN and fully connected layers. The RPN generates suggestion regions: it classifies anchors as target or background with softmax and then refines them by bounding-box regression to obtain accurate proposal boxes. The RoI Pooling layer then collects the input feature map and the proposal boxes, extracts a proposal feature map, and sends it to the subsequent fully connected layers; the proposal feature map is used to compute the category of each proposal box, while bounding-box regression is applied again to obtain the final accurate position of the detection box. Details are described in "Ren S, He K, Girshick R, et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137-1149."
The experimental platform was an Nvidia TX2 computer.
TABLE 1
Method | FPS | AP | AP50 | AP75
Faster R-CNN | 1~2 | 22.6 | 40.5 | 21.9
Example 1 | 4~6 | 33.8 | 57.6 | 33.5
Wherein, FPS represents the detection speed, namely the number of processed pictures per second;
AP represents the detection precision, namely the area of the region enclosed by the P-R curve and the coordinate axes;
P denotes precision and R denotes recall, calculated as:

P = TP / (TP + FP)
R = TP / (TP + FN)

where TP denotes a positive case that is correctly retrieved, FP a negative case wrongly detected as positive, TN a negative case correctly rejected, and FN a positive case wrongly detected as negative;
AP50 denotes the detection precision at an IOU threshold of 50%, i.e. only candidate boxes with IOU > 50% are counted; AP75 denotes the detection precision at an IOU threshold of 75%, i.e. only candidate boxes with IOU > 75% are counted.
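For reference, the sketch below computes precision and recall from TP/FP/FN counts as defined above; the counts themselves are illustrative and not taken from the experiment in Table 1.

```python
def precision_recall(tp, fp, fn):
    """P = TP / (TP + FP), R = TP / (TP + FN)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Illustrative counts only.
p, r = precision_recall(tp=90, fp=10, fn=30)   # -> (0.9, 0.75)
```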
According to the detection results, compared with the Faster R-CNN method, the proposed method for rapid target detection in large-scale remote sensing images improves the detection speed by 2 to 4 times and the detection precision by 49.56%.
3. The detection effect graph obtained by adopting the detection strategy described in embodiment 1 of the present invention is shown in fig. 3.
The present invention has been described above in connection with preferred embodiments, but these embodiments are merely exemplary and merely illustrative. On the basis of the above, the invention can be subjected to various substitutions and modifications, and the substitutions and the modifications are all within the protection scope of the invention.

Claims (9)

1. A target rapid detection method for a large-scale remote sensing image, characterized by comprising the following steps:
step 1, preparing a training data set, and training to obtain a detection network;
step 2, obtaining a large-scale remote sensing image to be detected, and detecting it with the trained detection network.
2. The detection method according to claim 1, wherein in step 1, the training comprises the steps of:
step 1-1, preprocessing images in the training data set;
step 1-2, training a large-scale target detection sub-network;
step 1-3, training a dense-region small-target detection sub-network.
3. The detection method according to claim 2, characterized in that step 1-2 comprises the following sub-steps:
step 1-2-1, constructing a convolutional neural network;
step 1-2-2, training to obtain a large-scale target suggestion region network and a dense-region small-target suggestion network;
step 1-2-3, training the neural network and updating the network parameters to obtain a converged large-scale target detection sub-network.
4. The detection method according to claim 3, wherein in step 1-2-2, metric learning is used to distinguish large scale targets from dense area small targets,
preferably, the distinguishing of the dense area small targets includes the steps of calculating a small target occurrence probability, and comparing the probability value with a threshold value.
5. The detection method according to claim 3, wherein in step 1-2-2, the trained dense-region small-target suggestion network is evaluated using the following loss:

L = Σ_i [ L_BCE(p_i, p_i*) + 1(p_i* = 1) · L_IOU(t_i, t_i*) ]

where L_BCE denotes the binary cross-entropy loss and L_IOU denotes the GIOU loss; p_i is the probability that a small target appears in the target region, and t_i is the parameterized vector of the bounding-box prediction, the same as in the RPN; p_i* indicates whether the region is a small-target region, t_i* denotes the corresponding regression target, and 1(p_i* = 1) is an indicator function that returns 1 when p_i* = 1 and 0 otherwise.
6. The detection method according to claim 1, characterized in that step 2 comprises the following sub-steps:
step 2-1, obtaining a large-scale remote sensing image and preprocessing it;
step 2-2, extracting features from the preprocessed image to obtain a feature map;
step 2-3, extracting large-scale target regions and dense small-target regions;
step 2-4, performing large-scale target detection and dense-region small-target detection.
7. The detection method according to claim 6, characterized in that after the step 2-4, the method further comprises the step of fusing the detection result of the large-scale target and the detection result of the small target in the dense region.
8. A computer-readable storage medium, in which a target rapid detection program of a large-scale remote sensing image is stored, which when executed by a processor causes the processor to perform the steps of the target rapid detection method of a large-scale remote sensing image according to one of claims 1 to 7.
9. A computer device, characterized in that it comprises a memory and a processor, said memory storing a program for rapid object detection of large-scale remote sensing images, which program, when executed by the processor, causes the processor to carry out the steps of the method for rapid object detection of large-scale remote sensing images according to one of claims 1 to 7.
CN202010664095.2A 2020-07-10 2020-07-10 Target rapid detection method for large-scale remote sensing image Active CN112199984B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010664095.2A CN112199984B (en) 2020-07-10 2020-07-10 Target rapid detection method for large-scale remote sensing image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010664095.2A CN112199984B (en) 2020-07-10 2020-07-10 Target rapid detection method for large-scale remote sensing image

Publications (2)

Publication Number Publication Date
CN112199984A true CN112199984A (en) 2021-01-08
CN112199984B CN112199984B (en) 2023-05-12

Family

ID=74006409

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010664095.2A Active CN112199984B (en) 2020-07-10 2020-07-10 Target rapid detection method for large-scale remote sensing image

Country Status (1)

Country Link
CN (1) CN112199984B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114581781A (en) * 2022-05-05 2022-06-03 之江实验室 Target detection method and device for high-resolution remote sensing image

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109117886A (en) * 2018-08-17 2019-01-01 浙江捷尚视觉科技股份有限公司 A kind of method of target scale and region estimation in picture frame
CN110348435A (en) * 2019-06-17 2019-10-18 武汉大学 A kind of object detection method and system based on clipping region candidate network
CN111008603A (en) * 2019-12-08 2020-04-14 中南大学 Multi-class target rapid detection method for large-scale remote sensing image
CN111079604A (en) * 2019-12-06 2020-04-28 重庆市地理信息和遥感应用中心(重庆市测绘产品质量检验测试中心) Method for quickly detecting tiny target facing large-scale remote sensing image
CN111259758A (en) * 2020-01-13 2020-06-09 中国矿业大学 Two-stage remote sensing image target detection method for dense area

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109117886A (en) * 2018-08-17 2019-01-01 浙江捷尚视觉科技股份有限公司 A kind of method of target scale and region estimation in picture frame
CN110348435A (en) * 2019-06-17 2019-10-18 武汉大学 A kind of object detection method and system based on clipping region candidate network
CN111079604A (en) * 2019-12-06 2020-04-28 重庆市地理信息和遥感应用中心(重庆市测绘产品质量检验测试中心) Method for quickly detecting tiny target facing large-scale remote sensing image
CN111008603A (en) * 2019-12-08 2020-04-14 中南大学 Multi-class target rapid detection method for large-scale remote sensing image
CN111259758A (en) * 2020-01-13 2020-06-09 中国矿业大学 Two-stage remote sensing image target detection method for dense area

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHONGLING HUANG et al.: "What, Where, and How to Transfer in SAR Target Recognition Based on Deep CNNs"

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114581781A (en) * 2022-05-05 2022-06-03 之江实验室 Target detection method and device for high-resolution remote sensing image
CN114581781B (en) * 2022-05-05 2022-08-09 之江实验室 Target detection method and device for high-resolution remote sensing image

Also Published As

Publication number Publication date
CN112199984B (en) 2023-05-12

Similar Documents

Publication Publication Date Title
CN108961235B (en) Defective insulator identification method based on YOLOv3 network and particle filter algorithm
CN109829398B (en) Target detection method in video based on three-dimensional convolution network
CN109118479B (en) Capsule network-based insulator defect identification and positioning device and method
CN108230323B (en) Pulmonary nodule false positive screening method based on convolutional neural network
CN111898432B (en) Pedestrian detection system and method based on improved YOLOv3 algorithm
CN113920107A (en) Insulator damage detection method based on improved yolov5 algorithm
CN113269257A (en) Image classification method and device, terminal equipment and storage medium
CN111882586A (en) Multi-actor target tracking method oriented to theater environment
CN112766170B (en) Self-adaptive segmentation detection method and device based on cluster unmanned aerial vehicle image
CN111915583A (en) Vehicle and pedestrian detection method based on vehicle-mounted thermal infrared imager in complex scene
CN111833353B (en) Hyperspectral target detection method based on image segmentation
CN116030237A (en) Industrial defect detection method and device, electronic equipment and storage medium
CN113159215A (en) Small target detection and identification method based on fast Rcnn
CN113850136A (en) Yolov5 and BCNN-based vehicle orientation identification method and system
CN113221956A (en) Target identification method and device based on improved multi-scale depth model
CN114332473A (en) Object detection method, object detection device, computer equipment, storage medium and program product
CN112329771A (en) Building material sample identification method based on deep learning
CN112199984B (en) Target rapid detection method for large-scale remote sensing image
CN113506288A (en) Lung nodule detection method and device based on transform attention mechanism
CN111597939B (en) High-speed rail line nest defect detection method based on deep learning
CN110287970B (en) Weak supervision object positioning method based on CAM and covering
CN110889418A (en) Gas contour identification method
Ibrahem et al. Weakly supervised traffic sign detection in real time using single CNN architecture for multiple purposes
CN113887455B (en) Face mask detection system and method based on improved FCOS
CN116912670A (en) Deep sea fish identification method based on improved YOLO model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Jin Ren

Inventor after: Jia Zikai

Inventor after: Wu Zeliang

Inventor after: Gu Xuechen

Inventor after: Tao Hong

Inventor after: Chen Qi

Inventor after: Song Tao

Inventor after: Lin Defu

Inventor before: Wu Zeliang

Inventor before: Jia Zikai

Inventor before: Gu Xuechen

Inventor before: Tao Hong

Inventor before: Chen Qi

Inventor before: Jin Ren

Inventor before: Song Tao

Inventor before: Lin Defu

CB03 Change of inventor or designer information
GR01 Patent grant