CN113468968B

CN113468968B - Remote sensing image rotating target detection method based on non-anchor frame

Info

Publication number: CN113468968B
Application number: CN202110615440.8A
Authority: CN
Inventors: 肖肖; 刘小波; 杨健峰; 龚鑫; 代浩然; 郑可心
Original assignee: China University of Geosciences
Current assignee: China University of Geosciences
Priority date: 2021-06-02
Filing date: 2021-06-02
Publication date: 2023-04-07
Anticipated expiration: 2041-06-02
Also published as: CN113468968A

Abstract

The invention provides a remote sensing image rotating target detection method based on an anchor-free frame, which comprises the steps of firstly carrying out quadruple down-sampling operation on a remote sensing image to obtain an image with rich characteristics and proper size; constructing a remote sensing image rotating target detection model without an anchor frame, taking the obtained image as input, and extracting shallow and deep characteristic maps by using a backbone network; performing feature fusion by using the extracted features with different scales and using a feature pyramid to obtain a feature map subjected to quadruple down sampling; designing a target detection head module, taking the obtained feature graph as the input of the head module, detecting the category and the position of each pixel point in the feature graph by the head module, designing a loss function, and calculating the loss function of an output prediction frame relative to a boundary frame: updating the model parameters to finish the training of the model; and detecting the remote sensing image target by using the trained target detection model. The invention improves the detection precision of the model and simplifies the structural complexity of the model.

Description

Remote sensing image rotating target detection method based on anchor-free frame

Technical Field

The invention relates to the technical field of remote sensing image target detection, in particular to a remote sensing image rotating target detection method based on an anchor-free frame.

Background

The task of object detection is to determine the location and category information of a specified object in an image and to surround the object with a detection frame. The core problem to be solved in target detection is the accuracy and high efficiency of target positioning and classification. With the development of remote sensing technology, the related research of remote sensing images is more important, and as the research of remote sensing image scanning instruments is continuously promoted, the acquisition mode of the remote sensing images is easier, and the remote sensing images are widely applied to the fields of intelligent transportation, urban planning, geological exploration, national defense safety, intelligent agriculture and the like and play a vital role. Therefore, the application of target detection on the remote sensing image is more and more abundant, but the characteristics of the remote sensing image such as various target types, various target directions, large target scale change and the like cause that performance indexes such as the precision, the efficiency and the like of the target detection are greatly influenced, so that the task of detecting the target of the remote sensing image is an important and very challenging task.

At present, a target detection algorithm based on deep learning can complete target detection tasks such as high image resolution, complex scene and the like in practical application, and a large number of target detection algorithms based on deep learning are proposed, such as fast-RCNN, SSD, YOLO algorithms and the like. Although these algorithms have achieved good results on standard test data sets, the remote sensing image has high resolution and a single imaging mode, and the like, and further improvement of the algorithms is required. The two-stage algorithm such as fast-RCNN needs to generate a candidate region, and the candidate region is classified and linearly regressed through a series of classifiers to correct the position of a candidate frame, which results in long time consumption and failure to meet the requirement of real-time detection. According to the single-stage algorithm such as SSD and YOLO, a candidate region does not need to be generated, the structure is simple, a speed block is detected, but the detection precision is difficult to meet the requirement, and especially the detection omission phenomenon occurs in small target detection. The above algorithms all need to use an anchor frame to enumerate the possible positions of the designated target, and a plurality of anchor frames are generated for each pixel point, but the number of targets on one picture is far less than the number of generated anchor frames, one target can only be matched with one anchor frame, and one anchor frame can only be matched with one target, so that a large number of unmatched anchor frames can be regarded as negative samples, which causes the problems of unbalanced positive and negative samples, a large number of extra hyper-parameters, large calculation amount of model post-processing operation and the like due to the anchor frames. In addition, the target detection method uses a horizontal boundary frame to detect the target, and for the dense arrangement of the remote sensing image targets, the prediction frame is relatively enlarged, which results in inaccurate positioning, and the adjacent prediction frames are overlapped in a large amount, which leads to the introduction of redundant background information and hinders the improvement of the target detection model performance.

The key point for improving the target detection performance of the remote sensing image is to improve the positioning accuracy and the classification confidence of the prediction frame. The existing remote sensing image target detection method is mainly used for detecting a target based on an anchor frame, and predicting the target by using a horizontal frame, so that the problems that hyper-parameters introduced by the anchor frame are difficult to adjust, the target is placed in a rotating mode, the model redundancy calculation amount is large and the like are ignored, the number of the hyper-parameters is reduced, the complexity of the model is reduced, and the target in any direction is accurately detected, so that the detection precision and speed are improved, and the task is the remote sensing image target detection task which needs to be solved urgently and is full of challenges at present.

Chenyangwei and other 2020, invents a multi-phase liver lesion detection method and system based on an anchor-frame-free method, firstly uses the anchor-frame-free method to detect the multi-phase of the liver lesion, avoids the problem of adjusting the related hyper-parameters of the anchor frame, fully combines deep layer information and shallow layer information, improves the learning capacity of a network on the multi-scale lesion, and simultaneously provides a cyclic feature connection module to obtain the dynamic change of features by combining the features of each scale of the multi-phase period, thereby further improving the detection capacity of the lesion. The method comprises the following steps: 1) Forming a feature extraction network by using a full-scale connection circulation deep polymerization detection network and a circulation feature connection module; 2) The feature extraction network transmits the shallow features to the deep features by using dense jump connection to realize the fusion of the multi-scale features; 3) Obtaining the same-scale features of the images of all phases through the step 2), and then synthesizing different features through the interconnection of all nodes of the cyclic feature extraction module; 4) Performing up-sampling convolution on each connection node in the step 3), and outputting the characteristics obtained by the characteristic extraction module by the last connection node; 5) And inputting a liver focus data set to train the feature extraction network and the detection branch and obtain the focus position, thereby realizing the function of automatically detecting the liver focus in a multiphase period.

Standby, 2020, the invention provides a ship remote sensing target detection method based on a boundary optimization neural network, which extracts more target miss characteristics, realizes optimization of positioning of a remote sensing ship target in any rotation direction, and simultaneously improves detection accuracy of the remote sensing ship target. The method comprises the following steps: 1) Marking high-resolution satellite image ship data; 2) Generating training sample data and augmenting the data in the step 1); 3) Designing a ship target detection model based on a boundary optimization neural network, and inputting the data amplified in the step 2) into the ship target detection model to obtain a detection target area; 4) And removing the overlapped frame by adopting an NMS algorithm to obtain the final detection result of the remote sensing target of the ship.

The invention relates to a remote sensing image target detection method based on an anchor-frame-free mode in 2020, which is characterized in that an anchor-frame-free method and multiple scales are used for obtaining characteristic graphs with rich characteristics, a new loss function is designed to fuse multiple indexes, model training is optimized by using the loss function, and a remote sensing image target detection model with less hyper-parameters, low model complexity and high detection precision is established. The method comprises the following steps: 1) Establishing a characteristic extraction network without an anchor frame, a characteristic pyramid and a detector without the anchor frame; 2) Acquiring a remote sensing image, and performing segmentation on the remote sensing image to obtain a small-size image; 3) Inputting the image obtained in the step 2) into the feature extraction network, and obtaining three feature maps with different scales by using the feature pyramid structure; 4) Inputting the characteristic graphs in the step 3) into an anchor-frame-free detector for prediction to obtain a target prediction result; 5) Designing a multi-index fusion loss function, and optimizing the prediction result of the step 4); 6) And completing the establishment of a target detection model and carrying out target detection on the remote sensing image.

The invention discloses a remote sensing target detection method based on boundary constraint Centenet in 2019 for Von Jie and the like, which is used for solving the problems of low detection precision and recall rate of dense small targets in the prior art, and utilizes boundary constraint to enable a prediction frame to be more accurate on the basis of the Centenet, thereby further improving the detection precision of the targets. The method comprises the following steps: 1) Randomly acquiring a training sample from the optical remote sensing image data set; 2) Constructing a feature extraction network, a boundary constraint convolutional network and a key point generating network, taking the output of the feature extraction network as the input of the boundary constraint convolutional network and the key point generating network, and taking the output of the boundary constraint convolutional network as the pooling kernel of the corner point constraint pooling layer of the key point generating network to obtain a boundary constraint CenterNet network; 3) Acquiring the prediction label and the embedded vector in the step 1); 4) Calculating the loss of the boundary constraint CenterNet network in the step 2), training and adjusting the parameters of the network, and obtaining the boundary constraint CenterNet network after training; 5) And 4) acquiring a target detection result by using the boundary constraint CenterNet network trained in the step 4).

In the invention, the detection precision of the specified target is improved, most of the target detection methods utilize the anchor-frame-free thought to reduce the number of the hyper-parameters related to the anchor frame, solve the problems of unbalance of positive and negative samples, complex post-processing and the like caused by the anchor frame, and establish or improve the anchor-frame-free remote sensing image target detection algorithm. However, the method does not solve the problem of detection of variable target placement angles in the remote sensing image, which can cause the problems of high overlapping degree of adjacent prediction frames, mutual interference of background redundant information, inaccurate positioning of the target prediction frames and the like. The Yankee peak uses a rotating frame in the remote sensing image ship target detection method to solve the problem of multi-angle target placement, but the method still needs to use maximum pooling as post-processing operation to remove a prediction frame with high overlapping degree, which not only increases the model calculation complexity, but also cannot improve the classification accuracy of the prediction frame.

The invention provides a remote sensing image rotating target detection method based on an anchor-free frame. The problems of large hyper-parameter quantity, high model complexity, influence on model detection performance and the like caused by an anchor frame are obtained by analyzing and summarizing the remote sensing image target detection method based on the anchor frame, the target boundary frame is predicted by introducing the idea of no anchor frame and replacing the anchor frame with key points of a target, and the remote sensing image target detection model with less hyper-parameter quantity, low model complexity and less redundant information is established. On the basis of anchor frame-free target detection, the defects that a horizontal frame can cause high overlapping degree of target boundary frames and much redundant background information are analyzed by using the characteristics of random target arrangement angles, different scales and the like in a remote sensing image, an angle prediction branch is added to a classification branch and a regression branch, and a rotating frame is used for replacing the horizontal frame to detect a rotating target. In addition, the target detection model uses post-processing operation to remove the prediction frame with high overlapping degree, so that the calculation complexity is increased, and hardware resources are wasted.

Disclosure of Invention

In view of this, the present invention aims to provide a method for detecting a rotating target of a remote sensing image based on an anchor-free frame, which includes the following steps:

s1, carrying out quadruple down-sampling operation on a remote sensing image to obtain an image with rich characteristics and proper size;

s2, constructing a remote sensing image rotating target detection model without an anchor frame, taking the processed image in the S1 as input, and extracting shallow and deep characteristic maps by using a backbone network;

s3, performing feature fusion by using features of different scales extracted in the S2 and using a Feature Pyramid (FPN) to obtain a feature map subjected to quadruple down sampling;

s4, designing a target detection head module, taking the feature graph obtained in the S3 as the input of the head module, detecting the category and the position of each pixel point in the feature graph by the head module, selecting top-K central point positions with the minimum category loss of K targets by category output, and predicting a polar radius and two polar angles of a target boundary frame by using the K central point positions, namely predicting frame position information under polar coordinates;

s5, designing a loss function, and calculating the loss function of the output prediction frame relative to the boundary frame: calculating an angle loss function of the polar radius and the polar angle, calculating a bias loss function and a central point category loss function of a prediction frame, updating parameters of a remote sensing image rotating target detection model without an anchor frame, and finishing the training of the model;

and S6, carrying out remote sensing image target detection by using the trained target detection model.

The technical scheme provided by the invention has the following beneficial effects: the invention establishes an anchor frame-free remote sensing image target detection model based on central point detection, does not need to use an anchor frame, reduces hyperparameters introduced by the anchor frame, reduces the computational complexity and simplifies the model; then, aiming at the problem of high judgment complexity of positive and negative sample points, adding classification loss is provided, only one positive sample is generated for each target, and extra calculation amount is not needed; and finally, aiming at the problems that the horizontal frame detection is easy to predict the frame overlapping, the target missing detection is generated, and the effectiveness of the rotating target detection cannot be realized, the detection method of the polar coordinate and the rotating frame is provided, and the parameters of back propagation are reduced. In conclusion, the method and the device improve the detection precision of the model, simplify the structural complexity of the model and improve the precision and speed of target detection.

Drawings

FIG. 1 is a flow chart of a remote sensing image rotating target detection method based on an anchor-free frame in the invention;

FIG. 2 is an overall detection framework for detecting a rotating target of a remote sensing image;

FIG. 3 is an anchor-box based object detection method framework;

FIG. 4 is a target detection method based on an anchor-free frame (taking OneNet as an example);

fig. 5 is a schematic of polar coordinate, polar angle acquisition.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be further described with reference to the accompanying drawings.

Referring to fig. 1, the present invention provides a method for detecting a rotating target of a remote sensing image based on an anchor-free frame, which includes the following steps:

the size of the remote sensing image is large, the size of part of the high-resolution remote sensing image even reaches one hundred thousand to one hundred thousand levels, and the remote sensing image has overlarge model training parameters, high model complexity and extremely high hardware resource consumption due to large size and rich information, so that the remote sensing image cannot be directly input into the model for training.

In order to train the remote sensing image better, the invention firstly carries out preprocessing operation on the remote sensing image: the remote sensing image with any size is segmented into sub-images, and in order to ensure that all targets can be placed in the sub-images, the sub-images are segmented into the remote sensing image in a mode that the overlapping rate is 15%; in order to ensure that the subgraphs can be mapped to the original graph, the center point coordinate of each subgraph is named according to the name of the graph and the segmentation position; because the subgraph is mapped to the original graph to generate an overlapping detection target, in order to remove the overlapping target without using post-processing operation, only one prediction box with the highest positioning and classification credibility is selected for one boundary box; and finally, evaluating and visualizing the effectiveness of the detection result.

Effective target features need to be extracted for target detection, and the features need to be screened and detected, especially, a remote sensing image contains rich information, so that the extraction of the effective target features and the elimination of redundant background information are necessary. The method utilizes multilayer convolution networks to extract deep and shallow features, and performs feature fusion on feature graphs of different sizes through a feature pyramid network, and predicts the fused feature graphs.

S2, constructing a remote sensing image rotating target detection model without an anchor frame, taking the image obtained in the S1 as input, and extracting shallow and deep characteristic maps by using a backbone network; selecting networks such as ResNet, hourglass and DLA as backbone networks to extract features, please refer to FIG. 2;

referring to fig. 3, in the anchor frame-based target detection method, an input image after image preprocessing is passed through a backbone network to obtain a feature map; then generating anchor frames with different sizes and aspect ratios for each pixel point on the feature map; then, each anchor frame is put into the head module, a position offset relative to the boundary frame and the probability of each category to which the anchor frame belongs are generated, for example, if the overlap ratio of the anchor frame and the boundary frame is high and the classification category of the anchor frame is the same as the category of the boundary frame, the anchor frame is matched with the boundary frame, the category of the anchor frame is the category of the boundary frame, and then the center point offset and the size offset of the anchor frame relative to the boundary frame are calculated. If one anchor frame is not matched with any boundary frame, setting the category of the anchor frame as a background category, wherein the anchor frame with the category as the background is generally called as a negative sample, and the rest anchor frames are regarded as positive samples; and finally, calculating a loss function of the output picture and the training picture through regression, and improving the detection precision of the model in a back propagation mode.

The anchor frame-based target detection method is described in detail above, and although the anchor frame can be found with a high matching degree with the bounding box by using the anchor frame, the detection precision of the target detection model is improved. But the anchor frame can bring a large number of over-parameters which are difficult to adjust, and the training complexity of the model is increased; the design of the anchor frame needs strong prior information, so that the anchor frame is difficult to design for a target with a special shape, and the problem of missing detection of the target occurs; because the number of the targets in the image is far less than that of the generated anchor frames, the number of the positive samples generated in the training process is far less than that of the negative samples, and the problem of serious imbalance of the positive samples and the negative samples is caused.

Referring to fig. 4, the present invention provides an anchor frame-free target detection method (taking OneNet as an example), and provides a rotating target detection method for the characteristics of remote sensing images, directly predicts target key points (such as a central point and a corner point) on a feature map, and predicts a target according to the deviation between the key points and the target central point, without generating an anchor frame on the feature map, thereby reducing additional hyper-parameters brought by the anchor frame, and reducing model calculation complexity and structure complexity;

s3, performing feature fusion by using the features of different scales extracted in the S2 and using a Feature Pyramid (FPN) to obtain a feature map subjected to quadruple down sampling;

s4, designing a target detection head module, taking the feature graph obtained in the S3 as input of the head module, detecting the category and the position of each pixel point in the feature graph by the head module, selecting top-K central point positions with the minimum category loss of K targets by category output, namely predicting a central point position for each target, and predicting a polar radius and two polar angles of a target boundary frame by using the K central point positions, namely predicting frame position information under polar coordinates;

aiming at the problems of changeable target directions and overlapping of dense target boundary frames of remote sensing images on the basis of an anchor frame-free detector, the loss function based on polar coordinates is designed, a rectangular coordinate system is replaced by a polar coordinate system, the polar radius rho is used for replacing the width and the height (w and h), and the polar angle theta is used ₁ 、θ ₂ Rotating target detection is carried out instead of the horizontal frame; adding angle branches to the regression and classification branches of the head module, and detecting one polar radius rho and two polar angles theta of the output by using the input feature map ₁ 、θ ₂ Please refer to fig. 5.

And obtaining the vertex coordinates of the prediction frame by using the central point regression target size, wherein the polar radius is calculated according to the following formula:

wherein the vertex coordinates of the prediction box are (x) ₁ ,y ₁ )，(x ₂ ,y ₂ )，(x ₃ ,y ₃ )，(x ₄ ,y ₄ ) The pole coordinate is (x) _p ,y _p ) The calculation formula is as follows:

ρ is the polar radius.

Calculating the size of all polar angles by using four vertexes and a central point of the prediction box, and then acquiring two minimum polar angles theta in a counterclockwise direction ₁ 、θ ₂ As an angle prediction output, the calculation formula of the polar angle is as follows:

wherein x is _p ,y _p Is a pole coordinate; x is the number of _i ,y _i Are the vertex coordinates.

because a certain offset exists between the prediction frame and the boundary frame in the rotation detection method, and the intersection area may be a polygon, and a gradient cannot be obtained, the method uses the PIoU Loss to calculate the offset of the prediction frame, and calculates the IoU position offset of the prediction frame and the boundary frame in a pixel-by-pixel judgment mode, wherein the judgment formula of the position of the prediction center point is as follows:

wherein s ∈ w, h

p _i，j For the center point of the prediction box, w and h are the width and height of the bounding box, w 'and h' are the width and height of the prediction box, and->

To predict the distance from the center point to the bounding box width and height, K is the 3 × 3 convolution kernel, F (p) _i,j And b) is used for judging whether the predicted central point is in the boundary frame, k is used for controlling the weight of each pixel point, and an adjustable parameter with the s of 10, 50, 100 being 0.5 or 0.7 is taken.

S _b∩b' ，S _b∪b' The calculation formula is as follows for the intersection area and the phase-parallel area of the prediction frame and the boundary frame respectively:

S _b∪b' ＝w×h+w'×h'-S _b∩b'

the calculation formula of the PIoU is as follows:

the PIoU-Loss calculation formula is as follows:

and especially aiming at the condition that the target distribution of the remote sensing image is not uniform, the target detection effect can be further improved by using the PIoU Loss.

In summary, the position loss function is as follows:

L _loc ＝L _loc '+L _PIoU

the loss function L comprises a category loss function and a position loss function; the position loss function comprises an angle loss function and a bias loss function of a prediction box, and the calculation formula of the loss function L is as follows:

L＝L _class +λ _loc L _loc

here, L _class 、L _loc Respectively, a class loss function, a position loss function (including bias loss and angle loss), lambda _loc Is the adjustment factor.

Said L _loc The calculation formula is as follows:

wherein u represents a predicted value, u ^* Representing true values, the loss function L being applied when the predicted and true values are equal _loc Zero, the Smooth-L1 loss function solves the problem of unsmooth L1 loss and the problem of gradient explosion from the teacher point at the center point.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A remote sensing image rotating target detection method based on an anchor frame is characterized by comprising the following steps:

s1, performing quadruple down-sampling operation on a remote sensing image to obtain an image with rich features and a proper size;

s5, designing a loss function, and calculating the loss function of the output prediction frame relative to the boundary frame: calculating an angle loss function of the polar radius and the polar angle, calculating a bias loss function and a central point class loss function of a prediction frame, updating the remote sensing image rotating target detection model parameters without an anchor frame, and finishing the training of the model;

L＝L _class +λ _loc L _loc

wherein L is _class 、L _loc Respectively, a class loss function, a position loss function, λ _loc Is the adjustment factor;

the position loss function is as follows:

L _loc ＝L _loc '+L _PIoU

wherein L is _loc As a function of position loss, including bias loss and angle loss, L _loc ' is a polar radius rho and two polar angles theta ₁ 、θ ₂ Angle loss function of, L _PIoU For the bias loss function of the prediction box, M is all positive samples, S _b∩b′ ，S _b∪b′ The calculation formula is respectively the intersection area and the phase-parallel area of the prediction frame and the boundary frame, and is as follows:

S _b∪b′ ＝w×h+w′×h′-S _b∩b′

wherein, B _bb' The smallest square, s e (w, h),

p _i，j for the predicted box center point, w, h are the width and height of the bounding box, and w ', h' are the width and height of the predicted boxHigh->

To predict the distance from the center point to the bounding box width and height, K is a 3 × 3 convolution kernel, F (p) _i,j B) for determining whether the predicted center point is within the bounding box, k being used to control the weight of each pixel point;

and S6, detecting the remote sensing image target by using the trained target detection model.

2. The method for detecting the remote sensing image rotating target based on the anchor-free frame as claimed in claim 1, wherein the calculation formula of the polar radius is as follows:

wherein the vertex coordinate of the prediction frame is (x) ₁ ,y ₁ )，(x ₂ ,y ₂ )，(x ₃ ,y ₃ )，(x ₄ ,y ₄ ) The pole coordinate is (x) _p ,y _p ) The calculation formula is as follows:

ρ is the polar radius.

3. The method for detecting the target rotation of the remote sensing image without the anchor frame as claimed in claim 1, wherein the four vertexes and the center point of the prediction frame are used for calculating the sizes of all polar angles, and then the two smallest polar angles θ are obtained in the counterclockwise direction ₁ 、θ ₂ As an angle prediction output, the calculation formula of the polar angle is:

wherein x is _p ,y _p The coordinates of the pole of the prediction frame; x is the number of _i ,y _i Is the vertex coordinates of the prediction box.

4. The method for detecting the rotating target of the remote sensing image based on the anchor-free frame as claimed in claim 1, wherein the class loss function L is _class The following were used:

wherein N denotes the total number of targets, p _t The class probability of each pixel point prediction is expressed, the proximity degree of the prediction class and the real class is reflected, and p _t The larger the classification is, the more accurate it is; gamma is a hyperparameter used to control the weight of the sample, and when gamma is larger, the weight of the easily separable sample is smaller.

5. The method for detecting the remote sensing image rotating target based on the anchorless frame as claimed in claim 1, wherein L is _loc The calculation formula is as follows:

wherein L is _loc ' is a polar radius rho and two polar angles theta ₁ 、θ ₂ U represents the predicted value, u ^* Representing true values, the loss function L being applied when the predicted and true values are equal _loc Is zero.