CN113221775B - Single-stage arbitrary-quadrilateral regression-box method for detecting large-aspect-ratio targets in remote sensing images - Google Patents


Info

Publication number: CN113221775B (granted); published as CN113221775A
Application number: CN202110545880.0A
Authority: CN (China)
Prior art keywords: target, regression, loss, remote sensing image
Legal status: Active
Other languages: Chinese (zh)
Inventors: 宿南, 黄志博, 闫奕名, 冯收, 赵春晖, 黄博闻
Original and current assignee: Harbin Engineering University
Application filed by Harbin Engineering University
Priority to CN202110545880.0A

Classifications

    • G06V20/13 Satellite images (Scenes; Terrestrial scenes)
    • G06F18/2415 Classification techniques relating to the classification model, based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F18/253 Fusion techniques of extracted features
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; locating or processing of specific regions to guide the detection or recognition
    • G06V2201/07 Target detection


Abstract

A single-stage arbitrary-quadrilateral regression-box method for detecting large-aspect-ratio targets in remote sensing images belongs to the technical field of remote sensing images. It aims to solve the problem that, because remote sensing images are captured from a bird's-eye view, a horizontal box cannot accurately localize a target with a large aspect ratio. The method is based on a single-stage target detection framework and can regress an arbitrary quadrilateral. The process comprises the following steps: extracting features from three feature layers of the target remote sensing image using a feature pyramid network structure, and fusing the extracted features; performing regression calculation on the target position using an arbitrary quadrilateral box to obtain arbitrary-quadrilateral candidate boxes together with a classification result and a confidence score; and merging the candidate boxes with high confidence scores across the three scales, restoring them to the original image size, calculating the intersection-over-union between the candidate boxes of each category, and removing redundant candidate boxes with a non-maximum suppression algorithm adapted to arbitrary quadrilaterals to obtain the final detection result. The method is used for detecting large-aspect-ratio targets in remote sensing images.

Description

Single-stage arbitrary-quadrilateral regression-box method for detecting large-aspect-ratio targets in remote sensing images
Technical Field
The invention relates to a detection algorithm for large-aspect-ratio targets in remote sensing images, and belongs to the technical field of remote sensing images.
Background
With the development of optical remote sensing satellite technology, the resolution of remote sensing images has greatly improved, and with it the demand for target detection in optical remote sensing imagery. However, because of changes in shooting angle and application scenario, remote sensing target detection differs from conventional target detection and faces two new challenges.
On the one hand, since remote sensing images are mostly captured by satellites or unmanned aerial vehicles, the field of view is large, and detecting a specific target in a wide-area scene requires high detection speed as well as accuracy. On the other hand, remote sensing images are all taken from an overhead view, and the horizontal rectangular box used in conventional target detection cannot adequately describe the position of a tilted target with a large aspect ratio, such as a ship.
Disclosure of Invention
The invention aims to solve the problem that, because a remote sensing image is taken from an overhead view, a target with a large aspect ratio cannot be accurately localized with a horizontal box, and provides a single-stage arbitrary-quadrilateral regression-box method for detecting large-aspect-ratio targets in remote sensing images.
In the method according to the invention, the detection algorithm is based on a single-stage target detection framework and can regress an arbitrary quadrilateral;
the specific process comprises the following steps:
S1, extracting features from the three feature layers of the target remote sensing image using the feature pyramid network structure, and fusing the extracted features;
S2, performing regression calculation on the target position of the target remote sensing image using an arbitrary quadrilateral box, obtaining arbitrary-quadrilateral candidate boxes together with a classification result and a confidence score;
S3, merging the candidate boxes with high confidence scores across the three scales, restoring them to the original image size, calculating the intersection-over-union between the candidate boxes of each category, and then removing redundant candidate boxes with a non-maximum suppression algorithm adapted to arbitrary quadrilaterals to obtain the final detection result.
Preferably, in step S1, feature extraction on the three feature layers of the target remote sensing image is computed with a CSP-Darknet53 network;
specifically:
when extracting features from a deep feature map, the feature map of the base layer is copied;
when extracting features from feature maps of different scales, the upper-layer and lower-layer information are combined by up-sampling and down-sampling, respectively.
Preferably, the regression calculation of S2 comprises: rotation regression realized by regressing the center coordinates and the width and height, and adding four offsets.
Preferably, when rotation regression is realized by adding the four offsets, the loss function loss of the target position detection part is expressed as the sum of four regression losses:

loss = lbox + la + lcls + lobj

The four parts are: the regression loss lbox of the horizontal bounding box, the regression loss la of the normalized tilt offsets, the classification loss lcls, and the confidence loss lobj.

The regression loss lbox of the horizontal bounding box is:

$$ l_{box} = \lambda_{box} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 + (w_i - \hat{w}_i)^2 + (h_i - \hat{h}_i)^2 \right] $$

where $\{x_i, y_i, w_i, h_i\}$ denotes the predicted values for each candidate region of the target's bounding contour, and $\{\hat{x}_i, \hat{y}_i, \hat{w}_i, \hat{h}_i\}$ the ground-truth values in the bounding-contour label; $\mathbb{1}_{ij}^{obj}$ indicates whether an object is present at position (i, j), 1 meaning present and 0 absent; $\lambda_{box}$ is a user-defined horizontal regression loss coefficient, $\lambda_{box} \in (0, 1]$; $S^2$ denotes the grid cells in the region of side length $S$; and $B$ denotes the bounding boxes on each grid cell.

The regression loss la of the normalized tilt offsets is:

$$ l_{a} = \lambda_{\alpha} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \sum_{k=1}^{4} (\alpha_{ik} - \hat{\alpha}_{ik})^2 $$

where $\alpha_{ik}$ is the predicted tilt of the target and $\hat{\alpha}_{ik}$ the ground-truth tilt; $\lambda_{\alpha}$ is a user-defined rotation-offset regression loss coefficient, $\lambda_{\alpha} \in (0, 1]$; and $k$ indexes the $k$-th rotation offset.

The classification loss lcls is:

$$ l_{cls} = \lambda_{class} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \sum_{c \in classes} (p_i(c) - \hat{p}_i(c))^2 $$

where $p_i(c)$ is the predicted probability of class $c$ and $\hat{p}_i(c)$ the true probability of class $c$; $\lambda_{class}$ is a user-defined classification loss coefficient, $\lambda_{class} \in (0, 1]$.

The confidence loss lobj is:

$$ l_{obj} = \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} (c_i - \hat{c}_i)^2 + \lambda_{noobj} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{noobj} (c_i - \hat{c}_i)^2 $$

where $c_i$ is the predicted probability that a target is present at position $i$ and $\hat{c}_i$ the true probability, 1 meaning a target is present and 0 absent; $\lambda_{noobj}$ is a user-defined confidence loss coefficient, $\lambda_{noobj} \in (0, 1]$; and $\mathbb{1}_{ij}^{noobj}$ indicates whether no object is present at position (i, j), 1 meaning absent and 0 present.

The classification result is obtained from the classification loss lcls, and the confidence score from the confidence loss lobj.
Preferably, the regression loss lbox of the horizontal bounding box is used to localize the center position and bounding contour of the target;
the regression loss la of the normalized tilt offsets is used to represent the degree of tilt of the target;
the classification loss lcls is used to train the classification capability;
the confidence loss lobj is used to distinguish whether a candidate region contains a target object.
Preferably, the specific method for calculating the intersection-over-union between the candidate boxes of each category in S3 is as follows, the target boxes being quadrilaterals:

S3-1, for any two quadrilaterals $R_i$ and $R_j$, establish an empty point set PSet;
S3-2, add the intersection points of all edges of $R_i$ and $R_j$ to PSet;
S3-3, add all vertices of quadrilateral $R_i$ that lie inside quadrilateral $R_j$ to PSet;
S3-4, add all vertices of quadrilateral $R_j$ that lie inside quadrilateral $R_i$ to PSet;
S3-5, sort all points in PSet counterclockwise, then compute the area of the overlapping region of the two quadrilaterals by triangle subdivision.

The area of triangle $\Delta IJK$ is expressed as:

$$ S_{\Delta IJK} = \frac{1}{2} \left| \overrightarrow{IJ} \times \overrightarrow{IK} \right| $$

where $\overrightarrow{IJ}$ is the vector from I to J, and $\overrightarrow{IK}$ the vector from I to K.

The area of the polygon IJKLMNOP is expressed as:

$$ S_{IJKLMNOP} = S_{\Delta IJK} + S_{\Delta IKL} + S_{\Delta ILM} + S_{\Delta IMN} + S_{\Delta INO} + S_{\Delta IOP} $$

where $S_{\Delta IKL}$, $S_{\Delta ILM}$, $S_{\Delta IMN}$, $S_{\Delta INO}$ and $S_{\Delta IOP}$ are the areas of triangles $\Delta IKL$, $\Delta ILM$, $\Delta IMN$, $\Delta INO$ and $\Delta IOP$, respectively.

S3-6, obtain the intersection-over-union IoU[i, j]:

$$ IoU[i, j] = \frac{Area(i)}{Area(R_i) + Area(R_j) - Area(i)} $$

where $Area(R_i)$ is the area of rotated box $i$, $Area(R_j)$ the area of rotated box $j$, and $Area(i)$ the overlapping area of rotated boxes $i$ and $j$.

S3-7, sort all candidate boxes by confidence score; when the intersection-over-union IoU[i, j] of two boxes exceeds 0.5, keep only the candidate box with the higher confidence score.
The invention has the following advantages: to solve the above problems, the invention provides a single-stage arbitrary-quadrilateral regression-box method for detecting large-aspect-ratio targets in remote sensing images, which can regress an arbitrary quadrilateral. For a large-scale optical remote sensing image, the pixel position of a target in the image is obtained; the position and category of a target object can be quickly obtained from a large background area and described by a quadrilateral fitted closely to the target contour, realizing fast detection of the target.
drawings
FIG. 1 is a flowchart of the single-stage arbitrary-quadrilateral regression-box method for detecting large-aspect-ratio targets in remote sensing images according to the present invention;
FIG. 2 is a schematic diagram of the present invention illustrating the addition of four offsets to achieve regression rotation;
FIG. 3 is a first overlapping case of two quadrilaterals, for example for a ship;
FIG. 4 is a second overlapping case of two quadrilaterals, for example for a ship;
FIG. 5 is a third overlapping case of two quadrilaterals, for example for a ship;
fig. 6 is a fourth case of overlapping two quadrilaterals, taking a ship as an example.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the embodiments and features of the embodiments may be combined with each other without conflict.
The invention is further described with reference to the following drawings and specific examples, which are not intended to be limiting.
The first embodiment: this embodiment is described below with reference to FIG. 1. The single-stage arbitrary-quadrilateral regression-box method of this embodiment is based on a single-stage target detection framework and can regress an arbitrary quadrilateral;
the specific process comprises the following steps:
S1, extracting features from the three feature layers of the target remote sensing image using the feature pyramid network structure, and fusing the extracted features;
S2, performing regression calculation on the target position of the target remote sensing image using an arbitrary quadrilateral box, obtaining arbitrary-quadrilateral candidate boxes together with a classification result and a confidence score;
S3, merging the candidate boxes with high confidence scores across the three scales, restoring them to the original image size, calculating the intersection-over-union between the candidate boxes of each category, and then removing redundant candidate boxes with a non-maximum suppression algorithm adapted to arbitrary quadrilaterals to obtain the final detection result.
The second embodiment: this embodiment further explains the first embodiment. In step S1, feature extraction on the three feature layers of the target remote sensing image is computed with a CSP-Darknet53 network;
specifically:
when extracting features from a deep feature map, the feature map of the base layer is copied;
when extracting features from feature maps of different scales, the upper-layer and lower-layer information are combined by up-sampling and down-sampling, respectively.
In this embodiment, conventional target detection algorithms are generally divided into single-stage and two-stage algorithms. A two-stage algorithm first extracts regions of interest and then classifies and regresses each candidate region. Although this improves detection accuracy to a certain extent, it greatly increases the computational load of the network and is not suitable for detection over large-area remote sensing images. A single-stage target detection architecture needs only one pass of feature extraction and regression-classification computation, so it is much faster. In this embodiment, the CSP-Darknet53 network is used as the feature extraction algorithm to extract rich information features from the input target remote sensing image. Copying the feature map of the base layer during deep feature extraction alleviates the vanishing-gradient problem, increases network feature reuse, reduces the number of network parameters, and speeds up feature extraction. Meanwhile, feature maps of different scales are not used in isolation; the upper-layer and lower-layer information are combined by up-sampling and down-sampling, respectively, avoiding information loss.
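The cross-scale combination described above can be illustrated with a toy sketch in pure Python, with feature maps as nested lists. The 2x nearest-neighbour upsampling and element-wise addition used here are assumptions for illustration, not the patent's exact fusion operators:

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a 2-D feature map (list of lists)."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in (0, 1)]  # duplicate each column
        out.append(wide)
        out.append(list(wide))                   # duplicate each row
    return out

def downsample2x(fmap):
    """Stride-2 subsampling: the down-sampling direction of the fusion."""
    return [row[::2] for row in fmap[::2]]

def fuse(deep, shallow):
    """Combine an upsampled deep map with a shallower map element-wise
    (addition is an assumed fusion operator for this sketch)."""
    up = upsample2x(deep)
    return [[a + b for a, b in zip(r_up, r_sh)] for r_up, r_sh in zip(up, shallow)]
```

In a three-scale pyramid, each level would be fused with its neighbours this way before the regression head runs on each scale.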
The third embodiment: this embodiment further explains the first or second embodiment. The regression calculation in S2 comprises: rotation regression realized by regressing the center coordinates and the width and height, and adding four offsets.
In this embodiment, since a remote sensing image differs from a conventional image in that objects are all seen from an overhead view, a target with a large aspect ratio, such as a ship, cannot be accurately localized with a horizontal box. The regression calculation of the tilted target position in S2 can therefore be realized with an arbitrary quadrilateral box. In the regression part, in addition to the usual center coordinates and width and height, four extra offsets are added to realize rotation regression. Because the angle itself is not used as a regression parameter, the periodicity problem of angle regression is avoided; describing the rotation through these four parameters greatly improves the robustness of the algorithm and reduces the influence of fluctuation of a single parameter on the final result. The algorithm describes the rotation by four offsets, but the loss function does not regress the four offsets directly; instead, it regresses the ratios of the four offsets to the width and height.
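The four-offset idea can be made concrete with a small decoding sketch. The exact offset convention is not spelled out above, so the gliding-vertex-style decoding below, in which each normalized offset alpha_k slides one vertex along a side of the horizontal box, is an assumption for illustration only:

```python
def decode_quad(cx, cy, w, h, alphas):
    """Decode center (cx, cy), width/height (w, h) and four normalized
    offsets into quadrilateral vertices.

    Assumed convention (not stated in the patent): alpha_k in [0, 1]
    slides the k-th vertex along one side of the horizontal box, so
    all-zero offsets reproduce the axis-aligned box itself.
    """
    x1, y1 = cx - w / 2, cy - h / 2   # top-left of the horizontal box
    x2, y2 = cx + w / 2, cy + h / 2   # bottom-right
    a1, a2, a3, a4 = alphas
    return [
        (x1 + a1 * w, y1),  # vertex gliding along the top edge
        (x2, y1 + a2 * h),  # right edge
        (x2 - a3 * w, y2),  # bottom edge
        (x1, y2 - a4 * h),  # left edge
    ]
```

Note that the offsets are dimensionless ratios of the box width and height, matching the statement that the loss regresses normalized offsets rather than raw distances or angles.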
The fourth embodiment: this embodiment is described below with reference to FIG. 2 and further explains the third embodiment. When rotation regression is realized by adding the four offsets, the loss function loss of the target position detection part is expressed as the sum of four regression losses:

loss = lbox + la + lcls + lobj

The four parts are: the regression loss lbox of the horizontal bounding box, the regression loss la of the normalized tilt offsets, the classification loss lcls, and the confidence loss lobj.

The regression loss lbox of the horizontal bounding box is:

$$ l_{box} = \lambda_{box} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 + (w_i - \hat{w}_i)^2 + (h_i - \hat{h}_i)^2 \right] $$

where $\{x_i, y_i, w_i, h_i\}$ denotes the predicted values for each candidate region of the target's bounding contour, and $\{\hat{x}_i, \hat{y}_i, \hat{w}_i, \hat{h}_i\}$ the ground-truth values in the bounding-contour label; $\mathbb{1}_{ij}^{obj}$ indicates whether an object is present at position (i, j), 1 meaning present and 0 absent; $\lambda_{box}$ is a user-defined horizontal regression loss coefficient, $\lambda_{box} \in (0, 1]$; $S^2$ denotes the grid cells in the region of side length $S$; and $B$ denotes the bounding boxes on each grid cell.

The regression loss la of the normalized tilt offsets is:

$$ l_{a} = \lambda_{\alpha} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \sum_{k=1}^{4} (\alpha_{ik} - \hat{\alpha}_{ik})^2 $$

where $\alpha_{ik}$ is the predicted tilt of the target and $\hat{\alpha}_{ik}$ the ground-truth tilt; $\lambda_{\alpha}$ is a user-defined rotation-offset regression loss coefficient, $\lambda_{\alpha} \in (0, 1]$; and $k$ indexes the $k$-th rotation offset.

The classification loss lcls is:

$$ l_{cls} = \lambda_{class} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \sum_{c \in classes} (p_i(c) - \hat{p}_i(c))^2 $$

where $p_i(c)$ is the predicted probability of class $c$ and $\hat{p}_i(c)$ the true probability of class $c$; $\lambda_{class}$ is a user-defined classification loss coefficient, $\lambda_{class} \in (0, 1]$.

The confidence loss lobj is:

$$ l_{obj} = \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} (c_i - \hat{c}_i)^2 + \lambda_{noobj} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{noobj} (c_i - \hat{c}_i)^2 $$

where $c_i$ is the predicted probability that a target is present at position $i$ and $\hat{c}_i$ the true probability, 1 meaning a target is present and 0 absent; $\lambda_{noobj}$ is a user-defined confidence loss coefficient, $\lambda_{noobj} \in (0, 1]$; and $\mathbb{1}_{ij}^{noobj}$ indicates whether no object is present at position (i, j), 1 meaning absent and 0 present.

The classification result is obtained from the classification loss lcls, and the confidence score from the confidence loss lobj.
In this embodiment, the higher the confidence score, the closer the region is to a target; the lower the score, the closer it is to background. The detector uses the same branch to train the classification and regression parameters simultaneously: on the one hand this speeds up network computation and reduces network parameters; on the other hand the classification and regression parameters promote each other during training, accelerating network convergence.
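A minimal numerical sketch of the four-part loss in pure Python follows. The sum-of-squared-errors form and the per-cell dictionary layout are assumptions for illustration; a real detector would compute this over tensors:

```python
def detection_loss(preds, targets,
                   l_box=0.5, l_alpha=0.5, l_cls=0.5, l_noobj=0.5):
    """loss = lbox + la + lcls + lobj over grid cells.

    Each cell is a dict with keys: 'box' (x, y, w, h), 'alpha' (four
    normalized tilt offsets), 'cls' (class probabilities), 'conf'
    (objectness). Targets additionally carry 'obj', the indicator
    1^{obj} (1 if a target lies in the cell).
    """
    lbox = la = lcls = lobj = 0.0
    for p, t in zip(preds, targets):
        if t["obj"]:
            lbox += l_box * sum((pv - tv) ** 2 for pv, tv in zip(p["box"], t["box"]))
            la += l_alpha * sum((pv - tv) ** 2 for pv, tv in zip(p["alpha"], t["alpha"]))
            lcls += l_cls * sum((pv - tv) ** 2 for pv, tv in zip(p["cls"], t["cls"]))
            lobj += (p["conf"] - t["conf"]) ** 2
        else:
            # 1^{noobj}: only the down-weighted confidence term applies
            lobj += l_noobj * (p["conf"] - t["conf"]) ** 2
    return lbox + la + lcls + lobj
```

A perfect prediction gives zero loss; any deviation in box, offsets, class, or confidence increases the total, with the lambda coefficients weighting the four parts as in the formulas above.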
The fifth embodiment: this embodiment further explains the fourth embodiment. The regression loss lbox of the horizontal bounding box is used to localize the center position and bounding contour of the target;
the regression loss la of the normalized tilt offsets is used to represent the degree of tilt of the target;
the classification loss lcls is used to train the classification capability;
the confidence loss lobj is used to distinguish whether a candidate region contains a target object.
The sixth embodiment: this embodiment is described below with reference to FIG. 3 to FIG. 6 and further explains the fifth embodiment. The specific method for calculating the intersection-over-union between the candidate boxes of each category in S3 is as follows, the target boxes being quadrilaterals:

S3-1, for any two quadrilaterals $R_i$ and $R_j$, establish an empty point set PSet;
S3-2, add the intersection points of all edges of $R_i$ and $R_j$ to PSet;
S3-3, add all vertices of quadrilateral $R_i$ that lie inside quadrilateral $R_j$ to PSet;
S3-4, add all vertices of quadrilateral $R_j$ that lie inside quadrilateral $R_i$ to PSet;
S3-5, sort all points in PSet counterclockwise, then compute the area of the overlapping region of the two quadrilaterals by triangle subdivision.

The area of triangle $\Delta IJK$ is expressed as:

$$ S_{\Delta IJK} = \frac{1}{2} \left| \overrightarrow{IJ} \times \overrightarrow{IK} \right| $$

where $\overrightarrow{IJ}$ is the vector from I to J, and $\overrightarrow{IK}$ the vector from I to K.

The area of the polygon IJKLMNOP is expressed as:

$$ S_{IJKLMNOP} = S_{\Delta IJK} + S_{\Delta IKL} + S_{\Delta ILM} + S_{\Delta IMN} + S_{\Delta INO} + S_{\Delta IOP} $$

where $S_{\Delta IKL}$, $S_{\Delta ILM}$, $S_{\Delta IMN}$, $S_{\Delta INO}$ and $S_{\Delta IOP}$ are the areas of triangles $\Delta IKL$, $\Delta ILM$, $\Delta IMN$, $\Delta INO$ and $\Delta IOP$, respectively.

S3-6, obtain the intersection-over-union IoU[i, j]:

$$ IoU[i, j] = \frac{Area(i)}{Area(R_i) + Area(R_j) - Area(i)} $$

where $Area(R_i)$ is the area of rotated box $i$, $Area(R_j)$ the area of rotated box $j$, and $Area(i)$ the overlapping area of rotated boxes $i$ and $j$.

S3-7, sort all candidate boxes by confidence score; when the intersection-over-union IoU[i, j] of two boxes exceeds 0.5, keep only the candidate box with the higher confidence score.
In this embodiment, S3 proposes a non-maximum suppression algorithm for arbitrary quadrilaterals. The regression box in conventional target detection is generally a horizontal rectangle, so the intersection-over-union of two horizontal rectangles is easy to obtain during non-maximum suppression. However, this algorithm uses a quadrilateral regression box in an arbitrary direction to describe the position and size of the target, so the invention adopts a more complex intersection-over-union calculation for arbitrary shapes. The regression boxes of ships, for example, are all quadrilaterals; taking these as the example, as shown in FIG. 3 to FIG. 6, the overlap of two quadrilaterals can be roughly divided into four cases: FIG. 3 shows the simplest case, in which the overlapping region is a triangle; in FIG. 4 the overlapping region is a quadrilateral; in FIG. 5 a hexagon; and in FIG. 6 an octagon. Odd-numbered cases occur when a vertex of one quadrilateral falls on an edge of the other.
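The point-set procedure S3-1 to S3-7 can be sketched in pure Python. This is an illustrative implementation under the assumption that candidate boxes are convex quadrilaterals; the shoelace formula used for the area is equivalent to the triangle-subdivision sum described above:

```python
import math

def _seg_intersect(p1, p2, p3, p4):
    """Intersection point of segments p1p2 and p3p4, or None."""
    d = (p2[0] - p1[0]) * (p4[1] - p3[1]) - (p2[1] - p1[1]) * (p4[0] - p3[0])
    if abs(d) < 1e-12:
        return None  # parallel or collinear
    t = ((p3[0] - p1[0]) * (p4[1] - p3[1]) - (p3[1] - p1[1]) * (p4[0] - p3[0])) / d
    u = ((p3[0] - p1[0]) * (p2[1] - p1[1]) - (p3[1] - p1[1]) * (p2[0] - p1[0])) / d
    if 0.0 <= t <= 1.0 and 0.0 <= u <= 1.0:
        return (p1[0] + t * (p2[0] - p1[0]), p1[1] + t * (p2[1] - p1[1]))
    return None

def _inside(pt, quad):
    """True if pt is inside (or on the boundary of) a convex quadrilateral."""
    sign = 0
    for i in range(4):
        a, b = quad[i], quad[(i + 1) % 4]
        cr = (b[0] - a[0]) * (pt[1] - a[1]) - (b[1] - a[1]) * (pt[0] - a[0])
        if abs(cr) < 1e-12:
            continue                # on the edge line: does not decide
        s = 1 if cr > 0 else -1
        if sign == 0:
            sign = s
        elif s != sign:
            return False            # cross products change sign: outside
    return True

def _shoelace(pts):
    """Polygon area; equivalent to the triangle-subdivision sum S3-5."""
    area = 0.0
    for i in range(len(pts)):
        x1, y1 = pts[i]
        x2, y2 = pts[(i + 1) % len(pts)]
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0

def quad_iou(qa, qb):
    """S3-1..S3-6: intersection-over-union of two convex quadrilaterals."""
    pset = []                                     # S3-1: empty point set
    for i in range(4):                            # S3-2: edge intersections
        for j in range(4):
            p = _seg_intersect(qa[i], qa[(i + 1) % 4], qb[j], qb[(j + 1) % 4])
            if p is not None:
                pset.append(p)
    pset += [p for p in qa if _inside(p, qb)]     # S3-3
    pset += [p for p in qb if _inside(p, qa)]     # S3-4
    uniq = []
    for p in pset:                                # drop near-duplicate points
        if not any(abs(p[0] - q[0]) < 1e-9 and abs(p[1] - q[1]) < 1e-9 for q in uniq):
            uniq.append(p)
    if len(uniq) < 3:
        return 0.0
    cx = sum(p[0] for p in uniq) / len(uniq)      # S3-5: sort counterclockwise
    cy = sum(p[1] for p in uniq) / len(uniq)
    uniq.sort(key=lambda p: math.atan2(p[1] - cy, p[0] - cx))
    inter = _shoelace(uniq)
    return inter / (_shoelace(qa) + _shoelace(qb) - inter)  # S3-6

def nms_quads(boxes, scores, thresh=0.5):
    """S3-7: keep the higher-scoring box when IoU exceeds thresh."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(quad_iou(boxes[i], boxes[k]) <= thresh for k in keep):
            keep.append(i)
    return keep
```

For two axis-aligned unit squares this reduces to ordinary rectangle IoU, which makes the sketch easy to sanity-check by hand.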
Although the invention herein has been described with reference to particular embodiments, it is to be understood that these embodiments are merely illustrative of the principles and applications of the present invention. It is therefore to be understood that numerous modifications may be made to the illustrative embodiments and that other arrangements may be devised without departing from the spirit and scope of the present invention as defined by the appended claims. It should be understood that features described in different dependent claims and herein may be combined in ways different from those described in the original claims. It is also to be understood that features described in connection with individual embodiments may be used in other described embodiments.

Claims (3)

1. A single-stage arbitrary-quadrilateral regression-box method for detecting large-aspect-ratio targets in remote sensing images, characterized in that
the detection method is based on a single-stage target detection framework and can regress an arbitrary quadrilateral;
the specific process comprises the following steps:
S1, extracting features from the three feature layers of the target remote sensing image using the feature pyramid network structure, and fusing the extracted features;
S2, performing regression calculation on the target position of the target remote sensing image using an arbitrary quadrilateral box, obtaining arbitrary-quadrilateral candidate boxes together with a classification result and a confidence score;
S3, merging the candidate boxes with high confidence scores across the three scales, restoring them to the original image size, calculating the intersection-over-union between the candidate boxes of each category, and then removing redundant candidate boxes with a non-maximum suppression algorithm adapted to arbitrary quadrilaterals to obtain the final detection result;
s2 the regression calculation includes: regression rotation is realized by regression center coordinates, regression width and height and adding four offsets;
when the four offsets are added to realize regression rotation, the loss function loss of the target position detection part is represented by the regression loss of the four parts:
loss=lbox+la+lcls+lobj
the regression losses of the four parts are respectively: regression loss lbox of horizontal bounding box, regression loss la of normalized tilt offset, loss lcls of classification and loss lobj of confidence;
wherein, the regression loss lbox of the horizontal circumscribed frame is:
Figure FDA0003540799300000011
wherein, { xi,yi,wi,hiDenotes the predicted value of each candidate region of the target bounding contour,
Figure FDA0003540799300000012
representing the true value in the target circumscribing outline tag,
Figure FDA0003540799300000013
indicates whether there is an object at the (i, j) position, 1 indicates present, and 0 indicates absent; lambda [ alpha ]boxRepresents a custom horizontal regression loss coefficient, λbox∈(0,1];S2Representing each lattice point in the area with the side length S; b represents each bounding box on the grid point;
the regression loss la for the normalized tilt offset is:
Figure FDA0003540799300000014
wherein alpha isikA predicted value representing the inclination of the target,
Figure FDA0003540799300000015
a true value representing the tilt of the target; lambda [ alpha ]αRepresents a custom rotation offset regression loss coefficient, λα∈(0,1](ii) a k represents a k-th rotational offset amount;
the classified losses lcls are:
Figure FDA0003540799300000021
pi(c) representing the probability of prediction as class c;
Figure FDA0003540799300000022
representing the true probability of class c; lambda [ alpha ]classDenotes a custom classification loss factor, λclass∈(0,1];
The loss of confidence lobj is:
Figure FDA0003540799300000023
ciindicating the probability of predicting the target at the i position,
Figure FDA0003540799300000024
the probability that a target is really located at the position i is shown, 1 shows that the target is located, and 0 shows that the target is not located; lambda [ alpha ]noobjRepresenting a custom confidence loss coefficient, λnoobj∈(0,1];
Figure FDA0003540799300000025
Indicating whether there is no object at the (i, j) position, 1 indicating no, 0 indicating present;
the classification result is obtained through the classification loss l_cls, and the confidence score is obtained through the confidence loss l_obj;
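The four loss terms share a YOLO-style sum-of-squares structure: every term is masked by whether a grid cell/box is responsible for an object, and the no-object confidence term is down-weighted. A minimal NumPy sketch of that structure (the function name, dict layout, and coefficient names are illustrative, not the patent's exact implementation):

```python
import numpy as np

def detection_loss(pred, true, obj_mask, lam):
    """Four-part sum-of-squares detection loss over an S*S grid with B boxes
    per cell. pred/true: dicts of arrays; obj_mask: shape (S*S, B) in {0, 1}.
    A sketch of the claim's structure, not the patent's exact code."""
    noobj_mask = 1.0 - obj_mask
    # l_box: horizontal bounding-box regression over (x, y, w, h)
    l_box = lam["box"] * np.sum(obj_mask[..., None] * (pred["box"] - true["box"]) ** 2)
    # l_alpha: normalized tilt-offset regression over the k offsets
    l_alpha = lam["alpha"] * np.sum(obj_mask[..., None] * (pred["alpha"] - true["alpha"]) ** 2)
    # l_cls: per-class probability regression on object cells only
    l_cls = lam["cls"] * np.sum(obj_mask[..., None] * (pred["cls"] - true["cls"]) ** 2)
    # l_obj: confidence, with the no-object term down-weighted by lambda_noobj
    conf_err = (pred["conf"] - true["conf"]) ** 2
    l_obj = np.sum(obj_mask * conf_err) + lam["noobj"] * np.sum(noobj_mask * conf_err)
    return l_box + l_alpha + l_cls + l_obj
```

The masking is what makes the loss single-stage: regression, tilt, and class terms are only charged where a ground-truth object falls, while confidence is supervised everywhere.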
S3, the specific method for calculating the intersection-over-union between the candidate boxes of each category comprises:
representing each target in the remote sensing image as a quadrilateral;
S3-1, for any two quadrilaterals R_i and R_j, establishing an empty point set PSet;
S3-2, adding the intersection points of all edges of the two quadrilaterals R_i and R_j into PSet;
S3-3, adding all vertices of quadrilateral R_i that lie inside quadrilateral R_j into PSet;
S3-4, adding all vertices of quadrilateral R_j that lie inside quadrilateral R_i into PSet;
S3-5, sorting all points in PSet counterclockwise, and calculating the area of the overlapping region of the two quadrilaterals using a triangulation algorithm:
the area of triangle ΔIJK is expressed as:

S_ΔIJK = (1/2) |vec(IJ) × vec(IK)|

wherein vec(IJ) denotes the vector from I to J, and vec(IK) denotes the vector from I to K;
the area of the polygon Area(IJKLMNOP) is expressed as:

S_Area(IJKLMNOP) = S_ΔIJK + S_ΔIKL + S_ΔILM + S_ΔIMN + S_ΔINO + S_ΔIOP

wherein S_ΔIKL, S_ΔILM, S_ΔIMN, S_ΔINO and S_ΔIOP denote the areas of triangles ΔIKL, ΔILM, ΔIMN, ΔINO and ΔIOP, respectively;
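Step S3-5 and the area formulas above amount to ordering the collected points counterclockwise and summing a triangle fan from one vertex. A minimal sketch (function names are illustrative):

```python
import math

def cross(o, a, b):
    # z-component of (a - o) x (b - o): twice the signed triangle area
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def sort_ccw(points):
    # order the point set counterclockwise around its centroid (step S3-5)
    cx = sum(p[0] for p in points) / len(points)
    cy = sum(p[1] for p in points) / len(points)
    return sorted(points, key=lambda p: math.atan2(p[1] - cy, p[0] - cx))

def polygon_area(points):
    # fan triangulation from the first vertex: sum of |vec(IJ) x vec(IK)| / 2
    pts = sort_ccw(points)
    area = 0.0
    for j in range(1, len(pts) - 1):
        area += abs(cross(pts[0], pts[j], pts[j + 1])) / 2.0
    return area
```

The fan decomposition is valid here because the intersection of two convex quadrilaterals is itself convex, so every fan triangle lies inside the polygon.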
S3-6, obtaining the intersection-over-union IoU[i, j]:

IoU[i, j] = Area(I) / (Area(R_i) + Area(R_j) − Area(I))

wherein Area(R_i) denotes the area of rotated box i; Area(R_j) denotes the area of rotated box j; Area(I) denotes the overlapping area of rotated box i and rotated box j;
and S3-7, sorting all candidate boxes by confidence score, and when the intersection-over-union IoU[i, j] of two boxes is greater than 0.5, retaining the candidate box with the higher confidence score.
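Steps S3-6 and S3-7 together form a greedy non-maximum suppression over the quadrilateral boxes. A sketch of that loop, with the overlap computation of steps S3-1 to S3-5 abstracted behind an `overlap_area` callback (names and the greedy formulation are illustrative):

```python
def rotated_nms(boxes, scores, areas, overlap_area, thresh=0.5):
    """Greedy NMS over rotated/quadrilateral boxes (steps S3-6, S3-7).
    boxes: list of quadrilaterals; overlap_area(i, j) -> intersection area.
    A sketch under assumed names, not the patent's exact code."""
    # sort candidate indices by descending confidence score
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)          # highest-scoring remaining box survives
        keep.append(i)
        remaining = []
        for j in order:
            inter = overlap_area(i, j)
            iou = inter / (areas[i] + areas[j] - inter)  # step S3-6
            if iou <= thresh:     # suppress only when IoU > 0.5
                remaining.append(j)
        order = remaining
    return keep
```

Because the surviving box is always the higher-scoring one, only the lower-confidence member of any pair with IoU above the threshold is discarded, matching step S3-7.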
2. The method for detecting a large-aspect-ratio target remote sensing image with a single-stage arbitrary-quadrilateral regression box according to claim 1, wherein in step S1 the feature extraction on the three feature layers of the target remote sensing image is performed through a CSP-Darknet53 network;
the method specifically comprises:
copying the feature map of the base layer when performing feature extraction on the deep feature map;
and, when performing feature extraction on feature maps of different scales, combining the upper-layer and lower-layer information through up-sampling and down-sampling, respectively.
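The base-layer copy in claim 2 is the CSP (Cross Stage Partial) idea: part of the feature map bypasses the heavy computation and is re-merged by concatenation. A minimal NumPy sketch of that split-and-merge, where `transform` stands in for the real convolution stage (a sketch of the idea, not CSP-Darknet53 itself):

```python
import numpy as np

def csp_block(x, transform):
    """CSP-style partial feature copy: split the channel axis, transform one
    half, and concatenate it with the untouched copy of the other half."""
    half = x.shape[0] // 2
    copied, processed = x[:half], x[half:]
    return np.concatenate([copied, transform(processed)], axis=0)
```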
3. The method for detecting a large-aspect-ratio target remote sensing image with a single-stage arbitrary-quadrilateral regression box according to claim 1, wherein the regression loss l_box of the horizontal bounding box is used for locating the center position and the bounding contour of the target;
the regression loss l_α of the normalized tilt offset is used for representing the degree of tilt of the target;
the classification loss l_cls is used for training the classification capability;
and the confidence loss l_obj is used for distinguishing whether a candidate region contains a target object.
CN202110545880.0A 2021-05-19 2021-05-19 Method for detecting target remote sensing image with single-stage arbitrary quadrilateral regression frame large length-width ratio Active CN113221775B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110545880.0A CN113221775B (en) 2021-05-19 2021-05-19 Method for detecting target remote sensing image with single-stage arbitrary quadrilateral regression frame large length-width ratio


Publications (2)

Publication Number Publication Date
CN113221775A CN113221775A (en) 2021-08-06
CN113221775B true CN113221775B (en) 2022-04-26

Family

ID=77093115

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110545880.0A Active CN113221775B (en) 2021-05-19 2021-05-19 Method for detecting target remote sensing image with single-stage arbitrary quadrilateral regression frame large length-width ratio

Country Status (1)

Country Link
CN (1) CN113221775B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113449702B (en) * 2021-08-31 2021-12-03 天津联图科技有限公司 Target detection method and device for remote sensing image, storage medium and electronic equipment
CN114972710B (en) * 2022-07-27 2022-10-28 深圳爱莫科技有限公司 Method and system for realizing multi-shape target detection in image
CN116030120B (en) * 2022-09-09 2023-11-24 北京市计算中心有限公司 Method for identifying and correcting hexagons

Citations (6)

Publication number Priority date Publication date Assignee Title
CN109492561A (en) * 2018-10-29 2019-03-19 北京遥感设备研究所 A kind of remote sensing image Ship Detection based on improved YOLO V2 model
CN110189304A (en) * 2019-05-07 2019-08-30 南京理工大学 Remote sensing image target on-line quick detection method based on artificial intelligence
KR102030628B1 (en) * 2019-04-04 2019-10-10 (주)아이엠시티 Recognizing method and system of vehicle license plate based convolutional neural network
CN111723748A (en) * 2020-06-22 2020-09-29 电子科技大学 Infrared remote sensing image ship detection method
CN111738114A (en) * 2020-06-10 2020-10-02 杭州电子科技大学 Vehicle target detection method based on anchor-free accurate sampling remote sensing image
CN112102241A (en) * 2020-08-11 2020-12-18 中山大学 Single-stage remote sensing image target detection algorithm


Non-Patent Citations (2)

Title
Quantitative Analysis of Metallographic Image Using Attention-Aware Deep Neural Networks; Xu Yifei et al.; Sensors; 2020-12-23; Vol. 21, No. 43; 1-22 *
A Survey of Single-Stage Vehicle Detection Algorithms Based on Deep Learning; Zhao Qihui et al.; Journal of Computer Applications; 2021-01-26; Vol. 40, No. 02; 30-36 *

Also Published As

Publication number Publication date
CN113221775A (en) 2021-08-06

Similar Documents

Publication Publication Date Title
CN113221775B (en) Method for detecting target remote sensing image with single-stage arbitrary quadrilateral regression frame large length-width ratio
TWI677826B (en) License plate recognition system and method
CN112084869B (en) Compact quadrilateral representation-based building target detection method
CN109816708B (en) Building texture extraction method based on oblique aerial image
CN110097584B (en) Image registration method combining target detection and semantic segmentation
CN111553347B (en) Scene text detection method oriented to any angle
CN111369495B (en) Panoramic image change detection method based on video
CN109145747A (en) A kind of water surface panoramic picture semantic segmentation method
CN113326763B (en) Remote sensing target detection method based on boundary frame consistency
CN111814827A (en) Key point target detection method based on YOLO
CN112766184A (en) Remote sensing target detection method based on multi-level feature selection convolutional neural network
CN113033315A (en) Rare earth mining high-resolution image identification and positioning method
CN111126381A (en) Insulator inclined positioning and identifying method based on R-DFPN algorithm
CN111027538A (en) Container detection method based on instance segmentation model
Fond et al. Facade proposals for urban augmented reality
CN114565842A (en) Unmanned aerial vehicle real-time target detection method and system based on Nvidia Jetson embedded hardware
CN112560852A (en) Single-stage target detection method with rotation adaptive capacity based on YOLOv3 network
Zhao et al. Boundary regularized building footprint extraction from satellite images using deep neural network
CN113284185B (en) Rotating target detection method for remote sensing target detection
CN110636248B (en) Target tracking method and device
CN114387346A (en) Image recognition and prediction model processing method, three-dimensional modeling method and device
Zhang et al. Alignment of 3d building models with satellite images using extended chamfer matching
Gerhardt et al. Neural network-based traffic sign recognition in 360° images for semi-automatic road maintenance inventory
KR100946707B1 (en) Method, system and computer-readable recording medium for image matching of panoramic images
Liu et al. Polar ray: A single-stage angle-free detector for oriented object detection in aerial images

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant