CN115019181A

CN115019181A - Remote sensing image rotating target detection method, electronic equipment and storage medium

Info

Publication number: CN115019181A
Application number: CN202210900309.0A
Authority: CN
Inventors: 金世超; 冯鹏铭; 贺广均; 符晗; 刘世烁; 常江; 邹同元; 梁银川; 车程安; 田路云
Original assignee: Beijing Institute of Satellite Information Engineering
Current assignee: Beijing Institute of Satellite Information Engineering
Priority date: 2022-07-28
Filing date: 2022-07-28
Publication date: 2022-09-06
Anticipated expiration: 2042-07-28
Also published as: CN115019181B

Abstract

The invention relates to a method for detecting a rotating target of a remote sensing image, electronic equipment and a storage medium, wherein in training, rich sample points are obtained by using an elliptical distribution sampling mode for a given target position label; by utilizing a self-adaptive foreground sampling strategy, high-quality foreground sample points are sequentially obtained from a high-level characteristic diagram to a low-level characteristic diagram and are input into a loss function together with a network predicted foreground target, so that a more accurate target characteristic representation method is learned.

Description

Remote sensing image rotating target detection method, electronic equipment and storage medium

Technical Field

The invention relates to a remote sensing image rotating target detection method, electronic equipment and a storage medium.

Background

With the gradual improvement of the resolution and the imaging quality of the remote sensing images, the extraction of interested targets from the high-resolution remote sensing images becomes possible. The rotary target in the remote sensing image can be well detected by utilizing the deep learning technology based on the convolutional neural network through automatic extraction and learning of target characteristics.

Common target detection methods based on deep learning mainly include two types: 1. a multi-step detector, namely, on a feature map extracted by a feature extraction backbone Network, firstly distinguishing a target (foreground) from a background by using a Region suggestion Network (RPN) through an Anchor frame (Anchor), and then classifying and regressing the target according to an output result of the RPN; 2. the single-step detector mainly comprises an anchor frame and a non-anchor frame, and predicts the offset and the category of a target position on the feature map by using the anchor frame and then performs regression and classification; and the single-step detection method based on the non-anchor frame directly predicts four vertexes of the target frame on the down-sampled feature map by using a method similar to key point detection, classifies the target and regresses the vertexes.

The traditional non-anchor frame single-step detector mainly comprises the following steps:

step 1: extracting features of the input image by using a deep convolutional neural network as a target feature extraction backbone network to obtain a down-sampled feature map;

step 2: performing upsampling on the downsampled Feature map layer by using a Feature Pyramid (FPN) to acquire more detailed features of the image;

and step 3: after a characteristic diagram is obtained from the FPN, predicting the category and the position of a rotating frame representing a target by utilizing two network branches, namely a target classification network, a size regression network and an angle regression network;

and 4, step 4: obtaining foreground sample points by using a sampling strategy based on a fixed center distance and a limited characteristic map, and respectively calculating the losses of the two branch networks through loss functions by combining the predicted sample points;

and 5: through gradient back propagation, network parameters are optimized, and the purpose of training a target detection network is achieved;

step 6: and detecting the target in the high-resolution remote sensing image by using the trained network, and obtaining and outputting the information representing the category, the position and the angle of the target rotating frame by using the output results of the two sub-networks.

Although the traditional method can acquire and learn the characteristics of the target, the characteristics of the target size and the extremely large variation range of the aspect ratio in the remote sensing image are ignored, so that the problems that the characteristics of the first two ends of the target cannot be acquired and the positive sample is lost exist.

Disclosure of Invention

In view of the technical problems, the invention provides a remote sensing image rotating target detection method, electronic equipment and a storage medium by using a self-adaptive ellipse distribution foreground sample sampling strategy, so that a network framework can obtain high-quality target characteristics by using a single-step non-anchor frame process to further perform target detection of a rotating frame; based on the sampling strategy, the invention solves another key problem that the target scale and the length-width ratio change greatly in the remote sensing image target detection, so that the network is difficult to extract the accurate features of the target, thereby improving the accuracy of the target detection.

The technical solution for realizing the purpose of the invention is as follows: a remote sensing image rotating target detection method comprises the following steps:

s1, obtaining the remote sensing image and a target label corresponding to the remote sensing image, and obtaining a corresponding multi-scale feature map according to the remote sensing image;

s2, constructing a target classification branch network and a position and angle regression branch network, and predicting the multi-scale feature map to obtain a target prediction value;

s3, screening sample points meeting the elliptical distribution on the multi-scale characteristic graph by using the target label corresponding to the remote sensing image in the S1 to obtain foreground sample points and obtain the real category and the regression value of the foreground sample points;

step S4, performing network training by using the foreground sample points and the predicted values, repeatedly executing steps S1 to S3, and training a detection model;

and S5, detecting the remote sensing image by using the detection model obtained in the step S4.

According to one aspect of the invention, in step S1, after the remote sensing image is acquired, the remote sensing image is preprocessed, and the preprocessing process at least includes cutting and turning.

According to an aspect of the present invention, in step S1, the method further includes:

step S11, extracting input sample characteristics by using a characteristic extraction backbone network to obtain a characteristic diagram;

and S12, performing up-sampling and feature fusion on the feature map obtained in the step S11 by using the feature pyramid to obtain a multi-scale feature map.

According to an aspect of the present invention, in step S2, the method specifically includes:

step S21, establishing a classification prediction network branch and a position regression prediction network branch on the basis of the multi-scale characteristic diagram obtained in the step S1;

and step S22, predicting the multi-scale characteristic diagram by adopting the classification prediction network branch and the position regression prediction network branch to obtain a target prediction value.

According to an aspect of the present invention, in step S3, the method specifically includes:

s31, according to the target label corresponding to the remote sensing image in the S1, aiming at the actual size of each target, screening out sample points meeting the elliptical distribution on the multi-scale feature map obtained in the S2;

in step S32, a self-adaptive foreground sample sampling strategy is adopted, and a certain number of foreground sample points are sequentially sampled from the high level to the low level on the feature pyramid, and the true category and the regression value thereof are obtained.

According to an aspect of the present invention, in step S31, the method specifically includes:

step S311, obtaining the multi-scale characteristic diagram by using the step 3

According to the sizes of different feature maps, calculating the position coordinates of the sample points mapped back to the original image

；

Step S312, utilizing the position coordinates of the sample points

And the coordinates of the rotating frame in the corresponding target label

Constructing an ellipse distribution sampling range based on the target size and the rotation angle, and screening out the ellipse distribution range

Inner sample points:

wherein the content of the first and second substances,

is the x-axis coordinate of the central point of the rotating frame,

is the coordinate of the center point y axis of the rotating frame, w is the width of the rotating frame, h is the height of the rotating frame,

and

calculating the coordinates of the target label and the sample point to obtain:

；

wherein the content of the first and second substances,

and

representing the coordinate offset between the sample point and the center point of the rotating frame:

；

range threshold representing elliptical distribution:

。

according to an aspect of the present invention, in step S32, the method specifically includes:

step S321, utilizing the width of the rotating frame in the target label

And height

Obtaining the longest edge of each rotating frame

：

Step S322, calculating Euclidean distance from the sample point to the center point of the rotating frame by using the sample point in the elliptical distribution range acquired in the step S31

And the longest side of the rotating frame

Calculating the normalized distance from the sample point to the center of the rotating frame

：

And S323, sequentially sampling foreground samples from large to small according to the longest edge of the rotating frame, sequentially selecting sample points which are closest to the high-level feature map and are not selected from the low-level feature map by utilizing the normalized distance, and regarding the sample points as the foreground sample points of the rotating frame.

According to an aspect of the present invention, in step S4, performing network training by using the foreground sample points and the predicted values, specifically including:

step S41, adopting focal loss to calculate loss in the target classification network branch, the formula is as follows:

wherein alpha and beta are a balance factor and a smoothing factor respectively,Mthe number of the selected sample points in the image is counted;

step S42, calculating the position regression branch network loss by using Smooth-L1 loss;

step S43, weighted average is performed on the two branch network losses to obtain a total loss:

wherein the content of the first and second substances,

is the number of positive samples and is,

,

loss of classification, positional regression, respectively.

According to an aspect of the present invention, there is provided an electronic apparatus including: one or more processors, one or more memories, and one or more computer programs; wherein, the processor is connected with the memory, the one or more computer programs are stored in the memory, and when the electronic device runs, the processor executes the one or more computer programs stored in the memory, so as to make the electronic device execute a remote sensing image rotating object detection method according to any one of the above technical solutions.

According to an aspect of the present invention, there is provided a computer readable storage medium for storing computer instructions, which when executed by a processor, implement a method for detecting a rotating target in a remote sensing image according to any one of the above technical solutions.

According to the concept of the invention, the invention provides a remote sensing image rotating target detection method, electronic equipment and a storage medium, wherein in training, rich sample points are obtained by using an elliptical distribution sampling mode for a given target position label; secondly, by utilizing a self-adaptive foreground sampling strategy, high-quality foreground sample points are sequentially obtained from a high-level feature map to a low-level feature map and then are input into a loss function together with a foreground target predicted by a network, so that a more accurate target feature representation method is learned.

Meanwhile, in the constructed feature pyramid, high-quality foreground sampling points are obtained in all layers of the feature pyramid from high to low according to the size of the target, training and prediction are carried out through fusion, the problem that small-size targets are difficult to obtain sampling points in the feature pyramid and too many redundant sampling points are obtained through large sizes is solved, sampling precision and generalization are improved through a self-adaptive method, and the method has important significance for high-resolution remote sensing image rotating frame target detection.

Drawings

FIG. 1 is a flow chart schematically illustrating model training of a method for detecting a rotating target in a remote sensing image according to an embodiment of the invention;

FIG. 2 is a schematic diagram illustrating adaptive elliptically distributed foreground sample point sampling according to an embodiment of the present invention;

FIG. 3 schematically illustrates a model training phase in accordance with one embodiment of the present invention;

FIG. 4 is a flow chart that schematically illustrates a method for detecting a rotating target in a remote sensing image, in accordance with an embodiment of the present invention;

FIG. 5 schematically shows a flowchart of step S2 according to an embodiment of the present invention;

fig. 6 schematically shows a flowchart of step S3 according to an embodiment of the present invention.

Detailed Description

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the embodiments will be briefly described below. It is obvious that the drawings in the following description are only some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.

The present invention is described in detail below with reference to the drawings and the specific embodiments, which are not repeated herein, but the embodiments of the present invention are not limited to the following embodiments.

As shown in fig. 1 to 6, the method for detecting the remote sensing image rotating target of the invention comprises the following steps:

s1, obtaining the remote sensing image and a target label corresponding to the remote sensing image, and obtaining a corresponding multi-scale characteristic diagram according to the remote sensing image;

s4, performing network training by using the foreground sample points and the predicted values, repeatedly executing the steps S1-S3, and training a detection model;

In this embodiment, as shown in fig. 1 and 4, in training, for a given target position label, rich sample points are obtained by using an elliptical distribution sampling method; secondly, by utilizing a self-adaptive foreground sampling strategy, high-quality foreground sample points are sequentially obtained from a high-level feature map to a low-level feature map and then are input into a loss function together with a foreground target predicted by a network, so that a more accurate target feature representation method is learned.

Meanwhile, in the constructed characteristic pyramid, high-quality foreground sampling points are obtained in each layer of the characteristic pyramid from high to low according to the size of the target, training and prediction are carried out through fusion, the problem that small-size targets are difficult to obtain sampling points in the characteristic pyramid and too many redundant sampling points are obtained in large sizes is solved, sampling precision and generalization are improved through a self-adaptive method, and the method has great significance for detecting the high-resolution remote sensing image rotating frame target.

In an embodiment of the present invention, preferably, in step S1, after the remote sensing image is acquired, the remote sensing image is preprocessed, and the preprocessing process at least includes cropping and flipping.

In the embodiment, the remote sensing image is cut, turned and the like, so that the complexity of a subsequent algorithm is reduced and the efficiency is improved.

In an embodiment of the present invention, preferably, in step S1, the method further includes:

As shown in fig. 5, in an embodiment of the present invention, preferably, in step S2, the method specifically includes:

s21, establishing a classification prediction network branch and a position regression prediction network branch on the basis of the multi-scale feature map obtained in the S1;

As shown in fig. 6, in an embodiment of the present invention, preferably, in step S3, the method specifically includes:

In this embodiment, the feature extraction backbone network is used to extract the features of the input samples to obtain a feature map

Wherein

The number of layers of the characteristic diagram; applying the feature pyramid structure, firstly, the feature map obtained in step S2 is compared

、

、

Up-sampling to obtain characteristic diagram

、

、

Then go right again

Down-sampling to obtain

、

And using 1 × 1 convolution to make the feature pyramid in which the feature is formed

、

、

、

、

Cascading to obtain Feature heads (Feature heads) with different scales; and establishing a classification prediction network branch and a position regression prediction network branch on the basis of the acquired Feature Head.

In the process of target detection, a foreground target and a background need to be distinguished, a classification prediction network branch can be used for distinguishing the foreground target and the background, the types of the targets are distinguished, and a position regression prediction network branch is used for determining the position and the angle of a rotating frame.

In one embodiment of the present invention, preferably, in step S31, the size according to the target label and the obtained size is

Characteristic diagram

Constructing an elliptical distribution to screen sample points, wherein

In order to be a two-dimensional size of the feature map,

the number of the channels specifically comprises:

step S311, utilizing step 3 to obtain the multi-scale characteristic diagram

；

Wherein it is assumed that

Is from input to

Total step size of layers, for feature maps

A position on

Its position on the input image is:

wherein the content of the first and second substances,

represents rounding down;

step S312, utilizing the position coordinates of the sample points

And rotating frame coordinates in the corresponding target label

Inner sample points:

wherein the content of the first and second substances,

is the x-axis coordinate of the central point of the rotating frame,

and

calculating the coordinates of the target label and the sample point to obtain:

；

wherein the content of the first and second substances,

and

represents the coordinate offset between the sample point and the center point of the rotating frame:

；

range threshold representing elliptical distribution:

。

in the embodiment, a large number of backgrounds can be ignored and foreground objects are highlighted by constructing the ellipse distribution sampling range based on the size and the rotation angle of the object, and the sampling is adaptively performed on the feature map by adjusting the lengths of the long side and the short side of the ellipse, so that the extraction of background information in the rectangular frame is reduced, and the object feature information is more accurately extracted.

As shown in fig. 2 and 3, in an embodiment of the present invention, preferably, in step S32, the method specifically includes:

step S321, utilizing the width of the rotating frame in the target label

And height

Obtaining the longest edge of each rotating frame

：

And the longest side of the rotating frame

：

In an embodiment of the present invention, preferably, in step S4, the network training is performed by using the foreground sample points and the predicted values, which specifically includes:

step S41, adopting focal loss to calculate loss in the target classification network branch, wherein the formula is as follows:

wherein the content of the first and second substances,Mthe number of the selected sample points in the image is counted;

step S43, the two branch network losses are weighted and averaged to obtain a total loss:

wherein the content of the first and second substances,

is the number of positive samples and is,

,

loss of classification, positional regression, respectively.

According to an aspect of the present invention, there is provided a computer-readable storage medium for storing computer instructions, which when executed by a processor, implement a method for detecting a rotating target in a remote sensing image according to any one of the above technical solutions.

In summary, the invention provides a method for detecting a rotating target of a remote sensing image, an electronic device and a storage medium, wherein in training, rich sample points are obtained by using an elliptical distribution sampling mode for a given target position label; secondly, by utilizing a self-adaptive foreground sampling strategy, high-quality foreground sample points are sequentially obtained from a high-level feature map to a low-level feature map and then are input into a loss function together with a foreground target predicted by a network, so that a more accurate target feature representation method is learned.

Furthermore, it should be noted that the present invention may be provided as a method, apparatus or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present invention may take the form of a computer program product embodied on one or more computer-usable storage media having computer-usable program code embodied in the medium.

Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks. These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

It should also be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in the process, method, article, or terminal equipment comprising the element.

Finally, it should be noted that while the above describes a preferred embodiment of the invention, it will be appreciated by those skilled in the art that, once the basic inventive concepts have been learned, numerous changes and modifications may be made without departing from the principles of the invention, which shall be deemed to be within the scope of the invention. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the embodiments of the invention.

Claims

1. A method for detecting a remote sensing image rotating target comprises the following steps:

s3, screening out sample points meeting the elliptical distribution on the multi-scale characteristic diagram by using the target labels corresponding to the remote sensing images in the S1, obtaining foreground sample points, and obtaining the real categories and regression values of the foreground sample points;

2. The method of claim 1, wherein in step S1, after the remote sensing image is obtained, the remote sensing image is preprocessed, and the preprocessing includes at least cropping and flipping.

3. The method according to claim 2, wherein in step S1, the method further comprises:

4. The method according to claim 3, wherein in step S2, the method specifically comprises:

and step S22, predicting the multi-scale feature map by adopting the classification prediction network branch and the position regression prediction network branch to obtain a target prediction value.

5. The method according to claim 1, wherein in step S3, the method specifically includes:

in step S32, a self-adaptive foreground sample sampling strategy is adopted, and a certain number of foreground sample points are sequentially sampled from a high layer to a low layer on the feature pyramid, and the true category and the regression value thereof are obtained.

6. The method according to claim 5, wherein in step S31, the method specifically includes:

step S311, utilizing step 3 to obtain the multi-scale characteristic diagram

；

Step S312, utilizing the position coordinates of the sample points

And the coordinates of the rotating frame in the corresponding target label

Inner sample points:

wherein the content of the first and second substances,

is the x-axis coordinate of the central point of the rotating frame,

and

calculating the coordinates of the target label and the sample point to obtain:

；

wherein the content of the first and second substances,

and

；

range threshold representing elliptical distribution:

。

7. the method according to claim 6, wherein in step S32, the method specifically includes:

step S321, utilizing the width of the rotating frame in the target label

And height

Obtaining the longest edge of each rotating frame

：

And the longest side of the rotating frame

：

8. The method according to claim 1, wherein in step S4, performing network training using the foreground sample points and the predicted values specifically includes:

wherein, α and βRespectively a balance factor and a smoothing factor,Mthe number of the selected sample points in the image is counted;

wherein the content of the first and second substances,

is the number of positive samples and is,

,

the loss of classification and position regression respectively.

9. An electronic device, comprising: one or more processors, one or more memories, and one or more computer programs; wherein a processor is connected to the memory, the one or more computer programs being stored in the memory, and when the electronic device is running, the processor executes the one or more computer programs stored in the memory to cause the electronic device to perform the method for remote sensing image rotation target detection according to any one of claims 1-8.

10. A computer-readable storage medium storing computer instructions which, when executed by a processor, implement a method for detecting a rotating object in a remote sensing image according to any one of claims 1 to 8.