CN113947723A - High-resolution remote sensing scene target detection method based on size balance FCOS - Google Patents
- Publication number
- CN113947723A (application CN202111143539.9A)
- Authority
- CN
- China
- Prior art keywords
- target
- positive sample
- centrality
- anchor frame
- remote sensing
- Prior art date
- Legal status
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
The invention discloses a high-resolution remote sensing scene target detection method based on a size-balanced FCOS (Fully Convolutional One-Stage object detector). A size balance coefficient is introduced into the centrality and border-regression stages of the FCOS target detection module: the centrality coefficient is dynamically adjusted according to the regression information of each target, a reasonable weight is assigned to the border regression of each positive sample, the model is trained on a high-resolution remote sensing target detection dataset, and the trained model is used to recognize remote sensing ground objects. The method fully accounts for the shortcomings of the FCOS centrality evaluation scheme for targets of different sizes: it reinforces the loss weight of positive samples distributed at the edges of small-target anchor frames and suppresses the redundant loss contribution within large targets, achieving target size balance; the size-balanced FCOS improves target detection accuracy in high-resolution remote sensing scenes without introducing extra overhead at the model inference stage.
Description
Technical Field
The invention belongs to the technical field of computer vision and remote sensing image application, and particularly relates to a high-resolution remote sensing scene target detection method based on a size balance FCOS.
Background
In recent years satellite technology has developed rapidly, the application fields of remote sensing imagery have kept expanding, and remote sensing plays a major role in meteorology, geology, agriculture and forestry, the military, smart cities and other fields. Remote sensing observation can image large areas of the Earth within a short time at multiple levels, viewing angles and times, and is an important means of acquiring environmental information and Earth resources; target detection based on deep learning can extract ground features from high-resolution remote sensing images efficiently and accurately, reducing the cost of manual interpretation.
Remote sensing targets are characterized by large scale variation, diverse rotation directions and numerous small targets, which poses a severe test for traditional anchor-based target detection algorithms. Anchor-free target detection networks get rid of the anchor-based networks' reliance on prior knowledge, namely anchor frames of preset fixed aspect ratio and size, and realize target detection by directly regressing the distances from a center point to the target frame. The FCOS (Fully Convolutional One-Stage object detection) network takes points falling inside a target as positive samples for regression and, through the centrality, assigns a higher border-regression loss weight to positive samples near the target center and a low weight to those near the edge. For some small targets with only a few positive samples, this weighting strategy leaves a high probability that the positive samples fall on the target edge, and their loss contribution is reduced by the low weight; large targets are allocated more positive samples, which introduces redundant loss contributions, thereby degrading the overall detection quality.
An anchor-frame-based target detection technique, Faster R-CNN, is proposed in [Ren S, He K, Girshick R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks [J]. Advances in Neural Information Processing Systems, 2015, 28: 91-99], which detects target objects by setting anchor frames of fixed aspect ratio and size from prior knowledge. However, in remote sensing scenes the target scale varies widely, rotation directions are diverse and small targets are numerous, which poses a severe test for anchor-frame-based detection.
An anchor-free target detection technique, FCOS, is proposed in [Tian Z, Shen C, Chen H, et al. FCOS: Fully convolutional one-stage object detection [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision. 2019: 9627-9636]. It performs regression detection of targets based on positive sample points, adapts to the geometric and spatial distribution characteristics of remote sensing ground-object targets, and overcomes the drawbacks of preset ratios and sizes in anchor-frame detection.
The existing FCOS gives different weights to different positive sample points through the centrality, but the centrality cannot correctly reflect the regression quality of the anchor frame, especially in remote sensing scenes dominated by small targets. For small targets, the centrality weighting strategy leaves a high probability that their positive sample points fall on the edge of the small target's anchor frame, so those points mistakenly receive low weights; for large targets, redundant positive samples exist that contribute low-quality regression loss and affect the final detection.
Disclosure of Invention
In view of the above, the invention provides a high-resolution remote sensing scene target detection method based on a size-balanced FCOS. Different processing strategies are adopted for positive samples in targets of different scales, so that small-target regression provides a sufficient loss contribution while redundant positive samples of large targets are suppressed. Targets of different sizes thus all obtain ideal detection results, size balance is achieved, and target detection accuracy is improved without introducing extra overhead.
A high-resolution remote sensing scene target detection method based on a size balance FCOS comprises the following steps:
(1) extracting the centrality and the GIoU (Generalized Intersection over Union) of each positive sample in the image, where a positive sample is a pixel falling inside a (manually annotated) target anchor frame;
(2) counting the mean μ_c and standard deviation σ_c of the centrality of all positive samples in the image, and from them calculating the centrality threshold c_t; counting the number of positive samples falling into each target anchor frame in the image and averaging to obtain μ_pa, the mean number of positive samples assigned per target anchor frame;
(3) calculating the weighting coefficient w_g of each target anchor frame from the number of positive samples in that anchor frame and the mean assigned number μ_pa;
(4) counting the mean μ_g and standard deviation σ_g of the GIoU of all positive samples in the image, and from them calculating the GIoU threshold t_g; at the same time finding the maximum centrality of the positive samples in each target anchor frame;
(5) calculating and determining a size balance coefficient of each positive sample in the image;
(6) constructing a size-balanced FCOS model, designing its loss function L, inputting image features into the model, and training the model by gradient descent on the loss function L, so as to perform target detection of ground objects in remote sensing images.
Further, the centrality threshold c_t in step (2) is calculated by:

c_t = μ_c + λ_c σ_c

where λ_c is a set weight parameter used to adjust the influence of the centrality standard deviation on the centrality threshold.
Further, the weighting coefficient w_g of the target anchor frame in step (3) is calculated by:

w_g = sqrt(μ_pa / N_g)

where N_g is the number of positive samples in the target anchor frame.
Further, the GIoU threshold t_g in step (4) is calculated by:

t_g = μ_g + λ_g σ_g

where λ_g is a set weight parameter used to adjust the influence of the GIoU standard deviation on the GIoU threshold.
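The threshold and weighting computations of steps (2)-(4) can be sketched as follows. This is an illustrative NumPy sketch; the form w_g = sqrt(μ_pa / N_g) is reconstructed from the worked example later in the description (w_g = 1.58 for μ_pa = 2.5, N_g = 1), and the population standard deviation is assumed since it reproduces the example's σ_c = 0.23.

```python
import numpy as np

def centrality_threshold(cents, lam_c):
    # c_t = mu_c + lambda_c * sigma_c  (step 2; population std assumed)
    return float(np.mean(cents) + lam_c * np.std(cents))

def giou_threshold(gious, lam_g):
    # t_g = mu_g + lambda_g * sigma_g  (step 4)
    return float(np.mean(gious) + lam_g * np.std(gious))

def anchor_frame_weight(n_g, mu_pa):
    # w_g = sqrt(mu_pa / N_g): shrinks as the number of positive
    # samples N_g in the target anchor frame grows
    return float(np.sqrt(mu_pa / n_g))

# Numbers from the worked example in the description (Figs. 3(a)/3(d)):
cents = [0.2, 0.8, 0.7, 0.5, 0.3]
print(round(centrality_threshold(cents, 0.25), 2))  # 0.56
print(round(anchor_frame_weight(1, 2.5), 2))        # 1.58
print(round(anchor_frame_weight(4, 2.5), 2))        # 0.79
```

With these three scalars per image and per frame, the size balance coefficient of each positive sample can be decided without any extra network branches, which is why no inference-time overhead is added.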
Further, step (5) is implemented as follows: for any positive sample in the image, compare the maximum centrality c_g,max of the positive samples in its target anchor frame with the centrality threshold c_t. If c_g,max < c_t, the size balance coefficient q of the positive sample is calculated by:

q = max(c_t, c_w)

c_w = w_g c_m

where c_m is the centrality of the positive sample.

If c_g,max ≥ c_t and the GIoU of the positive sample is smaller than the GIoU threshold t_g, the size balance coefficient q of the positive sample is set to 0;

if c_g,max ≥ c_t and the GIoU of the positive sample is greater than or equal to the GIoU threshold t_g, the size balance coefficient q of the positive sample is set to c_m.
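The three-way case analysis of step (5) can be written compactly; a minimal sketch (the function name is illustrative):

```python
def size_balance_coeff(c_m, giou, c_gmax, c_t, t_g, w_g):
    """Size balance coefficient q of one positive sample.
    c_m: centrality of the sample; giou: its GIoU; c_gmax: maximum
    centrality inside its target anchor frame; c_t / t_g: centrality
    and GIoU thresholds; w_g: weighting coefficient of the frame."""
    if c_gmax < c_t:
        # extreme (typically small) target: weight and protect it
        return max(c_t, w_g * c_m)   # q = max(c_t, c_w), c_w = w_g * c_m
    if giou < t_g:
        return 0.0                   # low-quality regression: remove
    return c_m                       # retain the original centrality

# Sample of small target g1 in the description's example:
print(round(size_balance_coeff(0.2, 0.8, 0.2, 0.56, 0.8, 1.58), 2))  # 0.56
```

Taking max(c_t, c_w) guarantees a floor of c_t for small targets whose only positive samples lie on the frame edge, while the GIoU test prunes redundant samples of large targets.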
Further, the loss function L in step (6) is:

L = (1/N_pos) Σ_{(x,y)} L_cls(p_{x,y}, c*_{x,y}) + (λ_1/N_pos) Σ_{(x,y)} 1{c*_{x,y} > 0} q*_{x,y} L_reg(t_{x,y}, t*_{x,y}) + (λ_2/N_pos) Σ_{(x,y)} 1{c*_{x,y} > 0} L_ctr(q_{x,y}, q*_{x,y})

where: N_pos is the number of positive samples in the image; (x, y) are the coordinates of a positive sample in the corresponding feature-pyramid layer; 1{c*_{x,y} > 0} is an indicator that equals 1 when c*_{x,y} > 0 and 0 otherwise; q*_{x,y} is the size balance coefficient of the positive sample at (x, y); L_cls() is the focal loss for classification; p_{x,y} and c*_{x,y} are the predicted class-probability vector of the positive sample at (x, y) and the corresponding ground-truth label; L_reg() is the GIoU loss for border regression; t_{x,y} and t*_{x,y} are the distance vectors of the positive sample at (x, y) to the predicted anchor frame and the target anchor frame respectively; L_ctr() is the cross-entropy function; q_{x,y} is the predicted centrality at (x, y); λ_1 and λ_2 are given weight parameters.
In the method, a size balance coefficient is used in the centrality and border-regression stages of the FCOS target detection module: the centrality coefficient is dynamically adjusted according to the regression information of each target, a reasonable weight is assigned to the border regression of each positive sample, the model is trained on a high-resolution remote sensing target detection dataset, and the trained model is used to recognize remote sensing ground objects. The method fully accounts for the shortcomings of the FCOS centrality evaluation scheme for targets of different sizes: it reinforces the loss weight of positive samples distributed at the edges of small-target anchor frames and suppresses the redundant loss contribution within large targets, achieving target size balance; the size-balanced FCOS improves target detection accuracy in high-resolution remote sensing scenes without introducing extra overhead at the model inference stage.
Drawings
FIG. 1 is a schematic diagram of a network architecture for a size balanced FCOS employed in the present invention.
FIG. 2 is a schematic flow chart of the high-resolution remote sensing scene target detection method of the invention.
Figs. 3(a)-3(f) are schematic diagrams of the distribution and processing of positive samples in targets of different sizes: Fig. 3(a) shows a small target whose positive sample falls on the edge of the target anchor frame; Fig. 3(b) shows the anchor-frame regression of the positive sample within the small target's anchor frame; Fig. 3(c) shows the positive sample pulled into the central distribution of the small target after the size balance coefficient is applied; Fig. 3(d) shows a large target and its positive samples falling at the edge of the target anchor frame; Fig. 3(e) shows the anchor-frame regression of all positive samples within the large target; and Fig. 3(f) shows low-quality positive samples removed by the size balance coefficient.
Detailed Description
In order to more specifically describe the present invention, the following detailed description is provided for the technical solution of the present invention with reference to the accompanying drawings and the specific embodiments.
The invention relates to a high-resolution remote sensing scene target detection method based on a size balance FCOS, which comprises the following steps:
(1) In the size-balanced FCOS training stage, acquire the centrality and the GIoU of each positive sample (RoI).
The FCOS extracts features through a backbone network, and then allocates targets with different scales to different pyramid network layers for down-sampling by using a feature pyramid network; and the target detection module of the FCOS regresses the characteristics of each layer pixel by pixel, screens out the positive sample falling in the target anchor frame, obtains the regression probability, the centrality and the distance between the coordinate of the positive sample and the four edges of the target anchor frame, and calculates the GIoU value according to the distance.
The size balance FCOS network structure adopted by the invention is shown in figure 1, an input image firstly passes through a feature extraction network ResNet to obtain features of C3-C5 with different resolutions, and then a feature pyramid network FPN is used to obtain P3-P7 features with different resolutions.
The target detection module of the size-balanced FCOS takes the features of each FPN layer and completes the detection task with two groups of convolutions. Each convolution group consists of 4 convolution kernels of 3 × 3 with 256 channels; the input pyramid feature is P_l ∈ R^(H_l × W_l × 256), where l is the pyramid layer index and H_l and W_l are the resolution of P_l. The features produced by the convolution group then pass through one 3 × 3 convolution with T channels, yielding the predicted categories Cls ∈ R^(H_l × W_l × T), where T is the total number of object classes and Cls gives the probability of each class at each point of the current feature layer.
Anchor-frame regression and centrality share one group of convolution features; the features from this convolution group pass through one 3 × 3 convolution with 4 channels and one with 1 channel respectively, producing the predicted anchor frame t ∈ R^(H_l × W_l × 4) and the predicted centrality q ∈ R^(H_l × W_l × 1). The 4 channels represent t = (l, r, t, b), the distances from the current point's prediction to the four sides of the target anchor frame, and the centrality prediction represents the centrality coefficient of the current sample point.
Samples falling inside a target anchor frame are taken as positive samples for regression, and the centrality and GIoU of a positive sample are calculated by:

centrality = sqrt( (min(l*, r*) / max(l*, r*)) × (min(t*, b*) / max(t*, b*)) )

GIoU = |A ∩ B| / |A ∪ B| − |C \ (A ∪ B)| / |C|

where t* = (l*, r*, t*, b*) is the distance vector from the positive sample point to the left, right, top and bottom sides of the target anchor frame, A is the anchor frame predicted for the positive sample, B is the target anchor frame, and C is the minimum convex hull containing both A and B.
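The two quantities can be computed directly; a self-contained sketch for axis-aligned frames (the (x1, y1, x2, y2) box layout is an illustrative choice):

```python
import math

def centrality(l, r, t, b):
    # centrality of a positive sample from its distances l*, r*, t*, b*
    # to the four sides of the target anchor frame
    return math.sqrt((min(l, r) / max(l, r)) * (min(t, b) / max(t, b)))

def giou(box_a, box_b):
    # Generalized IoU of two axis-aligned boxes given as (x1, y1, x2, y2)
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    # C: smallest enclosing (axis-aligned) box of A and B
    hull = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    return inter / union - (hull - union) / hull

print(centrality(5, 5, 5, 5))                       # 1.0 at the exact center
print(round(giou((0, 0, 1, 1), (2, 0, 3, 1)), 3))   # -0.333, disjoint boxes
```

Unlike plain IoU, GIoU stays informative (and negative) for non-overlapping boxes, which is what makes it usable as a regression-quality score for every positive sample.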
(2) Global target information perception: count the mean and standard deviation of the centrality of all positive samples in the current training image and calculate the centrality threshold from them; count the number of positive samples falling into each target anchor frame and compute the mean number of positive samples assigned per target anchor frame.
The workflow of global target information perception is shown in the left column of Fig. 2. First the centrality mean μ_c and standard deviation σ_c of all positive samples are computed, and the centrality threshold is calculated as c_t = μ_c + λ_c σ_c, where λ_c is a hyperparameter that adjusts the influence of the centrality standard deviation on the threshold. Then the number of positive samples assigned to each target anchor frame is counted; a positive sample falling into several target anchor frames is assigned to the one with the smallest area. The mean number of positive samples per target anchor frame is μ_pa = N_p / N_A, where N_p is the total number of positive samples over all pyramid layers and N_A is the total number of target anchor frames.
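The smallest-area assignment rule and the computation of μ_pa can be sketched as follows; the per-sample data layout is an illustrative assumption:

```python
def assign_positive_samples(containing_frames, num_frames):
    """containing_frames: one entry per positive sample, listing a
    (frame_id, area) pair for every target anchor frame the sample
    falls in. A sample inside several frames goes to the one with the
    smallest area. Returns per-frame counts and mu_pa = N_p / N_A."""
    counts = [0] * num_frames
    for frames in containing_frames:
        frame_id, _ = min(frames, key=lambda p: p[1])
        counts[frame_id] += 1
    mu_pa = sum(counts) / num_frames
    return counts, mu_pa

# Two targets, five positive samples, one sample inside both frames:
samples = [[(0, 4.0), (1, 9.0)], [(1, 9.0)], [(1, 9.0)],
           [(1, 9.0)], [(1, 9.0)]]
print(assign_positive_samples(samples, 2))  # ([1, 4], 2.5)
```

The returned μ_pa = 2.5 matches the worked example that follows.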
In Figs. 3(a) and 3(d), the total number of targets is N_A = 2 and the number of positive samples is N_p = 5, so the mean number of positive samples per target is μ_pa = 2.5. The positive-sample centralities in the two target anchor frames are C_g1 = [0.2] and C_g2 = [0.8, 0.7, 0.5, 0.3]; the centrality mean is μ_c = 0.5 and the standard deviation σ_c = 0.23. Following the results of Table 1, λ_c = 0.25 is used here, giving the centrality threshold c_t = μ_c + λ_c σ_c = 0.56.
TABLE 1
λ_c | -0.50 | -0.25 | 0 | 0.25 | 0.50 |
mAP | 37.6 | 38.0 | 37.8 | 37.7 | 37.6 |
Here mAP is the mean average precision of the size-balanced FCOS, used to evaluate target detection accuracy; the evaluation model uses ResNet-50 and FPN as the feature extraction network.
(3) Regression quality statistics for the positive samples of each target anchor frame: calculate the size balance weighting coefficient from the number of positive samples assigned to the current target anchor frame and the global mean assignment; calculate the GIoU mean and standard deviation of the positive samples in the current target and from them the GIoU threshold; and calculate the maximum centrality of all positive samples in the target.
The workflow of the regression quality statistics is shown in the middle column of Fig. 2. In this step, the target set G is traversed; for the set M_g of all positive samples in the current target anchor frame g, the maximum centrality c_g,max of the positive samples in M_g is computed, along with the GIoU mean μ_g and standard deviation σ_g over M_g. The GIoU threshold of M_g is then t_g = μ_g + λ_g σ_g, where λ_g is a hyperparameter that adjusts the influence of the GIoU standard deviation on the threshold.
The weighting coefficient is calculated as w_g = sqrt(μ_pa / N_g), where N_g is the number of positive samples in the current target anchor frame g; because N_g is in the denominator, w_g is inversely related to the number of positive samples falling in the frame, so smaller targets receive a higher weight boost.
In Figs. 3(b) and 3(e), the GIoU values of the samples in the two target anchor frames are D_g1 = [0.8] and D_g2 = [0.9, 0.8, 0.5, 0.2]; the maximum sample centralities in the two frames are c_max,g1 = 0.2 and c_max,g2 = 0.8. With N_g1 = 1, N_g2 = 4 and μ_pa = 2.5, the weighting coefficients come out as w_g1 = 1.58 and w_g2 = 0.79.
For each target anchor frame, the GIoU means are μ_g1 = 0.8 and μ_g2 = 0.6, with standard deviations σ_g1 = 0 and σ_g2 = 0.27. Following the results of Table 2, λ_g = -0.25 is used here; the GIoU threshold formula t_g = μ_g + λ_g σ_g then gives t_g1 = 0.8 and t_g2 = 0.53.
TABLE 2
λ_g | -0.50 | -0.25 | 0 | 0.25 | 0.50 |
mAP | 38.0 | 37.9 | 38.2 | 38.1 | 37.8 |
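The per-frame numbers above can be reproduced directly (using the population standard deviation, which the worked centrality example also implies):

```python
import numpy as np

d_g1 = np.array([0.8])
d_g2 = np.array([0.9, 0.8, 0.5, 0.2])
lam_g = -0.25  # per Table 2

# t_g = mu_g + lambda_g * sigma_g for each target anchor frame
t_g1 = float(d_g1.mean() + lam_g * d_g1.std())
t_g2 = float(d_g2.mean() + lam_g * d_g2.std())
print(round(t_g1, 2), round(t_g2, 2))  # 0.8 0.53
```

A negative λ_g lowers the threshold for frames with scattered GIoU values, so only clearly low-quality samples fall below it.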
(4) Set the size balance coefficient: according to the GIoU and the centrality values of the target containing the current positive sample, decide whether to weight, remove or retain its centrality coefficient. First traverse M_g and compute the weighted centrality of each positive sample m, c_w = w_g c_m, where c_m is the centrality of the current positive sample m. Then compare the maximum centrality c_g,max of the positive samples in the current target anchor frame with the centrality threshold c_t. If c_g,max is smaller than c_t, the frame is treated as an extreme target anchor frame and the centrality must be weighted: the size balance coefficient is q = max(c_t, c_w), i.e. the larger of the centrality threshold and the weighted centrality, which guarantees weight protection for small targets whose positive samples all fall on the edge of the target anchor frame. For targets at or above the threshold, if the GIoU value d_m of a positive sample is smaller than the GIoU threshold, the regression is considered low quality and the size balance coefficient q is set to 0; otherwise q keeps the original centrality c_m.
As shown in the right column of Fig. 2 and in Fig. 3(c), for target g1 we have c_g,max < c_t: the positive sample regresses to a high-quality anchor frame but its centrality is suppressed, so the centrality is weighted, c_w1 = w_g1 c_m1 = 0.3, and the positive-sample centrality in the target is set to q = max(c_t, c_w) = 0.56. The centrality distribution in Fig. 3(c) is the distribution curve at the centrality threshold; choosing max(c_t, c_w) keeps the sample near the target center point.
For target anchor frame g2, as shown in Fig. 3(e), the maximum centrality satisfies c_g2,max ≥ c_t, so high-weight, high-quality anchor-frame regression is already guaranteed and the redundant positive samples in the target must be removed. Traversing the positive samples in g2, the GIoU values D_g2,3 = 0.5 and D_g2,4 = 0.2 are both smaller than t_g2 = 0.53 and belong to low-quality anchor-frame predictions, so their size balance coefficient q is set to 0 and those positive samples are removed; the remaining positive samples keep their original centrality. As shown in Fig. 3(f), removed positive samples are marked with an "x" and the finally retained prediction anchor frames are drawn as solid rectangles.
(5) When the loss function is calculated, the obtained size balance coefficient multiplies the border-regression loss of each positive sample, and the centrality branch of the FCOS regresses the size balance coefficient, completing the training stage.
The loss function is computed as:

L = (1/N_pos) Σ_{(x,y)} L_cls(p_{x,y}, c*_{x,y}) + (λ_1/N_pos) Σ_{(x,y)} 1{c*_{x,y} > 0} q*_{x,y} L_reg(t_{x,y}, t*_{x,y}) + (λ_2/N_pos) Σ_{(x,y)} 1{c*_{x,y} > 0} L_ctr(q_{x,y}, q*_{x,y})

where: N_pos is the number of positive samples in the image; (x, y) are the coordinates of a positive sample in the corresponding feature-pyramid layer; 1{c*_{x,y} > 0} is an indicator that equals 1 when c*_{x,y} > 0 and 0 otherwise; q*_{x,y} is the size balance coefficient of the positive sample at (x, y); L_cls() is the focal loss for classification; p_{x,y} and c*_{x,y} are the predicted class-probability vector of the positive sample at (x, y) and the corresponding ground-truth label; L_reg() is the GIoU loss for border regression; t_{x,y} and t*_{x,y} are the distance vectors of the positive sample at (x, y) to the predicted anchor frame and the target anchor frame respectively; L_ctr() is the cross-entropy function; q_{x,y} is the predicted centrality at (x, y); λ_1 and λ_2 are given weight parameters.
Table 3 compares FCOS and the size-balanced FCOS against other positive-sample processing strategies: "without centrality" drops the centrality from FCOS, and ATSS, compared with the size-balanced FCOS, only removes samples below the GIoU threshold t_g and does not weight the suppressed small targets. The size balance coefficient improves target detection accuracy without incurring extra overhead and outperforms the other balancing strategies.
TABLE 3
Method | Without centrality | FCOS | ATSS | Size balanced FCOS |
mAP | 37.5 | 37.8 | 38.0 | 38.2 |
(6) Use the trained size-balanced FCOS model to perform target detection of ground objects in the input, preprocessed remote sensing image.
The foregoing description of the embodiments is provided to enable one of ordinary skill in the art to make and use the invention, and it is to be understood that other modifications of the embodiments, and the generic principles defined herein may be applied to other embodiments without the use of inventive faculty, as will be readily apparent to those skilled in the art. Therefore, the present invention is not limited to the above embodiments, and those skilled in the art should make improvements and modifications to the present invention based on the disclosure of the present invention within the protection scope of the present invention.
Claims (6)
1. A high-resolution remote sensing scene target detection method based on a size balance FCOS comprises the following steps:
(1) extracting the centrality and the GIoU of each positive sample in the image, where a positive sample is a pixel falling inside a target anchor frame;
(2) counting the mean μ_c and standard deviation σ_c of the centrality of all positive samples in the image, and from them calculating the centrality threshold c_t; counting the number of positive samples falling into each target anchor frame in the image and averaging to obtain μ_pa, the mean number of positive samples assigned per target anchor frame;
(3) calculating the weighting coefficient w_g of each target anchor frame from the number of positive samples in that anchor frame and the mean assigned number μ_pa;
(4) counting the mean μ_g and standard deviation σ_g of the GIoU of all positive samples in the image, and from them calculating the GIoU threshold t_g; at the same time finding the maximum centrality of the positive samples in each target anchor frame;
(5) calculating and determining a size balance coefficient of each positive sample in the image;
(6) constructing a size-balanced FCOS model, designing its loss function L, inputting image features into the model, and training the model by gradient descent on the loss function L, so as to perform target detection of ground objects in remote sensing images.
2. The method for detecting the target in the high-resolution remote sensing scene according to claim 1, characterized in that the centrality threshold c_t in step (2) is calculated by:

c_t = μ_c + λ_c σ_c

where λ_c is a set weight parameter.
3. The method for detecting the target in the high-resolution remote sensing scene according to claim 1, characterized in that the weighting coefficient w_g of the target anchor frame in step (3) is calculated by:

w_g = sqrt(μ_pa / N_g)

where N_g is the number of positive samples in the target anchor frame.
4. The method for detecting the target in the high-resolution remote sensing scene according to claim 1, characterized in that the GIoU threshold t_g in step (4) is calculated by:

t_g = μ_g + λ_g σ_g

where λ_g is a set weight parameter.
5. The high-resolution remote sensing scene target detection method according to claim 1, characterized in that step (5) is specifically implemented as follows: for any positive sample in the image, compare the maximum centrality c_g,max of the positive samples in the target anchor frame where this sample is located against the centrality threshold c_t:
if c_g,max < c_t, the size balance coefficient q of the positive sample is calculated by the following relational expressions:
q = max(c_t, c_w)
c_w = w_g · c_m
wherein: c_m is the centrality of the positive sample;
if c_g,max ≥ c_t and the GIoU of the positive sample is smaller than the GIoU threshold t_g, the size balance coefficient q of the positive sample is set to 0;
if c_g,max ≥ c_t and the GIoU of the positive sample is greater than or equal to the GIoU threshold t_g, the size balance coefficient q of the positive sample is set to c_m.
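The three-way rule of claim 5 can be sketched as a small function; the function name and scalar interface are illustrative, not from the patent:

```python
def size_balance_coefficient(c_m, giou, c_g_max, c_t, t_g, w_g):
    """Size balance coefficient q for one positive sample (claim 5 of the patent).

    c_m      : centrality of the positive sample
    giou     : GIoU of the positive sample with its target anchor frame
    c_g_max  : maximum centrality among positive samples in that anchor frame
    c_t, t_g : centrality and GIoU thresholds (claims 2 and 4)
    w_g      : weighting coefficient of the anchor frame (claim 3)
    """
    if c_g_max < c_t:
        # Case 1: no sample in this frame is strongly centered (likely a small
        # or off-center target); boost the sample via w_g, floored at c_t.
        c_w = w_g * c_m
        return max(c_t, c_w)
    if giou < t_g:
        # Case 2: a poorly localized sample in a well-centered frame; suppress it.
        return 0.0
    # Case 3: a well-localized sample; keep its own centrality.
    return c_m
```

The effect is that anchor frames whose best sample is still weakly centered get their samples up-weighted, while badly regressed samples in well-covered frames are zeroed out.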
6. The high-resolution remote sensing scene target detection method according to claim 1, characterized in that: the loss function L in step (6) is the sum of a classification term, a bounding-box regression term and a centrality term, averaged over the number N_pos of positive samples in the image, with the regression and centrality terms weighted by the given parameters λ_1 and λ_2 respectively;
wherein: (x, y) are the coordinates of a positive sample in the corresponding feature pyramid layer; the size balance coefficient of the positive sample at (x, y) modulates its contribution to the loss, and an indicator function equal to 1 at positive sample locations and 0 otherwise restricts the regression and centrality terms to positive samples; L_cls() is the Focal loss for classification, with p_x,y and p*_x,y the predicted class-label probability vector of the positive sample at (x, y) and the corresponding ground-truth label; L_reg() is the GIoU loss for bounding-box regression, with t_x,y and t*_x,y the distance vectors of the positive sample at (x, y) to the predicted anchor frame and to the target anchor frame; L_ctr() is a cross-entropy function, with q_x,y the predicted centrality of the positive sample at (x, y).
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111143539.9A CN113947723B (en) | 2021-09-28 | 2021-09-28 | High-resolution remote sensing scene target detection method based on size balance FCOS |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113947723A true CN113947723A (en) | 2022-01-18 |
CN113947723B CN113947723B (en) | 2024-07-02 |
Family
ID=79328947
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111143539.9A Active CN113947723B (en) | 2021-09-28 | 2021-09-28 | High-resolution remote sensing scene target detection method based on size balance FCOS |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113947723B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111091105A (en) * | 2019-12-23 | 2020-05-01 | 郑州轻工业大学 | Remote sensing image target detection method based on new frame regression loss function |
CN111583282A (en) * | 2020-05-18 | 2020-08-25 | 联想(北京)有限公司 | Image segmentation method, device, equipment and storage medium |
WO2020173036A1 (en) * | 2019-02-26 | 2020-09-03 | 博众精工科技股份有限公司 | Localization method and system based on deep learning |
CN112800955A (en) * | 2021-01-27 | 2021-05-14 | 中国人民解放军战略支援部队信息工程大学 | Remote sensing image rotating target detection method and system based on weighted bidirectional feature pyramid |
Non-Patent Citations (1)
Title |
---|
LIU Binping; ZHOU Yue: "A Novel Anchor-Free 3D Object Detector", Chinese Journal of Stereology and Image Analysis, no. 01, 25 March 2020 (2020-03-25) *
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117685881A (en) * | 2024-01-31 | 2024-03-12 | 成都建工第七建筑工程有限公司 | Sensing and detecting system for concrete structure entity position and size deviation |
CN117685881B (en) * | 2024-01-31 | 2024-06-04 | 成都建工第七建筑工程有限公司 | Sensing and detecting method for concrete structure entity position and size deviation |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109614985B (en) | Target detection method based on densely connected feature pyramid network | |
CN111091105B (en) | Remote sensing image target detection method based on new frame regression loss function | |
CN108416307B (en) | Method, device and equipment for detecting pavement cracks of aerial images | |
CN110287932B (en) | Road blocking information extraction method based on deep learning image semantic segmentation | |
CN108038846A (en) | Transmission line equipment image defect detection method and system based on multilayer convolutional neural networks | |
CN112836713A (en) | Image anchor-frame-free detection-based mesoscale convection system identification and tracking method | |
CN108985250A (en) | A kind of traffic scene analytic method based on multitask network | |
CN107633226B (en) | Human body motion tracking feature processing method | |
CN112464911A (en) | Improved YOLOv 3-tiny-based traffic sign detection and identification method | |
CN112101278A (en) | Hotel point cloud classification method based on k nearest neighbor feature extraction and deep learning | |
CN111967480A (en) | Multi-scale self-attention target detection method based on weight sharing | |
CN111079739B (en) | Multi-scale attention feature detection method | |
CN112084869A (en) | Compact quadrilateral representation-based building target detection method | |
CN106023257A (en) | Target tracking method based on rotor UAV platform | |
CN111145145B (en) | Image surface defect detection method based on MobileNet | |
CN111738114B (en) | Vehicle target detection method based on anchor-free accurate sampling remote sensing image | |
CN113177456B (en) | Remote sensing target detection method based on single-stage full convolution network and multi-feature fusion | |
CN110852317A (en) | Small-scale target detection method based on weak edge | |
CN115527133A (en) | High-resolution image background optimization method based on target density information | |
CN115995042A (en) | Video SAR moving target detection method and device | |
CN113505712B (en) | Sea surface oil spill detection method of convolutional neural network based on quasi-balance loss function | |
CN113947723B (en) | High-resolution remote sensing scene target detection method based on size balance FCOS | |
CN112597875A (en) | Multi-branch network anti-missing detection aerial photography target detection method | |
CN110796716B (en) | Image coloring method based on multiple residual error network and regularized transfer learning | |
CN116363532A (en) | Unmanned aerial vehicle image traffic target detection method based on attention mechanism and re-parameterization |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant |