CN112818964A - Unmanned aerial vehicle detection method based on FoveaBox anchor-free neural network - Google Patents

Unmanned aerial vehicle detection method based on FoveaBox anchor-free neural network


Publication number
CN112818964A
Authority
CN
China
Prior art keywords
unmanned aerial
aerial vehicle
foveabox
target
neural network
Prior art date
Legal status
Pending
Application number
CN202110350008.0A
Other languages
Chinese (zh)
Inventor
屈景怡
刘闪亮
李云龙
吴仁彪
Current Assignee
Civil Aviation University of China
Original Assignee
Civil Aviation University of China
Priority date
Filing date
Publication date
Application filed by Civil Aviation University of China
Priority to CN202110350008.0A
Publication of CN112818964A

Classifications

    • G06V 20/13 Satellite images (under G06V 20/00 Scenes; G06V 20/10 Terrestrial scenes)
    • G06F 18/24 Classification techniques (under G06F 18/00 Pattern recognition; G06F 18/20 Analysing)
    • G06N 3/045 Combinations of networks (under G06N 3/04 Architecture, e.g. interconnection topology)
    • G06N 3/08 Learning methods (under G06N 3/02 Neural networks)
    • G06V 2201/07 Target detection (under G06V 2201/00 Indexing scheme relating to image or video recognition or understanding)

Abstract

The invention provides an unmanned aerial vehicle detection method based on a FoveaBox anchor-free neural network. First, initial parameters of the FoveaBox neural network model are set, and training set images from an unmanned aerial vehicle database are input into the configured FoveaBox neural network model for training, yielding a deep-learning-based unmanned aerial vehicle detection model. The unmanned aerial vehicle image to be detected is then input into the trained detection model to obtain multi-layer feature maps predicting the possibility of a target. The output feature maps of the backbone network are processed through the position sub-network and classified pixel by pixel in combination with the detection head sub-network, so that the target type and position information are detected directly. The method can automatically identify an unmanned aerial vehicle, mark it, and determine its position information; it has a wide application range and strong universality.

Description

Unmanned aerial vehicle detection method based on FoveaBox anchor-free neural network
Technical Field
The invention belongs to the technical field of unmanned aerial vehicle detection in video and image processing for critical scenes, and particularly relates to an unmanned aerial vehicle detection method based on a FoveaBox anchor-free neural network.
Background
With the development of unmanned aerial vehicle technology and falling costs, drones can carry ever larger payloads over ever longer flight distances; together with continuously improving living standards, this has made the drone market grow rapidly, with applications in aerial photography, surveying and mapping, delivery, rescue and many other fields. Along with this, incidents of unmanned aerial vehicles intruding into airport clearance areas are also rising rapidly: illegally operated drones enter no-fly or otherwise protected airspace, interfere with normal radio communication, and pose a great threat to people's production and daily life. In particular, frequent "black flight" (unauthorized) drones seriously threaten the safe operation of civil aircraft, causing huge economic losses to airports and extremely bad social effects. Therefore, in radio administration work, research on drone supervision and countermeasures receives increasing emphasis, and the research and application of drone detection and identification technology have become a scientific focus in response to the increasingly severe drone threat.
Domestic and foreign drone detection and countermeasure systems span the technical fields of detection, tracking and early warning, jamming, destruction and deception, implemented through signal jamming, radar detection, laser strike, combined techniques and the like. If a drone performs a task alone, or only a few operate together, conventional countermeasures can be adopted; but when a large formation of drones using swarm tactics arrives, the reaction time left to operators and systems is extremely short. Traditional air-defense weapon systems are the most common anti-drone weapons and can be deployed on air-based, sea-based and land-based platforms, but against micro drones these weapons suffer a severe cost asymmetry, and such systems are bulky and cannot resist the intrusion of swarms of small, cheap drones.
At present, for image target detection, deep neural network algorithms have developed and improved greatly, and they require large numbers of images to support model training. Drone images are easy to capture, and drone detection falls within the scope of target detection, so deep neural networks are expected to be applied in important drone-detection settings. However, the detection performance of anchor-based target detectors is strongly influenced by the size of the anchor boxes; since anchor box sizes are set in advance and fixed, small-target detection suffers; and the anchor box mechanism introduces complex computation.
Disclosure of Invention
In view of this, the invention aims to provide an unmanned aerial vehicle detection method based on a FoveaBox anchor-free neural network, so as to solve the technical problems of existing drone detection methods such as poor detection performance, large computational load and low detection rate.
In order to achieve the purpose, the technical scheme of the invention is realized as follows:
an unmanned aerial vehicle detection method based on a FoveaBox anchor-free neural network comprises the following steps:
s1, setting initial parameters of a FoveaBox neural network model, inputting training set images from an unmanned aerial vehicle database into the configured FoveaBox neural network model for training, and obtaining a deep-learning unmanned aerial vehicle detection model;
s2, inputting the unmanned aerial vehicle image to be detected into the trained unmanned aerial vehicle detection model in S1 to obtain a multilayer characteristic diagram;
s3, processing the output characteristic diagram of the backbone network through the position sub-network, performing box prediction on each position that may cover a target, and generating a bounding box corresponding to each target; and classifying the output characteristic diagram of the backbone network pixel by pixel in combination with the detection head sub-network, so as to obtain the confidence that the target belongs to the unmanned aerial vehicle class and directly detect the target type and position information.
S4, mapping each target boundary frame corresponding to the target feature map obtained in the S3 back to the original image, and then calculating the normalized offset between the projection coordinates and the real image by using a Smooth L1 loss function to obtain a plurality of target prediction frames;
and S5, deleting the non-optimal target frames in the plurality of target prediction frames obtained in the S4 to obtain the optimal target boundary frames, and storing and displaying the type and position information of the targets.
Further, the specific method for obtaining the multilayer characteristic diagram in S2 includes: and performing feature extraction on the unmanned aerial vehicle image through a model backbone network and a feature pyramid network to obtain a multilayer feature map.
Further, the specific method for extracting the features of the unmanned aerial vehicle image by the feature pyramid network comprises the following steps: the feature pyramid network obtains a multi-level pyramid feature set from a single-scale input by using a top-down architecture with lateral connections, and each pyramid level is used for detecting prediction targets of a different size, as illustrated by the sketch below.
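For illustration, the following is a minimal sketch of such top-down fusion with lateral connections, written in PyTorch-style Python; the channel widths, layer names and the use of nearest-neighbour upsampling are assumptions chosen for illustration, not details taken from the invention.

```python
# Minimal sketch of the top-down / lateral-connection fusion used by a
# feature pyramid network (illustrative; channel sizes are assumptions).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleFPN(nn.Module):
    def __init__(self, in_channels=(512, 1024, 2048), out_channels=256):
        super().__init__()
        # 1x1 lateral convolutions align backbone channels to a common width
        self.lateral = nn.ModuleList(
            nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels)
        # 3x3 output convolutions smooth the fused maps
        self.output = nn.ModuleList(
            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)
            for _ in in_channels)

    def forward(self, feats):
        # feats: backbone maps ordered from shallow (high-res) to deep (low-res)
        laterals = [l(f) for l, f in zip(self.lateral, feats)]
        # top-down pathway: upsample the deeper map, add it to the lateral map
        for i in range(len(laterals) - 1, 0, -1):
            laterals[i - 1] = laterals[i - 1] + F.interpolate(
                laterals[i], size=laterals[i - 1].shape[-2:], mode="nearest")
        return [conv(x) for conv, x in zip(self.output, laterals)]
```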
Further, in S5, the non-optimal target boxes among the plurality of target prediction boxes obtained in S4 are deleted by the non-maximum suppression method, yielding the optimal target bounding box.
Further, the specific method for training the FoveaBox neural network model in S1 is as follows:
s11, setting a FoveaBox parameter grid frame;
s12, preprocessing the input unmanned aerial vehicle images containing position information, first expanding the data set by a factor of 1.5 through image rotation, and then inputting the images into the FoveaBox neural network for training (see the sketch after this list);
and S13, processing the input image containing the target position information by the FoveaBox neural network to obtain a positive sample set and a negative sample set required by training, and performing model training to obtain the unmanned aerial vehicle target detection model.
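As a rough illustration of the preprocessing in S12, the sketch below expands a set of training images by a factor of 1.5 using random rotations; the rotation-angle range, the random selection policy, and the omission of bounding-box adjustment are all assumptions, since the text only specifies the 1.5-fold expansion rate.

```python
# Hedged sketch of the S12 data expansion: keep all originals and add
# rotated copies until the set is 1.5x its original size.
import random
from PIL import Image

def expand_by_rotation(image_paths, out_ratio=1.5, max_angle=15):
    images = [Image.open(p) for p in image_paths]      # originals are kept
    n_extra = int(len(images) * (out_ratio - 1.0))
    for img in random.sample(images, n_extra):
        angle = random.uniform(-max_angle, max_angle)  # assumed angle range
        images.append(img.rotate(angle, expand=True))
    # note: in detection, the box annotations must be rotated accordingly
    # (omitted here, as the patent does not describe that step)
    return images
```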
Further, the specific method for performing box prediction on each position covering a target in step S3 is as follows: a prediction branch is trained with Focal Loss, and the prediction loss $F_l$ is solved as:

$F_l = -\alpha (1 - p_f)^{\gamma} \log(p_f)$, (4)

where $p_f$ is the score of a certain pixel, i.e. the probability of predicting a certain class of target; the factor $\gamma$ makes the training process pay more attention to difficult, wrongly scored samples, and its value is 1.5; the balance factor $\alpha$ is used to balance the disproportion between the positive and negative samples themselves, where $\alpha$ is 0.4.
Compared with the prior art, the unmanned aerial vehicle detection method based on the FoveaBox anchorless neural network has the following advantages:
(1) the unmanned aerial vehicle detection method based on the FoveaBox anchor-free neural network can automatically identify an unmanned aerial vehicle, mark it and determine its position information; it is suitable for unmanned aerial vehicles of various types, brands and sizes, works in severe weather such as fog, rain and snow, and has a wide application range and strong universality.
(2) The unmanned aerial vehicle detection method based on the FoveaBox anchor-free neural network is high in detection precision, and can effectively avoid the situations of false detection and missed detection.
(3) The unmanned aerial vehicle detection method based on the FoveaBox anchor-free neural network is simple to run, avoids complex training before use, and is highly practical.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate an embodiment of the invention and, together with the description, serve to explain the invention and not to limit the invention. In the drawings:
fig. 1 is a flowchart of a method for detecting an unmanned aerial vehicle based on a FoveaBox anchor-free neural network according to an embodiment of the present invention;
FIG. 2 is a flow chart of an overall target detection implementation of the FoveaBox;
FIG. 3 is a target detection framework of a prior FoveaBox network model;
fig. 4 is a target detection framework structure of an improved FoveaBox network model according to an embodiment of the present invention;
FIG. 5 is the original FoveaBox network model in the training phase;
FIG. 6 illustrates a FoveaBox network model during a training phase according to an embodiment of the present invention;
fig. 7 is a detection result diagram obtained by detecting an image of an unmanned aerial vehicle by using an existing FoveaBox network model;
FIG. 8 is a graph comparing loss values of models before and after improvement in accordance with the present invention;
fig. 9 is 6 original images of the unmanned aerial vehicle to be detected according to the embodiment of the invention;
fig. 10 is a diagram of the detection result of the existing FoveaBox network model for the 6 images of the unmanned aerial vehicle input in fig. 9;
fig. 11 is a diagram illustrating a detection result of the improved FoveaBox network model on the 6 images of the drone input in fig. 9 according to the embodiment of the present invention.
Detailed Description
It should be noted that the embodiments and features of the embodiments may be combined with each other without conflict.
In the description of the present invention, it is to be understood that the terms "center", "longitudinal", "lateral", "up", "down", "front", "back", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", and the like, indicate orientations or positional relationships based on those shown in the drawings, and are used only for convenience in describing the present invention and for simplicity in description, and do not indicate or imply that the referenced devices or elements must have a particular orientation, be constructed and operated in a particular orientation, and thus, are not to be construed as limiting the present invention. Furthermore, the terms "first", "second", etc. are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first," "second," etc. may explicitly or implicitly include one or more of that feature. In the description of the present invention, "a plurality" means two or more unless otherwise specified.
In the description of the present invention, it should be noted that, unless otherwise explicitly specified or limited, the terms "mounted," "connected," and "connected" are to be construed broadly, e.g., as meaning either a fixed connection, a removable connection, or an integral connection; can be mechanically or electrically connected; they may be connected directly or indirectly through intervening media, or they may be interconnected between two elements. The specific meaning of the above terms in the present invention can be understood by those of ordinary skill in the art through specific situations.
The present invention will be described in detail below with reference to the embodiments with reference to the attached drawings.
Explanation of terms:
FoveaBox: an anchor-free deep neural network target detection method;
RetinaNet: a neural network architecture;
Smooth L1: a loss function;
Warmup: a learning rate optimization method; a small learning rate is selected at the beginning of model training, and after the model has been trained for a period of time, training continues with the preset learning rate;
Epoch: in network training, one traversal of all images of the training set is one Epoch;
Iteration: the number of training iteration steps;
Focal Loss: a loss function designed for imbalanced samples.
As shown in fig. 1, the method for detecting an unmanned aerial vehicle based on a FoveaBox anchor-free neural network includes the following steps: S1, setting initial parameters of the FoveaBox neural network model, inputting unmanned aerial vehicle images from the drone database into the configured FoveaBox neural network model for training, and obtaining a deep-learning unmanned aerial vehicle detection model;
wherein the anchor-free target detector FoveaBox (FoveaBox: Beyond Anchor-Based Object Detection) is one of the best-performing object detection frameworks on natural scene images; its detection precision reaches 35.1% on the natural-scene COCO test data set, and its inference speed under public parameters is 95 ms;
as shown in fig. 2, where II is the input image, CR is the central region, Cl is the drone classification, and BB is the bounding box prediction.
S2, inputting the unmanned aerial vehicle image to be detected into the unmanned aerial vehicle detection model trained in S1, performing feature extraction on the image through the model backbone network (namely the RetinaNet backbone network) and the Feature Pyramid Network (FPN), and then obtaining multi-layer feature maps;
the FoveaBox neural network model is a basic feedforward convolutional network structure RetinaNet backbone network, and is combined with a characteristic pyramid network to generate an image characteristic diagram, the position of a target on the characteristic diagram is obtained according to a classification sub-network and a position sub-network, and then the position information returned to the original image is corrected according to a Smooth L1 loss function, so that the final classification and accurate position information of the target is obtained. Initial parameters of the FoveaBox neural network model comprise the number of network layers, and weight values and bias values of neurons in all layers; the initial learning rate of the FoveaBox neural network model is 0.0025, a warp strategy is adopted in the initial 500-step iteration process, and the instability (oscillation) of the model is avoided, so that the convergence speed of the model is higher, and the model effect is better; decays at a rate of 0.1 times in the 8 th to 11 th epochs, with a maximum training Iteration of 35000.
According to the principle of deep convolution, when drone images are input into the convolutional neural network, multiple layers of feature maps of different sizes are output, and different feature maps play different roles in detecting targets of different sizes. In general, shallow convolutional features are more sensitive to edges, contain the detailed information in images and are more useful for detecting small targets, while deep convolutional features are more sensitive to complex patterns, contain more semantic information and are more useful for detecting large targets. Therefore, performing object detection simultaneously on different feature maps yields a better detection effect.
As shown in fig. 3 and 4, fig. 3 is the target detection framework of the conventional FoveaBox network model, where Sub-Cl is the classification sub-network, Sub-BB is the bounding box prediction sub-network, and OI is the output result; fig. 4 is the target detection framework of the improved FoveaBox network model of the present invention, where CF is a convolution fusion operation. Seven multi-level feature maps are extracted from the input drone images; different layers contain different information about the image: shallower feature maps carry the fine, detailed parts of the image, while deeper feature maps carry the overall feature information of the image.
In S1 the FoveaBox neural network model is trained, and a feature pyramid network is used as the backbone for subsequent detection. The FPN obtains a multi-level pyramid feature set from a single-scale input using a top-down architecture with lateral connections, and each pyramid level is used to detect prediction targets of a different size.
In the invention, a 7-level pyramid network structure is constructed; the levels used for training target detection are $\{P_l\}$, $l = 3, 4, 5, 6, 7$, where $l$ denotes the level of the feature pyramid, and the feature layer $P_l$ has $1/2^l$ of the resolution of the input image. After each level of pyramid features passes through the classification sub-network, $k$ channels of size $H \times W$ are output, where $k$ is the number of classes in the training set, $H$ is the height of the feature map and $W$ is its width; each channel is a binary mask indicating the probability that a pixel belongs to a certain class, and in the invention $k = 1$. Different pyramid levels correspond to targets of different sizes, and each feature pyramid level responds to targets of a particular size, which controls the scale range of each pyramid. Each level of the pyramid has a basic range $z$ relative to the original image; levels $P_3$ to $P_7$ have basic ranges $z$ from $32 \times 32$ to $512 \times 512$, respectively. The effective area $A$ of pyramid level $P_l$ is calculated as:

$A = [\,z/\eta,\ z \cdot \eta\,]$, (1)
where $\eta$ controls the scale range of each level in the pyramid features. During training, target objects that are not within the corresponding size range are automatically ignored; in addition, one object may be detected simultaneously through multiple pyramid levels, as the sketch below illustrates.
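The level assignment of equation (1) can be sketched as follows; the basic ranges $32^2$ to $512^2$ for $P_3$ to $P_7$ come from the text above, while the value of $\eta$ is an assumption.

```python
# Sketch of equation (1): level P_l responds only to targets whose area
# falls inside [z/eta, z*eta], with z the level's basic range.
def valid_levels(box_area, eta=2.0):
    # basic range z: 32^2 for P3 up to 512^2 for P7
    basic_ranges = {l: (2 ** (l + 2)) ** 2 for l in range(3, 8)}
    levels = []
    for l, z in basic_ranges.items():
        low, high = z / eta, z * eta
        if low <= box_area <= high:
            levels.append(l)   # one object may match several levels
    return levels
```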
The method for training the FoveaBox neural network model comprises the following steps: 1) preprocessing the input unmanned aerial vehicle images containing position information, first expanding the data by a factor of 1.5 through image rotation, and then inputting the images into the FoveaBox neural network for training; 2) the FoveaBox neural network processes the input image containing the position information of the target. Let the position of a target in the image be $G(x_1, y_1, x_2, y_2)$, and map its coordinates to the target feature pyramid level $P_l$ with stride $s_l = 2^l$; the mapped coordinates are given by formula (2):

$x_1' = x_1 / s_l,\quad y_1' = y_1 / s_l,\quad x_2' = x_2 / s_l,\quad y_2' = y_2 / s_l,\quad c_x' = 0.5\,(x_1' + x_2'),\quad c_y' = 0.5\,(y_1' + y_2'),\quad w' = x_2' - x_1',\quad h' = y_2' - y_1'$, (2)

where $(x_1, y_1, x_2, y_2)$ are the position coordinates of the upper-left and lower-right corners of the target box on the original image, $(x_1', y_1', x_2', y_2')$ are the position coordinates of the upper-left and lower-right corners of the target box on the feature map, $(c_x', c_y')$ is the corresponding target center position, $(w', h')$ are the width and height of the target box on the feature map, and $s_l$ is the stride factor in the convolution process. The positive sample region $R_{pos}(x_1'', y_1'', x_2'', y_2'')$ on the score map is defined as a reduced version of the original region, expressed as formula (3):

$x_1'' = c_x' - 0.5\,\sigma_1 w',\quad y_1'' = c_y' - 0.5\,\sigma_1 h',\quad x_2'' = c_x' + 0.5\,\sigma_1 w',\quad y_2'' = c_y' + 0.5\,\sigma_1 h'$, (3)

where $\sigma_1$ is the shrinkage factor, and the negative samples are all the target areas except the positive samples. The invention introduces a second shrinkage factor $\sigma_2$ with $\sigma_1 < \sigma_2 < 1$, adjusts the size of the mapped truth box again according to formula (3) (with $\sigma_2$ in place of $\sigma_1$), takes the region 3 between the boundary and the truth box as negative samples, and ignores the area between the two borders scaled by $\sigma_1$ and $\sigma_2$. Fig. 5 shows that in the training phase of the original FoveaBox network model, the network directly selects positive and negative samples against the background of the truth box; in the figure, the central region 1 is the positive sample domain (P) and the outer region 3 is the negative sample domain (N). Fig. 6 shows that in the training phase of the FoveaBox network model of the present invention, the central region 1 is the positive sample domain (P), the surrounding region 2 is the ignore domain (I), and the outermost region 3 is the negative sample domain (N). In the training phase, as shown in fig. 6, each pixel in the positive region is labeled with the corresponding target class label, and the negative sample region refers to the feature map region corresponding to the entire target except the positive samples. After these operations, the unmanned aerial vehicle target detection model is obtained. The mapping and region construction are sketched below.
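A minimal sketch of the mapping and shrinking in equations (2) and (3) follows; the particular $\sigma_1$ and $\sigma_2$ values are assumptions chosen only to satisfy $\sigma_1 < \sigma_2 < 1$.

```python
# Sketch of equations (2)-(3): project a ground-truth box onto feature
# level l (stride s_l = 2^l), then shrink it by sigma1 to get the positive
# (fovea) region; the ring between the sigma1- and sigma2-scaled boxes is
# ignored during training.
def fovea_regions(box, level, sigma1=0.3, sigma2=0.4):
    x1, y1, x2, y2 = box                    # coordinates on the original image
    s = 2 ** level                          # stride of pyramid level P_l
    x1p, y1p, x2p, y2p = x1 / s, y1 / s, x2 / s, y2 / s   # eq. (2)
    cx, cy = 0.5 * (x1p + x2p), 0.5 * (y1p + y2p)
    w, h = x2p - x1p, y2p - y1p

    def shrink(sigma):                      # eq. (3): scaled version of the box
        return (cx - 0.5 * sigma * w, cy - 0.5 * sigma * h,
                cx + 0.5 * sigma * w, cy + 0.5 * sigma * h)

    positive = shrink(sigma1)               # pixels labelled with the class
    ignore_outer = shrink(sigma2)           # ring between the two: ignored
    return positive, ignore_outer           # everything else: negative
```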
S3, classifying the output feature maps of the backbone pixel by pixel through the detection head sub-network to obtain class-sensitive semantic maps predicting the possibility that an object exists, thereby obtaining the confidence that an object belongs to a certain class; and, in combination with the position sub-network, generating for each position that may contain an object the bounding box corresponding to that object, thereby directly detecting the type and position information of objects on the feature map;
in the FoveaBox neural network model, in the prediction stage, the positive sample area only occupies a small part of the whole characteristic diagram, the negative sample area occupies a larger proportion, and in order to solve the problem that the positive and negative samples are not uniform in the training process, namely to obtain a better result in the prediction, the Focal local is adopted to train the prediction branch, and F is lostlIs expressed by equation (4):
Fl=-α(1-pf)γlog(pf), (4)
in the formula (4), pfFor the score of a certain pixel being predicted as a certain class of target, i.e. the probability of being predicted as a certain class, the balance factor γ is to pay more attention to the difficult, wrongly-scored sample in the training process, and the value of γ in the present invention is 1.5; the balance factor α is used to balance the ratio unevenness of the positive and negative samples themselves, and in the present invention, α has a value of 0.4.
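Equation (4) transcribes directly into code; the sketch below uses the $\gamma = 1.5$ and $\alpha = 0.4$ values given in the text, while the numerical clamp is an added implementation detail, and a full detector would also weight the negative-sample term, which equation (4) does not show.

```python
# Direct transcription of equation (4) for positive samples.
import torch

def focal_loss(p_f, gamma=1.5, alpha=0.4, eps=1e-8):
    # p_f: predicted probability that a pixel belongs to the target class
    p_f = p_f.clamp(min=eps, max=1.0 - eps)   # numerical stability (assumed)
    return -alpha * (1.0 - p_f) ** gamma * torch.log(p_f)
```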
In the prediction stage of the FoveaBox neural network model, the classification sub-network predicts the probability that each pixel on the feature map belongs to the target, while the corresponding target bounding box is generated from the target position obtained by the position sub-network.
And S4, mapping each target bounding box corresponding to the target feature map obtained in S3 back to the original image, and then calculating the normalized offset between the projection coordinates and the real image using the Smooth L1 loss function to obtain a plurality of target prediction boxes.
For each real bounding box $G(x_1, y_1, x_2, y_2)$, starting from a single pixel point $(x, y)$ in the positive sample region $R_{pos}$, the normalized offsets between $(x, y)$ and the four boundaries are calculated as shown in formula (5):

$t_{x_1} = \log\dfrac{s_l (x + 0.5) - x_1}{z},\quad t_{y_1} = \log\dfrac{s_l (y + 0.5) - y_1}{z},\quad t_{x_2} = \log\dfrac{x_2 - s_l (x + 0.5)}{z},\quad t_{y_2} = \log\dfrac{y_2 - s_l (y + 0.5)}{z}$, (5)

where $z$ is a spatial normalization factor that projects the output space around 1, enabling stable learning of the target; the function first maps the coordinates $(x, y)$ to the input image, then calculates the normalized offset between the projected coordinates and $G$, and finally normalizes the target with the log-space function. In the invention, the predicted bounding-box network corrects the position information mapped back to the original image through the Smooth L1 loss function; the box loss function $S_l$ is calculated as shown in formula (6):

$S_l = \begin{cases} 0.5\,p_b^2 / \beta, & |p_b| < \beta \\ |p_b| - 0.5\,\beta, & |p_b| \ge \beta \end{cases}$, (6)

where $p_b$ represents the regression deviation after a certain target position is mapped back to the original image, and $\beta$ is an influence-range factor. When $|p_b| < \beta$, the loss function is computed as a mean square error (L2 loss), which converges faster; otherwise the mean absolute error (L1 loss) is used, which makes the loss value insensitive to outliers and abnormal values, keeps the gradient change relatively small, and makes the model less likely to deviate from the optimum in the training stage, yielding a plurality of target bounding boxes with small deviations.
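The following sketch combines equations (5) and (6): computing the log-space normalized offsets for one cell and scoring the residuals with a Smooth L1 loss. The values of $z$ and $\beta$ are assumptions, as the text does not fix them.

```python
# Sketch of eq. (5): normalized log-space box offsets for a cell (x, y) on
# level l, and eq. (6): Smooth L1 (L2 inside |p_b| < beta, L1 outside).
import torch

def box_targets(x, y, gt_box, level, z=4.0):
    x1, y1, x2, y2 = gt_box
    s = 2 ** level                          # stride of pyramid level P_l
    cx, cy = s * (x + 0.5), s * (y + 0.5)   # cell center projected to image
    # (x, y) is assumed to lie inside the box, so all offsets are positive
    offsets = torch.tensor([(cx - x1) / z, (cy - y1) / z,
                            (x2 - cx) / z, (y2 - cy) / z])
    return torch.log(offsets)               # log-space normalization, eq. (5)

def smooth_l1(p_b, beta=0.11):              # eq. (6), p_b: residual tensor
    abs_pb = p_b.abs()
    return torch.where(abs_pb < beta,
                       0.5 * p_b ** 2 / beta,   # L2 branch: faster convergence
                       abs_pb - 0.5 * beta)     # L1 branch: robust to outliers
```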
And S5, deleting the non-optimal target boxes among the plurality of target prediction boxes obtained in S4 by using the Non-Maximum Suppression (NMS) method to obtain the optimal target bounding box, and storing and displaying the type and position information of the target.
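A plain sketch of the non-maximum suppression in S5 follows; the IoU threshold of 0.5 is an assumption.

```python
# Minimal NMS sketch: keep the highest-scoring box, drop every remaining
# box whose IoU with it exceeds the threshold, and repeat.
def nms(boxes, scores, iou_thresh=0.5):
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) <= iou_thresh]
    return keep

def iou(a, b):
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((ax2 - ax1) * (ay2 - ay1)
             + (bx2 - bx1) * (by2 - by1) - inter)
    return inter / union if union > 0 else 0.0
```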
The unmanned aerial vehicle database covers drone images from a variety of real-life scenes.
The database used in the invention consists of unmanned aerial vehicle images of different formats and sizes collected from the Internet; the shortest image side is 120 pixels and the longest is 1200 pixels. Fig. 7 shows a detection result obtained by detecting a drone image with the existing FoveaBox network model.
The results of the present application are shown in fig. 8, fig. 9 and fig. 10. Fig. 8 compares the loss values of the models before and after the improvement of the invention, in which "original model" is the loss curve of the existing FoveaBox network model and "present model" is the loss curve of the improved FoveaBox network model of the invention. Fig. 9 shows 6 original images of unmanned aerial vehicles to be detected, where the first row contains drone pictures in good weather and the second row contains drone pictures in simulated heavy fog. Fig. 10 shows the detection results of the existing FoveaBox network model on the 6 drone images of fig. 9. Fig. 11 shows the detection results of the improved FoveaBox network model of the invention on the same 6 images.
In summary, initial parameters of the FoveaBox neural network model are first set, and training set images from the drone database are input into the configured model for training, yielding a deep-learning-based drone detection model; the drone image to be detected is input into this trained detection model, and feature extraction is performed through the model backbone network and feature pyramid network to obtain multi-layer feature maps predicting the possibility of a target; the output feature maps of the backbone network are processed through the position sub-network, box prediction is performed on each position that may cover a target, and a bounding box is generated for each target; and the output feature maps are classified pixel by pixel in combination with the detection head sub-network, giving the confidence that a target is a drone and directly detecting the target type and position information. The method adapts well to various types of drones, retains good detection performance in severe weather such as fog, rain and snow, and so enlarges the application range; its detection speed is high, effectively solving the problem of low manual detection efficiency in specific settings; and the model is simple to operate and easy to train.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (6)

1. An unmanned aerial vehicle detection method based on a FoveaBox anchor-free neural network is characterized by comprising the following steps: the method comprises the following steps:
s1, setting initial parameters of a FoveaBox neural network model, inputting training set images from an unmanned aerial vehicle database into the configured FoveaBox neural network model for training, and obtaining a deep-learning unmanned aerial vehicle detection model;
s2, inputting the unmanned aerial vehicle image to be detected into the trained unmanned aerial vehicle detection model in S1 to obtain a multilayer characteristic diagram;
s3, processing the output characteristic diagram of the backbone network through the position sub-network, performing box prediction on each position that may cover a target, and generating a bounding box corresponding to each target; and classifying the output characteristic diagram of the backbone network pixel by pixel in combination with the detection head sub-network, so as to obtain the confidence that the target belongs to the unmanned aerial vehicle class and directly detect the target type and position information.
S4, mapping each target boundary frame corresponding to the target feature map obtained in the S3 back to the original image, and then calculating the normalized offset between the projection coordinates and the real image by using a Smooth L1 loss function to obtain a plurality of target prediction frames;
and S5, deleting the non-optimal target frames in the plurality of target prediction frames obtained in the S4 to obtain the optimal target boundary frames, and storing and displaying the type and position information of the targets.
2. The method for detecting the unmanned aerial vehicle based on the FoveaBox anchorless neural network as claimed in claim 1, wherein: the specific method for obtaining the multilayer characteristic diagram in the step S2 is as follows: and performing feature extraction on the unmanned aerial vehicle image through a model backbone network and a feature pyramid network to obtain a multilayer feature map.
3. The method for detecting the unmanned aerial vehicle based on the FoveaBox anchorless neural network as claimed in claim 2, wherein: the specific method for extracting the features of the unmanned aerial vehicle image by the feature pyramid network comprises the following steps: the feature pyramid network obtains a multi-level pyramid feature set from a single-scale input by using a top-down architecture and a transverse connection mode, and the pyramid of each level is used for detecting prediction targets of different sizes.
4. The method for detecting the unmanned aerial vehicle based on the FoveaBox anchorless neural network as claimed in claim 1, wherein: in S5, the non-optimal target boxes among the plurality of target prediction boxes obtained in S4 are deleted by the non-maximum suppression method, yielding the optimal target bounding box.
5. The method for detecting the unmanned aerial vehicle based on the FoveaBox anchorless neural network as claimed in claim 1, wherein: the specific method for training the FoveaBox neural network model in the S1 is as follows:
s11, setting a FoveaBox parameter grid frame;
s12, preprocessing the input unmanned aerial vehicle images containing position information, first expanding the data set by a factor of 1.5 through image rotation, and then inputting the images into the FoveaBox neural network for training;
and S13, processing the input image containing the target position information by the FoveaBox neural network to obtain a positive sample set and a negative sample set required by training, and performing model training to obtain the unmanned aerial vehicle target detection model.
6. The method for detecting the unmanned aerial vehicle based on the FoveaBox anchorless neural network as claimed in claim 1, wherein: the specific method for performing box prediction on each position covering a target in step S3 is as follows: a prediction branch is trained with Focal Loss, and the prediction loss $F_l$ is solved as:

$F_l = -\alpha (1 - p_f)^{\gamma} \log(p_f)$, (4)

where $p_f$ is the score with which a pixel is predicted as a certain class of target, i.e. the probability of predicting that class; the factor $\gamma$ makes the training process pay more attention to difficult, wrongly scored samples, where $\gamma$ has a value of 1.5; and the balance factor $\alpha$ is used to balance the disproportion between the positive and negative samples themselves, where $\alpha$ is 0.4.
CN202110350008.0A 2021-03-31 2021-03-31 Unmanned aerial vehicle detection method based on FoveaBox anchor-free neural network Pending CN112818964A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110350008.0A CN112818964A (en) 2021-03-31 2021-03-31 Unmanned aerial vehicle detection method based on FoveaBox anchor-free neural network


Publications (1)

Publication Number Publication Date
CN112818964A 2021-05-18

Family

ID=75862380

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110350008.0A Pending CN112818964A (en) 2021-03-31 2021-03-31 Unmanned aerial vehicle detection method based on FoveaBox anchor-free neural network

Country Status (1)

Country Link
CN (1) CN112818964A (en)



Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109613006A (en) * 2018-12-22 2019-04-12 中原工学院 A kind of fabric defect detection method based on end-to-end neural network
CN110675408A (en) * 2019-09-19 2020-01-10 成都数之联科技有限公司 High-resolution image building extraction method and system based on deep learning
CN111160108A (en) * 2019-12-06 2020-05-15 华侨大学 Anchor-free face detection method and system
CN111126472A (en) * 2019-12-18 2020-05-08 南京信息工程大学 Improved target detection method based on SSD
CN111178451A (en) * 2020-01-02 2020-05-19 中国民航大学 License plate detection method based on YOLOv3 network
CN111401361A (en) * 2020-03-06 2020-07-10 南京理工大学 End-to-end lightweight deep license plate recognition method
CN112446327A (en) * 2020-11-27 2021-03-05 中国地质大学(武汉) Remote sensing image target detection method based on non-anchor frame

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
ARIEF RACHMAN SUTANTO等: "A Novel Diminish Smooth L1 Loss Model with Generative Adversarial Network", 《SPRINGER NATURE SWITZERLAND》 *
HONGQUAN QU等: "Intensive Pedestrian Detection Algorithm Based on Key Points", 《2020 3RD INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND BIG DATA》 *
TAO KONG等: "FoveaBox: Beyound Anchor-Based Object Detection", 《IEEE TRANSACTIONS ON IMAGE PROCESSING》 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115511779A (en) * 2022-07-20 2022-12-23 北京百度网讯科技有限公司 Image detection method, device, electronic equipment and storage medium
CN115511779B (en) * 2022-07-20 2024-02-20 北京百度网讯科技有限公司 Image detection method, device, electronic equipment and storage medium
CN115410012A (en) * 2022-11-02 2022-11-29 中国民航大学 Method and system for detecting infrared small target in night airport clear airspace and application
CN115410012B (en) * 2022-11-02 2023-02-28 中国民航大学 Method and system for detecting infrared small target in night airport clear airspace and application

Similar Documents

Publication Publication Date Title
US20220197281A1 (en) Intelligent decision-making method and system for unmanned surface vehicle
CN113359810B (en) Unmanned aerial vehicle landing area identification method based on multiple sensors
WO2021142902A1 (en) Danet-based unmanned aerial vehicle coastline floating garbage inspection system
CN110866887A (en) Target situation fusion sensing method and system based on multiple sensors
CN109255286B (en) Unmanned aerial vehicle optical rapid detection and identification method based on deep learning network framework
CN107735794A (en) Use the condition detection of image procossing
CN107835997A (en) Use the vegetation management for being used for power line corridor and monitoring of computer vision
CN111723654A (en) High-altitude parabolic detection method and device based on background modeling, YOLOv3 and self-optimization
CN112634325B (en) Unmanned aerial vehicle video multi-target tracking method
CN109613934A (en) A kind of method that unmanned plane captures black winged unmanned plane
CN113223059A (en) Weak and small airspace target detection method based on super-resolution feature enhancement
CN112818964A (en) Unmanned aerial vehicle detection method based on FoveaBox anchor-free neural network
CN114266891A (en) Railway operation environment abnormity identification method based on image and laser data fusion
CN111178283A (en) Unmanned aerial vehicle image-based ground object identification and positioning method for established route
CN111414807A (en) Tidal water identification and crisis early warning method based on YO L O technology
CN114596340A (en) Multi-target tracking method and system for monitoring video
CN115272876A (en) Remote sensing image ship target detection method based on deep learning
CN114581307A (en) Multi-image stitching method, system, device and medium for target tracking identification
CN114419444A (en) Lightweight high-resolution bird group identification method based on deep learning network
Cheng et al. Moving Target Detection Technology Based on UAV Vision
CN114463624A (en) Method and device for detecting illegal buildings applied to city management supervision
CN105551017A (en) Transmission line forest fire target extraction method on the basis of spatio-temporal union
CN115331127A (en) Unmanned aerial vehicle moving target detection method based on attention mechanism
CN112818837A (en) Aerial photography vehicle weight recognition method based on attitude correction and difficult sample perception
Xing et al. Computationally efficient RGB-T UAV detection and tracking system

Legal Events

PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20210518