CN115205264A - High-resolution remote sensing ship detection method based on improved YOLOv4 - Google Patents

High-resolution remote sensing ship detection method based on improved YOLOv4 Download PDF

Info

Publication number
CN115205264A
CN115205264A (application CN202210860051.6A)
Authority
CN
China
Prior art keywords
module
improved
yolov4
weight
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210860051.6A
Other languages
Chinese (zh)
Inventor
许鑫
陈巍
陈伟
贺晨煜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Institute of Technology
Original Assignee
Nanjing Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Institute of Technology filed Critical Nanjing Institute of Technology
Priority to CN202210860051.6A
Publication of CN115205264A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/0002 - Inspection of images, e.g. flaw detection
    • G06T 7/0004 - Industrial image inspection
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/20 - Image preprocessing
    • G06V 10/25 - Determination of region of interest [ROI] or a volume of interest [VOI]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 - Scenes; Scene-specific elements
    • G06V 20/10 - Terrestrial scenes
    • G06V 20/17 - Terrestrial scenes taken from planes or by drones
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 - Special algorithmic details
    • G06T 2207/20081 - Training; Learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 - Special algorithmic details
    • G06T 2207/20092 - Interactive image processing based on input by user
    • G06T 2207/20104 - Interactive definition of region of interest [ROI]


Abstract

A high-resolution remote sensing ship detection method based on improved YOLOv4 comprises the steps of collecting an original data set and establishing a target data set; adding an atrous spatial pyramid pooling (ASPP) module and a CPA module to an improved YOLOv4 network; on the basis of the existing YOLOv4 method, the SPP module is replaced by the ASPP module, and detection of small targets is improved by adding multi-scale feature fusion; a CPA module is added to the improved feature fusion network to improve the effectiveness of feature extraction; in the loss calculation stage, the XIoU loss function replaces the CIoU loss function, addressing the poor positioning precision for ships with a large aspect ratio and the long training time in network training. The detection precision of the method is tested and verified on a large number of images collected by actual aerial photography. Experiments show that the method achieves automatic real-time detection of sea surface ships, improves detection precision and efficiency, and obtains a better detection effect than traditional target detection techniques.

Description

High-resolution remote sensing ship detection method based on improved YOLOv4
Technical Field
The invention relates to the technical field of ship detection, in particular to a high-resolution remote sensing ship detection method based on improved YOLOv4.
Background
Optical remote sensing images are characterized by high spatial resolution, rich image content and small geometric deformation, and are of great value in many research fields (such as resource survey, military reconnaissance and ocean research). Ship target detection based on high-resolution remote sensing images is a hotspot and a difficult problem in the field of machine vision; its main task is to determine the exact coordinate position of the ship target in the optical remote sensing image. It is of great significance in civil, commercial and military applications: it can make an outstanding contribution to sea supervision in coastal areas, and it also bears on the national economy and territorial security. Therefore, research on how to detect ship targets rapidly and accurately is of great importance.
With the continuous development of remote sensing, geographic information systems and related technologies in recent years, traditional ship detection performed manually can no longer meet task requirements. The ship detection methods reported or put into use in the prior art can be roughly divided into two types: traditional image processing methods and target detection algorithms based on deep learning. The former mostly rely on filtering, feature extraction, threshold segmentation, edge detection and the like. However, these methods are only suitable for scenes with simple weather conditions and a calm sea surface; in complex scenes they are susceptible to illumination intensity and sea surface glints, which reduces detection accuracy. Their application scenarios are therefore limited and they lack universality. The latter are represented by the convolutional neural network, which can learn richer semantic information and high-level image feature representations from massive training data, offering stronger robustness and more efficient detection. Many researchers have achieved remarkable performance by applying deep learning techniques to marine vessel target detection. At present, the network models adopted for ship detection tasks fall mainly into two categories: one is the region-based two-stage algorithm, which first generates target candidate boxes and then classifies and regresses them; the other is the one-stage algorithm, which uses a regression idea to directly predict the categories and positions of different targets.
However, traditional networks have shallow layers, cannot sufficiently extract the texture features of ship targets in high-resolution remote sensing images, and perform poorly on small ship targets. Moreover, because ships have a large aspect ratio, the position loss function in traditional deep learning cannot obtain accurate positioning information, resulting in positioning deviation.
Therefore, it is urgently needed to provide a method for improving the accuracy and robustness of the ship target detection of the high-resolution remote sensing image.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a high-resolution remote sensing ship detection method based on improved YOLOv4; the method can quickly and accurately detect and identify ship targets in high-resolution remote sensing images.
In order to achieve the purpose, the invention adopts the following technical scheme:
a high-resolution remote sensing ship detection method based on improved YOLOv4 comprises the following steps:
s1: acquiring an original data set of a sea surface ship unmanned aerial vehicle aerial image, and establishing a target data set comprising a training set and a verification set;
s2: constructing an improved YOLOv4 network;
s3: designing an optimization loss function;
s4: training and verifying the improved YOLOv4 network in the step S2 by combining the optimization loss function designed in the step S3 and the training set and the verification set in the step S1; and then the ship target is detected and identified by the remote sensing image.
In order to optimize the technical scheme, the specific measures adopted further comprise:
further, the specific content of step S1 is:
s1.1: acquiring ship information on the sea surface by a sea surface ship unmanned aerial vehicle to obtain an original data set;
s1.2: marking a ship target in the original data set to obtain a target picture;
s1.3: preprocessing a target picture, comprising: cutting and zooming the target picture, wherein the zooming method specifically comprises the following steps: setting the maximum side length size as x, and the width and height of the current target picture as w and h respectively, then the width wnew = w min (x/w, x/h) of the zoomed target picture, and the height hnew = h min (x/w, x/h) of the zoomed target picture;
s1.4: establishing a target data set according to the cut and zoomed target picture, and dividing the target data set into a training set and a verification set; wherein the number ratio of the training set to the validation set is 4.
Further, the specific content of step S2 is:
s2.1: designing CPA modules
The CPA module comprises a channel attention module and a pixel attention module; and the calculation formula of the CPA module is as follows:
P′ = M_c(P) ⊗ P
P″ = M_s(P′) ⊗ P′
in the formulas, M_c(P) is the feature weight output by the channel attention module, P is the input feature map, P′ is the feature map output by the channel attention module, M_s(P′) is the feature weight output by the pixel attention module, P″ is the feature map output by the CPA module, and ⊗ denotes element-wise multiplication;
the design of the channel attention module is as follows: for an input feature map P of size H × W × C, global average pooling is first applied to obtain a 1 × 1 × C feature map carrying global semantic information; two 1 × 1 convolutions then freely reduce and restore the channel dimension to extract depth channel feature information and yield the channel feature weights; finally, the weights are multiplied with the feature values to obtain a new weighted feature map P′;
the design of the pixel attention module is as follows: three 1 × 1 convolution layers are applied to the weighted feature map P′ produced by the channel attention module to generate the matrices Query, Key and Value; a dot-product (Dot) operation and a Softmax function are then applied to Query and Key to obtain a weight of size (H × W) × (H × W), and the dot product of this weight with Value yields the pixel-attended weighted feature map P″;
s2.2: designing ASPP modules
The ASPP module comprises three 3 × 3 dilated (atrous) convolution layers with dilation rates of 6, 12 and 18 respectively, a 1 × 1 convolution layer and a global average pooling layer; these generate 5 multi-scale features, which are concatenated and fused along the channel dimension, and a final 1 × 1 convolution layer adjusts the number of channels;
s2.3: obtaining an improved YOLOv4 network structure
Based on the existing YOLOv4 network structure, the CPA module designed in step S2.1 is added before the down-sampling stages; the CPA module redistributes the channel feature weights and pixel feature weights of the feature map, improving detection precision by increasing the weights of feature regions of interest and reducing background interference by decreasing the weights of irrelevant feature regions; meanwhile, on the basis of the existing YOLOv4 network structure, the SPP module in the network is replaced by the ASPP module designed in step S2.2; the improved YOLOv4 network structure is thereby obtained.
Further, the specific content of step S3 is: replacing the CIoU loss function of the existing YOLOv4 network structure with the optimized XIoU loss function; the optimized XIoU loss function L_XIoU is specifically:
L_XIoU = 1 - IoU + dIoU
dIoU = (AA′ + BB′ + CC′ + DD′) / (4c)
in the formula, IoU is expressed as IoU = |BBox_t ∩ BBox_p| / |BBox_t ∪ BBox_p|, where BBox_t is the real (ground-truth) box of the target picture, BBox_p is the prediction box of the target picture, and IoU is the ratio of the intersection to the union of the real box and the prediction box; AA′, BB′, CC′ and DD′ denote the distances between the upper-left, upper-right, lower-left and lower-right corner points of the real box and the corresponding corner points of the prediction box, and c denotes the diagonal length of the smallest rectangle enclosing the real box and the prediction box; the dIoU term means that, with 4 corner points, boxes of the same shape and aspect ratio can be fitted during training.
The invention has the beneficial effects that:
1. When sea surface ship detection is carried out, backgrounds such as the ocean and ports occupy a large proportion of the whole image, and ships have a large aspect ratio. To suppress background interference and effectively acquire foreground information, a CPA module is embedded in the Neck to improve the visual representation. The CPA module designed in this application redistributes the channel feature weights and pixel feature weights of the feature map: it improves detection precision by increasing the weights of feature regions of interest, reduces background interference by decreasing the weights of irrelevant feature regions, selectively extracts features according to the aspect ratio of the object, and finely allocates and processes information.
2. Because ship targets are small in the data set images, Atrous Spatial Pyramid Pooling (ASPP) is introduced to replace the original SPP module in order to improve small-target detection and strengthen the semantic relation between local and global information. Compared with SPP, ASPP fuses more scale features and effectively improves the detection of small targets.
3. The method optimizes the loss function of YOLOv4, replacing the CIoU loss function with the XIoU loss function, which effectively reduces computation and greatly improves the positioning precision for ship targets with a large aspect ratio.
Drawings
Fig. 1 is a schematic diagram of the existing YOLOv4 network structure of the present invention.
Fig. 2 is a schematic structural diagram of a CPA module designed by the present invention.
FIG. 3 is a schematic diagram of a channel attention module structure designed by the present invention.
FIG. 4 is a schematic diagram of a pixel attention module according to the present invention.
Fig. 5 is a schematic structural diagram of an ASPP module designed by the present invention.
Fig. 6 is a schematic diagram of the improved YOLOv4 network structure of the present invention.
FIG. 7 is a schematic diagram of the CDIoU and CIoU loss functions of the present invention.
Fig. 8 is a schematic diagram comparing the detection effect of the traditional YOLOv4 network and the improved YOLOv4 network of the present application.
Detailed Description
The present invention will now be described in further detail with reference to the accompanying drawings.
The YOLO series is one of the typical one-stage algorithm families and is currently the most widely applied. Redmon et al. proposed the YOLOv1 algorithm in 2016, combining candidate-box generation and classification regression into a single network, which greatly reduced computational complexity but localized targets poorly. After several iterations of improvement that made up for these shortcomings, the current YOLOv4 achieves high-precision detection while keeping the speed advantage. The YOLOv4 network structure consists mainly of 4 parts: the input terminal, the Backbone network, the Neck and the Head. The specific network structure is shown in Fig. 1.
As shown in Fig. 1, YOLOv4 performs feature extraction on the input data using a new backbone network, CSPDarknet53. CSPDarknet53 is a modified version of Darknet53 that integrates a Cross Stage Partial Network (CSPNet). The CSP module reduces computation while maintaining precision, achieving a good balance of speed and accuracy. In the target detection stage, a Neck structure is usually inserted after the Backbone for better feature extraction. In YOLOv4, the Neck part includes both the SPP and FPN + PAN structures. The SPP module removes the restriction on the input size of the convolutional neural network and performs multi-scale feature fusion. The FPN + PAN structure fuses the FPN (feature pyramid network) and PAN structures: the top-down FPN extracts rich semantic features, while the bottom-up PAN makes full use of shallow features and extracts rich location features. The Neck further improves feature extraction capability, performs feature fusion and improves detection precision. Finally, YOLOv4 outputs three feature maps of different scales to detect targets of different sizes, performs non-maximum suppression (NMS), and adjusts the prior boxes to obtain the final result.
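The NMS post-processing step mentioned above can be illustrated with a minimal greedy sketch in Python (the function names and the 0.45 IoU threshold are illustrative defaults, not values from the patent):

```python
def iou(a, b):
    # Intersection-over-union of two boxes given as (x1, y1, x2, y2).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_thr=0.45):
    # Greedy non-maximum suppression: keep the highest-scoring box,
    # drop every remaining box overlapping it above the threshold, repeat.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [j for j in order if iou(boxes[best], boxes[j]) < iou_thr]
    return keep
```

In practice this runs per class on the three output scales; frameworks such as torchvision provide an equivalent built-in.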
The application improves the YOLOv4 algorithm
1. Network architecture optimization
1.1 CPA Module
When sea surface ship detection is carried out, backgrounds such as the ocean and ports occupy a large proportion of the whole image, and ships have a large aspect ratio. To suppress background interference and effectively acquire foreground information, a CPA module is embedded in the Neck to improve the visual representation. As shown in Fig. 2, the CPA module is a lightweight and efficient attention module composed of two sub-modules, a Channel-wise Attention module and a Pixel-wise Attention module; its mathematical expression is shown in formula 1, where ⊗ represents element-wise multiplication, P is the input feature map, M_c(P) is the feature weight output by the channel attention module, P′ is the feature map output by the channel attention module, M_s(P′) is the feature weight output by the pixel attention module, and P″ is the feature map output by the CPA module.
P′ = M_c(P) ⊗ P, P″ = M_s(P′) ⊗ P′ (1)
The input feature map first passes through the channel attention module, whose specific structure is shown in Fig. 3. Assuming the input feature map P has size H × W × C, Global Average Pooling (GAP) is first applied to P to obtain a 1 × 1 × C feature map carrying global semantic information, and two 1 × 1 convolution layers realize free reduction and restoration of the channel dimension to obtain depth channel feature information (reduction ratio k = 8), yielding the channel feature weights. Finally, a new weighted feature map P′ is obtained by multiplying the weights with the feature values.
The specific structure of the pixel attention module is shown in Fig. 4; the invention applies dot-product attention to P′. Three 1 × 1 convolution layers generate the matrices Query, Key and Value (abbreviated Q, K and V), with the number of channels of Q and K reduced to 1/4 of the original (reduction ratio k = 4). A dot product (Dot) operation and a Softmax function are then applied to Q and K, producing a weight of size (H × W) × (H × W). The pixel-attended weighted feature map P″ is obtained as the dot product of this weight with V; the specific process is shown in formula 2.
P_attention(Q, K, V) = softmax(Q · K^T) · V (2)
The CPA module redistributes the channel feature weights and pixel feature weights of the feature map: it improves detection precision by increasing the weights of feature regions of interest, reduces background interference by decreasing the weights of irrelevant feature regions, selectively extracts features according to the aspect ratio of the object, and finely allocates and processes information.
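A rough PyTorch sketch of the CPA module described above. The reduction ratios follow the text (k = 8 for the channel branch, k = 4 for Q and K); the class names, the sigmoid gating in the SENet-style channel branch, and the absence of a residual connection are assumptions not fixed by the patent:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelAttention(nn.Module):
    # GAP -> two 1x1 convs (reduce/restore by k=8) -> per-channel weights M_c(P).
    def __init__(self, c, k=8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Conv2d(c, c // k, 1), nn.ReLU(inplace=True),
            nn.Conv2d(c // k, c, 1), nn.Sigmoid())

    def forward(self, p):
        w = self.fc(F.adaptive_avg_pool2d(p, 1))  # 1 x 1 x C weights
        return p * w                              # P' = M_c(P) (x) P

class PixelAttention(nn.Module):
    # Dot-product attention over the H*W positions; Q and K use C/4 channels.
    def __init__(self, c, k=4):
        super().__init__()
        self.q = nn.Conv2d(c, c // k, 1)
        self.k = nn.Conv2d(c, c // k, 1)
        self.v = nn.Conv2d(c, c, 1)

    def forward(self, p):
        b, c, h, w = p.shape
        q = self.q(p).flatten(2).transpose(1, 2)   # B x HW x C/k
        k = self.k(p).flatten(2)                   # B x C/k x HW
        v = self.v(p).flatten(2).transpose(1, 2)   # B x HW x C
        attn = torch.softmax(q @ k, dim=-1)        # (HW x HW) weight, formula 2
        out = (attn @ v).transpose(1, 2)           # B x C x HW
        return out.reshape(b, c, h, w)             # P''

class CPA(nn.Module):
    def __init__(self, c):
        super().__init__()
        self.ca = ChannelAttention(c)
        self.pa = PixelAttention(c)

    def forward(self, p):
        return self.pa(self.ca(p))
```

The output shape matches the input, so the block can be dropped in front of a down-sampling stage of the Neck without other changes.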
1.2 Atrous Spatial Pyramid Pooling (ASPP) Module
Because ship targets are small in the data set images, Atrous Spatial Pyramid Pooling (ASPP) is introduced to replace the original SPP module in order to improve small-target detection and strengthen the semantic relation between local and global information; it aggregates multi-scale context information, and its specific structure is shown in Fig. 5. In YOLOv4, SPPNet mainly uses pooling layers of 3 different scales for 3-scale feature extraction and feature fusion. ASPP instead generates 5 multi-scale features using three 3 × 3 dilated convolution layers with dilation rates of 6, 12 and 18, a 1 × 1 convolution layer and a global average pooling layer; it then concatenates and fuses the 5 multi-scale features along the channel dimension and finally adjusts the number of channels with a 1 × 1 convolution layer. Compared with SPP, ASPP fuses more scale features and effectively improves the detection of small targets.
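The five-branch ASPP structure described above can be sketched in PyTorch roughly as follows (normalization/activation layers and the upsampling mode for the pooled branch are assumptions, as the text does not specify them):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP(nn.Module):
    # 1x1 conv + three 3x3 dilated convs (rates 6, 12, 18) + global average
    # pooling, concatenated on the channel dim and fused by a final 1x1 conv.
    def __init__(self, c_in, c_out):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(c_in, c_out, 1)] +
            [nn.Conv2d(c_in, c_out, 3, padding=r, dilation=r)
             for r in (6, 12, 18)])
        self.gap = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                 nn.Conv2d(c_in, c_out, 1))
        self.fuse = nn.Conv2d(5 * c_out, c_out, 1)  # adjust channel count

    def forward(self, x):
        h, w = x.shape[-2:]
        feats = [b(x) for b in self.branches]
        # Broadcast the pooled 1x1 feature back to the spatial size.
        feats.append(F.interpolate(self.gap(x), size=(h, w), mode='nearest'))
        return self.fuse(torch.cat(feats, dim=1))
```

With padding equal to the dilation rate, every branch preserves the spatial size, so the module is a drop-in replacement for SPP at the same position in the network.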
2. Improved network structure
The invention adds two CPA modules and replaces SPP module with ASPP module, the improved network structure diagram is shown in figure 6.
3. Loss function optimization
To improve detection accuracy, the loss function is improved. The loss function of YOLOv4 consists of 3 parts: the position regression loss function (loss_loc), the object confidence loss function (loss_obj) and the object classification loss function (loss_cls), as shown in formula 3.
Loss = loss_loc + loss_obj + loss_cls (3)
Among the position regression loss functions, YOLOv4 employs the CIoU loss function, which is specifically shown in the following formulas 4, 5, 6, and 7.
L_CIoU = 1 - IoU + d²/c² + αv (4)
IoU = |S_P ∩ S_T| / |S_P ∪ S_T| (5)
v = (4/π²) · (arctan(w_T/h_T) - arctan(w_P/h_P))² (6)
α = v / ((1 - IoU) + v) (7)
where S_P is the area of the prediction box, S_T is the area of the real box, d and c denote the distance between the center points of the prediction box and the real box and the diagonal length of the smallest enclosing rectangle respectively, w_T and h_T are the width and height of the real box, and w_P and h_P are the width and height of the prediction box.
Because the CIoU loss function involves an inverse trigonometric function, its computation is heavy and it prolongs network training. Since the detected ships are distinctive in their aspect ratio, the aspect ratio of the detected object must be taken into account; the XIoU loss function is therefore used instead of the CIoU loss function, as shown in the following formulas.
L_XIoU = 1 - IoU + dIoU (8)
dIoU = (AA′ + BB′ + CC′ + DD′) / (4c) (9)
Wherein AA′, BB′, CC′ and DD′ are the distances between the corresponding upper-left, upper-right, lower-left and lower-right corner points of the real box and the prediction box. Although the XIoU loss function does not explicitly model the aspect ratio and center-point distance as the CIoU loss function does, during training the model gradually pulls the 4 corner points of the prediction box towards the 4 corner points of the real box until they overlap.
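A plain-Python sketch of the XIoU loss as reconstructed from formulas 8 and 9 (the normalization of the four corner distances by 4c, with c the diagonal of the smallest enclosing box, is an assumption based on the symbols defined above):

```python
import math

def box_iou(bt, bp):
    # IoU of ground-truth and prediction boxes given as (x1, y1, x2, y2).
    ix1, iy1 = max(bt[0], bp[0]), max(bt[1], bp[1])
    ix2, iy2 = min(bt[2], bp[2]), min(bt[3], bp[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda b: (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area(bt) + area(bp) - inter)

def xiou_loss(bt, bp):
    # L_XIoU = 1 - IoU + dIoU, where dIoU averages the 4 corner-point
    # distances (AA', BB', CC', DD') normalized by the diagonal c of the
    # smallest box enclosing both bt and bp.
    corners = lambda b: [(b[0], b[1]), (b[2], b[1]), (b[0], b[3]), (b[2], b[3])]
    dists = [math.dist(a, b) for a, b in zip(corners(bt), corners(bp))]
    c = math.dist((min(bt[0], bp[0]), min(bt[1], bp[1])),
                  (max(bt[2], bp[2]), max(bt[3], bp[3])))
    return 1 - box_iou(bt, bp) + sum(dists) / (4 * c)
```

For identical boxes the loss is 0; no arctan appears, which is the source of the claimed computation saving relative to CIoU.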
The following description will be made by way of specific examples
The invention relates to a high-resolution remote sensing image ship detection method based on improved YOLOv4, which specifically comprises the following steps:
step 1, collecting an original data set of an unmanned aerial image of a sea surface ship, and establishing a target data set.
The original data set of the invention comes from actual aerial images of sea surface ships taken by an unmanned aerial vehicle, 1,800 images in total, of which 1,440 form the training set and 360 form the verification set.
The ship targets in the original data set are marked to obtain target pictures; during marking, the bounding box should cover the target area as completely as possible while minimizing other background information inside the box.
Considering the I/O bottleneck in training, the target pictures are preprocessed: they are cropped and scaled to reduce the input picture size during training.
For scaling, assuming the set maximum side length is l and the width and height of the original picture are w and h, the width wnew and height hnew of the scaled target picture are:
wnew=w*min(l/w,l/h)
hnew=h*min(l/w,l/h)
and establishing a target data set according to the cut and zoomed target picture, wherein the target data set is divided into a training set and a verification set.
Step 2, constructing an improved YOLOv4 network, comprising the following steps;
step 2.1, designing a CPA module, wherein the CPA module structure is shown in figure 2;
the CPA module consists of two sub-modules, a Channel-wise Attention module and a Pixel-wise Attention module. The channel attention module employs a SENET module, which contains one global averaging pooling layer and two 1 x 1 convolutional layers. The pixel attention module adopts a Dot product attention module, 3 convolution layers of 1 × 1 are respectively used for the feature map to generate matrixes Query, key and Value (simplified to Q, K and V), then Dot product (Dot) operation and Softmax function calculation are carried out on Q and K, and the obtained result and V are subjected to Dot product operation.
Step 2.2, adding an ASPP module, wherein the structure of the ASPP module is shown in figure 5;
the ASPP module generates 5 multi-scale features by introducing 3 sampling rates of 3 × 3 hole convolution layers, 1 × 1 convolution layer and a global average pooling layer which are respectively (6, 12 and 18), then splicing and fusing the 5 multi-scale features on channel dimensions, and finally adjusting the number of channels by using the 1 × 1 convolution layer, so that the receptive field can be effectively expanded, and the multi-scale structure can effectively improve the new detection energy of small targets.
Step 2.3: acquiring the improved YOLOv4 network structure.
Based on the improvements of the previous two steps, the invention proposes an improved YOLOv4 network model, as shown in Fig. 6.
In the feature fusion network, the CPA module is added before the down-sampling stages; it redistributes the channel feature weights and pixel feature weights of the feature map, improving detection precision by increasing the weights of feature regions of interest and reducing background interference by decreasing the weights of irrelevant feature regions, so that features are selectively extracted according to the aspect ratio of the object and information is finely allocated and processed. The SPP module is replaced by the ASPP module, which effectively enlarges the receptive field and aggregates multi-scale context information to improve the detection of small ship targets.
Step 3, optimizing a loss function;
the method optimizes the loss function of YOLOv4, adopts the optimized XIoU loss function to replace the CIoU loss function, can effectively reduce the calculated amount, and greatly improves the positioning accuracy of the ship target with large length-width ratio.
Step 4: to verify the effectiveness and practicality of the target detection method of the technical scheme of the invention, a test set is used to evaluate the target detection performance of the trained ship detection network model, and the results are compared with other technologies currently applied to ship detection.
The specific test contents are as follows:
the experimental environment is as follows: the system of Ubuntu18.04, the video card NVIDIA GeForce RTX2080Ti (the video memory is 11 GB), and the deep learning framework Pythrch;
the improved YOLOv4 model is trained on 300 iterations on an experimental data set. The XIoU loss function is shown in FIG. 7. As can be seen from fig. 7, the loss value gradually decays with increasing number of iterations and gradually stabilizes, eventually dropping to 0.9289%. Compared with the original CIoU loss function, the improvement of 1.46 percent is obtained.
The evaluation index is the mAP (mean Average Precision), which reflects the overall level of detection precision. The mean average precision (mAP) of the improved YOLOv4 reaches 94.08%, an ideal overall training result.
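For reference, the per-class AP underlying the mAP metric can be computed with all-point interpolation roughly as follows (a generic sketch; the patent does not state which AP interpolation scheme it uses):

```python
def average_precision(recall, precision):
    # Area under the precision-recall curve with all-point interpolation.
    # recall must be sorted ascending; precision[i] corresponds to recall[i].
    mrec = [0.0] + list(recall) + [1.0]
    mpre = [0.0] + list(precision) + [0.0]
    # Make precision monotonically non-increasing from right to left.
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    # Sum rectangle areas between consecutive recall points.
    return sum((mrec[i + 1] - mrec[i]) * mpre[i + 1]
               for i in range(len(mrec) - 1))
```

With a single class, as in this ship data set, the mAP equals this per-class AP directly.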
The invention incorporates each improved module into YOLOv4 separately for ablation experiments; as shown in Table 1, the gains contributed by the different improved modules can be clearly seen.
TABLE 1 mAP of improved modules in YOLOv4
The present invention uses five detection networks, Faster RCNN, SSD, YOLOv3, YOLOv4, and the improved YOLOv4 proposed herein, for comparative experiments on the data set; the specific results are shown in Table 2. Analysis shows that, since the improved YOLOv4 introduces 2 CPA modules and 1 ASPP module, its parameter count is the largest among the compared algorithms and its detection speed is lower than those of YOLOv3 and YOLOv4, but its detection accuracy is the highest. On a comprehensive evaluation, the improved YOLOv4 achieves the most ideal detection effect on sea surface ships.
TABLE 2 network results comparison chart
(Table 2 is reproduced as an image in the original publication and is not available in text form.)
In order to verify the effects of the network model before and after optimization, the original YOLOv4 and the improved YOLOv4 network models are used in a controlled experiment on the same data set; the specific detection results are shown in FIG. 8.
As can be seen from the figure, compared with the original YOLOv4, the improved YOLOv4 embeds the CPA module in the Neck and optimizes the loss function, so the confidence scores in the test images are improved to a certain extent. Because the SPP module is replaced by the ASPP module in the improved network model, the detection effect of the improved model on small objects is greatly improved. In the test images, the improved YOLOv4 can detect small targets that the original network model fails to detect.
The YOLOv4 algorithm is a mainstream algorithm in the field of optical image target detection, characterized by both high detection speed and high detection precision. Compared with the original YOLOv4 model, the training process of the improved YOLOv4 model converges better, achieving a 1.46% improvement in the loss function while effectively reducing computational complexity and improving training speed; on the target detection task, it improves mAP@0.5 by 4.84% over the original YOLOv4 model. The method therefore has an excellent detection effect on small-scale ship targets and is suitable for deployment on mobile equipment such as unmanned aerial vehicles.
It should be noted that terms such as "upper", "lower", "left", "right", "front", and "back" used in the present invention are for clarity of description only and are not intended to limit the implementable scope of the invention; changes or adjustments of their relative relationships, without substantive changes to the technical content, shall also be regarded as within the implementable scope of the invention.
The above is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above embodiments; all technical solutions under the inventive concept belong to the protection scope of the present invention. It should be noted that modifications and adaptations made by those skilled in the art without departing from the principles of the present invention shall also be regarded as within the protection scope of the present invention.

Claims (4)

1. A high-resolution remote sensing ship detection method based on improved YOLOv4 is characterized by comprising the following steps:
s1: acquiring an original data set of an unmanned aerial image of a sea surface ship, and establishing a target data set comprising a training set and a verification set;
s2: constructing an improved YOLOv4 network;
s3: designing an optimized loss function;
s4: training and verifying the improved YOLOv4 network of step S2 by using the optimized loss function designed in step S3 together with the training set and the verification set of step S1; and then detecting and identifying ship targets in remote sensing images.
2. The improved YOLOv 4-based high-resolution remote sensing ship detection method according to claim 1, wherein the specific content of the step S1 is as follows:
s1.1: acquiring sea surface ship information through a sea surface unmanned aerial vehicle to obtain an original data set;
s1.2: marking a ship target in the original data set to obtain a target picture;
s1.3: preprocessing the target pictures, comprising cutting and zooming the target pictures, wherein the zooming method is specifically as follows: let the maximum side length be x, and let the width and height of the current target picture be w and h respectively; then the width of the zoomed target picture is w_new = w · min(x/w, x/h), and the height of the zoomed target picture is h_new = h · min(x/w, x/h);
s1.4: establishing a target data set from the cut and zoomed target pictures, and dividing the target data set into a training set and a verification set, wherein the number ratio of the training set to the verification set is 4:1.
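The scaling rule of step S1.3 can be sketched as a small helper function. The claim fixes only the scale factor min(x/w, x/h); the rounding mode used here is an assumption for illustration:

```python
def letterbox_size(w, h, x):
    """Scale (w, h) so the longer side becomes x while preserving
    the aspect ratio, as in step S1.3:
      w_new = w * min(x/w, x/h),  h_new = h * min(x/w, x/h).
    Rounding to integer pixels is an assumption."""
    scale = min(x / w, x / h)
    return round(w * scale), round(h * scale)
```

For example, a 1000x500 image with x = 608 is scaled by 0.608 to 608x304, so the longer side exactly matches the target size.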
3. The improved YOLOv 4-based high-resolution remote sensing ship detection method according to claim 1, wherein the specific content of the step S2 is as follows:
s2.1: designing CPA modules
The CPA module comprises a channel attention module and a pixel attention module; and the calculation formula of the CPA module is as follows:
P′ = M_c(P) ⊗ P
P″ = M_s(P′) ⊗ P′
in the formulas, M_c(P) is the feature weight output by the channel attention module, P is the input feature map, P′ is the feature map output by the channel attention module, M_s(P′) is the feature weight output by the pixel attention module, ⊗ denotes the application of the attention weight to the feature map, and P″ is the feature map output by the CPA module;
the design of the channel attention module is as follows: for an input feature map P of size H × W × C, global average pooling is first performed to obtain a 1 × 1 × C feature map and thus global semantic information; two 1 × 1 convolutions then reduce and restore the channel dimension to extract deep channel feature information and obtain the channel feature weights; finally, the weights are multiplied with the feature values to obtain a new weighted feature map P′;
the design of the pixel attention module is as follows: three 1 × 1 convolution layers are applied to the weighted feature map P′ produced by the channel attention module to generate the matrices Query, Key and Value; a dot product (Dot) of Query and Key followed by a Softmax function then yields a weight of size (H × W) × (H × W); finally, a dot product (Dot) of this weight with the matrix Value gives the pixel-attended weighted feature map P″;
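A minimal PyTorch sketch of the CPA module described in S2.1. The channel-reduction ratio in the channel branch and the exact channel dimensioning of Query/Key/Value are assumptions; the claim fixes only the 1x1 convolutions, the softmax weighting, and the two-stage composition:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Channel branch: global average pooling (HxWxC -> 1x1xC),
    two 1x1 convolutions to squeeze and restore the channels,
    sigmoid weights multiplied back onto the input (P' = M_c(P) * P)."""
    def __init__(self, channels, reduction=16):  # reduction ratio is an assumption
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )
    def forward(self, p):
        return p * self.fc(self.pool(p))

class PixelAttention(nn.Module):
    """Pixel branch: 1x1 convolutions produce Query/Key/Value;
    softmax(Q @ K) gives an (HW x HW) weight applied to Value."""
    def __init__(self, channels):
        super().__init__()
        self.q = nn.Conv2d(channels, channels, 1)
        self.k = nn.Conv2d(channels, channels, 1)
        self.v = nn.Conv2d(channels, channels, 1)
    def forward(self, p):
        b, c, h, w = p.shape
        q = self.q(p).flatten(2).transpose(1, 2)   # B x HW x C
        k = self.k(p).flatten(2)                   # B x C x HW
        v = self.v(p).flatten(2).transpose(1, 2)   # B x HW x C
        attn = torch.softmax(q @ k, dim=-1)        # B x HW x HW
        out = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        return out

class CPA(nn.Module):
    """Channel attention followed by pixel attention: P'' = M_s(P') . P'."""
    def __init__(self, channels):
        super().__init__()
        self.ca = ChannelAttention(channels)
        self.pa = PixelAttention(channels)
    def forward(self, p):
        return self.pa(self.ca(p))
```

The module preserves the input shape, so it can be placed in front of any downsampling layer of the Neck without altering the surrounding architecture.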
s2.2: designing ASPP modules
The ASPP module comprises three 3 × 3 dilated convolution layers with sampling rates of 6, 12 and 18 respectively, a 1 × 1 convolution layer, and a global average pooling layer; these generate 5 multi-scale features, which are concatenated and fused along the channel dimension, and a final 1 × 1 convolution layer adjusts the number of channels;
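A minimal PyTorch sketch of the ASPP module described in S2.2. The output channel width and the upsampling mode for the pooled branch are assumptions; the five-branch structure and the final 1x1 fusion follow the claim text:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP(nn.Module):
    """Atrous Spatial Pyramid Pooling: three 3x3 dilated convolutions
    (rates 6, 12, 18), one 1x1 convolution, and a global-average-pooling
    branch; the five feature maps are concatenated along the channel
    dimension and fused by a final 1x1 convolution."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, 1),
            nn.Conv2d(in_ch, out_ch, 3, padding=6, dilation=6),
            nn.Conv2d(in_ch, out_ch, 3, padding=12, dilation=12),
            nn.Conv2d(in_ch, out_ch, 3, padding=18, dilation=18),
        ])
        self.gap = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                 nn.Conv2d(in_ch, out_ch, 1))
        self.project = nn.Conv2d(5 * out_ch, out_ch, 1)  # fuse and adjust channels
    def forward(self, x):
        h, w = x.shape[2:]
        feats = [b(x) for b in self.branches]
        # broadcast the pooled global context back to the spatial size
        g = F.interpolate(self.gap(x), size=(h, w), mode='nearest')
        return self.project(torch.cat(feats + [g], dim=1))
```

Because padding equals dilation for each 3x3 branch, all five branches keep the input spatial size, which makes the channel-wise concatenation valid.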
s2.3: obtaining an improved YOLOv4 network structure
On the basis of the existing YOLOv4 network structure, the CPA module designed in step S2.1 is added in front of the downsampling operations; the CPA module redistributes the channel feature weights and pixel feature weights of the feature map, improving detection precision by increasing the weight of the feature regions of interest and reducing background interference by lowering the weight of irrelevant feature regions; meanwhile, on the basis of the existing YOLOv4 network structure, the SPP module in the network is replaced by the ASPP module designed in step S2.2, thereby obtaining the improved YOLOv4 network structure.
4. The improved YOLOv4-based high-resolution remote sensing ship detection method according to claim 1, wherein the specific content of step S3 is as follows: replacing the CIoU loss function of the existing YOLOv4 network structure with the optimized XIoU loss function, wherein the optimized XIoU loss function L_XIoU is specifically:
L XIoU =1-IoU+dIoU
dIoU = (AA′² + BB′² + CC′² + DD′²) / (4c²)
in the formulas, IoU is expressed as IoU = |BBox_t ∩ BBox_p| / |BBox_t ∪ BBox_p|, wherein BBox_t is the real frame of the target picture and BBox_p is the prediction frame, so that IoU is the ratio of the intersection to the union of the real frame and the prediction frame; AA′, BB′, CC′ and DD′ denote the distances between the upper left, upper right, lower left and lower right corner points of the real frame of the target picture and the corresponding corner points of the prediction frame, respectively; c denotes the diagonal length of the rectangle enclosing the real frame and the prediction frame of the target picture; by using the 4 corner points, the dIoU term enables frames of the same shape and the same aspect ratio to be fitted during training.
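A sketch of the XIoU loss of claim 4 for axis-aligned boxes given as (x1, y1, x2, y2). The corner-distance term follows the textual definitions (squared distances between the four corresponding corners, normalized by the diagonal of the enclosing rectangle); since the original formula is published only as an image, the exact normalization constant is an assumption:

```python
def xiou_loss(bt, bp):
    """Hypothetical sketch of L_XIoU = 1 - IoU + dIoU, where dIoU sums
    the squared distances between the four corresponding corners of the
    true box bt and predicted box bp, normalized by the squared diagonal
    of their enclosing rectangle (normalization is an assumption)."""
    # intersection over union
    ix1, iy1 = max(bt[0], bp[0]), max(bt[1], bp[1])
    ix2, iy2 = min(bt[2], bp[2]), min(bt[3], bp[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_t = (bt[2] - bt[0]) * (bt[3] - bt[1])
    area_p = (bp[2] - bp[0]) * (bp[3] - bp[1])
    iou = inter / (area_t + area_p - inter)
    # corner pairs: top-left, top-right, bottom-left, bottom-right
    ct = [(bt[0], bt[1]), (bt[2], bt[1]), (bt[0], bt[3]), (bt[2], bt[3])]
    cp = [(bp[0], bp[1]), (bp[2], bp[1]), (bp[0], bp[3]), (bp[2], bp[3])]
    corner_sq = sum((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
                    for a, b in zip(ct, cp))
    # squared diagonal of the smallest enclosing rectangle
    ex1, ey1 = min(bt[0], bp[0]), min(bt[1], bp[1])
    ex2, ey2 = max(bt[2], bp[2]), max(bt[3], bp[3])
    c2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2
    return 1.0 - iou + corner_sq / (4.0 * c2)
```

Unlike CIoU, which penalizes only the center-point distance plus an aspect-ratio term, matching all four corners constrains both the position and the shape of the predicted box in a single term, which suits large-aspect-ratio ship targets.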
CN202210860051.6A 2022-07-21 2022-07-21 High-resolution remote sensing ship detection method based on improved YOLOv4 Pending CN115205264A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210860051.6A CN115205264A (en) 2022-07-21 2022-07-21 High-resolution remote sensing ship detection method based on improved YOLOv4

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210860051.6A CN115205264A (en) 2022-07-21 2022-07-21 High-resolution remote sensing ship detection method based on improved YOLOv4

Publications (1)

Publication Number Publication Date
CN115205264A true CN115205264A (en) 2022-10-18

Family

ID=83584838

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210860051.6A Pending CN115205264A (en) 2022-07-21 2022-07-21 High-resolution remote sensing ship detection method based on improved YOLOv4

Country Status (1)

Country Link
CN (1) CN115205264A (en)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115909225A (en) * 2022-10-21 2023-04-04 武汉科技大学 OL-YoloV5 ship detection method based on online learning
CN115937703B (en) * 2022-11-30 2024-05-03 南京林业大学 Enhanced feature extraction method for remote sensing image target detection
CN115937703A (en) * 2022-11-30 2023-04-07 南京林业大学 Enhanced feature extraction method for remote sensing image target detection
CN115966009A (en) * 2023-01-03 2023-04-14 迪泰(浙江)通信技术有限公司 Intelligent ship detection system and method
CN116012601A (en) * 2023-01-16 2023-04-25 苏州大学 Yolo_sr system, target detection method and device for sweeping robot
CN116229191A (en) * 2023-03-13 2023-06-06 东莞理工学院 Target detection method based on normalized corner distance and target foreground information
CN116229191B (en) * 2023-03-13 2023-08-29 东莞理工学院 Target detection method based on normalized corner distance and target foreground information
CN117611877B (en) * 2023-10-30 2024-05-14 西安电子科技大学 LS-YOLO network-based remote sensing image landslide detection method
CN117611877A (en) * 2023-10-30 2024-02-27 西安电子科技大学 LS-YOLO network-based remote sensing image landslide detection method
CN117496666A (en) * 2023-11-16 2024-02-02 成都理工大学 Intelligent and efficient drowning rescue system and method
CN117541584B (en) * 2024-01-09 2024-04-02 中国飞机强度研究所 Mask rotation superposition full-machine test crack characteristic enhancement and identification method
CN117541584A (en) * 2024-01-09 2024-02-09 中国飞机强度研究所 Mask rotation superposition full-machine test crack characteristic enhancement and identification method
CN117854111A (en) * 2024-01-15 2024-04-09 江南大学 Improved YOLOv4 plasmodium detection method based on enhanced feature fusion

Similar Documents

Publication Publication Date Title
CN115205264A (en) High-resolution remote sensing ship detection method based on improved YOLOv4
CN112818903B (en) Small sample remote sensing image target detection method based on meta-learning and cooperative attention
CN108961235B (en) Defective insulator identification method based on YOLOv3 network and particle filter algorithm
CN110188705B (en) Remote traffic sign detection and identification method suitable for vehicle-mounted system
CN111695448B (en) Roadside vehicle identification method based on visual sensor
CN110796009A (en) Method and system for detecting marine vessel based on multi-scale convolution neural network model
CN112070070B (en) LW-CNN method and system for urban remote sensing scene recognition
CN115346177A (en) Novel system and method for detecting target under road side view angle
CN114332921A (en) Pedestrian detection method based on improved clustering algorithm for Faster R-CNN network
CN110503049B (en) Satellite video vehicle number estimation method based on generation countermeasure network
Sun et al. IRDCLNet: Instance segmentation of ship images based on interference reduction and dynamic contour learning in foggy scenes
Wang et al. Based on the improved YOLOV3 small target detection algorithm
CN112785610B (en) Lane line semantic segmentation method integrating low-level features
CN114882490B (en) Unlimited scene license plate detection and classification method based on point-guided positioning
CN116129327A (en) Infrared vehicle detection method based on improved YOLOv7 algorithm
CN113673534B (en) RGB-D image fruit detection method based on FASTER RCNN
CN110674687A (en) Robust and efficient unmanned pedestrian detection method
CN117274723B (en) Target identification method, system, medium and equipment for power transmission inspection
Wang et al. GSC-YOLOv5: An Algorithm based on Improved Attention Mechanism for Road Creak Detection
CN116777895B (en) Concrete bridge Liang Biaoguan disease intelligent detection method based on interpretable deep learning
CN116229381B (en) River and lake sand production ship face recognition method
CN117079142B (en) Anti-attention generation countermeasure road center line extraction method for automatic inspection of unmanned aerial vehicle
Chen et al. UAV remote sensing image Rural homestead detection based on deep learning
CN114118125A (en) Multi-modal input and space division three-dimensional target detection method
CN116229409A (en) Crosswalk line detection method introducing global features and multi-scale features

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination