CN111881918A - Multi-scale rotating ship target detection algorithm

Multi-scale rotating ship target detection algorithm

Info

Publication number
CN111881918A
CN111881918A (application CN202010528579.4A)
Authority
CN
China
Prior art keywords
feature map
scale feature
network
anchor
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010528579.4A
Other languages
Chinese (zh)
Other versions
CN111881918B (en)
Inventor
刘建辉
江刚武
王鑫
张锐
徐佰祺
谭熊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Information Engineering University of PLA Strategic Support Force
Original Assignee
Information Engineering University of PLA Strategic Support Force
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Information Engineering University of PLA Strategic Support Force filed Critical Information Engineering University of PLA Strategic Support Force
Priority to CN202010528579.4A priority Critical patent/CN111881918B/en
Publication of CN111881918A publication Critical patent/CN111881918A/en
Application granted granted Critical
Publication of CN111881918B publication Critical patent/CN111881918B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G06V10/464 Salient features, e.g. scale invariant feature transforms [SIFT] using a plurality of salient features, e.g. bag-of-words [BoW] representations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection

Abstract

The invention provides a multi-scale rotating ship target detection algorithm. The algorithm comprises the following steps: acquiring a multi-scale feature map of an input image, wherein the multi-scale feature map comprises a first-class scale feature map, a second-class scale feature map and a third-class scale feature map of sequentially increasing scale; performing feature fusion on the multi-scale feature map with a feature pyramid network, wherein the feature pyramid network adopts a ResNet residual network as its basic framework; inputting the feature map output by the feature pyramid network into a region proposal network through a 3 × 3 convolutional layer, classifying each anchor box with the region proposal network according to a set classification criterion, assigning each anchor box parameterized coordinates, and obtaining a rotated bounding box based on regression of those coordinates; performing adaptive region-of-interest alignment on the rotated bounding boxes generated by the region proposal network to obtain a high-quality feature map; and screening the candidate boxes of the high-quality feature map according to a set rotated non-maximum-suppression constraint to obtain the detection targets.

Description

Multi-scale rotating ship target detection algorithm
Technical Field
The invention relates to the technical field of ship detection, in particular to a multi-scale rotating ship target detection algorithm.
Background
With the development of remote sensing technology, the acquisition of high-resolution remote sensing images has become easier. Automatic ship detection has long played an important role in the field of remote sensing and has driven great progress in port management, cargo transportation, rescue at sea, and the like. Meanwhile, the berthing and heading information of a ship is of great significance. However, the large aspect ratio of ships makes their detection more difficult than that of other objects (e.g., vehicles, buildings, aircraft).
In recent years, deep learning has been highly successful in the field of computer vision. Schemes based on the region-based convolutional neural network (R-CNN) provide a good approach to target detection, with results far better than those of traditional detection methods, but R-CNN has obvious shortcomings in computation speed and storage. Fast R-CNN significantly improves detection efficiency through shared computation and effectively reduces storage requirements, and Faster R-CNN replaces Selective Search with a Region Proposal Network (RPN) to realize end-to-end training, further improving detection efficiency and accuracy. With the application of deep CNNs to target detection, deep-learning-based ship detection algorithms have been widely applied to remote sensing ship detection. Kang M. et al. (M. Kang, X. Leng, Z. Lin, and K. Ji, "A modified Faster R-CNN based on CFAR algorithm for SAR ship detection," International Workshop on Remote Sensing with Intelligent Processing, IEEE, pp. 1-4, 2017) use the object proposals generated by Faster R-CNN as the guard windows of a CFAR algorithm to pick up small targets, thereby re-evaluating the bounding boxes with relatively low classification scores in the detection network. Zhang R. et al. (R. Zhang, J. Yao, K. Zhang, C. Feng, and J. Zhang, "S-CNN-based ship detection from high-resolution remote sensing images," ISPRS International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, vol. XLI-B7, pp. 423-430, 2016) proposed a new CNN-based ship detection model, called S-CNN, which combines an improved saliency detection method with proposals extracted from specifically designed ship models. Kang M. et al. (M. Kang, K. Ji, X. Leng, and Z. Lin, "Contextual Region-Based Convolutional Neural Network with Multilayer Fusion for SAR Ship Detection," Remote Sens., vol. 9, no. 8, p. 860, 2017) construct a contextual region-based CNN with multilayer fusion for SAR ship detection, a carefully designed deep network consisting of an RPN with a high-resolution network and an object detection network with contextual features. Tang et al. (J. Tang, C. Deng, G. B. Huang, and B. Zhang, "Compressed-domain ship detection on spaceborne optical image using deep neural network and extreme learning machine," IEEE Trans. Geosci. Remote Sens., vol. 53, no. 3, pp. 1174-1185, 2014) use the compressed domain to quickly extract ship candidates, a deep neural network (DNN) for high-level feature representation and classification, and an extreme learning machine (ELM) for efficient feature pooling and decision making.
However, the above-described methods perform target detection on the basis of horizontal regions. In remote sensing satellite images the aspect ratio of a ship is large, ships tend to be densely distributed in complex scenes, and when a ship is inclined, the overlap between the redundant area of its horizontal bounding box and neighbouring ships is relatively large. The complex scene and the large redundant areas introduce a large amount of noise, so that the feature information is interfered with or even submerged, which affects the accuracy of the detection result; moreover, the many redundant regions contained in horizontal bounding boxes also hinder the operation of non-maximum suppression.
At present, methods that detect ship targets based on a rotated region also exist, for example Zhong Weifeng et al., "Ship target detection model of rotated rectangular regions in remote sensing images [J]," Journal of Computer-Aided Design & Computer Graphics, 2019, 31(11): 1935-. However, the coordinate regression of such models is not accurate enough; and because feature pooling over-emphasizes the ship itself, the influence of the surrounding environment on the classifier is neglected, which causes sample misclassification.
Disclosure of Invention
The invention provides a multi-scale rotating ship target detection algorithm, aiming to solve the problems that ship target detection methods based on horizontal regions contain a large number of redundant regions, making the detection result inaccurate, and that existing ship target detection methods based on rotated regions suffer from inaccurate coordinate regression and sample misclassification.
The invention provides a multi-scale rotating ship target detection algorithm, which comprises the following steps:
step 1, obtaining a multi-scale feature map of an input image, wherein the multi-scale feature map comprises a first class scale feature map, a second class scale feature map and a third class scale feature map with sequentially increasing scales;
step 2, performing feature fusion on the multi-scale feature map by adopting a feature pyramid network, wherein the feature pyramid network adopts a ResNet residual network as a basic framework;
step 3, inputting the feature map output by the feature pyramid network into a region proposal network through a 3 × 3 convolutional layer, classifying each anchor box with the region proposal network according to a set classification criterion, assigning each anchor box parameterized coordinates (x, y, w, h, θ), and obtaining a rotated bounding box based on regression of (x, y, w, h, θ), wherein (x, y) denotes the coordinates of the center point of the bounding box, w the width of the bounding box, h the height of the bounding box, and θ the angle between the major axis of the target and the horizontal axis, with θ ∈ [-90°, 0];
step 4, performing adaptive region-of-interest alignment on the rotated bounding boxes generated by the region proposal network to obtain a high-quality feature map;
and step 5, screening the candidate boxes of the high-quality feature map according to the set rotated non-maximum-suppression constraint to obtain the detection targets.
Further, the ResNet residual network comprises 4 layers from top to bottom, namely a P2 layer, a P3 layer, a P4 layer and a P5 layer connected in sequence; the P2 layer is used for processing the input first-class scale feature map into a second-class scale feature map and outputting it to the P3 layer; the P3 layer is used for processing the input second-class scale feature map into a third-class scale feature map and outputting it to the P4 layer; and the P4 layer is used for processing the input third-class scale feature map into a fourth-class scale feature map and generating, for each feature point of each fourth-class scale feature map, 9 anchor boxes according to the set anchor box ratios {1:7, 1:5, 1:3, 1:2, 1:1, 2:1, 3:1, 5:1, 7:1}, wherein the scale of the fourth-class scale feature map is larger than that of the third-class scale feature map.
Further, the classification criterion is: when the IoU of an anchor box is greater than 0.6, the anchor box is determined to be a positive sample; when the IoU of an anchor box is less than 0.25, the anchor box is considered negative.
Further, the regression process of the rotating bounding box is as follows:
$$t_x = \frac{x - x_a}{w_a},\quad t_y = \frac{y - y_a}{h_a},\quad t_w = \log\frac{w}{w_a},\quad t_h = \log\frac{h}{h_a},\quad t_\theta = \theta - \theta_a + k\pi$$

$$t_x^* = \frac{x^* - x_a}{w_a},\quad t_y^* = \frac{y^* - y_a}{h_a},\quad t_w^* = \log\frac{w^*}{w_a},\quad t_h^* = \log\frac{h^*}{h_a},\quad t_\theta^* = \theta^* - \theta_a + k\pi$$

wherein $(x, y, w, h, \theta)$ denotes the parameterized coordinates of the predicted bounding box, $(x_a, y_a, w_a, h_a, \theta_a)$ those of the anchor box, and $(x^*, y^*, w^*, h^*, \theta^*)$ those of the ground-truth bounding box; $(t_x, t_y, t_w, t_h, t_\theta)$ are the regression parameters of the predicted bounding box, namely the correction of its center point, the corrections of its width and height, and the correction of its rotation angle; $(t_x^*, t_y^*, t_w^*, t_h^*, t_\theta^*)$ are the regression parameters between the anchor box and the ground-truth bounding box, namely the correction of the center point, the corrections of the width and height, and the correction of the rotation angle of the ground-truth bounding box; $k \in \mathbb{Z}$.
Further, the loss function used in the regression process is:
$$L(p_i, l_i, u_i, v_i) = \frac{1}{N_{cls}}\sum_i L_{cls}(p_i, l_i) + \frac{\lambda_1}{N_{reg\text{-}h}}\sum_j p_j L_{reg\text{-}h}(u_j^*, u_j) + \frac{\lambda_2}{N_{reg\text{-}r}}\sum_k p_k L_{reg\text{-}r}(v_k^*, v_k)$$

wherein $l_i$ denotes the class label of the target anchor box, $p_i$ the probability distribution of each layer computed by softmax, $u_i, v_i$ the parameterized coordinate vectors, and $u_i^*, v_i^*$ the offsets between the target anchor box and the ground-truth bounding box; the hyper-parameters $\lambda_1, \lambda_2$ weight the regression losses; $N_{cls}$ denotes the number of anchor boxes, $N_{reg\text{-}h}$ the number of anchor boxes participating in position regression, and $N_{reg\text{-}r}$ the number of anchor boxes participating in angle regression; $p_j, p_k$ are the probability values of belonging to the target category;

$$L_{cls}(p_i, l_i) = -\log p_{i, l_i}$$

$$L_{reg\text{-}h}(u_j^*, u_j) = \mathrm{smooth}_{L1}(u_j^* - u_j),\qquad L_{reg\text{-}r}(v_k^*, v_k) = \mathrm{smooth}_{L1}(v_k^* - v_k)$$

$$\mathrm{smooth}_{L1}(x) = \begin{cases} 0.5x^2, & |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases}$$

where $x = u_j^* - u_j$ or $x = v_k^* - v_k$.
Further, the adaptive region-of-interest alignment in step 4 comprises: first, obtaining a mask bounding box through convolutional proposal training; then, using the mask bounding box to noise-filter the rotated bounding box.
Further, the rotated non-maximum-suppression constraint is: first, candidate boxes with an IoU smaller than 0.7 are retained; then, among the candidates whose IoU lies in the range [0.3, 0.7], those whose angle difference is larger than 15° are discarded.
The invention has the beneficial effects that:
according to the multi-scale rotating ship target detection algorithm provided by the embodiment of the invention, firstly, a characteristic pyramid network is used as a basic network of a detection frame, and the network can effectively integrate low-level position information and high-level semantic information to provide higher-level characteristic information for target detection; secondly, a self-adaptive region-of-interest alignment method is adopted to reduce the influence of redundant noise regions in the scheme and keep the integrity of semantic information and spatial information; and then, a rotation non-maximum value inhibition technology is adopted, and the redundancy of the rotating target is more strictly restricted, so that the algorithm can accurately predict the rotation boundary frame of the ship target.
Drawings
FIG. 1 is a schematic diagram of a detection framework of a multi-scale rotating ship target detection algorithm provided by an embodiment of the invention;
fig. 2 is a schematic structural diagram of a feature pyramid network according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating adaptive region of interest alignment according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating the smooth L1 function according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a distribution of sizes of ship targets in a training set and a test set provided by an embodiment of the present invention;
fig. 6 is a schematic diagram of a detection result of ship target detection by using the target detection algorithm of the present invention according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly described below with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a schematic view of a detection framework of a multi-scale rotating ship target detection algorithm provided in an embodiment of the present invention, and based on the detection framework, an embodiment of the present invention provides a multi-scale rotating ship target detection algorithm, which includes the following steps:
s101, obtaining a multi-scale feature map of an input image, wherein the multi-scale feature map comprises a first class of scale feature map, a second class of scale feature map and a third class of scale feature map, and scales of the first class of scale feature map, the second class of scale feature map and the third class of scale feature map are sequentially increased;
s102, performing feature fusion on the multi-scale feature map by adopting a feature pyramid network, wherein the feature pyramid network adopts a ResNet residual error network as a basic framework;
s103, inputting a feature map output by the feature pyramid network into a region suggestion network through a 3 x 3 convolutional layer, classifying each anchor frame by adopting the region suggestion network according to a set classification judgment condition, endowing each anchor frame with parameter coordinates (x, y, w, h and theta), and obtaining a rotating boundary frame based on regression of the parameter coordinates (x, y, w, h and theta), wherein (x and y) represent coordinates of a central point of the boundary frame, w represents width of the boundary frame, h represents height of the boundary frame, theta represents an included angle between a main shaft direction and a horizontal shaft of a target, and theta belongs to [ -90 DEG, 0 ];
s104, performing self-adaptive interest region alignment on a rotating bounding box generated by the region suggestion network to obtain a high-quality feature map;
specifically, the adaptive interest region alignment process in this step includes: firstly, a mask bounding box is obtained through convolution suggestion training; the mask bounding box is then used to noise filter the rotating bounding box.
For example, as shown in fig. 3, the embodiment of the present invention compares three methods for obtaining a fixed-length feature vector: (a) ordinary ROI alignment; (b) rotated ROI alignment; (c) adaptive ROI alignment; where ROI denotes the region of interest.
As can be seen from fig. 3: ordinary ROI alignment retains a lot of noise, so that the target features are drowned out; rotated ROI alignment removes all noise through an affine transformation but loses the spatial information of the target; the adaptive ROI alignment designed by the invention automatically filters the noise region by introducing a mask, so that only a small amount of noise remains while the spatial information is kept, the stability of the network is improved, and a high-quality feature map is obtained.
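As a purely illustrative sketch of this masking idea (the text does not specify the mask branch or its training, so the function and argument names below are hypothetical), the noise-filtering step reduces to an element-wise product between ROI-aligned features and a binary mask:

```python
import numpy as np

def adaptive_roi_filter(roi_feat, mask):
    """Illustrative only: zero out noise regions of an ROI-aligned feature
    map with a binary mask derived from the mask bounding box, keeping the
    target's spatial layout intact.

    roi_feat: (H, W, C) ROI-aligned features; mask: (H, W) in {0, 1}.
    """
    return roi_feat * mask[..., None]

# Toy usage: a 7 x 7 ROI with 256 channels; the mask covers the band that
# the inclined ship occupies, everything else is filtered out.
feat = np.random.rand(7, 7, 256)
mask = np.zeros((7, 7))
mask[2:5, :] = 1.0
print(adaptive_roi_filter(feat, mask).shape)  # (7, 7, 256)
```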
And S105, screening the candidate boxes of the high-quality feature map according to the set rotated non-maximum-suppression constraint to obtain the detection targets.
According to the multi-scale rotating ship target detection algorithm provided by the embodiment of the invention: first, a feature pyramid network is used as the base network of the detection framework, which can effectively integrate low-level position information with high-level semantic information to provide richer feature information for target detection; second, an adaptive region-of-interest alignment method is adopted to reduce the influence of redundant noise regions and preserve the integrity of the semantic and spatial information; then, a rotated non-maximum-suppression technique constrains the redundancy of rotated targets more strictly, so that the algorithm can accurately predict the rotated bounding box of a ship target.
On the basis of the above embodiment, as shown in fig. 2, the ResNet residual network in step S102 comprises 4 layers from top to bottom, namely a P2 layer, a P3 layer, a P4 layer and a P5 layer connected in sequence; the P2 layer processes the input first-class scale feature map into a second-class scale feature map and outputs it to the P3 layer; the P3 layer processes the input second-class scale feature map into a third-class scale feature map and outputs it to the P4 layer; and the P4 layer processes the input third-class scale feature map into a fourth-class scale feature map and generates, for each feature point of each fourth-class scale feature map, 9 anchor boxes according to the set anchor box ratios {1:7, 1:5, 1:3, 1:2, 1:1, 2:1, 3:1, 5:1, 7:1}, the scale of the fourth-class scale feature map being larger than that of the third-class scale feature map.
Specifically, in the feature extraction stage, a ResNet residual network is used as the basic framework; according to the structure of the residual network, the downsampling factors of the feature maps are {4, 8, 16, 32}. In the top-down network, the invention obtains higher-resolution features by interconnecting and fusing feature maps of different sizes.
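The top-down fusion just described can be pictured in a few lines of numpy. This is a minimal, framework-agnostic sketch rather than the patented implementation: the random 1 × 1 lateral projections stand in for learned convolutions, and nearest-neighbour upsampling by a factor of 2 is assumed:

```python
import numpy as np

def fpn_top_down_merge(c_feats, out_channels=256, seed=0):
    """Minimal sketch of FPN top-down fusion.

    c_feats: backbone feature maps [C2, C3, C4, C5] of shape (H, W, C),
    taken at strides {4, 8, 16, 32}. Lateral 1x1 convolutions are stood in
    for by fixed random projections; a real network would learn them.
    """
    rng = np.random.default_rng(seed)
    # Lateral 1x1 "convolutions": project every level to out_channels.
    laterals = []
    for feat in c_feats:
        w = rng.standard_normal((feat.shape[-1], out_channels)) * 0.01
        laterals.append(feat @ w)
    # Top-down pathway: upsample by 2 (nearest neighbour) and add.
    merged = [laterals[-1]]
    for lat in reversed(laterals[:-1]):
        up = merged[0].repeat(2, axis=0).repeat(2, axis=1)
        merged.insert(0, lat + up[:lat.shape[0], :lat.shape[1]])
    return merged  # [P2, P3, P4, P5], all with out_channels channels

# Toy usage: feature maps of a 256 x 256 input at strides {4, 8, 16, 32}.
c_feats = [np.ones((256 // s, 256 // s, c))
           for s, c in zip([4, 8, 16, 32], [256, 512, 1024, 2048])]
print([p.shape for p in fpn_top_down_merge(c_feats)])
```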
As an implementable manner, in order to reduce the number of parameters, this embodiment sets the number of channels of all feature maps to 256. In view of the characteristics of ships, the anchor box ratios are set to {1:7, 1:5, 1:3, 1:2, 1:1, 2:1, 3:1, 5:1, 7:1}, so that each feature point of each feature map generates 9 anchor boxes according to these ratios. Since each anchor box carries a binary label (indicating whether it is a positive or negative sample) and five-parameter coordinates, the output of each classification layer has 9 × 2 = 18 channels and the output of each regression layer has 9 × 5 = 45 channels.
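A sketch of this anchor generation rule follows: 9 anchors per feature point at the ship-oriented aspect ratios listed above. The base anchor size is an assumption, since the text fixes only the ratio set:

```python
import numpy as np

# Ship-oriented aspect ratios (w : h) from the text.
ANCHOR_RATIOS = [(1, 7), (1, 5), (1, 3), (1, 2), (1, 1),
                 (2, 1), (3, 1), (5, 1), (7, 1)]

def make_anchors(feat_h, feat_w, stride, base_size=32):
    """Sketch: 9 anchor boxes (x_ctr, y_ctr, w, h) per feature point.

    base_size (the side of the square base anchor) is an assumption. The
    anchor area is kept constant while the w : h ratio varies.
    """
    anchors = []
    area = float(base_size * base_size)
    for fy in range(feat_h):
        for fx in range(feat_w):
            cx, cy = (fx + 0.5) * stride, (fy + 0.5) * stride
            for rw, rh in ANCHOR_RATIOS:
                w = np.sqrt(area * rw / rh)  # w / h == rw / rh, w * h == area
                anchors.append((cx, cy, w, area / w))
    return np.array(anchors)

print(make_anchors(feat_h=4, feat_w=4, stride=16).shape)  # (4*4*9, 4) = (144, 4)
```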
On the basis of the above embodiments, the region proposal network in step S103 is mainly responsible for 2 tasks: first, classifying each anchor box; then, regressing each anchor box to obtain a rotated bounding box.
The classification process is as follows: during the training of the region proposal network, each anchor box carries a binary class label and five-parameter coordinates. The feature map output by the P5 layer is input into the region proposal network through the 3 × 3 convolutional layer and is then classified and regressed by two 1 × 1 convolutional layers. Because negative samples account for the largest proportion of all samples, classifying and regressing every anchor box would cause a large amount of data redundancy, so a certain number of positive and negative samples need to be extracted and sent to the subsequent network. The classification criterion adopted in this embodiment is: when the IoU of an anchor box is greater than 0.6, the anchor box is determined to be a positive sample; when the IoU of an anchor box is less than 0.25, the anchor box is considered negative. The total number of samples is set to 256 and, in view of balancing the positive and negative samples, their ratio is set to 1:1.
In horizontal-box target detection, those skilled in the art generally use 0.7 and 0.3 as the thresholds for determining positive and negative samples, since the IoU overlap is easy to compute. However, the rotated IoU fluctuates strongly under small changes in angle, which clearly differs from the IoU thresholds used in horizontal-box detection; the embodiment of the invention therefore slightly lowers the IoU thresholds to avoid the risk of misjudging positive and negative samples due to small changes in the rotation angle.
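Under these thresholds, sampling the 256 training anchors can be sketched as follows; only the 0.6/0.25 thresholds, the total of 256 and the 1:1 ratio come from the text, and the behaviour when positives are scarce is an assumption:

```python
import numpy as np

def label_anchors(ious, pos_thresh=0.6, neg_thresh=0.25,
                  num_samples=256, rng=None):
    """Sketch of the sampling rule described above.

    ious: (num_anchors, num_gt) IoU matrix against ground-truth boxes.
    Returns labels: 1 = positive (max IoU > 0.6), 0 = negative
    (max IoU < 0.25), -1 = ignored; at most 256 samples are kept,
    targeting a 1:1 positive:negative ratio.
    """
    rng = rng or np.random.default_rng(0)
    best_iou = ious.max(axis=1)
    labels = np.full(len(best_iou), -1, dtype=np.int64)
    labels[best_iou > pos_thresh] = 1
    labels[best_iou < neg_thresh] = 0

    pos_idx = np.flatnonzero(labels == 1)
    neg_idx = np.flatnonzero(labels == 0)
    n_pos = min(len(pos_idx), num_samples // 2)
    n_neg = min(len(neg_idx), num_samples - n_pos)

    sampled = np.full_like(labels, -1)
    if n_pos:
        sampled[rng.choice(pos_idx, n_pos, replace=False)] = 1
    if n_neg:
        sampled[rng.choice(neg_idx, n_neg, replace=False)] = 0
    return sampled

# Toy usage: 1000 anchors against 5 ground-truth ships.
labels = label_anchors(np.random.default_rng(1).random((1000, 5)))
print((labels == 1).sum(), (labels == 0).sum())
```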
The regression process of the rotated bounding box is as follows:

$$t_x = \frac{x - x_a}{w_a},\quad t_y = \frac{y - y_a}{h_a},\quad t_w = \log\frac{w}{w_a},\quad t_h = \log\frac{h}{h_a},\quad t_\theta = \theta - \theta_a + k\pi$$

$$t_x^* = \frac{x^* - x_a}{w_a},\quad t_y^* = \frac{y^* - y_a}{h_a},\quad t_w^* = \log\frac{w^*}{w_a},\quad t_h^* = \log\frac{h^*}{h_a},\quad t_\theta^* = \theta^* - \theta_a + k\pi$$

wherein $(x, y, w, h, \theta)$ denotes the parameterized coordinates of the predicted bounding box, $(x_a, y_a, w_a, h_a, \theta_a)$ those of the anchor box, and $(x^*, y^*, w^*, h^*, \theta^*)$ those of the ground-truth bounding box; $(t_x, t_y, t_w, t_h, t_\theta)$ are the regression parameters of the predicted bounding box, namely the correction of its center point, the corrections of its width and height, and the correction of its rotation angle; $(t_x^*, t_y^*, t_w^*, t_h^*, t_\theta^*)$ are the regression parameters between the anchor box and the ground-truth bounding box, namely the correction of the center point, the corrections of the width and height, and the correction of the rotation angle of the ground-truth bounding box; $k \in \mathbb{Z}$.
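The parameterization above can be exercised with a short sketch; wrapping the angle offset into [-90°, 90°) is one interpretation of the $k \in \mathbb{Z}$ term:

```python
import numpy as np

def encode_rbox(anchor, gt):
    """Regression targets (t*_x, t*_y, t*_w, t*_h, t*_theta) for a rotated
    box per the parameterization above; boxes are (x, y, w, h, theta) with
    theta in degrees."""
    xa, ya, wa, ha, ta = anchor
    xg, yg, wg, hg, tg = gt
    # k * pi (here k * 180 degrees) wraps the angle offset into [-90, 90).
    tt = (tg - ta + 90.0) % 180.0 - 90.0
    return np.array([(xg - xa) / wa, (yg - ya) / ha,
                     np.log(wg / wa), np.log(hg / ha), tt])

def decode_rbox(anchor, deltas):
    """Inverse of encode_rbox: recover a rotated box from its deltas."""
    xa, ya, wa, ha, ta = anchor
    tx, ty, tw, th, tt = deltas
    return np.array([xa + tx * wa, ya + ty * ha,
                     wa * np.exp(tw), ha * np.exp(th), ta + tt])

# Round trip on a toy anchor / ground-truth pair.
anchor = (50.0, 60.0, 30.0, 10.0, -45.0)
gt = (55.0, 58.0, 42.0, 12.0, -30.0)
print(np.allclose(decode_rbox(anchor, encode_rbox(anchor, gt)), gt))  # True
```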
The loss function used in the regression process is:
$$L(p_i, l_i, u_i, v_i) = \frac{1}{N_{cls}}\sum_i L_{cls}(p_i, l_i) + \frac{\lambda_1}{N_{reg\text{-}h}}\sum_j p_j L_{reg\text{-}h}(u_j^*, u_j) + \frac{\lambda_2}{N_{reg\text{-}r}}\sum_k p_k L_{reg\text{-}r}(v_k^*, v_k)$$

wherein $l_i$ denotes the class label of the target anchor box, $p_i$ the probability distribution of each layer computed by softmax, $u_i, v_i$ the parameterized coordinate vectors, and $u_i^*, v_i^*$ the offsets between the target anchor box and the ground-truth bounding box; the hyper-parameters $\lambda_1, \lambda_2$ weight the regression losses; $N_{cls}$ denotes the number of anchor boxes, $N_{reg\text{-}h}$ the number of anchor boxes participating in position regression, and $N_{reg\text{-}r}$ the number of anchor boxes participating in angle regression; $p_j, p_k$ are the probability values of belonging to the target category;

$$L_{cls}(p_i, l_i) = -\log p_{i, l_i}$$

$$L_{reg\text{-}h}(u_j^*, u_j) = \mathrm{smooth}_{L1}(u_j^* - u_j),\qquad L_{reg\text{-}r}(v_k^*, v_k) = \mathrm{smooth}_{L1}(v_k^* - v_k)$$

$$\mathrm{smooth}_{L1}(x) = \begin{cases} 0.5x^2, & |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases}$$

where $x = u_j^* - u_j$ or $x = v_k^* - v_k$.
Compared with the regression loss of the ship target detection algorithm of Zhong Weifeng et al. cited above, the embodiment of the invention adopts the smooth L1 function as the loss function. When this loss is used, as shown in FIG. 4, the gradient is x itself when |x| < 1 and is taken as 1 when |x| > 1, so that back-propagation of the gradient is guaranteed and gradient vanishing is unlikely to occur.
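A direct transcription of the smooth L1 definition above, with the gradient behaviour noted in a comment (a plain numpy sketch):

```python
import numpy as np

def smooth_l1(x):
    """smooth_L1(x) = 0.5 * x**2 if |x| < 1, else |x| - 0.5. The gradient
    is x itself near zero and is clipped to +/-1 for |x| > 1, so large
    errors cannot destabilize back-propagation."""
    ax = np.abs(x)
    return np.where(ax < 1.0, 0.5 * x ** 2, ax - 0.5)

print(smooth_l1(np.array([-2.0, -0.5, 0.0, 0.5, 2.0])))
# [1.5   0.125 0.    0.125 1.5  ]
```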
Non-maximum suppression aims to retain high-quality bounding boxes with small mutual IoU. When ships are densely distributed, conventional non-maximum suppression often faces the dilemma that the bounding boxes have a large IoU. Computing the IoU on axis-aligned bounding boxes is inaccurate for skewed, interacting bounding boxes, which in turn affects bounding-box prediction. To solve this problem, this embodiment adopts an oblique IoU computation based on the triangulation idea. The sensitive relationship between the IoU and the rotation angle θ also often affects the detection result.
A sample whose IoU is greater than 0.7 can be directly determined to be positive; one whose IoU is less than 0.3 can be directly determined to be negative. But the case of an IoU between 0.3 and 0.7 requires careful subdivision. For example, for a ship with an aspect ratio of 1:7, an angle difference of 15° already reduces the IoU to only 0.38, so directly judging such a sample positive is clearly inappropriate. In this case the judgment is made according to the angle difference, i.e., the difference between the angles of the ground-truth and predicted bounding boxes. If the angle difference is larger than 15°, the overlap is small and the sample is judged negative; if the angle difference is smaller than 15°, then although the IoU is less than 0.7 the visual overlap is high, the reduced IoU being attributable to the slight angle difference, so the sample cannot be directly judged negative. Therefore, this embodiment designs the rotated non-maximum-suppression constraint: first, candidate boxes with an IoU smaller than 0.7 are retained; then, among the candidates whose IoU lies in the range [0.3, 0.7], those whose angle difference is larger than 15° are discarded.
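The two-stage constraint can be written compactly. The sketch below assumes a caller-supplied rotated-IoU routine (the triangulation-based computation mentioned above is not detailed here), and the function and parameter names are illustrative:

```python
def rotated_nms(boxes, scores, iou_fn, iou_keep=0.7,
                iou_gray=0.3, angle_gate=15.0):
    """Sketch of the rotated NMS rule above. boxes are (x, y, w, h, theta)
    tuples with theta in degrees; iou_fn computes the rotated IoU of two
    boxes and must be supplied by the caller.

    A candidate is suppressed by a kept box when their IoU is >= 0.7, or
    when the IoU lies in [0.3, 0.7) and the angle difference exceeds
    15 degrees.
    """
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    for i in order:
        ok = True
        for j in keep:
            iou = iou_fn(boxes[i], boxes[j])
            ang = abs(boxes[i][4] - boxes[j][4])
            if iou >= iou_keep or (iou_gray <= iou and ang > angle_gate):
                ok = False
                break
        if ok:
            keep.append(i)
    return keep

# Toy call with a placeholder IoU function that treats boxes as disjoint.
print(rotated_nms([(0, 0, 70, 10, -30.0), (5, 5, 70, 10, -80.0)],
                  [0.9, 0.8], iou_fn=lambda a, b: 0.0))  # [0, 1]
```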
In order to verify the effectiveness of the multi-scale rotating ship target detection algorithm provided by the invention, the invention also provides the following verification experiment.
The experimental environment is as follows. Hardware environment: Intel Core i9 processor; NVIDIA GeForce GTX 1080 graphics card with 8 GB of video memory; 32 GB of memory. Software environment: PyCharm + TensorFlow + Python 3.6.
Dataset and hyper-parameter settings: in the experiment, the proposed target detection algorithm is tested on the HRSC2016 dataset, which comprises a training set and a test set; the training set contains 617 images with 1748 ship targets, and the test set contains 438 images with 1228 ship targets. The size distribution of the ship targets in the training and test sets is shown in fig. 5.
In the dataset, ships are divided into 3 levels: the first level is "ship", with 1 target class in total; the second level comprises 4 target classes: aircraft carrier, warship, merchant ship and submarine; the third level consists of finer-grained ship models with 25 target classes. The specific category information is shown in Table 1.
TABLE 1 HRSC2016 type information statistics table
[Table 1 is reproduced as an image in the original publication; it lists the 25 fine-grained ship classes of the HRSC2016 dataset.]
All experiments were performed under the TensorFlow deep learning framework. The experiments use the pre-trained ResNet-101 model to initialize the network.
For the HRSC2016 dataset, the experiment was trained for 40k iterations in total, with a learning rate of 0.001 for the first 20k iterations, 0.0001 for the next 10k iterations, and 0.00001 for the remaining 10k iterations. The weight decay is 0.0001 and the momentum is 0.9. The optimizer is the Momentum optimizer.
During training, the experiment randomly rotates the images and subtracts the channel means [103.939, 116.779, 123.68], which are taken from ImageNet. Subtracting the mean zero-centers every dimension of the input data, which benefits the training of the model.
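The training setup above reduces to a few lines; the following sketch shows the piecewise-constant schedule and the mean subtraction (function names are illustrative, and the optimizer wiring is omitted):

```python
import numpy as np

IMAGENET_MEAN_BGR = np.array([103.939, 116.779, 123.68])

def preprocess(image_bgr):
    """Zero-center an image with the ImageNet channel means quoted above."""
    return image_bgr.astype(np.float32) - IMAGENET_MEAN_BGR

def learning_rate(step):
    """Piecewise-constant schedule: 1e-3 for the first 20k iterations,
    1e-4 for the next 10k, 1e-5 for the final 10k; momentum 0.9 and
    weight decay 1e-4 are set on the optimizer itself."""
    if step < 20_000:
        return 1e-3
    if step < 30_000:
        return 1e-4
    return 1e-5

print(learning_rate(0), learning_rate(25_000), learning_rate(39_999))
```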
Evaluation indexes: target detection and recognition based on deep learning is evaluated in the manner of a single-class confusion matrix, computing the AP and mAP of the model on the test set. The number of targets that are actually targets and are correctly identified as targets is denoted TP; the number that are actually non-targets but are identified as targets is denoted FP; the number that are actually targets but are identified as non-targets is denoted FN; and the number that are actually non-targets and are identified as non-targets is denoted TN. Precision is the proportion of samples predicted as positive that are actually positive, i.e.

$$Precision = \frac{TP}{TP + FP}$$
Recall is the proportion of actually positive samples that are correctly predicted as positive, i.e.

$$Recall = \frac{TP}{TP + FN}$$
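Expressed in code, with purely illustrative counts:

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); Recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Illustrative counts only (not taken from the experiments below).
print(precision_recall(tp=90, fp=10, fn=20))  # (0.9, 0.8181...)
```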
Experimental results with different feature map combinations: in target detection and recognition, low-level features carry relatively little semantic information but localize targets accurately, while high-level features are semantically rich but localize targets with larger error. The choice of feature maps is therefore particularly important. In view of this, the experiment selects four different feature map combination strategies to study their influence on detection performance. The specific combination strategies are shown in Table 2.
TABLE 2 Comparison of detection performance for different feature map combinations
Feature map combination Recall (%) Accuracy (%)
P3 72.7 70.4
P2+P3 77.6 73.1
P2+P3+P4 80.9 78.3
P2+P3+P4+P5 83.2 82.7
It can be seen that the detection performance of the model is worst when only the P3 feature map is used. The detection performance improves continuously as the number of fused feature layers increases; in addition, the P2 layer mainly serves small-target detection and the P5 layer mainly serves large-target detection. When all feature maps are used, the detection performance is best: the recall is 83.2% and the accuracy is 82.7%. The multi-scale detection network is thus clearly superior to the single-scale detection network, especially for small-target detection; better results are obtained only by making full use of the effective fusion of the feature information of each layer.
Experimental results with different detection networks: in order to demonstrate that the method of the invention is more competitive than conventional computer vision detection methods, the experiment compares it with methods based on Faster RCNN and FPN; the comparison results are shown in Table 3.
TABLE 3 Comparison of the performance of different detection methods
Detection method Recall (%) Accuracy (%) Time required (s)
Faster RCNN 75.3 73.7 0.11
FPN 77.1 76.9 0.15
The method of the invention 83.2 82.7 0.18
Compared with the Faster RCNN and FPN detection frameworks, the FPN based on a multi-scale network performs better; the multi-feature-fusion FPN adopted in this experiment obtains the highest accuracy value (82.7%), higher than that reported by Zhong Weifeng et al. Compared with both detection frameworks, the detection model provided by the invention has the best detection performance and the highest recall. Clearly, the method of the invention provides superior performance on both multi-scale and high-density targets. Fig. 6 shows the detection results of ship target detection using the detection algorithm of the invention.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (7)

1. A multi-scale rotating ship target detection algorithm is characterized by comprising the following steps:
step 1, obtaining a multi-scale feature map of an input image, wherein the multi-scale feature map comprises a first class scale feature map, a second class scale feature map and a third class scale feature map with sequentially increasing scales;
step 2, performing feature fusion on the multi-scale feature map by adopting a feature pyramid network, wherein the feature pyramid network adopts a ResNet residual network as a basic framework;
step 3, inputting the feature map output by the feature pyramid network into a region proposal network through a 3 × 3 convolutional layer, classifying each anchor box with the region proposal network according to a set classification criterion, assigning each anchor box parameterized coordinates (x, y, w, h, θ), and obtaining a rotated bounding box based on regression of (x, y, w, h, θ), wherein (x, y) denotes the coordinates of the center point of the bounding box, w the width of the bounding box, h the height of the bounding box, and θ the angle between the major axis of the target and the horizontal axis, with θ ∈ [-90°, 0];
step 4, performing adaptive region-of-interest alignment on the rotated bounding boxes generated by the region proposal network to obtain a high-quality feature map;
and step 5, screening the candidate boxes of the high-quality feature map according to the set rotated non-maximum-suppression constraint to obtain the detection targets.
2. The target detection algorithm of claim 1, wherein the ResNet residual network comprises 4 layers from top to bottom, namely a P2 layer, a P3 layer, a P4 layer and a P5 layer connected in sequence; the P2 layer is used for processing the input first-class scale feature map into a second-class scale feature map and outputting it to the P3 layer; the P3 layer is used for processing the input second-class scale feature map into a third-class scale feature map and outputting it to the P4 layer; and the P4 layer is used for processing the input third-class scale feature map into a fourth-class scale feature map and generating, for each feature point of each fourth-class scale feature map, 9 anchor boxes according to the set anchor box ratios {1:7, 1:5, 1:3, 1:2, 1:1, 2:1, 3:1, 5:1, 7:1}, wherein the scale of the fourth-class scale feature map is larger than that of the third-class scale feature map.
3. The object detection algorithm of claim 1, wherein the classification criterion is: when the IoU of an anchor box is greater than 0.6, the anchor box is determined to be a positive sample; when the IoU of an anchor box is less than 0.25, the anchor box is considered negative.
4. The object detection algorithm of claim 1, wherein the regression process of the rotated bounding box is:
$$t_x = \frac{x - x_a}{w_a},\quad t_y = \frac{y - y_a}{h_a},\quad t_w = \log\frac{w}{w_a},\quad t_h = \log\frac{h}{h_a},\quad t_\theta = \theta - \theta_a + k\pi$$

$$t_x^* = \frac{x^* - x_a}{w_a},\quad t_y^* = \frac{y^* - y_a}{h_a},\quad t_w^* = \log\frac{w^*}{w_a},\quad t_h^* = \log\frac{h^*}{h_a},\quad t_\theta^* = \theta^* - \theta_a + k\pi$$

wherein $(x, y, w, h, \theta)$ denotes the parameterized coordinates of the predicted bounding box, $(x_a, y_a, w_a, h_a, \theta_a)$ those of the anchor box, and $(x^*, y^*, w^*, h^*, \theta^*)$ those of the ground-truth bounding box; $(t_x, t_y, t_w, t_h, t_\theta)$ are the regression parameters of the predicted bounding box, namely the correction of its center point, the corrections of its width and height, and the correction of its rotation angle; $(t_x^*, t_y^*, t_w^*, t_h^*, t_\theta^*)$ are the regression parameters between the anchor box and the ground-truth bounding box, namely the correction of the center point, the corrections of the width and height, and the correction of the rotation angle of the ground-truth bounding box; $k \in \mathbb{Z}$.
5. The object detection algorithm of claim 4, wherein the loss function used in the regression process is:
$$L(p_i, l_i, u_i, v_i) = \frac{1}{N_{cls}}\sum_i L_{cls}(p_i, l_i) + \frac{\lambda_1}{N_{reg\text{-}h}}\sum_j p_j L_{reg\text{-}h}(u_j^*, u_j) + \frac{\lambda_2}{N_{reg\text{-}r}}\sum_k p_k L_{reg\text{-}r}(v_k^*, v_k)$$

wherein $l_i$ denotes the class label of the target anchor box, $p_i$ the probability distribution of each layer computed by softmax, $u_i, v_i$ the parameterized coordinate vectors, and $u_i^*, v_i^*$ the offsets between the target anchor box and the ground-truth bounding box; the hyper-parameters $\lambda_1, \lambda_2$ weight the regression losses; $N_{cls}$ denotes the number of anchor boxes, $N_{reg\text{-}h}$ the number of anchor boxes participating in position regression, and $N_{reg\text{-}r}$ the number of anchor boxes participating in angle regression; $p_j, p_k$ are the probability values of belonging to the target category;

$$L_{cls}(p_i, l_i) = -\log p_{i, l_i}$$

$$L_{reg\text{-}h}(u_j^*, u_j) = \mathrm{smooth}_{L1}(u_j^* - u_j),\qquad L_{reg\text{-}r}(v_k^*, v_k) = \mathrm{smooth}_{L1}(v_k^* - v_k)$$

$$\mathrm{smooth}_{L1}(x) = \begin{cases} 0.5x^2, & |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases}$$

where $x = u_j^* - u_j$ or $x = v_k^* - v_k$.
6. The object detection algorithm of claim 1, wherein the adaptive region-of-interest alignment in step 4 comprises: first, obtaining a mask bounding box through convolutional proposal training; then, using the mask bounding box to noise-filter the rotated bounding box.
7. The target detection algorithm of claim 1, wherein the rotated non-maximum-suppression constraint is: first, candidate boxes with an IoU smaller than 0.7 are retained; then, among the candidates whose IoU lies in the range [0.3, 0.7], those whose angle difference is larger than 15° are discarded.
CN202010528579.4A 2020-06-11 2020-06-11 Multi-scale rotating ship target detection algorithm Active CN111881918B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010528579.4A CN111881918B (en) 2020-06-11 2020-06-11 Multi-scale rotating ship target detection algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010528579.4A CN111881918B (en) 2020-06-11 2020-06-11 Multi-scale rotating ship target detection algorithm

Publications (2)

Publication Number Publication Date
CN111881918A true CN111881918A (en) 2020-11-03
CN111881918B CN111881918B (en) 2022-10-04

Family

ID=73156757

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010528579.4A Active CN111881918B (en) 2020-06-11 2020-06-11 Multi-scale rotating ship target detection algorithm

Country Status (1)

Country Link
CN (1) CN111881918B (en)

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112395969A (en) * 2020-11-13 2021-02-23 中国人民解放军空军工程大学 Remote sensing image rotating ship detection method based on characteristic pyramid
CN112668648A (en) * 2020-12-29 2021-04-16 西安电子科技大学 Infrared and visible light fusion identification method based on symmetric fusion network
CN112766194A (en) * 2021-01-26 2021-05-07 上海海洋大学 Detection method for mesoscale ocean eddy
CN112766221A (en) * 2021-02-01 2021-05-07 福州大学 Ship direction and position multitask-based SAR image ship target detection method
CN112800955A (en) * 2021-01-27 2021-05-14 中国人民解放军战略支援部队信息工程大学 Remote sensing image rotating target detection method and system based on weighted bidirectional feature pyramid
CN112883887A (en) * 2021-03-01 2021-06-01 中央财经大学 Building example automatic extraction method based on high spatial resolution optical remote sensing image
CN113012153A (en) * 2021-04-30 2021-06-22 武汉纺织大学 Aluminum profile flaw detection method
CN113033672A (en) * 2021-03-29 2021-06-25 西安电子科技大学 Multi-class optical image rotating target self-adaptive detection method based on feature enhancement
CN113095373A (en) * 2021-03-22 2021-07-09 南京邮电大学 Ship detection method and system based on self-adaptive position prediction and capable of detecting any rotation angle
CN113420648A (en) * 2021-06-22 2021-09-21 深圳市华汉伟业科技有限公司 Target detection method and system with rotation adaptability
CN113536936A (en) * 2021-06-17 2021-10-22 中国人民解放军海军航空大学航空作战勤务学院 Ship target detection method and system
CN115294452A (en) * 2022-08-08 2022-11-04 中国人民解放军火箭军工程大学 Rotary SAR ship target detection method based on bidirectional characteristic pyramid network
CN116310837A (en) * 2023-04-11 2023-06-23 安徽大学 SAR ship target rotation detection method and system
CN116823838A (en) * 2023-08-31 2023-09-29 武汉理工大学三亚科教创新园 Ocean ship detection method and system with Gaussian prior label distribution and characteristic decoupling

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111027547A (en) * 2019-12-06 2020-04-17 南京大学 Automatic detection method for multi-scale polymorphic target in two-dimensional image
CN111091105A (en) * 2019-12-23 2020-05-01 郑州轻工业大学 Remote sensing image target detection method based on new frame regression loss function

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111027547A (en) * 2019-12-06 2020-04-17 南京大学 Automatic detection method for multi-scale polymorphic target in two-dimensional image
CN111091105A (en) * 2019-12-23 2020-05-01 郑州轻工业大学 Remote sensing image target detection method based on new frame regression loss function

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zhou Hui et al.: "Ship target detection in high-resolution remote sensing images based on a feature pyramid model," Journal of Dalian Maritime University *

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112395969A (en) * 2020-11-13 2021-02-23 中国人民解放军空军工程大学 Remote sensing image rotating ship detection method based on characteristic pyramid
CN112668648A (en) * 2020-12-29 2021-04-16 西安电子科技大学 Infrared and visible light fusion identification method based on symmetric fusion network
CN112668648B (en) * 2020-12-29 2023-06-20 西安电子科技大学 Infrared and visible light fusion recognition method based on symmetrical fusion network
CN112766194A (en) * 2021-01-26 2021-05-07 上海海洋大学 Detection method for mesoscale ocean eddy
CN112800955A (en) * 2021-01-27 2021-05-14 中国人民解放军战略支援部队信息工程大学 Remote sensing image rotating target detection method and system based on weighted bidirectional feature pyramid
CN112766221A (en) * 2021-02-01 2021-05-07 福州大学 Ship direction and position multitask-based SAR image ship target detection method
CN112766221B (en) * 2021-02-01 2022-06-14 福州大学 Ship direction and position multitasking-based SAR image ship target detection method
CN112883887A (en) * 2021-03-01 2021-06-01 中央财经大学 Building example automatic extraction method based on high spatial resolution optical remote sensing image
CN113095373B (en) * 2021-03-22 2022-09-27 南京邮电大学 Ship detection method and system based on self-adaptive position prediction and capable of detecting any rotation angle
CN113095373A (en) * 2021-03-22 2021-07-09 南京邮电大学 Ship detection method and system based on self-adaptive position prediction and capable of detecting any rotation angle
CN113033672B (en) * 2021-03-29 2023-07-28 西安电子科技大学 Multi-class optical image rotation target self-adaptive detection method based on feature enhancement
CN113033672A (en) * 2021-03-29 2021-06-25 西安电子科技大学 Multi-class optical image rotating target self-adaptive detection method based on feature enhancement
CN113012153A (en) * 2021-04-30 2021-06-22 武汉纺织大学 Aluminum profile flaw detection method
CN113536936A (en) * 2021-06-17 2021-10-22 中国人民解放军海军航空大学航空作战勤务学院 Ship target detection method and system
CN113420648A (en) * 2021-06-22 2021-09-21 深圳市华汉伟业科技有限公司 Target detection method and system with rotation adaptability
CN115294452A (en) * 2022-08-08 2022-11-04 中国人民解放军火箭军工程大学 Rotary SAR ship target detection method based on bidirectional characteristic pyramid network
CN116310837A (en) * 2023-04-11 2023-06-23 安徽大学 SAR ship target rotation detection method and system
CN116310837B (en) * 2023-04-11 2024-04-23 安徽大学 SAR ship target rotation detection method and system
CN116823838A (en) * 2023-08-31 2023-09-29 武汉理工大学三亚科教创新园 Ocean ship detection method and system with Gaussian prior label distribution and characteristic decoupling
CN116823838B (en) * 2023-08-31 2023-11-14 武汉理工大学三亚科教创新园 Ocean ship detection method and system with Gaussian prior label distribution and characteristic decoupling

Also Published As

Publication number Publication date
CN111881918B (en) 2022-10-04

Similar Documents

Publication Publication Date Title
CN111881918B (en) Multi-scale rotating ship target detection algorithm
CN110276269B (en) Remote sensing image target detection method based on attention mechanism
CN109583425B (en) Remote sensing image ship integrated recognition method based on deep learning
CN111563473B (en) Remote sensing ship identification method based on dense feature fusion and pixel level attention
Xu et al. Scale-aware feature pyramid architecture for marine object detection
CN112560671B (en) Ship detection method based on rotary convolution neural network
CN110728658A (en) High-resolution remote sensing image weak target detection method based on deep learning
CN111179285B (en) Image processing method, system and storage medium
CN111091095B (en) Method for detecting ship target in remote sensing image
US20080040083A1 (en) System and Method for Solid Component Evaluation in Mixed Ground Glass Nodules
CN106651880B (en) Offshore moving target detection method based on multi-feature fusion thermal infrared remote sensing image
CN111476159A (en) Method and device for training and detecting detection model based on double-angle regression
CN111027445B (en) Marine ship target identification method
CN111914804A (en) Multi-angle rotation remote sensing image small target detection method
CN112800955A (en) Remote sensing image rotating target detection method and system based on weighted bidirectional feature pyramid
CN111723632A (en) Ship tracking method and system based on twin network
CN110309808B (en) Self-adaptive smoke root node detection method in large-scale space
CN114627156A (en) Consumption-level unmanned aerial vehicle video moving target accurate tracking method
CN107808165B (en) Infrared image matching method based on SUSAN corner detection
Shi et al. Obstacle type recognition in visual images via dilated convolutional neural network for unmanned surface vehicles
CN116311387B (en) Cross-modal pedestrian re-identification method based on feature intersection
von Braun et al. Utilizing mask R-CNN for waterline detection in CANOE sprint video analysis
Wang et al. High-quality angle prediction for oriented object detection in remote sensing images
CN113409325B (en) Large-breadth SAR image ship target detection and identification method based on fine segmentation
CN115376007A (en) Object detection method, device, equipment, medium and computer program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant