CN111881918A - Multi-scale rotating ship target detection algorithm

Multi-scale rotating ship target detection algorithm

Info

Publication number
CN111881918A
CN111881918A (application CN202010528579.4A)
Authority
CN
China
Prior art keywords
feature map
scale feature
network
anchor
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010528579.4A
Other languages
Chinese (zh)
Other versions
CN111881918B (en)
Inventor
刘建辉
江刚武
王鑫
张锐
徐佰祺
谭熊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Information Engineering University of PLA Strategic Support Force
Original Assignee
Information Engineering University of PLA Strategic Support Force
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Information Engineering University of PLA Strategic Support Force filed Critical Information Engineering University of PLA Strategic Support Force
Priority to CN202010528579.4A priority Critical patent/CN111881918B/en
Publication of CN111881918A publication Critical patent/CN111881918A/en
Application granted granted Critical
Publication of CN111881918B publication Critical patent/CN111881918B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G06V10/464 Salient features, e.g. scale invariant feature transforms [SIFT] using a plurality of salient features, e.g. bag-of-words [BoW] representations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection

Abstract

The invention provides a multi-scale rotating ship target detection algorithm. The algorithm comprises the following steps: acquiring a multi-scale feature map of an input image, wherein the multi-scale feature map comprises a first-class scale feature map, a second-class scale feature map and a third-class scale feature map of sequentially increasing scale; performing feature fusion on the multi-scale feature map with a feature pyramid network, wherein the feature pyramid network adopts a ResNet residual network as its basic framework; inputting the feature map output by the feature pyramid network into a region proposal network through a 3 × 3 convolutional layer, classifying each anchor box with the region proposal network according to a set classification criterion, assigning each anchor box parameterized coordinates, and obtaining a rotated bounding box based on regression of those coordinates; performing adaptive region-of-interest alignment on the rotated bounding boxes generated by the region proposal network to obtain a high-quality feature map; and screening the candidate boxes of the high-quality feature map according to a set rotated non-maximum-suppression constraint to obtain the detection targets.

Description

Multi-scale rotating ship target detection algorithm
Technical Field
The invention relates to the technical field of ship detection, in particular to a multi-scale rotating ship target detection algorithm.
Background
With the development of remote sensing technology, the acquisition of high-resolution remote sensing images has become easier. Automatic ship detection has long played an important role in the field of remote sensing and has driven great progress in port management, cargo transportation, rescue at sea, and the like. Meanwhile, the berthing and heading information of a ship is of great significance. However, the large aspect ratio of ships makes their detection more difficult than that of other objects (e.g., vehicles, buildings, aircraft).
In recent years, deep learning has been highly successful in the field of computer vision. Schemes based on the region-based convolutional neural network (R-CNN) provide a good approach to target detection, with results far better than those of traditional detection methods, but R-CNN has obvious shortcomings in computation speed and storage. Fast R-CNN significantly improves detection efficiency through shared computation and effectively reduces storage requirements, and Faster R-CNN replaces Selective Search with a Region Proposal Network (RPN) to realize end-to-end training, further improving detection efficiency and accuracy. With the application of deep CNNs to target detection, deep-learning-based ship detection algorithms have been widely applied to remote sensing ship detection. Kang M. et al. (M. Kang, X. Leng, Z. Lin, and K. Ji, "A modified Faster R-CNN based on CFAR algorithm for SAR ship detection," International Workshop on Remote Sensing with Intelligent Processing, IEEE, pp. 1-4, 2017) use the object proposals generated by Faster R-CNN as the guard windows of a CFAR algorithm to pick up small targets, thereby re-evaluating the bounding boxes with relatively low classification scores in the detection network. Zhang R. et al. (R. Zhang, J. Yao, K. Zhang, C. Feng, and J. Zhang, "S-CNN-based ship detection from high-resolution remote sensing images," ISPRS International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, vol. XLI-B7, pp. 423-430, 2016) proposed a new CNN-based ship detection model, called S-CNN, which combines an improved saliency detection method with proposals extracted from specifically designed ship models. Kang M. et al. (M. Kang, K. Ji, X. Leng, and Z. Lin, "Contextual Region-Based Convolutional Neural Network with Multilayer Fusion for SAR Ship Detection," Remote Sens., vol. 9, no. 8, p. 860, 2017) construct a contextual region-based CNN with multilayer fusion for SAR ship detection, a carefully designed deep network consisting of an RPN with a high-resolution network and an object detection network with contextual features. Tang et al. (J. Tang, C. Deng, G. B. Huang, and B. Zhang, "Compressed-domain ship detection on spaceborne optical image using deep neural network and extreme learning machine," IEEE Trans. Geosci. Remote Sens., vol. 53, no. 3, pp. 1174-1185, 2014) use the compressed domain to quickly extract ship candidates, a deep neural network (DNN) for high-level feature representation and classification, and an extreme learning machine (ELM) for efficient feature pooling and decision making.
However, the above-described methods perform target detection on the basis of horizontal regions. In remote sensing satellite images the aspect ratio of a ship is large, ships tend to be densely distributed in complex scenes, and when a ship is inclined, the overlap between the redundant area of its horizontal bounding box and neighbouring ships is relatively large. The complex scene and the large redundant areas introduce a large amount of noise, so that the feature information is interfered with or even submerged, which affects the accuracy of the detection result; moreover, the many redundant regions contained in horizontal bounding boxes also hinder the operation of non-maximum suppression.
At present, methods that detect ship targets based on a rotated region also exist, for example Zhong Weifeng et al., "Ship target detection model of rotated rectangular regions in remote sensing images [J]," Journal of Computer-Aided Design & Computer Graphics, 2019, 31(11): 1935-. However, the coordinate regression of such models is not accurate enough; and because feature pooling over-emphasizes the ship itself, the influence of the surrounding environment on the classifier is neglected, which causes sample misclassification.
Disclosure of Invention
The invention provides a multi-scale rotating ship target detection algorithm, aiming to solve the problems that ship target detection methods based on horizontal regions contain a large number of redundant regions, making the detection result inaccurate, and that existing ship target detection methods based on rotated regions suffer from inaccurate coordinate regression and sample misclassification.
The invention provides a multi-scale rotating ship target detection algorithm, which comprises the following steps:
step 1, obtaining a multi-scale feature map of an input image, wherein the multi-scale feature map comprises a first class scale feature map, a second class scale feature map and a third class scale feature map with sequentially increasing scales;
step 2, performing feature fusion on the multi-scale feature map by adopting a feature pyramid network, wherein the feature pyramid network adopts a ResNet residual network as a basic framework;
step 3, inputting the feature map output by the feature pyramid network into a region proposal network through a 3 × 3 convolutional layer, classifying each anchor box with the region proposal network according to a set classification criterion, assigning each anchor box parameterized coordinates (x, y, w, h, θ), and obtaining a rotated bounding box based on regression of (x, y, w, h, θ), wherein (x, y) denotes the coordinates of the center point of the bounding box, w the width of the bounding box, h the height of the bounding box, and θ the angle between the major axis of the target and the horizontal axis, with θ ∈ [-90°, 0];
step 4, performing adaptive region-of-interest alignment on the rotated bounding boxes generated by the region proposal network to obtain a high-quality feature map;
and step 5, screening the candidate boxes of the high-quality feature map according to the set rotated non-maximum-suppression constraint to obtain the detection targets.
Further, the ResNet residual network comprises 4 layers from top to bottom, namely a P2 layer, a P3 layer, a P4 layer and a P5 layer connected in sequence; the P2 layer is used for processing the input first-class scale feature map into a second-class scale feature map and outputting it to the P3 layer; the P3 layer is used for processing the input second-class scale feature map into a third-class scale feature map and outputting it to the P4 layer; and the P4 layer is used for processing the input third-class scale feature map into a fourth-class scale feature map and generating, for each feature point of each fourth-class scale feature map, 9 anchor boxes according to the set anchor box ratios {1:7, 1:5, 1:3, 1:2, 1:1, 2:1, 3:1, 5:1, 7:1}, wherein the scale of the fourth-class scale feature map is larger than that of the third-class scale feature map.
Further, the classification criterion is: when the IoU of an anchor box is greater than 0.6, the anchor box is determined to be a positive sample; when the IoU of an anchor box is less than 0.25, the anchor box is considered negative.
Further, the regression process of the rotating bounding box is as follows:
$$t_x = \frac{x - x_a}{w_a},\quad t_y = \frac{y - y_a}{h_a},\quad t_w = \log\frac{w}{w_a},\quad t_h = \log\frac{h}{h_a},\quad t_\theta = \theta - \theta_a + k\pi$$

$$t_x^* = \frac{x^* - x_a}{w_a},\quad t_y^* = \frac{y^* - y_a}{h_a},\quad t_w^* = \log\frac{w^*}{w_a},\quad t_h^* = \log\frac{h^*}{h_a},\quad t_\theta^* = \theta^* - \theta_a + k\pi$$

wherein $(x, y, w, h, \theta)$ denotes the parameterized coordinates of the predicted bounding box, $(x_a, y_a, w_a, h_a, \theta_a)$ those of the anchor box, and $(x^*, y^*, w^*, h^*, \theta^*)$ those of the ground-truth bounding box; $(t_x, t_y, t_w, t_h, t_\theta)$ are the regression parameters of the predicted bounding box, namely the correction of its center point, the corrections of its width and height, and the correction of its rotation angle; $(t_x^*, t_y^*, t_w^*, t_h^*, t_\theta^*)$ are the regression parameters between the anchor box and the ground-truth bounding box, namely the correction of the center point, the corrections of the width and height, and the correction of the rotation angle of the ground-truth bounding box; $k \in \mathbb{Z}$.
Further, the loss function used in the regression process is:
$$L(p_i, l_i, u_i, v_i) = \frac{1}{N_{cls}}\sum_i L_{cls}(p_i, l_i) + \frac{\lambda_1}{N_{reg\text{-}h}}\sum_j p_j L_{reg\text{-}h}(u_j^*, u_j) + \frac{\lambda_2}{N_{reg\text{-}r}}\sum_k p_k L_{reg\text{-}r}(v_k^*, v_k)$$

wherein $l_i$ denotes the class label of the target anchor box, $p_i$ the probability distribution of each layer computed by softmax, $u_i, v_i$ the parameterized coordinate vectors, and $u_i^*, v_i^*$ the offsets between the target anchor box and the ground-truth bounding box; the hyper-parameters $\lambda_1, \lambda_2$ weight the regression losses; $N_{cls}$ denotes the number of anchor boxes, $N_{reg\text{-}h}$ the number of anchor boxes participating in position regression, and $N_{reg\text{-}r}$ the number of anchor boxes participating in angle regression; $p_j, p_k$ are the probability values of belonging to the target category;

$$L_{cls}(p_i, l_i) = -\log p_{i, l_i}$$

$$L_{reg\text{-}h}(u_j^*, u_j) = \mathrm{smooth}_{L1}(u_j^* - u_j),\qquad L_{reg\text{-}r}(v_k^*, v_k) = \mathrm{smooth}_{L1}(v_k^* - v_k)$$

$$\mathrm{smooth}_{L1}(x) = \begin{cases} 0.5x^2, & |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases}$$

where $x = u_j^* - u_j$ or $x = v_k^* - v_k$.
Further, the adaptive region-of-interest alignment in step 4 comprises: first, obtaining a mask bounding box through convolutional proposal training; then, using the mask bounding box to noise-filter the rotated bounding box.
Further, the rotated non-maximum-suppression constraint is: first, candidate boxes with an IoU smaller than 0.7 are retained; then, among the candidates whose IoU lies in the range [0.3, 0.7], those whose angle difference is larger than 15° are discarded.
The invention has the beneficial effects that:
according to the multi-scale rotating ship target detection algorithm provided by the embodiment of the invention, firstly, a characteristic pyramid network is used as a basic network of a detection frame, and the network can effectively integrate low-level position information and high-level semantic information to provide higher-level characteristic information for target detection; secondly, a self-adaptive region-of-interest alignment method is adopted to reduce the influence of redundant noise regions in the scheme and keep the integrity of semantic information and spatial information; and then, a rotation non-maximum value inhibition technology is adopted, and the redundancy of the rotating target is more strictly restricted, so that the algorithm can accurately predict the rotation boundary frame of the ship target.
Drawings
FIG. 1 is a schematic diagram of a detection framework of a multi-scale rotating ship target detection algorithm provided by an embodiment of the invention;
fig. 2 is a schematic structural diagram of a feature pyramid network according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating adaptive region of interest alignment according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating the smooth L1 function according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a distribution of sizes of ship targets in a training set and a test set provided by an embodiment of the present invention;
fig. 6 is a schematic diagram of a detection result of ship target detection by using the target detection algorithm of the present invention according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly described below with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a schematic view of a detection framework of a multi-scale rotating ship target detection algorithm provided in an embodiment of the present invention, and based on the detection framework, an embodiment of the present invention provides a multi-scale rotating ship target detection algorithm, which includes the following steps:
s101, obtaining a multi-scale feature map of an input image, wherein the multi-scale feature map comprises a first class of scale feature map, a second class of scale feature map and a third class of scale feature map, and scales of the first class of scale feature map, the second class of scale feature map and the third class of scale feature map are sequentially increased;
s102, performing feature fusion on the multi-scale feature map by adopting a feature pyramid network, wherein the feature pyramid network adopts a ResNet residual error network as a basic framework;
s103, inputting a feature map output by the feature pyramid network into a region suggestion network through a 3 x 3 convolutional layer, classifying each anchor frame by adopting the region suggestion network according to a set classification judgment condition, endowing each anchor frame with parameter coordinates (x, y, w, h and theta), and obtaining a rotating boundary frame based on regression of the parameter coordinates (x, y, w, h and theta), wherein (x and y) represent coordinates of a central point of the boundary frame, w represents width of the boundary frame, h represents height of the boundary frame, theta represents an included angle between a main shaft direction and a horizontal shaft of a target, and theta belongs to [ -90 DEG, 0 ];
s104, performing self-adaptive interest region alignment on a rotating bounding box generated by the region suggestion network to obtain a high-quality feature map;
specifically, the adaptive interest region alignment process in this step includes: firstly, a mask bounding box is obtained through convolution suggestion training; the mask bounding box is then used to noise filter the rotating bounding box.
For example, as shown in fig. 3, the embodiment of the present invention compares three methods for obtaining a fixed-length feature vector: (a) ordinary ROI alignment; (b) rotated ROI alignment; (c) adaptive ROI alignment; where ROI denotes the region of interest.
As can be seen from fig. 3: ordinary ROI alignment retains a lot of noise, so that the target features are drowned out; rotated ROI alignment removes all noise through an affine transformation but loses the spatial information of the target; the adaptive ROI alignment designed by the invention automatically filters the noise region by introducing a mask, so that only a small amount of noise remains while the spatial information is kept, the stability of the network is improved, and a high-quality feature map is obtained.
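As a purely illustrative sketch of this masking idea (the text does not specify the mask branch or its training, so the function and argument names below are hypothetical), the noise-filtering step reduces to an element-wise product between ROI-aligned features and a binary mask:

```python
import numpy as np

def adaptive_roi_filter(roi_feat, mask):
    """Illustrative only: zero out noise regions of an ROI-aligned feature
    map with a binary mask derived from the mask bounding box, keeping the
    target's spatial layout intact.

    roi_feat: (H, W, C) ROI-aligned features; mask: (H, W) in {0, 1}.
    """
    return roi_feat * mask[..., None]

# Toy usage: a 7 x 7 ROI with 256 channels; the mask covers the band that
# the inclined ship occupies, everything else is filtered out.
feat = np.random.rand(7, 7, 256)
mask = np.zeros((7, 7))
mask[2:5, :] = 1.0
print(adaptive_roi_filter(feat, mask).shape)  # (7, 7, 256)
```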
And S105, screening the candidate boxes of the high-quality feature map according to the set rotated non-maximum-suppression constraint to obtain the detection targets.
According to the multi-scale rotating ship target detection algorithm provided by the embodiment of the invention: first, a feature pyramid network is used as the base network of the detection framework, which can effectively integrate low-level position information with high-level semantic information to provide richer feature information for target detection; second, an adaptive region-of-interest alignment method is adopted to reduce the influence of redundant noise regions and preserve the integrity of the semantic and spatial information; then, a rotated non-maximum-suppression technique constrains the redundancy of rotated targets more strictly, so that the algorithm can accurately predict the rotated bounding box of a ship target.
On the basis of the above embodiment, as shown in fig. 2, the ResNet residual network in step S102 comprises 4 layers from top to bottom, namely a P2 layer, a P3 layer, a P4 layer and a P5 layer connected in sequence; the P2 layer processes the input first-class scale feature map into a second-class scale feature map and outputs it to the P3 layer; the P3 layer processes the input second-class scale feature map into a third-class scale feature map and outputs it to the P4 layer; and the P4 layer processes the input third-class scale feature map into a fourth-class scale feature map and generates, for each feature point of each fourth-class scale feature map, 9 anchor boxes according to the set anchor box ratios {1:7, 1:5, 1:3, 1:2, 1:1, 2:1, 3:1, 5:1, 7:1}, the scale of the fourth-class scale feature map being larger than that of the third-class scale feature map.
Specifically, in the feature extraction stage, a ResNet residual network is used as the basic framework; according to the structure of the residual network, the downsampling factors of the feature maps are {4, 8, 16, 32}. In the top-down network, the invention obtains higher-resolution features by interconnecting and fusing feature maps of different sizes.
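The top-down fusion just described can be pictured in a few lines of numpy. This is a minimal, framework-agnostic sketch rather than the patented implementation: the random 1 × 1 lateral projections stand in for learned convolutions, and nearest-neighbour upsampling by a factor of 2 is assumed:

```python
import numpy as np

def fpn_top_down_merge(c_feats, out_channels=256, seed=0):
    """Minimal sketch of FPN top-down fusion.

    c_feats: backbone feature maps [C2, C3, C4, C5] of shape (H, W, C),
    taken at strides {4, 8, 16, 32}. Lateral 1x1 convolutions are stood in
    for by fixed random projections; a real network would learn them.
    """
    rng = np.random.default_rng(seed)
    # Lateral 1x1 "convolutions": project every level to out_channels.
    laterals = []
    for feat in c_feats:
        w = rng.standard_normal((feat.shape[-1], out_channels)) * 0.01
        laterals.append(feat @ w)
    # Top-down pathway: upsample by 2 (nearest neighbour) and add.
    merged = [laterals[-1]]
    for lat in reversed(laterals[:-1]):
        up = merged[0].repeat(2, axis=0).repeat(2, axis=1)
        merged.insert(0, lat + up[:lat.shape[0], :lat.shape[1]])
    return merged  # [P2, P3, P4, P5], all with out_channels channels

# Toy usage: feature maps of a 256 x 256 input at strides {4, 8, 16, 32}.
c_feats = [np.ones((256 // s, 256 // s, c))
           for s, c in zip([4, 8, 16, 32], [256, 512, 1024, 2048])]
print([p.shape for p in fpn_top_down_merge(c_feats)])
```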
As an implementable manner, in order to reduce the number of parameters, this embodiment sets the number of channels of all feature maps to 256. In view of the characteristics of ships, the anchor box ratios are set to {1:7, 1:5, 1:3, 1:2, 1:1, 2:1, 3:1, 5:1, 7:1}, so that each feature point of each feature map generates 9 anchor boxes according to these ratios. Since each anchor box carries a binary label (indicating whether it is a positive or negative sample) and five-parameter coordinates, the output of each classification layer has 9 × 2 = 18 channels and the output of each regression layer has 9 × 5 = 45 channels.
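A sketch of this anchor generation rule follows: 9 anchors per feature point at the ship-oriented aspect ratios listed above. The base anchor size is an assumption, since the text fixes only the ratio set:

```python
import numpy as np

# Ship-oriented aspect ratios (w : h) from the text.
ANCHOR_RATIOS = [(1, 7), (1, 5), (1, 3), (1, 2), (1, 1),
                 (2, 1), (3, 1), (5, 1), (7, 1)]

def make_anchors(feat_h, feat_w, stride, base_size=32):
    """Sketch: 9 anchor boxes (x_ctr, y_ctr, w, h) per feature point.

    base_size (the side of the square base anchor) is an assumption. The
    anchor area is kept constant while the w : h ratio varies.
    """
    anchors = []
    area = float(base_size * base_size)
    for fy in range(feat_h):
        for fx in range(feat_w):
            cx, cy = (fx + 0.5) * stride, (fy + 0.5) * stride
            for rw, rh in ANCHOR_RATIOS:
                w = np.sqrt(area * rw / rh)  # w / h == rw / rh, w * h == area
                anchors.append((cx, cy, w, area / w))
    return np.array(anchors)

print(make_anchors(feat_h=4, feat_w=4, stride=16).shape)  # (4*4*9, 4) = (144, 4)
```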
On the basis of the above embodiments, the region proposal network in step S103 is mainly responsible for 2 tasks: first, classifying each anchor box; then, regressing each anchor box to obtain a rotated bounding box.
The classification process is as follows: during the training of the region proposal network, each anchor box carries a binary class label and five-parameter coordinates. The feature map output by the P5 layer is input into the region proposal network through the 3 × 3 convolutional layer and is then classified and regressed by two 1 × 1 convolutional layers. Because negative samples account for the largest proportion of all samples, classifying and regressing every anchor box would cause a large amount of data redundancy, so a certain number of positive and negative samples need to be extracted and sent to the subsequent network. The classification criterion adopted in this embodiment is: when the IoU of an anchor box is greater than 0.6, the anchor box is determined to be a positive sample; when the IoU of an anchor box is less than 0.25, the anchor box is considered negative. The total number of samples is set to 256 and, in view of balancing the positive and negative samples, their ratio is set to 1:1.
In horizontal-box target detection, those skilled in the art generally use 0.7 and 0.3 as the thresholds for determining positive and negative samples, since the IoU overlap is easy to compute. However, the rotated IoU fluctuates strongly under small changes in angle, which clearly differs from the IoU thresholds used in horizontal-box detection; the embodiment of the invention therefore slightly lowers the IoU thresholds to avoid the risk of misjudging positive and negative samples due to small changes in the rotation angle.
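Under these thresholds, sampling the 256 training anchors can be sketched as follows; only the 0.6/0.25 thresholds, the total of 256 and the 1:1 ratio come from the text, and the behaviour when positives are scarce is an assumption:

```python
import numpy as np

def label_anchors(ious, pos_thresh=0.6, neg_thresh=0.25,
                  num_samples=256, rng=None):
    """Sketch of the sampling rule described above.

    ious: (num_anchors, num_gt) IoU matrix against ground-truth boxes.
    Returns labels: 1 = positive (max IoU > 0.6), 0 = negative
    (max IoU < 0.25), -1 = ignored; at most 256 samples are kept,
    targeting a 1:1 positive:negative ratio.
    """
    rng = rng or np.random.default_rng(0)
    best_iou = ious.max(axis=1)
    labels = np.full(len(best_iou), -1, dtype=np.int64)
    labels[best_iou > pos_thresh] = 1
    labels[best_iou < neg_thresh] = 0

    pos_idx = np.flatnonzero(labels == 1)
    neg_idx = np.flatnonzero(labels == 0)
    n_pos = min(len(pos_idx), num_samples // 2)
    n_neg = min(len(neg_idx), num_samples - n_pos)

    sampled = np.full_like(labels, -1)
    if n_pos:
        sampled[rng.choice(pos_idx, n_pos, replace=False)] = 1
    if n_neg:
        sampled[rng.choice(neg_idx, n_neg, replace=False)] = 0
    return sampled

# Toy usage: 1000 anchors against 5 ground-truth ships.
labels = label_anchors(np.random.default_rng(1).random((1000, 5)))
print((labels == 1).sum(), (labels == 0).sum())
```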
The regression process of the rotated bounding box is as follows:

$$t_x = \frac{x - x_a}{w_a},\quad t_y = \frac{y - y_a}{h_a},\quad t_w = \log\frac{w}{w_a},\quad t_h = \log\frac{h}{h_a},\quad t_\theta = \theta - \theta_a + k\pi$$

$$t_x^* = \frac{x^* - x_a}{w_a},\quad t_y^* = \frac{y^* - y_a}{h_a},\quad t_w^* = \log\frac{w^*}{w_a},\quad t_h^* = \log\frac{h^*}{h_a},\quad t_\theta^* = \theta^* - \theta_a + k\pi$$

wherein $(x, y, w, h, \theta)$ denotes the parameterized coordinates of the predicted bounding box, $(x_a, y_a, w_a, h_a, \theta_a)$ those of the anchor box, and $(x^*, y^*, w^*, h^*, \theta^*)$ those of the ground-truth bounding box; $(t_x, t_y, t_w, t_h, t_\theta)$ are the regression parameters of the predicted bounding box, namely the correction of its center point, the corrections of its width and height, and the correction of its rotation angle; $(t_x^*, t_y^*, t_w^*, t_h^*, t_\theta^*)$ are the regression parameters between the anchor box and the ground-truth bounding box, namely the correction of the center point, the corrections of the width and height, and the correction of the rotation angle of the ground-truth bounding box; $k \in \mathbb{Z}$.
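The parameterization above can be exercised with a short sketch; wrapping the angle offset into [-90°, 90°) is one interpretation of the $k \in \mathbb{Z}$ term:

```python
import numpy as np

def encode_rbox(anchor, gt):
    """Regression targets (t*_x, t*_y, t*_w, t*_h, t*_theta) for a rotated
    box per the parameterization above; boxes are (x, y, w, h, theta) with
    theta in degrees."""
    xa, ya, wa, ha, ta = anchor
    xg, yg, wg, hg, tg = gt
    # k * pi (here k * 180 degrees) wraps the angle offset into [-90, 90).
    tt = (tg - ta + 90.0) % 180.0 - 90.0
    return np.array([(xg - xa) / wa, (yg - ya) / ha,
                     np.log(wg / wa), np.log(hg / ha), tt])

def decode_rbox(anchor, deltas):
    """Inverse of encode_rbox: recover a rotated box from its deltas."""
    xa, ya, wa, ha, ta = anchor
    tx, ty, tw, th, tt = deltas
    return np.array([xa + tx * wa, ya + ty * ha,
                     wa * np.exp(tw), ha * np.exp(th), ta + tt])

# Round trip on a toy anchor / ground-truth pair.
anchor = (50.0, 60.0, 30.0, 10.0, -45.0)
gt = (55.0, 58.0, 42.0, 12.0, -30.0)
print(np.allclose(decode_rbox(anchor, encode_rbox(anchor, gt)), gt))  # True
```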
The loss function used in the regression process is:
$$L(p_i, l_i, u_i, v_i) = \frac{1}{N_{cls}}\sum_i L_{cls}(p_i, l_i) + \frac{\lambda_1}{N_{reg\text{-}h}}\sum_j p_j L_{reg\text{-}h}(u_j^*, u_j) + \frac{\lambda_2}{N_{reg\text{-}r}}\sum_k p_k L_{reg\text{-}r}(v_k^*, v_k)$$

wherein $l_i$ denotes the class label of the target anchor box, $p_i$ the probability distribution of each layer computed by softmax, $u_i, v_i$ the parameterized coordinate vectors, and $u_i^*, v_i^*$ the offsets between the target anchor box and the ground-truth bounding box; the hyper-parameters $\lambda_1, \lambda_2$ weight the regression losses; $N_{cls}$ denotes the number of anchor boxes, $N_{reg\text{-}h}$ the number of anchor boxes participating in position regression, and $N_{reg\text{-}r}$ the number of anchor boxes participating in angle regression; $p_j, p_k$ are the probability values of belonging to the target category;

$$L_{cls}(p_i, l_i) = -\log p_{i, l_i}$$

$$L_{reg\text{-}h}(u_j^*, u_j) = \mathrm{smooth}_{L1}(u_j^* - u_j),\qquad L_{reg\text{-}r}(v_k^*, v_k) = \mathrm{smooth}_{L1}(v_k^* - v_k)$$

$$\mathrm{smooth}_{L1}(x) = \begin{cases} 0.5x^2, & |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases}$$

where $x = u_j^* - u_j$ or $x = v_k^* - v_k$.
Compared with the regression loss of the ship target detection algorithm of Zhong Weifeng et al. cited above, the embodiment of the invention adopts the smooth L1 function as the loss function. When this loss is used, as shown in FIG. 4, the gradient is x itself when |x| < 1 and is taken as 1 when |x| > 1, so that back-propagation of the gradient is guaranteed and gradient vanishing is unlikely to occur.
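A direct transcription of the smooth L1 definition above, with the gradient behaviour noted in a comment (a plain numpy sketch):

```python
import numpy as np

def smooth_l1(x):
    """smooth_L1(x) = 0.5 * x**2 if |x| < 1, else |x| - 0.5. The gradient
    is x itself near zero and is clipped to +/-1 for |x| > 1, so large
    errors cannot destabilize back-propagation."""
    ax = np.abs(x)
    return np.where(ax < 1.0, 0.5 * x ** 2, ax - 0.5)

print(smooth_l1(np.array([-2.0, -0.5, 0.0, 0.5, 2.0])))
# [1.5   0.125 0.    0.125 1.5  ]
```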
Non-maximum suppression aims to retain high-quality bounding boxes with small mutual IoU. When ships are densely distributed, conventional non-maximum suppression often faces the dilemma that the bounding boxes have a large IoU. Computing the IoU on axis-aligned bounding boxes is inaccurate for skewed, interacting bounding boxes, which in turn affects bounding-box prediction. To solve this problem, this embodiment adopts an oblique IoU computation based on the triangulation idea. The sensitive relationship between the IoU and the rotation angle θ also often affects the detection result.
A sample whose IoU is greater than 0.7 can be directly determined to be positive; one whose IoU is less than 0.3 can be directly determined to be negative. But the case of an IoU between 0.3 and 0.7 requires careful subdivision. For example, for a ship with an aspect ratio of 1:7, an angle difference of 15° already reduces the IoU to only 0.38, so directly judging such a sample positive is clearly inappropriate. In this case the judgment is made according to the angle difference, i.e., the difference between the angles of the ground-truth and predicted bounding boxes. If the angle difference is larger than 15°, the overlap is small and the sample is judged negative; if the angle difference is smaller than 15°, then although the IoU is less than 0.7 the visual overlap is high, the reduced IoU being attributable to the slight angle difference, so the sample cannot be directly judged negative. Therefore, this embodiment designs the rotated non-maximum-suppression constraint: first, candidate boxes with an IoU smaller than 0.7 are retained; then, among the candidates whose IoU lies in the range [0.3, 0.7], those whose angle difference is larger than 15° are discarded.
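The two-stage constraint can be written compactly. The sketch below assumes a caller-supplied rotated-IoU routine (the triangulation-based computation mentioned above is not detailed here), and the function and parameter names are illustrative:

```python
def rotated_nms(boxes, scores, iou_fn, iou_keep=0.7,
                iou_gray=0.3, angle_gate=15.0):
    """Sketch of the rotated NMS rule above. boxes are (x, y, w, h, theta)
    tuples with theta in degrees; iou_fn computes the rotated IoU of two
    boxes and must be supplied by the caller.

    A candidate is suppressed by a kept box when their IoU is >= 0.7, or
    when the IoU lies in [0.3, 0.7) and the angle difference exceeds
    15 degrees.
    """
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    for i in order:
        ok = True
        for j in keep:
            iou = iou_fn(boxes[i], boxes[j])
            ang = abs(boxes[i][4] - boxes[j][4])
            if iou >= iou_keep or (iou_gray <= iou and ang > angle_gate):
                ok = False
                break
        if ok:
            keep.append(i)
    return keep

# Toy call with a placeholder IoU function that treats boxes as disjoint.
print(rotated_nms([(0, 0, 70, 10, -30.0), (5, 5, 70, 10, -80.0)],
                  [0.9, 0.8], iou_fn=lambda a, b: 0.0))  # [0, 1]
```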
In order to verify the effectiveness of the multi-scale rotating ship target detection algorithm provided by the invention, the invention also provides the following verification experiment.
The experimental environment is as follows. Hardware environment: Intel Core i9 processor; NVIDIA GeForce GTX 1080 graphics card with 8 GB of video memory; 32 GB of memory. Software environment: PyCharm + TensorFlow + Python 3.6.
Dataset and hyper-parameter settings: in the experiment, the proposed target detection algorithm is tested on the HRSC2016 dataset, which comprises a training set and a test set; the training set contains 617 images with 1748 ship targets, and the test set contains 438 images with 1228 ship targets. The size distribution of the ship targets in the training and test sets is shown in fig. 5.
In the dataset, ships are divided into 3 levels: the first level is "ship", with 1 target class in total; the second level comprises 4 target classes: aircraft carrier, warship, merchant ship and submarine; the third level consists of finer-grained ship models with 25 target classes. The specific category information is shown in Table 1.
TABLE 1 HRSC2016 type information statistics table
[Table 1 is reproduced as an image in the original publication; it lists the 25 fine-grained ship classes of the HRSC2016 dataset.]
All experiments were performed under the TensorFlow deep learning framework. The experiments use the pre-trained ResNet-101 model to initialize the network.
For the HRSC2016 dataset, the experiment was trained for 40k iterations in total, with a learning rate of 0.001 for the first 20k iterations, 0.0001 for the next 10k iterations, and 0.00001 for the remaining 10k iterations. The weight decay is 0.0001 and the momentum is 0.9. The optimizer is the Momentum optimizer.
During training, the experiment randomly rotates the images and subtracts the channel means [103.939, 116.779, 123.68], which are taken from ImageNet. Subtracting the mean zero-centers every dimension of the input data, which benefits the training of the model.
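The training setup above reduces to a few lines; the following sketch shows the piecewise-constant schedule and the mean subtraction (function names are illustrative, and the optimizer wiring is omitted):

```python
import numpy as np

IMAGENET_MEAN_BGR = np.array([103.939, 116.779, 123.68])

def preprocess(image_bgr):
    """Zero-center an image with the ImageNet channel means quoted above."""
    return image_bgr.astype(np.float32) - IMAGENET_MEAN_BGR

def learning_rate(step):
    """Piecewise-constant schedule: 1e-3 for the first 20k iterations,
    1e-4 for the next 10k, 1e-5 for the final 10k; momentum 0.9 and
    weight decay 1e-4 are set on the optimizer itself."""
    if step < 20_000:
        return 1e-3
    if step < 30_000:
        return 1e-4
    return 1e-5

print(learning_rate(0), learning_rate(25_000), learning_rate(39_999))
```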
Evaluation indexes: target detection and recognition based on deep learning is evaluated in the manner of a single-class confusion matrix, computing the AP and mAP of the model on the test set. The number of targets that are actually targets and are correctly identified as targets is denoted TP; the number that are actually non-targets but are identified as targets is denoted FP; the number that are actually targets but are identified as non-targets is denoted FN; and the number that are actually non-targets and are identified as non-targets is denoted TN. Precision is the proportion of samples predicted as positive that are actually positive, i.e.

$$Precision = \frac{TP}{TP + FP}$$
Recall is the proportion of actually positive samples that are correctly predicted as positive, i.e.

$$Recall = \frac{TP}{TP + FN}$$
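Expressed in code, with purely illustrative counts:

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); Recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Illustrative counts only (not taken from the experiments below).
print(precision_recall(tp=90, fp=10, fn=20))  # (0.9, 0.8181...)
```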
Experimental results with different feature map combinations: in target detection and recognition, low-level features carry relatively little semantic information but localize targets accurately, while high-level features are semantically rich but localize targets with larger error. The choice of feature maps is therefore particularly important. In view of this, the experiment selects four different feature map combination strategies to study their influence on detection performance. The specific combination strategies are shown in Table 2.
TABLE 2 Comparison of detection performance for different feature map combinations
Feature map combination Recall (%) Accuracy (%)
P3 72.7 70.4
P2+P3 77.6 73.1
P2+P3+P4 80.9 78.3
P2+P3+P4+P5 83.2 82.7
It can be seen that the detection performance of the model is worst when only the P3 feature map is used. The detection performance improves continuously as the number of fused feature layers increases; in addition, the P2 layer mainly serves small-target detection and the P5 layer mainly serves large-target detection. When all feature maps are used, the detection performance is best: the recall is 83.2% and the accuracy is 82.7%. The multi-scale detection network is thus clearly superior to the single-scale detection network, especially for small-target detection; better results are obtained only by making full use of the effective fusion of the feature information of each layer.
Experimental results with different detection networks: in order to demonstrate that the method of the invention is more competitive than conventional computer vision detection methods, the experiment compares it with methods based on Faster RCNN and FPN; the comparison results are shown in Table 3.
TABLE 3 Comparison of the performance of different detection methods
Detection method Recall (%) Accuracy (%) Time required (s)
Faster RCNN 75.3 73.7 0.11
FPN 77.1 76.9 0.15
The method of the invention 83.2 82.7 0.18
Compared with the Faster RCNN and FPN detection frameworks, the FPN based on a multi-scale network performs better; the multi-feature-fusion FPN adopted in this experiment obtains the highest accuracy value (82.7%), higher than that reported by Zhong Weifeng et al. Compared with both detection frameworks, the detection model provided by the invention has the best detection performance and the highest recall. Clearly, the method of the invention provides superior performance on both multi-scale and high-density targets. Fig. 6 shows the detection results of ship target detection using the detection algorithm of the invention.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (7)

1. A multi-scale rotating ship target detection algorithm is characterized by comprising the following steps:
step 1, obtaining a multi-scale feature map of an input image, wherein the multi-scale feature map comprises a first class scale feature map, a second class scale feature map and a third class scale feature map with sequentially increasing scales;
step 2, performing feature fusion on the multi-scale feature map by adopting a feature pyramid network, wherein the feature pyramid network adopts a ResNet residual network as a basic framework;
step 3, inputting the feature map output by the feature pyramid network into a region proposal network through a 3 × 3 convolutional layer, classifying each anchor box with the region proposal network according to a set classification criterion, assigning each anchor box parameterized coordinates (x, y, w, h, θ), and obtaining a rotated bounding box based on regression of (x, y, w, h, θ), wherein (x, y) denotes the coordinates of the center point of the bounding box, w the width of the bounding box, h the height of the bounding box, and θ the angle between the major axis of the target and the horizontal axis, with θ ∈ [-90°, 0];
step 4, performing adaptive region-of-interest alignment on the rotated bounding boxes generated by the region proposal network to obtain a high-quality feature map;
and step 5, screening the candidate boxes of the high-quality feature map according to the set rotated non-maximum-suppression constraint to obtain the detection targets.
2. The target detection algorithm of claim 1, wherein the ResNet residual network comprises 4 layers from top to bottom, namely a P2 layer, a P3 layer, a P4 layer and a P5 layer connected in sequence; the P2 layer is used for processing the input first-class scale feature map into a second-class scale feature map and outputting it to the P3 layer; the P3 layer is used for processing the input second-class scale feature map into a third-class scale feature map and outputting it to the P4 layer; and the P4 layer is used for processing the input third-class scale feature map into a fourth-class scale feature map and generating, for each feature point of each fourth-class scale feature map, 9 anchor boxes according to the set anchor box ratios {1:7, 1:5, 1:3, 1:2, 1:1, 2:1, 3:1, 5:1, 7:1}, wherein the scale of the fourth-class scale feature map is larger than that of the third-class scale feature map.
3. The object detection algorithm of claim 1, wherein the classification criterion is: when the IoU of an anchor box is greater than 0.6, the anchor box is determined to be a positive sample; when the IoU of an anchor box is less than 0.25, the anchor box is considered negative.
4. The object detection algorithm of claim 1, wherein the regression process of the rotated bounding box is:
$$t_x = \frac{x - x_a}{w_a},\quad t_y = \frac{y - y_a}{h_a},\quad t_w = \log\frac{w}{w_a},\quad t_h = \log\frac{h}{h_a},\quad t_\theta = \theta - \theta_a + k\pi$$

$$t_x^* = \frac{x^* - x_a}{w_a},\quad t_y^* = \frac{y^* - y_a}{h_a},\quad t_w^* = \log\frac{w^*}{w_a},\quad t_h^* = \log\frac{h^*}{h_a},\quad t_\theta^* = \theta^* - \theta_a + k\pi$$

wherein $(x, y, w, h, \theta)$ denotes the parameterized coordinates of the predicted bounding box, $(x_a, y_a, w_a, h_a, \theta_a)$ those of the anchor box, and $(x^*, y^*, w^*, h^*, \theta^*)$ those of the ground-truth bounding box; $(t_x, t_y, t_w, t_h, t_\theta)$ are the regression parameters of the predicted bounding box, namely the correction of its center point, the corrections of its width and height, and the correction of its rotation angle; $(t_x^*, t_y^*, t_w^*, t_h^*, t_\theta^*)$ are the regression parameters between the anchor box and the ground-truth bounding box, namely the correction of the center point, the corrections of the width and height, and the correction of the rotation angle of the ground-truth bounding box; $k \in \mathbb{Z}$.
5. The object detection algorithm of claim 4, wherein the loss function used in the regression process is:
$$L(p_i, l_i, u_i, v_i) = \frac{1}{N_{cls}}\sum_i L_{cls}(p_i, l_i) + \frac{\lambda_1}{N_{reg\text{-}h}}\sum_j p_j L_{reg\text{-}h}(u_j^*, u_j) + \frac{\lambda_2}{N_{reg\text{-}r}}\sum_k p_k L_{reg\text{-}r}(v_k^*, v_k)$$

wherein $l_i$ denotes the class label of the target anchor box, $p_i$ the probability distribution of each layer computed by softmax, $u_i, v_i$ the parameterized coordinate vectors, and $u_i^*, v_i^*$ the offsets between the target anchor box and the ground-truth bounding box; the hyper-parameters $\lambda_1, \lambda_2$ weight the regression losses; $N_{cls}$ denotes the number of anchor boxes, $N_{reg\text{-}h}$ the number of anchor boxes participating in position regression, and $N_{reg\text{-}r}$ the number of anchor boxes participating in angle regression; $p_j, p_k$ are the probability values of belonging to the target category;

$$L_{cls}(p_i, l_i) = -\log p_{i, l_i}$$

$$L_{reg\text{-}h}(u_j^*, u_j) = \mathrm{smooth}_{L1}(u_j^* - u_j),\qquad L_{reg\text{-}r}(v_k^*, v_k) = \mathrm{smooth}_{L1}(v_k^* - v_k)$$

$$\mathrm{smooth}_{L1}(x) = \begin{cases} 0.5x^2, & |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases}$$

where $x = u_j^* - u_j$ or $x = v_k^* - v_k$.
6. The object detection algorithm of claim 1, wherein the adaptive region-of-interest alignment in step 4 comprises: first, obtaining a mask bounding box through convolutional proposal training; then, using the mask bounding box to noise-filter the rotated bounding box.
7. The target detection algorithm of claim 1, wherein the rotated non-maximum-suppression constraint is: first, candidate boxes with an IoU smaller than 0.7 are retained; then, among the candidates whose IoU lies in the range [0.3, 0.7], those whose angle difference is larger than 15° are discarded.
CN202010528579.4A 2020-06-11 2020-06-11 Multi-scale rotating ship target detection algorithm Active CN111881918B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010528579.4A CN111881918B (en) 2020-06-11 2020-06-11 Multi-scale rotating ship target detection algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010528579.4A CN111881918B (en) 2020-06-11 2020-06-11 Multi-scale rotating ship target detection algorithm

Publications (2)

Publication Number Publication Date
CN111881918A true CN111881918A (en) 2020-11-03
CN111881918B CN111881918B (en) 2022-10-04

Family

ID=73156757

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010528579.4A Active CN111881918B (en) 2020-06-11 2020-06-11 Multi-scale rotating ship target detection algorithm

Country Status (1)

Country Link
CN (1) CN111881918B (en)

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112395969A (en) * 2020-11-13 2021-02-23 中国人民解放军空军工程大学 Remote sensing image rotating ship detection method based on characteristic pyramid
CN112668648A (en) * 2020-12-29 2021-04-16 西安电子科技大学 Infrared and visible light fusion identification method based on symmetric fusion network
CN112766194A (en) * 2021-01-26 2021-05-07 上海海洋大学 Detection method for mesoscale ocean eddy
CN112766221A (en) * 2021-02-01 2021-05-07 福州大学 Ship direction and position multitask-based SAR image ship target detection method
CN112800955A (en) * 2021-01-27 2021-05-14 中国人民解放军战略支援部队信息工程大学 Remote sensing image rotating target detection method and system based on weighted bidirectional feature pyramid
CN112883887A (en) * 2021-03-01 2021-06-01 中央财经大学 Building example automatic extraction method based on high spatial resolution optical remote sensing image
CN113012153A (en) * 2021-04-30 2021-06-22 武汉纺织大学 Aluminum profile flaw detection method
CN113033672A (en) * 2021-03-29 2021-06-25 西安电子科技大学 Multi-class optical image rotating target self-adaptive detection method based on feature enhancement
CN113095373A (en) * 2021-03-22 2021-07-09 南京邮电大学 Ship detection method and system based on self-adaptive position prediction and capable of detecting any rotation angle
CN113420648A (en) * 2021-06-22 2021-09-21 深圳市华汉伟业科技有限公司 Target detection method and system with rotation adaptability
CN113536936A (en) * 2021-06-17 2021-10-22 中国人民解放军海军航空大学航空作战勤务学院 Ship target detection method and system
CN115294452A (en) * 2022-08-08 2022-11-04 中国人民解放军火箭军工程大学 Rotary SAR ship target detection method based on bidirectional characteristic pyramid network
CN116310837A (en) * 2023-04-11 2023-06-23 安徽大学 SAR ship target rotation detection method and system
CN116823838A (en) * 2023-08-31 2023-09-29 武汉理工大学三亚科教创新园 Ocean ship detection method and system with Gaussian prior label distribution and characteristic decoupling

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111027547A (en) * 2019-12-06 2020-04-17 南京大学 Automatic detection method for multi-scale polymorphic target in two-dimensional image
CN111091105A (en) * 2019-12-23 2020-05-01 郑州轻工业大学 Remote sensing image target detection method based on new frame regression loss function

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111027547A (en) * 2019-12-06 2020-04-17 南京大学 Automatic detection method for multi-scale polymorphic target in two-dimensional image
CN111091105A (en) * 2019-12-23 2020-05-01 郑州轻工业大学 Remote sensing image target detection method based on new frame regression loss function

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zhou Hui et al.: "Ship target detection in high-resolution remote sensing images based on a feature pyramid model," Journal of Dalian Maritime University *

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112395969A (en) * 2020-11-13 2021-02-23 中国人民解放军空军工程大学 Remote sensing image rotating ship detection method based on characteristic pyramid
CN112668648A (en) * 2020-12-29 2021-04-16 西安电子科技大学 Infrared and visible light fusion identification method based on symmetric fusion network
CN112668648B (en) * 2020-12-29 2023-06-20 西安电子科技大学 Infrared and visible light fusion recognition method based on symmetrical fusion network
CN112766194A (en) * 2021-01-26 2021-05-07 上海海洋大学 Detection method for mesoscale ocean eddy
CN112800955A (en) * 2021-01-27 2021-05-14 中国人民解放军战略支援部队信息工程大学 Remote sensing image rotating target detection method and system based on weighted bidirectional feature pyramid
CN112766221A (en) * 2021-02-01 2021-05-07 福州大学 Ship direction and position multitask-based SAR image ship target detection method
CN112766221B (en) * 2021-02-01 2022-06-14 福州大学 Ship direction and position multitasking-based SAR image ship target detection method
CN112883887A (en) * 2021-03-01 2021-06-01 中央财经大学 Building example automatic extraction method based on high spatial resolution optical remote sensing image
CN113095373B (en) * 2021-03-22 2022-09-27 南京邮电大学 Ship detection method and system based on self-adaptive position prediction and capable of detecting any rotation angle
CN113095373A (en) * 2021-03-22 2021-07-09 南京邮电大学 Ship detection method and system based on self-adaptive position prediction and capable of detecting any rotation angle
CN113033672B (en) * 2021-03-29 2023-07-28 西安电子科技大学 Multi-class optical image rotation target self-adaptive detection method based on feature enhancement
CN113033672A (en) * 2021-03-29 2021-06-25 西安电子科技大学 Multi-class optical image rotating target self-adaptive detection method based on feature enhancement
CN113012153A (en) * 2021-04-30 2021-06-22 武汉纺织大学 Aluminum profile flaw detection method
CN113536936A (en) * 2021-06-17 2021-10-22 中国人民解放军海军航空大学航空作战勤务学院 Ship target detection method and system
CN113420648A (en) * 2021-06-22 2021-09-21 深圳市华汉伟业科技有限公司 Target detection method and system with rotation adaptability
CN115294452A (en) * 2022-08-08 2022-11-04 中国人民解放军火箭军工程大学 Rotary SAR ship target detection method based on bidirectional characteristic pyramid network
CN116310837A (en) * 2023-04-11 2023-06-23 安徽大学 SAR ship target rotation detection method and system
CN116310837B (en) * 2023-04-11 2024-04-23 安徽大学 SAR ship target rotation detection method and system
CN116823838A (en) * 2023-08-31 2023-09-29 武汉理工大学三亚科教创新园 Ocean ship detection method and system with Gaussian prior label distribution and characteristic decoupling
CN116823838B (en) * 2023-08-31 2023-11-14 武汉理工大学三亚科教创新园 Ocean ship detection method and system with Gaussian prior label distribution and characteristic decoupling

Also Published As

Publication number Publication date
CN111881918B (en) 2022-10-04

Similar Documents

Publication Publication Date Title
CN111881918B (en) Multi-scale rotating ship target detection algorithm
CN110276269B (en) Remote sensing image target detection method based on attention mechanism
CN109583425B (en) Remote sensing image ship integrated recognition method based on deep learning
CN111563473B (en) Remote sensing ship identification method based on dense feature fusion and pixel level attention
Xu et al. Scale-aware feature pyramid architecture for marine object detection
CN112560671B (en) Ship detection method based on rotary convolution neural network
CN110728658A (en) High-resolution remote sensing image weak target detection method based on deep learning
CN111179285B (en) Image processing method, system and storage medium
CN111091095B (en) Method for detecting ship target in remote sensing image
US20080040083A1 (en) System and Method for Solid Component Evaluation in Mixed Ground Glass Nodules
CN106651880B (en) Offshore moving target detection method based on multi-feature fusion thermal infrared remote sensing image
CN111476159A (en) Method and device for training and detecting detection model based on double-angle regression
CN111027445B (en) Marine ship target identification method
CN111914804A (en) Multi-angle rotation remote sensing image small target detection method
CN112800955A (en) Remote sensing image rotating target detection method and system based on weighted bidirectional feature pyramid
CN111723632A (en) Ship tracking method and system based on twin network
CN110309808B (en) Self-adaptive smoke root node detection method in large-scale space
CN114627156A (en) Consumption-level unmanned aerial vehicle video moving target accurate tracking method
CN107808165B (en) Infrared image matching method based on SUSAN corner detection
Shi et al. Obstacle type recognition in visual images via dilated convolutional neural network for unmanned surface vehicles
CN116311387B (en) Cross-modal pedestrian re-identification method based on feature intersection
von Braun et al. Utilizing mask R-CNN for waterline detection in CANOE sprint video analysis
Wang et al. High-quality angle prediction for oriented object detection in remote sensing images
CN113409325B (en) Large-breadth SAR image ship target detection and identification method based on fine segmentation
CN115376007A (en) Object detection method, device, equipment, medium and computer program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant