CN114565824A - Single-stage rotating ship detection method based on full convolution network - Google Patents

Single-stage rotating ship detection method based on full convolution network

Info

Publication number
CN114565824A
Authority
CN
China
Prior art keywords
convolution
network
ship
layer
full convolution
Prior art date
Legal status
Pending
Application number
CN202210198503.9A
Other languages
Chinese (zh)
Inventor
杨淑媛
李源钊
冯志玺
王敏
高欣怡
谭豪
柯希鹏
李奕彤
翟蕾
李宇星
焦李成
Current Assignee
Xidian University
Original Assignee
Xidian University
Priority date
Filing date
Publication date
Application filed by Xidian University
Priority to CN202210198503.9A
Publication of CN114565824A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/24 Classification techniques
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Abstract

The invention provides a single-stage rotating ship detection method based on a full convolution network, which comprises the following steps: acquiring a training sample set and a test sample set; constructing a single-stage rotating ship target detection model; performing iterative training on the model; and using the trained model to detect the bounding-box position and category confidence of every target. Starting from a full convolution single-stage target detection network for horizontal boxes, the invention adds an angle branch and optimizes the network structure and loss function, and generates prediction results pixel by pixel, without anchor boxes, directly from the feature maps produced by the stacked full convolution layers in the network. This realizes fast detection of rotating ship targets and improves detection efficiency while maintaining detection accuracy, and the method can be used in fields such as offshore monitoring, maritime defense early warning and maritime rights protection.

Description

Single-stage rotating ship detection method based on full convolution network
Technical Field
The invention belongs to the technical field of image processing, relates to a remote sensing image ship detection method, and particularly relates to a deep-learning-based rotating ship target detection method for remote sensing images, which can be used in fields such as offshore monitoring, maritime defense early warning and maritime rights protection.
Background
The development of remote sensing technology greatly helps people understand and explore the world, and it has many distinct technical characteristics, such as a wide data range, a short acquisition period and strong data comprehensiveness. In remote sensing rotated-target detection, the bounding box of a target comprises the center position and the length and width of a horizontal bounding box, plus an additional offset-angle annotation. Detecting rotated targets in remote sensing images is more challenging than horizontal bounding-box detection: owing to differences in shooting angle, target objects in remote sensing aerial images differ from objects in natural images in that their arrangement direction is arbitrary rather than axis-aligned, and they are small and densely arranged, as with remote sensing targets such as airplanes, ships and containers. In such cases a horizontal bounding box fits the target's shape poorly, and a rotated bounding box marks the object better. This is especially true for marine ship targets: because ships differ widely in aspect ratio, a rotated bounding box can frame a ship target accurately, so that its class can be identified and its heading judged more precisely. Meanwhile, in port scenes, rotated bounding boxes achieve better identification accuracy and recall than horizontal bounding boxes for ship targets moored densely along the shore.
Compared with traditional target detection methods, deep-learning-based methods offer high precision, high accuracy and end-to-end training and testing, and they are currently the mainstream approach to remote sensing target detection. Deep-learning-based target detection comprises single-stage and two-stage methods. Two-stage methods such as Faster R-CNN perform feature extraction and screen a certain number of candidate boxes in the first stage, then refine and screen the candidate boxes generated in the first stage by regression in the second stage to obtain the final detection result. Single-stage methods such as YOLOv3 omit the second-stage refined regression of candidate boxes and regress and classify targets directly from feature points, which greatly improves detection speed; compared with two-stage models they are lighter and easier to deploy in practice. Meanwhile, anchor-free detector design is a new development trend: FCOS, for example, abandons the anchor-box mechanism and converts the target detection task into key-point estimation, which effectively saves run-time memory and improves small-target detection to a certain extent.
To detect rotating ship targets more accurately, researchers usually start from two aspects: feature extraction and loss functions. For example, the patent application with publication number CN112395969A, entitled "A remote sensing image rotating ship detection method based on a feature pyramid", discloses a two-stage rotated-target detection method based on a convolutional neural network. It performs two-stage target detection with a convolutional neural network, using the network's stronger feature extraction capability to extract features; adopts a feature pyramid for multi-scale detection; and, as its key point, detects ship targets in different directions directly with rotated boxes. The method addresses the insufficient feature extraction of traditional methods and the particularities of ships such as multiple orientations, different sizes and dense distribution; however, it adopts rotated anchor boxes, with 84 anchor boxes per feature point, so although detection accuracy improves to a certain extent, detection speed is sacrificed.
Disclosure of Invention
The invention aims to overcome the defects in the prior art, provides a single-stage rotating ship detection method based on a full convolution network, and is used for solving the technical problems of low accuracy and low efficiency of remote sensing rotating frame detection in the prior art.
In order to achieve the purpose, the technical scheme adopted by the invention comprises the following steps:
(1) acquiring a training sample set and a testing sample set:
(1a) obtaining M optical remote sensing ship images containing ship targets, labeling the ship target in each optical remote sensing ship image with a rotated box, normalizing the size of each labeled optical remote sensing ship image, and then performing a color standardization operation on the normalized images, to obtain M preprocessed optical remote sensing ship images of size W×R, H = {H_1, H_2, …, H_m, …, H_M}, where M ≥ 1000, H_m represents the m-th preprocessed optical remote sensing ship image, and W and R represent the numbers of pixel rows and columns of a picture;
(1b) performing data enhancement on each preprocessed optical remote sensing ship image H_m, forming a training sample set H_train from Q preprocessed optical remote sensing ship images in H together with their data-enhanced counterparts, and forming a test sample set H_test from the remaining Y preprocessed optical remote sensing ship images together with their data-enhanced counterparts, where M = Q + Y;
(2) Constructing a single-stage rotating ship target detection model O based on a full convolution network:
(2a) constructing a structure of a single-stage rotating ship target detection model O based on a full convolution network:
constructing a single-stage rotating ship target detection model O comprising a backbone network and a detection network which are connected in sequence, wherein: the main network comprises a feature extraction sub-network and a feature enhancement sub-network which are connected in sequence; the feature enhancer network comprises three feature enhancement layers arranged in parallel; the detection network comprises a detection sub-network connected with each characteristic enhancement layer, and each detection sub-network comprises a first full convolution network and a second full convolution network which are arranged in parallel and are composed of a plurality of first full convolution layers; the output end of the first full convolution network is connected with a second full convolution layer, a third full convolution layer and a fourth full convolution layer which are arranged in parallel; the output end of the second full convolution network is connected with a fifth full convolution layer;
(2b) defining a loss function L of the single-stage rotating ship target detection model O:
L = L_cls + α·L_reg + β·L_C
L_cls = (1/N_pos)·Σ_{x,y} f_focalloss(p_{x,y}, p*_{x,y})
L_reg = (1/N_pos)·Σ_{x,y} 1{p*_{x,y} > 0}·f_GIoU(t_{x,y}, t*_{x,y})
L_C = (1/N_pos)·Σ_{x,y} 1{p*_{x,y} > 0}·f_centerness(c_{x,y}, c*_{x,y})
where x and y respectively represent the abscissa and ordinate of a feature point; L_cls represents the class loss function, L_reg the regression loss function and L_C the centrality loss function; p_{x,y} is the predicted class output of a feature point and p*_{x,y} is its class label; t_{x,y} is the predicted regression output of a feature point and t*_{x,y} is its regression label; c_{x,y} is the predicted centrality output and c*_{x,y} is its centrality label; N_pos represents the number of generated predicted targets; 1{p*_{x,y} > 0} represents an indicator that takes 1 when p*_{x,y} > 0 and takes 0 otherwise; α and β are weight parameters with α + β = 1, each with value range [0, 1];
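As an illustration, a minimal PyTorch sketch of this weighted combination is given below; the per-term losses f_focalloss, f_GIoU and f_centerness are passed in as callables (their concrete forms appear in the detailed description), and the flattened tensor shapes are an assumption of the sketch, not something the text fixes:

```python
import torch

def total_loss(p, p_star, t, t_star, c, c_star,
               f_focalloss, f_giou, f_centerness, alpha=0.2, beta=0.8):
    """L = L_cls + alpha * L_reg + beta * L_C over all feature points (x, y).

    p, t, c are the predicted class, regression and centrality outputs,
    p_star, t_star, c_star the corresponding labels, all flattened so that
    one row corresponds to one feature point.
    """
    pos = p_star > 0                          # indicator 1{p*_{x,y} > 0}
    n_pos = pos.sum().clamp(min=1).float()    # N_pos, guarded against zero

    l_cls = f_focalloss(p, p_star).sum() / n_pos
    l_reg = f_giou(t[pos], t_star[pos]).sum() / n_pos
    l_c = f_centerness(c[pos], c_star[pos]).sum() / n_pos
    return l_cls + alpha * l_reg + beta * l_c
```

The defaults alpha=0.2 and beta=0.8 follow the embodiment values α = 1/5 and β = 4/5 given later in the description.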
(3) Performing iterative training on a single-stage rotating ship detection model based on a full convolution network:
(3a) initializing the iteration number t and the maximum number of iterations T, T ≥ 10000; denoting the t-th-iteration single-stage rotating ship detection model as O_t and the weights of the t-th-iteration detection network and feature extraction sub-network as ω_1t and ω_2t respectively; and setting O_t = O, t = 1;
(3b) predicting the position, rotation deflection angle, centrality offset and category confidence of each target;
(3b1) taking b training samples randomly selected from the training sample set H_train as the input of the single-stage rotating ship detection model O_t; the feature extraction sub-network in the backbone network extracts a multi-scale feature map of the target in each training sample, and the feature enhancement sub-network performs feature enhancement on each feature map, where b ≥ 8;
(3b2) the first full convolution network and the second full convolution network in the detection network respectively perform multi-layer full convolution operations on each feature-enhanced feature map; the second, third and fourth full convolution layers each perform a single-layer full convolution operation on every regression feature map obtained from the multi-layer full convolution of the first full convolution network, yielding the position label t̂, angle label θ̂ and centrality label ĉ of the predicted target; the fifth full convolution layer performs a single-layer full convolution operation on every classification feature map obtained from the multi-layer full convolution of the second full convolution network, yielding the class confidence label p̂ of the predicted target;
(3c) using the loss function L, calculating the loss value L_t of O_t from the predicted labels t̂, θ̂, ĉ and p̂ and their ground-truth labels; calculating the gradient λ_t of the parameters of O_t by back-propagating L_t; and then updating the weights ω_1t and ω_2t of the detection network and the feature extraction sub-network by gradient descent using λ_t;
(3d) judging whether t = T holds; if so, obtaining the trained single-stage rotating ship detection network model O; otherwise, setting t = t + 1 and executing step (3b);
(4) obtaining a detection result of the single-stage rotating ship:
set H of test samplestestThe forward propagation is carried out as the input of a trained single-stage rotating ship detection model O to obtain the position label of a predicted target
Figure BDA00035281617900000412
Angle label
Figure BDA00035281617900000413
Centrality label
Figure BDA00035281617900000414
Category confidence labels
Figure BDA0003528161790000051
NMS for predicting targets using a threshold μ rotation non-maximum suppression method
Figure BDA0003528161790000052
Figure BDA0003528161790000053
Screening to obtain HtestThe bounding box and class confidence of each object contained.
Compared with the prior art, the invention has the following advantages:
1. In the process of training the single-stage rotating ship detection network model and acquiring the detection results, the first and second full convolution networks obtain the position label, angle label, centrality label and classification confidence label of each predicted target through single-layer full convolution operations. This avoids the complex parameter-acquisition process of prior-art rotated-anchor-box methods that slows down detection, and effectively improves detection efficiency for ships with large aspect ratios.
2. The invention uses full convolution networks in the detection network and, through transposed convolution layers, converts the height and width of the intermediate feature mapping of each feature map obtained by the backbone network to the size of the input feature map, so that the prediction results correspond one-to-one with the input feature map in spatial height and width and targets are predicted directly pixel by pixel. This avoids the heavy computation of the existing two-stage detection technique, which generates candidate boxes in a first stage and regresses and screens them in a second stage, reduces memory consumption, and further improves detection speed.
3. The loss function includes a centrality offset loss, which computes a loss on the offset between the center of a candidate box and its corresponding feature point and down-weights prediction boxes with large center-point offsets; compared with traditional single-stage target detection models, this avoids generating large numbers of low-quality candidate boxes whose predicted centers lie far from the feature points. Meanwhile, the loss function applies a weighted combination to the angle offset and position offset obtained from the different full convolution branches, which makes training converge more easily and improves detection accuracy.
Drawings
FIG. 1 is a flow chart of an implementation of the present invention;
FIG. 2 is a schematic structural diagram of a single-stage rotating ship target detection model constructed by the invention;
FIG. 3 is a schematic diagram of remote sensing rotation frame labeling according to the present invention;
fig. 4 is a detection result diagram of a remote sensing visible light image according to an embodiment of the present invention.
Detailed Description
The invention is described in further detail below with reference to the figures and the specific embodiments.
Referring to fig. 1, the present invention includes the steps of:
step 1) obtaining a training sample set and a testing sample set:
(1a) obtaining M optical remote sensing ship images containing ship targets, labeling the ship target in each optical remote sensing ship image with a rotated box, then normalizing the size of each labeled optical remote sensing ship image and performing a color standardization operation on the normalized images, to obtain M preprocessed optical remote sensing ship images of size W×R, H = {H_1, H_2, …, H_m, …, H_M}, where M ≥ 1000, H_m represents the m-th preprocessed optical remote sensing ship image, and W and R represent the numbers of pixel rows and columns of a picture;
referring to fig. 3, the labeling method takes the horizontal coordinate x and vertical coordinate y of the center of the rotated labeling box, the length l and width w of the rotated labeling box, and the counterclockwise angle θ between the long side of the rotated labeling box and the horizontal direction as the bounding-box position label of each rotated target. The obtained remote sensing visible-light images differ in size, so a size normalization operation unifies them to one size, which facilitates unified training and labeling by the network; the RGB three channels are converted to BGR channel order according to the network input requirements, and a standardization operation is then performed. These two steps convert the image size and color channels to meet the input requirements of the network and make convergence during training easier. In this embodiment, the RoLabelImg labeling software is used to label the ship targets of the visible-light remote sensing data set, W = 1024, R = 1024, and M = 1102;
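For illustration, a minimal sketch of converting such a label (x, y, l, w, θ) into its four corner points, e.g. for drawing or overlap computation, follows; the image-coordinate convention (y axis pointing down) is an assumption of the sketch:

```python
import math

def rotated_box_to_corners(x, y, l, w, theta_deg):
    """Corner points of a rotated box given its center (x, y), long side l,
    short side w and counterclockwise angle theta between the long side and
    the horizontal direction (in degrees)."""
    t = math.radians(theta_deg)
    dx, dy = math.cos(t), math.sin(t)   # unit vector along the long side
    px, py = -dy, dx                    # unit vector along the short side
    corners = []
    for sl, sw in ((-1, -1), (1, -1), (1, 1), (-1, 1)):
        corners.append((x + sl * (l / 2) * dx + sw * (w / 2) * px,
                        y + sl * (l / 2) * dy + sw * (w / 2) * py))
    return corners

# e.g. a 200 x 40 pixel ship centered at (512, 512), rotated 30 degrees
print(rotated_box_to_corners(512, 512, 200, 40, 30.0))
```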
(1b) performing data enhancement on each preprocessed optical remote sensing ship image H_m, forming a training sample set H_train from Q preprocessed optical remote sensing ship images in H together with their data-enhanced counterparts, and forming a test sample set H_test from the remaining Y preprocessed optical remote sensing ship images together with their data-enhanced counterparts, where M = Q + Y;
the data set is expanded by data enhancement to avoid overfitting during training. In the data enhancement of this example, one or more of the rotation enhancement, flip enhancement, scaling enhancement and noise enhancement methods are randomly selected with an occurrence probability of 0.5; in this embodiment, Q = 802 and Y = 300;
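A minimal sketch of this random selection is shown below, assuming each enhancement is drawn independently with probability 0.5; the transform callables are hypothetical placeholders for the four methods named above:

```python
import random

AUGMENTATIONS = ("rotate", "flip", "scale", "noise")

def augment(image, labels, transforms, p=0.5):
    """Apply each enhancement independently with occurrence probability p;
    `transforms` maps each name to a callable that takes and returns the
    image together with its rotated-box labels."""
    for name in AUGMENTATIONS:
        if random.random() < p:
            image, labels = transforms[name](image, labels)
    return image, labels
```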
step 2) constructing a single-stage rotating ship target detection model O based on a full convolution network
(2a) Constructing a single-stage rotating ship target detection model O based on a full convolution network, wherein the structure of the model O is shown in FIG. 2;
constructing a single-stage rotating ship target detection model O comprising a backbone network and a detection network which are connected in sequence, wherein: the main network comprises a feature extraction sub-network and a feature enhancement sub-network which are connected in sequence; the detection network comprises a first full convolution network and a second full convolution network which are arranged in parallel and are composed of a plurality of first full convolution layers; the output end of the first full convolution network is connected with a second full convolution layer, a third full convolution layer and a fourth full convolution layer which are arranged in parallel; the output end of the second full convolution network is connected with a fifth full convolution layer;
the feature extraction sub-network in the backbone network comprises a first convolution layer, a maximum pooling layer and four block blocks which are connected in sequence; the feature enhancement network comprises three feature enhancement layers which are connected in sequence, wherein a first feature enhancement layer, a second feature enhancement layer and a third feature enhancement layer are respectively connected with a second block, a third block and a fourth block in a feature extraction sub-network;
the specific parameters of the feature extraction sub-network are as follows: the first convolution layer has 7×7 convolution kernels with stride 2 and 64 kernels; the first block comprises 3 first convolution blocks connected in sequence, each comprising three convolution layers in which the first and second layers have 64 convolution kernels each, the first and third layers use 1×1 kernels, and the third layer has 128 kernels; the second block comprises 4 second convolution blocks connected in sequence, each comprising three convolution layers in which the first and second layers have 128 kernels each, the first and third layers use 1×1 kernels, and the third layer has 256 kernels; the third block comprises 6 third convolution blocks connected in sequence, each comprising three convolution layers in which the first and second layers have 256 kernels each, the first and third layers use 1×1 kernels, and the third layer has 512 kernels; the fourth block comprises 3 fourth convolution blocks connected in sequence, each comprising three convolution layers in which the first and second layers have 512 kernels each, the first and third layers use 1×1 kernels, and the third layer has 1024 kernels;
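Read as a bottleneck-style residual block, these parameters could be sketched as below; the 3×3 kernel of the middle layer and the residual shortcut are assumptions not fixed by the text, which only specifies the 1×1 layers and the kernel counts:

```python
import torch.nn as nn

class ConvBlock(nn.Module):
    """One convolution block: 1x1 (mid kernels) -> 3x3 (mid kernels) ->
    1x1 (2 * mid kernels), with an assumed residual shortcut."""

    def __init__(self, in_ch, mid):
        super().__init__()
        out = 2 * mid
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, mid, kernel_size=1), nn.ReLU(inplace=True),
            nn.Conv2d(mid, mid, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(mid, out, kernel_size=1))
        self.shortcut = (nn.Conv2d(in_ch, out, kernel_size=1)
                         if in_ch != out else nn.Identity())
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.body(x) + self.shortcut(x))

# e.g. the first block: three such blocks with mid = 64 (128 output channels)
first_block = nn.Sequential(ConvBlock(64, 64), ConvBlock(128, 64), ConvBlock(128, 64))
```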
the first full convolution network and the second full convolution network in the detection network both comprise 4 first full convolution layers, and the specific parameters of the detection network are as follows: the size of the first full convolution layer convolution kernel is 3 multiplied by 3, and the number of channels is 256; the size of the second full convolution layer convolution kernel is 3 multiplied by 3, and the number of channels is 4; the size of the third full convolution layer convolution kernel is 3 multiplied by 3, and the number of channels is 1; the size of a fifth full convolution layer convolution kernel of the second full convolution network is 3 multiplied by 3, and the number of channels is L; where L represents the number of detection categories.
(2b) defining the loss function L of the single-stage rotating ship target detection model O:
L = L_cls + α·L_reg + β·L_C
L_cls = (1/N_pos)·Σ_{x,y} f_focalloss(p_{x,y}, p*_{x,y})
L_reg = (1/N_pos)·Σ_{x,y} 1{p*_{x,y} > 0}·f_GIoU(t_{x,y}, t*_{x,y})
L_C = (1/N_pos)·Σ_{x,y} 1{p*_{x,y} > 0}·f_centerness(c_{x,y}, c*_{x,y})
where L_cls represents the class loss function, L_reg the regression loss function and L_C the centrality loss function; f_focalloss represents the focal loss function, f_GIoU the GIoU loss function and f_centerness the centerness loss function; p_{x,y} is the predicted class output of a feature point and p*_{x,y} is its class label; t_{x,y} is the predicted regression output of a feature point and t*_{x,y} is its regression label; c_{x,y} is the predicted centrality output and c*_{x,y} is its centrality label; N_pos represents the number of generated predicted targets; 1{p*_{x,y} > 0} represents an indicator that takes 1 when p*_{x,y} > 0 and takes 0 otherwise; α and β are weight parameters with α + β = 1, each with value range [0, 1]. The class loss function f_focalloss, regression loss function f_GIoU and centrality loss function f_centerness are respectively:
f_focalloss = −(1 − p_t)^γ·log(p_t)
f_GIoU = 1 − [I/(A_p + A_g − I) − (A_c − (A_p + A_g − I))/A_c]
centerness* = sqrt((min(l, r)/max(l, r))·(min(t, b)/max(t, b)))
where p_t represents the confidence of the generated label; γ is a hyperparameter taken in [0, 5]; A_p represents the area of the candidate box generated by a feature point, A_g the area of the real box containing the feature point, I the overlap area of A_p and A_g, and A_c the area of the smallest box enclosing both; centerness* is the centrality target against which f_centerness compares the predicted centrality; l, t, r, b respectively represent the distances from the center point to the left, top, right and bottom sides of the regression box label;
the method is rotary frame target detection, compared with horizontal frame detection, angle branches are introduced into the rotary frame detection, and normalization is performed in a weighting mode; solving the problem of sample imbalance by using focalloss loss as a classified loss function; the regression loss uses GIoU loss, the intersection and parallel ratio between the prediction frame and the real label frame is calculated, compared with the traditional IoU loss and smoothL1 loss, the GIoU loss not only focuses on the overlapping area, but also focuses on other non-overlapping areas, and the specific position information can be more accurately reflected on the ship target with the large length-width ratio; introducing centrality offset loss centerness, calculating the offset between the feature point and the central point of the real label, further filtering the edge target in the training process, accelerating the convergence speed, and improving the quality of the generated candidate frame, wherein in the embodiment, α is 1/5, β is 4/5, and γ is 2;
step 3) performing iterative training on the single-stage rotating ship detection model based on the full convolution network:
(3a) initializing the iteration number t and the maximum number of iterations T, T ≥ 10000; denoting the t-th-iteration single-stage rotating ship detection model as O_t and the weights of the t-th-iteration detection network and feature extraction sub-network as ω_1t and ω_2t respectively; and setting O_t = O, t = 1;
in this embodiment, T = 30000 to ensure that the network is trained sufficiently;
(3b) predicting the position, rotation deflection angle, centrality offset and category confidence of each target;
(3b1) taking b training samples randomly selected from the training sample set H_train as the input of the single-stage rotating ship detection model O_t; the feature extraction sub-network in the backbone network extracts a multi-scale feature map of the target in each training sample, and the feature enhancement sub-network performs feature enhancement on each feature map, where b ≥ 8;
the feature extraction sub-network performs feature extraction on the input image to obtain feature maps at multiple down-sampling levels. During forward propagation of the network, the feature maps generated by the 2nd, 3rd and 4th blocks are each convolved with 1×1 convolution kernels to obtain the corresponding first, second and third feature enhancement layers; the third feature enhancement layer is up-sampled to a feature map of the same size as the second feature layer and added to it, and the second feature enhancement layer is up-sampled to a feature map of the same size as the first feature layer and added to it, achieving feature fusion. Low-level feature maps carry less semantic information but locate the target accurately, while high-level feature maps are semantically rich but locate the target coarsely; through the feature fusion operation of the feature enhancement sub-network, all the feature layers obtain strong semantic features. In this embodiment, b = 8; this value is limited by the GPU memory of the experimental device, which permits at most 8 training samples per iteration;
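This top-down fusion could be sketched as below, with the 256/512/1024 input channels taken from the backbone parameters above; nearest-neighbor up-sampling and the 256-channel output width are assumptions of the sketch:

```python
import torch.nn as nn
import torch.nn.functional as F

class FeatureEnhancer(nn.Module):
    """1x1 lateral convolutions on the outputs of blocks 2-4, then
    upsample-and-add from the deepest feature map downwards."""

    def __init__(self, c2=256, c3=512, c4=1024, out=256):
        super().__init__()
        self.lat2 = nn.Conv2d(c2, out, kernel_size=1)  # first feature enhancement layer
        self.lat3 = nn.Conv2d(c3, out, kernel_size=1)  # second feature enhancement layer
        self.lat4 = nn.Conv2d(c4, out, kernel_size=1)  # third feature enhancement layer

    def forward(self, f2, f3, f4):
        p4 = self.lat4(f4)
        p3 = self.lat3(f3) + F.interpolate(p4, size=f3.shape[-2:], mode="nearest")
        p2 = self.lat2(f2) + F.interpolate(p3, size=f2.shape[-2:], mode="nearest")
        return p2, p3, p4   # one input per detection sub-network
```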
(3b2) the first full convolution network and the second full convolution network in the detection network respectively perform multi-layer full convolution operations on each feature-enhanced feature map; the second, third and fourth full convolution layers each perform a single-layer full convolution operation on every regression feature map obtained from the multi-layer full convolution of the first full convolution network, yielding the position label t̂, angle label θ̂ and centrality label ĉ of the predicted target; the fifth full convolution layer performs a single-layer full convolution operation on every classification feature map obtained from the multi-layer full convolution of the second full convolution network, yielding the class confidence label p̂ of the predicted target;
the fourth full convolution layer that computes the centrality offset is connected to the first full convolution network rather than the second, so that it shares the same first full convolution network with the second full convolution layer (which obtains the regression candidate-box position label) and the third full convolution layer (which obtains the regression candidate-box deflection-angle label); this provides richer position information and makes the computed center-point offset more accurate;
(3c) using the loss function L, calculating the loss value L_t of O_t from the predicted labels t̂, θ̂, ĉ and p̂ and their ground-truth labels; calculating the gradient λ_t of the parameters of O_t by back-propagating L_t; and then updating the weights ω_1t and ω_2t of the detection network and the feature extraction sub-network by gradient descent using λ_t;
the weight value updating formula in the step 3c) is as follows:
Figure BDA0003528161790000107
Figure BDA0003528161790000108
wherein eta represents learning step length, 0.0001-0.1, omega2t' and omega1tRespectively represent omega1tAnd omega2tAs a result of the update, the result of the update,
Figure BDA0003528161790000109
representing the partial derivative calculation. In this embodiment, the optimizer function uses a random gradient descent SGD, and the learning rate is attenuated when the network iterates for a certain number of times in order to prevent the loss function from falling into a local minimum, where the initial learning rate η is 0.01, the learning rate η is 0.001 when the network iterates for the 1 st ten thousand, and the learning rate η is 0.0001 when the network iterates for the 2 nd ten thousand;
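A minimal PyTorch sketch of this update rule and decay schedule follows; the assumption that the model's forward pass returns the loss L_t for a batch is for brevity only:

```python
import torch
import torch.nn as nn

def train(model: nn.Module, batches, total_iters=30000):
    """SGD with eta = 0.01, decayed to 0.001 at iteration 10000 and to
    0.0001 at iteration 20000 (MultiStepLR with gamma = 0.1)."""
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    scheduler = torch.optim.lr_scheduler.MultiStepLR(
        optimizer, milestones=[10000, 20000], gamma=0.1)
    for t, batch in zip(range(1, total_iters + 1), batches):
        loss = model(batch)      # L_t for a batch of b = 8 training samples
        optimizer.zero_grad()
        loss.backward()          # back-propagation yields the gradient lambda_t
        optimizer.step()         # omega'_t = omega_t - eta * dL_t/d(omega_t)
        scheduler.step()
```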
(3d) judging whether t = T holds; if so, obtaining the trained single-stage rotating ship detection network model O; otherwise, setting t = t + 1 and executing step (3b);
step 4) obtaining a detection result of the single-stage rotating ship:
set H of test samplestestCarrying out forward propagation as the input of the trained single-stage rotating ship detection model O to obtain a position label of a predicted target
Figure BDA0003528161790000111
Angle label
Figure BDA0003528161790000112
Centrality label
Figure BDA0003528161790000113
Category confidence labels
Figure BDA0003528161790000114
NMS on predicted targets using a threshold μ rotation non-maximum inhibition method
Figure BDA0003528161790000115
Figure BDA0003528161790000116
Screening to obtain HtestThe bounding box and class confidence of each object contained. The detection result of each image of the test set is output, the result is shown in figure 4,
the detection result schematic diagram is shown in fig. 4(a) and 4(b), wherein fig. 4(a) and 4(b) are ship detection results of remote sensing images, ship targets are all detected by using a rotating frame, and classification labels and confidence degrees of the corresponding targets are marked; in this example, μ ═ 0.5.
The technical effects of the present invention are further illustrated by the following experiments:
1. simulation conditions and contents:
the experiment simulation platform uses a processor Intel Xeon CPU E5-2680V 3 for the experiment, the main frequency of the processor is 2.50GHz, the internal memory is 128GB, and the display card is NVIDIA GTX TITAN V. The operating system is ubuntu 18.04. The software platform constructs and trains neural network models for python 3.8.11 and pytorech 1.7.0, accelerated using Nvidia Cuda 10.1 and Cudnn v 8.
The target detection evaluation indexes adopted in the simulation are mAP (mean Average Precision), the per-class detection AP and FPS, which are the main evaluation indexes in the target detection field and comprehensively reflect each aspect of an algorithm's performance. AP (Average Precision) denotes the average precision over different recall rates, where precision is the ratio of correctly detected samples to the total number of detections, recall is the ratio of correctly detected samples to all true samples, and mAP is the average of the APs of all classes. There are two methods for calculating AP. The first is the pre-2010 PASCAL VOC Challenge method: set a group of thresholds [0, 0.1, 0.2, …, 1]; for recalls greater than each threshold, obtain the corresponding maximum precision, thus calculating 11 precisions; the AP is the average of these 11 precisions (the 11-point interpolated average precision). The second is the method revised by the PASCAL VOC Challenge from 2010: assuming there are M positive examples among N samples, obtain M recall values [1/M, 2/M, …, M/M]; for each recall value r, calculate the maximum precision over points with recall r′ ≥ r, then average these M precision values to obtain the final AP. The APs herein are all calculated using the second criterion. For the target detection task, the intersection-over-union of every predicted bounding box judged to be a target with the ground truth (GT) is calculated, and a prediction is counted as a correct detection if the intersection-over-union is greater than the set threshold of 0.5. In addition, FPS (Frames Per Second) denotes the number of pictures the model detects per second, obtained by dividing the total number of detected pictures by the total detection time, and describes the detection speed of the model.
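A sketch of the second (all-point) AP criterion described above, given matched precision/recall pairs for one class, might look like this:

```python
import numpy as np

def average_precision(recalls, precisions):
    """For each achieved recall value r, take the maximum precision among
    all points with recall >= r, then average over the recall values."""
    recalls = np.asarray(recalls, dtype=float)
    precisions = np.asarray(precisions, dtype=float)
    levels = np.unique(recalls)           # the M recall values m/M
    return float(np.mean([precisions[recalls >= r].max() for r in levels]))
```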
The existing single-stage and two-stage rotated-box target detection networks used in the simulation include RoI-Transformer, Faster R-CNN with an added angle branch, RetinaNet with an added angle branch, and Oriented R-CNN. The network of the invention is trained on the HRSC2016 data set and compared with the existing rotated-box detection networks RoI-Transformer, Faster R-CNN-R and RetinaNet-R, comparing the accuracy mAP and the inference speed FPS respectively.
The detection efficiency and detection accuracy of the present invention and the prior art were simulated separately, and the results are shown in table 1.
2. Simulation result analysis:
TABLE 1
Detection method    mAP      FPS
Faster R-CNN R      74.43    15.9
RetinaNet R         69.21    18.4
RoI Trans           90.13    13.3
Oriented R-CNN      89.94    13.8
FCR-det             89.27    16.8
As can be seen from Table 1, the method provided by the invention significantly increases detection speed without losing detection accuracy; the speedup over the two-stage target detection models is significant, making the method easier to deploy on mobile terminals.

Claims (7)

1. A single-stage rotating ship detection method based on a full convolution network is characterized by comprising the following steps:
(1) acquiring a training sample set and a testing sample set:
(1a) obtaining M optical remote sensing ship images containing ship targets, labeling the ship target in each optical remote sensing ship image with a rotated box, then performing size normalization on each optical remote sensing ship image with labeled ship targets and a color standardization operation on the normalized images, to obtain M preprocessed optical remote sensing ship images of size W×R, H = {H_1, H_2, …, H_m, …, H_M}, where M ≥ 1000, H_m represents the m-th preprocessed optical remote sensing ship image, and W and R represent the numbers of pixel rows and columns of a picture;
(1b) performing data enhancement on each preprocessed optical remote sensing ship image H_m, forming a training sample set H_train from Q preprocessed optical remote sensing ship images in H together with their data-enhanced counterparts, and forming a test sample set H_test from the remaining Y preprocessed optical remote sensing ship images together with their data-enhanced counterparts, where M = Q + Y;
(2) Constructing a single-stage rotating ship target detection model O based on a full convolution network:
(2a) constructing a structure of a single-stage rotating ship target detection model O based on a full convolution network:
constructing a single-stage rotating ship target detection model O comprising a trunk network and a detection network which are connected in sequence, wherein: the main network comprises a feature extraction sub-network and a feature enhancement sub-network which are connected in sequence; the feature enhancer network comprises three feature enhancement layers arranged in parallel; the detection network comprises a detection sub-network connected with each characteristic enhancement layer, and each detection sub-network comprises a first full convolution network and a second full convolution network which are arranged in parallel and are composed of a plurality of first full convolution layers; the output end of the first full convolution network is connected with a second full convolution layer, a third full convolution layer and a fourth full convolution layer which are arranged in parallel; the output end of the second full convolution network is connected with a fifth full convolution layer;
(2b) defining a loss function L of the single-stage rotating ship target detection model O, together with a class loss function L_cls, a regression loss function L_reg and a centrality loss function L_C:
L = L_cls + α·L_reg + β·L_C
L_cls = (1/N_pos)·Σ_{x,y} f_focalloss(p_{x,y}, p*_{x,y})
L_reg = (1/N_pos)·Σ_{x,y} 1{p*_{x,y} > 0}·f_GIoU(t_{x,y}, t*_{x,y})
L_C = (1/N_pos)·Σ_{x,y} 1{p*_{x,y} > 0}·f_centerness(c_{x,y}, c*_{x,y})
where p_{x,y} is the predicted class output of a feature point and p*_{x,y} is its class label; t_{x,y} is the predicted regression output of a feature point and t*_{x,y} is its regression label; c_{x,y} is the predicted centrality output and c*_{x,y} is its centrality label; N_pos represents the number of generated predicted targets; 1{p*_{x,y} > 0} represents an indicator that takes 1 when p*_{x,y} > 0 and takes 0 otherwise; α and β are weight parameters with α + β = 1, each with value range [0, 1];
(3) Performing iterative training on a single-stage rotating ship detection model based on a full convolution network:
(3a) initializing the iteration number t and the maximum number of iterations T, T ≥ 10000; denoting the t-th-iteration single-stage rotating ship detection model as O_t and the weights of the t-th-iteration detection network and feature extraction sub-network as ω_1t and ω_2t respectively; and setting O_t = O, t = 1;
(3b) predicting the position, rotation deflection angle, centrality offset and category confidence of each target;
(3b1) taking b training samples randomly selected from the training sample set H_train as the input of the single-stage rotating ship detection model O_t; the feature extraction sub-network in the backbone network extracts a multi-scale feature map of the target in each training sample, and the feature enhancement sub-network performs feature enhancement on each feature map, where b ≥ 8;
(3b2) the first full convolution network and the second full convolution network in the detection network respectively perform multi-layer full convolution operations on each feature-enhanced feature map; the second, third and fourth full convolution layers each perform a single-layer full convolution operation on every regression feature map obtained from the multi-layer full convolution of the first full convolution network, yielding the position label t̂, angle label θ̂ and centrality label ĉ of the predicted target; the fifth full convolution layer performs a single-layer full convolution operation on every classification feature map obtained from the multi-layer full convolution of the second full convolution network, yielding the class confidence label p̂ of the predicted target;
(3c) using the loss function L, calculating the loss value L_t of O_t from the predicted labels t̂, θ̂, ĉ and p̂ and their ground-truth labels; calculating the gradient λ_t of the parameters of O_t by back-propagating L_t; and then updating the weights ω_1t and ω_2t of the detection network and the feature extraction sub-network by gradient descent using λ_t;
(3d) judging whether t = T holds; if so, obtaining the trained single-stage rotating ship detection network model O; otherwise, setting t = t + 1 and executing step (3b);
(4) obtaining a detection result of the single-stage rotating ship:
propagating the test sample set H_test forward as the input of the trained single-stage rotating ship detection model O to obtain the position label t̂, angle label θ̂, centrality label ĉ and class confidence label p̂ of each predicted target, and screening the predicted targets with the rotation non-maximum suppression (NMS) method under threshold μ to obtain the bounding box and class confidence of every target contained in H_test.
2. The full convolution network-based single-stage rotating ship detection method according to claim 1, wherein the rotated-box labeling of the ship target in each optical remote sensing ship image in step (1a) is implemented as follows: taking the horizontal coordinate x and vertical coordinate y of the center of the rotated labeling box, the length l and width w of the rotated labeling box, and the counterclockwise angle θ between the long side of the rotated labeling box and the horizontal direction as the bounding-box position label of each rotated target.
3. The full convolution network-based single-stage rotating ship detection method according to claim 1, wherein the data enhancement of each preprocessed optical remote sensing ship image H_m in step (1b) is performed using a rotation enhancement, flip enhancement, scaling enhancement or noise enhancement method.
4. The full convolution network based single-stage rotating vessel detection method according to claim 1, wherein the single-stage rotating vessel target detection model O in the step (2a) is a model in which:
the feature extraction sub-network in the backbone network comprises a first convolution layer, a maximum pooling layer and four block blocks which are connected in sequence; the feature enhancement network comprises three feature enhancement layers which are connected in sequence, wherein a first feature enhancement layer, a second feature enhancement layer and a third feature enhancement layer are respectively connected with a second block, a third block and a fourth block in a feature extraction sub-network;
the specific parameters of the feature extraction sub-network are: the first convolution layer has 7×7 convolution kernels with stride 2 and 64 kernels; the first block comprises 3 first convolution blocks connected in sequence, each comprising three convolution layers in which the first and second layers have 64 convolution kernels each, the first and third layers use 1×1 kernels, and the third layer has 128 kernels; the second block comprises 4 second convolution blocks connected in sequence, each comprising three convolution layers in which the first and second layers have 128 kernels each, the first and third layers use 1×1 kernels, and the third layer has 256 kernels; the third block comprises 6 third convolution blocks connected in sequence, each comprising three convolution layers in which the first and second layers have 256 kernels each, the first and third layers use 1×1 kernels, and the third layer has 512 kernels; the fourth block comprises 3 fourth convolution blocks connected in sequence, each comprising three convolution layers in which the first and second layers have 512 kernels each, the first and third layers use 1×1 kernels, and the third layer has 1024 kernels;
the first full convolution network and the second full convolution network in the detection network each comprise 4 first full convolution layers, and the specific parameters of the detection network are as follows: the first full convolution layers have 3×3 convolution kernels and 256 channels; the second full convolution layer has a 3×3 kernel and 4 channels; the third full convolution layer has a 3×3 kernel and 1 channel; the fifth full convolution layer of the second full convolution network has a 3×3 kernel and L channels, where L represents the number of detection categories.
5. The full convolution network-based single-stage rotating ship detection method according to claim 1, wherein the class loss function f_focalloss, regression loss function f_GIoU and centrality loss function f_centerness used in constructing the loss function L of the single-stage rotating ship target detection model O in step (2b) are respectively:
f_focalloss = −(1 − p_t)^γ·log(p_t)
f_GIoU = 1 − [I/(A_p + A_g − I) − (A_c − (A_p + A_g − I))/A_c]
centerness* = sqrt((min(l, r)/max(l, r))·(min(t, b)/max(t, b)))
where f_focalloss represents the focal loss function, f_GIoU represents the GIoU loss function, and f_centerness represents the centerness loss function computed against the centrality target centerness*; p_t represents the confidence of the generated label; γ is a hyperparameter taken in [0, 5]; A_p represents the area of the candidate box generated by a feature point, A_g the area of the real box containing the feature point, I the overlap area of A_p and A_g, and A_c the area of the smallest box enclosing both; l, t, r, b respectively represent the distances from the center point to the left, top, right and bottom sides of the regression box label.
6. The full convolution network-based single-stage rotating ship detection method according to claim 1, wherein the weights of the t-th-iteration detection network and feature extraction sub-network in step (3a) are ω_1 and ω_2 respectively, where ω_1 is initialized with network weights pretrained on ImageNet, and ω_2 is initialized using the He initialization method.
7. The full convolution network-based single-stage rotating ship detection method according to claim 1, wherein the update formulas of the weights ω_1t and ω_2t of the detection network and the feature extraction sub-network in step (3c) are respectively:
ω′_1t = ω_1t − η·(∂L_t/∂ω_1t)
ω′_2t = ω_2t − η·(∂L_t/∂ω_2t)
where η represents the learning step size, taken in the range 0.0001 to 0.1, ω′_1t and ω′_2t respectively represent the updated ω_1t and ω_2t, and ∂/∂ω represents the partial derivative computation.
CN202210198503.9A 2022-03-02 2022-03-02 Single-stage rotating ship detection method based on full convolution network Pending CN114565824A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210198503.9A CN114565824A (en) 2022-03-02 2022-03-02 Single-stage rotating ship detection method based on full convolution network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210198503.9A CN114565824A (en) 2022-03-02 2022-03-02 Single-stage rotating ship detection method based on full convolution network

Publications (1)

Publication Number Publication Date
CN114565824A (en) 2022-05-31

Family

ID=81716166

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210198503.9A Pending CN114565824A (en) 2022-03-02 2022-03-02 Single-stage rotating ship detection method based on full convolution network

Country Status (1)

Country Link
CN (1) CN114565824A (en)


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116012719A (en) * 2023-03-27 2023-04-25 中国电子科技集团公司第五十四研究所 Weak supervision rotating target detection method based on multi-instance learning
CN116935168A (en) * 2023-09-13 2023-10-24 苏州魔视智能科技有限公司 Method, device, computer equipment and storage medium for training target detection model
CN116935168B (en) * 2023-09-13 2024-01-30 苏州魔视智能科技有限公司 Method, device, computer equipment and storage medium for target detection

Similar Documents

Publication Publication Date Title
CN112308019B (en) SAR ship target detection method based on network pruning and knowledge distillation
CN111563473B (en) Remote sensing ship identification method based on dense feature fusion and pixel level attention
Zhang et al. Balance learning for ship detection from synthetic aperture radar remote sensing imagery
CN110472627B (en) End-to-end SAR image recognition method, device and storage medium
CN111179217A (en) Attention mechanism-based remote sensing image multi-scale target detection method
CN111738112B (en) Remote sensing ship image target detection method based on deep neural network and self-attention mechanism
CN109101897A (en) Object detection method, system and the relevant device of underwater robot
CN111079739B (en) Multi-scale attention feature detection method
CN110796048B (en) Ship target real-time detection method based on deep neural network
CN111753677B (en) Multi-angle remote sensing ship image target detection method based on characteristic pyramid structure
CN112395987B (en) SAR image target detection method based on unsupervised domain adaptive CNN
CN110647802A (en) Remote sensing image ship target detection method based on deep learning
CN114565824A (en) Single-stage rotating ship detection method based on full convolution network
CN111783523A (en) Remote sensing image rotating target detection method
CN116168240A (en) Arbitrary-direction dense ship target detection method based on attention enhancement
CN115937659A (en) Mask-RCNN-based multi-target detection method in indoor complex environment
Xiao et al. FDLR-Net: A feature decoupling and localization refinement network for object detection in remote sensing images
CN115965862A (en) SAR ship target detection method based on mask network fusion image characteristics
Fan et al. A novel sonar target detection and classification algorithm
CN111950357A (en) Marine water surface garbage rapid identification method based on multi-feature YOLOV3
CN114463624A (en) Method and device for detecting illegal buildings applied to city management supervision
CN113486819A (en) Ship target detection method based on YOLOv4 algorithm
CN109284752A (en) A kind of rapid detection method of vehicle
Huang et al. A deep learning approach to detecting ships from high-resolution aerial remote sensing images
CN116630808A (en) Rotary ship detection method based on remote sensing image feature extraction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination