CN114565824A - Single-stage rotating ship detection method based on full convolution network - Google Patents

Single-stage rotating ship detection method based on full convolution network

Info

Publication number
CN114565824A
Authority
CN
China
Prior art keywords
convolution
network
ship
layer
full convolution
Prior art date
Legal status
Pending
Application number
CN202210198503.9A
Other languages
Chinese (zh)
Inventor
杨淑媛
李源钊
冯志玺
王敏
高欣怡
谭豪
柯希鹏
李奕彤
翟蕾
李宇星
焦李成
Current Assignee
Xidian University
Original Assignee
Xidian University
Priority date
Filing date
Publication date
Application filed by Xidian University
Priority to CN202210198503.9A
Publication of CN114565824A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/24 Classification techniques
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Abstract

The invention provides a single-stage rotating ship detection method based on a full convolution network, which comprises the following steps: acquiring a training sample set and a test sample set; constructing a single-stage rotating ship target detection model; performing iterative training on the model; and using the trained model to detect the bounding-box position and category confidence of every target. Starting from a full convolution single-stage target detection network for horizontal boxes, the invention adds an angle branch and optimizes the network structure and loss function, and generates prediction results pixel by pixel, without anchor boxes, directly from the feature maps produced by the stacked full convolution layers in the network. This realizes fast detection of rotating ship targets and improves detection efficiency while maintaining detection accuracy, and the method can be used in fields such as offshore monitoring, maritime defense early warning and maritime rights protection.

Description

Single-stage rotating ship detection method based on full convolution network
Technical Field
The invention belongs to the technical field of image processing, relates to a remote sensing image ship detection method, and particularly relates to a deep-learning-based rotating ship target detection method for remote sensing images, which can be used in fields such as offshore monitoring, maritime defense early warning and maritime rights protection.
Background
The development of remote sensing technology greatly helps people understand and explore the world, and it has many distinct technical characteristics, such as a wide data range, a short acquisition period and strong data comprehensiveness. In remote sensing rotated-target detection, the bounding box of a target comprises the center position and the length and width of a horizontal bounding box, plus an additional offset-angle annotation. Detecting rotated targets in remote sensing images is more challenging than horizontal bounding-box detection: owing to differences in shooting angle, target objects in remote sensing aerial images differ from objects in natural images in that their arrangement direction is arbitrary rather than axis-aligned, and they are small and densely arranged, as with remote sensing targets such as airplanes, ships and containers. In such cases a horizontal bounding box fits the target's shape poorly, and a rotated bounding box marks the object better. This is especially true for marine ship targets: because ships differ widely in aspect ratio, a rotated bounding box can frame a ship target accurately, so that its class can be identified and its heading judged more precisely. Meanwhile, in port scenes, rotated bounding boxes achieve better identification accuracy and recall than horizontal bounding boxes for ship targets moored densely along the shore.
Compared with traditional target detection methods, deep-learning-based methods offer high precision, high accuracy and end-to-end training and testing, and they are currently the mainstream approach to remote sensing target detection. Deep-learning-based target detection comprises single-stage and two-stage methods. Two-stage methods such as Faster R-CNN perform feature extraction and screen a certain number of candidate boxes in the first stage, then refine and screen the candidate boxes generated in the first stage by regression in the second stage to obtain the final detection result. Single-stage methods such as YOLOv3 omit the second-stage refined regression of candidate boxes and regress and classify targets directly from feature points, which greatly improves detection speed; compared with two-stage models they are lighter and easier to deploy in practice. Meanwhile, anchor-free detector design is a new development trend: FCOS, for example, abandons the anchor-box mechanism and converts the target detection task into key-point estimation, which effectively saves run-time memory and improves small-target detection to a certain extent.
To detect rotating ship targets more accurately, researchers usually start from two aspects: feature extraction and loss functions. For example, the patent application with publication number CN112395969A, entitled "A remote sensing image rotating ship detection method based on a feature pyramid", discloses a two-stage rotated-target detection method based on a convolutional neural network. It performs two-stage target detection with a convolutional neural network, using the network's stronger feature extraction capability to extract features; adopts a feature pyramid for multi-scale detection; and, as its key point, detects ship targets in different directions directly with rotated boxes. The method addresses the insufficient feature extraction of traditional methods and the particularities of ships such as multiple orientations, different sizes and dense distribution; however, it adopts rotated anchor boxes, with 84 anchor boxes per feature point, so although detection accuracy improves to a certain extent, detection speed is sacrificed.
Disclosure of Invention
The invention aims to overcome the defects in the prior art, provides a single-stage rotating ship detection method based on a full convolution network, and is used for solving the technical problems of low accuracy and low efficiency of remote sensing rotating frame detection in the prior art.
In order to achieve the purpose, the technical scheme adopted by the invention comprises the following steps:
(1) acquiring a training sample set and a testing sample set:
(1a) obtaining M optical remote sensing ship images containing ship targets, labeling the ship target in each optical remote sensing ship image with a rotated box, normalizing the size of each labeled optical remote sensing ship image, and then performing a color standardization operation on the normalized images, to obtain M preprocessed optical remote sensing ship images of size W×R, H = {H_1, H_2, …, H_m, …, H_M}, where M ≥ 1000, H_m represents the m-th preprocessed optical remote sensing ship image, and W and R represent the numbers of pixel rows and columns of a picture;
(1b) performing data enhancement on each preprocessed optical remote sensing ship image H_m, forming a training sample set H_train from Q preprocessed optical remote sensing ship images in H together with their data-enhanced counterparts, and forming a test sample set H_test from the remaining Y preprocessed optical remote sensing ship images together with their data-enhanced counterparts, where M = Q + Y;
(2) Constructing a single-stage rotating ship target detection model O based on a full convolution network:
(2a) constructing a structure of a single-stage rotating ship target detection model O based on a full convolution network:
constructing a single-stage rotating ship target detection model O comprising a backbone network and a detection network which are connected in sequence, wherein: the main network comprises a feature extraction sub-network and a feature enhancement sub-network which are connected in sequence; the feature enhancer network comprises three feature enhancement layers arranged in parallel; the detection network comprises a detection sub-network connected with each characteristic enhancement layer, and each detection sub-network comprises a first full convolution network and a second full convolution network which are arranged in parallel and are composed of a plurality of first full convolution layers; the output end of the first full convolution network is connected with a second full convolution layer, a third full convolution layer and a fourth full convolution layer which are arranged in parallel; the output end of the second full convolution network is connected with a fifth full convolution layer;
(2b) defining a loss function L of the single-stage rotating ship target detection model O:
L = L_cls + α·L_reg + β·L_C
L_cls = (1/N_pos)·Σ_{x,y} f_focalloss(p_{x,y}, p*_{x,y})
L_reg = (1/N_pos)·Σ_{x,y} 1{p*_{x,y} > 0}·f_GIoU(t_{x,y}, t*_{x,y})
L_C = (1/N_pos)·Σ_{x,y} 1{p*_{x,y} > 0}·f_centerness(c_{x,y}, c*_{x,y})
where x and y respectively represent the abscissa and ordinate of a feature point; L_cls represents the class loss function, L_reg the regression loss function and L_C the centrality loss function; p_{x,y} is the predicted class output of a feature point and p*_{x,y} is its class label; t_{x,y} is the predicted regression output of a feature point and t*_{x,y} is its regression label; c_{x,y} is the predicted centrality output and c*_{x,y} is its centrality label; N_pos represents the number of generated predicted targets; 1{p*_{x,y} > 0} represents an indicator that takes 1 when p*_{x,y} > 0 and takes 0 otherwise; α and β are weight parameters with α + β = 1, each with value range [0, 1];
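As an illustration, a minimal PyTorch sketch of this weighted combination is given below; the per-term losses f_focalloss, f_GIoU and f_centerness are passed in as callables (their concrete forms appear in the detailed description), and the flattened tensor shapes are an assumption of the sketch, not something the text fixes:

```python
import torch

def total_loss(p, p_star, t, t_star, c, c_star,
               f_focalloss, f_giou, f_centerness, alpha=0.2, beta=0.8):
    """L = L_cls + alpha * L_reg + beta * L_C over all feature points (x, y).

    p, t, c are the predicted class, regression and centrality outputs,
    p_star, t_star, c_star the corresponding labels, all flattened so that
    one row corresponds to one feature point.
    """
    pos = p_star > 0                          # indicator 1{p*_{x,y} > 0}
    n_pos = pos.sum().clamp(min=1).float()    # N_pos, guarded against zero

    l_cls = f_focalloss(p, p_star).sum() / n_pos
    l_reg = f_giou(t[pos], t_star[pos]).sum() / n_pos
    l_c = f_centerness(c[pos], c_star[pos]).sum() / n_pos
    return l_cls + alpha * l_reg + beta * l_c
```

The defaults alpha=0.2 and beta=0.8 follow the embodiment values α = 1/5 and β = 4/5 given later in the description.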
(3) Performing iterative training on a single-stage rotating ship detection model based on a full convolution network:
(3a) initializing the iteration number t and the maximum number of iterations T, T ≥ 10000; denoting the t-th-iteration single-stage rotating ship detection model as O_t and the weights of the t-th-iteration detection network and feature extraction sub-network as ω_1t and ω_2t respectively; and setting O_t = O, t = 1;
(3b) predicting the position, rotation deflection angle, centrality offset and category confidence of each target;
(3b1) taking b training samples randomly selected from the training sample set H_train as the input of the single-stage rotating ship detection model O_t; the feature extraction sub-network in the backbone network extracts a multi-scale feature map of the target in each training sample, and the feature enhancement sub-network performs feature enhancement on each feature map, where b ≥ 8;
(3b2) the first full convolution network and the second full convolution network in the detection network respectively perform multi-layer full convolution operations on each feature-enhanced feature map; the second, third and fourth full convolution layers each perform a single-layer full convolution operation on every regression feature map obtained from the multi-layer full convolution of the first full convolution network, yielding the position label t̂, angle label θ̂ and centrality label ĉ of the predicted target; the fifth full convolution layer performs a single-layer full convolution operation on every classification feature map obtained from the multi-layer full convolution of the second full convolution network, yielding the class confidence label p̂ of the predicted target;
(3c) using the loss function L, calculating the loss value L_t of O_t from the predicted labels t̂, θ̂, ĉ and p̂ and their ground-truth labels; calculating the gradient λ_t of the parameters of O_t by back-propagating L_t; and then updating the weights ω_1t and ω_2t of the detection network and the feature extraction sub-network by gradient descent using λ_t;
(3d) judging whether t = T holds; if so, obtaining the trained single-stage rotating ship detection network model O; otherwise, setting t = t + 1 and executing step (3b);
(4) obtaining a detection result of the single-stage rotating ship:
set H of test samplestestThe forward propagation is carried out as the input of a trained single-stage rotating ship detection model O to obtain the position label of a predicted target
Figure BDA00035281617900000412
Angle label
Figure BDA00035281617900000413
Centrality label
Figure BDA00035281617900000414
Category confidence labels
Figure BDA0003528161790000051
NMS for predicting targets using a threshold μ rotation non-maximum suppression method
Figure BDA0003528161790000052
Figure BDA0003528161790000053
Screening to obtain HtestThe bounding box and class confidence of each object contained.
Compared with the prior art, the invention has the following advantages:
1. In the process of training the single-stage rotating ship detection network model and acquiring the detection results, the first and second full convolution networks obtain the position label, angle label, centrality label and classification confidence label of each predicted target through single-layer full convolution operations. This avoids the complex parameter-acquisition process of prior-art rotated-anchor-box methods that slows down detection, and effectively improves detection efficiency for ships with large aspect ratios.
2. The invention uses full convolution networks in the detection network and, through transposed convolution layers, converts the height and width of the intermediate feature mapping of each feature map obtained by the backbone network to the size of the input feature map, so that the prediction results correspond one-to-one with the input feature map in spatial height and width and targets are predicted directly pixel by pixel. This avoids the heavy computation of the existing two-stage detection technique, which generates candidate boxes in a first stage and regresses and screens them in a second stage, reduces memory consumption, and further improves detection speed.
3. The loss function includes a centrality offset loss, which computes a loss on the offset between the center of a candidate box and its corresponding feature point and down-weights prediction boxes with large center-point offsets; compared with traditional single-stage target detection models, this avoids generating large numbers of low-quality candidate boxes whose predicted centers lie far from the feature points. Meanwhile, the loss function applies a weighted combination to the angle offset and position offset obtained from the different full convolution branches, which makes training converge more easily and improves detection accuracy.
Drawings
FIG. 1 is a flow chart of an implementation of the present invention;
FIG. 2 is a schematic structural diagram of a single-stage rotating ship target detection model constructed by the invention;
FIG. 3 is a schematic diagram of remote sensing rotation frame labeling according to the present invention;
fig. 4 is a detection result diagram of a remote sensing visible light image according to an embodiment of the present invention.
Detailed Description
The invention is described in further detail below with reference to the figures and the specific embodiments.
Referring to fig. 1, the present invention includes the steps of:
step 1) obtaining a training sample set and a testing sample set:
(1a) obtaining M optical remote sensing ship images containing ship targets, labeling the ship target in each optical remote sensing ship image with a rotated box, then normalizing the size of each labeled optical remote sensing ship image and performing a color standardization operation on the normalized images, to obtain M preprocessed optical remote sensing ship images of size W×R, H = {H_1, H_2, …, H_m, …, H_M}, where M ≥ 1000, H_m represents the m-th preprocessed optical remote sensing ship image, and W and R represent the numbers of pixel rows and columns of a picture;
referring to fig. 3, the labeling method takes the horizontal coordinate x and vertical coordinate y of the center of the rotated labeling box, the length l and width w of the rotated labeling box, and the counterclockwise angle θ between the long side of the rotated labeling box and the horizontal direction as the bounding-box position label of each rotated target. The obtained remote sensing visible-light images differ in size, so a size normalization operation unifies them to one size, which facilitates unified training and labeling by the network; the RGB three channels are converted to BGR channel order according to the network input requirements, and a standardization operation is then performed. These two steps convert the image size and color channels to meet the input requirements of the network and make convergence during training easier. In this embodiment, the RoLabelImg labeling software is used to label the ship targets of the visible-light remote sensing data set, W = 1024, R = 1024, and M = 1102;
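For illustration, a minimal sketch of converting such a label (x, y, l, w, θ) into its four corner points, e.g. for drawing or overlap computation, follows; the image-coordinate convention (y axis pointing down) is an assumption of the sketch:

```python
import math

def rotated_box_to_corners(x, y, l, w, theta_deg):
    """Corner points of a rotated box given its center (x, y), long side l,
    short side w and counterclockwise angle theta between the long side and
    the horizontal direction (in degrees)."""
    t = math.radians(theta_deg)
    dx, dy = math.cos(t), math.sin(t)   # unit vector along the long side
    px, py = -dy, dx                    # unit vector along the short side
    corners = []
    for sl, sw in ((-1, -1), (1, -1), (1, 1), (-1, 1)):
        corners.append((x + sl * (l / 2) * dx + sw * (w / 2) * px,
                        y + sl * (l / 2) * dy + sw * (w / 2) * py))
    return corners

# e.g. a 200 x 40 pixel ship centered at (512, 512), rotated 30 degrees
print(rotated_box_to_corners(512, 512, 200, 40, 30.0))
```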
(1b) performing data enhancement on each preprocessed optical remote sensing ship image H_m, forming a training sample set H_train from Q preprocessed optical remote sensing ship images in H together with their data-enhanced counterparts, and forming a test sample set H_test from the remaining Y preprocessed optical remote sensing ship images together with their data-enhanced counterparts, where M = Q + Y;
the data set is expanded by data enhancement to avoid overfitting during training. In the data enhancement of this example, one or more of the rotation enhancement, flip enhancement, scaling enhancement and noise enhancement methods are randomly selected with an occurrence probability of 0.5; in this embodiment, Q = 802 and Y = 300;
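A minimal sketch of this random selection is shown below, assuming each enhancement is drawn independently with probability 0.5; the transform callables are hypothetical placeholders for the four methods named above:

```python
import random

AUGMENTATIONS = ("rotate", "flip", "scale", "noise")

def augment(image, labels, transforms, p=0.5):
    """Apply each enhancement independently with occurrence probability p;
    `transforms` maps each name to a callable that takes and returns the
    image together with its rotated-box labels."""
    for name in AUGMENTATIONS:
        if random.random() < p:
            image, labels = transforms[name](image, labels)
    return image, labels
```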
step 2) constructing a single-stage rotating ship target detection model O based on a full convolution network
(2a) Constructing a single-stage rotating ship target detection model O based on a full convolution network, wherein the structure of the model O is shown in FIG. 2;
constructing a single-stage rotating ship target detection model O comprising a backbone network and a detection network which are connected in sequence, wherein: the main network comprises a feature extraction sub-network and a feature enhancement sub-network which are connected in sequence; the detection network comprises a first full convolution network and a second full convolution network which are arranged in parallel and are composed of a plurality of first full convolution layers; the output end of the first full convolution network is connected with a second full convolution layer, a third full convolution layer and a fourth full convolution layer which are arranged in parallel; the output end of the second full convolution network is connected with a fifth full convolution layer;
the feature extraction sub-network in the backbone network comprises a first convolution layer, a maximum pooling layer and four block blocks which are connected in sequence; the feature enhancement network comprises three feature enhancement layers which are connected in sequence, wherein a first feature enhancement layer, a second feature enhancement layer and a third feature enhancement layer are respectively connected with a second block, a third block and a fourth block in a feature extraction sub-network;
the specific parameters of the feature extraction sub-network are as follows: the first convolution layer has 7×7 convolution kernels with stride 2 and 64 kernels; the first block comprises 3 first convolution blocks connected in sequence, each comprising three convolution layers in which the first and second layers have 64 convolution kernels each, the first and third layers use 1×1 kernels, and the third layer has 128 kernels; the second block comprises 4 second convolution blocks connected in sequence, each comprising three convolution layers in which the first and second layers have 128 kernels each, the first and third layers use 1×1 kernels, and the third layer has 256 kernels; the third block comprises 6 third convolution blocks connected in sequence, each comprising three convolution layers in which the first and second layers have 256 kernels each, the first and third layers use 1×1 kernels, and the third layer has 512 kernels; the fourth block comprises 3 fourth convolution blocks connected in sequence, each comprising three convolution layers in which the first and second layers have 512 kernels each, the first and third layers use 1×1 kernels, and the third layer has 1024 kernels;
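Read as a bottleneck-style residual block, these parameters could be sketched as below; the 3×3 kernel of the middle layer and the residual shortcut are assumptions not fixed by the text, which only specifies the 1×1 layers and the kernel counts:

```python
import torch.nn as nn

class ConvBlock(nn.Module):
    """One convolution block: 1x1 (mid kernels) -> 3x3 (mid kernels) ->
    1x1 (2 * mid kernels), with an assumed residual shortcut."""

    def __init__(self, in_ch, mid):
        super().__init__()
        out = 2 * mid
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, mid, kernel_size=1), nn.ReLU(inplace=True),
            nn.Conv2d(mid, mid, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(mid, out, kernel_size=1))
        self.shortcut = (nn.Conv2d(in_ch, out, kernel_size=1)
                         if in_ch != out else nn.Identity())
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.body(x) + self.shortcut(x))

# e.g. the first block: three such blocks with mid = 64 (128 output channels)
first_block = nn.Sequential(ConvBlock(64, 64), ConvBlock(128, 64), ConvBlock(128, 64))
```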
the first full convolution network and the second full convolution network in the detection network both comprise 4 first full convolution layers, and the specific parameters of the detection network are as follows: the size of the first full convolution layer convolution kernel is 3 multiplied by 3, and the number of channels is 256; the size of the second full convolution layer convolution kernel is 3 multiplied by 3, and the number of channels is 4; the size of the third full convolution layer convolution kernel is 3 multiplied by 3, and the number of channels is 1; the size of a fifth full convolution layer convolution kernel of the second full convolution network is 3 multiplied by 3, and the number of channels is L; where L represents the number of detection categories.
(2b) defining the loss function L of the single-stage rotating ship target detection model O:
L = L_cls + α·L_reg + β·L_C
L_cls = (1/N_pos)·Σ_{x,y} f_focalloss(p_{x,y}, p*_{x,y})
L_reg = (1/N_pos)·Σ_{x,y} 1{p*_{x,y} > 0}·f_GIoU(t_{x,y}, t*_{x,y})
L_C = (1/N_pos)·Σ_{x,y} 1{p*_{x,y} > 0}·f_centerness(c_{x,y}, c*_{x,y})
where L_cls represents the class loss function, L_reg the regression loss function and L_C the centrality loss function; f_focalloss represents the focal loss function, f_GIoU the GIoU loss function and f_centerness the centerness loss function; p_{x,y} is the predicted class output of a feature point and p*_{x,y} is its class label; t_{x,y} is the predicted regression output of a feature point and t*_{x,y} is its regression label; c_{x,y} is the predicted centrality output and c*_{x,y} is its centrality label; N_pos represents the number of generated predicted targets; 1{p*_{x,y} > 0} represents an indicator that takes 1 when p*_{x,y} > 0 and takes 0 otherwise; α and β are weight parameters with α + β = 1, each with value range [0, 1]. The class loss function f_focalloss, regression loss function f_GIoU and centrality loss function f_centerness are respectively:
f_focalloss = −(1 − p_t)^γ·log(p_t)
f_GIoU = 1 − [I/(A_p + A_g − I) − (A_c − (A_p + A_g − I))/A_c]
centerness* = sqrt((min(l, r)/max(l, r))·(min(t, b)/max(t, b)))
where p_t represents the confidence of the generated label; γ is a hyperparameter taken in [0, 5]; A_p represents the area of the candidate box generated by a feature point, A_g the area of the real box containing the feature point, I the overlap area of A_p and A_g, and A_c the area of the smallest box enclosing both; centerness* is the centrality target against which f_centerness compares the predicted centrality; l, t, r, b respectively represent the distances from the center point to the left, top, right and bottom sides of the regression box label;
the method is rotary frame target detection, compared with horizontal frame detection, angle branches are introduced into the rotary frame detection, and normalization is performed in a weighting mode; solving the problem of sample imbalance by using focalloss loss as a classified loss function; the regression loss uses GIoU loss, the intersection and parallel ratio between the prediction frame and the real label frame is calculated, compared with the traditional IoU loss and smoothL1 loss, the GIoU loss not only focuses on the overlapping area, but also focuses on other non-overlapping areas, and the specific position information can be more accurately reflected on the ship target with the large length-width ratio; introducing centrality offset loss centerness, calculating the offset between the feature point and the central point of the real label, further filtering the edge target in the training process, accelerating the convergence speed, and improving the quality of the generated candidate frame, wherein in the embodiment, α is 1/5, β is 4/5, and γ is 2;
step 3) performing iterative training on the single-stage rotating ship detection model based on the full convolution network:
(3a) initializing the iteration number t and the maximum number of iterations T, T ≥ 10000; denoting the t-th-iteration single-stage rotating ship detection model as O_t and the weights of the t-th-iteration detection network and feature extraction sub-network as ω_1t and ω_2t respectively; and setting O_t = O, t = 1;
in this embodiment, T = 30000 to ensure that the network is trained sufficiently;
(3b) predicting the position, rotation deflection angle, centrality offset and category confidence of each target;
(3b1) taking b training samples randomly selected from the training sample set H_train as the input of the single-stage rotating ship detection model O_t; the feature extraction sub-network in the backbone network extracts a multi-scale feature map of the target in each training sample, and the feature enhancement sub-network performs feature enhancement on each feature map, where b ≥ 8;
the feature extraction sub-network performs feature extraction on the input image to obtain feature maps at multiple down-sampling levels. During forward propagation of the network, the feature maps generated by the 2nd, 3rd and 4th blocks are each convolved with 1×1 convolution kernels to obtain the corresponding first, second and third feature enhancement layers; the third feature enhancement layer is up-sampled to a feature map of the same size as the second feature layer and added to it, and the second feature enhancement layer is up-sampled to a feature map of the same size as the first feature layer and added to it, achieving feature fusion. Low-level feature maps carry less semantic information but locate the target accurately, while high-level feature maps are semantically rich but locate the target coarsely; through the feature fusion operation of the feature enhancement sub-network, all the feature layers obtain strong semantic features. In this embodiment, b = 8; this value is limited by the GPU memory of the experimental device, which permits at most 8 training samples per iteration;
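This top-down fusion could be sketched as below, with the 256/512/1024 input channels taken from the backbone parameters above; nearest-neighbor up-sampling and the 256-channel output width are assumptions of the sketch:

```python
import torch.nn as nn
import torch.nn.functional as F

class FeatureEnhancer(nn.Module):
    """1x1 lateral convolutions on the outputs of blocks 2-4, then
    upsample-and-add from the deepest feature map downwards."""

    def __init__(self, c2=256, c3=512, c4=1024, out=256):
        super().__init__()
        self.lat2 = nn.Conv2d(c2, out, kernel_size=1)  # first feature enhancement layer
        self.lat3 = nn.Conv2d(c3, out, kernel_size=1)  # second feature enhancement layer
        self.lat4 = nn.Conv2d(c4, out, kernel_size=1)  # third feature enhancement layer

    def forward(self, f2, f3, f4):
        p4 = self.lat4(f4)
        p3 = self.lat3(f3) + F.interpolate(p4, size=f3.shape[-2:], mode="nearest")
        p2 = self.lat2(f2) + F.interpolate(p3, size=f2.shape[-2:], mode="nearest")
        return p2, p3, p4   # one input per detection sub-network
```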
(3b2) the first full convolution network and the second full convolution network in the detection network respectively perform multi-layer full convolution operations on each feature-enhanced feature map; the second, third and fourth full convolution layers each perform a single-layer full convolution operation on every regression feature map obtained from the multi-layer full convolution of the first full convolution network, yielding the position label t̂, angle label θ̂ and centrality label ĉ of the predicted target; the fifth full convolution layer performs a single-layer full convolution operation on every classification feature map obtained from the multi-layer full convolution of the second full convolution network, yielding the class confidence label p̂ of the predicted target;
the fourth full convolution layer that computes the centrality offset is connected to the first full convolution network rather than the second, so that it shares the same first full convolution network with the second full convolution layer (which obtains the regression candidate-box position label) and the third full convolution layer (which obtains the regression candidate-box deflection-angle label); this provides richer position information and makes the computed center-point offset more accurate;
(3c) using the loss function L, calculating the loss value L_t of O_t from the predicted labels t̂, θ̂, ĉ and p̂ and their ground-truth labels; calculating the gradient λ_t of the parameters of O_t by back-propagating L_t; and then updating the weights ω_1t and ω_2t of the detection network and the feature extraction sub-network by gradient descent using λ_t;
the weight value updating formula in the step 3c) is as follows:
Figure BDA0003528161790000107
Figure BDA0003528161790000108
wherein eta represents learning step length, 0.0001-0.1, omega2t' and omega1tRespectively represent omega1tAnd omega2tAs a result of the update, the result of the update,
Figure BDA0003528161790000109
representing the partial derivative calculation. In this embodiment, the optimizer function uses a random gradient descent SGD, and the learning rate is attenuated when the network iterates for a certain number of times in order to prevent the loss function from falling into a local minimum, where the initial learning rate η is 0.01, the learning rate η is 0.001 when the network iterates for the 1 st ten thousand, and the learning rate η is 0.0001 when the network iterates for the 2 nd ten thousand;
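A minimal PyTorch sketch of this update rule and decay schedule follows; the assumption that the model's forward pass returns the loss L_t for a batch is for brevity only:

```python
import torch
import torch.nn as nn

def train(model: nn.Module, batches, total_iters=30000):
    """SGD with eta = 0.01, decayed to 0.001 at iteration 10000 and to
    0.0001 at iteration 20000 (MultiStepLR with gamma = 0.1)."""
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    scheduler = torch.optim.lr_scheduler.MultiStepLR(
        optimizer, milestones=[10000, 20000], gamma=0.1)
    for t, batch in zip(range(1, total_iters + 1), batches):
        loss = model(batch)      # L_t for a batch of b = 8 training samples
        optimizer.zero_grad()
        loss.backward()          # back-propagation yields the gradient lambda_t
        optimizer.step()         # omega'_t = omega_t - eta * dL_t/d(omega_t)
        scheduler.step()
```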
(3d) judging whether t = T holds; if so, obtaining the trained single-stage rotating ship detection network model O; otherwise, setting t = t + 1 and executing step (3b);
step 4) obtaining a detection result of the single-stage rotating ship:
set H of test samplestestCarrying out forward propagation as the input of the trained single-stage rotating ship detection model O to obtain a position label of a predicted target
Figure BDA0003528161790000111
Angle label
Figure BDA0003528161790000112
Centrality label
Figure BDA0003528161790000113
Category confidence labels
Figure BDA0003528161790000114
NMS on predicted targets using a threshold μ rotation non-maximum inhibition method
Figure BDA0003528161790000115
Figure BDA0003528161790000116
Screening to obtain HtestThe bounding box and class confidence of each object contained. The detection result of each image of the test set is output, the result is shown in figure 4,
the detection result schematic diagram is shown in fig. 4(a) and 4(b), wherein fig. 4(a) and 4(b) are ship detection results of remote sensing images, ship targets are all detected by using a rotating frame, and classification labels and confidence degrees of the corresponding targets are marked; in this example, μ ═ 0.5.
The technical effects of the present invention are further illustrated by the following experiments:
1. simulation conditions and contents:
the experiment simulation platform uses a processor Intel Xeon CPU E5-2680V 3 for the experiment, the main frequency of the processor is 2.50GHz, the internal memory is 128GB, and the display card is NVIDIA GTX TITAN V. The operating system is ubuntu 18.04. The software platform constructs and trains neural network models for python 3.8.11 and pytorech 1.7.0, accelerated using Nvidia Cuda 10.1 and Cudnn v 8.
The target detection evaluation indexes adopted in the simulation are mAP (mean Average Precision), the per-class detection AP and FPS, which are the main evaluation indexes in the target detection field and comprehensively reflect each aspect of an algorithm's performance. AP (Average Precision) denotes the average precision over different recall rates, where precision is the ratio of correctly detected samples to the total number of detections, recall is the ratio of correctly detected samples to all true samples, and mAP is the average of the APs of all classes. There are two methods for calculating AP. The first is the pre-2010 PASCAL VOC Challenge method: set a group of thresholds [0, 0.1, 0.2, …, 1]; for recalls greater than each threshold, obtain the corresponding maximum precision, thus calculating 11 precisions; the AP is the average of these 11 precisions (the 11-point interpolated average precision). The second is the method revised by the PASCAL VOC Challenge from 2010: assuming there are M positive examples among N samples, obtain M recall values [1/M, 2/M, …, M/M]; for each recall value r, calculate the maximum precision over points with recall r′ ≥ r, then average these M precision values to obtain the final AP. The APs herein are all calculated using the second criterion. For the target detection task, the intersection-over-union of every predicted bounding box judged to be a target with the ground truth (GT) is calculated, and a prediction is counted as a correct detection if the intersection-over-union is greater than the set threshold of 0.5. In addition, FPS (Frames Per Second) denotes the number of pictures the model detects per second, obtained by dividing the total number of detected pictures by the total detection time, and describes the detection speed of the model.
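A sketch of the second (all-point) AP criterion described above, given matched precision/recall pairs for one class, might look like this:

```python
import numpy as np

def average_precision(recalls, precisions):
    """For each achieved recall value r, take the maximum precision among
    all points with recall >= r, then average over the recall values."""
    recalls = np.asarray(recalls, dtype=float)
    precisions = np.asarray(precisions, dtype=float)
    levels = np.unique(recalls)           # the M recall values m/M
    return float(np.mean([precisions[recalls >= r].max() for r in levels]))
```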
The existing single-stage and two-stage rotated-box target detection networks used in the simulation include RoI-Transformer, Faster R-CNN with an added angle branch, RetinaNet with an added angle branch, and Oriented R-CNN. The network of the invention is trained on the HRSC2016 data set and compared with the existing rotated-box detection networks RoI-Transformer, Faster R-CNN-R and RetinaNet-R, comparing the accuracy mAP and the inference speed FPS respectively.
The detection efficiency and detection accuracy of the present invention and the prior art were simulated separately, and the results are shown in table 1.
2. Simulation result analysis:
TABLE 1
Detection method    mAP      FPS
Faster R-CNN R      74.43    15.9
RetinaNet R         69.21    18.4
RoI Trans           90.13    13.3
Oriented R-CNN      89.94    13.8
FCR-det             89.27    16.8
As can be seen from Table 1, the method provided by the invention significantly increases detection speed without losing detection accuracy; the speedup over the two-stage target detection models is significant, making the method easier to deploy on mobile terminals.

Claims (7)

1. A single-stage rotating ship detection method based on a full convolution network is characterized by comprising the following steps:
(1) acquiring a training sample set and a testing sample set:
(1a) obtaining M optical remote sensing ship images containing ship targets, labeling the ship target in each optical remote sensing ship image with a rotated box, then performing size normalization on each optical remote sensing ship image with labeled ship targets and a color standardization operation on the normalized images, to obtain M preprocessed optical remote sensing ship images of size W×R, H = {H_1, H_2, …, H_m, …, H_M}, where M ≥ 1000, H_m represents the m-th preprocessed optical remote sensing ship image, and W and R represent the numbers of pixel rows and columns of a picture;
(1b) performing data enhancement on each preprocessed optical remote sensing ship image H_m, forming a training sample set H_train from Q preprocessed optical remote sensing ship images in H together with their data-enhanced counterparts, and forming a test sample set H_test from the remaining Y preprocessed optical remote sensing ship images together with their data-enhanced counterparts, where M = Q + Y;
(2) Constructing a single-stage rotating ship target detection model O based on a full convolution network:
(2a) constructing a structure of a single-stage rotating ship target detection model O based on a full convolution network:
constructing a single-stage rotating ship target detection model O comprising a trunk network and a detection network which are connected in sequence, wherein: the main network comprises a feature extraction sub-network and a feature enhancement sub-network which are connected in sequence; the feature enhancer network comprises three feature enhancement layers arranged in parallel; the detection network comprises a detection sub-network connected with each characteristic enhancement layer, and each detection sub-network comprises a first full convolution network and a second full convolution network which are arranged in parallel and are composed of a plurality of first full convolution layers; the output end of the first full convolution network is connected with a second full convolution layer, a third full convolution layer and a fourth full convolution layer which are arranged in parallel; the output end of the second full convolution network is connected with a fifth full convolution layer;
(2b) defining a loss function L of the single-stage rotating ship target detection model O, together with a class loss function L_cls, a regression loss function L_reg and a centrality loss function L_C:
L = L_cls + α·L_reg + β·L_C
L_cls = (1/N_pos)·Σ_{x,y} f_focalloss(p_{x,y}, p*_{x,y})
L_reg = (1/N_pos)·Σ_{x,y} 1{p*_{x,y} > 0}·f_GIoU(t_{x,y}, t*_{x,y})
L_C = (1/N_pos)·Σ_{x,y} 1{p*_{x,y} > 0}·f_centerness(c_{x,y}, c*_{x,y})
where p_{x,y} is the predicted class output of a feature point and p*_{x,y} is its class label; t_{x,y} is the predicted regression output of a feature point and t*_{x,y} is its regression label; c_{x,y} is the predicted centrality output and c*_{x,y} is its centrality label; N_pos represents the number of generated predicted targets; 1{p*_{x,y} > 0} represents an indicator that takes 1 when p*_{x,y} > 0 and takes 0 otherwise; α and β are weight parameters with α + β = 1, each with value range [0, 1];
(3) Performing iterative training on a single-stage rotating ship detection model based on a full convolution network:
(3a) initializing the iteration number t and the maximum number of iterations T, T ≥ 10000; denoting the t-th-iteration single-stage rotating ship detection model as O_t and the weights of the t-th-iteration detection network and feature extraction sub-network as ω_1t and ω_2t respectively; and setting O_t = O, t = 1;
(3b) predicting the position, rotation deflection angle, centrality offset and category confidence of each target;
(3b1) taking b training samples randomly selected from the training sample set H_train as the input of the single-stage rotating ship detection model O_t; the feature extraction sub-network in the backbone network extracts a multi-scale feature map of the target in each training sample, and the feature enhancement sub-network performs feature enhancement on each feature map, where b ≥ 8;
(3b2) the first full convolution network and the second full convolution network in the detection network respectively perform multi-layer full convolution operations on each feature-enhanced feature map; the second, third and fourth full convolution layers each perform a single-layer full convolution operation on every regression feature map obtained from the multi-layer full convolution of the first full convolution network, yielding the position label t̂, angle label θ̂ and centrality label ĉ of the predicted target; the fifth full convolution layer performs a single-layer full convolution operation on every classification feature map obtained from the multi-layer full convolution of the second full convolution network, yielding the class confidence label p̂ of the predicted target;
(3c) using the loss function L, calculating the loss value L_t of O_t from the predicted labels t̂, θ̂, ĉ and p̂ and their ground-truth labels; calculating the gradient λ_t of the parameters of O_t by back-propagating L_t; and then updating the weights ω_1t and ω_2t of the detection network and the feature extraction sub-network by gradient descent using λ_t;
(3d) judging whether t = T holds; if so, obtaining the trained single-stage rotating ship detection network model O; otherwise, setting t = t + 1 and executing step (3b);
(4) obtaining a detection result of the single-stage rotating ship:
propagating the test sample set H_test forward as the input of the trained single-stage rotating ship detection model O to obtain the position label t̂, angle label θ̂, centrality label ĉ and class confidence label p̂ of each predicted target, and screening the predicted targets with the rotation non-maximum suppression (NMS) method under threshold μ to obtain the bounding box and class confidence of every target contained in H_test.
2. The full convolution network-based single-stage rotating ship detection method according to claim 1, wherein the rotated-box labeling of the ship target in each optical remote sensing ship image in step (1a) is implemented as follows: taking the horizontal coordinate x and vertical coordinate y of the center of the rotated labeling box, the length l and width w of the rotated labeling box, and the counterclockwise angle θ between the long side of the rotated labeling box and the horizontal direction as the bounding-box position label of each rotated target.
3. The full convolution network-based single-stage rotating ship detection method according to claim 1, wherein the data enhancement of each preprocessed optical remote sensing ship image H_m in step (1b) is performed using a rotation enhancement, flip enhancement, scaling enhancement or noise enhancement method.
4. The full convolution network based single-stage rotating vessel detection method according to claim 1, wherein the single-stage rotating vessel target detection model O in the step (2a) is a model in which:
the feature extraction sub-network in the backbone network comprises a first convolution layer, a maximum pooling layer and four block blocks which are connected in sequence; the feature enhancement network comprises three feature enhancement layers which are connected in sequence, wherein a first feature enhancement layer, a second feature enhancement layer and a third feature enhancement layer are respectively connected with a second block, a third block and a fourth block in a feature extraction sub-network;
the specific parameters of the feature extraction sub-network are: the first convolution layer has 7×7 convolution kernels with stride 2 and 64 kernels; the first block comprises 3 first convolution blocks connected in sequence, each comprising three convolution layers in which the first and second layers have 64 convolution kernels each, the first and third layers use 1×1 kernels, and the third layer has 128 kernels; the second block comprises 4 second convolution blocks connected in sequence, each comprising three convolution layers in which the first and second layers have 128 kernels each, the first and third layers use 1×1 kernels, and the third layer has 256 kernels; the third block comprises 6 third convolution blocks connected in sequence, each comprising three convolution layers in which the first and second layers have 256 kernels each, the first and third layers use 1×1 kernels, and the third layer has 512 kernels; the fourth block comprises 3 fourth convolution blocks connected in sequence, each comprising three convolution layers in which the first and second layers have 512 kernels each, the first and third layers use 1×1 kernels, and the third layer has 1024 kernels;
the first full convolution network and the second full convolution network in the detection network each comprise 4 first full convolution layers, and the specific parameters of the detection network are as follows: the first full convolution layers have 3×3 convolution kernels and 256 channels; the second full convolution layer has a 3×3 kernel and 4 channels; the third full convolution layer has a 3×3 kernel and 1 channel; the fifth full convolution layer of the second full convolution network has a 3×3 kernel and L channels, where L represents the number of detection categories.
5. The full convolution network-based single-stage rotating ship detection method according to claim 1, wherein the class loss function f_focalloss, regression loss function f_GIoU and centrality loss function f_centerness used in constructing the loss function L of the single-stage rotating ship target detection model O in step (2b) are respectively:
f_focalloss = −(1 − p_t)^γ·log(p_t)
f_GIoU = 1 − [I/(A_p + A_g − I) − (A_c − (A_p + A_g − I))/A_c]
centerness* = sqrt((min(l, r)/max(l, r))·(min(t, b)/max(t, b)))
where f_focalloss represents the focal loss function, f_GIoU represents the GIoU loss function, and f_centerness represents the centerness loss function computed against the centrality target centerness*; p_t represents the confidence of the generated label; γ is a hyperparameter taken in [0, 5]; A_p represents the area of the candidate box generated by a feature point, A_g the area of the real box containing the feature point, I the overlap area of A_p and A_g, and A_c the area of the smallest box enclosing both; l, t, r, b respectively represent the distances from the center point to the left, top, right and bottom sides of the regression box label.
6. The full convolution network-based single-stage rotating ship detection method according to claim 1, wherein the weights of the t-th-iteration detection network and feature extraction sub-network in step (3a) are ω_1 and ω_2 respectively, where ω_1 is initialized with network weights pretrained on ImageNet, and ω_2 is initialized using the He initialization method.
7. The full convolution network-based single-stage rotating ship detection method according to claim 1, wherein the update formulas of the weights ω_1t and ω_2t of the detection network and the feature extraction sub-network in step (3c) are respectively:
ω′_1t = ω_1t − η·(∂L_t/∂ω_1t)
ω′_2t = ω_2t − η·(∂L_t/∂ω_2t)
where η represents the learning step size, taken in the range 0.0001 to 0.1, ω′_1t and ω′_2t respectively represent the updated ω_1t and ω_2t, and ∂/∂ω represents the partial derivative computation.
CN202210198503.9A 2022-03-02 2022-03-02 Single-stage rotating ship detection method based on full convolution network Pending CN114565824A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210198503.9A CN114565824A (en) 2022-03-02 2022-03-02 Single-stage rotating ship detection method based on full convolution network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210198503.9A CN114565824A (en) 2022-03-02 2022-03-02 Single-stage rotating ship detection method based on full convolution network

Publications (1)

Publication Number Publication Date
CN114565824A (en) 2022-05-31

Family

ID=81716166

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210198503.9A Pending CN114565824A (en) 2022-03-02 2022-03-02 Single-stage rotating ship detection method based on full convolution network

Country Status (1)

Country Link
CN (1) CN114565824A (en)


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116012719A (en) * 2023-03-27 2023-04-25 中国电子科技集团公司第五十四研究所 Weak supervision rotating target detection method based on multi-instance learning
CN116935168A (en) * 2023-09-13 2023-10-24 苏州魔视智能科技有限公司 Method, device, computer equipment and storage medium for training target detection model
CN116935168B (en) * 2023-09-13 2024-01-30 苏州魔视智能科技有限公司 Method, device, computer equipment and storage medium for target detection

Similar Documents

Publication Publication Date Title
CN112308019B (en) SAR ship target detection method based on network pruning and knowledge distillation
CN111563473B (en) Remote sensing ship identification method based on dense feature fusion and pixel level attention
Zhang et al. Balance learning for ship detection from synthetic aperture radar remote sensing imagery
CN110472627B (en) End-to-end SAR image recognition method, device and storage medium
CN111179217A (en) Attention mechanism-based remote sensing image multi-scale target detection method
CN111738112B (en) Remote sensing ship image target detection method based on deep neural network and self-attention mechanism
CN109101897A (en) Object detection method, system and the relevant device of underwater robot
CN111079739B (en) Multi-scale attention feature detection method
CN110796048B (en) Ship target real-time detection method based on deep neural network
CN111753677B (en) Multi-angle remote sensing ship image target detection method based on characteristic pyramid structure
CN112395987B (en) SAR image target detection method based on unsupervised domain adaptive CNN
CN110647802A (en) Remote sensing image ship target detection method based on deep learning
CN114565824A (en) Single-stage rotating ship detection method based on full convolution network
CN111783523A (en) Remote sensing image rotating target detection method
CN116168240A (en) Arbitrary-direction dense ship target detection method based on attention enhancement
CN115937659A (en) Mask-RCNN-based multi-target detection method in indoor complex environment
Xiao et al. FDLR-Net: A feature decoupling and localization refinement network for object detection in remote sensing images
CN115965862A (en) SAR ship target detection method based on mask network fusion image characteristics
Fan et al. A novel sonar target detection and classification algorithm
CN111950357A (en) Marine water surface garbage rapid identification method based on multi-feature YOLOV3
CN114463624A (en) Method and device for detecting illegal buildings applied to city management supervision
CN113486819A (en) Ship target detection method based on YOLOv4 algorithm
CN109284752A (en) A kind of rapid detection method of vehicle
Huang et al. A deep learning approach to detecting ships from high-resolution aerial remote sensing images
CN116630808A (en) Rotary ship detection method based on remote sensing image feature extraction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination