CN112668440B - SAR ship target detection method based on regression loss of balance sample - Google Patents


Info

Publication number
CN112668440B
Authority
CN
China
Prior art keywords
layer
training
module
loss function
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011544100.2A
Other languages
Chinese (zh)
Other versions
CN112668440A (en)
Inventor
王英华
杨振东
刘宏伟
唐天顾
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University filed Critical Xidian University
Priority to CN202011544100.2A
Publication of CN112668440A
Application granted
Publication of CN112668440B
Legal status: Active

Landscapes

  • Image Analysis (AREA)
  • Complex Calculations (AREA)

Abstract

The invention discloses an SAR ship target detection method based on balanced-sample regression loss, which mainly addresses the problem that the ship target detection performance of the trained network model is degraded by the imbalance of hard and easy samples in existing deep learning methods. The implementation scheme is as follows: 1) acquire ship data and divide it into training data and test data; 2) select the Faster-RCNN network as the training network model; 3) improve the original loss function of the training network to form a new total loss function; 4) feed the training data into the network selected in step 2) and train the network with the new total loss function to obtain the finally trained network model; 5) feed the test data into the trained network model to obtain the ship target detection result. The method better extracts the deep features of the ship target, improves ship target detection performance, and can be used for ship target detection.

Description

SAR ship target detection method based on regression loss of balance sample
Technical Field
The invention belongs to the technical field of radars, and mainly relates to an SAR image ship target detection method which can be used for subsequent ship target identification and classification.
Background
The synthetic aperture radar (SAR) is an active imaging sensor with all-weather, all-time, high-resolution data acquisition capability, and has the characteristics of multiple frequency bands, multiple polarizations, variable viewing angle and penetrability. SAR is now widely applied in military reconnaissance, geological survey, topographic mapping and cartography, disaster prediction, marine applications, scientific research and other fields, and has broad research and application prospects. Automatic target recognition (ATR) of SAR images is one of its important applications. A basic SAR image ATR system generally comprises three stages: target detection, target discrimination and target recognition. The performance of target detection directly affects the performance and efficiency of the discrimination and classification stages, so research on SAR target detection is very important.
Traditional SAR image target detection algorithms are mainly constant false alarm rate (CFAR) methods, which determine a detection threshold from a pre-established clutter statistical model; such models suffer from limited application scenarios and low detection efficiency. Currently popular deep learning methods can automatically learn features from data through training, guarantee both accuracy and detection speed when data are sufficient, and adapt better to different scenes, so detecting ship targets in SAR images with deep learning methods has attracted wide attention.
Jianwei Li et al. organized and released the SSDD ship data set in the IEEE paper "Ship Detection in SAR Images Based on an Improved Faster R-CNN" and proposed a detection method using an improved Faster-RCNN network.
Yuanyuan Wang et al. improved the RetinaNet detection network in the paper "Automatic Ship Detection Based on RetinaNet Using Multi-Resolution Gaofen-3 Imagery" published in Remote Sensing and applied it to Gaofen-3 SAR ship image data, obtaining good detection performance. That method uses the Focal Loss classification loss function to address the imbalance of hard and easy samples in classification; however, the imbalance problem in regression is not considered, and the detection results still need further improvement.
Disclosure of Invention
The invention aims to provide an SAR ship target detection method based on balanced-sample regression loss, so as to solve the problem that the trained network model detects ship targets poorly because of the imbalance of hard and easy samples in existing deep learning methods, and to improve ship target detection performance.
The technical scheme of the invention is as follows: first, the SSDD data are divided into training data and test data; then a Faster-RCNN deep neural network model is trained with the training data, using an improved regression loss function; after the model converges, the trained neural network is applied to the test data to obtain the final ship detection result. The implementation steps include the following:
(1) Acquire the SSDD ship data set and divide it into training data Φ_x and test data Φ_c in a ratio of 8:2;
(2) Selecting a training network model omega formed by sequentially connecting a shared basic module VGG, a region selection module RPN and a region refinement module Fast-RCNN;
(3) Constructing an improved loss function;
(3a) The smooth_{L1} function in the regression loss of the RPN module of the network is improved into a balanced regression loss function of the following form:
[the original gives the smooth_{L1} function, the improved function and its two defining equations as equation images, which are not reproduced in this text extraction]
where j is the j-th training sample of the RPN module, t_n is the position information predicted by the RPN module for the training sample, t_n^* with n ∈ {x, y, w, h} is the position information of the target frame corresponding to the training sample, a is a hyper-parameter, and a = 2;
(3b) The smooth_{L1} function in the regression loss of the Fast-RCNN module of the network is improved into a balanced regression loss function of the following form:
[the original gives the smooth_{L1} function, the improved function and its two defining equations as equation images, which are not reproduced in this text extraction]
where p is the p-th training sample of the Fast-RCNN module, e_m is the position information predicted by the Fast-RCNN module for the training sample, and e_m^* with m ∈ {x, y, w, h} is the position information of the target frame corresponding to the training sample.
(3c) According to the improved regression loss functions of (3a) and (3b), the improved total loss function J_s of the training network is obtained:
[equation image in the original: J_s combines the RPN classification loss, the improved RPN regression loss weighted by λ_rpn, the Fast-RCNN classification loss, and the improved Fast-RCNN regression loss weighted by λ_fast]
where J_s1, λ_rpn and N_s1 are the classification loss function, loss balance constant and number of training samples of the RPN module; J_s3, λ_fast and N_s2 are the classification loss function, loss balance constant and number of training samples of the Fast-RCNN module; and the two remaining symbols (shown as images in the original) are the labels indicating whether the j-th sample of the RPN module and the p-th sample of the Fast-RCNN module are positive samples.
(4) Input the training data Φ_x into the constructed training network model Ω and train the network model Ω with the total loss function J_s of the network until the loss function converges, obtaining the trained network model Ω';
(5) Input the ship test data Φ_c into the finally trained network model Ω' to obtain the ship detection result.
Compared with the prior art, the invention has the following advantages:
the method aims at the problem that regression loss is difficult and samples are unbalanced when the existing deep learning target detection method Faster-RCNN is applied to SAR ship target detection, improves a regression loss function, can better extract the depth characteristics of a ship target, and improves the ship target detection performance.
Drawings
FIG. 1 is a flow chart of an implementation of the present invention;
FIG. 2 is a diagram of a training network model architecture used in the present invention;
FIG. 3 is a graph of the gradient of the regression loss function designed in the present invention.
Detailed Description
The embodiments and effects of the present invention will be described in detail below with reference to the accompanying drawings:
referring to fig. 1, the implementation steps of the invention are as follows:
step 1, acquiring an SSDD ship data set, and dividing training data and test data.
The SSDD ship data set is shot by a plurality of radar satellites and formed by arranging and marking by experts, and the training data phi is divided into the training data phi according to the proportion of 8 x And test data phi c
Step 2: select the training network model Ω.
Existing target detection networks include the YOLO series, the SSD network and the Faster-RCNN network; in this example the classical Faster-RCNN network is selected as the training network model Ω for detection.
Referring to fig. 2, the training network model Ω selected in this example is formed by sequentially connecting a shared basic module VGG, a region selection module RPN, and a region refinement module Fast-RCNN, and each module has the following structure:
2.1 VGG base module:
The module comprises 5 convolution blocks and 4 average pooling layers, connected in order: first convolution block V1 → first average pooling layer P1 → second convolution block V2 → second average pooling layer P2 → third convolution block V3 → third average pooling layer P3 → fourth convolution block V4 → fourth average pooling layer P4 → fifth convolution block V5. The parameter settings and relationships of the layers are as follows:
2.1 a) Convolution block V1 is formed by cascading two identical sub-blocks, each consisting of a two-layer structure: the first layer is a convolutional layer and the second layer is a ReLU activation-function layer, where i denotes the i-th sub-block, i = 1, 2:
The first-layer convolutional layer has a convolution kernel K_1 with a window size of 3 × 3, a sliding step S_1 of 1 and SAME padding; it convolves the input and outputs 64 feature maps, which serve as the input of the second-layer activation-function layer.
The second-layer ReLU activation-function layer nonlinearly maps the output of the previous layer according to:
ReLU(x) = max(0, x)
where x is the input and ReLU(x) is the output; the input and output dimensions of this layer are the same.
2.1 b) The average pooling layer P1 downsamples the input; its downsampling kernel U_1 has a window size of 2 × 2 and a sliding step V_1 of 2, and it outputs 64 feature maps Y_1 as the input of convolution block V2.
2.1 c) Convolution block V2 is formed by cascading two identical sub-blocks, each consisting of a two-layer structure: the first layer is a convolutional layer and the second layer is a ReLU activation-function layer, where i denotes the i-th sub-block, i = 1, 2:
The first-layer convolutional layer has a convolution kernel K_2 with a window size of 3 × 3, a sliding step S_2 of 1 and SAME padding; it convolves the input and outputs 128 feature maps, which serve as the input of the second-layer activation-function layer.
The second-layer ReLU activation-function layer nonlinearly maps the output of the previous layer according to:
ReLU(x) = max(0, x)
where x is the input and ReLU(x) is the output; the input and output dimensions of this layer are the same.
2.1 d) The average pooling layer P2 downsamples the input; its downsampling kernel U_2 has a window size of 2 × 2 and a sliding step V_2 of 2, and it outputs 128 feature maps Y_2 as the input of convolution block V3.
2.1 e) Convolution block V3 is formed by cascading three identical sub-blocks, each consisting of a two-layer structure: the first layer is a convolutional layer and the second layer is a ReLU activation-function layer, where i denotes the i-th sub-block, i = 1, 2, 3:
The first-layer convolutional layer has a convolution kernel K_3 with a window size of 3 × 3, a sliding step S_3 of 1 and SAME padding; it convolves the input and outputs 256 feature maps, which serve as the input of the second-layer activation-function layer.
The second-layer ReLU activation-function layer nonlinearly maps the output of the previous layer according to:
ReLU(x) = max(0, x)
where x is the input and ReLU(x) is the output; the input and output dimensions of this layer are the same.
2.1 f) The average pooling layer P3 downsamples the input; its downsampling kernel U_3 has a window size of 2 × 2 and a sliding step V_3 of 2, and it outputs 256 feature maps Y_3 as the input of convolution block V4.
2.1 g) Convolution block V4 is formed by cascading three identical sub-blocks, each consisting of a two-layer structure: the first layer is a convolutional layer and the second layer is a ReLU activation-function layer, where i denotes the i-th sub-block, i = 1, 2, 3:
The first-layer convolutional layer has a convolution kernel K_4 with a window size of 3 × 3, a sliding step S_4 of 1 and SAME padding; it convolves the input and outputs 512 feature maps, which serve as the input of the second-layer activation-function layer.
The second-layer ReLU activation-function layer nonlinearly maps the output of the previous layer according to:
ReLU(x) = max(0, x)
where x is the input and ReLU(x) is the output; the input and output dimensions of this layer are the same.
2.1 h) The average pooling layer P4 downsamples the input; its downsampling kernel U_4 has a window size of 2 × 2 and a sliding step V_4 of 2, and it outputs 512 feature maps Y_4 as the input of convolution block V5.
2.1 i) Convolution block V5 is formed by cascading three identical sub-blocks, each consisting of a two-layer structure: the first layer is a convolutional layer and the second layer is a ReLU activation-function layer, where i denotes the i-th sub-block, i = 1, 2, 3:
The first-layer convolutional layer has a convolution kernel K_5 with a window size of 3 × 3, a sliding step S_5 of 1 and SAME padding; it convolves the input and outputs 512 feature maps, which serve as the input of the second-layer activation-function layer.
The second-layer ReLU activation-function layer nonlinearly maps the output of the previous layer according to:
ReLU(x) = max(0, x)
where x is the input and ReLU(x) is the output; the input and output dimensions of this layer are the same.
The output of the ReLU activation-function layer of the last sub-block of convolution block V5 is the shared feature F extracted by the VGG base module.
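For concreteness, the following is a minimal sketch (not the patent's code) of the shared VGG base module as described above, written in PyTorch: five convolution blocks with 2/2/3/3/3 convolution + ReLU sub-blocks and 64/128/256/512/512 output maps (3 × 3 kernels, stride 1, SAME padding), separated by four 2 × 2 average-pooling layers with stride 2. The number of input channels (3) and the class and function names are assumptions.

```python
# Illustrative sketch of the shared VGG base module described above (assumptions noted in the text).
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch, n_convs):
    layers = []
    for i in range(n_convs):
        layers.append(nn.Conv2d(in_ch if i == 0 else out_ch, out_ch,
                                kernel_size=3, stride=1, padding=1))  # SAME padding
        layers.append(nn.ReLU(inplace=True))                           # ReLU(x) = max(0, x)
    return nn.Sequential(*layers)

class VGGBase(nn.Module):
    def __init__(self):
        super().__init__()
        self.v1 = conv_block(3, 64, 2)
        self.p1 = nn.AvgPool2d(kernel_size=2, stride=2)
        self.v2 = conv_block(64, 128, 2)
        self.p2 = nn.AvgPool2d(kernel_size=2, stride=2)
        self.v3 = conv_block(128, 256, 3)
        self.p3 = nn.AvgPool2d(kernel_size=2, stride=2)
        self.v4 = conv_block(256, 512, 3)
        self.p4 = nn.AvgPool2d(kernel_size=2, stride=2)
        self.v5 = conv_block(512, 512, 3)

    def forward(self, x):
        x = self.p1(self.v1(x))
        x = self.p2(self.v2(x))
        x = self.p3(self.v3(x))
        x = self.p4(self.v4(x))
        return self.v5(x)   # shared feature F: 512 maps at 1/16 of the input resolution
```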
2.2 RPN module:
the layers in turn being composed of a shared convolution layer C 1 Activation function layer C 2 Two parallel classification branches C 3 And regression convolutional layer C 4 The composition, each layer parameter setting and relation are as follows:
shared convolution layer C 1 Convolution kernel K of 6 The window size of (3X 3), the sliding step length S 6 Is 1, the filling mode is SAME, and is used for convolving the input and outputting 512 feature maps Y 6 As a second layer activation function layer C 2 The input of (1);
second layer ReLU activation function layer C 2 For the upper layer C 1 The output of the layer is mapped nonlinearly, and the nonlinear mapping formula is as follows:
ReLU(x)=max(0,x)
where x is the input and ReLU (x) is the output, the input and output dimensions of the layer are the same.
The classification branch is formed by a classification convolution layer C 31 Softmax classifier layer C 32 The composition, each layer parameter setting and relation are as follows:
classified convolutional layer C 31 Convolution kernel K of it 7 Has a window size of 1 × 1, a sliding step S 7 Is 1, the filling mode isSAME for convolving input and outputting 18 characteristic maps Y 7
Softmax classifier layer C 32 For winding up the layers C to be classified 31 The obtained 18 feature maps Y 7 And inputting the classification probability vector p into two types of Softmax classifiers.
Regression convolutional layer C 4 Convolution kernel K of 8 The window size of (2) is 1 x 1, the sliding step length S 8 The filling mode is SAME, and the filling mode is used for convolving the input and outputting 36 feature maps b which represent 4 offsets t respectively predicted by the RPN module for 9 preset frames x ,t y ,t w ,t h
According to regression convolution layer C 4 The output b of the step (a) determines the position of the initial detection frame, screens the detection frame through the classification probability vector p, judges the frame with the score meeting the threshold value as a candidate frame and uses the candidate frame as the input of a next refinement module Fast-RCNN module.
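A minimal sketch of the RPN module as described above, assuming PyTorch and the layer sizes given (a 3 × 3 shared convolution C_1 with 512 maps, ReLU C_2, a 1 × 1 classification convolution producing 18 maps followed by a two-class softmax, and a 1 × 1 regression convolution producing 36 maps); the class and variable names are illustrative only, and the screening of frames by score threshold is omitted.

```python
# Illustrative sketch of the RPN module described above (names and shapes are assumptions).
import torch
import torch.nn as nn
import torch.nn.functional as F

class RPNHead(nn.Module):
    def __init__(self, in_ch=512, num_anchors=9):
        super().__init__()
        self.c1 = nn.Conv2d(in_ch, 512, kernel_size=3, stride=1, padding=1)  # shared conv C_1
        self.cls = nn.Conv2d(512, 2 * num_anchors, kernel_size=1)            # C_31: 18 maps
        self.reg = nn.Conv2d(512, 4 * num_anchors, kernel_size=1)            # C_4: 36 maps

    def forward(self, feat):
        h = F.relu(self.c1(feat))                                   # activation layer C_2
        scores = self.cls(h)                                        # objectness logits
        p = F.softmax(scores.view(scores.shape[0], 2, -1), dim=1)   # C_32: two-class softmax
        b = self.reg(h)                                             # predicted offsets t_x, t_y, t_w, t_h
        return p, b
```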
2.3 Fast-RCNN Module:
This module consists, in order, of a ROI pooling layer, a first fully-connected layer F1, a second fully-connected layer F2, and two parallel branches: a classification branch F3 and a regression fully-connected layer F4. The parameter settings and relationships of the layers are as follows:
The ROI pooling layer R downsamples input feature maps of different sizes to the same size: it divides each input feature map into 7 × 7 blocks and takes the maximum value of each block as the output of that block, outputting 512 feature maps Y_8 of size 512 × 7 × 7 as the input of the first fully-connected layer F1.
The first fully-connected layer F1 has 4096 neurons; it nonlinearly maps the features output by the ROI pooling layer and outputs a 4096-dimensional column vector.
The second fully-connected layer F2 has 4096 neurons; it nonlinearly maps the column vector output by the fully-connected layer F1 and outputs a 4096-dimensional column vector.
The classification branch consists, in order, of a classification fully-connected layer F31 and a Softmax classifier layer F32, with the following parameter settings and relationships:
The classification fully-connected layer F31 has 2 neurons; it nonlinearly maps the column vector output by the fully-connected layer F2 and outputs a 2-dimensional column vector as the input of the Softmax classifier layer F32.
The Softmax classifier layer F32 feeds the 2-dimensional column vector obtained by the classification fully-connected layer F31 into a two-class Softmax classifier to obtain the classification probability vector T, and classifies the candidate frames according to the probability values.
The regression fully-connected layer F4 has 4 neurons; it nonlinearly maps the column vector output by the fully-connected layer F2 and outputs a 4-dimensional regression column vector C representing the regression offsets e_x, e_y, e_w, e_h of the candidate frame.
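A minimal sketch of the Fast-RCNN refinement module as described above; the use of torchvision's roi_pool, the 1/16 spatial scale, and the ReLU nonlinearity between the fully-connected layers are assumptions not stated in the patent.

```python
# Illustrative sketch of the Fast-RCNN refinement module described above (assumptions noted in the text).
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.ops import roi_pool

class FastRCNNHead(nn.Module):
    def __init__(self, in_ch=512):
        super().__init__()
        self.fc1 = nn.Linear(in_ch * 7 * 7, 4096)   # first fully-connected layer F1
        self.fc2 = nn.Linear(4096, 4096)             # second fully-connected layer F2
        self.cls = nn.Linear(4096, 2)                # classification layer F31 (target / background)
        self.reg = nn.Linear(4096, 4)                # regression layer F4 (e_x, e_y, e_w, e_h)

    def forward(self, feat, rois):
        # rois: Tensor[K, 5] with columns (batch_index, x1, y1, x2, y2) in image coordinates
        pooled = roi_pool(feat, rois, output_size=(7, 7), spatial_scale=1.0 / 16)
        h = F.relu(self.fc1(pooled.flatten(start_dim=1)))
        h = F.relu(self.fc2(h))
        T = F.softmax(self.cls(h), dim=1)            # classification probability vector T
        C = self.reg(h)                              # regression column vector C
        return T, C
```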
Step 3: construct the improved loss function.
The regression loss in the original loss function used to train the network model Ω is the smooth_{L1} function, which suffers from the imbalance of hard and easy samples in the regression loss during network training and thereby degrades detection performance; the regression loss function therefore needs to be improved and a new total loss function constructed.
The new total loss function is composed of the sum of the loss function of the RPN module and the loss function of the Fast-RCNN module, and is constructed as follows:
3.1) Construct the RPN module loss function:
3.1.1) Preset frame information and target frame information: the RPN module places nine preset frames at each feature point of the shared feature F extracted by the VGG base module; the nine frames are obtained from three aspect ratios (width : height) and three scales. (x_a, y_a) is the center coordinate of a preset frame, w_a is the width of the frame and h_a is its height; (x^*, y^*) is the center coordinate of the target frame, w^* is the width of the target frame and h^* is its height;
3.1.2) Compute the intersection-over-union (IOU) of each preset frame with every target frame from the preset frame information and the target frame information:
IOU(A, B) = |A ∩ B| / |A ∪ B|
where A is a preset frame, B is a target frame, A ∩ B is the intersection region of the preset frame and the target frame, and A ∪ B is their union region;
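A minimal sketch of this IOU computation, assuming boxes are given as (x1, y1, x2, y2) corner coordinates:

```python
# IOU of a preset frame A and a target frame B as defined in 3.1.2).
def iou(box_a, box_b):
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h                       # |A ∩ B|
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter                 # |A ∪ B|
    return inter / union if union > 0 else 0.0
```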
3.1.3) Set IOU thresholds and divide the preset frames into sample classes:
Set the lower IOU threshold v_1 to 0.3 and the upper threshold v_2 to 0.7.
If the IOU of a preset frame with some target frame is greater than or equal to the upper threshold v_2, or the preset frame has the largest IOU with some target frame among all preset frames, the preset frame is a positive sample and that target frame is assigned to it;
If the IOU of a preset frame with every target frame is less than or equal to the lower threshold v_1 and the preset frame does not have the largest IOU with any target frame among all preset frames, the preset frame is a negative sample;
If the IOU of a preset frame with every target frame is greater than the lower threshold v_1 and less than the upper threshold v_2, and the preset frame does not have the largest IOU with any target frame among all preset frames, the preset frame is a useless sample and does not participate in training;
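A sketch of this sample-class division under the stated thresholds v_1 = 0.3 and v_2 = 0.7; the array layout and the -1 convention for useless (ignored) samples are assumptions.

```python
# Divide preset frames into positive (1), negative (0) and useless (-1) samples from an IOU matrix.
import numpy as np

def assign_anchor_labels(ious, v1=0.3, v2=0.7):
    # ious: array of shape (num_preset_frames, num_target_frames) of IOU values
    labels = np.full(ious.shape[0], -1, dtype=np.int64)   # -1: useless sample, ignored in training
    max_iou = ious.max(axis=1)
    labels[max_iou <= v1] = 0                              # negative samples
    labels[max_iou >= v2] = 1                              # positive samples (IOU above upper threshold)
    labels[ious.argmax(axis=0)] = 1                        # best preset frame per target is also positive
    assigned = ious.argmax(axis=1)                         # index of the assigned target frame
    return labels, assigned
```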
3.1.4) Compute the target offsets from the preset frame information and the corresponding target frame information (the formulas appear as equation images in the original; they follow the standard parameterization relative to the preset frame):
t_x^* = (x^* - x_a)/w_a,
t_y^* = (y^* - y_a)/h_a,
t_w^* = log(w^*/w_a),
t_h^* = log(h^*/h_a);
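A minimal sketch of this offset computation, assuming the standard parameterization written out above with frames given as center coordinates plus width and height:

```python
# Encode the assigned target frame relative to a preset frame, as in 3.1.4).
import math

def encode_offsets(anchor, target):
    xa, ya, wa, ha = anchor          # preset-frame center, width, height
    xs, ys, ws, hs = target          # target-frame center, width, height
    tx = (xs - xa) / wa
    ty = (ys - ya) / ha
    tw = math.log(ws / wa)
    th = math.log(hs / ha)
    return tx, ty, tw, th
```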
3.1.5) Original loss function of the RPN module:
The RPN module is used to obtain a classification probability vector p and a regression feature map b, where the classification probability vector p is the probability that a preset frame is target or background, and the regression feature map b is the offset information t_x, t_y, t_w, t_h predicted by the network for each preset frame. The original loss function J'_rpn of the RPN module is:
J'_rpn = J_s1 + λ_rpn J_s2
[the expressions for J_s1 and J_s2 appear as equation images in the original]
where J_s1 is the cross-entropy loss function and N_s1 is the total number of training samples of the RPN module; when training with a batch gradient descent algorithm, N_s1 is taken as the batch size of 256. The cross-entropy loss is computed from the label of the j-th sample for the k-th class and the probability with which the RPN network predicts the j-th sample as the k-th class. J_s2 is the regression loss function, gated by the label indicating whether the j-th sample is a positive sample: when this label is 1 the preset frame is a positive sample and the regression loss is computed, and when it is 0 the preset frame is a negative sample and the regression loss is 0. t_n is the position information of the preset frame, t_n^* with n ∈ {x, y, w, h} is the position information of the target frame, and λ_rpn is a balance constant.
The smooth_{L1} function is:
smooth_{L1}(x) = 0.5x², if |x| < 1; |x| - 0.5, otherwise.
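A one-line sketch of the smooth_{L1} function defined above:

```python
# smooth_L1: quadratic for small errors, linear for large ones.
def smooth_l1(x):
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5
```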
3.1.6) Improve the original loss function:
Referring to Fig. 3, statistics of the regression-loss gradients of the RPN module show that the loss gradient contributed by easy samples is obviously larger than that contributed by hard samples. To solve this imbalance of hard and easy samples in ship detection, the smooth_{L1} function is improved so as to increase the gradient of the hard-sample loss; the improved function is:
[equation image in the original]
where a is a hyper-parameter, a = 2, and the improved RPN module loss J_rpn is:
[equation image in the original]
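The exact improved function is given only as an equation image above, so no implementation of it is attempted here. The following illustrative sketch only reproduces the gradient statistics that motivate it: the gradient magnitude of smooth_{L1} saturates at 1 for hard samples, so a large number of easy samples can dominate the total regression gradient. The sample counts and error values below are assumptions chosen purely for illustration.

```python
# Gradient of smooth_L1: x for |x| < 1 (easy samples), sign(x) otherwise (hard samples).
def smooth_l1_grad(x):
    return x if abs(x) < 1.0 else (1.0 if x > 0 else -1.0)

easy_errors = [0.05] * 1000          # many easy samples with small regression error (assumed counts)
hard_errors = [2.0] * 20             # few hard samples with large regression error (assumed counts)
easy_total = sum(abs(smooth_l1_grad(x)) for x in easy_errors)   # 1000 * 0.05 = 50.0
hard_total = sum(abs(smooth_l1_grad(x)) for x in hard_errors)   # 20 * 1.0 = 20.0
print(easy_total, hard_total)        # easy samples contribute the larger share of the total gradient
```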
3.2) Construct the Fast-RCNN module loss function.
3.2.1) From the regression feature map b predicted by the RPN, which represents the offsets t_x, t_y, t_w, t_h of the candidate frames, calculate the position information x, y, w, h of each candidate frame according to:
t_x = (x - x_a)/w_a,
t_y = (y - y_a)/h_a,
t_w = log(w/w_a),
t_h = log(h/h_a);
where (x, y) is the center coordinate of the candidate frame, w is the width of the candidate frame and h is its height;
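A minimal sketch of recovering the candidate-frame position by inverting the formulas above:

```python
# Decode a candidate frame (x, y, w, h) from a preset frame and the predicted offsets, as in 3.2.1).
import math

def decode_offsets(anchor, offsets):
    xa, ya, wa, ha = anchor          # preset-frame center, width, height
    tx, ty, tw, th = offsets         # predicted offsets t_x, t_y, t_w, t_h
    x = tx * wa + xa
    y = ty * ha + ya
    w = wa * math.exp(tw)
    h = ha * math.exp(th)
    return x, y, w, h
```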
3.2.2) Set the IOU threshold and divide the candidate frames into sample classes:
Set the IOU threshold v_3 to 0.5.
Compute the IOU of each candidate frame with the target frames from the position coordinates of the candidate frame. If the IOU of a candidate frame with every target frame is less than the threshold v_3, the candidate frame is a negative sample;
If the IOU of a candidate frame with some target frame is greater than the threshold v_3, the candidate frame is a positive sample and that target frame is assigned to it;
3.2.3) Compute the target offsets from the candidate frame information and the target frame information (the formulas appear as equation images in the original; they follow the standard parameterization relative to the candidate frame):
e_x^* = (x^* - x)/w,
e_y^* = (y^* - y)/h,
e_w^* = log(w^*/w),
e_h^* = log(h^*/h);
3.2.4) Original loss function of the Fast-RCNN module:
The 2000 candidate frames judged to have the highest target probability according to the classification probability vector p obtained by the RPN module are further regressed and classified by the Fast-RCNN module, yielding a classification probability vector T and a regression column vector C, where T is the probability that a candidate frame is judged to be target or background and C is the predicted offsets e_x, e_y, e_w, e_h of the candidate frame. The loss function J'_fast of the Fast-RCNN module is:
J'_fast = J_s3 + λ_fast J_s4
[the expressions for J_s3 and J_s4 appear as equation images in the original]
where J_s3 is the cross-entropy loss function and N_s2 is the total number of training samples of the Fast-RCNN module; when training with a batch gradient descent algorithm, N_s2 is taken as the batch size of 128. The cross-entropy loss is computed from the label of the p-th sample for the k-th class and the probability with which the Fast-RCNN network predicts the p-th sample as the k-th class. J_s4 is the regression loss, gated by the label indicating whether the p-th sample is a positive sample: when this label is 1 the candidate frame is a positive sample and the regression loss is computed, and when it is 0 the candidate frame is a negative sample and the regression loss is 0. λ_fast is a balance constant.
The smooth_{L1} function is:
smooth_{L1}(x) = 0.5x², if |x| < 1; |x| - 0.5, otherwise.
3.2.5) Improve the original loss function:
Similarly, to solve the imbalance of hard and easy samples in ship detection, the smooth_{L1} function is improved; the improved function is:
[equation image in the original]
where a is a hyper-parameter, a = 2, and the improved Fast-RCNN module loss J_fast is:
[equation image in the original]
3.3) According to the improved regression loss J_rpn of 3.1) and the improved regression loss J_fast of 3.2), the improved total loss function J_s of the training network is obtained:
[equation image in the original; consistent with (3c), J_s combines the improved RPN module loss J_rpn and the improved Fast-RCNN module loss J_fast]
Step 4: train the network model Ω with the loss function J_s constructed in Step 3 to obtain the trained network model Ω'.
4.1) Input the training data Φ_x into the training network model Ω, training on one picture at a time, and compute the value of the network loss function J_s according to the labels of the input picture;
4.2) From the value of J_s obtained in 4.1), compute the gradient of the network loss function; the improved loss function balances the gradients that hard and easy samples contribute to the regression loss;
4.3) According to the gradient of the loss function computed in 4.2), continuously update the weights in the direction that reduces the loss function using a stochastic gradient descent algorithm, and propagate the output-layer error backwards with the back-propagation algorithm to update every layer parameter of the network model Ω;
4.4) Execute steps 4.1)-4.3) in a loop until the loss function converges, obtaining the trained network model Ω'.
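An illustrative training-loop sketch for Step 4, assuming PyTorch; the learning rate, momentum, epoch budget and convergence test are placeholders, not values specified in the patent.

```python
# Illustrative training loop: one picture per iteration, SGD weight updates, back-propagation,
# stopping when the total loss J_s has converged (convergence test is an assumption).
import torch

def train(model, loss_fn, train_loader, lr=1e-3, max_epochs=50, tol=1e-4):
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    prev_loss = float("inf")
    for epoch in range(max_epochs):
        epoch_loss = 0.0
        for image, labels in train_loader:        # one picture and its labels at a time
            optimizer.zero_grad()
            loss = loss_fn(model(image), labels)  # value of the total loss J_s
            loss.backward()                       # back-propagate the output-layer error
            optimizer.step()                      # update weights along the descent direction
            epoch_loss += loss.item()
        if abs(prev_loss - epoch_loss) < tol:     # stop once the loss has converged
            break
        prev_loss = epoch_loss
    return model
```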
Step 5: input the ship test data Φ_c into the finally trained network model Ω' to obtain the ship detection result.
The effects of the present invention can be further illustrated by the following experimental data:
first, experimental conditions
1) Experimental data
The experiment uses the SSDD data set organized and released by the Naval Aviation University of the Chinese People's Liberation Army. The data contain multi-scale ship targets under various imaging conditions, such as different resolutions, sea states and sensor types; the diversity of the data set samples gives the trained detector better robustness. The imaging conditions of the data set are shown in Table 1.
Table 1 ship data imaging conditions
[Table 1 is given as images in the original and is not reproduced in this text extraction]
In Table 1, RADARSAT-2 is a Canadian satellite, TerraSAR-X is a German satellite, and SENTINEL-1 is a European Union satellite; HH is the horizontal-transmit, horizontal-receive polarization mode of the satellite, VV is vertical-transmit, vertical-receive, HV is horizontal-transmit, vertical-receive, and VH is vertical-transmit, horizontal-receive.
2) Criteria for evaluation
The experiment is repeated five times, and the mean of the average precision and the mean of the detection rate over the five runs are used to evaluate the experimental results.
Second, the contents of the experiment
The experimental data were processed with the method of the present invention and two existing methods, and the performance comparison results are shown in Table 2.
TABLE 2 comparison of Performance parameters of the inventive method with those of the prior art
Comparison method    Average precision    Detection rate
SmoothL1 93.59% 94.38%
BalanceL1 93.83% 93.99%
The invention 94.88% 95.68%
In Table 2, SmoothL1 denotes the method that detects the ship data with the Faster-RCNN network using the smooth_{L1} regression loss function;
BalanceL1 denotes the method that detects the ship data with the Faster-RCNN network using the Balance_{L1} regression loss function proposed in the article "Libra R-CNN: Towards Balanced Learning for Object Detection".
as can be seen from table 2, compared with the existing method, the method of the present invention achieves a better detection effect, because the loss function designed by the method of the present invention can better solve the problem of imbalance of difficult and easy samples, and the network can more accurately learn the characteristics of various samples, the present invention obtains a better detection result than the existing method.
While the foregoing is directed to embodiments of the present invention, other and further embodiments of the invention may be devised without departing from the basic scope thereof, and the scope thereof is determined by the claims that follow.

Claims (7)

1. A SAR ship target detection method based on balance sample regression loss is characterized by comprising the following steps:
(1) Acquire the SSDD ship data set and divide it into training data Φ_x and test data Φ_c in a ratio of 8:2;
(2) Selecting a training network model omega formed by sequentially connecting a shared basic module VGG, a region selection module RPN and a region refinement module Fast-RCNN;
(3) Constructing an improved loss function;
(3a) The smooth_{L1} function in the regression loss of the RPN module of the network is improved into a balanced regression loss function of the following form:
[the original gives the smooth_{L1} function, the improved function and its two defining equations as equation images, which are not reproduced in this text extraction]
where j is the j-th training sample of the RPN module, t_n is the position information predicted by the RPN module for the training sample, t_n^* with n ∈ {x, y, w, h} is the position information of the target frame corresponding to the training sample, a is a hyper-parameter, and a = 2;
(3b) The smooth_{L1} function in the regression loss of the Fast-RCNN module of the network is improved into a balanced regression loss function of the following form:
[the original gives the smooth_{L1} function, the improved function and its two defining equations as equation images, which are not reproduced in this text extraction]
where p is the p-th training sample of the Fast-RCNN module, e_m is the position information predicted by the Fast-RCNN module for the training sample, and e_m^* with m ∈ {x, y, w, h} is the position information of the target frame corresponding to the training sample;
(3c) According to the improved regression loss functions of (3a) and (3b), the improved total loss function J_s of the training network is obtained:
[equation image in the original: J_s combines the RPN classification loss, the improved RPN regression loss weighted by λ_rpn, the Fast-RCNN classification loss, and the improved Fast-RCNN regression loss weighted by λ_fast]
where J_s1, λ_rpn and N_s1 are the classification loss function, loss balance constant and number of training samples of the RPN module; J_s3, λ_fast and N_s2 are the classification loss function, loss balance constant and number of training samples of the Fast-RCNN module; and the two remaining symbols (shown as images in the original) are the labels indicating whether the j-th sample of the RPN module and the p-th sample of the Fast-RCNN module are positive samples;
(4) Input the training data Φ_x into the training network model Ω and train the network model Ω with the total loss function J_s of the network until the loss function converges, obtaining the trained network model Ω';
(5) Input the ship test data Φ_c into the finally trained network model Ω' to obtain the ship detection result.
2. The method of claim 1, wherein the shared basic module VGG of the network model in (2) comprises 5 convolution blocks and 4 average pooling layers, connected in order: first convolution block V1 → first average pooling layer P1 → second convolution block V2 → second average pooling layer P2 → third convolution block V3 → third average pooling layer P3 → fourth convolution block V4 → fourth average pooling layer P4 → fifth convolution block V5, with layer parameters and relationships as follows:
the 4 pooling layers have the same structure and downsample the input; the window size of the downsampling kernel is 2 × 2, the sliding step is 2, the number of output feature maps equals that of the input, and the output feature maps serve as the input of the next convolution block;
the first convolution block V1 and the second convolution block V2 have the same structure and are each formed by cascading two identical sub-blocks, each sub-block consisting of a two-layer structure of a convolutional layer L_{i1} and a ReLU activation-function layer L_{i2}, where i denotes the i-th sub-block, i = 1, 2;
the third convolution block V3, the fourth convolution block V4 and the fifth convolution block V5 have the same structure and are each formed by cascading three identical sub-blocks, each sub-block consisting of a two-layer structure in which the first layer is a convolutional layer T_{j1} and the second layer is a ReLU activation-function layer T_{j2}, where j denotes the j-th sub-block, j = 1, 2, 3.
3. The method of claim 1, wherein the region selection module RPN of the network model in (2) consists, in order, of a shared convolutional layer C_1, an activation-function layer C_2, and two parallel branches: a classification branch C_3 and a regression convolutional layer C_4; the classification branch C_3 consists, in order, of a classification convolutional layer C_31 and a Softmax classifier layer C_32 and is used to obtain the classification probability vector p; the regression convolutional layer C_4 convolves the input to obtain 36 position-prediction feature maps b.
4. The method according to claim 1, wherein the region refinement module Fast-RCNN of the network model in (2) consists, in order, of a ROI pooling layer, a fully-connected layer F1, a fully-connected layer F2, and two parallel branches: a classification branch F3 and a regression fully-connected layer F4; the classification branch F3 consists, in order, of a fully-connected layer F31 and a Softmax classifier layer F32 and is used to obtain the classification probability vector T; the regression fully-connected layer F4 outputs a 4-dimensional regression column vector C representing the regression offsets e_x, e_y, e_w, e_h of the candidate frame.
5. The method of claim 1, wherein the classification loss function J_s1 of the RPN module in (3c) is expressed as follows:
[equation image in the original]
where N_s1 is the total number of training samples of the RPN module and is taken as the batch size of 256 when training with a batch gradient descent algorithm; the remaining two symbols (shown as images in the original) are the label of the j-th sample for the k-th class and the probability with which the RPN network predicts the j-th sample as the k-th class.
6. The method according to claim 1, wherein the classification loss function J_s3 of the Fast-RCNN module in (3c) is expressed as follows:
[equation image in the original]
where N_s2 is the total number of training samples of the Fast-RCNN module and is taken as the batch size of 128 when training with a batch gradient descent algorithm; the remaining two symbols (shown as images in the original) are the label of the p-th sample for the k-th class and the probability with which the Fast-RCNN module predicts the p-th sample as the k-th class.
7. The method of claim 1, wherein the training network model Ω in (4) is trained as follows:
4a) feed the training data into the network model Ω for training, training on one picture at a time, and compute the value of the network loss function J_s according to the labels of the input picture;
4b) compute the gradient of the network loss function; the improved loss function balances the gradients that hard and easy samples contribute to the regression;
4c) according to the gradient of the loss function computed in 4b), continuously update the weights in the direction that reduces the loss function using a stochastic gradient descent algorithm, and propagate the output-layer error backwards with the back-propagation algorithm to update every layer parameter of the network model Ω;
4d) loop through 4a)-4c) until the loss function J_s converges, obtaining the trained network model Ω'.
CN202011544100.2A 2020-12-24 2020-12-24 SAR ship target detection method based on regression loss of balance sample Active CN112668440B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011544100.2A CN112668440B (en) 2020-12-24 2020-12-24 SAR ship target detection method based on regression loss of balance sample

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011544100.2A CN112668440B (en) 2020-12-24 2020-12-24 SAR ship target detection method based on regression loss of balance sample

Publications (2)

Publication Number Publication Date
CN112668440A CN112668440A (en) 2021-04-16
CN112668440B true CN112668440B (en) 2023-02-10

Family

ID=75409594

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011544100.2A Active CN112668440B (en) 2020-12-24 2020-12-24 SAR ship target detection method based on regression loss of balance sample

Country Status (1)

Country Link
CN (1) CN112668440B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113536936B (en) * 2021-06-17 2022-10-11 中国人民解放军海军航空大学航空作战勤务学院 Ship target detection method and system
CN113469088B (en) * 2021-07-08 2023-05-12 西安电子科技大学 SAR image ship target detection method and system under passive interference scene
CN113537170A (en) * 2021-09-16 2021-10-22 北京理工大学深圳汽车研究院(电动车辆国家工程实验室深圳研究院) Intelligent traffic road condition monitoring method and computer readable storage medium
CN115049884B (en) * 2022-08-15 2022-10-25 菲特(天津)检测技术有限公司 Broad-sense few-sample target detection method and system based on fast RCNN
CN115346125B (en) * 2022-10-18 2023-03-24 南京金瀚途科技有限公司 Target detection method based on deep learning
CN115641510B (en) * 2022-11-18 2023-08-08 中国人民解放军战略支援部队航天工程大学士官学校 Remote sensing image ship detection and identification method

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110674714A (en) * 2019-09-13 2020-01-10 东南大学 Human face and human face key point joint detection method based on transfer learning
CN111027454A (en) * 2019-12-06 2020-04-17 西安电子科技大学 SAR (synthetic Aperture Radar) ship target classification method based on deep dense connection and metric learning
CN111460984A (en) * 2020-03-30 2020-07-28 华南理工大学 Global lane line detection method based on key point and gradient balance loss
CN111626200A (en) * 2020-05-26 2020-09-04 北京联合大学 Multi-scale target detection network and traffic identification detection method based on Libra R-CNN
CN111860264A (en) * 2020-07-10 2020-10-30 武汉理工大学 Multi-task instance level road scene understanding algorithm based on gradient equilibrium strategy
CN112016467A (en) * 2020-08-28 2020-12-01 展讯通信(上海)有限公司 Traffic sign recognition model training method, recognition method, system, device and medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3596449A4 (en) * 2017-03-14 2021-01-06 University of Manitoba Structure defect detection using machine learning algorithms

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110674714A (en) * 2019-09-13 2020-01-10 东南大学 Human face and human face key point joint detection method based on transfer learning
CN111027454A (en) * 2019-12-06 2020-04-17 西安电子科技大学 SAR (synthetic Aperture Radar) ship target classification method based on deep dense connection and metric learning
CN111460984A (en) * 2020-03-30 2020-07-28 华南理工大学 Global lane line detection method based on key point and gradient balance loss
CN111626200A (en) * 2020-05-26 2020-09-04 北京联合大学 Multi-scale target detection network and traffic identification detection method based on Libra R-CNN
CN111860264A (en) * 2020-07-10 2020-10-30 武汉理工大学 Multi-task instance level road scene understanding algorithm based on gradient equilibrium strategy
CN112016467A (en) * 2020-08-28 2020-12-01 展讯通信(上海)有限公司 Traffic sign recognition model training method, recognition method, system, device and medium

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Automatic Ship Detection Based on RetinaNet Using Multi-Resolution Gaofen-3 Imagery; Yuanyuan Wang et al.; Remote Sensing; 2019-03-31; full text *
Gradient Harmonized Single-Stage Detector; Buyu Li et al.; Computer Vision and Pattern Recognition; 2018-11-13; full text *
Ship detection in SAR images based on an improved faster R-CNN; Jianwei Li et al.; 2017 SAR in Big Data Era: Models, Methods and Applications; 2017-11-30; full text *
A campus vehicle detection method based on improved Faster RCNN; Li Hang et al.; Journal of Shenyang Normal University (Natural Science Edition); 2020-02-15 (No. 01); full text *

Also Published As

Publication number Publication date
CN112668440A (en) 2021-04-16

Similar Documents

Publication Publication Date Title
CN112668440B (en) SAR ship target detection method based on regression loss of balance sample
CN109086700B (en) Radar one-dimensional range profile target identification method based on deep convolutional neural network
CN110472627B (en) End-to-end SAR image recognition method, device and storage medium
CN111027454B (en) SAR ship target classification method based on deep dense connection and metric learning
CN110533631A (en) SAR image change detection based on the twin network of pyramid pondization
CN112395987B (en) SAR image target detection method based on unsupervised domain adaptive CNN
CN110084159A (en) Hyperspectral image classification method based on the multistage empty spectrum information CNN of joint
CN107256396A (en) Ship target ISAR characteristics of image learning methods based on convolutional neural networks
CN107798345B (en) High-spectrum disguised target detection method based on block diagonal and low-rank representation
CN112699717A (en) SAR image generation method and generation device based on GAN network
Bragilevsky et al. Deep learning for Amazon satellite image analysis
CN113486819A (en) Ship target detection method based on YOLOv4 algorithm
CN110263644A (en) Classifying Method in Remote Sensing Image, system, equipment and medium based on triplet's network
CN111832580A (en) SAR target identification method combining few-sample learning and target attribute features
CN110069987B (en) Single-stage ship detection algorithm and device based on improved VGG network
CN110414494A (en) SAR image classification method with ASPP deconvolution network
Gui et al. A scale transfer convolution network for small ship detection in SAR images
CN117710783A (en) SAR long tail target detection method based on double-drive equalization loss and recessive characteristic enhancement
CN117765418A (en) Unmanned aerial vehicle image matching method
CN113409381A (en) Dual-polarization channel fusion ship size estimation method based on CNN
Zhang et al. Ship detection and recognition in optical remote sensing images based on scale enhancement rotating cascade R-CNN networks
Li et al. SAR image object detection based on improved cross-entropy loss function with the attention of hard samples
CN114049551B (en) ResNet 18-based SAR raw data target identification method
CN115797663A (en) Space target material identification method under complex illumination condition
CN113361439B (en) SAR image ship target identification method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant