CN112668440B - SAR ship target detection method based on regression loss of balance sample - Google Patents


Info

Publication number
CN112668440B
Authority
CN
China
Prior art keywords
layer
training
module
loss function
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011544100.2A
Other languages
Chinese (zh)
Other versions
CN112668440A (en)
Inventor
王英华
杨振东
刘宏伟
唐天顾
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University filed Critical Xidian University
Priority to CN202011544100.2A
Publication of CN112668440A
Application granted
Publication of CN112668440B
Legal status: Active

Landscapes

  • Image Analysis (AREA)
  • Complex Calculations (AREA)

Abstract

The invention discloses an SAR ship target detection method based on balanced-sample regression loss, which mainly addresses the problem that the ship target detection performance of the trained network model is degraded by the imbalance of hard and easy samples in existing deep learning methods. The implementation scheme is as follows: 1) acquire ship data and divide it into training data and test data; 2) select the Faster-RCNN network as the training network model; 3) improve the original loss function of the training network to form a new total loss function; 4) feed the training data into the network selected in step 2) and train the network with the new total loss function to obtain the finally trained network model; 5) feed the test data into the trained network model to obtain the ship target detection result. The method better extracts the deep features of the ship target, improves ship target detection performance, and can be used for ship target detection.

Description

SAR ship target detection method based on regression loss of balance sample
Technical Field
The invention belongs to the technical field of radars, and mainly relates to an SAR image ship target detection method which can be used for subsequent ship target identification and classification.
Background
The synthetic aperture radar (SAR) is an active imaging sensor with all-weather, all-time, high-resolution data acquisition capability, and has the characteristics of multiple frequency bands, multiple polarizations, variable viewing angle and penetrability. SAR is now widely applied in military reconnaissance, geological survey, topographic mapping and cartography, disaster prediction, marine applications, scientific research and other fields, and has broad research and application prospects. Automatic target recognition (ATR) of SAR images is one of its important applications. A basic SAR image ATR system generally comprises three stages: target detection, target discrimination and target recognition. The performance of target detection directly affects the performance and efficiency of the discrimination and classification stages, so research on SAR target detection is very important.
Traditional SAR image target detection algorithms are mainly constant false alarm rate (CFAR) methods, which determine a detection threshold from a pre-established clutter statistical model; such models suffer from limited application scenarios and low detection efficiency. Currently popular deep learning methods can automatically learn features from data through training, guarantee both accuracy and detection speed when data are sufficient, and adapt better to different scenes, so detecting ship targets in SAR images with deep learning methods has attracted wide attention.
Jianwei Li et al. organized and released the SSDD ship data set in the IEEE paper "Ship Detection in SAR Images Based on an Improved Faster R-CNN" and proposed a detection method using an improved Faster-RCNN network.
Yuanyuan Wang et al. improved the RetinaNet detection network in the paper "Automatic Ship Detection Based on RetinaNet Using Multi-Resolution Gaofen-3 Imagery" published in Remote Sensing and applied it to Gaofen-3 SAR ship image data, obtaining good detection performance. That method uses the Focal Loss classification loss function to address the imbalance of hard and easy samples in classification; however, the imbalance problem in regression is not considered, and the detection results still need further improvement.
Disclosure of Invention
The invention aims to provide an SAR ship target detection method based on balanced-sample regression loss, so as to solve the problem that the trained network model detects ship targets poorly because of the imbalance of hard and easy samples in existing deep learning methods, and to improve ship target detection performance.
The technical scheme of the invention is as follows: first, the SSDD data are divided into training data and test data; then a Faster-RCNN deep neural network model is trained with the training data, using an improved regression loss function; after the model converges, the trained neural network is applied to the test data to obtain the final ship detection result. The implementation steps include the following:
(1) Acquire the SSDD ship data set and divide it into training data Φ_x and test data Φ_c in a ratio of 8:2;
(2) Selecting a training network model omega formed by sequentially connecting a shared basic module VGG, a region selection module RPN and a region refinement module Fast-RCNN;
(3) Constructing an improved loss function;
(3a) The smooth_{L1} function in the regression loss of the RPN module of the network is improved into a balanced regression loss function of the following form:
[the original gives the smooth_{L1} function, the improved function and its two defining equations as equation images, which are not reproduced in this text extraction]
where j is the j-th training sample of the RPN module, t_n is the position information predicted by the RPN module for the training sample, t_n^* with n ∈ {x, y, w, h} is the position information of the target frame corresponding to the training sample, a is a hyper-parameter, and a = 2;
(3b) The smooth_{L1} function in the regression loss of the Fast-RCNN module of the network is improved into a balanced regression loss function of the following form:
[the original gives the smooth_{L1} function, the improved function and its two defining equations as equation images, which are not reproduced in this text extraction]
where p is the p-th training sample of the Fast-RCNN module, e_m is the position information predicted by the Fast-RCNN module for the training sample, and e_m^* with m ∈ {x, y, w, h} is the position information of the target frame corresponding to the training sample.
(3c) According to the improved regression loss functions of (3a) and (3b), the improved total loss function J_s of the training network is obtained:
[equation image in the original: J_s combines the RPN classification loss, the improved RPN regression loss weighted by λ_rpn, the Fast-RCNN classification loss, and the improved Fast-RCNN regression loss weighted by λ_fast]
where J_s1, λ_rpn and N_s1 are the classification loss function, loss balance constant and number of training samples of the RPN module; J_s3, λ_fast and N_s2 are the classification loss function, loss balance constant and number of training samples of the Fast-RCNN module; and the two remaining symbols (shown as images in the original) are the labels indicating whether the j-th sample of the RPN module and the p-th sample of the Fast-RCNN module are positive samples.
(4) Input the training data Φ_x into the constructed training network model Ω and train the network model Ω with the total loss function J_s of the network until the loss function converges, obtaining the trained network model Ω';
(5) Input the ship test data Φ_c into the finally trained network model Ω' to obtain the ship detection result.
Compared with the prior art, the invention has the following advantages:
the method aims at the problem that regression loss is difficult and samples are unbalanced when the existing deep learning target detection method Faster-RCNN is applied to SAR ship target detection, improves a regression loss function, can better extract the depth characteristics of a ship target, and improves the ship target detection performance.
Drawings
FIG. 1 is a flow chart of an implementation of the present invention;
FIG. 2 is a diagram of a training network model architecture used in the present invention;
FIG. 3 is a graph of the gradient of the regression loss function designed in the present invention.
Detailed Description
The embodiments and effects of the present invention will be described in detail below with reference to the accompanying drawings:
referring to fig. 1, the implementation steps of the invention are as follows:
step 1, acquiring an SSDD ship data set, and dividing training data and test data.
The SSDD ship data set is shot by a plurality of radar satellites and formed by arranging and marking by experts, and the training data phi is divided into the training data phi according to the proportion of 8 x And test data phi c
Step 2: select the training network model Ω.
Existing target detection networks include the YOLO series, the SSD network and the Faster-RCNN network; in this example the classical Faster-RCNN network is selected as the training network model Ω for detection.
Referring to fig. 2, the training network model Ω selected in this example is formed by sequentially connecting a shared basic module VGG, a region selection module RPN, and a region refinement module Fast-RCNN, and each module has the following structure:
2.1 VGG base module:
The module comprises 5 convolution blocks and 4 average pooling layers, connected in order: first convolution block V1 → first average pooling layer P1 → second convolution block V2 → second average pooling layer P2 → third convolution block V3 → third average pooling layer P3 → fourth convolution block V4 → fourth average pooling layer P4 → fifth convolution block V5. The parameter settings and relationships of the layers are as follows:
2.1 a) Convolution block V1 is formed by cascading two identical sub-blocks, each consisting of a two-layer structure: the first layer is a convolutional layer and the second layer is a ReLU activation-function layer, where i denotes the i-th sub-block, i = 1, 2:
The first-layer convolutional layer has a convolution kernel K_1 with a window size of 3 × 3, a sliding step S_1 of 1 and SAME padding; it convolves the input and outputs 64 feature maps, which serve as the input of the second-layer activation-function layer.
The second-layer ReLU activation-function layer nonlinearly maps the output of the previous layer according to:
ReLU(x) = max(0, x)
where x is the input and ReLU(x) is the output; the input and output dimensions of this layer are the same.
2.1 b) The average pooling layer P1 downsamples the input; its downsampling kernel U_1 has a window size of 2 × 2 and a sliding step V_1 of 2, and it outputs 64 feature maps Y_1 as the input of convolution block V2.
2.1 c) Convolution block V2 is formed by cascading two identical sub-blocks, each consisting of a two-layer structure: the first layer is a convolutional layer and the second layer is a ReLU activation-function layer, where i denotes the i-th sub-block, i = 1, 2:
The first-layer convolutional layer has a convolution kernel K_2 with a window size of 3 × 3, a sliding step S_2 of 1 and SAME padding; it convolves the input and outputs 128 feature maps, which serve as the input of the second-layer activation-function layer.
The second-layer ReLU activation-function layer nonlinearly maps the output of the previous layer according to:
ReLU(x) = max(0, x)
where x is the input and ReLU(x) is the output; the input and output dimensions of this layer are the same.
2.1 d) The average pooling layer P2 downsamples the input; its downsampling kernel U_2 has a window size of 2 × 2 and a sliding step V_2 of 2, and it outputs 128 feature maps Y_2 as the input of convolution block V3.
2.1 e) Convolution block V3 is formed by cascading three identical sub-blocks, each consisting of a two-layer structure: the first layer is a convolutional layer and the second layer is a ReLU activation-function layer, where i denotes the i-th sub-block, i = 1, 2, 3:
The first-layer convolutional layer has a convolution kernel K_3 with a window size of 3 × 3, a sliding step S_3 of 1 and SAME padding; it convolves the input and outputs 256 feature maps, which serve as the input of the second-layer activation-function layer.
The second-layer ReLU activation-function layer nonlinearly maps the output of the previous layer according to:
ReLU(x) = max(0, x)
where x is the input and ReLU(x) is the output; the input and output dimensions of this layer are the same.
2.1 f) The average pooling layer P3 downsamples the input; its downsampling kernel U_3 has a window size of 2 × 2 and a sliding step V_3 of 2, and it outputs 256 feature maps Y_3 as the input of convolution block V4.
2.1 g) Convolution block V4 is formed by cascading three identical sub-blocks, each consisting of a two-layer structure: the first layer is a convolutional layer and the second layer is a ReLU activation-function layer, where i denotes the i-th sub-block, i = 1, 2, 3:
The first-layer convolutional layer has a convolution kernel K_4 with a window size of 3 × 3, a sliding step S_4 of 1 and SAME padding; it convolves the input and outputs 512 feature maps, which serve as the input of the second-layer activation-function layer.
The second-layer ReLU activation-function layer nonlinearly maps the output of the previous layer according to:
ReLU(x) = max(0, x)
where x is the input and ReLU(x) is the output; the input and output dimensions of this layer are the same.
2.1 h) The average pooling layer P4 downsamples the input; its downsampling kernel U_4 has a window size of 2 × 2 and a sliding step V_4 of 2, and it outputs 512 feature maps Y_4 as the input of convolution block V5.
2.1 i) Convolution block V5 is formed by cascading three identical sub-blocks, each consisting of a two-layer structure: the first layer is a convolutional layer and the second layer is a ReLU activation-function layer, where i denotes the i-th sub-block, i = 1, 2, 3:
The first-layer convolutional layer has a convolution kernel K_5 with a window size of 3 × 3, a sliding step S_5 of 1 and SAME padding; it convolves the input and outputs 512 feature maps, which serve as the input of the second-layer activation-function layer.
The second-layer ReLU activation-function layer nonlinearly maps the output of the previous layer according to:
ReLU(x) = max(0, x)
where x is the input and ReLU(x) is the output; the input and output dimensions of this layer are the same.
The output of the ReLU activation-function layer of the last sub-block of convolution block V5 is the shared feature F extracted by the VGG base module.
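For concreteness, the following is a minimal sketch (not the patent's code) of the shared VGG base module as described above, written in PyTorch: five convolution blocks with 2/2/3/3/3 convolution + ReLU sub-blocks and 64/128/256/512/512 output maps (3 × 3 kernels, stride 1, SAME padding), separated by four 2 × 2 average-pooling layers with stride 2. The number of input channels (3) and the class and function names are assumptions.

```python
# Illustrative sketch of the shared VGG base module described above (assumptions noted in the text).
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch, n_convs):
    layers = []
    for i in range(n_convs):
        layers.append(nn.Conv2d(in_ch if i == 0 else out_ch, out_ch,
                                kernel_size=3, stride=1, padding=1))  # SAME padding
        layers.append(nn.ReLU(inplace=True))                           # ReLU(x) = max(0, x)
    return nn.Sequential(*layers)

class VGGBase(nn.Module):
    def __init__(self):
        super().__init__()
        self.v1 = conv_block(3, 64, 2)
        self.p1 = nn.AvgPool2d(kernel_size=2, stride=2)
        self.v2 = conv_block(64, 128, 2)
        self.p2 = nn.AvgPool2d(kernel_size=2, stride=2)
        self.v3 = conv_block(128, 256, 3)
        self.p3 = nn.AvgPool2d(kernel_size=2, stride=2)
        self.v4 = conv_block(256, 512, 3)
        self.p4 = nn.AvgPool2d(kernel_size=2, stride=2)
        self.v5 = conv_block(512, 512, 3)

    def forward(self, x):
        x = self.p1(self.v1(x))
        x = self.p2(self.v2(x))
        x = self.p3(self.v3(x))
        x = self.p4(self.v4(x))
        return self.v5(x)   # shared feature F: 512 maps at 1/16 of the input resolution
```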
2.2 RPN module:
the layers in turn being composed of a shared convolution layer C 1 Activation function layer C 2 Two parallel classification branches C 3 And regression convolutional layer C 4 The composition, each layer parameter setting and relation are as follows:
shared convolution layer C 1 Convolution kernel K of 6 The window size of (3X 3), the sliding step length S 6 Is 1, the filling mode is SAME, and is used for convolving the input and outputting 512 feature maps Y 6 As a second layer activation function layer C 2 The input of (1);
second layer ReLU activation function layer C 2 For the upper layer C 1 The output of the layer is mapped nonlinearly, and the nonlinear mapping formula is as follows:
ReLU(x)=max(0,x)
where x is the input and ReLU (x) is the output, the input and output dimensions of the layer are the same.
The classification branch is formed by a classification convolution layer C 31 Softmax classifier layer C 32 The composition, each layer parameter setting and relation are as follows:
classified convolutional layer C 31 Convolution kernel K of it 7 Has a window size of 1 × 1, a sliding step S 7 Is 1, the filling mode isSAME for convolving input and outputting 18 characteristic maps Y 7
Softmax classifier layer C 32 For winding up the layers C to be classified 31 The obtained 18 feature maps Y 7 And inputting the classification probability vector p into two types of Softmax classifiers.
Regression convolutional layer C 4 Convolution kernel K of 8 The window size of (2) is 1 x 1, the sliding step length S 8 The filling mode is SAME, and the filling mode is used for convolving the input and outputting 36 feature maps b which represent 4 offsets t respectively predicted by the RPN module for 9 preset frames x ,t y ,t w ,t h
According to regression convolution layer C 4 The output b of the step (a) determines the position of the initial detection frame, screens the detection frame through the classification probability vector p, judges the frame with the score meeting the threshold value as a candidate frame and uses the candidate frame as the input of a next refinement module Fast-RCNN module.
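A minimal sketch of the RPN module as described above, assuming PyTorch and the layer sizes given (a 3 × 3 shared convolution C_1 with 512 maps, ReLU C_2, a 1 × 1 classification convolution producing 18 maps followed by a two-class softmax, and a 1 × 1 regression convolution producing 36 maps); the class and variable names are illustrative only, and the screening of frames by score threshold is omitted.

```python
# Illustrative sketch of the RPN module described above (names and shapes are assumptions).
import torch
import torch.nn as nn
import torch.nn.functional as F

class RPNHead(nn.Module):
    def __init__(self, in_ch=512, num_anchors=9):
        super().__init__()
        self.c1 = nn.Conv2d(in_ch, 512, kernel_size=3, stride=1, padding=1)  # shared conv C_1
        self.cls = nn.Conv2d(512, 2 * num_anchors, kernel_size=1)            # C_31: 18 maps
        self.reg = nn.Conv2d(512, 4 * num_anchors, kernel_size=1)            # C_4: 36 maps

    def forward(self, feat):
        h = F.relu(self.c1(feat))                                   # activation layer C_2
        scores = self.cls(h)                                        # objectness logits
        p = F.softmax(scores.view(scores.shape[0], 2, -1), dim=1)   # C_32: two-class softmax
        b = self.reg(h)                                             # predicted offsets t_x, t_y, t_w, t_h
        return p, b
```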
2.3 Fast-RCNN Module:
This module consists, in order, of a ROI pooling layer, a first fully-connected layer F1, a second fully-connected layer F2, and two parallel branches: a classification branch F3 and a regression fully-connected layer F4. The parameter settings and relationships of the layers are as follows:
The ROI pooling layer R downsamples input feature maps of different sizes to the same size: it divides each input feature map into 7 × 7 blocks and takes the maximum value of each block as the output of that block, outputting 512 feature maps Y_8 of size 512 × 7 × 7 as the input of the first fully-connected layer F1.
The first fully-connected layer F1 has 4096 neurons; it nonlinearly maps the features output by the ROI pooling layer and outputs a 4096-dimensional column vector.
The second fully-connected layer F2 has 4096 neurons; it nonlinearly maps the column vector output by the fully-connected layer F1 and outputs a 4096-dimensional column vector.
The classification branch consists, in order, of a classification fully-connected layer F31 and a Softmax classifier layer F32, with the following parameter settings and relationships:
The classification fully-connected layer F31 has 2 neurons; it nonlinearly maps the column vector output by the fully-connected layer F2 and outputs a 2-dimensional column vector as the input of the Softmax classifier layer F32.
The Softmax classifier layer F32 feeds the 2-dimensional column vector obtained by the classification fully-connected layer F31 into a two-class Softmax classifier to obtain the classification probability vector T, and classifies the candidate frames according to the probability values.
The regression fully-connected layer F4 has 4 neurons; it nonlinearly maps the column vector output by the fully-connected layer F2 and outputs a 4-dimensional regression column vector C representing the regression offsets e_x, e_y, e_w, e_h of the candidate frame.
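A minimal sketch of the Fast-RCNN refinement module as described above; the use of torchvision's roi_pool, the 1/16 spatial scale, and the ReLU nonlinearity between the fully-connected layers are assumptions not stated in the patent.

```python
# Illustrative sketch of the Fast-RCNN refinement module described above (assumptions noted in the text).
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.ops import roi_pool

class FastRCNNHead(nn.Module):
    def __init__(self, in_ch=512):
        super().__init__()
        self.fc1 = nn.Linear(in_ch * 7 * 7, 4096)   # first fully-connected layer F1
        self.fc2 = nn.Linear(4096, 4096)             # second fully-connected layer F2
        self.cls = nn.Linear(4096, 2)                # classification layer F31 (target / background)
        self.reg = nn.Linear(4096, 4)                # regression layer F4 (e_x, e_y, e_w, e_h)

    def forward(self, feat, rois):
        # rois: Tensor[K, 5] with columns (batch_index, x1, y1, x2, y2) in image coordinates
        pooled = roi_pool(feat, rois, output_size=(7, 7), spatial_scale=1.0 / 16)
        h = F.relu(self.fc1(pooled.flatten(start_dim=1)))
        h = F.relu(self.fc2(h))
        T = F.softmax(self.cls(h), dim=1)            # classification probability vector T
        C = self.reg(h)                              # regression column vector C
        return T, C
```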
Step 3: construct the improved loss function.
The regression loss in the original loss function used to train the network model Ω is the smooth_{L1} function, which suffers from the imbalance of hard and easy samples in the regression loss during network training and thereby degrades detection performance; the regression loss function therefore needs to be improved and a new total loss function constructed.
The new total loss function is composed of the sum of the loss function of the RPN module and the loss function of the Fast-RCNN module, and is constructed as follows:
3.1) Construct the RPN module loss function:
3.1.1) Preset frame information and target frame information: the RPN module places nine preset frames at each feature point of the shared feature F extracted by the VGG base module; the nine frames are obtained from three aspect ratios (width : height) and three scales. (x_a, y_a) is the center coordinate of a preset frame, w_a is the width of the frame and h_a is its height; (x^*, y^*) is the center coordinate of the target frame, w^* is the width of the target frame and h^* is its height;
3.1.2) Compute the intersection-over-union (IOU) of each preset frame with every target frame from the preset frame information and the target frame information:
IOU(A, B) = |A ∩ B| / |A ∪ B|
where A is a preset frame, B is a target frame, A ∩ B is the intersection region of the preset frame and the target frame, and A ∪ B is their union region;
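A minimal sketch of this IOU computation, assuming boxes are given as (x1, y1, x2, y2) corner coordinates:

```python
# IOU of a preset frame A and a target frame B as defined in 3.1.2).
def iou(box_a, box_b):
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h                       # |A ∩ B|
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter                 # |A ∪ B|
    return inter / union if union > 0 else 0.0
```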
3.1.3) Set IOU thresholds and divide the preset frames into sample classes:
Set the lower IOU threshold v_1 to 0.3 and the upper threshold v_2 to 0.7.
If the IOU of a preset frame with some target frame is greater than or equal to the upper threshold v_2, or the preset frame has the largest IOU with some target frame among all preset frames, the preset frame is a positive sample and that target frame is assigned to it;
If the IOU of a preset frame with every target frame is less than or equal to the lower threshold v_1 and the preset frame does not have the largest IOU with any target frame among all preset frames, the preset frame is a negative sample;
If the IOU of a preset frame with every target frame is greater than the lower threshold v_1 and less than the upper threshold v_2, and the preset frame does not have the largest IOU with any target frame among all preset frames, the preset frame is a useless sample and does not participate in training;
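A sketch of this sample-class division under the stated thresholds v_1 = 0.3 and v_2 = 0.7; the array layout and the -1 convention for useless (ignored) samples are assumptions.

```python
# Divide preset frames into positive (1), negative (0) and useless (-1) samples from an IOU matrix.
import numpy as np

def assign_anchor_labels(ious, v1=0.3, v2=0.7):
    # ious: array of shape (num_preset_frames, num_target_frames) of IOU values
    labels = np.full(ious.shape[0], -1, dtype=np.int64)   # -1: useless sample, ignored in training
    max_iou = ious.max(axis=1)
    labels[max_iou <= v1] = 0                              # negative samples
    labels[max_iou >= v2] = 1                              # positive samples (IOU above upper threshold)
    labels[ious.argmax(axis=0)] = 1                        # best preset frame per target is also positive
    assigned = ious.argmax(axis=1)                         # index of the assigned target frame
    return labels, assigned
```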
3.1.4) Compute the target offsets from the preset frame information and the corresponding target frame information (the formulas appear as equation images in the original; they follow the standard parameterization relative to the preset frame):
t_x^* = (x^* - x_a)/w_a,
t_y^* = (y^* - y_a)/h_a,
t_w^* = log(w^*/w_a),
t_h^* = log(h^*/h_a);
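A minimal sketch of this offset computation, assuming the standard parameterization written out above with frames given as center coordinates plus width and height:

```python
# Encode the assigned target frame relative to a preset frame, as in 3.1.4).
import math

def encode_offsets(anchor, target):
    xa, ya, wa, ha = anchor          # preset-frame center, width, height
    xs, ys, ws, hs = target          # target-frame center, width, height
    tx = (xs - xa) / wa
    ty = (ys - ya) / ha
    tw = math.log(ws / wa)
    th = math.log(hs / ha)
    return tx, ty, tw, th
```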
3.1.5) Original loss function of the RPN module:
The RPN module is used to obtain a classification probability vector p and a regression feature map b, where the classification probability vector p is the probability that a preset frame is target or background, and the regression feature map b is the offset information t_x, t_y, t_w, t_h predicted by the network for each preset frame. The original loss function J'_rpn of the RPN module is:
J'_rpn = J_s1 + λ_rpn J_s2
[the expressions for J_s1 and J_s2 appear as equation images in the original]
where J_s1 is the cross-entropy loss function and N_s1 is the total number of training samples of the RPN module; when training with a batch gradient descent algorithm, N_s1 is taken as the batch size of 256. The cross-entropy loss is computed from the label of the j-th sample for the k-th class and the probability with which the RPN network predicts the j-th sample as the k-th class. J_s2 is the regression loss function, gated by the label indicating whether the j-th sample is a positive sample: when this label is 1 the preset frame is a positive sample and the regression loss is computed, and when it is 0 the preset frame is a negative sample and the regression loss is 0. t_n is the position information of the preset frame, t_n^* with n ∈ {x, y, w, h} is the position information of the target frame, and λ_rpn is a balance constant.
The smooth_{L1} function is:
smooth_{L1}(x) = 0.5x², if |x| < 1; |x| - 0.5, otherwise.
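A one-line sketch of the smooth_{L1} function defined above:

```python
# smooth_L1: quadratic for small errors, linear for large ones.
def smooth_l1(x):
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5
```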
3.1.6) Improve the original loss function:
Referring to Fig. 3, statistics of the regression-loss gradients of the RPN module show that the loss gradient contributed by easy samples is obviously larger than that contributed by hard samples. To solve this imbalance of hard and easy samples in ship detection, the smooth_{L1} function is improved so as to increase the gradient of the hard-sample loss; the improved function is:
[equation image in the original]
where a is a hyper-parameter, a = 2, and the improved RPN module loss J_rpn is:
[equation image in the original]
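The exact improved function is given only as an equation image above, so no implementation of it is attempted here. The following illustrative sketch only reproduces the gradient statistics that motivate it: the gradient magnitude of smooth_{L1} saturates at 1 for hard samples, so a large number of easy samples can dominate the total regression gradient. The sample counts and error values below are assumptions chosen purely for illustration.

```python
# Gradient of smooth_L1: x for |x| < 1 (easy samples), sign(x) otherwise (hard samples).
def smooth_l1_grad(x):
    return x if abs(x) < 1.0 else (1.0 if x > 0 else -1.0)

easy_errors = [0.05] * 1000          # many easy samples with small regression error (assumed counts)
hard_errors = [2.0] * 20             # few hard samples with large regression error (assumed counts)
easy_total = sum(abs(smooth_l1_grad(x)) for x in easy_errors)   # 1000 * 0.05 = 50.0
hard_total = sum(abs(smooth_l1_grad(x)) for x in hard_errors)   # 20 * 1.0 = 20.0
print(easy_total, hard_total)        # easy samples contribute the larger share of the total gradient
```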
3.2) Construct the Fast-RCNN module loss function.
3.2.1) From the regression feature map b predicted by the RPN, which represents the offsets t_x, t_y, t_w, t_h of the candidate frames, calculate the position information x, y, w, h of each candidate frame according to:
t_x = (x - x_a)/w_a,
t_y = (y - y_a)/h_a,
t_w = log(w/w_a),
t_h = log(h/h_a);
where (x, y) is the center coordinate of the candidate frame, w is the width of the candidate frame and h is its height;
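A minimal sketch of recovering the candidate-frame position by inverting the formulas above:

```python
# Decode a candidate frame (x, y, w, h) from a preset frame and the predicted offsets, as in 3.2.1).
import math

def decode_offsets(anchor, offsets):
    xa, ya, wa, ha = anchor          # preset-frame center, width, height
    tx, ty, tw, th = offsets         # predicted offsets t_x, t_y, t_w, t_h
    x = tx * wa + xa
    y = ty * ha + ya
    w = wa * math.exp(tw)
    h = ha * math.exp(th)
    return x, y, w, h
```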
3.2.2) Set the IOU threshold and divide the candidate frames into sample classes:
Set the IOU threshold v_3 to 0.5.
Compute the IOU of each candidate frame with the target frames from the position coordinates of the candidate frame. If the IOU of a candidate frame with every target frame is less than the threshold v_3, the candidate frame is a negative sample;
If the IOU of a candidate frame with some target frame is greater than the threshold v_3, the candidate frame is a positive sample and that target frame is assigned to it;
3.2.3) Compute the target offsets from the candidate frame information and the target frame information (the formulas appear as equation images in the original; they follow the standard parameterization relative to the candidate frame):
e_x^* = (x^* - x)/w,
e_y^* = (y^* - y)/h,
e_w^* = log(w^*/w),
e_h^* = log(h^*/h);
3.2.4) Original loss function of the Fast-RCNN module:
The 2000 candidate frames judged to have the highest target probability according to the classification probability vector p obtained by the RPN module are further regressed and classified by the Fast-RCNN module, yielding a classification probability vector T and a regression column vector C, where T is the probability that a candidate frame is judged to be target or background and C is the predicted offsets e_x, e_y, e_w, e_h of the candidate frame. The loss function J'_fast of the Fast-RCNN module is:
J'_fast = J_s3 + λ_fast J_s4
[the expressions for J_s3 and J_s4 appear as equation images in the original]
where J_s3 is the cross-entropy loss function and N_s2 is the total number of training samples of the Fast-RCNN module; when training with a batch gradient descent algorithm, N_s2 is taken as the batch size of 128. The cross-entropy loss is computed from the label of the p-th sample for the k-th class and the probability with which the Fast-RCNN network predicts the p-th sample as the k-th class. J_s4 is the regression loss, gated by the label indicating whether the p-th sample is a positive sample: when this label is 1 the candidate frame is a positive sample and the regression loss is computed, and when it is 0 the candidate frame is a negative sample and the regression loss is 0. λ_fast is a balance constant.
The smooth_{L1} function is:
smooth_{L1}(x) = 0.5x², if |x| < 1; |x| - 0.5, otherwise.
3.2.5) Improve the original loss function:
Similarly, to solve the imbalance of hard and easy samples in ship detection, the smooth_{L1} function is improved; the improved function is:
[equation image in the original]
where a is a hyper-parameter, a = 2, and the improved Fast-RCNN module loss J_fast is:
[equation image in the original]
3.3) According to the improved regression loss J_rpn of 3.1) and the improved regression loss J_fast of 3.2), the improved total loss function J_s of the training network is obtained:
[equation image in the original; consistent with (3c), J_s combines the improved RPN module loss J_rpn and the improved Fast-RCNN module loss J_fast]
Step 4: train the network model Ω with the loss function J_s constructed in Step 3 to obtain the trained network model Ω'.
4.1) Input the training data Φ_x into the training network model Ω, training on one picture at a time, and compute the value of the network loss function J_s according to the labels of the input picture;
4.2) From the value of J_s obtained in 4.1), compute the gradient of the network loss function; the improved loss function balances the gradients that hard and easy samples contribute to the regression loss;
4.3) According to the gradient of the loss function computed in 4.2), continuously update the weights in the direction that reduces the loss function using a stochastic gradient descent algorithm, and propagate the output-layer error backwards with the back-propagation algorithm to update every layer parameter of the network model Ω;
4.4) Execute steps 4.1)-4.3) in a loop until the loss function converges, obtaining the trained network model Ω'.
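An illustrative training-loop sketch for Step 4, assuming PyTorch; the learning rate, momentum, epoch budget and convergence test are placeholders, not values specified in the patent.

```python
# Illustrative training loop: one picture per iteration, SGD weight updates, back-propagation,
# stopping when the total loss J_s has converged (convergence test is an assumption).
import torch

def train(model, loss_fn, train_loader, lr=1e-3, max_epochs=50, tol=1e-4):
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    prev_loss = float("inf")
    for epoch in range(max_epochs):
        epoch_loss = 0.0
        for image, labels in train_loader:        # one picture and its labels at a time
            optimizer.zero_grad()
            loss = loss_fn(model(image), labels)  # value of the total loss J_s
            loss.backward()                       # back-propagate the output-layer error
            optimizer.step()                      # update weights along the descent direction
            epoch_loss += loss.item()
        if abs(prev_loss - epoch_loss) < tol:     # stop once the loss has converged
            break
        prev_loss = epoch_loss
    return model
```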
Step 5: input the ship test data Φ_c into the finally trained network model Ω' to obtain the ship detection result.
The effects of the present invention can be further illustrated by the following experimental data:
first, experimental conditions
1) Experimental data
The experiment uses the SSDD data set organized and released by the Naval Aviation University of the Chinese People's Liberation Army. The data contain multi-scale ship targets under various imaging conditions, such as different resolutions, sea states and sensor types; the diversity of the data set samples gives the trained detector better robustness. The imaging conditions of the data set are shown in Table 1.
Table 1 ship data imaging conditions
[Table 1 is given as images in the original and is not reproduced in this text extraction]
In Table 1, RADARSAT-2 is a Canadian satellite, TerraSAR-X is a German satellite, and SENTINEL-1 is a European Union satellite; HH is the horizontal-transmit, horizontal-receive polarization mode of the satellite, VV is vertical-transmit, vertical-receive, HV is horizontal-transmit, vertical-receive, and VH is vertical-transmit, horizontal-receive.
2) Criteria for evaluation
The experiment is repeated five times, and the mean of the average precision and the mean of the detection rate over the five runs are used to evaluate the experimental results.
Second, the contents of the experiment
The experimental data were processed with the method of the present invention and two existing methods, and the performance comparison results are shown in Table 2.
TABLE 2 comparison of Performance parameters of the inventive method with those of the prior art
Comparison method    Average precision    Detection rate
SmoothL1 93.59% 94.38%
BalanceL1 93.83% 93.99%
The invention 94.88% 95.68%
In Table 2, SmoothL1 denotes the method that detects the ship data with the Faster-RCNN network using the smooth_{L1} regression loss function;
BalanceL1 denotes the method that detects the ship data with the Faster-RCNN network using the Balance_{L1} regression loss function proposed in the article "Libra R-CNN: Towards Balanced Learning for Object Detection".
as can be seen from table 2, compared with the existing method, the method of the present invention achieves a better detection effect, because the loss function designed by the method of the present invention can better solve the problem of imbalance of difficult and easy samples, and the network can more accurately learn the characteristics of various samples, the present invention obtains a better detection result than the existing method.
While the foregoing is directed to embodiments of the present invention, other and further embodiments of the invention may be devised without departing from the basic scope thereof, and the scope thereof is determined by the claims that follow.

Claims (7)

1. A SAR ship target detection method based on balance sample regression loss is characterized by comprising the following steps:
(1) Acquire the SSDD ship data set and divide it into training data Φ_x and test data Φ_c in a ratio of 8:2;
(2) Selecting a training network model omega formed by sequentially connecting a shared basic module VGG, a region selection module RPN and a region refinement module Fast-RCNN;
(3) Constructing an improved loss function;
(3a) The smooth_{L1} function in the regression loss of the RPN module of the network is improved into a balanced regression loss function of the following form:
[the original gives the smooth_{L1} function, the improved function and its two defining equations as equation images, which are not reproduced in this text extraction]
where j is the j-th training sample of the RPN module, t_n is the position information predicted by the RPN module for the training sample, t_n^* with n ∈ {x, y, w, h} is the position information of the target frame corresponding to the training sample, a is a hyper-parameter, and a = 2;
(3b) The smooth_{L1} function in the regression loss of the Fast-RCNN module of the network is improved into a balanced regression loss function of the following form:
[the original gives the smooth_{L1} function, the improved function and its two defining equations as equation images, which are not reproduced in this text extraction]
where p is the p-th training sample of the Fast-RCNN module, e_m is the position information predicted by the Fast-RCNN module for the training sample, and e_m^* with m ∈ {x, y, w, h} is the position information of the target frame corresponding to the training sample;
(3c) According to the improved regression loss functions of (3a) and (3b), the improved total loss function J_s of the training network is obtained:
[equation image in the original: J_s combines the RPN classification loss, the improved RPN regression loss weighted by λ_rpn, the Fast-RCNN classification loss, and the improved Fast-RCNN regression loss weighted by λ_fast]
where J_s1, λ_rpn and N_s1 are the classification loss function, loss balance constant and number of training samples of the RPN module; J_s3, λ_fast and N_s2 are the classification loss function, loss balance constant and number of training samples of the Fast-RCNN module; and the two remaining symbols (shown as images in the original) are the labels indicating whether the j-th sample of the RPN module and the p-th sample of the Fast-RCNN module are positive samples;
(4) Input the training data Φ_x into the training network model Ω and train the network model Ω with the total loss function J_s of the network until the loss function converges, obtaining the trained network model Ω';
(5) Input the ship test data Φ_c into the finally trained network model Ω' to obtain the ship detection result.
2. The method of claim 1, wherein the shared basic module VGG of the network model in (2) comprises 5 convolution blocks and 4 average pooling layers, connected in order: first convolution block V1 → first average pooling layer P1 → second convolution block V2 → second average pooling layer P2 → third convolution block V3 → third average pooling layer P3 → fourth convolution block V4 → fourth average pooling layer P4 → fifth convolution block V5, with layer parameters and relationships as follows:
the 4 pooling layers have the same structure and downsample the input; the window size of the downsampling kernel is 2 × 2, the sliding step is 2, the number of output feature maps equals that of the input, and the output feature maps serve as the input of the next convolution block;
the first convolution block V1 and the second convolution block V2 have the same structure and are each formed by cascading two identical sub-blocks, each sub-block consisting of a two-layer structure of a convolutional layer L_{i1} and a ReLU activation-function layer L_{i2}, where i denotes the i-th sub-block, i = 1, 2;
the third convolution block V3, the fourth convolution block V4 and the fifth convolution block V5 have the same structure and are each formed by cascading three identical sub-blocks, each sub-block consisting of a two-layer structure in which the first layer is a convolutional layer T_{j1} and the second layer is a ReLU activation-function layer T_{j2}, where j denotes the j-th sub-block, j = 1, 2, 3.
3. The method of claim 1, wherein the region selection module RPN of the network model in (2) consists, in order, of a shared convolutional layer C_1, an activation-function layer C_2, and two parallel branches: a classification branch C_3 and a regression convolutional layer C_4; the classification branch C_3 consists, in order, of a classification convolutional layer C_31 and a Softmax classifier layer C_32 and is used to obtain the classification probability vector p; the regression convolutional layer C_4 convolves the input to obtain 36 position-prediction feature maps b.
4. The method according to claim 1, wherein the region refinement module Fast-RCNN of the network model in (2) consists, in order, of a ROI pooling layer, a fully-connected layer F1, a fully-connected layer F2, and two parallel branches: a classification branch F3 and a regression fully-connected layer F4; the classification branch F3 consists, in order, of a fully-connected layer F31 and a Softmax classifier layer F32 and is used to obtain the classification probability vector T; the regression fully-connected layer F4 outputs a 4-dimensional regression column vector C representing the regression offsets e_x, e_y, e_w, e_h of the candidate frame.
5. The method of claim 1, wherein the classification loss function J_s1 of the RPN module in (3c) is expressed as follows:
[equation image in the original]
where N_s1 is the total number of training samples of the RPN module and is taken as the batch size of 256 when training with a batch gradient descent algorithm; the remaining two symbols (shown as images in the original) are the label of the j-th sample for the k-th class and the probability with which the RPN network predicts the j-th sample as the k-th class.
6. The method according to claim 1, wherein the classification loss function J_s3 of the Fast-RCNN module in (3c) is expressed as follows:
[equation image in the original]
where N_s2 is the total number of training samples of the Fast-RCNN module and is taken as the batch size of 128 when training with a batch gradient descent algorithm; the remaining two symbols (shown as images in the original) are the label of the p-th sample for the k-th class and the probability with which the Fast-RCNN module predicts the p-th sample as the k-th class.
7. The method of claim 1, wherein the training network model Ω in (4) is trained as follows:
4a) feed the training data into the network model Ω for training, training on one picture at a time, and compute the value of the network loss function J_s according to the labels of the input picture;
4b) compute the gradient of the network loss function; the improved loss function balances the gradients that hard and easy samples contribute to the regression;
4c) according to the gradient of the loss function computed in 4b), continuously update the weights in the direction that reduces the loss function using a stochastic gradient descent algorithm, and propagate the output-layer error backwards with the back-propagation algorithm to update every layer parameter of the network model Ω;
4d) loop through 4a)-4c) until the loss function J_s converges, obtaining the trained network model Ω'.
CN202011544100.2A 2020-12-24 2020-12-24 SAR ship target detection method based on regression loss of balance sample Active CN112668440B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011544100.2A CN112668440B (en) 2020-12-24 2020-12-24 SAR ship target detection method based on regression loss of balance sample

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011544100.2A CN112668440B (en) 2020-12-24 2020-12-24 SAR ship target detection method based on regression loss of balance sample

Publications (2)

Publication Number Publication Date
CN112668440A CN112668440A (en) 2021-04-16
CN112668440B true CN112668440B (en) 2023-02-10

Family

ID=75409594

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011544100.2A Active CN112668440B (en) 2020-12-24 2020-12-24 SAR ship target detection method based on regression loss of balance sample

Country Status (1)

Country Link
CN (1) CN112668440B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113536936B (en) * 2021-06-17 2022-10-11 中国人民解放军海军航空大学航空作战勤务学院 Ship target detection method and system
CN113469088B (en) * 2021-07-08 2023-05-12 西安电子科技大学 SAR image ship target detection method and system under passive interference scene
CN113537170A (en) * 2021-09-16 2021-10-22 北京理工大学深圳汽车研究院(电动车辆国家工程实验室深圳研究院) Intelligent traffic road condition monitoring method and computer readable storage medium
CN115049884B (en) * 2022-08-15 2022-10-25 菲特(天津)检测技术有限公司 Broad-sense few-sample target detection method and system based on fast RCNN
CN115346125B (en) * 2022-10-18 2023-03-24 南京金瀚途科技有限公司 Target detection method based on deep learning
CN115641510B (en) * 2022-11-18 2023-08-08 中国人民解放军战略支援部队航天工程大学士官学校 Remote sensing image ship detection and identification method

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110674714A (en) * 2019-09-13 2020-01-10 东南大学 Human face and human face key point joint detection method based on transfer learning
CN111027454A (en) * 2019-12-06 2020-04-17 西安电子科技大学 SAR (synthetic Aperture Radar) ship target classification method based on deep dense connection and metric learning
CN111460984A (en) * 2020-03-30 2020-07-28 华南理工大学 Global lane line detection method based on key point and gradient balance loss
CN111626200A (en) * 2020-05-26 2020-09-04 北京联合大学 Multi-scale target detection network and traffic identification detection method based on Libra R-CNN
CN111860264A (en) * 2020-07-10 2020-10-30 武汉理工大学 Multi-task instance level road scene understanding algorithm based on gradient equilibrium strategy
CN112016467A (en) * 2020-08-28 2020-12-01 展讯通信(上海)有限公司 Traffic sign recognition model training method, recognition method, system, device and medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3596449A4 (en) * 2017-03-14 2021-01-06 University of Manitoba Structure defect detection using machine learning algorithms

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110674714A (en) * 2019-09-13 2020-01-10 东南大学 Human face and human face key point joint detection method based on transfer learning
CN111027454A (en) * 2019-12-06 2020-04-17 西安电子科技大学 SAR (synthetic Aperture Radar) ship target classification method based on deep dense connection and metric learning
CN111460984A (en) * 2020-03-30 2020-07-28 华南理工大学 Global lane line detection method based on key point and gradient balance loss
CN111626200A (en) * 2020-05-26 2020-09-04 北京联合大学 Multi-scale target detection network and traffic identification detection method based on Libra R-CNN
CN111860264A (en) * 2020-07-10 2020-10-30 武汉理工大学 Multi-task instance level road scene understanding algorithm based on gradient equilibrium strategy
CN112016467A (en) * 2020-08-28 2020-12-01 展讯通信(上海)有限公司 Traffic sign recognition model training method, recognition method, system, device and medium

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Automatic Ship Detection Based on RetinaNet Using Multi-Resolution Gaofen-3 Imagery; Yuanyuan Wang et al.; Remote Sensing; 2019-03-31; full text *
Gradient Harmonized Single-Stage Detector; Buyu Li et al.; Computer Vision and Pattern Recognition; 2018-11-13; full text *
Ship detection in SAR images based on an improved faster R-CNN; Jianwei Li et al.; 2017 SAR in Big Data Era: Models, Methods and Applications; 2017-11-30; full text *
A campus vehicle detection method based on improved Faster RCNN; Li Hang et al.; Journal of Shenyang Normal University (Natural Science Edition); 2020-02-15 (No. 01); full text *

Also Published As

Publication number Publication date
CN112668440A (en) 2021-04-16

Similar Documents

Publication Publication Date Title
CN112668440B (en) SAR ship target detection method based on regression loss of balance sample
CN109086700B (en) Radar one-dimensional range profile target identification method based on deep convolutional neural network
CN110472627B (en) End-to-end SAR image recognition method, device and storage medium
CN111027454B (en) SAR ship target classification method based on deep dense connection and metric learning
CN110533631A (en) SAR image change detection based on the twin network of pyramid pondization
CN112395987B (en) SAR image target detection method based on unsupervised domain adaptive CNN
CN110084159A (en) Hyperspectral image classification method based on the multistage empty spectrum information CNN of joint
CN107256396A (en) Ship target ISAR characteristics of image learning methods based on convolutional neural networks
CN107798345B (en) High-spectrum disguised target detection method based on block diagonal and low-rank representation
CN112699717A (en) SAR image generation method and generation device based on GAN network
Bragilevsky et al. Deep learning for Amazon satellite image analysis
CN113486819A (en) Ship target detection method based on YOLOv4 algorithm
CN110263644A (en) Classifying Method in Remote Sensing Image, system, equipment and medium based on triplet's network
CN111832580A (en) SAR target identification method combining few-sample learning and target attribute features
CN110069987B (en) Single-stage ship detection algorithm and device based on improved VGG network
CN110414494A (en) SAR image classification method with ASPP deconvolution network
Gui et al. A scale transfer convolution network for small ship detection in SAR images
CN117710783A (en) SAR long tail target detection method based on double-drive equalization loss and recessive characteristic enhancement
CN117765418A (en) Unmanned aerial vehicle image matching method
CN113409381A (en) Dual-polarization channel fusion ship size estimation method based on CNN
Zhang et al. Ship detection and recognition in optical remote sensing images based on scale enhancement rotating cascade R-CNN networks
Li et al. SAR image object detection based on improved cross-entropy loss function with the attention of hard samples
CN114049551B (en) ResNet 18-based SAR raw data target identification method
CN115797663A (en) Space target material identification method under complex illumination condition
CN113361439B (en) SAR image ship target identification method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant