CN110766058A - Battlefield target detection method based on optimized RPN (Region Proposal Network) - Google Patents
Battlefield target detection method based on optimized RPN (Region Proposal Network)
- Publication number
- CN110766058A CN201910965047.4A
- Authority
- CN
- China
- Prior art keywords
- target
- network
- layer
- frame
- candidate
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- Computing Systems (AREA)
- Software Systems (AREA)
- Molecular Biology (AREA)
- Computational Linguistics (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Mathematical Physics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Multimedia (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Image Analysis (AREA)
Abstract
The invention relates to a battlefield target detection method based on an optimized RPN (Region Proposal Network), which comprises the following steps: 1. constructing a tank armor target data set and labeling the tank armor targets in the training data set and the test data set respectively; 2. initializing the VGG-16 network with a model pre-trained on the ImageNet data set; 3. generating shared feature maps; 4. obtaining target candidate regions of different sizes and proportions; 5. obtaining candidate regions through the RPN, calculating the errors between the candidate regions obtained on the two convolutional layer feature maps and the ground-truth box respectively, selecting the candidate boxes with the minimum error, and finally selecting the candidate regions of higher accuracy from them as the optimized target candidate regions; 6. completing the judgment of the target class and the regression correction of the target bounding box. The invention effectively improves the effectiveness of candidate-region extraction for small targets and occluded targets, and thereby improves the precision of battlefield target detection.
Description
Technical Field
The invention belongs to the technical field of battlefield target detection, and particularly relates to a battlefield target detection method based on an optimized RPN (Region Proposal Network).
Background
At present, the detection of tank armor targets on the battlefield still relies on manual discovery and aiming to calibrate the target, followed by target tracking to achieve accurate striking. Accurate detection of battlefield targets is a necessary premise for realizing precise attack on them. The complexity of the battlefield environment and the automatic detection of battlefield targets are the difficult problems in realizing an intelligent tank and armored-vehicle combat system. In battlefield target detection research, the methods of recent years mainly fall into: detection algorithms based on hand-crafted models; detection methods based on R-CNN; detection methods based on Fast R-CNN; and detection methods based on Faster R-CNN.
Hand-crafted model methods only use information such as the color histogram and texture features of the image; they lack deep feature abstraction, do not describe the target accurately, and therefore generate inaccurate candidate boxes and fail to reach ideal detection results. The R-CNN detection method uses the Selective Search (SS) method to extract candidate boxes, which is computationally expensive and time-consuming, and the extracted candidate boxes are all scaled to a uniform size, so the original image information is easily lost during feature extraction. The Fast R-CNN algorithm extracts candidate boxes in the same way as R-CNN, so proposal extraction is still time-consuming, network training is not end-to-end, and the quality of the candidate boxes cannot be improved by back-propagation. The Faster R-CNN algorithm uses the RPN to extract candidate boxes, which solves the time-consumption problem, but in battlefield target detection its accuracy on small targets and occluded targets is low: feature extraction is not comprehensive enough and affects the candidate boxes, causing missed or inaccurate detection of small and occluded targets. Because of differences in the field of view, targets of different scales and sizes appear, which greatly affects detection precision, and small targets in particular suffer from low accuracy. Factors such as target occlusion, dust and smoke also affect candidate-box extraction, resulting in low target detection accuracy.
Disclosure of Invention
The invention aims to provide a battlefield target detection method based on an optimized RPN (Region Proposal Network), which solves the problems that candidate-box extraction is inaccurate and that detection of small targets and occluded targets is inaccurate.
The technical scheme adopted by the invention is as follows:
a battlefield target detection method based on an optimized RPN network is implemented according to the following steps:
step 1: constructing a tank armor target data set conforming to the PASCAL VOC data set format, and labeling the tank armor targets in the training data set and the test data set respectively;
step 2: initializing the VGG-16 network with a model pre-trained on the ImageNet data set;
step 3: inputting the training data set of target images into the convolutional neural network, extracting target features on the conv3-3 and conv5-3 convolutional layers respectively, and generating shared feature maps;
step 4: sliding windows of different sizes over the shared feature maps generated by the two convolutional layers using the candidate-region extraction network to obtain target candidate regions of different sizes and proportions;
step 5: obtaining candidate regions through the RPN, calculating the errors between the candidate regions obtained on the two convolutional layer feature maps and the ground-truth box respectively, selecting the candidate boxes with the minimum error, and finally selecting candidate regions of higher accuracy from them as the optimized target candidate regions;
step 6: sending the optimized target candidate regions obtained in step 5 and the shared feature map of the corresponding convolutional layer into the detection network to complete the target class judgment and the regression correction of the target bounding box.
Further, step 1 is specifically implemented according to the following steps:
(11) firstly, the size of the target object is judged. Let the larger of the width and height of the target object be denoted as P_max pixels; the targets are divided into three classes according to their size in the field of view, with the classification criteria being:
(12) labeling the targets of different sizes: the tanks and armored vehicles in the training data set and the test data set are labeled respectively.
Further, step 3 is specifically implemented according to the following steps:
(31) the convolutional neural network is the backbone of the target detection network; it extracts the target features and generates the shared convolutional layers, where the input of the l-th convolutional layer is:
Z^l = W^l X^(l-1) + b^l   (2)
(32) the output of the l-th layer convolution is:
X^l = f(Z^l) = f(W^l X^(l-1) + b^l)   (3)
(33) the total error of the convolutional layer is:
continuously optimizing parameters W and b of the neural network by a gradient descent method;
(34) taking the gradients of equation (5) with respect to the parameters W and b respectively gives:
where ⊙ represents the inner product of two vectors and ∇ represents the derivation symbol; the values of the parameters W and b are then calculated;
(35) the parameters of the network are continuously adjusted so that the target features are extracted more accurately; the output of the convolutional layer, i.e. the target features, is obtained by applying the activation function of equation (3) to the convolution result; the target features are connected through a fully connected layer to form a shared feature map, and shared feature maps are obtained from the conv3-3 and conv5-3 convolutional layers respectively.
Further, step 4 is specifically implemented according to the following steps:
(41) setting different sliding windows on the shared feature maps of the conv3-3 and conv5-3 convolutional layers respectively: the RPN sets a sliding window of size 5 × 5 on the conv3-3 layer, and sliding windows of sizes 7 × 7 and 9 × 9 on the conv5-3 layer;
(42) placing anchor boxes of different scales and proportions on the sliding windows, so that W × H × k anchor boxes are obtained;
(43) generating region candidate boxes through the sliding windows, wherein the candidate regions generated by the conv3-3 and conv5-3 layers are denoted proposal region 1 and proposal region 2 respectively; the features in each sliding window are mapped to corresponding low-dimensional features, which are passed through a ReLU activation function to obtain a vector; the vector is fed into two convolutional layers, namely a candidate-region classification layer (cls) and a candidate-region position regression layer (reg); the cls layer outputs the probability that each candidate region is a target, with 2k outputs, and the reg layer outputs the position regression coordinates of the k boxes, with 4k outputs.
Further, step 5 is specifically implemented according to the following steps:
(51) respectively calculating the error values between all target candidate boxes and the ground-truth boxes for the conv3-3 and conv5-3 convolutional layers, i.e. the minimum value of the loss function, which is specifically implemented as follows:
(511) the bounding-box regression between the prediction box and the anchor box over the 4 coordinates:
the bounding-box regression between the anchor box and the ground-truth box over the 4 coordinates:
wherein: x, y, w and h represent the center coordinates and the width and height of a box; x, x_a and x* represent the x-coordinates of the prediction box, the anchor box and the ground-truth box respectively; y, y_a and y* represent the corresponding y-coordinates; w, w_a and w* represent the widths of the prediction box, the anchor box and the ground-truth box respectively; and h, h_a and h* represent their heights;
(512) the loss function is used to judge the error between the candidate box and the ground-truth box; the network parameters are continuously adjusted during training by gradient descent, and with the same loss function for the conv3-3 and conv5-3 layers, the minimum loss function is defined as:
where {p_i} and {t_i} are the outputs of the cls layer and the reg layer respectively; i represents the index of the anchor box; j represents the index of the convolutional layer; t_i represents the predicted offset; t_i* represents the offset between the anchor box and the ground-truth box;
(513) continuously training the RPN (Region Proposal Network) and optimizing the value of the loss function to its minimum to obtain target candidate regions of high accuracy; the minimum loss function L_3({p_i},{t_i}) of the conv3-3 layer and the corresponding minimum loss function L_5({p_i},{t_i}) of the conv5-3 layer are calculated according to equation (17);
(52) comparing the minimum values of the loss functions of the two layers and selecting the smaller one; the minimum of the two loss functions is denoted by L and expressed as:
L = min{ L_3({p_i},{t_i}), L_5({p_i},{t_i}) }   (21)
the candidate region corresponding to L, i.e. the minimum of the loss functions on the two convolutional layers, is the finally optimized candidate region, namely the candidate region with the highest accuracy.
Further, step 6 is specifically implemented according to the following steps:
(61) inputting the target candidate region after RPN network optimization and the shared feature map of the corresponding convolution layer into a detection network, and extracting region features at the ROI pooling layer;
(62) inputting the region features into the subsequent softmax layer, and performing class judgment and regression correction of the target boundary on each target candidate region; the detection network also has a target loss function, comprising a target classification loss function and a position boundary loss function, and the loss function of the detection network is:
where L′ represents the detection network loss function, and the target classification loss function L′_cls is:
where |S+| represents the number of positive samples and |S-| represents the number of negative samples; the bounding-box regression loss function is the same as the RPN boundary loss function; the loss function is continuously optimized so that the target classification judgment and the bounding-box regression are continuously corrected, giving the optimized target classification and boundary regression values;
(63) through training on a large number of data sets, the parameters of the network are continuously adjusted by gradient descent until the total loss of the network is minimized; the total loss function is denoted by L*, i.e. the total loss function of the network is:
L* = L + L′   (24)
the network is trained until L* reaches its minimum value;
(64) carrying out target detection on the test data set using the trained detection network.
Compared with the prior art, the invention has the advantages that:
the improved RPN is used for screening the candidate regions, the optimized candidate regions are selected, the generation of invalid candidate regions by the RPN is reduced, the effectiveness of extracting the candidate regions from small targets and targets with shielding influence is effectively improved, and the precision of battlefield target detection is further improved. Because the convolution conv3-3 and conv5-3 layers respectively generate the shared characteristic graph and are combined with the optimized RPN network for use, the invention has high detection precision on small targets and occlusion targets.
Description of the drawings:
FIG. 1 is an overall framework of the present invention;
fig. 2 is an RPN network structure of the present invention.
The specific implementation mode is as follows:
the present invention will be described in detail below with reference to the drawings and examples.
Referring to fig. 1, a battlefield target detection method based on an optimized RPN network is specifically implemented according to the following steps:
step 1: constructing a tank armor target data set conforming to the PASCAL VOC data set format, and respectively labeling the tank armor targets on the training data set and the test data set, wherein the method specifically comprises the following steps:
(11) first, the size of the target object is determined. Let the larger of the width and height of the target object be denoted as P_max pixels; the targets can then be divided into three classes according to their size in the field of view, with the classification criteria being:
(12) the targets of different sizes are labeled: the tanks and armored vehicles in the training data set and the test data set are labeled respectively;
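As an illustration of this size-based grouping, the following sketch classifies a labeled box by P_max = max(width, height). The patent's classification table (equation (1)) is not reproduced in the text above, so the pixel thresholds SMALL_MAX and MEDIUM_MAX below are assumed placeholders, not values from the invention.

```python
# Hypothetical thresholds: the patent defines three size classes via P_max,
# but the concrete pixel limits are not given in the text above.
SMALL_MAX = 32    # assumed upper bound (pixels) for "small" targets
MEDIUM_MAX = 96   # assumed upper bound (pixels) for "medium" targets

def classify_target(width: int, height: int) -> str:
    """Classify a labeled tank/armored-vehicle box by P_max = max(width, height)."""
    p_max = max(width, height)
    if p_max <= SMALL_MAX:
        return "small"
    if p_max <= MEDIUM_MAX:
        return "medium"
    return "large"

print(classify_target(28, 20), classify_target(150, 90))  # small large
```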
step 2: the VGG-16 network is initialized with a model pre-trained on the ImageNet data set;
step 3: inputting the training data set of target images into the convolutional neural network, extracting target features on the conv3-3 and conv5-3 convolutional layers respectively, and generating shared feature maps, which is specifically implemented as follows:
(31) the convolutional neural network is the backbone of the target detection network; it extracts the target features and generates the shared convolutional layers, where the input of the l-th convolutional layer is:
Z^l = W^l X^(l-1) + b^l   (2)
where Z, W and X are all matrices; l denotes the l-th convolutional layer; Z^l denotes the input of the l-th convolution; W^l denotes the weights from layer l-1 to layer l; X^(l-1) denotes the output of the (l-1)-th convolution; and b^l denotes the bias of the l-th layer;
(32) the output of the l-th layer convolution is:
X^l = f(Z^l) = f(W^l X^(l-1) + b^l)   (3)
where f(·) denotes the activation function;
(33) the total error of the convolutional layer is:
where ||x||_2 denotes the 2-norm of x; the parameters W and b of the neural network are continuously optimized by gradient descent, and substituting equation (3) into equation (4) gives:
(34) taking the gradients of equation (5) with respect to the parameters W and b respectively gives:
where ⊙ represents the inner product of two vectors and ∇ represents the derivation symbol; the values of the parameters W and b are solved as follows:
(341) let:
then substituting equation (8) into equations (6) and (7) respectively gives:
(342) to obtain the parameters W and b, only δ^l needs to be found; according to the chain rule of differentiation:
according to the formula (2):
differentiating equation (12) gives:
substituting equation (13) into equation (11) gives:
having found δ^l, the gradients of the parameters W and b are obtained by substituting it into equations (9) and (10).
(35) The parameters of the network are continuously adjusted so that the target features are extracted more accurately; the output of the convolutional layer, i.e. the target features, is obtained by applying the activation function of equation (3) to the convolution result; the target features are connected through a fully connected layer to form a shared feature map, and shared feature maps are obtained from the conv3-3 and conv5-3 convolutional layers respectively.
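A minimal sketch of steps 2 and 3, assuming a PyTorch/torchvision environment: an ImageNet-pretrained VGG-16 is loaded and the conv3-3 and conv5-3 feature maps that the two RPN branches share are read out. The slicing indices follow torchvision's VGG-16 `features` layout and are not part of the patent.

```python
import torch
import torchvision

# Load VGG-16 initialized with ImageNet-pretrained weights (step 2).
vgg = torchvision.models.vgg16(pretrained=True).features.eval()

# In torchvision's layout, index 15 is the ReLU after conv3_3 and
# index 29 is the ReLU after conv5_3.
conv_to_3_3 = vgg[:16]    # conv1_1 ... conv3_3 + ReLU
conv_to_5_3 = vgg[16:30]  # pool3 ... conv5_3 + ReLU

image = torch.randn(1, 3, 600, 800)   # placeholder input image tensor
with torch.no_grad():
    feat3 = conv_to_3_3(image)        # shared feature map from conv3-3
    feat5 = conv_to_5_3(feat3)        # shared feature map from conv5-3
print(feat3.shape, feat5.shape)       # (1, 256, 150, 200) and (1, 512, 37, 50)
```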
Referring to fig. 2, in the improved RPN structure, different sliding windows are set on the feature maps of the conv3-3 and conv5-3 convolutional layers to obtain candidate regions; the error between each layer's candidate regions and the ground-truth box is calculated, the smaller value is selected, and a candidate region of high accuracy is then chosen from it as the optimized candidate region. This is specifically implemented in steps 4 and 5 below:
step 4: sliding windows of different sizes over the shared feature maps generated by the two convolutional layers using the region proposal network (RPN) to obtain target candidate regions of different sizes and proportions, which is specifically implemented as follows:
(41) different sliding windows are set on the shared feature maps of the conv3-3 and conv5-3 convolutional layers: a 5 × 5 sliding window on the conv3-3 layer, and 7 × 7 and 9 × 9 sliding windows on the conv5-3 layer. The conv3-3 and conv5-3 layers are chosen to obtain the convolutional features of the target image because layer 3 mainly learns the texture features of the target while layer 5 learns the distinctive features and the overall features of the target object; setting a 5 × 5 sliding window on the lower convolutional layer facilitates feature extraction for small targets and occluded targets;
(42) anchor boxes of different scales and proportions are placed on the sliding windows. The invention designs anchor boxes of sizes 512², 256², 128² and 64², each with three aspect ratios, namely 1:2, 2:1 and 1:1, giving 12 anchor boxes at each pixel position. Denoting the number of anchor boxes per pixel by k, a convolutional feature map of size W × H yields W × H × k anchor boxes;
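The anchor layout of item (42) can be sketched as follows: four scales (512, 256, 128 and 64) times three aspect ratios (1:2, 2:1 and 1:1) give k = 12 anchors per feature-map position, so a W × H feature map yields W × H × k anchors. Centering the anchors on the feature-map stride is an assumption of this sketch, not something stated in the text.

```python
import numpy as np

SCALES = [512, 256, 128, 64]          # anchor side lengths from the patent
RATIOS = [(1, 2), (2, 1), (1, 1)]     # width:height ratios 1:2, 2:1, 1:1

def anchors_at(cx: float, cy: float) -> np.ndarray:
    """Return the 12 anchors (x1, y1, x2, y2) centered at (cx, cy)."""
    boxes = []
    for s in SCALES:
        for rw, rh in RATIOS:
            w = s * np.sqrt(rw / rh)  # keep the area near s*s while applying the ratio
            h = s * np.sqrt(rh / rw)
            boxes.append([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2])
    return np.asarray(boxes)

def all_anchors(feat_w: int, feat_h: int, stride: int) -> np.ndarray:
    """Anchors for every position of a W x H feature map (stride in image pixels)."""
    return np.concatenate([anchors_at((x + 0.5) * stride, (y + 0.5) * stride)
                           for y in range(feat_h) for x in range(feat_w)])

print(all_anchors(50, 37, 16).shape)  # (50 * 37 * 12, 4)
```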
(43) generating region candidate boxes through the sliding windows, wherein the candidate regions generated by the conv3-3 layer and the conv5-3 layer are denoted proposal region 1 and proposal region 2 respectively; the features in each sliding window are mapped to corresponding low-dimensional features, which are passed through a ReLU activation function to obtain a vector; the vector is fed into two convolutional layers, namely a candidate-region classification layer (cls) and a candidate-region position regression layer (reg); the cls layer outputs the probability that each candidate region is a target, with 2k outputs, and the reg layer outputs the position regression coordinates of the k boxes, with 4k outputs, yielding target candidate regions containing position regression coordinates and class judgments.
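Item (43) corresponds to a small convolutional head per shared feature map: an n × n sliding window followed by two 1 × 1 convolutions producing 2k classification outputs and 4k regression outputs per position. The PyTorch sketch below is illustrative; the intermediate channel width of 512 is an assumption, while the 2k/4k output sizes and the window sizes come from the text.

```python
import torch
import torch.nn as nn

class RPNHead(nn.Module):
    """Sliding-window RPN head with cls (2k) and reg (4k) outputs per position."""
    def __init__(self, in_channels: int = 512, k: int = 12, window: int = 5):
        super().__init__()
        self.sliding = nn.Conv2d(in_channels, 512, window, padding=window // 2)
        self.cls = nn.Conv2d(512, 2 * k, 1)   # objectness scores, 2k per position
        self.reg = nn.Conv2d(512, 4 * k, 1)   # box regression coordinates, 4k per position

    def forward(self, feat: torch.Tensor):
        x = torch.relu(self.sliding(feat))    # low-dimensional window feature
        return self.cls(x), self.reg(x)

# One head per branch: a 5x5 window on conv3-3 (256 channels in VGG-16),
# and 7x7 / 9x9 windows on conv5-3 (512 channels).
head3 = RPNHead(in_channels=256, window=5)
head5 = RPNHead(in_channels=512, window=7)
```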
step 5: obtaining candidate regions through the RPN, calculating the errors between the candidate regions obtained on the two convolutional layer feature maps and the ground-truth box respectively, selecting the candidate boxes with the minimum error, and finally selecting candidate regions of higher accuracy from them as the optimized target candidate regions, which is specifically implemented as follows:
(51) calculating the error values between all target candidate boxes and the ground-truth boxes for the conv3-3 and conv5-3 convolutional layers respectively, i.e. the minimum value of the loss function;
the step (51) is specifically implemented according to the following steps:
(511) the bounding-box regression between the prediction box and the anchor box over the 4 coordinates:
the bounding-box regression between the anchor box and the ground-truth box over the 4 coordinates:
wherein: x, y, w and h represent the center coordinates and the width and height of a box; x, x_a and x* represent the x-coordinates of the prediction box, the anchor box and the ground-truth box respectively; y, y_a and y* represent the corresponding y-coordinates; w, w_a and w* represent the widths of the prediction box, the anchor box and the ground-truth box respectively; and h, h_a and h* represent their heights;
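Equations (15) and (16) are not reproduced in the text above; the sketch below uses the standard Faster R-CNN offset parameterisation, which is consistent with the symbol definitions just given but is an assumption rather than a verbatim copy of the patent's formulas.

```python
import numpy as np

def box_to_offsets(box, anchor):
    """Offsets (tx, ty, tw, th) of a box relative to an anchor.

    box and anchor are (center_x, center_y, width, height); applying this to the
    prediction box gives t_i, and to the ground-truth box gives t_i*.
    """
    x, y, w, h = box
    xa, ya, wa, ha = anchor
    return np.array([(x - xa) / wa,
                     (y - ya) / ha,
                     np.log(w / wa),
                     np.log(h / ha)])

print(box_to_offsets((110, 100, 64, 32), (100, 100, 64, 64)))
```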
(512) the loss function is used to judge the error between the candidate box and the ground-truth box; the network parameters are continuously adjusted during training by gradient descent, and with the same loss function for the conv3-3 and conv5-3 layers, the minimum loss function is defined as:
where {p_i} and {t_i} are the outputs of the cls layer and the reg layer respectively; i represents the index of the anchor box; j represents the index of the convolutional layer; t_i represents the predicted offset; t_i* represents the offset between the anchor box and the ground-truth box; the classification loss function is:
the regression loss function is:
wherein the R function is defined as:
The two terms of equation (17) are normalized by N_cls for the classification loss term and N_reg for the regression loss term, and weighted by a balance parameter λ; by default, N_cls is set to 512 and N_reg is set to the number of anchor boxes;
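A sketch of the per-layer RPN loss of item (512): a classification term and a smooth-L1 regression term, normalized by N_cls and N_reg and balanced by λ, evaluated once per convolutional layer j so that L_3 and L_5 can be compared. The exact forms of equations (17)-(20) are not reproduced in the text above, so this follows the usual Faster R-CNN formulation and is an assumption.

```python
import torch
import torch.nn.functional as F

def rpn_loss(cls_scores, labels, reg_pred, reg_target, n_cls=512, lam=1.0):
    """RPN loss for one convolutional layer.

    cls_scores: (N, 2) objectness logits; labels: (N,) long tensor of 0/1;
    reg_pred / reg_target: (N, 4) offsets; regression uses positive anchors only.
    """
    cls_loss = F.cross_entropy(cls_scores, labels, reduction="sum") / n_cls
    pos = labels == 1
    n_reg = max(int(labels.numel()), 1)   # N_reg: number of anchor boxes
    reg_loss = F.smooth_l1_loss(reg_pred[pos], reg_target[pos],
                                reduction="sum") / n_reg
    return cls_loss + lam * reg_loss
```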
(513) continuously training the RPN (Region Proposal Network) and optimizing the value of the loss function to its minimum to obtain target candidate regions of high accuracy; the minimum loss function L_3({p_i},{t_i}) of the conv3-3 layer and the corresponding minimum loss function L_5({p_i},{t_i}) of the conv5-3 layer are calculated according to equation (17);
(52) comparing the minimum values of the loss functions of the two layers and selecting the smaller one, which corresponds to the finally optimized candidate region; the minimum of the two is denoted by L and expressed as:
L = min{ L_3({p_i},{t_i}), L_5({p_i},{t_i}) }   (21)
the candidate region corresponding to L, i.e. the minimum of the loss functions on the two convolutional layers, is the finally optimized candidate region, namely the candidate region with the highest accuracy.
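Equation (21) then amounts to keeping the proposals of whichever layer achieves the smaller RPN loss, e.g.:

```python
def select_optimized_proposals(loss3, proposals3, loss5, proposals5):
    """Return (L, proposals) from the layer with the smaller RPN loss, per equation (21)."""
    if loss3 <= loss5:
        return loss3, proposals3
    return loss5, proposals5
```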
step 6: sending the optimized target candidate regions obtained in step 5 and the shared feature map of the corresponding convolutional layer into the detection network, which consists of an ROI pooling layer and a target classification and regression layer; region features are extracted for the optimized candidate regions, the network parameters are continuously adjusted, and the test data set is input into the detection network to complete the target class judgment and the regression correction of the target bounding box, which is specifically implemented as follows:
(61) inputting the target candidate region after RPN network optimization and the shared feature map of the corresponding convolution layer into a detection network, and extracting region features at the ROI pooling layer;
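A minimal sketch of the ROI pooling in (61), using torchvision's `roi_pool` as a stand-in for the patent's ROI pooling layer; the feature-map shape, the example region and the stride-16 `spatial_scale` are illustrative assumptions.

```python
import torch
import torchvision

feat5 = torch.randn(1, 512, 37, 50)                    # shared feature map (assumed shape)
rois = torch.tensor([[0, 48.0, 64.0, 320.0, 240.0]])   # (batch_index, x1, y1, x2, y2) in image coords
region_feats = torchvision.ops.roi_pool(feat5, rois, output_size=(7, 7),
                                        spatial_scale=1.0 / 16)
print(region_feats.shape)  # (1, 512, 7, 7), fed to the classification/regression head
```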
(62) inputting the region features into the subsequent softmax layer, and performing class judgment and regression correction of the target boundary on each target candidate region. The detection network also has a target loss function, which likewise comprises a target classification loss function and a position boundary loss function. The loss function of the detection network is:
where L′ represents the detection network loss function, and the target classification loss function L′_cls is:
where |S+| represents the number of positive samples and |S-| represents the number of negative samples. The bounding-box regression loss function is the same as the RPN boundary loss function; the loss function is continuously optimized so that the target classification judgment and the bounding-box regression are continuously corrected, giving the optimized target classification and boundary regression values;
(63) through training on a large number of data sets, the parameters of the network are continuously adjusted by gradient descent until the total loss of the network is minimized; the total loss function is denoted by L*, i.e. the total loss function of the network is:
L* = L + L′   (24)
the network is trained until L* reaches its minimum value;
(64) carrying out target detection on the test data set using the trained detection network.
It will be apparent to those skilled in the art that various modifications and adaptations can be made without departing from the principles of the invention and these are intended to be within the scope of the invention.
Claims (6)
1. A battlefield target detection method based on an optimized RPN network is characterized by being implemented according to the following steps:
step 1: constructing a tank armor target data set conforming to the PASCAL VOC data set format, and labeling the tank armor targets in the training data set and the test data set respectively;
step 2: initializing the VGG-16 network with a model pre-trained on the ImageNet data set;
step 3: inputting the training data set of target images into the convolutional neural network, extracting target features on the conv3-3 and conv5-3 convolutional layers respectively, and generating shared feature maps;
step 4: sliding windows of different sizes over the shared feature maps generated by the two convolutional layers using the candidate-region extraction network to obtain target candidate regions of different sizes and proportions;
step 5: obtaining candidate regions through the RPN, calculating the errors between the candidate regions obtained on the two convolutional layer feature maps and the ground-truth box respectively, selecting the candidate boxes with the minimum error, and finally selecting candidate regions of higher accuracy from them as the optimized target candidate regions;
step 6: sending the optimized target candidate regions obtained in step 5 and the shared feature map of the corresponding convolutional layer into the detection network to complete the target class judgment and the regression correction of the target bounding box.
2. The optimized RPN network-based battlefield target detection method as claimed in claim 1, wherein step 1 is specifically implemented according to the following steps:
(11) firstly, the size of the target object is judged. Let the larger of the width and height of the target object be denoted as P_max pixels; the targets are divided into three classes according to their size in the field of view, with the classification criteria being:
(12) labeling the targets of different sizes: the tanks and armored vehicles in the training data set and the test data set are labeled respectively.
3. The method for detecting the battlefield target based on the optimized RPN network as claimed in claim 1 or 2, wherein step 3 is implemented according to the following steps:
(31) the convolutional neural network is the backbone of the target detection network; it extracts the target features and generates the shared convolutional layers, where the input of the l-th convolutional layer is:
Z^l = W^l X^(l-1) + b^l   (2)
(32) the output of the l-th layer convolution is:
X^l = f(Z^l) = f(W^l X^(l-1) + b^l)   (3)
(33) the total error of the convolutional layer is:
continuously optimizing parameters W and b of the neural network by a gradient descent method;
(34) taking the gradients of equation (5) with respect to the parameters W and b respectively gives:
where ⊙ represents the inner product of two vectors and ∇ represents the derivation symbol; the values of the parameters W and b are then calculated;
(35) the parameters of the network are continuously adjusted so that the target features are extracted more accurately; the output of the convolutional layer, i.e. the target features, is obtained by applying the activation function of equation (3) to the convolution result; the target features are connected through a fully connected layer to form a shared feature map, and shared feature maps are obtained from the conv3-3 and conv5-3 convolutional layers respectively.
4. The optimized RPN network-based battlefield target detection method as claimed in claim 3, wherein step 4 is specifically implemented according to the following steps:
(41) setting different sliding windows on the shared feature maps of the conv3-3 and conv5-3 convolutional layers respectively: the RPN sets a sliding window of size 5 × 5 on the conv3-3 layer, and sliding windows of sizes 7 × 7 and 9 × 9 on the conv5-3 layer;
(42) placing anchor boxes of different scales and proportions on the sliding windows, so that W × H × k anchor boxes are obtained;
(43) generating region candidate boxes through the sliding windows, wherein the candidate regions generated by the conv3-3 and conv5-3 layers are denoted proposal region 1 and proposal region 2 respectively; the features in each sliding window are mapped to corresponding low-dimensional features, which are passed through a ReLU activation function to obtain a vector; the vector is fed into two convolutional layers, namely a candidate-region classification layer (cls) and a candidate-region position regression layer (reg); the cls layer outputs the probability that each candidate region is a target, with 2k outputs, and the reg layer outputs the position regression coordinates of the k boxes, with 4k outputs.
5. The optimized RPN network-based battlefield target detection method as claimed in claim 4, wherein step 5 is specifically implemented according to the following steps:
(51) respectively calculating the error values between all target candidate boxes and the ground-truth boxes for the conv3-3 and conv5-3 convolutional layers, i.e. the minimum value of the loss function, which is specifically implemented as follows:
(511) the bounding-box regression between the prediction box and the anchor box over the 4 coordinates:
the bounding-box regression between the anchor box and the ground-truth box over the 4 coordinates:
wherein: x, y, w and h represent the center coordinates and the width and height of a box; x, x_a and x* represent the x-coordinates of the prediction box, the anchor box and the ground-truth box respectively; y, y_a and y* represent the corresponding y-coordinates; w, w_a and w* represent the widths of the prediction box, the anchor box and the ground-truth box respectively; and h, h_a and h* represent their heights;
(512) the loss function is used to judge the error between the candidate box and the ground-truth box; the network parameters are continuously adjusted during training by gradient descent, and with the same loss function for the conv3-3 and conv5-3 layers, the minimum loss function is defined as:
where {p_i} and {t_i} are the outputs of the cls layer and the reg layer respectively; i represents the index of the anchor box; j represents the index of the convolutional layer; t_i represents the predicted offset; t_i* represents the offset between the anchor box and the ground-truth box;
(513) continuously training the RPN (Region Proposal Network) and optimizing the value of the loss function to its minimum to obtain target candidate regions of high accuracy; the minimum loss function L_3({p_i},{t_i}) of the conv3-3 layer and the corresponding minimum loss function L_5({p_i},{t_i}) of the conv5-3 layer are calculated according to equation (17);
(52) comparing the minimum values of the loss functions of the two layers and selecting the smaller one; the minimum of the two loss functions is denoted by L and expressed as:
L = min{ L_3({p_i},{t_i}), L_5({p_i},{t_i}) }   (21)
the candidate region corresponding to L, i.e. the minimum of the loss functions on the two convolutional layers, is the finally optimized candidate region, namely the candidate region with the highest accuracy.
6. The optimized RPN network-based battlefield target detection method as claimed in claim 5, wherein step 6 is implemented according to the following steps:
(61) inputting the target candidate region after RPN network optimization and the shared feature map of the corresponding convolution layer into a detection network, and extracting region features at the ROI pooling layer;
(62) inputting the region features into the subsequent softmax layer, and performing class judgment and regression correction of the target boundary on each target candidate region; the detection network also has a target loss function, comprising a target classification loss function and a position boundary loss function, and the loss function of the detection network is:
where L′ represents the detection network loss function, and the target classification loss function L′_cls is:
where |S+| represents the number of positive samples and |S-| represents the number of negative samples; the bounding-box regression loss function is the same as the RPN boundary loss function; the loss function is continuously optimized so that the target classification judgment and the bounding-box regression are continuously corrected, giving the optimized target classification and boundary regression values;
(63) through training on a large number of data sets, the parameters of the network are continuously adjusted by gradient descent until the total loss of the network is minimized; the total loss function is denoted by L*, i.e. the total loss function of the network is:
L* = L + L′   (24)
the network is trained until L* reaches its minimum value;
(64) carrying out target detection on the test data set using the trained detection network.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910965047.4A CN110766058B (en) | 2019-10-11 | 2019-10-11 | Battlefield target detection method based on optimized RPN (Region Proposal Network) |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910965047.4A CN110766058B (en) | 2019-10-11 | 2019-10-11 | Battlefield target detection method based on optimized RPN (Region Proposal Network) |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110766058A true CN110766058A (en) | 2020-02-07 |
CN110766058B CN110766058B (en) | 2023-04-18 |
Family
ID=69331874
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910965047.4A Active CN110766058B (en) | 2019-10-11 | 2019-10-11 | Battlefield target detection method based on optimized RPN (resilient packet network) |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110766058B (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111898560A (en) * | 2020-08-03 | 2020-11-06 | 华南理工大学 | Classification regression feature decoupling method in target detection |
CN111985367A (en) * | 2020-08-07 | 2020-11-24 | 湖南大学 | Pedestrian re-recognition feature extraction method based on multi-scale feature fusion |
CN112287977A (en) * | 2020-10-06 | 2021-01-29 | 武汉大学 | Target detection method based on key point distance of bounding box |
CN112347895A (en) * | 2020-11-02 | 2021-02-09 | 北京观微科技有限公司 | Ship remote sensing target detection method based on boundary optimization neural network |
CN112419310A (en) * | 2020-12-08 | 2021-02-26 | 中国电子科技集团公司第二十研究所 | Target detection method based on intersection and fusion frame optimization |
CN112417981A (en) * | 2020-10-28 | 2021-02-26 | 大连交通大学 | Complex battlefield environment target efficient identification method based on improved Faster R-CNN |
CN112927229A (en) * | 2021-04-21 | 2021-06-08 | 中国人民解放军陆军装甲兵学院 | Tank armored vehicle and three-dimensional detection method for elastic hole damage thereof |
CN113780270A (en) * | 2021-03-23 | 2021-12-10 | 京东鲲鹏(江苏)科技有限公司 | Target detection method and device |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107368845A (en) * | 2017-06-15 | 2017-11-21 | 华南理工大学 | A kind of Faster R CNN object detection methods based on optimization candidate region |
CN107451602A (en) * | 2017-07-06 | 2017-12-08 | 浙江工业大学 | A kind of fruits and vegetables detection method based on deep learning |
WO2019149071A1 (en) * | 2018-01-30 | 2019-08-08 | 华为技术有限公司 | Target detection method, device, and system |
-
2019
- 2019-10-11 CN CN201910965047.4A patent/CN110766058B/en active Active
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107368845A (en) * | 2017-06-15 | 2017-11-21 | 华南理工大学 | A kind of Faster R CNN object detection methods based on optimization candidate region |
CN107451602A (en) * | 2017-07-06 | 2017-12-08 | 浙江工业大学 | A kind of fruits and vegetables detection method based on deep learning |
WO2019149071A1 (en) * | 2018-01-30 | 2019-08-08 | 华为技术有限公司 | Target detection method, device, and system |
Non-Patent Citations (1)
Title |
---|
WANG Quandong et al.: "Improved Faster R-CNN algorithm for multi-scale tank and armored vehicle target detection", Journal of Computer-Aided Design & Computer Graphics *
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111898560A (en) * | 2020-08-03 | 2020-11-06 | 华南理工大学 | Classification regression feature decoupling method in target detection |
CN111898560B (en) * | 2020-08-03 | 2023-08-01 | 华南理工大学 | Classification regression feature decoupling method in target detection |
CN111985367A (en) * | 2020-08-07 | 2020-11-24 | 湖南大学 | Pedestrian re-recognition feature extraction method based on multi-scale feature fusion |
CN112287977A (en) * | 2020-10-06 | 2021-01-29 | 武汉大学 | Target detection method based on key point distance of bounding box |
CN112287977B (en) * | 2020-10-06 | 2024-02-09 | 武汉大学 | Target detection method based on bounding box key point distance |
CN112417981A (en) * | 2020-10-28 | 2021-02-26 | 大连交通大学 | Complex battlefield environment target efficient identification method based on improved Faster R-CNN |
CN112417981B (en) * | 2020-10-28 | 2024-04-26 | 大连交通大学 | Efficient recognition method for complex battlefield environment targets based on improved Faster R-CNN |
CN112347895A (en) * | 2020-11-02 | 2021-02-09 | 北京观微科技有限公司 | Ship remote sensing target detection method based on boundary optimization neural network |
CN112419310A (en) * | 2020-12-08 | 2021-02-26 | 中国电子科技集团公司第二十研究所 | Target detection method based on intersection and fusion frame optimization |
CN112419310B (en) * | 2020-12-08 | 2023-07-07 | 中国电子科技集团公司第二十研究所 | Target detection method based on cross fusion frame optimization |
CN113780270A (en) * | 2021-03-23 | 2021-12-10 | 京东鲲鹏(江苏)科技有限公司 | Target detection method and device |
CN112927229A (en) * | 2021-04-21 | 2021-06-08 | 中国人民解放军陆军装甲兵学院 | Tank armored vehicle and three-dimensional detection method for elastic hole damage thereof |
Also Published As
Publication number | Publication date |
---|---|
CN110766058B (en) | 2023-04-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110766058B (en) | Battlefield target detection method based on optimized RPN (Region Proposal Network) | |
CN113065558B (en) | Lightweight small target detection method combined with attention mechanism | |
CN107563433B (en) | Infrared small target detection method based on convolutional neural network | |
CN111310862A (en) | Deep neural network license plate positioning method based on image enhancement in complex environment | |
CN111563893B (en) | Grading ring defect detection method, device, medium and equipment based on aerial image | |
CN109086799A (en) | A kind of crop leaf disease recognition method based on improvement convolutional neural networks model AlexNet | |
CN110222215B (en) | Crop pest detection method based on F-SSD-IV3 | |
CN108564085B (en) | Method for automatically reading of pointer type instrument | |
CN110569796A (en) | Method for dynamically detecting lane line and fitting lane boundary | |
CN108428220A (en) | Satellite sequence remote sensing image sea island reef region automatic geometric correction method | |
CN111242026B (en) | Remote sensing image target detection method based on spatial hierarchy perception module and metric learning | |
CN109993052B (en) | Scale-adaptive target tracking method and system under complex scene | |
CN112396619B (en) | Small particle segmentation method based on semantic segmentation and internally complex composition | |
CN107909053B (en) | Face detection method based on hierarchical learning cascade convolution neural network | |
CN110245587B (en) | Optical remote sensing image target detection method based on Bayesian transfer learning | |
CN109543585A (en) | Underwater optics object detection and recognition method based on convolutional neural networks | |
CN111144234A (en) | Video SAR target detection method based on deep learning | |
CN112258490A (en) | Low-emissivity coating intelligent damage detection method based on optical and infrared image fusion | |
CN112417981A (en) | Complex battlefield environment target efficient identification method based on improved Faster R-CNN | |
CN117152503A (en) | Remote sensing image cross-domain small sample classification method based on false tag uncertainty perception | |
CN109558803B (en) | SAR target identification method based on convolutional neural network and NP criterion | |
CN113627240B (en) | Unmanned aerial vehicle tree species identification method based on improved SSD learning model | |
CN114549909A (en) | Pseudo label remote sensing image scene classification method based on self-adaptive threshold | |
CN108615240B (en) | Non-parametric Bayesian over-segmentation method combining neighborhood information and distance weight | |
US20230386023A1 (en) | Method for detecting medical images, electronic device, and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | GR01 | Patent grant | |
 | TR01 | Transfer of patent right | Effective date of registration: 20230823; Address after: Room 8448, 2nd Floor, Building 4, Free Trade Industrial Park, No. 2168 Zhenghe Fourth Road, Fengdong New City, Xixian New District, Xi'an City, Shaanxi Province, 710086; Patentee after: Xi'an Keduoduo Information Technology Co.,Ltd.; Address before: 710032 No. 2 Xuefu Middle Road, Weiyang District, Xi'an City, Shaanxi Province; Patentee before: XI'AN TECHNOLOGICAL University |