CN108537286B - Complex target accurate identification method based on key area detection - Google Patents
- Publication number
- CN108537286B (application CN201810345899.9A)
- Authority
- CN
- China
- Prior art keywords
- network
- complex target
- key area
- area
- sub
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/254—Fusion techniques of classification results, e.g. of results related to same input data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
Abstract
The invention relates to a complex target accurate identification method based on key area detection, which comprises the following steps: the whole neural network is fusion-trained with a cross-training method; target features are extracted with a convolutional neural network; the key regions of the complex target are detected by a detection sub-network with anchor boxes as references; the key regions are pooled into fixed-size feature maps by region standard pooling; the key regions are classified by a classification sub-network; and the classification results of all key regions are fused, thereby identifying the target accurately. The whole network comprises a key area detection sub-network and a key area classification sub-network: the detection sub-network detects the discriminative key areas of the complex target, the classification sub-network classifies those areas, and the classification results of the individual areas are fused to identify the whole target. The two sub-networks share the features extracted by a VGG convolutional neural network, so the complex target is identified both quickly and accurately.
Description
Technical Field
The invention relates to an image processing technology, in particular to a complex target accurate identification method based on key area detection.
Background
The classification and identification of complex targets is an important and fundamental task in computer vision. Different kinds of complex targets are often identical or similar in most of their parts, with the differences concentrated in a few local key areas, so images of complex targets contain a large amount of interference and redundant information. Existing classification and identification methods for complex targets suffer from low accuracy because they cannot remove this interference and redundancy. To achieve accurate classification and identification of complex targets, research on an accurate complex target identification method based on key region detection is therefore of great significance.
Disclosure of Invention
In view of this, the present invention provides a high-accuracy complex target identification method based on key region detection, which greatly improves detection accuracy while keeping identification fast.
In order to achieve the purpose, the technical scheme provided by the invention is as follows: a complex target accurate identification method based on key area detection is realized by the following steps:
step 1, reading a complex target picture, a coordinate label of a key area of a complex target and a complex target classification label in a training sample in a database, and performing fusion training on a complex target accurate identification network by using a cross training method.
step 2, taking the complex target picture to be recognized as the input of the complex target accurate recognition network trained in step 1, and extracting features through a VGG convolutional neural network to obtain a feature map of the picture to be recognized;
step 3, inputting the feature map obtained in step 2 into the key area detection sub-network, sliding a 3 × 3 window over the feature map, detecting key areas of the complex target picture with the anchor boxes as references, and outputting prediction boxes of the key areas together with the probabilities P_is, P_not of being or not being a key area;
step 4, filtering heavily overlapping detections by non-maximum suppression: when the ratio of the intersection area to the union area of different prediction boxes exceeds a specified threshold IOU_threshold, keeping only the prediction box with the largest key-area probability P_is and filtering out the others;
step 5, setting a key-area probability threshold P_threshold, and mapping every region whose key-area probability P_is exceeds P_threshold onto the feature map extracted by the VGG network;
step 6, performing region standard pooling on the regions mapped onto the feature map in step 5, pooling detected regions of different sizes into feature maps of fixed size;
step 7, taking each fixed-size feature map obtained in step 6 as the input of the classification sub-network, classifying it with the classification sub-network, and normalizing the classification result with a softmax function to obtain the class probabilities of the key region;
step 8, for the key regions of the same complex target in the same picture, averaging the class probabilities obtained in step 7 and fusing them to obtain an accurate identification of the complex target type.
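The fusion of step 8 is just an average of the per-region softmax outputs followed by an argmax. A minimal NumPy sketch, with the function name and array layout chosen for illustration (they are not from the patent):

```python
import numpy as np

def fuse_region_predictions(region_probs):
    """Average the softmax class probabilities of all key regions
    belonging to one target (step 8) and pick the most likely class.

    region_probs: array of shape (R, C) -- one probability
    distribution over C classes for each of R key regions.
    """
    mean_probs = np.asarray(region_probs, dtype=float).mean(axis=0)
    return int(np.argmax(mean_probs)), mean_probs
```

For instance, three regions voting [0.6, 0.4], [0.2, 0.8] and [0.3, 0.7] fuse to class 1 even though one region prefers class 0; averaging lets confident regions outweigh ambiguous ones.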
In step 1, the whole network cross training process is as follows:
step 11, using the weights of a VGG network pre-trained for classification on ImageNet database pictures as the initial weights, and fine-tuning on this basis;
step 12, reading the complex target pictures and the coordinate labels of the key areas corresponding to them, and training the key area detection sub-network, wherein the training loss is L = L_P + L_reg, L_P being the cross entropy between the probabilities P_is, P_not output by the detection sub-network and the label truth values, and L_reg being the sum of squared differences between the coordinate offsets output by the detection sub-network and the coordinate offsets of the actual key areas in the label;
step 13, reading the complex target pictures and the classification labels corresponding to them, and training the classification sub-network, wherein the training loss is the cross entropy between the network's classification output and the actual label;
step 14, repeating steps 12 and 13 several times, cross-training the key area detection sub-network and the classification sub-network until the network is stable.
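The step 12 loss L = L_P + L_reg can be written out directly: cross entropy on the (P_is, P_not) pair plus a sum of squared offset errors. The sketch below illustrates the formula only; it is not the patent's training code, and the function name is my own:

```python
import numpy as np

def detection_loss(p_pred, p_label, t_pred, t_label):
    """L = L_P + L_reg from step 12.

    p_pred: softmax pair (P_is, P_not); p_label: one-hot truth.
    t_pred, t_label: offsets (d_x, d_y, d_l, d_w) of the predicted
    and labeled key region relative to the same anchor box.
    """
    eps = 1e-12  # guard against log(0)
    l_p = -np.sum(np.asarray(p_label) * np.log(np.asarray(p_pred) + eps))
    l_reg = np.sum((np.asarray(t_pred) - np.asarray(t_label)) ** 2)
    return float(l_p + l_reg)
```

A perfect prediction drives both terms to zero, and each unit of offset error adds its square to the total, matching the sum-of-squares definition of L_reg.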
In step 3, the method for detecting the key area is as follows:
step 31, sliding a window of size 3 × 3 over the feature map obtained in step 2, obtaining a 512-dimensional vector at each position;
step 32, setting 9 anchor boxes as references at each sliding-window position, with aspect ratios of 1:2, 1:1 and 2:1 and areas of 128², 256² and 512² pixels, the center point of each anchor box being the center of the sliding window;
step 33, passing the 512-dimensional vector obtained at each sliding-window position through a fully connected network to output 9 vectors of 6 dimensions each, every vector containing the offsets d_x, d_y, d_l, d_w of the detected region's center coordinates, length and width relative to the reference anchor box, together with the probabilities P_is, P_not of being or not being a key area, where: d_x = (x − x_a)/l_a, d_y = (y − y_a)/w_a, d_l = log(l/l_a), d_w = log(w/w_a); x, y, l, w denote the center coordinates, length and width of the detected region, x_a, y_a, l_a, w_a denote the center coordinates, length and width of the reference anchor box, and P_is, P_not are normalized with a softmax function;
step 34, from the offsets d_x, d_y, d_l, d_w regressed by the network and the center coordinates, length and width x_a, y_a, l_a, w_a of the anchor box, computing the actual center coordinates, length and width x, y, l, w of the detected region.
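Step 34 simply inverts the parameterization of step 33. A small helper (its name and tuple layout are illustrative) that recovers the box from the regressed offsets:

```python
import math

def decode_offsets(d, anchor):
    """Invert step 33: given offsets (d_x, d_y, d_l, d_w) and an anchor
    (x_a, y_a, l_a, w_a), return the detected region (x, y, l, w).

    Uses the definitions d_x = (x - x_a)/l_a, d_y = (y - y_a)/w_a,
    d_l = log(l/l_a), d_w = log(w/w_a) from step 33.
    """
    d_x, d_y, d_l, d_w = d
    x_a, y_a, l_a, w_a = anchor
    return (d_x * l_a + x_a,          # x = d_x * l_a + x_a
            d_y * w_a + y_a,          # y = d_y * w_a + y_a
            math.exp(d_l) * l_a,      # l = exp(d_l) * l_a
            math.exp(d_w) * w_a)      # w = exp(d_w) * w_a
```

Zero offsets reproduce the anchor itself, and the logarithmic parameterization of d_l, d_w guarantees the recovered length and width stay positive.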
In step 6, the method for standard pooling of the regions is as follows:
step 61, denoting the size of the region to be pooled as m × n and dividing it into a 7 × 7 grid of cells each of size roughly m/7 × n/7, rounding m/7 or n/7 to the nearest integer when it is not an integer;
step 62, within each cell divided in step 61, max-pooling the features down to 1 × 1, so that feature regions of different sizes are pooled into fixed-size 7 × 7 feature maps.
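A single-channel sketch of the 7 × 7 region standard pooling of steps 61-62. This is my own illustration, assuming the region is at least 7 × 7; the patent applies the same pooling to every channel of the feature map:

```python
import numpy as np

def region_standard_pooling(region, out_size=7):
    """Max-pool an m x n feature region into an out_size x out_size map
    (steps 61-62). Cell boundaries come from rounding the ideal
    m/out_size and n/out_size splits to the nearest integer, so the
    whole region is covered exactly once.
    """
    m, n = region.shape
    row_edges = np.linspace(0, m, out_size + 1).round().astype(int)
    col_edges = np.linspace(0, n, out_size + 1).round().astype(int)
    out = np.empty((out_size, out_size), dtype=region.dtype)
    for i in range(out_size):
        for j in range(out_size):
            cell = region[row_edges[i]:row_edges[i + 1],
                          col_edges[j]:col_edges[j + 1]]
            out[i, j] = cell.max()  # pool each cell to 1 x 1
    return out
```

Whatever the input size, the classification sub-network downstream always sees a 7 × 7 map, which is the point of the operation.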
In summary, the method for accurately identifying a complex target based on key area detection according to the present invention comprises the following steps: the whole neural network is fusion-trained with a cross-training method; target features are extracted with a convolutional neural network; the key regions of the complex target are detected by a detection sub-network with anchor boxes as references; the key regions are pooled into fixed-size feature maps by region standard pooling; the key regions are classified by a classification sub-network; and the classification results of all key regions are fused, thereby identifying the target accurately. The whole network comprises a key area detection sub-network and a key area classification sub-network: the detection sub-network detects the discriminative key areas of the complex target, the classification sub-network classifies those areas, and the classification results of the individual areas are fused to identify the whole target. The two sub-networks share the features extracted by a VGG convolutional neural network, so the complex target is identified both quickly and accurately.
Compared with the prior art, the invention has the advantages that:
(1) precision: many different complex objects tend to be similar in most places, while their differences tend to be in locally critical areas. The traditional target identification method takes the whole picture as the input of a classification network, and the whole picture contains a large amount of redundant information and interference information, which limits the accuracy of target identification. The method uses the detection sub-network to detect the key area firstly, then uses the classification sub-network to identify the key area, and fuses the identification results of all the key areas to achieve the effect of accurate target identification.
(2) Rapidity: the invention adopts a deep neural network to extract the characteristics of an original image, and a detection sub-network and a classification sub-network share the characteristics extracted by the same neural network. In the training process, the whole network is trained by adopting a cross training method. In the testing process, the detection sub-network and the classification sub-network share the features extracted by the same neural network, so that the parameter quantity and the calculated quantity of the network are greatly reduced, and the rapid target identification effect can be achieved.
Drawings
Fig. 1 is a schematic flow chart of the implementation of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings and specific embodiments.
The invention relates to a complex target accurate identification method based on key area detection, which comprises the following steps: the whole neural network is fusion-trained with a cross-training method; target features are extracted with a convolutional neural network; the key regions of the complex target are detected by a detection sub-network with anchor boxes as references; the key regions are pooled into fixed-size feature maps by region standard pooling; the key regions are classified by a classification sub-network; and the classification results of all key regions are fused, thereby identifying the target accurately. The whole network comprises a key area detection sub-network and a key area classification sub-network: the detection sub-network detects the discriminative key areas of the complex target, the classification sub-network classifies those areas, and the classification results of the individual areas are fused to identify the whole target. The two sub-networks share the features extracted by a VGG convolutional neural network, so the complex target is identified both quickly and accurately.
As shown in fig. 1, the present invention specifically implements the following steps:
step 1, reading a complex target picture, a coordinate label of a key area corresponding to the complex target picture and a classification label corresponding to the complex target picture in a training sample in a database, and performing fusion training on a complex target accurate identification network by using a cross training method;
step 2, taking the complex target picture to be recognized as the input of the complex target accurate recognition network trained in the step 1, and extracting features through a VGG convolutional neural network to obtain a feature map of the complex target picture to be recognized;
step 3, inputting the feature map obtained in step 2 into the key area detection sub-network, sliding a 3 × 3 window over the feature map, detecting key areas of the complex target picture with the anchor boxes as references, and outputting prediction boxes of the key areas together with the probabilities P_is, P_not of being or not being a key area;
step 4, filtering heavily overlapping detections by non-maximum suppression: when the ratio of the intersection area to the union area of different prediction boxes exceeds a specified threshold IOU_threshold, keeping only the prediction box with the largest key-area probability P_is and filtering out the others;
step 5, setting a key-area probability threshold P_threshold, and mapping every region whose key-area probability P_is exceeds P_threshold onto the feature map extracted by the VGG network;
step 6, performing region standard pooling on the regions mapped onto the feature map in step 5, pooling detected regions of different sizes into feature maps of fixed size;
step 7, taking each fixed-size feature map obtained in step 6 as the input of the classification sub-network, classifying it with the classification sub-network, and normalizing the classification result with a softmax function to obtain the class probabilities of the key region;
step 8, for the key regions of the same complex target in the same picture, averaging the class probabilities obtained in step 7 and fusing them to obtain an accurate identification of the complex target type.
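The non-maximum suppression of step 4 keeps, within each group of mutually overlapping prediction boxes, only the box with the highest P_is. A minimal NumPy sketch of the greedy procedure; the function name and the [x1, y1, x2, y2] box layout are my assumptions, not the patent's:

```python
import numpy as np

def non_max_suppression(boxes, p_is, iou_threshold):
    """Greedy NMS: repeatedly keep the highest-scoring box and drop
    every remaining box whose IoU with it exceeds iou_threshold.

    boxes: (N, 4) float array of [x1, y1, x2, y2]; p_is: (N,) scores.
    Returns indices of the kept boxes, highest score first.
    """
    order = np.argsort(p_is)[::-1]  # indices sorted by descending P_is
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # intersection of box i with every remaining box
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.maximum(0.0, x2 - x1) * np.maximum(0.0, y2 - y1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = ((boxes[rest, 2] - boxes[rest, 0])
                 * (boxes[rest, 3] - boxes[rest, 1]))
        iou = inter / (area_i + areas - inter)  # intersection over union
        order = rest[iou <= iou_threshold]
    return keep
```

With IOU_threshold = 0.5, two boxes overlapping by more than half of their union collapse to the higher-scoring one, while disjoint boxes all survive.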
In step 1, the whole network cross training process is as follows:
step 11, using the weights of a VGG network pre-trained for classification on ImageNet database pictures as the initial weights, and fine-tuning on this basis;
step 12, reading the complex target pictures and the coordinate labels of the key areas corresponding to them, and training the key area detection sub-network, wherein the training loss is L = L_P + L_reg, L_P being the cross entropy between the probabilities P_is, P_not output by the detection sub-network and the label truth values, and L_reg being the sum of squared differences between the coordinate offsets output by the detection sub-network and the coordinate offsets of the actual key areas in the label;
step 13, reading the complex target pictures and the classification labels corresponding to them, and training the classification sub-network, wherein the training loss is the cross entropy between the network's classification output and the actual label;
step 14, repeating steps 12 and 13 several times, cross-training the key area detection sub-network and the classification sub-network until the network is stable.
In step 3, the method for detecting the key area is as follows:
step 31, sliding a window of size 3 × 3 over the feature map obtained in step 2, obtaining a 512-dimensional vector at each position;
step 32, setting 9 anchor boxes as references at each sliding-window position, with aspect ratios of 1:2, 1:1 and 2:1 and areas of 128², 256² and 512² pixels, the center point of each anchor box being the center of the sliding window;
step 33, passing the 512-dimensional vector obtained at each sliding-window position through a fully connected network to output 9 vectors of 6 dimensions each, every vector containing the offsets d_x, d_y, d_l, d_w of the detected region's center coordinates, length and width relative to the reference anchor box, together with the probabilities P_is, P_not of being or not being a key area, where: d_x = (x − x_a)/l_a, d_y = (y − y_a)/w_a, d_l = log(l/l_a), d_w = log(w/w_a); x, y, l, w denote the center coordinates, length and width of the detected region, x_a, y_a, l_a, w_a denote the center coordinates, length and width of the reference anchor box, and P_is, P_not are normalized with a softmax function;
step 34, from the offsets d_x, d_y, d_l, d_w regressed by the network and the center coordinates, length and width x_a, y_a, l_a, w_a of the anchor box, computing the actual center coordinates, length and width x, y, l, w of the detected region.
In step 6, the process of area standard pooling is as follows:
step 61, denoting the size of the region to be pooled as m × n and dividing it into a 7 × 7 grid of cells each of size roughly m/7 × n/7, rounding m/7 or n/7 to the nearest integer when it is not an integer;
step 62, within each cell divided in step 61, max-pooling the features down to 1 × 1, so that feature regions of different sizes are pooled into fixed-size 7 × 7 feature maps.
In summary, the above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (4)
1. A complex target accurate identification method based on key area detection is characterized by comprising the following steps:
step 1, reading a complex target picture, a coordinate label of a key area corresponding to the complex target picture and a classification label corresponding to the complex target picture in a training sample in a database, and performing fusion training on a complex target accurate identification network by using a cross training method;
step 2, taking the complex target picture to be recognized as the input of the complex target accurate recognition network trained in the step 1, and extracting features through a VGG convolutional neural network to obtain a feature map of the complex target picture to be recognized;
step 3, inputting the feature map obtained in step 2 into a key area detection sub-network, sliding a 3 × 3 window over the feature map, detecting key areas of the complex target picture with the anchor boxes as references, and outputting prediction boxes of the key areas together with the probabilities P_is, P_not of being or not being a key area;
step 4, filtering heavily overlapping detections by non-maximum suppression: when the ratio of the intersection area to the union area of different prediction boxes exceeds a specified threshold IOU_threshold, keeping only the prediction box with the largest key-area probability P_is and filtering out the others;
step 5, setting a key-area probability threshold P_threshold, and mapping every region whose key-area probability P_is exceeds P_threshold onto the feature map extracted by the VGG network;
step 6, performing region standard pooling on the regions mapped onto the feature map in step 5, pooling detected regions of different sizes into feature maps of fixed size;
step 7, taking each fixed-size feature map obtained in step 6 as the input of a classification sub-network, classifying it with the classification sub-network, and normalizing the classification result with a softmax function to obtain the class probabilities of the key region;
step 8, for the key regions of the same complex target in the same picture, averaging the class probabilities obtained in step 7 and fusing them to obtain an accurate identification of the complex target type.
2. The method for accurately identifying the complex target based on the key area detection as claimed in claim 1, wherein: in the step 1, the cross training process is as follows:
step 11, using the weights of a VGG network pre-trained for classification on ImageNet database pictures as the initial weights, and fine-tuning on this basis;
step 12, reading the complex target pictures and the coordinate labels of the key areas corresponding to them, and training the key area detection sub-network, wherein the training loss is L = L_P + L_reg, L_P being the cross entropy between the probabilities P_is, P_not output by the detection sub-network and the label truth values, and L_reg being the sum of squared differences between the coordinate offsets output by the detection sub-network and the coordinate offsets of the actual key areas in the label;
step 13, reading the complex target pictures and the classification labels corresponding to them, and training the classification sub-network, wherein the training loss is the cross entropy between the network's classification output and the actual label;
step 14, repeating steps 12 and 13 several times, cross-training the key area detection sub-network and the classification sub-network until the network is stable.
3. The method for accurately identifying the complex target based on the key area detection as claimed in claim 1, wherein: the step 3 specifically includes:
step 31, sliding a window of size 3 × 3 over the feature map obtained in step 2, obtaining a 512-dimensional vector at each position;
step 32, setting 9 anchor boxes as references at each sliding-window position, with aspect ratios of 1:2, 1:1 and 2:1 and areas of 128², 256² and 512² pixels, the center point of each anchor box being the center of the sliding window;
step 33, passing the 512-dimensional vector obtained at each sliding-window position through a fully connected network to output 9 vectors of 6 dimensions each, every vector containing the offsets d_x, d_y, d_l, d_w of the detected region's center coordinates, length and width relative to the reference anchor box, together with the probabilities P_is, P_not of being or not being a key area, where: d_x = (x − x_a)/l_a, d_y = (y − y_a)/w_a, d_l = log(l/l_a), d_w = log(w/w_a); x, y, l, w denote the center coordinates, length and width of the detected region, x_a, y_a, l_a, w_a denote the center coordinates, length and width of the reference anchor box, and P_is, P_not are normalized with a softmax function;
step 34, from the offsets d_x, d_y, d_l, d_w regressed by the network and the center coordinates, length and width x_a, y_a, l_a, w_a of the anchor box, computing the actual center coordinates, length and width x, y, l, w of the detected region.
4. The method for accurately identifying the complex target based on the key area detection as claimed in claim 1, wherein: in step 6, the process of area standard pooling is as follows:
step 61, denoting the size of the region to be pooled as m × n and dividing it into a 7 × 7 grid of cells each of size roughly m/7 × n/7, rounding m/7 or n/7 to the nearest integer when it is not an integer;
step 62, within each cell divided in step 61, max-pooling the features down to 1 × 1, so that feature regions of different sizes are pooled into fixed-size 7 × 7 feature maps.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810345899.9A CN108537286B (en) | 2018-04-18 | 2018-04-18 | Complex target accurate identification method based on key area detection |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108537286A CN108537286A (en) | 2018-09-14 |
CN108537286B (en) | 2020-11-24 |
Family
ID=63481345
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810345899.9A Active CN108537286B (en) | 2018-04-18 | 2018-04-18 | Complex target accurate identification method based on key area detection |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108537286B (en) |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110955380B (en) * | 2018-09-21 | 2021-01-12 | 中科寒武纪科技股份有限公司 | Access data generation method, storage medium, computer device and apparatus |
CN109242030A (en) * | 2018-09-21 | 2019-01-18 | 京东方科技集团股份有限公司 | Draw single generation method and device, electronic equipment, computer readable storage medium |
CN109410601A (en) * | 2018-12-04 | 2019-03-01 | 北京英泰智科技股份有限公司 | Method for controlling traffic signal lights, device, electronic equipment and storage medium |
CN109829398B (en) * | 2019-01-16 | 2020-03-31 | 北京航空航天大学 | Target detection method in video based on three-dimensional convolution network |
CN110852285B (en) * | 2019-11-14 | 2023-04-18 | 腾讯科技(深圳)有限公司 | Object detection method and device, computer equipment and storage medium |
CN110929678B (en) * | 2019-12-04 | 2023-04-25 | 山东省计算中心(国家超级计算济南中心) | Method for detecting vulvovaginal candida spores |
CN111612797B (en) * | 2020-03-03 | 2021-05-25 | 江苏大学 | Rice image information processing system |
CN111931877B (en) * | 2020-10-12 | 2021-01-05 | 腾讯科技(深圳)有限公司 | Target detection method, device, equipment and storage medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106250812A (en) * | 2016-07-15 | 2016-12-21 | 汤平 | A kind of model recognizing method based on quick R CNN deep neural network |
CN106599939A (en) * | 2016-12-30 | 2017-04-26 | 深圳市唯特视科技有限公司 | Real-time target detection method based on region convolutional neural network |
CN107798335A (en) * | 2017-08-28 | 2018-03-13 | 浙江工业大学 | A kind of automobile logo identification method for merging sliding window and Faster R CNN convolutional neural networks |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107169421B (en) * | 2017-04-20 | 2020-04-28 | 华南理工大学 | Automobile driving scene target detection method based on deep convolutional neural network |
CN107368845B (en) * | 2017-06-15 | 2020-09-22 | 华南理工大学 | Optimized candidate region-based Faster R-CNN target detection method |
- 2018-04-18: application CN201810345899.9A filed in China; granted as patent CN108537286B (status: Active)
Non-Patent Citations (6)
Title |
---|
"A survey on deep learning-based fine-grained object classification and semantic segmentation";Bo Zhao等;《International Journal of Automation and Computing》;20170430;第14卷(第2期);第119-135页 * |
"Faster r-cnn: Towards real-time object detection with region proposal networks";Shaoqing Ren等;《IEEE Transactions on Pattern Analysis and Machine Intelligence》;20170601;第39卷(第6期);第1137-1149页 * |
"Fine-grained Discriminative Localization via Saliency-guided Faster R-CNN";Xiangteng He等;《MM’17 proceedings of the 25th ACM international conference on multimedia》;20171027;第1-9页 * |
"Scene-Adaptive Vehicle Detection Algorithm Based on a Composite Deep Structure";YINGFENG CAI等;《IEEE Access》;20171114;第5卷;第22804-22811页 * |
"基于卷积神经网络语义检测的细粒度鸟类识别";李新叶等;《科学技术与工程》;20180408;第18卷(第10期);第240-244页 * |
"基于深度学习的车型细粒度识别研究";吴凡;《http://www.doc88.com/p-7708621280922.html》;20171102;第3节 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108537286B (en) | Complex target accurate identification method based on key area detection | |
WO2020125216A1 (en) | Pedestrian re-identification method, device, electronic device and computer-readable storage medium | |
CN107833213B (en) | Weak supervision object detection method based on false-true value self-adaptive method | |
CN111080693A (en) | Robot autonomous classification grabbing method based on YOLOv3 | |
CN105809651B (en) | Image significance detection method based on the comparison of edge non-similarity | |
CN105512683A (en) | Target positioning method and device based on convolution neural network | |
CN108492298B (en) | Multispectral image change detection method based on generation countermeasure network | |
CN108305260B (en) | Method, device and equipment for detecting angular points in image | |
CN111274926B (en) | Image data screening method, device, computer equipment and storage medium | |
CN105404886A (en) | Feature model generating method and feature model generating device | |
Zhang et al. | Road recognition from remote sensing imagery using incremental learning | |
CN110909656B (en) | Pedestrian detection method and system integrating radar and camera | |
CN110610123A (en) | Multi-target vehicle detection method and device, electronic equipment and storage medium | |
CN113705570A (en) | Few-sample target detection method based on deep learning | |
CN111738319A (en) | Clustering result evaluation method and device based on large-scale samples | |
CN112712066B (en) | Image recognition method and device, computer equipment and storage medium | |
CN112241736A (en) | Text detection method and device | |
CN117495891A (en) | Point cloud edge detection method and device and electronic equipment | |
CN112862730A (en) | Point cloud feature enhancement method and device, computer equipment and storage medium | |
CN117315578A (en) | Monitoring method and system for rust area expansion by combining classification network | |
CN111860623A (en) | Method and system for counting tree number based on improved SSD neural network | |
CN109416745B (en) | Structured image matching method and system | |
CN110751623A (en) | Joint feature-based defect detection method, device, equipment and storage medium | |
Promsuk et al. | Numerical Reader System for Digital Measurement Instruments Embedded Industrial Internet of Things. | |
CN113658089A (en) | Double-data-stream fusion object identification method based on depth camera |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |