CN109271852A

CN109271852A - A processing method for pedestrian detection and re-identification based on a deep neural network

Info

Publication number: CN109271852A
Application number: CN201810888879.6A
Authority: CN
Inventors: 张磊; 何贞苇; 刘方驿
Original assignee: Chongqing University
Current assignee: Chongqing University
Priority date: 2018-08-07
Filing date: 2018-08-07
Publication date: 2019-01-25

Abstract

The invention discloses a processing method for pedestrian detection and re-identification based on a deep neural network. Building on the Faster-RCNN object detection network, it proceeds in the following steps. Step 1, improve Faster-RCNN: 1) an additional region proposal network (RPN) is added to Faster-RCNN; 2) a fully connected layer of 256 neurons is added at the end of the improved network to extract features related to pedestrian identity, and a feature storage module is added for computing the loss function; 3) an online matching loss function (OLP) and a hard-sample priority loss function (HEP) are added after the fully connected layer used for pedestrian re-identification. Step 2, train the improved Faster-RCNN. Step 3, test the improved Faster-RCNN. The technical effects of the invention are: first, pedestrian detection and pedestrian re-identification are integrated, which improves the performance of re-identification based on a pedestrian detection network; second, the accuracy of the pedestrian search task is improved.

Description

A processing method for pedestrian detection and re-identification based on a deep neural network

Technical field

The invention belongs to the field of pedestrian detection and pedestrian re-identification.

Background art

With the sharp increase in camera surveillance data, pedestrian detection and pedestrian re-identification technologies have emerged. Pedestrian detection is mainly applied in intelligent driving, assisted driving, intelligent surveillance and related fields, while pedestrian re-identification is widely applied in criminal investigation, surveillance and image retrieval. The main purpose of "pedestrian detection" is to detect whether pedestrians are present in an image or video, without judging whether a given pedestrian and other pedestrians are the same person. "Pedestrian re-identification", also called "pedestrian search", mainly aims to judge whether a pedestrian seen in one camera has appeared in other cameras; that is, the features of one pedestrian must be compared with those of other pedestrians to judge whether they belong to the same person. In solving the pedestrian search task, existing methods treat pedestrian detection and pedestrian re-identification as two separate steps, and current re-identification methods all operate on pedestrian images that have already been cropped out.

In pedestrian detection under real surveillance conditions, effective facial information of a pedestrian often cannot be captured, so the pedestrian's whole-body information is usually used for re-identification. During re-identification, factors such as pose, illumination and camera angle may make the features of different pedestrians more similar than the features of the same pedestrian, which leads to problems such as false matches.

Learning a better feature representation is a relatively effective approach. The concept of deep learning originates from research on artificial neural networks; a multilayer perceptron with multiple hidden layers is one deep learning structure. Deep learning combines low-level features to form more abstract high-level representations and thereby solves complex computer vision problems.

A deep convolutional network is a machine learning model under supervised learning. The basic steps of training and testing are as follows:

1. Prepare the data: training and test data with corresponding labels;

2. Feed the prepared training data into the network for training, optimizing the network parameters with stochastic gradient descent (SGD). Using the BP (back-propagation) algorithm described in Bouvrie, J., "Notes on Convolutional Neural Networks", Neural Nets., the derivative with respect to the parameters of every layer of the deep neural network can be computed. Suppose the loss function of the network is L(f(xⁱ, w), yⁱ), where L is the loss function, f(·) is the function fitted by the neural network, xⁱ and w are the input sample and the network parameters respectively, and yⁱ is the label of the sample. For each sample, the partial derivative with respect to w is taken to update the network parameters;

3. After the network has been trained to convergence, feed the test-set samples into the network, compute the network outputs and compare them with the true labels to obtain the final test result of the network.
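The three steps above can be sketched as follows. This is only a minimal illustration, assuming a one-parameter linear model f(x, w) = w·x and a squared-error loss in place of a deep network; the names `sgd_train`, `lr` and `epochs` and the toy data are hypothetical and do not come from the patent.

```python
import random

def f(x, w):
    """Hypothetical model: a one-parameter linear function standing in for the network."""
    return w * x

def grad_w(x, y, w):
    """Derivative of the squared-error loss (f(x, w) - y)^2 with respect to w."""
    return 2.0 * (f(x, w) - y) * x

def sgd_train(samples, w=0.0, lr=0.05, epochs=200, seed=0):
    """Update w one sample at a time, as in stochastic gradient descent."""
    rng = random.Random(seed)
    data = list(samples)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            w -= lr * grad_w(x, y, w)
    return w

# Step 1: labelled training and test data (toy values generated by y = 3x).
train = [(0.5, 1.5), (1.0, 3.0), (1.5, 4.5), (2.0, 6.0)]
test = [(3.0, 9.0), (4.0, 12.0)]

# Step 2: train with SGD. Step 3: compare test outputs with the true labels.
w = sgd_train(train)
test_error = max(abs(f(x, w) - y) for x, y in test)
```

In a real deep network the derivative in `grad_w` is obtained layer by layer through back-propagation rather than written by hand.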

"Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks", Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun, International Conference on Neural Information Processing Systems, MIT Press, 2015: 91-99, describes the Faster-RCNN object detection network, an object detection network structure based on deep learning. Given an input picture, the Faster-RCNN object detection network outputs the coordinates of the detection boxes and the object category of each box. First, from the input picture, the RPN sub-network generates a large number of proposals for the subsequent detection and classification tasks. Then, the region-of-interest pooling layer (ROI pooling) extracts the corresponding features, which are used to identify and classify each detected object (that is, to decide whether it is an object to be detected) and to refine its detection box.

The present invention improves the Faster-RCNN object detection network to realize pedestrian detection and re-identification and to improve the accuracy of pedestrian re-identification.

The "pedestrian detection and re-identification" referred to in this patent application means the integration of pedestrian detection and pedestrian re-identification. The input is two pictures containing the same target pedestrian; the output is the position coordinates of the pedestrians in the pictures and, for each detected pedestrian, a 256-dimensional re-identification feature. The function of pedestrian detection is to search for the target pedestrian and output proposed boxes; the function of pedestrian re-identification is to extract features from the boxes output by pedestrian detection and compare them.

Summary of the invention

The technical problem to be solved by the invention is to provide a processing method for pedestrian detection and re-identification based on a deep neural network. It integrates pedestrian detection with pedestrian re-identification, which not only makes the re-identification task easier to carry out but also improves the performance of re-identification based on a pedestrian detection network and the accuracy of the pedestrian search task.

The idea of the invention is to construct an end-to-end network structure that combines pedestrian detection and pedestrian re-identification. "End-to-end" means that the pedestrian detection network and the re-identification task are integrated in a single deep network, so that the target person is found directly in the scene picture without manual cropping: the detection part of the Faster-RCNN model generates proposal windows, which are passed to the subsequent network for feature extraction, and metric learning is performed on the features through the loss functions.

To solve the above technical problem, the present invention improves the network structure on the basis of the Faster-RCNN object detection network through the following steps:

Step 1: improve Faster-RCNN

1) An additional region proposal network (RPN) is added to Faster-RCNN, so that the improved Faster-RCNN network can take two pictures as input at the same time and obtain the proposal regions of each picture;

2) At the end of the network, a fully connected layer of 256 neurons is added to extract features related to pedestrian identity, and a feature storage module is added for computing the loss function;

3) The online matching loss function OLP and the hard-sample priority loss function HEP are added after the fully connected layer used for pedestrian re-identification, in order to learn the re-identification features;

Step 2: train the improved Faster-RCNN

Two pictures containing the same pedestrian are input into the improved Faster-RCNN network. The two RPN branches extract the proposals of the two branches respectively; the ROI pooling layer then sends the convolutional-layer features at the positions of the proposals to the fully connected layers, which further screen and refine the proposals and at the same time extract the pedestrian re-identification features. The hard-sample priority loss function HEP and the online matching loss function OLP are attached after the 256-dimensional fully connected layer that extracts the re-identification features and supervise the learning of the whole network;

Step 3: test the improved Faster-RCNN

A test picture is input into the improved Faster-RCNN and processed with the trained network parameters, giving the final pedestrian detection result and the features required for pedestrian re-identification.

Compared with existing methods, the invention has the following advantages:

1. The present invention integrates pedestrian detection and pedestrian re-identification, providing a new solution for pedestrian search;

2. The present invention improves the accuracy of pedestrian search.

Description of the drawings

The drawings of the invention are as follows:

Fig. 1 is a simplified structural diagram of the improved Faster-RCNN;

Fig. 2 is a schematic diagram of the OLP loss function.

Specific embodiments

The present invention is further explained below with reference to the drawings and embodiments:

On the basis of the Faster-RCNN object detection network, the present invention makes improvements according to the following steps:

Step 1: improve Faster-RCNN

As shown in Fig. 1, the improved Faster-RCNN takes as input two pictures that contain a pedestrian of the same identity. The two pictures pass through two region proposal network (RPN) branches that share weight parameters, yielding proposals for both pictures at the same time. The features corresponding to the proposal regions of the two pictures are then each sent to the ROI pooling layer for pooling, and the pooled features are sent to the fully connected layers after ROI pooling for further processing. Finally, a 256-dimensional fully connected layer extracts the features used for pedestrian re-identification, while the coordinates of the detection boxes and the score of each detection box are output.

The improved Faster-RCNN makes three improvements to the Faster-RCNN object detection network:

1) The single-branch RPN is changed to two RPN branches that share weights;

2) After the ROI pooling layer, a new fully connected layer of 256 neurons is added to extract features for pedestrian re-identification;

3) After the fully connected layer that extracts the re-identification features, the online matching loss function OLP and the hard-sample priority loss function HEP are set to supervise the training of the re-identification features.

Step 2: train the improved Faster-RCNN

First, pictures containing the same pedestrian are paired according to the detection boxes and corresponding labels annotated in the data set. In each training iteration, the paired pictures are fed in pairs into the improved Faster-RCNN;
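The pairing step can be illustrated with a short sketch. The patent does not state the exact pairing rule, so the code below simply assumes that any two pictures sharing at least one annotated identity form a pair; the function name `build_image_pairs` and the annotation layout are hypothetical.

```python
from itertools import combinations

def build_image_pairs(annotations):
    """Pair every two images that share at least one labelled pedestrian identity.

    `annotations` maps an image name to the set of identity IDs annotated in it
    (a hypothetical layout; the data set's real annotation format may differ).
    """
    images_by_id = {}
    for image, ids in annotations.items():
        for pid in ids:
            images_by_id.setdefault(pid, []).append(image)

    pairs = set()
    for images in images_by_id.values():
        # Every unordered pair of images containing this identity.
        for a, b in combinations(sorted(images), 2):
            pairs.add((a, b))
    return sorted(pairs)

annotations = {
    "s1.jpg": {7, 12},
    "s2.jpg": {7},
    "s3.jpg": {12, 30},
    "s4.jpg": {41},
}
pairs = build_image_pairs(annotations)
```

Pictures whose identities appear nowhere else, such as `s4.jpg` here, produce no pair under this assumed rule.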

After the two pictures pass through the region proposal network, the coordinates of the proposals of each picture are obtained. The ROI pooling layer then takes the convolutional-layer features corresponding to the proposals and feeds them to the fully connected layers, which extract the features used for pedestrian re-identification;

The hard-sample priority loss function HEP and the online matching loss function OLP train the re-identification features, so that features usable for the pedestrian re-identification task are extracted.

As shown in Fig. 2, the OLP loss is computed in three steps:

1) The re-identification features produced by the newly added fully connected layer, together with their pedestrian identity labels (IDs), are stored in the feature storage module;

2) Reference samples and positive samples are found among the re-identification features of the fully connected layer, and samples whose labels differ from that of the reference sample are taken from the feature storage module as negative samples;

3) The loss function is computed from the features of the positive and negative samples found in the two steps above. The OLP loss function is computed as follows: [formula not reproduced in this text]

In the formula, the terms denote the feature of the i-th reference sample and the feature of its corresponding positive sample, both extracted from the network, and the feature of a negative sample, extracted from the feature storage module; n_j is the index of a negative sample, m is the number of reference samples, K is the number of negative samples, and d(·) computes the cosine distance between two features.

In this loss function, each proposal region generated by the RPN is regarded as one sample.
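The OLP formula itself does not survive in this text, so the following sketch only illustrates the three steps under explicit assumptions: the loss is taken to be a softmax-style ratio between the reference-positive term and the reference-negative terms, d(·) is read as cosine similarity, and the storage module is assumed to evict its oldest entries. The names `FeatureStore`, `olp_loss` and `capacity` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

class FeatureStore:
    """Hypothetical stand-in for the feature storage module: keeps (feature, ID) entries."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = []

    def add(self, feature, pid):
        self.entries.append((feature, pid))
        # Assumed policy: drop the oldest entries once capacity is exceeded.
        self.entries = self.entries[-self.capacity:]

    def negatives(self, pid):
        """Stored features whose ID differs from the reference sample's ID."""
        return [f for f, other in self.entries if other != pid]

def olp_loss(triples, store):
    """Assumed softmax form of the loss, averaged over the m reference samples.

    `triples` holds (reference_feature, positive_feature, pid) items.
    """
    total = 0.0
    for ref, pos, pid in triples:
        pos_term = math.exp(cosine(ref, pos))
        neg_term = sum(math.exp(cosine(ref, n)) for n in store.negatives(pid))
        total += -math.log(pos_term / (pos_term + neg_term))
    return total / len(triples)

store = FeatureStore(capacity=100)
store.add([0.0, 1.0], pid=2)
store.add([-1.0, 0.0], pid=3)
store.add([1.0, 0.1], pid=1)   # same ID as the reference, so never used as a negative

good = olp_loss([([1.0, 0.0], [0.9, 0.1], 1)], store)   # positive close to the reference
bad = olp_loss([([1.0, 0.0], [0.0, 1.0], 1)], store)    # positive far from the reference
```

Under this assumed form, the loss falls as the positive feature moves toward the reference and the stored negatives move away, which is the behaviour a metric-learning loss of this kind is meant to encourage.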

Taking the gradient of the loss function gives the following formula: [formula not reproduced in this text]

In the formula, l = 1, 2, ..., K; the negative-sample features are extracted from the feature storage module, and n_l is the index of a negative sample.

Using the back-propagation algorithm of neural networks, the present invention updates the parameters of the network by stochastic gradient descent (SGD).

The HEP loss function is defined as follows.

The HEP loss function further learns the re-identification features by means of classification. According to the pedestrian identity IDs annotated in the data set, the present invention classifies the regions of interest generated by the RPN into N+1 classes in total, where N is the number of pedestrian identities in the data set and the additional class is the background class. In each iteration, C classes (C ≤ N+1) are selected from them for the loss computation. Let L denote the set formed by the C classes; the selected classes in L are determined by the following three steps:

1) All IDs present in the input pictures are selected as classes and put into L;

2) For each sample, the sample nearest to the positive sample is chosen from the candidate samples, and its class is put into L;

3) If the number of classes in L is still less than C, other IDs are selected at random and added to L. The HEP loss function is then expressed as: [formula not reproduced in this text]

In the formula, m is the number of reference samples, C is the number of selected classes, and 1(·) is the indicator function, equal to 1 if the expression in brackets holds and 0 otherwise; label is the label of the reference sample (its class); and the two score terms denote the scores output by the network for the i-th reference sample belonging to the k-th class and to the j-th class, respectively.
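The class selection and the loss can be sketched as below. Because the HEP formula is missing from this text, the code assumes a softmax cross-entropy restricted to the selected classes, assumes that the candidates in step 2 are stored features of other identities, and measures "nearest" by cosine similarity. The function names, the use of ID 0 for the background class and all values are hypothetical.

```python
import math
import random

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def select_classes(input_ids, positives, stored, all_ids, C, seed=0):
    """Build the class set L following the three steps described above.

    `positives` is a list of (feature, pid); `stored` is a list of (feature, pid)
    candidates, assumed here to come from the feature storage module.
    """
    L = list(dict.fromkeys(input_ids))                 # step 1: IDs in the input pictures
    for feat, pid in positives:                        # step 2: class of the nearest other-ID sample
        others = [(cosine(feat, f), other) for f, other in stored if other != pid]
        if others:
            hardest = max(others)[1]
            if hardest not in L:
                L.append(hardest)
    rng = random.Random(seed)                          # step 3: random fill up to C classes
    remaining = [c for c in all_ids if c not in L]
    rng.shuffle(remaining)
    while len(L) < C and remaining:
        L.append(remaining.pop())
    return L

def hep_loss(scores, labels, L):
    """Softmax cross-entropy restricted to the classes in L (assumed form).

    `scores[i][c]` is the network score of reference sample i for class c.
    """
    total = 0.0
    for s, label in zip(scores, labels):
        denom = sum(math.exp(s[c]) for c in L)
        total += -math.log(math.exp(s[label]) / denom)
    return total / len(scores)

all_ids = [0, 1, 2, 3, 4, 5]                 # 0 is used here as the background class
stored = [([0.9, 0.1], 3), ([0.0, 1.0], 4)]
L = select_classes([0, 1], [([1.0, 0.0], 1)], stored, all_ids, C=4)

scores = [{c: 0.0 for c in all_ids}]
scores[0][1] = 2.0                           # the sample scores highest on its own class
loss = hep_loss(scores, [1], L)
```

Restricting the softmax to C classes keeps the classification cost bounded when N is large, while steps 1 and 2 make sure that the true classes and the most confusable ones are always among those compared.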

Likewise, using the back-propagation algorithm of neural networks and the stochastic gradient descent (SGD) algorithm, the HEP loss function can be used to update the parameters of Faster-RCNN.

Differentiating the loss function with respect to each of the two score terms gives, for a single sample: [formula not reproduced in this text]

The gradients are back-propagated (BP), and the weight parameters are updated by stochastic gradient descent (SGD) to obtain the final parameters of the network.

Under the joint action of OLP, HEP and the detection-related loss functions of Faster-RCNN itself, the whole Faster-RCNN is trained.

Step 3: test the improved Faster-RCNN

After a test picture is input, the trained parameters yield the final detection box coordinates and the features of the corresponding detected pedestrians. The cosine distances between the re-identification features of detection boxes in different pictures are computed and compared; the two pedestrian detection boxes with the largest cosine distance are judged to belong to the same pedestrian.
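The comparison at test time can be sketched as follows. Since the largest value is taken as a match, the "cosine distance" is read here as cosine similarity. The gallery layout, file names, boxes and the 3-dimensional toy features (standing in for the 256-dimensional ones) are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def best_match(query_feature, gallery):
    """Return the gallery detection whose feature is most similar to the query.

    `gallery` is a list of (image_name, box, feature) tuples.
    """
    return max(gallery, key=lambda item: cosine(query_feature, item[2]))

query = [0.2, 0.9, 0.1]
gallery = [
    ("cam2_001.jpg", (34, 50, 80, 200), [0.9, 0.1, 0.0]),
    ("cam2_002.jpg", (120, 40, 170, 210), [0.25, 0.85, 0.05]),
    ("cam3_004.jpg", (10, 10, 60, 180), [0.0, 0.1, 0.95]),
]
image, box, _ = best_match(query, gallery)
```

Sorting the whole gallery by the same similarity, instead of taking only the maximum, gives the ranked list needed for ranking-based evaluation.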

Embodiment:

1. Data set

The CUHK-SYSU data set is used, which contains 18184 pictures of different scenes in total. The data set is annotated from street snapshots and movie frames and is well suited to training and testing pedestrian detection and pedestrian re-identification.

2. Experimental setup

The training set has 11206 pictures containing 5532 annotated pedestrian identities; the test set has 6978 pictures containing 2900 annotated pedestrian identities.

During training, the input image pairs are formed by pairing pictures according to the 5532 annotated pedestrian identities, giving 16000 image pairs in total.

3. Training and test method

Training stage: the paired images are fed into the network in pairs for training. The gradients are averaged over every two pairs of samples, and the network parameters are then updated by one SGD step. The final network is obtained after 60000 iterations.
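The update schedule described above (average the gradients of two image pairs, then take one SGD step) can be sketched as below. The function name, the learning rate and the toy gradients are hypothetical; the patent does not give a learning rate.

```python
def sgd_with_averaged_gradients(pair_gradients, w, lr=0.001, pairs_per_update=2):
    """Average the gradients of `pairs_per_update` image pairs, then apply one SGD update.

    `pair_gradients` yields one gradient (a list of floats) per image pair.
    """
    buffer = []
    for g in pair_gradients:
        buffer.append(g)
        if len(buffer) == pairs_per_update:
            # Element-wise mean of the buffered gradients.
            mean = [sum(col) / len(buffer) for col in zip(*buffer)]
            w = [wi - lr * gi for wi, gi in zip(w, mean)]
            buffer = []
    return w

grads = [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0], [2.0, 2.0]]   # four image pairs, two updates
w = sgd_with_averaged_gradients(grads, w=[0.0, 0.0], lr=0.5)
```

Averaging over two pairs before each step acts like a mini-batch of four pictures while only one pair has to pass through the network at a time.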

Test stage: the test pictures are fed into the trained network model, which detects the positions of pedestrians and extracts the corresponding features. Evaluation follows the CUHK-SYSU evaluation protocol, computing the mAP (mean Average Precision) and Top-1 metrics.
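As a rough illustration of the two metrics, the sketch below computes average precision and Top-1 from ranked match lists using their generic definitions. It does not reproduce the CUHK-SYSU protocol, which also has to decide when a detection counts as correct; the function names and the toy queries are hypothetical.

```python
def average_precision(ranked_hits, num_relevant):
    """AP of one query: `ranked_hits` lists, in rank order, whether each result is correct."""
    hits = 0
    precision_sum = 0.0
    for rank, hit in enumerate(ranked_hits, start=1):
        if hit:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / num_relevant if num_relevant else 0.0

def evaluate(queries):
    """Return (mAP, Top-1) over a list of (ranked_hits, num_relevant) queries."""
    aps = [average_precision(h, n) for h, n in queries]
    top1 = [1.0 if h and h[0] else 0.0 for h, _ in queries]
    return sum(aps) / len(aps), sum(top1) / len(top1)

queries = [
    ([True, False, True], 2),    # AP = (1/1 + 2/3) / 2
    ([False, True, False], 1),   # AP = (1/2) / 1
]
m_ap, top_1 = evaluate(queries)
```

Top-1 only checks whether the first-ranked result is correct, whereas mAP rewards placing every correct match near the top of the ranking.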

The mAP metric, and the AP and Recall metrics used below, are described in "The PASCAL Visual Object Classes (VOC) Challenge", Everingham, M., Van Gool, L., Williams, C.K.I., Winn, J., & Zisserman, A. (2010). International Journal of Computer Vision, 88(2), 303-338.

Comparison of recognition accuracy

To verify the effectiveness of the invention, this embodiment combines different pedestrian detection methods and pedestrian re-identification methods as baselines for comparison. Four pedestrian detectors are used for comparison: CCF, ACF, Faster-RCNN (CNN) and GT;

The existing pedestrian re-identification methods are as follows:

1. Combinations of three re-identification feature extraction methods (DSIFT, BoW, LOMO) and four feature metric methods (Euclidean, KISSME, Cosine, XQDA);

2. Two end-to-end pedestrian detection and re-identification systems, the OIM and NPSM models.

Sources of the four pedestrian detection methods:

1. The CCF method described in "Convolutional Channel Features", Yang, B., Yan, J., Lei, Z., & Li, S.Z. (2015);

2. The ACF method described in "Fast Feature Pyramids for Object Detection", Dollar, P., Appel, R., Belongie, S., & Perona, P. (2014). IEEE Transactions on Pattern Analysis & Machine Intelligence, 36(8), 1532-45;

3. Faster-RCNN (abbreviated CNN);

4. GT (manually annotated detection targets).

Sources of the three re-identification feature extraction methods:

1. The DSIFT method described in "Unsupervised Salience Learning for Person Re-identification", Zhao, R., Ouyang, W., & Wang, X. IEEE Conference on Computer Vision and Pattern Recognition, IEEE Computer Society, 2013: 3586-3593;

2. The BoW method described in "Scalable Person Re-identification: A Benchmark", Zheng, L., Shen, L., Tian, L., Wang, S., Wang, J., & Tian, Q. (2016). IEEE International Conference on Computer Vision (pp. 1116-1124). IEEE;

3. The LOMO method described in "Person Re-identification by Local Maximal Occurrence Representation and Metric Learning", Liao, S., Hu, Y., Zhu, X., & Li, S.Z. (2015). IEEE Conference on Computer Vision and Pattern Recognition (Vol. 8, pp. 2197-2206). IEEE.

Sources of the four feature metric methods:

1. Euclidean (Euclidean distance);

2. The KISSME method described in "Large Scale Metric Learning from Equivalence Constraints", Hirzer, M. (2012). IEEE Conference on Computer Vision and Pattern Recognition (pp. 2288-2295). IEEE Computer Society;

3. Cosine (cosine distance);

4. The XQDA method described in Liao, S., Hu, Y., Zhu, X., & Li, S.Z. (2015). IEEE Conference on Computer Vision and Pattern Recognition (Vol. 8, pp. 2197-2206). IEEE.

Sources of the two end-to-end pedestrian detection and re-identification systems:

1. The OIM model described in "Joint Detection and Identification Feature Learning for Person Search", Xiao, T., Li, S., Wang, B., Lin, L., & Wang, X. (2017). Computer Vision and Pattern Recognition (pp. 3376-3385). IEEE;

2. The NPSM model described in "Neural Person Search Machines", Liu, H., Feng, J., Jie, Z., Jayashree, K., Zhao, B., Qi, M., et al. (2017). 493-501.

The training and test results are shown in Table 1:

Table 1. Comparison of the present invention with other re-identification methods

Table 2. Comparison of detection performance between the present invention and OIM

Method	AP (%)	Recall (%)
OIM	74.9	79.1
The present invention	79.6	82.2

As can be seen from Tables 1 and 2, the present invention (I-net) achieves better results on the pedestrian data set than the existing pedestrian detection and re-identification methods.

Table 3. Performance comparison of several loss function combinations

Loss type	mAP (%)	Top-1 (%)
Online matching loss function	73.6	76.2
Online matching loss function + softmax	79.0	81.2
Online matching loss function + hard-sample priority loss function	79.5	81.5

Table 4. Performance comparison for different numbers of features stored in the OLP storage module

As can be seen from Tables 3 and 4, the present invention obtains better results by using the online matching loss function together with the hard-sample priority loss function.

Claims

1. A processing method for pedestrian detection and re-identification based on a deep neural network, built on the Faster-RCNN object detection network, characterized in that it further comprises the following steps:

Step 1: improve Faster-RCNN

1) An additional region proposal network (RPN) is added to Faster-RCNN, so that the improved Faster-RCNN network can take two pictures as input at the same time and obtain the proposal regions of each picture;

2) At the end of the network, a fully connected layer of 256 neurons is added to extract features related to pedestrian identity, and a feature storage module is added for computing the loss function;

3) The online matching loss function OLP and the hard-sample priority loss function HEP are added after the fully connected layer used for pedestrian re-identification, in order to learn the re-identification features;

Step 2: train the improved Faster-RCNN

Two pictures containing the same pedestrian are input into the improved Faster-RCNN network; the two RPN branches extract the proposals of the two branches respectively; the ROI pooling layer then sends the convolutional-layer features at the positions of the proposals to the fully connected layers, which further screen and refine the proposals and at the same time extract the pedestrian re-identification features; the hard-sample priority loss function HEP and the online matching loss function OLP are attached after the 256-dimensional fully connected layer that extracts the re-identification features and supervise the learning of the whole network;

Step 3: test the improved Faster-RCNN

A test picture is input into the improved Faster-RCNN and processed with the trained network parameters, giving the final pedestrian detection result and the features required for pedestrian re-identification.

2. The processing method for pedestrian detection and re-identification based on a deep neural network according to claim 1, characterized in that the online matching loss function OLP is computed in the following three steps:

1) The re-identification features produced by the newly added fully connected layer, together with their pedestrian identity labels (IDs), are stored in the feature storage module;

2) Reference samples and positive samples are found among the re-identification features of the fully connected layer, and samples whose labels differ from that of the reference sample are taken from the feature storage module as negative samples;

3) The loss function is computed from the features of the positive and negative samples found in the two steps above,

The OLP loss function is computed as follows: [formula not reproduced in this text]

In the formula, the terms denote the feature of the i-th reference sample and the feature of its corresponding positive sample, both extracted from the network, and the feature of a negative sample, extracted from the feature storage module; n_j is the index of a negative sample, m is the number of reference samples, K is the number of negative samples, and d(·) computes the cosine distance between two features.

3. The processing method for pedestrian detection and re-identification based on a deep neural network according to claim 2, characterized in that the parameters of the network are updated by stochastic gradient descent (SGD), and the gradient of the OLP loss function is computed as follows: [formula not reproduced in this text]

In the formula: [definitions not reproduced in this text]

4. The processing method for pedestrian detection and re-identification based on a deep neural network according to claim 3, characterized in that

the pedestrian identity IDs are divided into N+1 classes in total; in each iteration, C classes are selected from the N+1 classes for the loss computation; the set formed by the C classes is L, and the selected classes in L are determined by the following three steps:

1) All IDs present in the input pictures are selected as classes and put into L;

2) For each sample, the sample nearest to the positive sample is chosen from the candidate samples, and its class is put into L;

3) If the number of classes in L is still less than C, other IDs are selected at random and added to L; the HEP loss function is then expressed as: [formula not reproduced in this text]

In the formula, m is the number of reference samples, C is the number of selected classes, and 1(·) is the indicator function, equal to 1 if the expression in brackets holds and 0 otherwise; label is the label of the reference sample; and the two score terms denote the scores output by the network for the i-th reference sample belonging to the k-th class and to the j-th class, respectively.