CN109583518A

CN109583518A - Pedestrian detection method based on a multi-stream region proposal network

Info

Publication number: CN109583518A
Application number: CN201811602537.XA
Authority: CN
Inventors: 雷建军; 郭亭佚; 侯春萍; 陈越; 彭勃; 王梦园
Original assignee: Tianjin University
Current assignee: Tianjin University
Priority date: 2018-12-26
Filing date: 2018-12-26
Publication date: 2019-04-05

Abstract

The invention discloses a pedestrian detection method based on a multi-stream region proposal network. The method comprises the following steps: obtaining visibility region patterns and extracting pedestrian features under the different visibility region patterns with multiple visibility-region guidance networks; fusing these pedestrian features with a multi-stream feature fusion network; feeding the fused pedestrian features into a global region proposal network structure to generate proposal regions; and further classifying the proposal regions with a random forest method that includes hard example mining, forming the final detection results. The invention can effectively improve the ability of deep learning networks to detect occluded pedestrians. Because the multi-stream visibility-region guidance networks respond differently to different visible-region situations, they extract deep features specific to each region, which effectively improves feature extraction under the different visibility region patterns. While achieving end-to-end pedestrian detection, the invention improves the detection of occluded pedestrians.

Description

Pedestrian detection method based on a multi-stream region proposal network

Technical field

The present invention relates to the technical fields of image processing and computer vision, and more particularly to a pedestrian detection method based on a multi-stream region proposal network.

Background art

The purpose of pedestrian detection is to locate pedestrians in video or image data. It is the basis of computer vision tasks such as pedestrian tracking, road anomaly early warning, and surveillance. With the rise of technologies such as autonomous driving and intelligent surveillance, pedestrian detection has become a major research hotspot. Because of factors such as target occlusion, changes in target scale, complex weather conditions, and varied shooting angles, pedestrian detection methods face many challenges in practical applications. Related statistical studies show that occlusion occurs in more than 70% of cases under real conditions, which seriously affects the accuracy of pedestrian detection methods. Solving the occlusion problem in pedestrian detection therefore has important practical significance.

Pedestrian detection is a subfield of general object detection, and its development depends on progress in object detection research. According to the basic framework of existing methods, research can be divided into three main directions: pedestrian detection methods based on feature extraction and random forest classification, similar to the traditional VJ (Viola-Jones) object detection method; pedestrian detection methods based on the deformable part model (DPM, Deformable Part based Model) and optimized for the pedestrian detection problem; and pedestrian detection methods based on deep learning. The first two categories are traditional methods, and the third is the deep learning approach.

The VJ detection framework determines candidate regions with a sliding window, extracts specific features in each region, and classifies them, establishing the general process followed by later object detection methods. The integral channel features detector (ICF, Integral Channel Feature detector) proposed by Dollár et al. uses the histogram of oriented gradients (HOG, Histogram of Oriented Gradient) and LUV color (L denotes luminance, U and V denote chrominance) as features. It extracts image features with a sliding window, derives higher-level features from the extracted low-level features with a series of filter banks, and finally trains a decision forest that classifies candidate regions using the high-level features extracted in each window, thereby detecting pedestrians. The DPM method can be trained under weak supervision to obtain a root detector for the whole object and part detectors for its individual parts, and it uses the part detector results to refine the whole-object detection so as to cope with object deformation and partial occlusion. Because it handles occlusion and deformation well, it has been widely used in pedestrian detection. Building on DPM, Idrees et al. explored the constraints between adjacent detection results and further improved detection in crowded scenes through iterative optimization.

In recent years, as research on deep learning theory has deepened, deep features have been used in detection tasks. A series of methods represented by regions with CNN features (RCNN, Regions with CNN features), Fast-RCNN, and Faster-RCNN have greatly improved the accuracy of detection results. These are two-stage methods: the first stage generates candidate regions, and the second stage classifies them. Among them, the more recently proposed Faster-RCNN introduces a region proposal network (RPN, Region Proposal Network) for generating proposal regions and uses a classification network to screen the proposals. Two-stage methods achieve higher detection accuracy but are relatively time-consuming. To make detection faster, single-stage methods such as the high-speed detector YOLO and the single shot detector SSD were proposed. Such methods are roughly equivalent to adding region classification on top of an RPN, thereby achieving end-to-end detection. Inspired by traditional methods based on decision forests, Cai et al. incorporated deep features on top of traditional features: the first stage generates proposal regions with a sliding window, the second stage classifies them with a random forest, and a feasible hard example mining method is explored, which effectively improves detection accuracy.

Although current pedestrian detection methods based on convolutional neural networks perform well on some datasets, their detection accuracy still needs further improvement. In addition, existing methods still need further optimization for common problems such as multiple scales, deformation, occlusion, and complex backgrounds.

Summary of the invention

The present invention provides a pedestrian detection method based on a multi-stream region proposal network. Addressing the insufficient ability of existing detection methods to detect occluded pedestrians, the invention uses multiple visibility-region guidance networks to extract pedestrian features under different visibility conditions, fuses these features with a feature fusion network, and feeds the fused features into a region proposal sub-network to generate proposal regions. This achieves end-to-end pedestrian detection while improving the detection of occluded pedestrians. The details are described below:

A pedestrian detection method based on a multi-stream region proposal network comprises the following steps:

obtaining visibility region patterns, extracting pedestrian features under the different visibility region patterns with multiple visibility-region guidance networks, and fusing these pedestrian features with a multi-stream feature fusion network;

feeding the fused pedestrian features into a global region proposal network structure to generate proposal regions;

further classifying the proposal regions with a random forest method that includes hard example mining, forming the final detection results.

The step of obtaining the visibility region patterns is specifically:

dividing a complete pedestrian rectangular region into a 6 × 3 grid, where a rectangular block formed by adjacent cells is called a sub-rectangular region of the complete pedestrian rectangular region;

defining a sub-rectangular region that represents the unoccluded part as a visible sub-rectangular region, each visible sub-rectangular region representing one visibility region pattern.

Further, the method uses 5 visibility region patterns, namely:

first pattern: a visible 2 × 3 grid, indicating that the pedestrian is occluded below the neck; second pattern: a visible 3 × 3 grid, indicating that the pedestrian's lower body is occluded; third pattern: a visible 4 × 3 grid, indicating that the pedestrian is occluded below the shins; fourth pattern: a visible 5 × 3 grid, indicating that the pedestrian's feet are occluded; fifth pattern: a visible 6 × 3 grid, indicating that the pedestrian is not occluded.
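The five patterns can be written down as binary masks over the 6 × 3 grid. The following Python sketch assumes, as the description of Fig. 2 suggests, that the visible block always starts at the top row of the box; the names `VISIBLE_ROWS` and `pattern_mask` are illustrative, not taken from the patent.

```python
import numpy as np

GRID_ROWS, GRID_COLS = 6, 3  # the 6 x 3 grid laid over a complete pedestrian box

# Visible rows per pattern, counted from the top of the box: patterns 1-5
# keep a 2x3, 3x3, 4x3, 5x3 and 6x3 visible block respectively.
VISIBLE_ROWS = {1: 2, 2: 3, 3: 4, 4: 5, 5: 6}


def pattern_mask(pattern: int) -> np.ndarray:
    """6x3 boolean mask of one visibility region pattern (True = visible cell)."""
    mask = np.zeros((GRID_ROWS, GRID_COLS), dtype=bool)
    mask[: VISIBLE_ROWS[pattern], :] = True
    return mask


for p in sorted(VISIBLE_ROWS):
    m = pattern_mask(p)
    # pattern id, number of visible cells, fraction of the box that is visible
    print(p, int(m.sum()), f"{m.mean():.2f}")
```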

In a specific implementation, the step of extracting pedestrian features under the different visibility region patterns with multiple visibility-region guidance networks is specifically:

the first 5 videos of the Caltech dataset are used for training, from which 40K images are sampled as the training set, and the last 5 videos are used for testing, from which 4K images are sampled as the test set;

the training data are masked with each visibility region pattern, so that the visibility-region guidance network trained for each pattern can extract the pedestrian features of the visible sub-rectangular region corresponding to that pattern.

Further, the samples used to train the neural networks comprise:

positive samples obtained by cropping the original complete pedestrian samples to the corresponding visibility region pattern, and an equal number of negative samples randomly generated from non-pedestrian regions of the image.

The multi-stream feature fusion network consists of a Concat layer and a convolutional layer.

The Concat layer concatenates the pedestrian features output by the visibility-region guidance networks, and the convolutional layer further fuses the pedestrian features and reduces their dimensionality.

The Concat layer is expressed as follows:

$$F_{\text{concat}}: \; f_1^{D\times H\times W},\, f_2^{D\times H\times W},\, \ldots,\, f_N^{D\times H\times W} \;\rightarrow\; f^{C\times H\times W}, \qquad C = N \cdot D$$

where N is the number of visibility region patterns used, f(·) denotes a feature, C is the number of channels (dimension) of the output feature matrix, D is the number of channels of the feature matrix under each visibility region pattern, H and W are the height and width of the feature matrix, and the two sides of → are the input and output of the layer.
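As a minimal illustration of the Concat layer, the NumPy sketch below stacks N stream outputs of shape D × H × W along the channel axis. The stream count, channel count and map size are hypothetical placeholders, the streams are assumed to share the same spatial size (they process the same image), and the function name is illustrative only.

```python
import numpy as np


def concat_streams(features):
    """Concat layer sketch: N maps of shape (D, H, W) -> one map of shape (N*D, H, W)."""
    shapes = {f.shape for f in features}
    if len(shapes) != 1:
        raise ValueError(f"all streams must share the same (D, H, W), got {shapes}")
    return np.concatenate(features, axis=0)  # stack along the channel axis


# Hypothetical sizes: N = 5 streams, D = 512 channels, 38 x 50 feature map.
rng = np.random.default_rng(0)
streams = [rng.standard_normal((512, 38, 50), dtype=np.float32) for _ in range(5)]
fused = concat_streams(streams)
print(fused.shape)  # (2560, 38, 50)
```

Stream i occupies channels i·D to (i+1)·D − 1 of the output, so the order of the streams fixes which channels the following convolution sees for each pattern.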

The convolutional layer is expressed as follows:

$$F_{1\times 1\,\text{conv}}: \; f^{C\times H\times W} \;\rightarrow\; f^{D\times H\times W}$$

The beneficial effects of the technical solution provided by the present invention are:

1. The invention can effectively improve the ability of deep learning networks to detect occluded pedestrians. The multi-stream visibility-region guidance networks respond differently to different visible-region situations and thus extract deep features specific to each region, which effectively improves feature extraction under the different visibility region patterns;

2. The invention uses a multi-stream deep feature fusion network that fuses the features extracted by the visibility-region guidance networks, reduces their dimensionality, and feeds them into a new global region proposal network, improving the accuracy of the region proposals. The method still detects well at higher intersection-over-union (IOU) thresholds.

Brief description of the drawings

Fig. 1 is a flow chart of the pedestrian detection method based on a multi-stream region proposal network provided by the invention;

Fig. 2 is a schematic diagram of the visibility region patterns;

Fig. 3 compares the performance of the invention with other methods at the same IOU value.

Detailed description of the embodiments

To make the objectives, technical solutions, and advantages of the present invention clearer, the embodiments of the invention are described in further detail below.

Embodiment 1

An embodiment of the present invention proposes a pedestrian detection method based on a multi-stream region proposal network (MSRPN, Multi-stream Region Proposal Network). Referring to Fig. 1, the method comprises the following steps:

101: obtain visibility region patterns, extract pedestrian features under the different visibility region patterns with multiple visibility-region guidance networks, and fuse these pedestrian features with a multi-stream feature fusion network;

102: feed the fused pedestrian features into a global region proposal network structure similar to the RPN to generate proposal regions;

103: further classify the proposal regions with the random forest method with hard example mining [1], forming the final detection results.

Since pedestrian detection involves only one target class, the output of the global region proposal network could itself serve as the result of a binary pedestrian/non-pedestrian classification. To further improve the results, however, the method uses the random forest method with hard example mining [1] to further classify the proposal regions generated by the MSRPN.

The step of obtaining the visibility region patterns in step 101 is described in detail in Embodiment 2.

In summary, through steps 101 to 103 above, the embodiment of the present invention can effectively improve the ability of deep learning networks to detect occluded pedestrians. The multi-stream visibility-region guidance networks respond differently to different visible-region situations and extract deep features specific to each region, which effectively improves feature extraction under the different visibility region patterns.

Embodiment 2

The scheme of Embodiment 1 is further described below with specific formulas and examples:

First, to make the method easier to understand, definitions of several terms used below are given:

1. Visibility region pattern: a statistical result describing the visible region of pedestrians in images;

2. Visibility-region guidance network (a term known in the art): a convolutional neural network that extracts region-specific deep features;

3. Visible sub-rectangular region: the quantized result of a visibility region pattern;

4. Complete pedestrian rectangular region: the region containing the entire pedestrian target;

5. Multi-stream region proposal network (a term known in the art): a convolutional neural network that contains multiple visibility-region guidance networks and generates proposal regions;

6. Multi-stream feature fusion network: a convolutional neural network module that fuses the deep features produced by the different visibility-region guidance networks;

7. Global region proposal network: a convolutional neural network that generates proposal results for complete pedestrian rectangular regions.

I. Construction of the multi-stream visibility-region guidance networks

In real road scenes, pedestrian occlusion follows certain regularities. If these regularities are exploited well, the precision of pedestrian detection can be effectively improved without increasing computational cost.

The method first counts how often different occlusion situations occur in the selected dataset under real conditions and selects the most frequent ones for quantitative study, as the focus of feature extraction. Each occlusion situation, for example occlusion below the pedestrian's neck, occlusion of only the lower body, or occlusion only below the shins, is treated as a visibility region pattern to be studied. Fig. 2 shows patterns 1 to 5 in order: pattern 1 indicates that the pedestrian is occluded below the neck; pattern 2 indicates that the pedestrian's lower body is occluded; pattern 3 indicates that the pedestrian is occluded below the shins; pattern 4 indicates that the pedestrian's feet are occluded; pattern 5 indicates that the pedestrian is not occluded. The occluded parts in patterns 1 to 4 are shown with hatching.

To quantify the visibility region pattern corresponding to each occlusion situation, the method divides a complete pedestrian rectangular region into a 6 × 3 grid. Several adjacent cells can form a rectangular block, called a sub-rectangular region of the complete rectangular region. A sub-rectangular region representing the unoccluded part is called a visible sub-rectangular region. Since a block is fixed by choosing 2 of the 7 horizontal grid lines and 2 of the 4 vertical grid lines, a 6 × 3 grid can form 21 × 6 = 126 different visible sub-rectangular regions, each representing one visibility region pattern.
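The count of 126 can be checked by brute force. The short Python sketch below enumerates every block of adjacent cells as a pair of horizontal grid lines and a pair of vertical grid lines; the `(r0, r1, c0, c1)` encoding (start and end grid-line indices) is an illustrative choice, not notation from the patent.

```python
from itertools import combinations


def sub_rectangles(rows=6, cols=3):
    """All axis-aligned blocks of adjacent cells in a rows x cols grid.

    A block is (r0, r1, c0, c1): it spans grid rows r0..r1-1 and columns c0..c1-1,
    i.e. two of the rows+1 horizontal lines and two of the cols+1 vertical lines.
    """
    return [
        (r0, r1, c0, c1)
        for r0, r1 in combinations(range(rows + 1), 2)
        for c0, c1 in combinations(range(cols + 1), 2)
    ]


blocks = sub_rectangles()
print(len(blocks))  # 126 = C(7, 2) * C(4, 2) = 21 * 6
```

In this encoding, the five patterns of Fig. 2 are the full-width blocks (0, k, 0, 3) with k = 2, ..., 6.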

For example, in pattern 1 of Fig. 2, the 2 × 3 cells are unoccluded, i.e., the sub-rectangular region formed by these 2 × 3 cells is a visible sub-rectangular region, while the remaining 4 × 3 cells are occluded (shown with hatching), meaning the pedestrian is occluded below the neck; the whole 6 × 3 grid of pattern 1 is the complete pedestrian rectangular region. Similarly, in pattern 2 of Fig. 2, the 3 × 3 cells are unoccluded and form a visible sub-rectangular region, while the remaining 3 × 3 cells are occluded (shown with hatching), meaning the pedestrian's lower body is occluded. In pattern 5 of Fig. 2, all 6 × 3 cells are unoccluded, and the sub-rectangular region they form is the visible sub-rectangular region. Patterns 3 and 4 follow by analogy and are not described further.

The Caltech dataset is one of the datasets commonly used for pedestrian detection and provides rich evaluation subsets and evaluation methods. The method therefore uses the Caltech dataset to gather statistics on pedestrian occlusion, and uses the same dataset for subsequent training and testing.

The method first ranks the occlusion situations counted on the Caltech dataset by how often they occur under real conditions. To balance algorithm overhead against effectiveness, only the five most frequent pedestrian occlusion situations are selected (the lower 4/6 of the pedestrian occluded, the lower 3/6 occluded, the lower 2/6 occluded, the lower 1/6 occluded, and no occlusion). The visibility region patterns corresponding to these five occlusion situations, shown in Fig. 2, are studied quantitatively, and five corresponding visibility-region guidance networks are trained.
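The statistics step could look roughly like the sketch below. It assumes each annotation supplies a full pedestrian box and a visible-part box (Caltech annotations do provide a visible box for occluded pedestrians, but loading them is not shown), snaps the visible box to the 6 × 3 grid of its full box, and ranks the resulting sub-rectangles by frequency. Function names and the sample boxes are hypothetical.

```python
from collections import Counter


def quantize_visible(full_box, vis_box, rows=6, cols=3):
    """Snap a visible box onto the rows x cols grid of its full pedestrian box.

    Boxes are (x1, y1, x2, y2). Returns grid-line indices (r0, r1, c0, c1), or
    None when the visible part collapses to less than one cell.
    """
    fx1, fy1, fx2, fy2 = full_box
    vx1, vy1, vx2, vy2 = vis_box

    def snap(v, origin, extent, n):
        # Nearest grid line, clamped to the box (round() rounds halves to even).
        return min(n, max(0, round((v - origin) * n / extent)))

    r0, r1 = snap(vy1, fy1, fy2 - fy1, rows), snap(vy2, fy1, fy2 - fy1, rows)
    c0, c1 = snap(vx1, fx1, fx2 - fx1, cols), snap(vx2, fx1, fx2 - fx1, cols)
    if r1 <= r0 or c1 <= c0:
        return None
    return r0, r1, c0, c1


def most_frequent_patterns(annotations, k=5):
    """Rank quantized visible sub-rectangles by frequency and keep the top k."""
    counts = Counter()
    for full_box, vis_box in annotations:
        q = quantize_visible(full_box, vis_box)
        if q is not None:
            counts[q] += 1
    return counts.most_common(k)


# Hypothetical (full box, visible box) pairs for a 30 x 60 pixel pedestrian.
anns = [
    ((0, 0, 30, 60), (0, 0, 30, 60)),  # fully visible
    ((0, 0, 30, 60), (0, 0, 30, 41)),  # snaps to the top 4 rows
    ((0, 0, 30, 60), (0, 0, 30, 40)),  # exactly the top 4 rows
    ((0, 0, 30, 60), (0, 0, 30, 20)),  # only the top 2 rows visible
    ((0, 0, 30, 60), (0, 0, 30, 60)),
]
print(most_frequent_patterns(anns, k=2))
```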

Taking advantage of the learnability of deep features, the method uses a multi-stream region proposal network in which each stream targets one occlusion situation: for each one, a separate visibility-region guidance network is trained that can effectively extract deep features of the visible sub-rectangular region obtained by quantizing that occlusion situation. A visibility-region guidance network is a deep network that responds differently to different visibility region patterns; the network trained for each pattern better extracts the deep features of the visible sub-rectangular region corresponding to that pattern.

To obtain the multi-stream visibility-region guidance networks, the method trains the same RPN-like deep network structure with each of the different visibility region patterns, following the Faster-RCNN training procedure. In this method, the first 5 videos of the Caltech dataset are used for training, from which 40K images are sampled as the training set, and the last 5 videos are used for testing, from which 4K images are sampled as the test set. When training each deep network, the training data are masked with the visibility region pattern: for each of the 5 selected patterns, the samples consist of positive samples obtained by cropping the original complete pedestrian samples to that visibility region pattern, and an equal number of negative samples randomly generated from non-pedestrian regions of the image. The feature extraction part of each trained deep network is taken as the visibility-region guidance network proposed in this scheme.
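A hedged sketch of how positive and negative samples for one pattern might be generated: each positive is the top part of a full pedestrian box kept by the pattern, and each negative is a same-sized random box that overlaps no pedestrian. Matching the negative size to the positive, the fixed retry budget, and all names are assumptions for illustration.

```python
import random

# Visible sixths (counted from the top) for patterns 1-5, as described for Fig. 2.
VISIBLE_ROWS = {1: 2, 2: 3, 3: 4, 4: 5, 5: 6}


def crop_to_pattern(box, pattern):
    """Positive sample: keep the top VISIBLE_ROWS[pattern]/6 of a full box (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return (x1, y1, x2, y1 + (y2 - y1) * VISIBLE_ROWS[pattern] / 6)


def overlaps(a, b):
    """True if two (x1, y1, x2, y2) boxes share any area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def build_samples(pedestrians, pattern, img_w, img_h, seed=0, max_tries=1000):
    """Return (positives, negatives) with one same-sized random negative per positive."""
    rng = random.Random(seed)
    positives = [crop_to_pattern(b, pattern) for b in pedestrians]
    negatives = []
    for px1, py1, px2, py2 in positives:
        w, h = px2 - px1, py2 - py1
        for _ in range(max_tries):
            x = rng.uniform(0, img_w - w)
            y = rng.uniform(0, img_h - h)
            cand = (x, y, x + w, y + h)
            # Reject boxes touching any full pedestrian box, not only its visible crop.
            if not any(overlaps(cand, p) for p in pedestrians):
                negatives.append(cand)
                break
    return positives, negatives


peds = [(100, 50, 140, 130), (300, 60, 330, 120)]
pos, neg = build_samples(peds, pattern=1, img_w=640, img_h=480)
print(len(pos), len(neg))
```

If an image is so crowded that no free box is found within `max_tries`, this sketch returns fewer negatives than positives; a real pipeline would then sample the missing negatives from other images.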

II. Multi-stream deep feature fusion

After training, each visibility-region guidance network responds well to its specific visibility region pattern. To combine the features extracted above (i.e., those for the 5 visibility region patterns), the method fuses all of the extracted features with a feature fusion network and uses the fused features as the input of a new global region proposal network structure, which generates the final proposal regions. Fig. 1 shows the flow chart of the multi-stream feature fusion part of the method.

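The feature fusion network consists of a Concat layer and a convolutional layer. The Concat layer concatenates the output features of the visibility-region guidance networks according to:

$$F_{\text{concat}}: \; f_1^{D\times H\times W},\, f_2^{D\times H\times W},\, \ldots,\, f_N^{D\times H\times W} \;\rightarrow\; f^{C\times H\times W}, \qquad C = N \cdot D$$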

That is, the Concat layer concatenates the outputs of the visibility-region guidance networks along the channel dimension. Simple concatenation raises the feature dimension, which gives the subsequent network too many parameters and causes over-fitting, and the features extracted by the independent visibility-region guidance networks also differ from one another. To address both issues, the method further fuses the features and reduces their dimensionality with a 1 × 1 convolution:

$$F_{1\times 1\,\text{conv}}: \; f^{C\times H\times W} \;\rightarrow\; f^{D\times H\times W}$$

The 1 × 1 convolution used in the method is equivalent to computing a weighted average over the channels at each spatial position. It reduces the dimensionality and thus avoids the over-fitting that excessive parameters in the subsequent network would cause.
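To make the weighted-average reading concrete, the NumPy sketch below implements a bias-free 1 × 1 convolution as a per-pixel matrix product over channels. It shows that one particular weight choice (weight 1/N for the matching channel of every stream) reduces it exactly to the average of the streams, and counts weights to illustrate why reducing the channel count before later layers limits the parameter count. All sizes, including D = 512 and the following 3 × 3 layer with 512 outputs, are hypothetical; trained weights are of course not restricted to a plain average.

```python
import numpy as np


def conv1x1(x, weight):
    """Bias-free 1x1 convolution: out[o, h, w] = sum_c weight[o, c] * x[c, h, w]."""
    return np.einsum("oc,chw->ohw", weight, x)


def conv_weight_count(k, c_in, c_out):
    """Number of weights of a k x k convolution, ignoring biases."""
    return k * k * c_in * c_out


# Toy sizes (hypothetical): N = 5 streams, D = 4 channels each, 3 x 3 maps.
N, D = 5, 4
rng = np.random.default_rng(0)
streams = [rng.standard_normal((D, 3, 3)) for _ in range(N)]
x = np.concatenate(streams, axis=0)  # Concat output: (N*D, 3, 3)

# Give channel c of every stream the same weight 1/N for output channel c:
# the 1x1 convolution then reduces to the plain average of the N streams.
w_avg = np.tile(np.eye(D), (1, N)) / N  # shape (D, N*D)
fused = conv1x1(x, w_avg)
print(fused.shape, np.allclose(fused, np.mean(streams, axis=0)))

# Weight counts for a hypothetical D = 512 followed by a 3x3 conv with 512 outputs.
print(conv_weight_count(3, N * 512, 512))  # 11796480: 3x3 conv on the raw concatenation
print(conv_weight_count(1, N * 512, 512) + conv_weight_count(3, 512, 512))  # 3670016: 1x1 first
```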

III. Pedestrian region proposal

The feature fusion network yields a feature matrix that reflects the features of multiple visibility regions well. This matrix is used as the input of a new global region proposal network, which outputs the final proposal regions; these are then fed into the subsequent classifier for classification.

During training, so that the already trained visibility-region guidance networks are not affected, backpropagation stops at the 1 × 1 convolution in the feature fusion network and does not affect the networks before that convolution.
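The sketch below illustrates the training rule that gradients stop at the 1 × 1 convolution: the guidance-network outputs are treated as fixed inputs and only the fusion weights are updated. The squared-error loss, toy shapes and learning rate are hypothetical stand-ins, not the loss actually used by the method. In a deep learning framework the same effect is usually obtained by excluding the guidance networks' parameters from the optimizer or disabling their gradients.

```python
import numpy as np


def fusion_sgd_step(stream_feats, weight, target, lr=1e-2):
    """One SGD step that updates only the 1x1 fusion weights.

    stream_feats stand for outputs of the frozen guidance networks: they enter as
    constants, so no gradient flows into them (backpropagation stops at the 1x1 conv).
    The squared-error loss against `target` is a hypothetical stand-in loss.
    """
    x = np.concatenate(stream_feats, axis=0)        # (N*D, H, W), fixed input
    out = np.einsum("oc,chw->ohw", weight, x)       # 1x1 convolution
    residual = out - target
    loss = 0.5 * float(np.sum(residual ** 2))
    grad_w = np.einsum("ohw,chw->oc", residual, x)  # dLoss/dweight only
    return weight - lr * grad_w, loss


rng = np.random.default_rng(1)
feats = [rng.standard_normal((3, 4, 4)) for _ in range(2)]  # N = 2 frozen streams, D = 3
frozen_copy = [f.copy() for f in feats]
w = rng.standard_normal((3, 6)) * 0.1
target = rng.standard_normal((3, 4, 4))

losses = []
for _ in range(20):
    w, loss = fusion_sgd_step(feats, w, target)
    losses.append(loss)
print(losses[0], losses[-1])  # the loss falls while the stream features stay untouched
```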

IV. Pedestrian region classification

In object detection, the classifier determines the class of the object in each proposal region and is an indispensable part of the detection task. Although pedestrian detection has only one target class, so that the proposal regions output by the RPN could serve directly as detection results, processing by a classifier further improves the accuracy of the detection results; the method therefore keeps the classifier.

In the method, a random forest classifier with hard example mining is used to classify the proposal regions; the random forest contains 2048 binary trees. The training data consist of the ground-truth (GT, Ground-Truth) boxes together with the proposal regions generated by the multi-stream region proposal network MSRPN, and their features are obtained by extraction with the visibility-region guidance networks described above followed by fusion. During training, proposals whose overlap with the GT exceeds 0.7 are taken as positive samples, and those whose overlap is below 0.5 as negative samples.
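Labeling the random forest's training data from the two overlap thresholds can be sketched as follows, taking "overlap" to mean intersection over union (IoU). Dropping proposals whose best IoU falls between 0.5 and 0.7 is an assumption, since the text does not say how they are used; the hard example mining itself follows [1] and is not reproduced here. Function names are illustrative.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def forest_training_set(proposals, gts, pos_thr=0.7, neg_thr=0.5):
    """Boxes and labels (1 = pedestrian, 0 = background) for the random forest.

    GT boxes are positives; a proposal is positive if its best IoU with any GT
    exceeds pos_thr, negative if it is below neg_thr, and dropped otherwise.
    """
    boxes, labels = list(gts), [1] * len(gts)
    for p in proposals:
        best = max((iou(p, g) for g in gts), default=0.0)
        if best > pos_thr:
            boxes.append(p)
            labels.append(1)
        elif best < neg_thr:
            boxes.append(p)
            labels.append(0)
    return boxes, labels


gts = [(0, 0, 10, 20)]
proposals = [(0, 0, 10, 15), (0, 0, 10, 12), (20, 20, 30, 40)]  # IoU 0.75, 0.6, 0.0
boxes, labels = forest_training_set(proposals, gts)
print(labels)  # [1, 1, 0]: the GT, the 0.75 proposal, the background proposal
```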

Embodiment 3

The schemes of Embodiments 1 and 2 are further described below with reference to Fig. 3:

As shown in Fig. 3, the experiment compares the detection performance of this method with that of two mainstream pedestrian detection methods, using the relationship between the log-average miss rate (MR) and the number of false positives per image (FPPI) as the metric, i.e., the MR-FPPI evaluation criterion.

The experimental data are the reliable individual-pedestrian subset of the Caltech dataset, with the intersection over union (IOU) threshold set to 0.5, to verify the performance of the proposed method. Fig. 3 shows that this method has the lowest average miss rate, i.e., it misses the fewest pedestrians on average, indicating that it performs best among the compared methods.
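For reference, the log-average miss rate that summarizes an MR-FPPI curve is commonly computed as the geometric mean of the miss rate sampled at nine FPPI values evenly spaced in log space between 10^-2 and 10^0. The sketch below follows that convention; the handling of reference points to the left of the curve (a miss rate of 1.0 here) is an assumption, and the official Caltech evaluation code may differ in such details.

```python
import numpy as np


def log_average_miss_rate(fppi, miss_rate, num_points=9):
    """Geometric mean of the miss rate at FPPI values log-spaced over [1e-2, 1e0].

    fppi and miss_rate describe the detector's curve, ordered by increasing fppi.
    At each reference FPPI the miss rate of the last curve point with
    fppi <= reference is used; if no such point exists, 1.0 is assumed.
    """
    fppi = np.asarray(fppi, dtype=float)
    miss_rate = np.asarray(miss_rate, dtype=float)
    refs = np.logspace(-2.0, 0.0, num_points)  # 1e-2, 10**-1.75, ..., 1e0
    sampled = []
    for r in refs:
        idx = np.flatnonzero(fppi <= r)
        sampled.append(miss_rate[idx[-1]] if idx.size else 1.0)
    sampled = np.maximum(np.array(sampled), 1e-10)  # guard against log(0)
    return float(np.exp(np.mean(np.log(sampled))))


# Hypothetical curve with a constant miss rate of 0.2 over the whole FPPI range.
print(log_average_miss_rate([0.001, 0.01, 0.1, 1.0], [0.2, 0.2, 0.2, 0.2]))
```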

Bibliography

[1] Cai Z, Saberian M, Vasconcelos N. Learning complexity-aware cascades for deep pedestrian detection[C]. IEEE International Conference on Computer Vision. IEEE, 2015: 3361-3369.

Those skilled in the art will appreciate that the drawings are schematic diagrams of a preferred embodiment, and that the serial numbers of the embodiments above are for description only and do not indicate the relative merits of the embodiments.

The foregoing describes only preferred embodiments of the present invention and is not intended to limit the invention. Any modification, equivalent replacement, improvement, and the like made within the spirit and principle of the invention shall fall within the protection scope of the invention.

Claims

1. A pedestrian detection method based on a multi-stream region proposal network, characterized in that the method comprises the following steps:

obtaining visibility region patterns, extracting pedestrian features under the different visibility region patterns with multiple visibility-region guidance networks, and fusing these pedestrian features with a multi-stream feature fusion network;

feeding the fused pedestrian features into a global region proposal network structure to generate proposal regions;

further classifying the proposal regions with a random forest method that includes hard example mining, forming the final detection results.

2. The pedestrian detection method based on a multi-stream region proposal network according to claim 1, characterized in that the step of obtaining the visibility region patterns is specifically:

dividing a complete pedestrian rectangular region into a 6 × 3 grid, where a rectangular block formed by adjacent cells is called a sub-rectangular region of the complete pedestrian rectangular region;

defining a sub-rectangular region that represents the unoccluded part as a visible sub-rectangular region, each visible sub-rectangular region representing one visibility region pattern.

3. The pedestrian detection method based on a multi-stream region proposal network according to claim 2, characterized in that the method uses 5 visibility region patterns, namely:

first pattern: a visible 2 × 3 grid, indicating that the pedestrian is occluded below the neck; second pattern: a visible 3 × 3 grid, indicating that the pedestrian's lower body is occluded; third pattern: a visible 4 × 3 grid, indicating that the pedestrian is occluded below the shins; fourth pattern: a visible 5 × 3 grid, indicating that the pedestrian's feet are occluded; fifth pattern: a visible 6 × 3 grid, indicating that the pedestrian is not occluded.

4. The pedestrian detection method based on a multi-stream region proposal network according to claim 1, characterized in that the step of extracting pedestrian features under the different visibility region patterns with multiple visibility-region guidance networks is specifically:

the first 5 videos of the Caltech dataset are used for training, from which 40K images are sampled as the training set, and the last 5 videos are used for testing, from which 4K images are sampled as the test set;

the training data are masked with each visibility region pattern, so that the visibility-region guidance network trained for each pattern can extract the pedestrian features of the visible sub-rectangular region corresponding to that pattern.

5. The pedestrian detection method based on a multi-stream region proposal network according to claim 4, characterized in that the samples used to train the neural networks comprise:

positive samples obtained by cropping the original complete pedestrian samples to the corresponding visibility region pattern, and an equal number of negative samples randomly generated from non-pedestrian regions of the image.

6. The pedestrian detection method based on a multi-stream region proposal network according to claim 1, characterized in that the multi-stream feature fusion network consists of a Concat layer and a convolutional layer,

the Concat layer concatenating the pedestrian features output by the visibility-region guidance networks, and the convolutional layer further fusing the pedestrian features and reducing their dimensionality.

7. The pedestrian detection method based on a multi-stream region proposal network according to claim 6, characterized in that the Concat layer is expressed as follows:

$$F_{\text{concat}}: \; f_1^{D\times H\times W},\, f_2^{D\times H\times W},\, \ldots,\, f_N^{D\times H\times W} \;\rightarrow\; f^{C\times H\times W}, \qquad C = N \cdot D$$

where N is the number of visibility region patterns used, f(·) denotes a feature, C is the number of channels of the output feature matrix, D is the number of channels of the feature matrix under each visibility region pattern, H and W are the height and width of the feature matrix, and the two sides of → are the input and output of the layer.

8. The pedestrian detection method based on a multi-stream region proposal network according to claim 7, characterized in that the convolutional layer is expressed as follows:

$$F_{1\times 1\,\text{conv}}: \; f^{C\times H\times W} \;\rightarrow\; f^{D\times H\times W}$$