CN104036250B - Video pedestrian detection and tracking - Google Patents

Video pedestrian detection and tracking

Info

Publication number
CN104036250B
CN104036250B (application CN201410266099.XA; publication CN104036250A)
Authority
CN
China
Prior art keywords
people
tracking
target
people target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201410266099.XA
Other languages
Chinese (zh)
Other versions
CN104036250A (en)
Inventor
管业鹏
许瑞岳
李雨龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Shanghai for Science and Technology
Original Assignee
University of Shanghai for Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Shanghai for Science and Technology filed Critical University of Shanghai for Science and Technology
Priority to CN201410266099.XA priority Critical patent/CN104036250B/en
Publication of CN104036250A publication Critical patent/CN104036250A/en
Application granted granted Critical
Publication of CN104036250B publication Critical patent/CN104036250B/en


Abstract

The present invention relates to a video pedestrian detection and tracking method. Exploiting the fact that the wavelet transform has good local characteristics in both the time and spatial domains, the method extracts foreground moving objects from video inter-frame differences using multi-scale wavelet features. Because the human head is an essential component of the body and approximately rigid, a classifier is learned and trained on samples of different head targets and used to classify and detect the foreground moving objects in the video scene, determining the head targets. Exploiting the distinctiveness of head colour features, the heads are then tracked with a particle filter and a dynamic tracking chain. The method requires no specific hardware support or scene-condition constraints, and is simple, flexible and easy to implement.

Description

Video pedestrian detection and tracking
Technical field
The present invention relates to a video pedestrian detection and tracking method for digital image analysis and understanding, and belongs to the technical field of intelligent information processing.
Background art
With the rapid growth of urban populations and the increasing complexity of urban environments, sudden public-security incidents such as mass disturbances, riots and terrorist attacks seriously threaten urban public safety, and their occurrence is to a large degree closely associated with human behaviour. Effectively determining human behaviour, and automatically recognising abnormal or suspicious actions, helps security personnel handle crises promptly and substantially improves precautionary capability, contributing to a harmonious and peaceful social environment; it has therefore become an important topic for the international community. The key to determining human behaviour effectively is to detect and track the positions of pedestrians in a video scene.
The human body is non-rigid, varies widely in shape, is easily occluded, and video scenes change in complex and varied ways, all of which make effective video pedestrian detection and tracking difficult. Current main methods include: (1) head-curvature detection with geometric-feature tracking, which detects the head from its curvature and tracks the head target by its geometric features; this method easily mistakes objects of similar curvature for heads, tracks poorly, and has a high false-detection rate; (2) detection and tracking based on particle filtering, which can track only a single target and is likely to lose it under occlusion; (3) texture analysis, whose computational complexity is high and generalisation ability low, and which is severely time-consuming especially when tracking multiple target objects.
Summary of the invention
The object of the invention is to address the shortcomings of existing video pedestrian detection and tracking methods, namely high computational cost, low temporal reliability, single detection and tracking targets, sensitivity to dynamic scene changes, strong noise interference, and difficulty in meeting the requirements of timely analysis and understanding of human behaviour, by providing a video pedestrian detection and tracking method. Because the human head is an essential component of the body and approximately rigid, a classifier is learned and trained on samples of different head targets, the foreground moving objects in the video scene are classified and detected to determine the head targets, and, exploiting the distinctiveness of head colour features, a particle filter and a dynamic tracking chain are used to achieve effective head tracking under a wide range of conditions.
To achieve the above object, the idea of the invention is as follows: since the wavelet transform has good local characteristics in both the time and spatial domains, foreground moving objects are extracted from video inter-frame differences using multi-scale wavelet features; because the human head is an essential component of the body and approximately rigid, a classifier is learned and trained on samples of different head targets and used to classify and detect the foreground moving objects in the video scene, determining the head targets; and, exploiting the distinctiveness of head colour features, the heads are tracked with a particle filter and a dynamic tracking chain.
According to the above idea, the invention adopts the following technical solution:
A video pedestrian detection and tracking method, characterised by the following concrete steps:
1) Start the pedestrian detection and tracking image acquisition system: capture video images;
2) Foreground moving object segmentation
Subtract the previous frame from the current frame captured by the camera and, using the wavelet transform method, segment the foreground moving object region;
3) Sample learning and training;
4) Head target detection;
5) Head target tracking;
6) Pedestrian identity-consistency confirmation.
The concrete operation steps of the above step 2) are as follows:
(1) Subtract the previous frame image I_{t-1}(x, y) from the current frame image I_t(x, y) to obtain the difference image D(x, y):
D(x, y) = I_t(x, y) − I_{t-1}(x, y);
(2) Multi-scale wavelet transform of the difference image:
E = √((D ⊗ h)² + (D ⊗ v)²);
where D is the difference image, h and v are the filter operators in the horizontal and vertical directions respectively, and ⊗ denotes convolution;
(3) Determination of the foreground moving object region: determine a threshold T_1 for the multi-scale wavelet transform E of the difference image; the region formed by all pixels whose E value exceeds T_1 is defined as the foreground moving object region.
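As an illustration of steps (1)-(3), the following minimal Python/OpenCV sketch computes the frame difference and a single-scale wavelet response; the Haar-like filter kernels and the default threshold value are assumptions, since the patent text does not fix them:

```python
import numpy as np
import cv2

def foreground_mask(frame_t, frame_t_minus_1, T1=30.0):
    """Binary foreground mask from two consecutive grayscale frames."""
    # D(x, y) = I_t(x, y) - I_{t-1}(x, y)
    D = frame_t.astype(np.float32) - frame_t_minus_1.astype(np.float32)

    # Assumed horizontal / vertical filter operators h, v (Haar-like details).
    h = np.array([[1.0, -1.0]], dtype=np.float32)
    v = h.T

    # E = sqrt((D (*) h)^2 + (D (*) v)^2), (*) denoting convolution.
    Dh = cv2.filter2D(D, -1, h)
    Dv = cv2.filter2D(D, -1, v)
    E = np.sqrt(Dh ** 2 + Dv ** 2)

    # All pixels whose E value exceeds T1 form the foreground moving region.
    return (E > T1).astype(np.uint8) * 255
```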
The concrete operation steps of the above step 3) are as follows:
(1) According to step 2), collect the head Haar features of different human moving objects to form the head training data set D_i = {H_i}, and the Haar features of human limbs and torso to form the non-head label set C_i = {T_i};
(2) Select a classifier, perform supervised learning on the sample set (D_i, C_i) formed from the above data set D_i and label set C_i, and adjust the classifier parameters so that the classification effect is optimal.
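A minimal sketch of this supervised-learning step, assuming scikit-learn's SVC as the selected classifier (as in embodiment two) and pre-extracted Haar feature vectors (e.g. from skimage.feature.haar_like_feature); the parameter grid is illustrative:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

def train_head_classifier(head_feats, non_head_feats):
    """head_feats / non_head_feats: arrays of shape (n_samples, n_features)."""
    X = np.vstack([head_feats, non_head_feats])
    y = np.hstack([np.ones(len(head_feats)), -np.ones(len(non_head_feats))])

    # Adjust the classifier parameters until the classification effect is
    # optimal; the grid values here are illustrative assumptions.
    grid = GridSearchCV(SVC(kernel="rbf"),
                        {"C": [1, 10, 100], "gamma": [1e-3, 1e-2, 1e-1]},
                        cv=3)
    grid.fit(X, y)
    return grid.best_estimator_
```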
The concrete operation steps of the above step 4) are as follows:
(1) According to step 2), collect the Haar features of the foreground moving objects to form the test data set AD_i = {AH_i};
(2) Using the classifier and parameters determined in step 3), classify the test data set AD_i and determine the head targets.
The concrete operation steps of the above step 5) are as follows:
(1) Colour space conversion: from the red R, green G and blue B components of the RGB colour space, determine the hue component H, saturation component S and brightness component V of the HSV colour space, where
S = 1 − 3·min(R, G, B)/(R + G + B)
V = max(R, G, B)
(2) Build the head feature histogram: for the head targets determined in step 4), use the hue component H and the saturation component S of the HSV colour space to build an m-level colour histogram for each component, and use the brightness component V to build an n-level grey-gradient histogram; then combine the joint colour (H, S) histogram q_c^{m×m} and the brightness (V) grey-gradient histogram q_v^n into the head feature histogram q^r:
q^r = C·q_c^{m×m}·q_v^n, r = 0, …, m²·n − 1
where C is a normalisation coefficient;
(3) Head target tracking: using the head feature histogram q^r built in step (2), track the head targets in the scene with a particle filter.
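The colour conversion and histogram construction of steps (1)-(2) might look as follows in Python/OpenCV; the bin counts default to the m = n = 8 of the embodiment, and the Sobel operator used for the V grey gradient is an assumption:

```python
import numpy as np
import cv2

def head_feature_histogram(bgr_patch, m=8, n=8):
    """Head feature histogram q^r of length m*m*n from a BGR head patch."""
    hsv = cv2.cvtColor(bgr_patch, cv2.COLOR_BGR2HSV)
    V = hsv[:, :, 2]

    # Joint m x m colour histogram q_c over hue and saturation.
    q_c = cv2.calcHist([hsv], [0, 1], None, [m, m], [0, 180, 0, 256]).ravel()

    # n-level grey-gradient histogram q_v over the brightness component V.
    gx = cv2.Sobel(V, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(V, cv2.CV_32F, 0, 1)
    q_v, _ = np.histogram(cv2.magnitude(gx, gy), bins=n)

    # q^r = C * q_c^{m x m} * q_v^n, flattened to r = 0, ..., m^2 * n - 1.
    q = np.outer(q_c, q_v).ravel().astype(np.float64)
    return q / max(q.sum(), 1e-12)   # C: normalisation coefficient
```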
The concrete operation steps of the above step 6) are as follows:
(1) Set up dynamic tracking chains: if n head targets are tracked in the scene, set up dynamic tracking chains T_i (i = 1, …, n) for the head targets tracked in step 5);
(2) Distance between dynamic tracking chains: according to step (1), compute the Euclidean distances d_ij (i = 1, …, n; j = 1, …, n) between the dynamic tracking chains T_i;
(3) Head target occlusion judgement: according to step (2), if the distance d_ij between dynamic tracking chains is less than a threshold T_2 determined from the head size predicted for the current frame by step 5), the head targets tracked by the dynamic tracking chains are occluded; otherwise, the occlusion has ended or there is no occlusion;
(4) Establish the association matrix between dynamic tracking chains and detection results: from the dynamic tracking chain results T_i (i = 1, …, n) of step (1) and the head detection results H_j (j = 1, …, m; m is the number of detected heads) of step 4), establish the association matrix M_ij = D(T_i, H_j), where D is a distance metric operator;
(5) Association matrix minimum: from the association matrix M_ij determined in step (4), determine the minimum D_m of the matrix (i ≠ j);
(6) Build the relation matrix: from the minimum D_m determined in step (5), obtain the relation matrix R_j indicating whether each detection is associated with a dynamic tracking chain;
(7) Fuse head detection and tracking results: according to the relation matrix R_j determined in step (6): if R_j < 1, no head target is currently associated with the dynamic tracking chain, indicating that there is no head target in the scene or that a previously tracked head target has left the scene; if R_j = 1, the head target in the current frame is associated with the dynamic tracking chain, and the current head position is obtained from the detection result with weight w_1 (0 < w_1 < 1) and the tracking result with weight w_2 (w_2 = 1 − w_1); if R_j > 1, several head targets in the current frame are associated with the dynamic tracking chain; the head feature histogram q^r is then used to distinguish the head targets, and each current head position is determined from the detection result with weight w_1 (0 < w_1 < 1) and the tracking result with weight w_2 (w_2 = 1 − w_1).
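A simplified sketch of the association and fusion logic of steps (4)-(7), assuming Euclidean distance as the metric D and a nearest-neighbour association rule; the R_j > 1 disambiguation by histogram comparison is omitted for brevity:

```python
import numpy as np

def fuse_detections(tracks, detections, w1=0.5, assoc_thresh=40.0):
    """tracks: (n, 2) predicted head centres; detections: (m, 2) detected centres."""
    w2 = 1.0 - w1
    fused, matched = [], set()
    for det in detections:
        # M_ij = D(T_i, H_j): Euclidean distance from every tracking chain.
        dists = np.linalg.norm(tracks - det, axis=1)
        i = int(np.argmin(dists))
        if dists[i] < assoc_thresh and i not in matched:
            # R_j = 1 case: fuse detection and tracking with weights w1, w2.
            matched.add(i)
            fused.append(w1 * det + w2 * tracks[i])
        else:
            # R_j < 1 case: unmatched detection, treated as a new target here.
            fused.append(det)
    return np.array(fused)
```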
The principle of the invention is as follows. In the technical solution of the invention, the wavelet transform has good local characteristics in both the time and spatial domains; foreground moving objects are therefore extracted from video inter-frame differences using multi-scale wavelet features. Because the human head is an essential component of the body and approximately rigid, a classifier is learned and trained on samples of different head targets and used to classify and detect the foreground moving objects in the video scene, determining the head targets; exploiting the distinctiveness of head colour features, the heads are then tracked with a particle filter and a dynamic tracking chain.
Suppose adjacent frame images f(t_{n-1}, x, y) and f(t_n, x, y) are obtained at a given moment. Taking the pixel-wise difference of the two images yields the difference image Diff(x, y), with one component per colour channel:
Diff_C(x, y) = |f_C(t_n, x, y) − f_C(t_{n-1}, x, y)|, C ∈ {R, G, B}
where Diff_R, Diff_G and Diff_B are the red, green and blue components of the difference image and |f| is the absolute value of f.
Based on the above adjacent-frame difference, the foreground moving object region is segmented using the wavelet transform. The wavelet transform of a two-dimensional image I(x, y) at scale 2^j in direction k is
W^k_{2^j} I(x, y) = I ∗ ψ^k_{2^j}(x, y), k = 1, 2
where the wavelet functions in the x and y directions can be expressed as
ψ^1(x, y) = ∂θ(x, y)/∂x, ψ^2(x, y) = ∂θ(x, y)/∂y
where θ(x, y) is a smoothing filter function.
It follows that the wavelet transform of the image I(x, y) after smoothing by θ has, at each scale, the gradient amplitude
M_{2^j} I = √(|W^1_{2^j} I|² + |W^2_{2^j} I|²).
If the gradient amplitude M_{2^j} I reaches a local maximum along the gradient direction, the point (x, y) is a multi-scale edge point of the image.
Accordingly, the edge points at different scales can be determined. However, because noise is sensitive to scale variation, seeking local amplitude maxima in this way cannot suppress noise effectively. To overcome this, instead of seeking local amplitude maxima, the edge points at different scales are determined by requiring the gradient amplitude to exceed a threshold:
E = √((D ⊗ h)² + (D ⊗ v)²) > T
where h and v are the filter operators in the horizontal and vertical directions respectively, T is a threshold, and ⊗ is the convolution operator.
Consider a head feature pattern space X containing a training set of m patterns x_i with corresponding class labels y_i, posed as a two-class classification problem. At each round k, the importance of the samples is reflected by the weight set D_k(i), which satisfies Σ_i D_k(i) = 1.
In the two-class problem, learning the weak classifier f_k minimises the objective function
ε_k = P[f_k(x_i) ≠ y_i]
where P[·] is the empirical probability based on the training-sample observations. The weights D_k(i) are updated as
D_{k+1}(i) = D_k(i)·exp(−α_k y_i f_k(x_i)) / Z_k
where α_k = (1/2)·ln((1 − ε_k)/ε_k) and Z_k is a normalisation factor chosen so that Σ_i D_{k+1}(i) = 1.
The final classifier is determined by a weighted majority vote of all k weak classifiers, each considered with its weight α_k:
F(x) = sign(Σ_k α_k f_k(x)).
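The boosting procedure outlined above corresponds to standard discrete AdaBoost; the following sketch assumes the weak classifiers are supplied as callables returning labels in {-1, +1}:

```python
import numpy as np

def adaboost(X, y, weak_learners, K=10):
    """y in {-1, +1}; weak_learners: callables h(X) -> array in {-1, +1}."""
    n = len(y)
    D = np.full(n, 1.0 / n)                    # D_1(i): uniform sample weights
    ensemble = []
    for _ in range(K):
        # Choose the weak classifier minimising the weighted empirical error.
        errs = [float(np.sum(D * (h(X) != y))) for h in weak_learners]
        k = int(np.argmin(errs))
        h, eps = weak_learners[k], float(np.clip(errs[k], 1e-12, 1 - 1e-12))
        alpha = 0.5 * np.log((1.0 - eps) / eps)
        # D_{k+1}(i) = D_k(i) exp(-alpha_k y_i f_k(x_i)) / Z_k
        D *= np.exp(-alpha * y * h(X))
        D /= D.sum()                           # Z_k normalisation
        ensemble.append((alpha, h))
    # Final classifier: weighted majority vote of the weak classifiers.
    return lambda X_: np.sign(sum(a * h(X_) for a, h in ensemble))
```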
For the head target locations determined by the above classifier, an m-level colour histogram is built for each of the hue component H and the saturation component S of the HSV colour space, and an n-level grey-gradient histogram is built from the brightness component V. On this basis, the joint colour (H, S) histogram q_c^{m×m} and the brightness (V) grey-gradient histogram q_v^n are combined into the head feature histogram q^r:
q^r = C·q_c^{m×m}·q_v^n, r = 0, …, m²·n − 1
where C is a normalisation coefficient.
Let X_t and Z_t be the head target state and observation at time t respectively; head tracking is then converted into solving the posterior probability p(X_t | Z_{1:t}), where Z_{1:t} = (Z_1, …, Z_t) is the set of all head observations obtained up to time t.
The posterior probability p(X_t | Z_{1:t}) is closely approximated by a set of weighted particles {X_t^i, w_t^i} (i = 1, …, N), where the particles X_t^i represent possible head target states and the w_t^i are the particle weights.
New particles are produced by resampling from a proposal function that depends on the head target state and the observation, i.e. q(X_t | X_{t-1}, Z_t).
The new particles are updated with the weights
w_t^i ∝ w_{t-1}^i · p(Z_t | X_t^i) p(X_t^i | X_{t-1}^i) / q(X_t^i | X_{t-1}^i, Z_t)
and new particles are produced by the state transition function
X_t = F_t(X_{t-1}, U_t)
where U_t is the system noise and F_t describes the motion state of the head target.
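A minimal sketch of one resample-predict-update cycle of such a particle filter, assuming a random-walk transition in place of F_t and a caller-supplied observation likelihood (e.g. a similarity between q^r and the candidate region's histogram):

```python
import numpy as np

def particle_filter_step(particles, weights, likelihood, noise_std=5.0, rng=None):
    """particles: (N, d) states; weights: (N,) normalised; likelihood: state -> float."""
    rng = rng or np.random.default_rng()
    N = len(particles)
    # Resample particles in proportion to their current weights.
    particles = particles[rng.choice(N, size=N, p=weights)]
    # State transition X_t = F_t(X_{t-1}, U_t): random walk with Gaussian noise U_t.
    particles = particles + rng.normal(0.0, noise_std, particles.shape)
    # Weight update from the observation model p(Z_t | X_t), then normalise.
    weights = np.array([likelihood(p) for p in particles])
    weights = weights / max(weights.sum(), 1e-12)
    # Return the new particle set and the weighted-mean state estimate.
    return particles, weights, weights @ particles
```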
To make the above head detection results consistent with the tracking results, overcome the occlusion problem caused by human motion, and preserve identity consistency after an occlusion ends, if n head targets are tracked in the scene, dynamic tracking chains T_i (i = 1, …, n) are set up and the Euclidean distances d_ij (i = 1, …, n; j = 1, …, n) between the chains T_i are computed. If d_ij is less than a threshold determined from the head size predicted for the current frame, the head targets tracked by the dynamic tracking chains are occluded; otherwise, the occlusion of the tracked head targets has ended or there is no occlusion.
Based on the dynamic tracking chain results T_i (i = 1, …, n) and all current head detection results H_j (j = 1, …, m; m is the number of detected heads), the association matrix M_ij = D(T_i, H_j) between the dynamic tracking chains and the detection data is established, where D is a distance metric operator.
From M_ij, the minimum D_m of the matrix is determined (i ≠ j), and from D_m the relation matrix R_j, indicating whether each detection is associated with a dynamic tracking chain, is obtained.
If R_j < 1, no head target is currently associated with the dynamic tracking chain, indicating that there is no head target in the scene or that a previously tracked head target has left the scene; if R_j = 1, the head target in the current frame is associated with the dynamic tracking chain, and the current head position is obtained from the detection result with weight w_1 (0 < w_1 < 1) and the tracking result with weight w_2 (w_2 = 1 − w_1); if R_j > 1, several head targets in the current frame are associated with the dynamic tracking chain; the head feature histogram q^r is then used to distinguish them, and each current head position is determined from the detection result with weight w_1 (0 < w_1 < 1) and the tracking result with weight w_2 (w_2 = 1 − w_1).
Compared with the prior art, the invention has the following obvious and substantive features and remarkable advantages. Exploiting the good local characteristics of the wavelet transform in both the time and spatial domains, it extracts foreground moving objects from video inter-frame differences using multi-scale wavelet features; since the human head is an essential component of the body and approximately rigid, it learns and trains on samples of different head targets to classify and detect the foreground moving objects in the video scene and determine the head targets; and, exploiting the distinctiveness of head colour features, it tracks the heads with a particle filter and a dynamic tracking chain. The computation is simple, flexible and easy to implement. The method thereby overcomes the problems of existing video pedestrian detection and tracking, namely single detection and tracking targets, sensitivity to dynamic scene changes, strong noise interference, computational complexity, and the need for specific hardware support and scene-condition constraints; it improves the robustness of video pedestrian detection and tracking and is suitable for pedestrian detection and tracking under complex background conditions.
Brief description of the drawings
Fig. 1 is the flowchart of the method of the invention.
Fig. 2 is an original current frame image of a video in one embodiment of the invention.
Fig. 3 is the binary foreground moving object region image segmented from the example of Fig. 2.
Fig. 4 is the foreground moving object region image segmented from the example of Fig. 2.
Fig. 5 shows the head detection results (rectangular boxes) in the example of Fig. 2.
Fig. 6 shows the head tracking results in the example of Fig. 2.
Embodiments
Preferred embodiments of the invention are described in detail below with reference to the accompanying drawings:
Embodiment one:
Referring to Fig. 1, this video pedestrian detection and tracking method is characterised by the following concrete steps:
1) Start the pedestrian detection and tracking image acquisition system: capture video images;
2) Foreground moving object segmentation
Subtract the previous frame from the current frame captured by the camera and, using the wavelet transform method, segment the foreground moving object region;
3) Sample learning and training;
4) Head target detection;
5) Head target tracking;
6) Pedestrian identity-consistency confirmation.
Embodiment two:
The original current frame image of this example is shown in Fig. 2. The image of Fig. 2 is differenced against its adjacent frame, a multi-scale wavelet transform is applied, and foreground moving object segmentation is carried out, giving the binary foreground moving object region shown in Fig. 3. Since the human head is an essential component of the body and approximately rigid, a classifier is learned and trained on samples of different head targets and used to classify and detect the foreground moving objects in the video scene, determining the head targets; exploiting the distinctiveness of head colour features, the heads are tracked with a particle filter and a dynamic tracking chain. The concrete operation steps are as follows:
1) Start the pedestrian detection and tracking image acquisition system: capture video images;
2) Foreground moving object segmentation: the concrete operation steps are as follows:
(1) Subtract the previous frame image I_{t-1}(x, y) from the current frame image I_t(x, y) of Fig. 2 captured by the camera to obtain the difference image D(x, y): D(x, y) = I_t(x, y) − I_{t-1}(x, y);
(2) Multi-scale wavelet transform of the difference image:
E = √((D ⊗ h)² + (D ⊗ v)²);
where D is the difference image, h and v are the filter operators in the horizontal and vertical directions respectively, and ⊗ denotes convolution;
(3) Determination of the foreground moving object region: determine a threshold T for the multi-scale wavelet transform E of the difference image; the region formed by all pixels whose E value exceeds T is defined as the foreground moving object region.
Fig. 3 shows the binary foreground moving object region obtained above, and Fig. 4 the segmented foreground moving objects.
3) Sample learning and training: collect the head Haar features of different human moving objects to form the head training data set D_i = {H_i}, and the Haar features of human limbs and torso to form the non-head label set C_i = {T_i}; using a support vector machine with a radial basis kernel function, learn and train on the sample set (D_i, C_i) formed from the above data set D_i and label set C_i, repeatedly adjusting the penalty factor and the radial basis kernel parameter γ until the correct recognition rate is maximised;
4) Head target detection: collect the Haar features of the foreground moving objects shown in Fig. 3 to form the test data set AD_i = {AH_i}, then perform support-vector classification based on the radial basis kernel function with the fixed penalty factor and kernel parameter γ to determine the head targets. The rectangular boxes in Fig. 5 show the head positions obtained above;
5) Head target tracking
The concrete operation steps are as follows:
(1) Colour space conversion: from the red R, green G and blue B components of the RGB colour space, determine the hue component H, saturation component S and brightness component V of the HSV colour space, where
S = 1 − 3·min(R, G, B)/(R + G + B)
V = max(R, G, B)
(2) Build the head feature histogram: for the head targets of the Fig. 5 example, use the hue component H and the saturation component S of the HSV colour space to build an 8-level colour histogram for each of H and S, and use the brightness component V to build an 8-level grey-gradient histogram; then combine the joint colour (H, S) histogram q_c^{8×8} and the brightness (V) grey-gradient histogram q_v^8 into the head feature histogram q^r = C·q_c^{8×8}·q_v^8, r = 0, …, 8²·8 − 1, where C is a normalisation coefficient;
(3) Head target tracking: using the constructed head feature histogram q^r and a particle filter, track the head targets in the scene. The numbers above the heads in Fig. 6 are the head serial numbers obtained by this tracking.
6) Pedestrian identity-consistency confirmation
The concrete operation steps are as follows:
(1) Set up dynamic tracking chains: 2 head targets are tracked in the scene, so dynamic tracking chains T_i (i = 1, 2) are set up for the head targets tracked in the Fig. 5 example;
(2) Distance between dynamic tracking chains: according to step (1), compute the Euclidean distances d_ij (i = 1, 2; j = 1, 2) between the dynamic tracking chains T_i;
(3) Head target occlusion judgement: according to step (2), the distance d_ij between the dynamic tracking chains is not less than 75% of the head size predicted for the current frame by step 5), indicating that any occlusion has ended or that no occlusion exists;
(4) Establish the association matrix between the dynamic tracking chains and the detection results: from the dynamic tracking chain results T_i (i = 1, 2) of step (1) and the head detection results H_j (j = 1, 2) of the Fig. 5 example, establish the association matrix M_ij = D(T_i, H_j), where D is the Euclidean distance metric operator;
(5) Association matrix minimum: from the association matrix M_ij determined in step (4), determine the minimum D_m of the matrix (i ≠ j);
(6) Build the relation matrix: from the minimum D_m determined in step (5), obtain the relation matrix R_j indicating whether each detection is associated with a dynamic tracking chain;
(7) Fuse head detection and tracking results: the relation matrix determined in step (6) gives R_j = 2, so the head feature histogram q^r is used to distinguish the head targets, and each current head position is determined from the detection result with weight 0.5 and the tracking result with weight 0.5.

Claims (5)

1. A video pedestrian detection and tracking method, characterised by the following concrete steps:
1) start the pedestrian detection and tracking image acquisition system: capture video images;
2) foreground moving object segmentation
subtract the previous frame from the current frame captured by the camera and, using the wavelet transform method, segment the foreground moving object region;
3) sample learning and training;
4) head target detection;
5) head target tracking;
6) pedestrian identity-consistency confirmation, the concrete steps being:
(1) set up dynamic tracking chains: if n head targets are tracked in the scene, set up dynamic tracking chains T_i, i = 1, …, n, for the head targets tracked in step 5);
(2) distance between dynamic tracking chains: according to step (1), compute the Euclidean distances d_ij, i = 1, …, n, j = 1, …, n, between the dynamic tracking chains T_i;
(3) head target occlusion judgement: according to step (2), if the distance d_ij between dynamic tracking chains is less than a threshold K determined from the head predicted for the current frame by step 5), the head targets tracked by the dynamic tracking chains are occluded; otherwise, the occlusion has ended or there is no occlusion;
(4) establish the association matrix between dynamic tracking chains and detection results: from the dynamic tracking chain results T_i, i = 1, …, n, of step (1) and the head detection results H_j, j = 1, …, m, of step 4), m being the number of detected heads, establish the association matrix M_ij = D(T_i, H_j), where D is a distance metric operator;
(5) association matrix minimum: from the association matrix M_ij determined in step (4), determine the minimum D_m of the matrix, i ≠ j;
(6) build the relation matrix: from the minimum D_m determined in step (5), obtain the relation matrix R_j indicating whether each detection is associated with a dynamic tracking chain;
(7) fuse head detection and tracking results: according to the relation matrix R_j determined in step (6): if R_j < 1, no head target is currently associated with the dynamic tracking chain, indicating that there is no head target in the scene or that a previously tracked head target has left the scene; if R_j = 1, the head target in the current frame is associated with the dynamic tracking chain, and the current head position is obtained from the detection result with weight w_1, 0 < w_1 < 1, and the tracking result with weight w_2, w_2 = 1 − w_1; if R_j > 1, several head targets in the current frame are associated with the dynamic tracking chain; the head feature histogram q_r is then used to distinguish the head targets, and each current head position is determined from the detection result with weight w_1, 0 < w_1 < 1, and the tracking result with weight w_2, w_2 = 1 − w_1.
2. The video pedestrian detection and tracking method according to claim 1, characterised in that the concrete operation steps of the foreground moving object segmentation of step 2) are as follows:
(1) subtract the previous frame image I_{t-1}(x, y) from the current frame image I_t(x, y) to obtain the difference image D(x, y):
D(x, y) = I_t(x, y) − I_{t-1}(x, y);
(2) multi-scale wavelet transform of the difference image:
E = √((D ⊗ h)² + (D ⊗ v)²);
where D is the difference image, h and v are respectively the filter operators in the horizontal and vertical directions, and ⊗ denotes convolution;
(3) determination of the foreground moving object region: determine a threshold T_1 for the multi-scale wavelet transform E of the difference image; the region formed by all pixels whose E value exceeds T_1 is defined as the foreground moving object region.
3. The video pedestrian detection and tracking method according to claim 1, characterised in that the concrete operation steps of the sample learning and training of step 3) are as follows:
(1) according to step 2), collect the head Haar features of different human moving objects to form the head training data set D_i = {H_i}, and the Haar features of human limbs and torso to form the non-head label set C_i = {T_i};
(2) select a classifier, perform supervised learning on the sample set (D_i, C_i) formed from the above data set D_i and label set C_i, and adjust the classifier parameters so that the classification effect is optimal.
4. The video pedestrian detection and tracking method according to claim 1, characterised in that the concrete operation steps of the head target detection of step 4) are as follows:
(1) according to step 2), collect the Haar features of the foreground moving objects to form the test data set AD_i = {AH_i};
(2) using the classifier and parameters determined in step 3), classify the test data set AD_i and determine the head targets.
5. The video pedestrian detection and tracking method according to claim 1, characterised in that the concrete operation steps of the head target tracking of step 5) are as follows:
(1) colour space conversion: from the red R, green G and blue B components of the RGB colour space, determine the hue component H, saturation component S and brightness component V of the HSV colour space, where
S = 1 − 3·min(R, G, B)/(R + G + B)
V = max(R, G, B)
(2) build the head feature histogram: for the head targets determined in step 4), use the hue component H and the saturation component S of the HSV colour space to build an m-level colour histogram for each component, and use the brightness component V to build an n-level grey-gradient histogram; then combine the joint colour (H, S) histogram q_c^{m×m} and the brightness (V) grey-gradient histogram q_v^n into the head feature histogram q_r:
q_r = C·q_c^{m×m}·q_v^n, r = 0, …, m²·n − 1
where C is a normalisation coefficient;
(3) head target tracking: using the head feature histogram q_r built in step (2), track the head targets in the scene with a particle filter.
CN201410266099.XA 2014-06-16 2014-06-16 Video pedestrian detection and tracking Expired - Fee Related CN104036250B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410266099.XA CN104036250B (en) 2014-06-16 2014-06-16 Video pedestrian detection and tracking

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410266099.XA CN104036250B (en) 2014-06-16 2014-06-16 Video pedestrian detection and tracking

Publications (2)

Publication Number Publication Date
CN104036250A CN104036250A (en) 2014-09-10
CN104036250B (en) 2017-11-10

Family

ID=51467016

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410266099.XA Expired - Fee Related CN104036250B (en) 2014-06-16 2014-06-16 Video pedestrian detection and tracking

Country Status (1)

Country Link
CN (1) CN104036250B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
RU2796096C1 (en) * 2022-05-13 2023-05-17 Акционерное общество "Научно-Производственный Комплекс "Альфа-М" Method of tracking objects

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105005773A (en) * 2015-07-24 2015-10-28 成都市高博汇科信息科技有限公司 Pedestrian detection method with integration of time domain information and spatial domain information
CN105260712B (en) * 2015-10-03 2019-02-01 上海大学 A kind of vehicle front pedestrian detection method and system
CN106447694A (en) * 2016-07-28 2017-02-22 上海体育科学研究所 Video badminton motion detection and tracking method
CN106875425A (en) * 2017-01-22 2017-06-20 北京飞搜科技有限公司 A kind of multi-target tracking system and implementation method based on deep learning
CN107220629B (en) * 2017-06-07 2018-07-24 上海储翔信息科技有限公司 A kind of method of the high discrimination Human detection of intelligent automobile
JP6973258B2 (en) * 2018-04-13 2021-11-24 オムロン株式会社 Image analyzers, methods and programs
CN111291599A (en) * 2018-12-07 2020-06-16 杭州海康威视数字技术股份有限公司 Image processing method and device
CN110889351B (en) * 2019-11-18 2023-09-26 中国科学院深圳先进技术研究院 Video detection method, device, terminal equipment and readable storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101216940A (en) * 2008-01-08 2008-07-09 上海大学 Video foreground moving object subdivision method based on wavelet multi-scale transform
CN102799935A (en) * 2012-06-21 2012-11-28 武汉烽火众智数字技术有限责任公司 Human flow counting method based on video analysis technology

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6785402B2 (en) * 2001-02-15 2004-08-31 Hewlett-Packard Development Company, L.P. Head tracking and color video acquisition via near infrared luminance keying

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101216940A (en) * 2008-01-08 2008-07-09 上海大学 Video foreground moving object subdivision method based on wavelet multi-scale transform
CN102799935A (en) * 2012-06-21 2012-11-28 武汉烽火众智数字技术有限责任公司 Human flow counting method based on video analysis technology

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Spatio-temporal motion-based foreground segmentation and shadow suppression; Y. P. Guan; The Institution of Engineering and Technology; 2010-12-31; vol. 4, no. 1; sections 1-2 *
基于头肩轮廓特征的人头检测系统的研究 [Research on a human head detection system based on head-shoulder contour features]; 顾炯; 中国优秀硕士学位论文全文数据库 信息科技辑 [China Master's Theses Full-text Database, Information Science and Technology]; 2012-06-15; vol. 2012, no. 6; sections 1.1, 2.1.1, 2.2, 4.4, Fig. 4.14 *


Also Published As

Publication number Publication date
CN104036250A (en) 2014-09-10

Similar Documents

Publication Publication Date Title
CN104036250B (en) Video pedestrian detection and tracking
CN109583342B (en) Human face living body detection method based on transfer learning
CN107016357B (en) Video pedestrian detection method based on time domain convolutional neural network
Bertini et al. Multi-scale and real-time non-parametric approach for anomaly detection and localization
CN104899866B (en) A kind of intelligentized infrared small target detection method
CN103824070B (en) A kind of rapid pedestrian detection method based on computer vision
CN102982313B (en) The method of Smoke Detection
CN112016445B (en) Monitoring video-based remnant detection method
Zhang et al. Region of interest extraction in remote sensing images by saliency analysis with the normal directional lifting wavelet transform
CN108446690B (en) Human face in-vivo detection method based on multi-view dynamic features
CN113536972B (en) Self-supervision cross-domain crowd counting method based on target domain pseudo label
Asokan et al. Machine learning based image processing techniques for satellite image analysis-a survey
Mahlisch et al. A multiple detector approach to low-resolution FIR pedestrian recognition
Zhang et al. Object detection/tracking toward underwater photographs by remotely operated vehicles (ROVs)
CN106447694A (en) Video badminton motion detection and tracking method
CN105303571A (en) Time-space saliency detection method for video processing
Zhu et al. Fast detection of moving object based on improved frame-difference method
CN104899559B (en) A kind of rapid pedestrian detection method based on video monitoring
Anderson et al. Algorithm fusion in forward-looking long-wave infrared imagery for buried explosive hazard detection
Xu et al. COCO-Net: A dual-supervised network with unified ROI-loss for low-resolution ship detection from optical satellite image sequences
Chang et al. Locating waterfowl farms from satellite images with parallel residual u-net architecture
Chen et al. A novel AMS-DAT algorithm for moving vehicle detection in a satellite video
Liu et al. An automatic high confidence sets selection strategy for SAR images change detection
Hu et al. Anomaly detection in crowded scenes via sa-mhof and sparse combination
Chen et al. Detection and adaptive video processing of hyperopia scene in sports video

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20171110

Termination date: 20200616